How to Create a Multimodal Copilot Studio Agent that Analyzes File Attachments using AI Builder

This article continues from my previous one. Here I’ll demonstrate how to build a multimodal agent in Copilot Studio that can read and analyze files attached to a conversation.


What I Want to Achieve

The goal is to ask questions about files attached to a conversation, using AI Builder (GPT-4o) to analyze them.
Screenshot showing a file attachment in a Copilot conversation

Image Analysis is Already Available

Actually, image analysis functionality is already available, though still in preview.
Screenshot showing image analysis options in Copilot Studio

However, this feature requires turning on [Generative AI], which is not yet available in non-English Copilot. That’s why in this article, I’ll create an agent that can read images and PDFs by combining Copilot Studio with AI Builder.

Implementation

Just like in the previous example, we’ll only target the first attached file.
*Note: If you want to handle multiple files, treat them as a Table type or use Foreach.

Creating the Prompt Action

Create a new prompt from [AI hub] > [Prompts].
Screenshot showing AI hub navigation to create a new prompt
Screenshot showing the prompt creation screen
Enter a suitable prompt and add an “Image or Document” input.
*Note: The accuracy of this prompt has not been verified.
Screenshot showing prompt configuration with image/document input option

This completes the creation of the prompt action.

Building Power Automate Flow

According to the official documentation, at the time of writing this article, Copilot Studio doesn’t yet support “File” format inputs when calling prompt actions, so we’ll need to go through Power Automate.
Add text, image, or document input to a prompt
Learn how to add text, image, or document input to a prompt.
Create a flow that takes the user’s message and the file’s data string as inputs, and passes them directly to AI Builder. Remember to convert the data string to binary using the base64ToBinary function.
Screenshot showing Power Automate flow configuration with base64ToBinary conversion for AI Builder
Then set the received message as the return value, and the Power Automate flow construction is complete.
Screenshot showing the return value configuration in Power Automate flow
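To make the conversion step concrete, here is a minimal Python sketch of what a Base64-to-binary conversion does. It is only an illustration and not part of the flow itself, where the base64ToBinary expression does this work. The function name base64_to_binary and the sample string are hypothetical.

```python
import base64


def base64_to_binary(data_string: str) -> bytes:
    """Decode a Base64 string into raw bytes, rejecting invalid characters."""
    return base64.b64decode(data_string, validate=True)


# Hypothetical input: the Base64 text of the 8-byte PNG file signature.
data_string = "iVBORw0KGgo="
file_bytes = base64_to_binary(data_string)

# The decoded bytes are the binary content that a file input expects.
print(len(file_bytes))  # 8
```

The point is that the agent passes the file around as text, while the AI Builder file input needs the decoded bytes, so the conversion cannot be skipped.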

Building in Copilot Studio

We’ll start the conversation from the “Conversational boosting” system topic.
*Note: This example only assumes a single conversation. For multiple ongoing conversations, additional testing would be required.

When a conversation begins, get the number of attached files.
Screenshot showing how to get the number of attachments in a conversation
If attachments exist, redirect to a dedicated topic.
Screenshot showing the condition to redirect to a dedicated topic when attachments are present
Within the redirected topic, retrieve the contentUrl as described in the previous article.
Screenshot showing how to retrieve the contentUrl from the attachment
Then extract the data portion using the Split and Index functions.
Screenshot showing the use of Split and Index functions to extract the data portion
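The extraction logic can be sketched in Python as follows. This assumes the contentUrl is a data URL of the form "data:<MIME type>;base64,<payload>", which may not hold in every channel. The function name and sample value are hypothetical. In the topic itself the same idea is expressed with Split and Index, where Index counts from 1, so the data portion is the second element.

```python
def extract_data_portion(content_url: str) -> str:
    """Return the Base64 payload of a data URL (the part after the first comma)."""
    # Assumed shape: "data:<MIME type>;base64,<payload>"
    parts = content_url.split(",", 1)
    if len(parts) != 2 or not parts[0].startswith("data:"):
        raise ValueError("contentUrl is not a data URL")
    # Python lists are zero-based, so the second element is parts[1].
    return parts[1]


sample_url = "data:text/plain;base64,aGVsbG8="  # hypothetical attachment value
payload = extract_data_portion(sample_url)
print(payload)  # aGVsbG8=
```

Splitting only on the first comma is a defensive choice. Base64 text never contains a comma, so the result matches a plain split on this input.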
Then add the Power Automate flow we created earlier, and send the user’s message (Activity.Text) along with the file data portion.
Screenshot showing how to call the Power Automate flow with the message and file data
Finally, display the message returned from Power Automate to complete the implementation.
Screenshot showing how to display the response returned from Power Automate

Testing the Implementation

First, I sent an image, and it successfully identified it as a cat.
Screenshot showing the agent correctly identifying a cat image in the conversation
As of April 18, 2025, AI Builder’s file input supports the PNG, JPG, JPEG, and PDF formats, so I also created this PDF for testing.
Screenshot showing a sample PDF document that was created for testing
When I asked a question about it, it provided an appropriate answer.
Screenshot showing the agent correctly answering questions about the PDF content

If you want to enable reading of Excel, PowerPoint, or other file formats, you would need to integrate with Document Intelligence or similar services.
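Because only a few formats are supported, it may be worth checking the attachment type before calling the flow. The sketch below is one possible way to do that in Python, again assuming the contentUrl is a data URL. The mapping from the formats above to MIME types, and the function name, are my own assumptions rather than anything taken from AI Builder.

```python
# Assumed MIME types for PNG, JPG/JPEG, and PDF (JPG and JPEG share image/jpeg).
SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "application/pdf"}


def is_supported_attachment(content_url: str) -> bool:
    """Check whether a data URL declares one of the supported MIME types."""
    if not content_url.startswith("data:"):
        return False
    # Header is everything between "data:" and the first comma.
    header = content_url[len("data:"):].split(",", 1)[0]
    # The MIME type comes before any ";base64" or other parameters.
    mime_type = header.split(";", 1)[0].strip().lower()
    return mime_type in SUPPORTED_MIME_TYPES


print(is_supported_attachment("data:application/pdf;base64,AAAA"))  # True
print(is_supported_attachment("data:application/vnd.ms-excel;base64,AAAA"))  # False
```

An agent could use a check like this to reply with a friendly message for unsupported files instead of passing them on to AI Builder.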
