How to Create a Multimodal Copilot Studio Agent that Analyzes File Attachments using AI Builder
This article continues from my previous one. Here I’ll demonstrate how to build a multimodal agent in Copilot Studio that can read and analyze files attached to a conversation.
The agent answers questions about those attached files using AI Builder (GPT-4o).
Image Analysis is Already Available
Image analysis is in fact already available, though it is still in preview.
However, this feature requires turning on [Generative AI], which is not yet available for non-English copilots. That’s why, in this article, I’ll create an agent that can read images and PDFs by combining Copilot Studio with AI Builder.
Just like in the previous example, we’ll only target the first attached file.
*Note: If you want to handle multiple files, treat them as a Table type or use Foreach.
Creating the Prompt Action
Create a new prompt from [AI hub] > [Prompts].
Add an appropriate prompt and an “Image or Document” input.
*Note: The accuracy of this prompt has not been verified.
This completes the creation of the prompt action.
Building Power Automate Flow
According to the official documentation, at the time of writing this article, Copilot Studio doesn’t yet support “File” format inputs when calling prompt actions, so we’ll need to go through Power Automate.
Create a flow that takes the user’s message and the file’s data string as inputs, and passes them directly to AI Builder. Remember to convert the data string to binary using the base64ToBinary function.
Then set the response received from AI Builder as the flow’s return value, and the Power Automate flow is complete.
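To make the conversion step concrete, here is a minimal Python sketch of what the flow does: it receives the user’s message and the base64 data string, decodes the string to bytes (the role base64ToBinary plays in Power Automate), and passes both to a prompt. The names run_flow, call_prompt, and fake_prompt are hypothetical stand-ins for illustration, not AI Builder APIs.

```python
import base64
from typing import Callable


def run_flow(message: str, data_string: str,
             call_prompt: Callable[[str, bytes], str]) -> str:
    """Mimic the flow: decode the file data, call the prompt, return its text."""
    # Plays the same role as base64ToBinary(...) in Power Automate.
    file_bytes = base64.b64decode(data_string, validate=True)
    # call_prompt is a hypothetical stand-in for the AI Builder prompt action.
    return call_prompt(message, file_bytes)


def fake_prompt(message: str, file_bytes: bytes) -> str:
    """Placeholder prompt, used only to make the sketch runnable."""
    return f"{message} ({len(file_bytes)} bytes received)"


if __name__ == "__main__":
    encoded = base64.b64encode(b"hello").decode("ascii")
    print(run_flow("What is in this file?", encoded, fake_prompt))
```

Passing validate=True makes the decoder reject strings that contain characters outside the base64 alphabet, which helps surface a badly extracted data string early.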
Building in Copilot Studio
We’ll start the conversation from the “Conversational boosting” system topic.
*Note: This example only assumes a single conversation. For multiple ongoing conversations, additional testing would be required.
When a conversation begins, get the number of attached files.
If attachments exist, redirect to a dedicated topic.
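The branching logic above can be sketched in Python as follows. This is only an illustration of the decision, assuming attachments arrive as a list of dictionaries; the function name first_attachment is hypothetical and is not part of Copilot Studio.

```python
from typing import Any, Optional


def first_attachment(attachments: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the first attachment, or None when the message has none."""
    # Count the attachments first, as the topic does when the conversation begins.
    if len(attachments) == 0:
        return None  # no redirect: stay in the default topic
    # Only the first file is analyzed in this article.
    return attachments[0]
```

A None result corresponds to staying in the original topic, while any other result corresponds to redirecting to the dedicated topic with that file.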
Within the redirected topic, retrieve the contentUrl as described in the previous article.
Extract the data portion using the Split and Index functions.
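As a rough illustration of the extraction, the Python sketch below assumes the contentUrl is a data URI of the form data:image/png;base64,AAAA, so the data portion is everything after the first comma. That format is an assumption made for this example, and extract_data_portion is a hypothetical helper name.

```python
def extract_data_portion(content_url: str) -> str:
    """Return the base64 payload of a data URI such as 'data:image/png;base64,AAAA'."""
    # Split on the first comma and take the second piece, roughly what
    # Split followed by Index does in the topic.
    parts = content_url.split(",", 1)
    if len(parts) != 2 or not parts[0].startswith("data:"):
        raise ValueError("contentUrl is not a data URI")
    return parts[1]
```

Raising an error for unexpected input keeps a plain https URL from being passed on as if it were file data.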
Then add the Power Automate flow we created earlier, and send the user’s message (Activity.Text) along with the file data portion.
Finally, display the message returned from Power Automate to complete the implementation.
Testing the Implementation
First, I sent an image, and it successfully identified it as a cat.
As of April 18, 2025, AI Builder’s file input supports the PNG, JPG, JPEG, and PDF formats, so I created a PDF to test with.
When I asked a question about it, the agent provided an appropriate answer.
If you want to enable reading of Excel, PowerPoint, or other file formats, you would need to integrate with Document Intelligence or similar services.
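Because only a few formats are accepted, it can help to check the file type before calling the flow. The Python sketch below shows one way to express that check, using the four formats listed above; the names SUPPORTED_EXTENSIONS and is_supported are hypothetical, and checking by file extension is a simplification chosen for this example.

```python
from pathlib import PurePath

# Formats the article lists as supported by AI Builder file input (as of April 18, 2025).
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}


def is_supported(file_name: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return PurePath(file_name).suffix.lower() in SUPPORTED_EXTENSIONS
```

With this check, a name such as deck.pptx would be rejected before any call is made, and the agent could reply with a message explaining which formats it can read.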