How to Create a Multimodal Copilot Studio Agent that Analyzes File Attachments using AI Builder

This article continues from my previous one. Here, I’ll demonstrate how to build a multimodal agent in Copilot Studio that can read and analyze files attached to a conversation.

For reference, here’s my previous article:

How to Retrieve and Store File Attachments from Copilot Studio Conversations to SharePoint Online with Power Automate


What I Want to Achieve
Image Analysis is Already Available
Implementation
Testing the Implementation
Related Articles

What I Want to Achieve

The goal is to ask questions about files attached to a conversation, using AI Builder (GPT-4o).

Screenshot showing a file attachment in a Copilot conversation

Image Analysis is Already Available

Actually, image analysis functionality is already available, though still in preview.

Screenshot showing image analysis options in Copilot Studio

However, this feature requires turning on [Generative AI], which is not yet available for non-English agents. So in this article, I’ll create an agent that can read images and PDFs by combining Copilot Studio with AI Builder.

Allow image input from users, and image analysis (preview) - Microsoft Copilot Studio

Allow your Microsoft Copilot Studio agent to analyze images that users upload during conversations with the agent.

Implementation

As in the previous article, we’ll only target the first attached file.
*Note: To handle multiple files, treat the attachments as a Table type or loop over them with Foreach.

Creating the Prompt Action

Create a new prompt from [AI hub] > [Prompts].

Screenshot showing AI hub navigation to create a new prompt

Screenshot showing the prompt creation screen

Write a suitable prompt and add an “Image or Document” input.
*Note: The accuracy of this prompt has not been verified.

Screenshot showing prompt configuration with image/document input option

This completes the creation of the prompt action.

Building Power Automate Flow

According to the official documentation, at the time of writing this article, Copilot Studio doesn’t yet support “File” format inputs when calling prompt actions, so we’ll need to go through Power Automate.

Add text, image, or document inputs to a prompt

Learn how to add text, image, or document inputs to a prompt.

Create a flow that takes the user’s message and the file’s data string as inputs and passes them directly to AI Builder. Remember to convert the data string to binary with the base64ToBinary function.

Screenshot showing Power Automate flow configuration with base64ToBinary conversion for AI Builder
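
For readers less familiar with this conversion, the Python sketch below mirrors what the base64ToBinary step does conceptually: it turns the base64 text received from the agent into raw bytes before they are handed to the prompt. The function name `to_binary` and the sample payload are made up for illustration; the actual flow uses the Power Automate expression, not Python.

```python
import base64


def to_binary(data_string: str) -> bytes:
    """Decode a base64 string into raw bytes (illustrative stand-in for base64ToBinary)."""
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    return base64.b64decode(data_string, validate=True)


# Hypothetical payload: the eight-byte PNG file signature, base64-encoded.
sample = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
print(to_binary(sample)[:4])
```

The point of the step is that AI Builder needs the file bytes themselves, while the agent can only pass the file along as text.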

Then set the response received from AI Builder as the flow’s return value, and the Power Automate flow is complete.

Screenshot showing the return value configuration in Power Automate flow

Building in Copilot Studio

We’ll start the conversation from the “Conversational boosting” system topic.
*Note: This example only assumes a single conversation. For multiple ongoing conversations, additional testing would be required.

When a conversation begins, get the number of attached files.

Screenshot showing how to get the number of attachments in a conversation

If attachments exist, redirect to a dedicated topic.

Screenshot showing the condition to redirect to a dedicated topic when attachments are present
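
The routing logic in these two steps can be sketched in Python as follows. This is only an illustration of the decision, not how Copilot Studio evaluates it: the topic labels `"analyze_attachment"` and `"fallback"`, the function names, and the dictionary shape of an attachment are all assumptions made for the example.

```python
from typing import Any, Optional


def choose_topic(attachments: list[dict[str, Any]]) -> str:
    """Pick a (hypothetical) topic label based on the attachment count."""
    # Redirect to the dedicated topic only when at least one file is attached.
    if len(attachments) > 0:
        return "analyze_attachment"
    return "fallback"


def first_attachment(attachments: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the first attachment, or None when nothing is attached."""
    # This article only targets the first attached file.
    return attachments[0] if attachments else None


# Hypothetical attachment records for illustration only.
files = [{"name": "cat.png"}, {"name": "notes.pdf"}]
print(choose_topic(files), first_attachment(files))
```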

Within the redirected topic, retrieve the contentUrl as described in the previous article.

Screenshot showing how to retrieve the contentUrl from the attachment

Then extract the data portion using the Split and Index functions.

Screenshot showing the use of Split and Index functions to extract the data portion
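
As a rough illustration of this extraction, the Python sketch below assumes the contentUrl is a data URI of the form `data:<mime type>;base64,<payload>`, so that splitting on the comma and taking the second element yields the base64 data. That shape, the function name `extract_data_portion`, and the sample value are assumptions for the example; in the agent itself this is done with the Split and Index functions.

```python
def extract_data_portion(content_url: str) -> str:
    """Return the payload of a data URI (everything after the first comma)."""
    # Assumed shape: "data:<mime type>;base64,<payload>".
    parts = content_url.split(",", 1)
    if len(parts) != 2 or not parts[0].startswith("data:"):
        raise ValueError("contentUrl is not a data URI")
    return parts[1]


# Hypothetical contentUrl value for illustration only.
example_url = "data:image/png;base64,iVBORw0KGgo="
print(extract_data_portion(example_url))
```

Splitting only on the first comma keeps the payload intact even if the separator ever appears again later in the string.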

Then add the Power Automate flow we created earlier, and send the user’s message (Activity.Text) along with the file data portion.

Screenshot showing how to call the Power Automate flow with the message and file data

Finally, display the message returned from Power Automate to complete the implementation.

Screenshot showing how to display the response returned from Power Automate

Testing the Implementation

First, I sent an image, and it successfully identified it as a cat.

Screenshot showing the agent correctly identifying a cat image in the conversation

As of April 18, 2025, AI Builder’s file input supports the PNG, JPG, JPEG, and PDF formats, so I created this PDF.

Screenshot showing a sample PDF document that was created for testing

When I asked a question about it, it provided an appropriate answer.

Screenshot showing the agent correctly answering questions about the PDF content

If you want to enable reading of Excel, PowerPoint, or other file formats, you would need to integrate with Document Intelligence or similar services.
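
If you want to reject unsupported files before sending them on, a check along the following lines could help. This is a minimal Python sketch, assuming the file name is available and that its extension reliably reflects the format; the function name `is_supported` is hypothetical, and the extension list reflects the formats supported as of April 18, 2025.

```python
from pathlib import PurePath

# Formats accepted by AI Builder's file input as of April 18, 2025.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}


def is_supported(file_name: str) -> bool:
    """Return True when the file extension is one AI Builder can read."""
    # Compare case-insensitively so "CAT.PNG" is treated like "cat.png".
    return PurePath(file_name).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["cat.PNG", "report.pdf", "budget.xlsx"]:
    print(name, is_supported(name))
```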

Related Articles

How to Retrieve and Store File Attachments from Copilot Studio Conversations to SharePoint Online with Power Automate


How to Use Prompt Actions with AI Builder in Copilot Studio: Complete Tutorial for Generating AI Responses


How to Use the Generative Response Node in Copilot Studio: Detailed Tutorial for AI-Generated Answers


Mastering Power Fx in Copilot Studio: Using Variables in Formulas and Key Functions


Using Foreach Loops in Copilot Studio Conversations: Simplify Table and Array Operations
