How to Limit Copilot Studio Knowledge to Specific SharePoint Folders (and Why It Matters)

One of the biggest challenges in building RAG agents with Copilot Studio is retrieving the right information without noise.

Previously, adding a SharePoint site as a knowledge source exposed every document on the site to the agent. Now you can specify individual document libraries, and even folders, as data sources. In this guide, I demonstrate how this granular control can improve answer accuracy.


Generative Answers Knowledge

Copilot Studio has always allowed you to set an entire SharePoint site as a knowledge source (data source) for generative answers.

*For more details, see here: Setting SharePoint as Knowledge Source

Now you can also specify document libraries, and even individual folders. This is a big step toward reducing “noise” in RAG responses, so I tried the new functionality right away.
[Screenshot: the new SharePoint library selection in Copilot Studio]

Preparation: Setting up the SharePoint Site for Knowledge

For this demonstration, I created a new SharePoint site to use as the knowledge source and prepared two document libraries.
[Screenshot: SharePoint site with two document libraries]
In the first library, I uploaded information about Mr. Ichiro Tanaka.
[Screenshot: document library storing information about Mr. Ichiro Tanaka]
In the second library, I placed information about Mr. Ichiro Sato in PDF format.
[Screenshot: document library with PDF information about Mr. Ichiro Sato]

Experiment 1: Setting the Entire Site as Knowledge

First, let’s set the entire SharePoint site as knowledge and check how it works.

Click [Add knowledge].
[Screenshot: knowledge addition screen]
Select SharePoint.
[Screenshot: selecting SharePoint as the data source]
Enter the site URL and click [Add].
[Screenshot: entering the SharePoint site URL]
Provide a description and click [Add]. The knowledge source has now been added.
[Screenshot: adding a description for the knowledge source]
In the [Create Generative Answers] node, specify the site you just added as the knowledge source.
[Screenshot: selecting the knowledge source in the node]
When you ask about “Mr. Ichiro”, the agent returns information about Mr. Ichiro Tanaka.
[Screenshot: result showing Ichiro Tanaka's information]

However, it didn’t retrieve information about Mr. Ichiro Sato.

Why did this happen?
I can’t say for certain, but there are two likely explanations. When an entire site is the target, the search index may take longer to update (sometimes 4 to 6 hours), or the relevance ranking may prioritize the first library it finds. Either way, limiting the scope helps ensure that critical documents are found.
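
To make the scoping idea concrete, here is a toy sketch in Python. It is not how Copilot Studio or SharePoint search ranks results internally; the corpus, the paths, the naive term-count scoring, and the `retrieve` function are all hypothetical, chosen only to show how a narrower scope changes which document reaches the top of the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    path: str  # hypothetical site-relative path
    text: str  # hypothetical document text


# Hypothetical corpus that mirrors the two libraries in this demo.
CORPUS = [
    Doc("/LibraryA/tanaka.docx", "Ichiro Tanaka profile. Notes about Ichiro."),
    Doc("/LibraryB/sato.pdf", "Ichiro Sato profile."),
]


def score(query: str, doc: Doc) -> int:
    """Naive relevance: count how often each query term appears in the text."""
    words = doc.text.lower().replace(".", " ").split()
    return sum(words.count(term) for term in query.lower().split())


def retrieve(query: str, scope: str = "/", top_k: int = 1) -> list[Doc]:
    """Return up to top_k matching documents whose path starts with scope."""
    in_scope = [d for d in CORPUS if d.path.startswith(scope)]
    ranked = sorted(in_scope, key=lambda d: score(query, d), reverse=True)
    return [d for d in ranked[:top_k] if score(query, d) > 0]


# Scope = entire site: the higher-scoring Tanaka document takes the only slot.
whole_site = retrieve("Ichiro")
# Scope = one library: the Sato document no longer has to compete.
library_only = retrieve("Ichiro", scope="/LibraryB/")

print([d.path for d in whole_site])
print([d.path for d in library_only])
```

With the whole site in scope, the single result slot goes to the Tanaka document. Restricting the scope to the second library returns the Sato document instead, which matches the behavior I observed in this experiment.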

Next, let’s try specifying the second document library directly as knowledge.

Experiment 2: Setting a Document Library as Knowledge

As before, select SharePoint.
[Screenshot: selecting SharePoint as the data source]
This time, choose [Browse files].
[Screenshot: the 'Browse files' option]
From Quick Access or another location, select the target document library, then click [Review selection].
[Screenshot: selecting the document library and reviewing the selection]
Enter a description and click [Add]. The document library is now added as knowledge.
[Screenshot: adding a description and completing the knowledge addition]
Now specify only the newly added library as knowledge and ask the same question. This time, the information about “Mr. Sato” is retrieved successfully.
[Screenshot: successfully retrieved information about Mr. Sato]

Experiment 3: Setting a Folder as Knowledge

Finally, let’s try specifying a folder as knowledge.

Add a folder to the document library used in Experiment 2.
[Screenshot: adding a folder to the document library]
Place information about “Ichiro Suzuki” inside the folder.
[Screenshot: placing Ichiro Suzuki's information into the folder]
Return to Copilot Studio, select the folder you just created when adding knowledge, and click [Review selection].
[Screenshot: selecting the folder in Copilot Studio]
When you ask the same question, only the information about Mr. Suzuki stored in that folder appears, as expected.
[Screenshot: only information about Ichiro Suzuki is retrieved]

Important Limitations (Must Read)

While folder-level specification is powerful, keep these limitations in mind:

  • File Limits: Up to 1,000 files per knowledge source.
  • Folder Depth: Subfolders are supported, but extremely deep hierarchies might cause indexing delays.
  • File Size: Keep individual files under 7MB, and ideally under 3MB for optimal performance.
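
If the library is synced to a local folder, a quick pre-flight check can flag files that might run into these limits before you add the folder as knowledge. The following is a minimal sketch: the thresholds are simply the numbers quoted in this article (with 1MB assumed to be 1024 × 1024 bytes), and `check_folder` is a hypothetical helper, not part of Copilot Studio or SharePoint.

```python
import tempfile
from pathlib import Path

# Limits as quoted in this article; treat them as assumptions and confirm
# them against the current Copilot Studio documentation.
MAX_FILES = 1_000
RECOMMENDED_BYTES = 3 * 1024 * 1024  # assumes 1MB = 1024 * 1024 bytes
MAX_BYTES = 7 * 1024 * 1024


def check_folder(root: Path) -> dict:
    """Compare a local folder tree (subfolders included) with the limits above."""
    sizes = {p: p.stat().st_size for p in root.rglob("*") if p.is_file()}
    return {
        "file_count": len(sizes),
        "too_many_files": len(sizes) > MAX_FILES,
        # Larger than the recommended size but still within the maximum.
        "over_recommended": sorted(
            p.relative_to(root).as_posix()
            for p, size in sizes.items()
            if RECOMMENDED_BYTES < size <= MAX_BYTES
        ),
        # Larger than the maximum size.
        "over_max": sorted(
            p.relative_to(root).as_posix()
            for p, size in sizes.items()
            if size > MAX_BYTES
        ),
    }


# Demo against a throwaway folder that stands in for a locally synced library.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "small.txt").write_bytes(b"x" * 10)
    (root / "sub" / "medium.pdf").write_bytes(b"x" * (4 * 1024 * 1024))
    (root / "sub" / "large.pdf").write_bytes(b"x" * (8 * 1024 * 1024))
    report = check_folder(root)

print(report)
```

In practice you would point `check_folder` at the synced folder path instead of the temporary directory, then split or compress anything listed under `over_max`.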

By carefully managing and limiting the search scope for RAG, per agent or per topic, you can greatly improve the accuracy of generated answers.
