How to Implement Advanced RAG Using AI Search in Copilot Studio: A CRAG Tutorial

Having previously built a “Self-RAG” in Copilot Studio, this time I’ll implement “CRAG.”

CRAG

CRAG (Corrective Retrieval Augmented Generation) is a RAG technique proposed around February 2024 that offers the advantage of reducing hallucinations compared to traditional RAG (Retrieval Augmented Generation).
The key characteristic of CRAG is the use of a Retrieval Evaluator to assess the relevance between retrieved documents and the user’s question.

This retrieval evaluation classifies documents into three categories: “Correct (highly relevant),” “Incorrect (low relevance),” and “Ambiguous (difficult to determine),” and takes the following actions for each:

  • Correct (highly relevant): Generate responses using the documents
  • Incorrect (low relevance): Generate responses from web searches without using the retrieved documents
  • Ambiguous (difficult to determine): Generate responses using both the retrieved documents and web search results
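
In Copilot Studio, this three-way branch comes down to a condition on the evaluator’s relevance score. A minimal Power Fx sketch, assuming the highest score is held in a hypothetical variable Topic.MaxScore and using the 0.8 / 0.4 thresholds adopted later in this article:

If(
    Topic.MaxScore >= 0.8, "Correct",    // generate from the retrieved documents
    Topic.MaxScore >= 0.4, "Ambiguous",  // combine retrieved documents and web search
    "Incorrect"                          // fall back to web search only
)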

Building CRAG in Copilot Studio

Here’s the flow we’ll be implementing. We’ll use GPT-4o for the retrieval evaluation and SerpAPI for web searches.
Diagram showing the CRAG implementation flow with retrieval evaluation steps

Implementation

Since our goal is simply to demonstrate how to build CRAG in Copilot Studio, we won’t focus on optimizing accuracy (such as refining prompts).

SharePoint Search

First, we’ll search for documents from SharePoint.
Screenshot showing SharePoint search configuration in Copilot Studio
We’ll use a prompt action to convert the user’s question into search terms.
Screenshot showing prompt action configuration for converting questions to search terms
The conversion prompt is reused from the previous example.
*Note: “Excluded Keywords” remains here only because we reused the prompt from the previous (Self-RAG) implementation. It’s not actually necessary for this purpose.
Screenshot showing the conversion prompt configuration with excluded keywords field
Here are examples of user questions and their corresponding search queries. Use these examples to guide your transformation of the input question.
Example Input 1: "How to connect to a database with Python?"
Example Output: "Python database connection method"
Example Input 2: "What impact does climate change have on ecosystems?"
Example Output: "Climate change ecosystem impact"

Additionally, if certain words are specified as "excluded keywords," ensure that these words are NOT included in the generated search query.
For example:
Excluded Keywords: ["Python connection"]
Example Input: "How to connect to a database with Python?"
Example Output: "Database connection method"

Now, based on these examples and the excluded keywords, transform the following user question into an effective search query, avoiding the specified excluded keywords.
Input: {input}
Excluded Keywords: {excluded_keywords}

Using the generated search terms, we’ll search the SharePoint documents via AI Search.
Screenshot showing AI Search configuration to search SharePoint documents
Screenshot showing additional AI Search configuration settings

Document Relevance Evaluation

Next, we’ll score the relevance between the searched documents and the user’s question.
Diagram showing the document relevance evaluation process
First, run a Foreach loop over all the documents in the search results,
Screenshot showing Foreach loop configuration for processing search results
Within the loop, use a prompt action to determine the relevance score (0.0 to 1.0) for each document.
Screenshot showing prompt action configuration for relevance scoring inside the loop
Here’s what the prompt looks like.
Screenshot showing the relevance scoring prompt template
# Task 
Evaluate relevance between user question and retrieved document on a 0.0-1.0 scale

# Evaluation Criteria
## 1.0 - Document provides complete & direct answer
(Example: Contains specific numbers/dates/names matching query)
## 0.7-0.9 - Directly relevant but requires:
- Context synthesis OR
- Partial information extraction OR
- Terminology clarification
## 0.4-0.6 - Partial relevance through:
- Shared domain knowledge
- Indirect supporting evidence
- Related concepts without direct answer
## 0.1-0.3 - Barely relevant with only:
- Common keywords
- Generic domain overlap
- Indirect conceptual connections
## 0.0 - No semantic/contextual connection

# Output Format
- Relevance Score: [Strictly 0.0-1.0 numeric value only]

# Input Data
Query: {question}
Retrieved Document:
{document}

Finally, create an object array “Documents” by adding a new “Score” column to the retrieved documents.
Screenshot showing the creation of a Documents array with score column

Checking Relevance Scores

Once the relevance score evaluation is complete, get the maximum score and determine whether to perform a web search or generate an answer directly.
Diagram showing the relevance score checking process and decision flow
Use the Max function to get the maximum value of the “Score” column in the “Documents” array we created earlier,
Screenshot showing the Max function used to find the highest relevance score
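
This boils down to a single Power Fx expression; a sketch, assuming the table built earlier is stored in a topic variable named Documents:

// Highest relevance score across all retrieved documents
Max(Topic.Documents, Score)
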
And use a condition to branch subsequent processing.
Screenshot showing the conditional branching based on relevance score

Case 1: Maximum Score is 0.8 or Higher → Generate Response from Highly Relevant Documents

If the maximum score is 0.8 or higher, extract only the highly relevant documents (those with scores of 0.8 or higher) and generate the response.
Diagram showing the flow for generating responses from highly relevant documents
From the Documents object array, use the Filter function to extract documents with scores of 0.8 or higher, and assign them to the variable “k_in”,
Screenshot showing the Filter function to extract documents with scores of 0.8 or higher
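
As a sketch, the value assigned to “k_in” can be a single Filter expression (again assuming the table lives in Topic.Documents):

// Keep only the highly relevant documents
Filter(Topic.Documents, Score >= 0.8)
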
Then use this “k_in” as an argument when generating the response.
Screenshot showing k_in being used as an argument for response generation
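
If the response-generation prompt expects a single text input, one way to pass the table is to flatten it with Concat; a sketch, assuming each document’s text sits in a hypothetical column named Content:

// Join the filtered documents into one text block for the prompt input
Concat(Topic.k_in, Content, Char(10) & Char(10))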

Case 2: Maximum Score is Below 0.8 → Generate Response from Web Search and Moderately Relevant Documents

If the maximum score is below 0.8, we apply either the “Ambiguous” handling (documents scoring 0.4 or higher plus a web search) or the “Incorrect” handling (web search only).
Diagram showing the flow for generating responses when document relevance is low or ambiguous
First, extract documents with scores of 0.4 or higher and assign them to the variable “k_in”.
*Note: In the “Incorrect” case (Web search only), k_in will be empty.
Screenshot showing the extraction of documents with scores of 0.4 or higher
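
The formula mirrors Case 1 with the lower threshold; an empty result simply corresponds to the “Incorrect” path:

// Moderately relevant documents; may be an empty table in the Incorrect case
Filter(Topic.Documents, Score >= 0.4)
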
Next, perform a Web search (SerpAPI call) via Power Automate.
Screenshot showing Power Automate configuration for calling SerpAPI
SerpAPI is an API for running web searches. Here, we’ll use the “related_questions” and “organic_results” fields from its response as our RAG sources.
*Note: Scraping the sites obtained from web searches would increase accuracy, but we’ll skip that step here.
*Note: Ideally, we would regenerate search terms specifically for web searches, but we’ll skip that as well.
Screenshot showing SerpAPI response with related_question and organic_results
Use both k_in and the web search results as arguments for response generation.
Screenshot showing both k_in and web search results being used as arguments for response generation
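
If the prompt takes a single information input, one way to build it is to join the flattened documents with the text returned from the Power Automate flow; a sketch, assuming the flow output is stored in a hypothetical variable Topic.WebResults:

// Documents first, then the web search results, separated by blank lines
Concat(Topic.k_in, Content, Char(10) & Char(10)) & Char(10) & Char(10) & Topic.WebResults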

Answer Generation

Finally, use a prompt action to run the RAG generation with GPT-4o, passing the information gathered so far as arguments.

If the query cannot be answered based on the provided context, respond with 'The provided information is not sufficient to answer this question.'  
Otherwise, generate a complete and accurate answer, and as an annotation, output the names of the referenced files at the end.

# query
 {query}

# information
 {information}

Testing the Implementation

First, let’s try asking a question that even simple RAG (Naive RAG) could answer,
Screenshot showing a test question that simple RAG could answer
A document with a relevance score of 1.0 (the maximum) is found,
Screenshot showing a document with maximum relevance score of 1
And an answer is generated from this document.
Screenshot showing the answer generated from the highly relevant document
Next, let’s try asking a question that simple RAG couldn’t answer,
Screenshot showing a test question that simple RAG couldn't answer
Although the maximum score is 0, it still generates a correct answer.
Screenshot showing correct answer generated despite maximum relevance score of 0
This is because the web search results contain the correct information, demonstrating the benefit of CRAG in reducing hallucinations.
Screenshot showing web search results containing the correct information

So we’ve confirmed that CRAG improves answer accuracy compared to simple RAG.

Since this approach relies on web search, it may not improve answer accuracy for RAG systems over purely internal documents whose missing information can’t be found on the web, but it’s worth keeping in mind, as it could be useful in certain scenarios.
