Implementing Self-RAG with Copilot Studio: Advanced RAG Techniques for Better AI Responses

Following our previous implementation of Naive RAG in Copilot Studio, this time we’ll explore “Self-RAG,” one of the Advanced RAG techniques.


Self-RAG

Self-RAG is a methodology developed around October 2023, designed to improve response quality and reduce hallucinations.

Here’s a high-level overview of how Self-RAG works:

  1. Determine whether information retrieval is necessary (if not, generate a response directly)
  2. If retrieval is needed, fetch multiple documents and evaluate their relevance to the question
  3. Generate responses based on the relevant documents
  4. Evaluate each response and synthesize the final answer
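
To make this control flow concrete, here is a minimal Python sketch of the loop. The helper functions (needs_retrieval, build_query, retrieve, grade_relevance, generate, grade_answer) are hypothetical stand-ins for the prompt actions and the AI Search call we build in Copilot Studio below, and the retry cap is an extra safety measure rather than part of the original method.

```python
# Minimal sketch of the Self-RAG loop described above (not the Copilot Studio topic itself).
# All helper functions are hypothetical stand-ins for the prompt actions and
# the AI Search retrieval implemented later in this article.

def self_rag(question: str, max_retries: int = 3) -> str:
    # Step 1: decide whether external retrieval is needed at all.
    if not needs_retrieval(question):
        return generate(question, docs=None)             # answer directly from the model

    excluded_keywords: list[str] = []                     # queries that failed previously
    for _ in range(max_retries):                          # cap retries to avoid infinite loops
        query = build_query(question, excluded_keywords)  # Step 2: craft a search query
        docs = retrieve(query)                            #         fetch candidate documents
        if not grade_relevance(question, docs):           #         keep only relevant documents
            excluded_keywords.append(query)               # remember the failed query and retry
            continue

        answer = generate(question, docs)                 # Step 3: grounded response generation
        if grade_answer(question, answer):                # Step 4: does it resolve the question?
            return answer
        excluded_keywords.append(query)                   # otherwise retry from the search phase

    return "Sorry, I could not find a reliable answer."   # graceful fallback after max_retries
```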

Ideally, Self-RAG involves fine-tuning LLMs to create separate “critic” and “generator” models. However, since that level of customization isn’t feasible in our context, we’ll use GPT-4o for all of these functions.

Implementing Self-RAG in Copilot Studio

Drawing inspiration from two reference implementations, I’ve designed the following workflow. Since current models have significantly larger input token limits than those available when Self-RAG was first proposed (the GPT-3.5 era), we can now generate responses for multiple documents in a single call.
Diagram showing Self-RAG implementation workflow in Copilot Studio

Implementation

Since our main focus is on building Self-RAG in Copilot Studio, we’ll prioritize the implementation over optimizing accuracy (such as prompt engineering).

Trigger and Variable Declaration

Select the “On Redirect” trigger and declare the variable “excluded_keywords.” This variable stores query terms that should be avoided when generating search queries (based on previous searches that failed to find relevant documents).
Screenshot showing trigger setup and variable declaration

Determining Search Necessity

Next, implement the “search necessity evaluation” component.
Screenshot showing search necessity evaluation flow
The evaluation is implemented using a prompt action.
Screenshot showing prompt action configuration
Configure the prompt as shown, specify JSON as the output format, and select GPT-4o as the model.
Screenshot showing prompt settings and model selection
Your task is to determine whether a user's question requires external knowledge retrieval or not. Use the following criteria to make your decision:
### Decision Criteria:
1. **No retrieval required**:
   - If the question can be answered confidently using only your pre-existing knowledge, classify it as "no retrieval required."
   - Examples: Definitions, general knowledge, basic calculations, or simple reasoning tasks.

2. **Retrieval required**:
   - Classify the question as "retrieval required" if it meets any of the following conditions:
     - The question requires up-to-date information (e.g., recent events or news).
     - The question relates to specific domain knowledge (e.g., legal, medical, or technical details) that may not be fully covered by your internal knowledge.
     - The question explicitly references external resources (e.g., specific documents, websites, or datasets).
     - Your internal knowledge alone is insufficient to provide a comprehensive or accurate answer.

### Output Format:
Provide your answer in the following format:
- **"search_required": "yes"** (if retrieval is needed)
- **"search_required": "no"** (if retrieval is not needed)
Here is the user's question:
Question:  {question}

Respond with the required output format only, without any additional explanation or context.
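
For readers who want to reproduce this step outside Copilot Studio, a roughly equivalent GPT-4o call with the OpenAI Python SDK could look like the sketch below. The prompt is abbreviated, the model name is a placeholder for your own deployment, and wrapping the output as a single JSON object is our assumption.

```python
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

NECESSITY_PROMPT = """Your task is to determine whether a user's question requires
external knowledge retrieval or not. (Decision criteria abbreviated -- see the prompt above.)
Respond only with JSON: {{"search_required": "yes"}} or {{"search_required": "no"}}.
Question: {question}"""

def needs_retrieval(question: str) -> bool:
    """Ask GPT-4o whether the question needs external retrieval (JSON mode)."""
    response = client.chat.completions.create(
        model="gpt-4o",                           # placeholder: use your own deployment name
        response_format={"type": "json_object"},  # force a JSON object in the reply
        messages=[{"role": "user", "content": NECESSITY_PROMPT.format(question=question)}],
    )
    result = json.loads(response.choices[0].message.content)
    return result.get("search_required") == "yes"
```
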
If the evaluation determines that search is unnecessary, GPT-4o generates a direct response and the process ends.
Screenshot showing direct response generation flow
If search is deemed necessary, the system generates search queries,
Screenshot showing search query generation
and performs the search using AI Search.
Screenshot showing AI Search implementation
Note: The Retrieval topic is a simple component that executes searches against AI Search using the received query and returns the results.
Screenshot showing Retrieval topic configuration
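
For reference, the query-generation step and the Retrieval topic could be approximated outside Copilot Studio as below. The Azure AI Search endpoint, index name, key, and the `content` field are placeholders, and `build_query` only illustrates how excluded_keywords can be folded into the query instruction; for brevity it does not make the actual GPT-4o call.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient  # pip install azure-search-documents

# Placeholder connection details -- replace with your own Azure AI Search values.
search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="<your-index>",
    credential=AzureKeyCredential("<your-query-key>"),
)

def build_query(question: str, excluded_keywords: list[str]) -> str:
    """Illustrative only: compose the instruction that GPT-4o would turn into a search query."""
    avoid = ", ".join(excluded_keywords) or "none"
    instruction = (
        f"Generate a concise search query for the question: {question}\n"
        f"Avoid these previously unsuccessful terms: {avoid}"
    )
    # `instruction` would be sent to GPT-4o (as in the previous sketch) to obtain the query;
    # we return the raw question here so the sketch runs without an extra LLM call.
    return question

def retrieve(query: str, top: int = 5) -> list[str]:
    """Equivalent of the Retrieval topic: run the query against AI Search and return text."""
    results = search_client.search(search_text=query, top=top)
    return [doc["content"] for doc in results]  # assumes an index field named 'content'
```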

Relevance Evaluation

Next, evaluate the relevance of the retrieved documents.
Screenshot showing relevance evaluation flow
Configure the prompt as shown, specify JSON as the output format, and use GPT-4o as the model. Note: while we’re using a binary yes/no evaluation here, a scoring system with thresholds might be more effective.
Screenshot showing prompt configuration for relevance evaluation
You are an evaluator tasked with determining the relevance of a retrieved document to a user question.
This assessment does not require overly strict criteria, but the goal is to exclude clearly irrelevant documents.
If the document directly answers the user question, provides supporting information, or includes keywords/semantic meaning clearly related to the question, grade it as relevant.
If the document is unrelated, off-topic, or too vague to establish a clear connection to the user question, grade it as not relevant.

Respond with a binary score:
Output yes if the document is relevant.
Output no if the document is irrelevant.

# user question : {question}
# documents : 
{docs}
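
A standalone approximation of this relevance gate is sketched below. It reuses `client` and `json` from the earlier sketch via a small `chat_json` helper; the JSON schema ({"relevant": "yes"/"no"}) is our assumption, since the Copilot Studio prompt above asks for a plain yes/no.

```python
def chat_json(prompt: str) -> dict:
    """Send a prompt to GPT-4o in JSON mode and parse the reply (client/json from the earlier sketch)."""
    response = client.chat.completions.create(
        model="gpt-4o",                           # placeholder model/deployment name
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)

def grade_relevance(question: str, docs: list[str]) -> bool:
    """Binary relevance gate mirroring the prompt above (JSON schema is an assumption)."""
    prompt = (
        "You are an evaluator tasked with determining the relevance of retrieved documents "
        'to a user question. Respond with JSON: {"relevant": "yes"} or {"relevant": "no"}.\n'
        f"# user question : {question}\n"
        "# documents :\n" + "\n---\n".join(docs)
    )
    return chat_json(prompt).get("relevant") == "yes"
```
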
If the documents are deemed irrelevant, add the unsuccessful search query to the excluded_keywords variable and restart from query generation.
Screenshot showing handling of irrelevant document cases

If relevance is confirmed, proceed to response generation.

Response Generation and Answer Validation

Next, generate responses from documents deemed relevant and evaluate whether these responses adequately answer the original question.
Screenshot showing response generation and validation flow
First, generate responses from the documents,
Screenshot showing response generation configuration
Then evaluate the generated responses.
Screenshot showing response evaluation setup
Configure the prompt as shown, specify JSON as the output format, and use GPT-4o as the model.
Screenshot showing prompt configuration for response evaluation
You are an evaluator tasked with assessing whether a generated answer appropriately addresses or resolves a user's question.
If the answer directly resolves the question, provides accurate and sufficient information, or effectively addresses the intent behind the question, grade it as yes.
If the answer is incomplete, vague, inaccurate, off-topic, or fails to address the intent of the user question, grade it as no.

Respond with a binary score:
Output yes if the answer resolves the question.
Output no if the answer does not resolve the question.

# User question : {question}
# LLM generation answer : {generation}
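
The answer check follows the same pattern as the relevance gate; a compact equivalent using the `chat_json` helper from the previous sketch (again, the exact JSON schema is an assumption) could be:

```python
def grade_answer(question: str, generation: str) -> bool:
    """Check whether the generated answer actually resolves the user's question."""
    prompt = (
        "You are an evaluator tasked with assessing whether a generated answer appropriately "
        "addresses or resolves a user's question. "
        'Respond with JSON: {"resolved": "yes"} or {"resolved": "no"}.\n'
        f"# User question : {question}\n"
        f"# LLM generation answer : {generation}"
    )
    return chat_json(prompt).get("resolved") == "yes"
```
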
If the response is judged to resolve the question, present it to the user and end the process.
Screenshot showing relevant response handling
If the response is judged not to resolve the question, add the unsuccessful search query to the excluded_keywords variable and restart from query generation. Note: while we could simply retry response generation, we restart from the search phase because the document retrieval itself may have been suboptimal.
Screenshot showing handling of irrelevant responses

This completes the topic implementation.

Important Note: Without setting a maximum iteration limit for the “search query retry” process, there’s a risk of entering an infinite loop. While we’ve omitted this for this demonstration, it’s crucial to implement such safeguards in production environments.
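
In plain code this safeguard is just a counter; in Copilot Studio it would typically be a numeric topic variable incremented on each retry and checked in a condition node. A minimal sketch (the limit of 3 is arbitrary):

```python
MAX_RETRIES = 3    # arbitrary cap -- tune for your scenario
retry_count = 0    # in Copilot Studio this would be a numeric topic variable

while True:
    # ... generate a query, retrieve, grade relevance, generate and grade the answer ...
    # (a successful answer would `break` out of the loop here)
    retry_count += 1
    if retry_count >= MAX_RETRIES:
        answer = "Sorry, I couldn't find relevant information."  # graceful fallback
        break
```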

Optional: Integration with Conversational Boosting

Finally, as in our previous implementation, complete the setup by connecting to the system topic’s “Conversational boosting.”
Screenshot showing Conversational boosting integration

Testing Results

First, we tested with the same question that worked in our previous implementation. The system provided an accurate response.
Screenshot showing successful response to previously answered question
Next, we tried the question that failed in our previous implementation. After several search iterations,
Screenshot showing multiple search iterations
the system successfully generated a response.
Screenshot showing successful response generation
As a bonus, the system also handled questions that don’t require search effectively.
Screenshot showing successful handling of non-search questions

These results suggest improved accuracy compared to our previous implementation. In the next article, I’d like to experiment with CRAG and other advanced techniques.

