A step-by-step guide on integrating SharePoint with Azure AI Search.
Adding SharePoint Online as a Data Source to AI Search
Azure AI Search now supports indexing document libraries from SharePoint Online, although the feature is still in preview. Limitations include:
- File extension restrictions apply
- SharePoint lists are not supported
- ASPX files are excluded, among other limitations
I found the process of adding SharePoint to Azure AI Search somewhat challenging, so I’ve documented the steps here for reference.
Procedure
- Create a SharePoint site for search indexing
- Create an Azure AI Search resource
- Register an application in Entra ID
- Set up data source, index, and indexer
Detailed instructions below.
Step 0: Creating a SharePoint Site


Step 1: Creating an Azure AI Search Resource




Step 2: Register an Application in Entra ID














This completes the Entra ID registration process.
Step 3: Azure AI Search: Adding a Data Source
{
  "name": "【Your desired data source name (example: sharepoint-datasource)】",
  "type": "sharepoint",
  "credentials": {
    "connectionString": "SharePointOnlineEndpoint=【SPO site URL (up to ~/sites/site-name)】;ApplicationId=【App ID (Note 1)】;ApplicationSecret=【Secret (Note 2)】;"
  },
  "container": {
    "name": "【Target document library (details explained later)】"
  }
}
The container "name" accepts one of the following values:
- defaultSiteLibrary: Indexes all content in the site’s default document library
- allSiteLibraries: Indexes all content across all document libraries within the site
- useQuery: Indexes only the content defined in the “query” parameter
Note: For more information about queries, please refer to the Microsoft documentation for the SharePoint Online indexer.
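If you would rather script this step than paste JSON into the portal, the definition above can be assembled in code. The following is a minimal sketch, not an official client: the helper names (build_connection_string, build_data_source, build_create_request) are hypothetical, the API version is an assumption (the SharePoint indexer needs a preview version, so adjust it to the one you target), and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: any preview API version that supports the SharePoint indexer.
API_VERSION = "2024-05-01-preview"

CONTAINER_NAMES = ("defaultSiteLibrary", "allSiteLibraries", "useQuery")


def build_connection_string(site_url, app_id, app_secret):
    """Assemble the connection string in the format used in this article."""
    return (
        f"SharePointOnlineEndpoint={site_url};"
        f"ApplicationId={app_id};"
        f"ApplicationSecret={app_secret};"
    )


def build_data_source(name, site_url, app_id, app_secret,
                      container="defaultSiteLibrary", query=None):
    """Return the data source definition as a dict ready for json.dumps."""
    if container not in CONTAINER_NAMES:
        raise ValueError(f"container must be one of {CONTAINER_NAMES}")
    if container == "useQuery" and not query:
        raise ValueError("useQuery requires a query string")

    body = {
        "name": name,
        "type": "sharepoint",
        "credentials": {
            "connectionString": build_connection_string(site_url, app_id, app_secret)
        },
        "container": {"name": container},
    }
    if query:
        # The query sits next to the container name.
        body["container"]["query"] = query
    return body


def build_create_request(service_name, admin_key, data_source):
    """Build (but do not send) the POST request that creates the data source."""
    url = (
        f"https://{service_name}.search.windows.net/datasources"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(data_source).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values only; nothing is sent over the network.
    example = build_data_source(
        "sharepoint-datasource",
        "https://example.sharepoint.com/sites/site-name",
        "<app-id>",
        "<secret>",
    )
    print(json.dumps(example, indent=2))
```

Passing the request to urllib.request.urlopen would submit it. In practice, read the secret and admin key from environment variables rather than writing them into the script.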
Step 4: Creating the Index
Select [Add index (JSON)] from the [Indexes] section and define the fields you want in the index.
For this demonstration, we’ll use minimal settings (avoiding vector search for now as embeddings incur additional costs).
{
  "name": "sharepoint-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
    { "name": "metadata_spo_item_name", "type": "Edm.String", "key": false, "searchable": true, "filterable": false, "sortable": false, "facetable": false },
    { "name": "metadata_spo_item_path", "type": "Edm.String", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
    { "name": "metadata_spo_item_content_type", "type": "Edm.String", "key": false, "searchable": false, "filterable": true, "sortable": false, "facetable": true },
    { "name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "key": false, "searchable": false, "filterable": false, "sortable": true, "facetable": false },
    { "name": "metadata_spo_item_size", "type": "Edm.Int64", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
    { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
  ]
}
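Before pasting an index definition like the one above into the portal, a quick local check can catch the most common mistakes. The sketch below is my own illustration rather than part of any SDK: check_index_definition is a hypothetical helper, and it only verifies two rules (exactly one key field of type Edm.String, and no duplicate field names), so the service may still reject a definition for other reasons.

```python
from typing import List


def check_index_definition(index: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    fields = index.get("fields", [])

    # An index needs exactly one key field, and it must be a string.
    keys = [f for f in fields if f.get("key")]
    if len(keys) != 1:
        problems.append(f"expected exactly one key field, found {len(keys)}")
    elif keys[0].get("type") != "Edm.String":
        problems.append(f"key field '{keys[0].get('name')}' must be Edm.String")

    # Field names must be unique within the index.
    names = [f.get("name") for f in fields]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate field names: {', '.join(duplicates)}")

    return problems


if __name__ == "__main__":
    # A shortened version of the index used in this article.
    sample = {
        "name": "sharepoint-index",
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
            {"name": "metadata_spo_item_name", "type": "Edm.String", "searchable": True},
            {"name": "content", "type": "Edm.String", "searchable": True},
        ],
    }
    print(check_index_definition(sample) or "no problems found")
```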
Step 5: Creating the Indexer
{
  "name": "sharepoint-indexer",
  "dataSourceName": "sharepoint-datasource",
  "targetIndexName": "sharepoint-index",
  "parameters": {
    "batchSize": null,
    "maxFailedItems": null,
    "maxFailedItemsPerBatch": null,
    "base64EncodeKeys": null,
    "configuration": {
      "indexedFileNameExtensions": ".txt, .pdf",
      "excludedFileNameExtensions": ".png, .jpg",
      "dataToExtract": "contentAndMetadata",
      "failOnUnsupportedContentType": false,
      "failOnUnprocessableDocument": false
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_spo_site_library_item_id",
      "targetFieldName": "id",
      "mappingFunction": {
        "name": "base64Encode"
      }
    }
  ]
}
- indexedFileNameExtensions: Specify file extensions to be indexed (in this case, only PDF and txt files)
- excludedFileNameExtensions: Specify file extensions to be excluded from indexing (in this case, image files)
- failOnUnsupportedContentType: When set to false, the indexer will skip unsupported documents instead of stopping
- failOnUnprocessableDocument: When set to false, the indexer continues past documents it cannot process instead of failing
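To get a feel for how the two extension settings interact, here is a small local simulation. It is only my approximation of the behavior, assuming that the include list is checked first and that an extension appearing in both lists ends up excluded; the function names parse_extensions and should_index are hypothetical and the real indexer runs this logic on the service side.

```python
import os


def parse_extensions(value):
    """Turn a comma-separated setting such as ".txt, .pdf" into a set."""
    return {ext.strip().lower() for ext in value.split(",") if ext.strip()}


def should_index(file_name, indexed=".txt, .pdf", excluded=".png, .jpg"):
    """Approximate whether a file would be picked up with the given settings."""
    extension = os.path.splitext(file_name)[1].lower()

    # Assumption: an excluded extension is always skipped, even when it
    # also appears in the include list.
    if extension in parse_extensions(excluded):
        return False

    # An empty include list is treated here as "no restriction".
    allowed = parse_extensions(indexed)
    return not allowed or extension in allowed


if __name__ == "__main__":
    for name in ["manual.pdf", "notes.TXT", "photo.png", "slides.pptx"]:
        print(name, should_index(name))
```

With the settings from this article, manual.pdf and notes.TXT are indexed, while photo.png (explicitly excluded) and slides.pptx (not in the include list) are skipped.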

Testing the Implementation
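The same check can be done from a script instead of the portal: run the indexer, look at its status, then send a query to the index. The sketch below only builds the three REST requests and leaves sending them to a separate helper. The function names are hypothetical, the API version is an assumption (use whichever preview version you created the indexer with), and the selected fields are the ones defined in the index in Step 4.

```python
import json
import urllib.request

# Assumption: any preview API version that supports the SharePoint indexer.
API_VERSION = "2024-05-01-preview"


def _base_url(service_name):
    return f"https://{service_name}.search.windows.net"


def build_run_indexer_request(service_name, api_key, indexer_name):
    """POST that asks the service to run the indexer on demand."""
    url = f"{_base_url(service_name)}/indexers/{indexer_name}/run?api-version={API_VERSION}"
    # Empty body so that a Content-Length of 0 is sent with the POST.
    return urllib.request.Request(url, data=b"", headers={"api-key": api_key}, method="POST")


def build_indexer_status_request(service_name, api_key, indexer_name):
    """GET that returns the indexer's current status and execution history."""
    url = f"{_base_url(service_name)}/indexers/{indexer_name}/status?api-version={API_VERSION}"
    return urllib.request.Request(url, headers={"api-key": api_key}, method="GET")


def build_search_request(service_name, api_key, index_name, text, top=5):
    """POST that runs a simple full-text query against the index."""
    url = f"{_base_url(service_name)}/indexes/{index_name}/docs/search?api-version={API_VERSION}"
    body = {
        "search": text,
        "select": "metadata_spo_item_name,metadata_spo_item_path",
        "top": top,
        "count": True,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response, if there is one."""
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        return json.loads(raw) if raw else None


if __name__ == "__main__":
    # Placeholder service name and key; only the URL is printed, nothing is sent.
    req = build_search_request("<service-name>", "<api-key>", "sharepoint-index", "test")
    print(req.get_method(), req.full_url)
```

Calling send(build_search_request(...)) with a real service name and key should return a JSON document whose "value" list holds the matching files.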


I plan to use this search functionality to experiment with various RAG (Retrieval-Augmented Generation) implementations.
