Joymon V/S Code: SharePoint Online - Graph API to get all files in a drive recursively even if it has more than 5000 items

Problem

As part of the validation, we must dump all the file details inside a SharePoint drive. Some more points below

The input is a drive graph id.

It has to be recursive

There can be 10,000+ files in the drive and the export needs to be done in a reasonable amount of time.

The expected output is CSV with the below schema.

FolderName, FileGraphId, FileName

FileGraphId is the unique id of the file that Graph API understands. It is not Guid or integer instead it uses a 34-character wide format like '01XXXXA4QGNCJMT7XXXVDKCNXXXV6LCC4E'

Approaches

There are 3 APIs to interact with SharePoint Online. The first Graph has no direct method to list all the files recursively. We have to go by individual Graph API calls to the list folders and make subsequent Graph calls to get the folders and files inside recursively. It works but is very slow and high chance of getting throttled. There is a search API but that has no guarantee of returning all the files as the indexing will take some time.

If we use the PnP.Core or PnP.Frameworks, there is no way to get the Graph IDs.

Solution

So lastly we have to go via Graph API but get the List items (not drive) then get the driveItem of list item to retrieve the Graph Id. The code is as follows.

var graphClient = new GraphServiceClient(tokenProvider);

//Url for below looks like /v1.0/sites/{siteGraphid}}/lists/{listGuid}/items?$select=contentType&$expand=driveItem HTTP/1.1
var listItems = await graphClient
                        .Sites[siteGraphId]
                        .Lists[listGuid] // Need to get the ListId from DriveId
                        .Items
                        .GetAsync(req=> { 
                                            req.QueryParameters.Expand = new []{"driveItem"};
                                            req.QueryParameters.Select = new []{"contentType"};
                                        });
listItems
.Value
.Where(item=>item.ContentType.Name == "Document")
.ToList()
.ForEach(item=>Console.WriteLine($"FolderName:{item.DriveItem.ParentReference.Name},File Graph Id:{item.DriveItem.Id}, File Name: {item.DriveItem.Name}"));

The drive Id needs to be translated to ListId to get this working. The QueryParameters.Select is used to limit the data to mainly 2 elements: driveItem and contentType.

Full code

The above code gives the idea how the required data can be pulled. Below goes full code that can be run inside Polyglot notebook.

#r "nuget:Microsoft.Graph"
#r "nuget:Azure.Identity"
#r "nuget:CsvHelper"
using Microsoft.DotNet.Interactive;
using Microsoft.Graph;
using Microsoft.Graph.Models;
using Azure.Core;
using Azure.Identity;
using System.Collections.Generic;
using CsvHelper;
using CsvHelper.Configuration;
using System.Globalization;
public class FooMap : ClassMap<(string  FolderName,string FileGraphId,string FileName)>
{
    public FooMap()
    {
        Map(m => m.FolderName).Name("FolderName");
        Map(m => m.FileGraphId).Name("FileGraphId");
        Map(m => m.FileName).Name("FileName");
    }
}
TokenCredential tokenprovider = new UsernamePasswordCredential(
                            "{userName@tenant.onmicrosoft.com}",
                            "{password}",
                            "{AAD Tenant Id}",
                            "{AAD App registration id}");
var siteGraphId="{3 part graph Id of site}";
var listGuid = "{Guid of List}";// Get this using Drive.
var graphClient = new GraphServiceClient(tokenprovider);
//Url for below looks like /v1.0/sites/{siteGraphid}}/lists/{listGuid}/items?$select=contentType&$expand=driveItem HTTP/1.1
ListItemCollectionResponse listItems = await graphClient
                        .Sites[siteGraphId]
                        .Lists[listGuid]
                        .Items
                        .GetAsync(req=> { 
                                            req.QueryParameters.Expand = new []{"driveItem"};
                                            req.QueryParameters.Select = new []{"contentType"};
                                        });
var finalItems = new List<(string FolderName,string FileGraphId,string FileName)>();
var pageIterator = PageIterator<ListItem,ListItemCollectionResponse>
.CreatePageIterator(graphClient,listItems,(item)=>{
    if(item.ContentType.Name == "Document")
    {
        finalItems.Add((FolderName:item.DriveItem.ParentReference.Name,FileGraphId:item.DriveItem.Id,FileName:item.DriveItem.Name));
    }
    return true;
});
await pageIterator.IterateAsync();
using(var outputFileWriter= new StreamWriter(@"c:\temp\export.csv"))
{
    using (var csv=new CsvWriter(outputFileWriter,CultureInfo.InvariantCulture))
    {
        csv.Context.RegisterClassMap<FooMap>();
        csv.WriteRecords(finalItems);
    }
}
finalItems.ForEach((item)=>Console.WriteLine($"FolderName:{item.FolderName},File Graph Id:{item.FileGraphId}, File Name: {item.FileName}"))
Each(item=>Console.WriteLine($"FolderName:{item.DriveItem.ParentReference.Name},File Graph Id:{item.DriveItem.Id}, File Name: {item.DriveItem.Name}"));

The authentication used here is the ROPC username password. It needs to be changed based on best practices. It uses a nuget package for writing a list as a CSV file.

References

https://stackoverflow.com/questions/54864198/list-all-files-recursively-in-microsoft-graph

Joymon V/S Code

About Me

Social Media

Followers

Total Pageviews

Tuesday, October 31, 2023

SharePoint Online - Graph API to get all files in a drive recursively even if it has more than 5000 items

Problem

Approaches

Solution

Full code

References

No comments:

Blog Archive

Popular Posts

Label Cloud