Tuesday, November 14, 2023

SharePoint Online - Which PnP SDK to download large files greater than 2GB

As we know, there are two PnP NuGet SDKs for working with SharePoint Online from .NET client applications: PnP.Framework and PnP.Core. This post discusses how to download files larger than 2 GB with each of them. The 2 GB mark is important because some methods work only up to the MaxValue of the integer¹ data type, which is 2147483647.
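To make the limit concrete, here is a minimal sketch of the arithmetic. The 3 GiB file size is a made-up value for illustration only; nothing here calls SharePoint.

```csharp
using System;

// int.MaxValue is 2,147,483,647, exactly one byte short of 2 GiB.
long twoGiB = 2L * 1024 * 1024 * 1024;          // 2,147,483,648 bytes
bool exceedsInt = twoGiB > int.MaxValue;

// Hypothetical file size of 3 GiB: fits in a long, but not in an int.
long fileSize = 3L * 1024 * 1024 * 1024;
bool fitsInInt = fileSize <= int.MaxValue;

Console.WriteLine(int.MaxValue); // 2147483647
Console.WriteLine(exceedsInt);   // True
Console.WriteLine(fitsInInt);    // False
```

Any API that carries the content length in an int will therefore break somewhere around this size, while one that uses a long will not.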

PnP.Framework to download > 2GB files from SharePoint Online

Below is the code to download the file and save it as a local file.

var file = clientContext.Web.GetFileByUrl(driveItemIdOrUrl);
clientContext.Load(file);
await clientContext.ExecuteQueryAsync();
logger.LogInformation($"File exists in SharePoint. Size:{file.Length:0,#}");
ClientResult<Stream> streamFromSPO = file.OpenBinaryStream();
await clientContext.ExecuteQueryAsync();

var localFilePathToDownload = Path.Combine(localFolderPathToDownload, file.Name);
using (Stream fileStream = new FileStream(localFilePathToDownload, FileMode.Create))
{
    using (streamFromSPO.Value)
    {
        streamFromSPO.Value.CopyTo(fileStream);
    }
}
logger.LogInformation($"Downloaded to {localFilePathToDownload}");

It works fine when the file size is below the max value of the integer. 

Now let us see what happens when the file size is larger.

System.FormatException: 'Invalid MIME content-length header encountered on read.'

There is one more method called OpenBinaryStream with options² that accepts the SPOpenBinaryOptions enum. But none of those options helps us get files larger than 2 GB.

There is another method called OpenBinaryDirect³ to download files. But it fails with 401 Unauthorized⁴ when using JWT for authentication, and it could not download even smaller files.
The clientContext object used for this call is the same one that was used to obtain the file details.

A Google search turns up different suggestions, such as doing it all ourselves with raw HTTP calls⁵, or avoiding PnP.Framework completely and migrating to PnP.Core⁶.

PnP.Core to download > 2GB files from SharePoint Online

Below is the code to download a file using PnP.Core.

var file = await pnpContext.Web.GetFileByServerRelativeUrlAsync(fileUrl);
logger.LogInformation($"File exists in SharePoint. Size:{file.Length}");
Stream streamFromSPO = await file.GetContentAsync(true);
var localFilePathToDownload = Path.Combine(localFolderPathToDownload, file.Name);
using (Stream fileStream = new FileStream(localFilePathToDownload, FileMode.Create))
{
    using (streamFromSPO)
    {
        streamFromSPO.CopyTo(fileStream);
    }
}
logger.LogInformation($"{nameof(DownloadFileWithReplace)} - Downloaded to {localFilePathToDownload}");

It works fine for files smaller than the max value of the integer.


What happens when the size increases above that limit?

It works! Int32.MaxValue didn't cause any problem here. But there are some occasions when we may experience an issue with the stream.CopyTo method. It can happen regardless of whether the process is 64-bit or not. In such situations, we need to tackle the copy separately.

One way is to pass the Buffer Size to the stream.CopyTo method.

streamFromSPO.CopyTo(fileStream, BufferSize);
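As a rough, self-contained sketch of the same idea, the snippet below copies a stream to a temporary file with an explicit buffer size, using the asynchronous CopyToAsync variant. The in-memory source stands in for the SharePoint stream, and the 1 MB BufferSize value and the file name are assumptions chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical buffer size for the copy (1 MB); tune it for your environment.
const int BufferSize = 1024 * 1024;

// Stand-in for the stream returned by SharePoint: 5 MB of in-memory data.
using Stream source = new MemoryStream(new byte[5 * 1024 * 1024]);
string path = Path.Combine(Path.GetTempPath(), "copyto-sketch.bin");

await using (var target = new FileStream(path, FileMode.Create, FileAccess.Write,
                                         FileShare.None, BufferSize, useAsync: true))
{
    // Copy in BufferSize chunks instead of relying on the default buffer.
    await source.CopyToAsync(target, BufferSize);
}

long written = new FileInfo(path).Length;
Console.WriteLine($"Wrote {written:0,#} bytes to {path}");
File.Delete(path);
```

The buffer size only controls how much is held in memory per copy step; it does not change the total amount of data that can be copied.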

Another way is to do the chunking ourselves. The code for that is below.

var file = await pnpContext.Web.GetFileByServerRelativeUrlAsync(fileUrl);
logger.LogInformation($"File exists in SharePoint. Size:{file.Length}");
var buffer = new byte[PnPCoreConstants.BufferSizeToDownloadLargeFile];
           
var localFilePathToDownload = Path.Combine(localFolderPathToDownload, file.Name);
using (Stream streamFromSPO = await file.GetContentAsync(true))
{
    using (Stream fileStream = new FileStream(localFilePathToDownload, FileMode.Create))
    {
        int bytesRead; long totalBytesRead = 0; long streamLength = 0;
        if (streamFromSPO.CanSeek)
        {
            streamLength = streamFromSPO.Length;
            logger.LogInformation($"{nameof(DownloadFileWithReplace)} - File Stream length:{streamLength}");
        }
        else
        {
            streamLength = file.Length;
        }
        // ReadAsync may return fewer bytes than requested; 0 means end of stream.
        while ((bytesRead = await streamFromSPO.ReadAsync(buffer, 0, buffer.Length)) != 0)
        {
            totalBytesRead += bytesRead;
            await fileStream.WriteAsync(buffer, 0, bytesRead);
            logger.LogInformation($"{nameof(DownloadFileWithReplace)} - Progress - Read {bytesRead:0,#} and wrote. Total {totalBytesRead:0,#} / {streamLength:0,#} to {localFilePathToDownload}");
        }
    }
}
logger.LogInformation($"{nameof(DownloadFileWithReplace)} - Downloaded to {localFilePathToDownload}");

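The interesting thing is that even if we provide a 2 MB buffer, each read returns only 16,384 bytes. This is allowed: a stream read may return fewer bytes than requested, which is why the loop writes bytesRead bytes rather than the whole buffer.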
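If the small reads are a concern (for example, to reduce the number of writes or log lines), one option is to keep reading until the buffer is full before writing it out. The sketch below shows that idea under stated assumptions: TrickleStream and Fill are hypothetical names invented for this example, and TrickleStream merely imitates a stream that hands out at most 16,384 bytes per read. It is not how PnP.Core works internally.

```csharp
using System;
using System.IO;

// Stand-in for a network stream: returns at most 16,384 bytes per read.
var source = new TrickleStream(new MemoryStream(new byte[100_000]), 16_384);
var buffer = new byte[65_536];

// A single read returns less than the 65,536 bytes requested.
int first = source.Read(buffer, 0, buffer.Length);
Console.WriteLine(first);   // 16384

// Keep reading into the rest of the buffer until it is full.
int filled = first + Fill(source, buffer, first);
Console.WriteLine(filled);  // 65536

// Hypothetical helper: reads until the buffer is full or the stream ends.
static int Fill(Stream stream, byte[] buffer, int offset)
{
    int total = 0;
    while (offset + total < buffer.Length)
    {
        int read = stream.Read(buffer, offset + total, buffer.Length - offset - total);
        if (read == 0) break; // end of stream
        total += read;
    }
    return total;
}

// Hypothetical wrapper that caps each read to mimic the observed behaviour.
sealed class TrickleStream : Stream
{
    private readonly Stream _inner;
    private readonly int _maxPerRead;

    public TrickleStream(Stream inner, int maxPerRead)
    {
        _inner = inner;
        _maxPerRead = maxPerRead;
    }

    public override int Read(byte[] buffer, int offset, int count) =>
        _inner.Read(buffer, offset, Math.Min(count, _maxPerRead));

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Filling the buffer is not required for correctness; the chunking loop above already handles short reads. On .NET 7 and later, Stream.ReadAtLeast covers a similar need without a custom helper.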

Conclusion

Use PnP.Core to download large files.

Reference

¹ - https://learn.microsoft.com/en-us/dotnet/api/system.int32.maxvalue?view=net-7.0

² - SPOpenBinaryOptions - https://learn.microsoft.com/en-us/previous-versions/office/developer/sharepoint-2010/bb802697(v=office.14)

³ - https://learn.microsoft.com/en-us/previous-versions/office/sharepoint-csom/ee537083(v=office.15)

⁴ - https://stackoverflow.com/questions/45674435/401-unauthorized-exception-while-downloading-file-from-sharepoint

⁵ - https://piyushksingh.com/2016/08/15/download-large-files-from-sharepoint-online/

⁶ - Use PnP.Core https://github.com/pnp/powershell/pull/1239
