Tuesday, September 19, 2023

SharePoint Online - Check CopyJobProgress via Azure Storage Queue SDK for .NET

This post assumes some knowledge of SharePoint Online and its file storage model (document library, a.k.a. drive, a.k.a. list), and of the CreateCopyJobs API used to bulk-move drive items. It also assumes familiarity with Azure Storage Queues and how to interact with them via the .NET SDK.

Scenario

We must copy all the folders of a SharePoint drive in one SharePoint Online site to a drive in another site. The drive already exists in the destination site. The copy has to be done efficiently, i.e. without downloading the files from one SharePoint drive and uploading them to the other.

There is no drive copy mechanism available in either the Graph API or the legacy SharePoint API (CSOM). Please comment if one exists.
One promising approach is the CreateCopyJobs API, exposed by the SharePoint API and usable from .NET via the PnP.Framework SDK. This API accepts a list of drive item URLs and a destination URL, and returns one or more jobs. Each job is dequeued by Microsoft worker machines, which do the actual copy; they may have back doors to copy the files efficiently. The returned value is of type CopyMigrationInfo. It contains a JobId of type Guid, a URL to a random Azure Storage Queue including a SAS token, a key to decrypt the queue messages, and the IDs of the drive items the job is working on.
Note that the Azure Storage Queue does not store the work messages. It holds the log messages generated during processing.
Each job can be polled via site.GetMigrationJobStatus(jobId) to check its high-level status: Queued, Processing, or None. According to direct meetings with Microsoft, None means completed. This is not documented anywhere.
A successful job ends in None (completed), but a failed job is also reported as completed. To find out whether it actually failed, we need another API, site.GetCopyJobProgress(jobInfo), which returns a CopyJobProgress object. Using GetCopyJobProgress is complex, and it only populates the Logs property some of the time.
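A minimal polling sketch for the status check described above. To keep it self-contained, the SharePoint call is replaced by a hypothetical getStateAsync delegate, which in real code would wrap site.GetMigrationJobStatus(jobId) plus ExecuteQueryAsync(); states are compared by name, and WaitForCopyJobAsync is an illustrative helper, not part of PnP.Framework. It relies only on the undocumented behaviour that None means finished.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// getStateAsync is a hypothetical stand-in for site.GetMigrationJobStatus(jobId)
// followed by ExecuteQueryAsync(); it returns "Queued", "Processing" or "None".
async Task<bool> WaitForCopyJobAsync(
    Func<Task<string>> getStateAsync,
    TimeSpan pollInterval,
    int maxPolls,
    CancellationToken cancellationToken = default)
{
    for (int poll = 0; poll < maxPolls; poll++)
    {
        string state = await getStateAsync();
        // Undocumented: "None" means the job has finished, successfully or not.
        if (state == "None")
            return true;
        await Task.Delay(pollInterval, cancellationToken);
    }
    return false; // still Queued or Processing after maxPolls attempts
}

// Usage with a scripted sequence of states instead of a real site.
var states = new Queue<string>(new[] { "Queued", "Processing", "None" });
bool finished = await WaitForCopyJobAsync(
    () => Task.FromResult(states.Dequeue()), TimeSpan.Zero, maxPolls: 10);
Console.WriteLine(finished); // True
```

Because None does not distinguish success from failure, a true result here only means it is time to read the logs.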

So instead we can take the CopyMigrationInfo object returned when the job was created and use its queue URL to dequeue the log messages from the Azure Storage Queue.

This post is about how to check the status of copy jobs via the Azure Storage Queue. A second warning here: without familiarity with these APIs, the rest will be difficult to follow.

Problem

99.99% of how this works is not documented. There is no working sample from Microsoft that uses the CreateCopyJobs API. The PnP.Framework SDK doesn't support dequeuing messages from the Azure Storage Queue, and there is not even a class to deserialize the log entries into.

Solution

Let us start with the code. The high-level steps are to dequeue the messages and decrypt them using the key provided. The encryption algorithm is AES-256 in CBC mode. The IV is Base64 encoded and comes with each message.

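        // Requires the Azure.Storage.Queues NuGet package and the CSOM types from PnP.Framework.
        using System;
        using System.Collections.Generic;
        using Azure;
        using Azure.Storage.Queues;
        using Azure.Storage.Queues.Models;
        using Microsoft.SharePoint.Client;

        private IEnumerable<CopyJobLog> GetCopyJobLogsFromAzureStorageQueue(CopyMigrationInfo info)
        {
            // JobQueueUri already includes a SAS token, so no other credential is needed.
            // Create the client once instead of on every loop iteration.
            QueueClient client = new(new Uri(info.JobQueueUri));
            List<CopyJobLog> result = new();
            Response<QueueMessage[]> messages;
            do
            {
                // At most 32 per call; received messages are hidden for the visibility
                // timeout (30 seconds by default), not deleted from the queue.
                messages = client.ReceiveMessages(maxMessages: 25);
                result.AddRange(DecryptAndDeserializeMessages(messages, info.EncryptionKey));
            } while (messages.Value.Length != 0);
            return result;
        }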
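The loop above keeps calling ReceiveMessages until a batch comes back empty. Here is the same pattern stripped of the Azure SDK so it can be run locally; receiveBatch is a hypothetical stand-in for client.ReceiveMessages(maxMessages), and DrainQueue is an illustrative name. Two facts worth keeping in mind: Azure Storage Queues return at most 32 messages per receive call, and received messages are hidden for the visibility timeout (30 seconds by default) rather than deleted, so they do not reappear within a loop that finishes inside that window but will show up again on a later read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// receiveBatch is a hypothetical stand-in for QueueClient.ReceiveMessages(maxMessages):
// it returns up to maxMessages items, and an empty batch once the queue is drained.
List<T> DrainQueue<T>(Func<int, IReadOnlyList<T>> receiveBatch, int maxMessages = 32)
{
    // Azure Storage Queues accept 1 to 32 messages per receive call.
    if (maxMessages is < 1 or > 32)
        throw new ArgumentOutOfRangeException(nameof(maxMessages), "Must be between 1 and 32.");

    var result = new List<T>();
    IReadOnlyList<T> batch;
    do
    {
        batch = receiveBatch(maxMessages);
        result.AddRange(batch);
    } while (batch.Count != 0);
    return result;
}

// Usage with an in-memory fake queue holding 70 messages.
var fakeQueue = new Queue<int>(Enumerable.Range(1, 70));
IReadOnlyList<int> Receive(int max)
{
    var batch = new List<int>();
    while (batch.Count < max && fakeQueue.Count > 0)
        batch.Add(fakeQueue.Dequeue());
    return batch;
}

List<int> all = DrainQueue<int>(Receive, maxMessages: 25);
Console.WriteLine(all.Count); // 70
```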

Now the code for DecryptAndDeserializeMessages():

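        // Requires the Newtonsoft.Json NuGet package.
        using System;
        using System.Collections.Generic;
        using System.Text;
        using Azure;
        using Azure.Storage.Queues.Models;
        using Newtonsoft.Json;

        private IEnumerable<CopyJobLog> DecryptAndDeserializeMessages(Response<QueueMessage[]> messages, byte[] encryptionKey)
        {
            foreach (QueueMessage msg in messages.Value)
            {
                // The body is Base64-encoded JSON with Label, JobId, IV and Content.
                byte[] base64DecodedArray = Convert.FromBase64String(msg.Body.ToString());
                string jsonBody = Encoding.UTF8.GetString(base64DecodedArray);
                AzureJobProgress progress = JsonConvert.DeserializeObject<AzureJobProgress>(jsonBody);
                // Content decrypts to a single CopyJobLog serialized as JSON.
                string progressString = progress.Decrypt(encryptionKey);
                yield return JsonConvert.DeserializeObject<CopyJobLog>(progressString);
            }
        }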
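The first stage of DecryptAndDeserializeMessages can also be written with System.Text.Json if you would rather not take a Newtonsoft.Json dependency. This sketch only decodes the Base64 body into the four envelope fields that AzureJobProgress uses (Label, JobId, IV, Content); ParseEnvelope is a hypothetical helper and the sample values are made up.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Decodes one queue message body into the four envelope fields.
(string Label, string JobId, string IV, string Content) ParseEnvelope(string messageBody)
{
    byte[] jsonBytes = Convert.FromBase64String(messageBody); // body is Base64-encoded JSON
    using JsonDocument doc = JsonDocument.Parse(jsonBytes);
    JsonElement root = doc.RootElement;
    // Missing or non-string properties come back as an empty string.
    string Read(string name) =>
        root.TryGetProperty(name, out JsonElement value) && value.ValueKind == JsonValueKind.String
            ? value.GetString()!
            : "";
    return (Read("Label"), Read("JobId"), Read("IV"), Read("Content"));
}

// Made-up sample values, only to show the layout.
string sampleJson = "{\"Label\":\"sample\",\"JobId\":\"11111111-1111-1111-1111-111111111111\","
    + "\"IV\":\"AAAAAAAAAAAAAAAAAAAAAA==\",\"Content\":\"Y2lwaGVy\"}";
string body = Convert.ToBase64String(Encoding.UTF8.GetBytes(sampleJson));
var envelope = ParseEnvelope(body);
Console.WriteLine(envelope.JobId); // 11111111-1111-1111-1111-111111111111
```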

Now we need the code for two entity classes and a few enums. The first one, AzureJobProgress, holds the fields of each queue message and decrypts its Content:

    using System;
    using System.IO;
    using System.Security.Cryptography;

    class AzureJobProgress
    {
        public string Label { get; set; }
        public string JobId { get; set; }
        public string IV { get; set; }
        public string Content { get; set; }

        public string Decrypt(byte[] key)
        {
            using Aes aes = Aes.Create();
            aes.Key = key;                          // a 32-byte key selects AES-256
            aes.IV = Convert.FromBase64String(IV);  // the IV travels with the message, Base64 encoded
            aes.Mode = CipherMode.CBC;              // padding stays at the .NET default, PKCS7
            using ICryptoTransform decipher = aes.CreateDecryptor(aes.Key, aes.IV);
            using MemoryStream ms = new(Convert.FromBase64String(Content));
            using CryptoStream cs = new(ms, decipher, CryptoStreamMode.Read);
            using StreamReader sr = new(cs);
            return sr.ReadToEnd();
        }
    }
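To check the AES-256-CBC handling in isolation, here is a small round trip using only System.Security.Cryptography. The Encrypt helper exists only to produce test data in the shape the post describes (Base64 IV plus Base64 ciphertext); it uses TransformFinalBlock instead of a CryptoStream, PKCS7 padding is an assumption (it is the .NET default, which the Decrypt method relies on too), and the key and plaintext are made up.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Test-data helper: encrypts text and returns (Base64 IV, Base64 ciphertext).
(string IvBase64, string ContentBase64) Encrypt(string plainText, byte[] key)
{
    using Aes aes = Aes.Create();
    aes.Key = key;                 // a 32-byte key selects AES-256
    aes.Mode = CipherMode.CBC;
    aes.Padding = PaddingMode.PKCS7;
    aes.GenerateIV();              // fresh random 16-byte IV
    using ICryptoTransform encryptor = aes.CreateEncryptor();
    byte[] plainBytes = Encoding.UTF8.GetBytes(plainText);
    byte[] cipherBytes = encryptor.TransformFinalBlock(plainBytes, 0, plainBytes.Length);
    return (Convert.ToBase64String(aes.IV), Convert.ToBase64String(cipherBytes));
}

// Mirrors the decryption step: Base64 IV + Base64 ciphertext + key -> UTF-8 text.
string Decrypt(string ivBase64, string contentBase64, byte[] key)
{
    using Aes aes = Aes.Create();
    aes.Key = key;
    aes.Mode = CipherMode.CBC;
    aes.Padding = PaddingMode.PKCS7;
    aes.IV = Convert.FromBase64String(ivBase64);
    using ICryptoTransform decryptor = aes.CreateDecryptor();
    byte[] cipherBytes = Convert.FromBase64String(contentBase64);
    byte[] plainBytes = decryptor.TransformFinalBlock(cipherBytes, 0, cipherBytes.Length);
    return Encoding.UTF8.GetString(plainBytes);
}

byte[] demoKey = RandomNumberGenerator.GetBytes(32);
var (iv, content) = Encrypt("{\"Event\":\"JobStart\"}", demoKey);
Console.WriteLine(Decrypt(iv, content, demoKey)); // {"Event":"JobStart"}
```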

The last class, CopyJobLog, has many more properties, plus the enums it uses:

    enum CopyJobLogMigrationDirection
    {
        Export,
        Import
    }
    enum CopyJobLogEvent
    {
        JobQueued,
        JobStart,
        JobLogFileCreate,
        FinishManifestFileUpload,
        JobProgress,
        JobEnd,
        JobFinishedObjectInfo,
        JobWarning,
    }
    enum CopyJobLogMigrationType
    {
        Copy,
    }
    class CopyJobLog
    {
        public CopyJobLogEvent Event { get; set; }
        public int TotalRetryCount { get; set; }//Set for Events:JobProgress,
        public DateTime Time { get; set; }
        public Guid JobId { get; set; }
        public Guid SiteId { get; set; }
        public Guid DbId { get; set; }
        public CopyJobLogMigrationType MigrationType { get; set; }
        public CopyJobLogMigrationDirection MigrationDirection { get; set; }
        public string Url { get; set; }
        public int FilesCreated { get; set; }//Set for Events:JobProgress,
        public long BytesProcessed { get; set; }//Set for Events:JobProgress. long, since byte counts can exceed int.MaxValue (~2 GB)
        public long TotalExpectedBytes { get; set; }
        public int ObjectsProcessed { get; set; }//Set for Events:JobProgress,
        public int TotalExpectedSPObjects { get; set; }//Set for Events:JobProgress,
        public int TotalErrors { get; set; }//Set for Events:JobProgress,
        public int TotalWarnings { get; set; }//Set for Events:JobProgress,
        public double TotalDurationInMs { get; set; }//Set for Events:JobProgress,
        public int CpuDurationInMs { get; set; }//Set for Events:JobProgress,
        public int SqlDurationInMs { get; set; }//Set for Events:JobProgress,
        public int SqlQueryCount { get; set; }//Set for Events:JobProgress,
        public int WaitTimeOnSqlThrottlingMilliseconds { get; set; }
        public string FileName { get; set; }
        public bool IsShallowCopy { get; set; }//Set for Events:JobProgress,
        public Guid CorrelationId { get; set; }
        public string SourceObjectFullUrl { get; set; }
        public string TargetServerUrl { get; set; }
        public string TargetListId { get; set; }
        public string TargetObjectSiteRelativeUrl { get; set; }
        public string TargetObjectUniqueId { get; set; }
        public string TargetSiteName { get; set; }
        // For JobStart
        public Guid FarmId { get; set; }
        public Guid SubscriptionId { get; set; }
        // For JobWarning
        public string ObjectType { get; set; }
        public Guid Id { get; set; }
        public int SourceListItemIntId { get; set; }
        public int TargetListItemIntId { get; set; }
        public string Message { get; set; }
        public string ManifestFileName { get; set; }//Set for Events:FinishManifestFileUpload
    }
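Once the logs are decrypted, we still have to decide whether the job failed. Below is a sketch that reads Event as a plain string (so an event name missing from CopyJobLogEvent does not make deserialization throw), records whether JobEnd was seen, counts JobWarning entries, and keeps the largest TotalErrors value. Summarize is an illustrative helper, the sample log lines are made up, and treating "JobEnd seen with zero errors" as success is my assumption, not documented behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Summarizes decrypted CopyJobLog JSON strings into (Ended, Errors, Warnings).
(bool Ended, int Errors, int Warnings) Summarize(IEnumerable<string> logJsons)
{
    bool ended = false;
    int errors = 0, warnings = 0;
    foreach (string json in logJsons)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        // Read Event as a string so unknown event names are tolerated.
        string evt = root.TryGetProperty("Event", out JsonElement e) && e.ValueKind == JsonValueKind.String
            ? e.GetString()!
            : "";
        if (evt == "JobEnd") ended = true;
        if (evt == "JobWarning") warnings++;
        // TotalErrors is set on JobProgress entries; keep the largest value seen.
        if (root.TryGetProperty("TotalErrors", out JsonElement te)
            && te.ValueKind == JsonValueKind.Number
            && te.TryGetInt32(out int n))
            errors = Math.Max(errors, n);
    }
    return (ended, errors, warnings);
}

// Made-up log lines, only to exercise the helper.
var logs = new[]
{
    "{\"Event\":\"JobStart\"}",
    "{\"Event\":\"JobProgress\",\"TotalErrors\":0}",
    "{\"Event\":\"JobWarning\",\"Message\":\"sample warning\"}",
    "{\"Event\":\"JobProgress\",\"TotalErrors\":2}",
    "{\"Event\":\"JobEnd\"}",
};
var summary = Summarize(logs);
Console.WriteLine(summary); // (True, 2, 1)
```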

Hopefully this is reasonably self-explanatory, given good prior knowledge of the SharePoint CreateCopyJobs API. I am planning to wrap all of this into a working sample on GitHub, hopefully soon.

Comments are welcome. Use this API at your own risk, as it lacks documentation and samples.
