Tuesday, November 8, 2022

SharePoint Online - No APIs to set up Information Management Policies

After a huge gap, I am active on this blog again. My plan is to publish one post per week, every Tuesday. That did not happen over the last few months because we were taking an important step in the American dream: we moved to the south to own a house. Now we are more or less settled, and I am resuming the weekly posts.

Background

SharePoint is good content management software. If we are managing a project, we can throw all the files at it and it does a pretty good job of housekeeping. Fundamentally, it is nothing but a highly customizable ASP.NET website.

The problem starts when we use it for each and every problem, basically when we customize it. My first experience was around 2011-12, when we tried to customize SharePoint to look like a normal site. At that time it was just SharePoint installed on our on-premises servers. Now Microsoft has packaged it as SharePoint Online, and the same struggle continues with it. It's like old wine in a new bottle, including the API surface.

One of the issues I posted about earlier was the "Dilemma of choosing .Net SDK to interact with SharePoint". This time, forget about the SDK; the API itself is missing.

The problem

Let me try to state the problem with some background. One of my projects was using SQL FileStream to store files and Office Online Server to edit them. The project is moving to the cloud, and the client's operations team found it very costly to store files in SQL using FileStream. The client has already invested heavily in Microsoft 365, so the suggestion was to use SharePoint as storage. Later we hit the limits of Office Online Server and started looking for alternatives. Altogether, we ended up on SharePoint Online. Normally when we work with SharePoint there are one or two sites for the application. But this application is multitenant by nature: each tenant needs its own SQL database and SharePoint site, and there are thousands of tenants.

When we kept the files in SQL FileStream, there were two differential backups daily and a full backup weekly. The operations team keeps those backup files for 180 days for point-in-time rollbacks, so they can revert to any day within the last 180 days just by restoring the database. The 180 days is crucial from the data governance perspective.

SharePoint Online has two stages of recycle bin, the "Site Recycle Bin" and the "Site Collection Recycle Bin", but their combined retention is only ~93 days. So we cannot leverage that feature, as we need 180 days. We ended up with our own mechanism, the "Custom Recycle Bin". It is nothing but a document library/drive. Whenever the application deletes a file, it moves the file to this "Custom Recycle Bin" document library. The users are not allowed to go directly to SharePoint Online and manage files. SharePoint Online is strictly for viewing and editing files. The application keeps track of files, and there is link-based navigation to SharePoint.com URLs.

Now, when any file in that document library reaches 180 days, we have to delete it. How do we do that? Should we run CRON jobs ourselves or leverage something from the mighty SharePoint?

Potential solution - policies

The more code we write, the more chances of bugs. So we as architects always try to find proven approaches and leverage those. For the above problem, SharePoint has a mechanism called "Information Management Policies".

We can set a policy that says "when the 'CreatedOn' date of a document in a particular document library is older than 180 days, delete it". There are a lot of tutorials on doing that using the on-premises SharePoint UI. The same UI is available in SharePoint Online as well.

Enabling the feature

We first had to enable the feature to get the screens for creating a policy, as it is disabled by default for newly created sites.

The real problem - missing API

It all went well when we set up an information management policy on the custom recycle bin document library. The first run took some days, as information management policy execution fires weekly. Anyway, it triggered and deleted the files. The problem was that there is no way to set up the information management policy programmatically from the .Net application. Since we create the sites programmatically for each tenant, we have to set up the same policy on every one of them. As explained earlier, the main reason to do this programmatically is that we use SharePoint as backend storage without letting users navigate freely in SharePoint.
No APIs were available. We googled extensively and watched the F12 dev tools for any API call. All we saw were .aspx pages doing full-page refreshes.
We finally contacted Microsoft, and they confirmed that there is no API to set Information Management Policies in SharePoint Online.
It was really surprising. It looks as if they simply took the old ASPX pages from SharePoint, hosted them in Azure, and started calling it SharePoint Online, then put an unfinished Graph API around it for some features to give it a fresh look. Sorry Microsoft, this is the experience of a developer. Hope you can improve from this feedback.

Solution

We have to write the CRON code ourselves and schedule it to run every day. It iterates through all the files in the custom recycle bin document library and checks whether any are 180 or more days old. If so, it deletes them permanently, bypassing the built-in SharePoint recycle bin.
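
The logic of such a daily job can be sketched roughly as below. This is only an illustration in Python, not the code from the .Net application. The names RecycledFile, binned_at, list_files and delete_permanently are hypothetical: the two callables stand in for whatever SharePoint calls the application uses to list the library and to delete an item permanently, and binned_at is assumed to be whichever date the application counts the 180 days from.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List, Optional

RETENTION = timedelta(days=180)  # retention period required by data governance


@dataclass(frozen=True)
class RecycledFile:
    """One file in the custom recycle bin library (hypothetical shape)."""
    item_id: str
    name: str
    binned_at: datetime  # assumed: date the 180 days are counted from (UTC)


def expired(files: Iterable[RecycledFile], now: datetime) -> List[RecycledFile]:
    """Return the files that are 180 or more days old as of `now`."""
    cutoff = now - RETENTION
    return [f for f in files if f.binned_at <= cutoff]


def purge(
    list_files: Callable[[], Iterable[RecycledFile]],
    delete_permanently: Callable[[str], None],
    now: Optional[datetime] = None,
) -> List[str]:
    """Delete every expired file and return the ids that were deleted.

    Both callables are placeholders for real SharePoint calls; the delete
    is assumed to skip the built-in recycle bins.
    """
    now = now or datetime.now(timezone.utc)
    deleted = []
    for f in expired(list_files(), now):
        delete_permanently(f.item_id)
        deleted.append(f.item_id)
    return deleted
```

With thousands of tenants, the scheduled job would call purge once per tenant site, passing callables bound to that site's custom recycle bin library. Keeping the date check in a separate function makes it easy to unit test without touching SharePoint.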
