Gently Rotting Documents Can Corrupt Copilot Output

In April, I wrote about Microsoft’s introduction of Restricted SharePoint Search and described the new capability as a sticking plaster solution for those who fear that the introduction of Microsoft 365 Copilot will leak sensitive information into the public domain. The more I think about Restricted SharePoint Search, the more I dislike it. Limiting enterprise search to one hundred curated sites is a horrible thing to do to SharePoint Online.

The problem Microsoft attempts to address in Restricted SharePoint Search is Copilot’s marvelous ability to find and reuse text. Any document stored in SharePoint Online or OneDrive for Business that’s available to the signed-in account can be used by Copilot to generate responses to user prompts. Copilot only sees words. It has no sense of importance, accuracy, obsolescence, or usefulness. Humans acquire the ability to detect problems in written content through years of practice. The current generation of AI assistants focuses on words, not context.

There wouldn’t be an issue if humans stored perfect information in SharePoint Online and OneDrive for Business. Perfect input data usually leads to good results. Messy input data leads elsewhere. And the data stored in Microsoft 365 repositories is often messy, imprecise, inaccurate, or misleading. It is the digital debris or digital rot that afflicts so many tenants.

When we worked with paper filing cabinets, it was common practice to remove old material. Today, we leave documents and files to molder, buried at the bottom of folders, and invisible to the UIs that we use to interact with SharePoint Online and OneDrive for Business. Out of sight and out of mind is very accurate when it comes to old documents, presentations, worksheets, PDFs, and other files. And until now, having a layer of digital debris at the bottom of sites didn’t matter very much.

But Microsoft Search indexes the old stuff along with the new, and when Copilot queries the Graph to find suitable information to help it ground user prompts, the likelihood exists that Copilot will consume and reuse digital debris in its responses. And it could be true that a document written in 2016 remains the definitive treatment of a topic, but on the other hand, it might not.

Microsoft 365 Archive

This brings me to Microsoft 365 Archive, launched in preview about a year ago (and still in preview). Microsoft 365 Archive is a pay-as-you-go (PAYG) service that allows customers to move sites containing information that they want to keep, but don’t need immediate access to, out of regular SharePoint storage. When a site is archived, Microsoft moves its content (document libraries, lists, etc.) to “colder storage.” The information in archived sites remains available to Microsoft Purview solutions, meaning features like retention processing and eDiscovery searches/content searches can find items. However, end users can no longer search and find information stored in the site, and the site must be reactivated before it’s possible for eDiscovery searches to download results.

Microsoft 365 Archive is a perfect solution for companies that can’t make up their minds about how to handle old material. Or rather, don’t have the time to go through the old material and decide what to keep and what to remove. Microsoft charges a lower per-GB price for archived sites than it does for sites held in “hot” SharePoint storage.

While archived storage is metered, Microsoft doesn’t charge unless the combination of hot and cold storage consumed by the tenant passes the licensed SharePoint storage quota (1 TB plus 10 GB per licensed user account). When the total storage passes the tenant storage quota, Microsoft charges for archived sites at $0.05 per GB/month, which is a lot cheaper than the $0.20 per GB/month charged for additional SharePoint “hot” storage.
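The pricing described above can be turned into a rough monthly estimate. The sketch below uses the quota formula and per-GB rates quoted in this article. It makes two assumptions that the article does not confirm: 1 TB is counted as 1,024 GB, and only the archived gigabytes that sit above the tenant quota are billed. The function names are illustrative, and the result should be checked against Microsoft’s current pricing documentation before being used for budgeting.

```python
# Rough estimate of monthly SharePoint storage charges.
# Rates and quota formula come from the article; the billing model
# (only archived GB above the quota are charged) is an assumption.

GB_PER_TB = 1024        # assumption: 1 TB counted as 1,024 GB
ARCHIVE_RATE = 0.05     # USD per GB/month for archived storage
EXTRA_HOT_RATE = 0.20   # USD per GB/month for additional hot storage


def tenant_quota_gb(licensed_users: int) -> float:
    """Licensed quota: 1 TB plus 10 GB per licensed user account."""
    return 1 * GB_PER_TB + 10 * licensed_users


def monthly_storage_charges(hot_gb: float, archived_gb: float,
                            licensed_users: int) -> dict:
    """Return the estimated archive and extra hot storage charges."""
    quota = tenant_quota_gb(licensed_users)
    # Hot storage above the quota needs additional hot storage.
    hot_overage = max(0.0, hot_gb - quota)
    # Archive is billed only when hot + cold storage passes the quota.
    total_overage = max(0.0, hot_gb + archived_gb - quota)
    billable_archive = min(archived_gb, total_overage)
    return {
        "quota_gb": quota,
        "archive_charge": round(billable_archive * ARCHIVE_RATE, 2),
        "extra_hot_charge": round(hot_overage * EXTRA_HOT_RATE, 2),
    }


if __name__ == "__main__":
    print(monthly_storage_charges(hot_gb=5000, archived_gb=2000,
                                  licensed_users=500))
```

For a tenant with 500 licensed users, the quota under these assumptions is 6,024 GB. With 5,000 GB of hot storage and 2,000 GB archived, the tenant is 976 GB over quota, so the estimated archive charge is $48.80 per month and no additional hot storage is needed.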

Enabling Microsoft 365 Archive

Because Microsoft 365 Archive is a PAYG solution, you must have an Azure subscription to pay its bills. It’s the same as for other Syntex solutions, such as SharePoint document translation and Microsoft 365 Backup. All the metered charges for these solutions plus Azure charges for services like Microsoft Sentinel accrue into a single monthly bill that’s charged to the credit card associated with the subscription.

For some, setting up the Azure subscription is the hardest part of enabling Microsoft 365 Archive. Once the subscription is in place, the other steps to turn on Microsoft 365 Archive in a tenant (Figure 1) are easy, and once those steps are complete, you can head to the SharePoint Online admin center to select the sites for archival.

Figure 1: Enabling Microsoft 365 Archive in a tenant

Archiving Sites

To archive one or more sites, select them from the Active sites list and choose the Archive option from the ellipsis menu […]. After a short delay, the site disappears from the active sites view and appears in the archived sites view (Figure 2).

Figure 2: The archived sites view

The Reactivate option appears after selecting one or more sites. Reactivation brings a site back from cold to hot storage and makes it accessible to users. Reactivation, which can take 24 hours to complete, incurs a $0.60 per-GB charge that isn’t offset against the tenant SharePoint storage quota. However, Microsoft doesn’t charge for reactivation within seven days of archiving a site.
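Because reactivation is charged per GB, it is worth estimating the cost before bringing a large site back. The sketch below applies the $0.60 per-GB rate and the seven-day grace period described above. It assumes the grace period is inclusive of the seventh day, which the article does not specify, and the function name is illustrative.

```python
# Estimate the one-time charge for reactivating an archived site.
# Rate and grace period come from the article; treating day seven as
# still free of charge is an assumption.
from datetime import date

REACTIVATION_RATE = 0.60   # USD per GB
FREE_WINDOW_DAYS = 7       # no charge within seven days of archiving


def reactivation_charge(site_gb: float, archived_on: date,
                        reactivated_on: date) -> float:
    """Return the estimated reactivation charge in USD."""
    days_archived = (reactivated_on - archived_on).days
    if days_archived < 0:
        raise ValueError("reactivated_on cannot be before archived_on")
    if days_archived <= FREE_WINDOW_DAYS:
        return 0.0
    return round(site_gb * REACTIVATION_RATE, 2)


if __name__ == "__main__":
    # A 50 GB site reactivated a month after archiving.
    print(reactivation_charge(50, date(2024, 5, 1), date(2024, 6, 1)))
```

At that rate, a site reactivated after the grace period costs as much as twelve months of archived storage at $0.05 per GB/month, so frequent archive and reactivate cycles are not a cheap habit.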

Microsoft 365 Archive doesn’t support OneDrive accounts. Given that OneDrive storage is covered by an Office 365 license, the same impulse to reduce storage costs doesn’t exist. However, in the same way that organizations like to keep inactive mailboxes around for ex-employees, I could see how some would also like to do the same for inactive OneDrive accounts.

Site Archival and Teams

Many sites are connected to teams. If you archive a team-connected site, the archival process only handles the SharePoint content. Anything else connected to the team remains in place, including its channels, plans, tabs, apps, and so on. But when team members go to the Files tab, it is empty because anything that was accessible there is now archived. Also, any email sent to a channel in the team fails because Teams cannot create a copy of the email message in SharePoint Online.

Given that Teams supports an archive option (recently extended with a function to archive individual channels), it would make sense for Microsoft to archive the team when it archives a team’s site.

Another gotcha is that you can’t archive a site belonging to a team if it has private or shared channels (Figure 3).

Figure 3: Sites with private or shared channels aren’t supported by Microsoft 365 Archive

Given the Teams development group’s focus on making better use of channels and stopping people from creating a new team every time a new topic comes up for discussion, it’s surprising that Microsoft 365 Archive doesn’t handle group-connected sites better. An individual team has a 1,000-channel capacity spanning a mixture of regular, shared, and private channels. It would be nice if Microsoft 365 Archive could handle a team with hundreds of channel-connected sites. Maybe this is a preview deficiency that Microsoft will address before general availability.

Removing Digital Debris for Copilot

Coming back to digital debris and the need to create an environment where Copilot for Microsoft 365 can work well, how can Microsoft 365 Archive help? I think the answer is that companies can archive complete sites identified as containing stale content. Regular reviews of sites can pick up those that no longer need online access, and those sites can then be archived.
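A regular review of this kind could start from a simple filter over a site inventory. The sketch below is a minimal illustration, not a real SharePoint API call: the SiteRecord structure, its field names, and the 365-day threshold are all hypothetical, and the inventory would have to come from whatever usage report the tenant already produces. It also skips sites whose teams have private or shared channels, because Microsoft 365 Archive doesn’t support them.

```python
# Pick archive candidates from a site inventory.
# SiteRecord and its fields are hypothetical; populate them from
# whatever usage data is available in the tenant.
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SiteRecord:
    url: str
    last_activity: date
    has_private_or_shared_channels: bool = False


def archive_candidates(sites: list[SiteRecord], today: date,
                       stale_after_days: int = 365) -> list[SiteRecord]:
    """Return sites with no activity since the cutoff that can be archived."""
    cutoff = today - timedelta(days=stale_after_days)
    return [
        site for site in sites
        if site.last_activity < cutoff
        # Sites with private or shared channels cannot be archived.
        and not site.has_private_or_shared_channels
    ]


if __name__ == "__main__":
    inventory = [
        SiteRecord("https://contoso.sharepoint.com/sites/old-project",
                   date(2021, 3, 1)),
        SiteRecord("https://contoso.sharepoint.com/sites/current-work",
                   date(2024, 5, 20)),
        SiteRecord("https://contoso.sharepoint.com/sites/old-with-channels",
                   date(2020, 1, 15), has_private_or_shared_channels=True),
    ]
    for site in archive_candidates(inventory, today=date(2024, 6, 1)):
        print(site.url)
```

The output of a filter like this is only a list for site owners to review. Last activity is a crude signal, and a site that nobody has touched for a year might still hold the definitive treatment of a topic.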

Keeping site content in cold storage makes it inaccessible to Copilot. As time passes, Purview retention policies can gradually clear out old material (hopefully, the more important files are assigned retention labels with long retention periods). If time allows, sites could even be brought back online for content owners to review and remove unwanted material before being rearchived.

Alternatively, organizations could create special archive sites on a per-month or per-quarter basis and move files from “regular” sites to the archive sites when the information is no longer needed on an immediate basis. The archive sites can then be closed off at the end of the period and moved to cold storage. Some automation could ease the process of moving information from live to archive site.
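The bookkeeping for per-quarter archive sites is easy to automate. The sketch below only works out which archive site a file moved on a given date belongs to and when that site can be closed off. The “Archive-YYYY-Qn” naming scheme and the function names are hypothetical, and the actual file moves and site archival would need to be done with the tenant’s own tooling.

```python
# Work out the per-quarter archive site for a move date and the date
# on which that site can be closed off and sent to cold storage.
# The naming scheme is a hypothetical convention.
from datetime import date, timedelta


def quarter_of(day: date) -> int:
    """Return the calendar quarter (1-4) that contains the date."""
    return (day.month - 1) // 3 + 1


def quarter_archive_site(moved_on: date, prefix: str = "Archive") -> str:
    """Name of the archive site that receives files moved on this date."""
    return f"{prefix}-{moved_on.year}-Q{quarter_of(moved_on)}"


def quarter_close_date(moved_on: date) -> date:
    """Last day of the quarter, after which the archive site is closed."""
    next_quarter_month = quarter_of(moved_on) * 3 + 1
    if next_quarter_month > 12:
        next_quarter_start = date(moved_on.year + 1, 1, 1)
    else:
        next_quarter_start = date(moved_on.year, next_quarter_month, 1)
    return next_quarter_start - timedelta(days=1)


if __name__ == "__main__":
    moved = date(2024, 5, 14)
    print(quarter_archive_site(moved), quarter_close_date(moved))
```

A per-month scheme would follow the same pattern with the month in place of the quarter.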

Both options are the equivalent of sweeping dust under the carpet. They remove digital debris to a place where Copilot cannot find and use it while making sure that the information is available for eDiscovery. The only problem is that Microsoft says that they’re working on allowing end users to search archived sites. Microsoft isn’t saying if a user search facility will expose content to Copilot. Let’s hope that it won’t.

Microsoft 365 Archive Can Help

The only thing we can be sure of is that content stored in SharePoint Online will continue to grow. Passing time, developing events, and human interaction will mean that digital debris will accrue. If we depend on artificial intelligence to process digital debris alongside ‘good information,’ no one should be surprised if the results are less than stellar. I think Microsoft 365 Archive has the potential to help with the problem. It won’t cure digital rot, but archiving old sites will remove them from Copilot processing.

Remember the old rubbish in equals rubbish out adage. AI doesn’t change that equation. AI just generates more rubbish faster, if we let it. Even if you don’t plan to use AI for now, it is still a good idea to take more control over what’s stored in SharePoint Online. You know it makes sense.

About the Author

Tony Redmond

Tony Redmond has written thousands of articles about Microsoft technology since 1996. He is the lead author for the Office 365 for IT Pros eBook, the only book covering Office 365 that is updated monthly to keep pace with change in the cloud. Apart from contributing to Practical365.com, Tony also writes at Office365itpros.com to support the development of the eBook. He has been a Microsoft MVP since 2004.

Comments

  1. Stef Wanders

    Hello Tony, what about Team Sites connected to a group which has an expiration Policy from Entra ID setup?

    Can’t find any information about this. Given your article, I tend to believe that the Team (for example) won’t be ‘disabled/archived’ and that the Group remains in place as well. However, what happens when the Group gets deleted while its SharePoint site is archived? Will the site be removed, or disconnected from the group, thus turning into a Team Site only that can later be groupified again?
