I’m not known to be an avid supporter of backup for Office 365 data. ISVs operating in this space do a reasonable job with Exchange Online and SharePoint Online, largely based on years of experience gained with on-premises servers, but struggle with applications like Teams and Planner. These applications have no on-premises counterpart, connect components drawn from across Microsoft 365, and don’t have an API suitable for backup and restore, which is not a great foundation for any backup product. And the surprising thing is that the problem of backup and restore for Office 365 has worsened since its launch ten years ago.
Issues with Restoring Office 365 Data
Leaving backup aside, the restore side of the equation is even more problematic. Among the issues I see are:
- The amount of data which might need to be restored is typically larger in the cloud than with on-premises servers. An Exchange Online enterprise mailbox with an archive might span 250 GB (or more). OneDrive for Business accounts can grow to 25 TB, and so on. Remember, Microsoft likes tenants to have all their data in Office 365. The more data in Microsoft’s datacenters, the harder it is to move to another cloud service.
- The challenge of restoring data into applications where no programmatic access is available. How, for instance, can you restore tasks into a Planner plan?
- The difficulty of restoring integrated applications. Is restoring Teams just a matter of restoring channel conversations, including private channels and soon shared channels? What about the SharePoint sites belonging to teams and private channels, personal and group chats, plans, and apps?
- The difficulty of finding a suitable restore target. If Office 365 is unavailable, what is a valid target for the restored data? On-premises servers might be able to handle some mailbox and document data, but how quickly can these servers be brought online to deliver service to users, including linking an on-premises directory to these objects? Restoring to on-premises servers isn’t possible for cloud-only apps like Yammer, Teams, and Planner. And it’s hard to see how you could move the data to a different cloud service.
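To put the first point in perspective, a back-of-the-envelope calculation shows how data volume translates into restore time. The sketch below is illustrative only: the sizes come from the figures above, but the sustained throughput values are assumptions picked for the example, not measured or published rates, and the function name is hypothetical. Real restores are also slowed by service throttling and per-item overhead, which this ignores.

```python
def restore_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a sustained throughput (decimal units)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (size_gb * 1000) / throughput_mb_per_s
    return seconds / 3600


# Sizes taken from the text; the throughput values are hypothetical.
workloads_gb = {"250 GB mailbox": 250, "25 TB OneDrive account": 25_000}

for name, size_gb in workloads_gb.items():
    for rate in (10, 50):
        hours = restore_hours(size_gb, rate)
        print(f"{name} at {rate} MB/s: {hours:.1f} hours")
```

At the assumed 10 MB/s, a single 250 GB mailbox needs about seven hours and a 25 TB account roughly 29 days, which is why the restore target and the available bandwidth matter as much as the backup itself.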
The Need for an Online Restore Target
Practically speaking, tenants need Office 365 and Azure AD to be online to restore data. And if a tenant is online, the scenarios in which data needs to be restored include:
- Cyberattack (ransomware) encrypts user data.
- Malicious or accidental deletion of user data which cannot be recovered using the methods built into Office 365.
Looking at how attacks have developed, it seems clear that documents and email are the data that ransomware is most likely to encrypt. With that in mind, and given that backup for documents and email is well covered, perhaps anyone concerned about ransomware should focus their attention on these workloads. Not only are many backup products available which can process this information, but documents and email are also the easiest data to restore if a problem erupts.
The ransomware scenario is a real concern, but tenants can make sure that they’re not an easy target for attack by eliminating basic authentication wherever feasible, using multi-factor authentication for as many accounts as possible, and educating users to recognize phishing and malware which gets through mail hygiene services. I don’t know of any Office 365 tenants that have been victims of a ransomware attack, but the fact that Microsoft publishes advice to help tenants recover from an attack indicates that this has happened.
Malicious removal of user data is often referred to as the “rogue administrator” problem, where someone with administrative permissions deletes data because they are disaffected for some reason (like they’ve just been fired). I don’t doubt that some people become very annoyed and want to hurt a company, but I don’t know of many instances where this happened. Perhaps the extensive auditing of actions within Office 365 (which proves who did what and when) is enough to dissuade potential rogues from carrying out their plans. Or maybe it’s because tenants can use tools like Privileged Access Management and Privileged Identity Management to limit administrator access to data.
Out-of-the-box tools available in some Office 365 applications can help when data is removed accidentally. For instance:
- The recover deleted items (email) feature available in Outlook clients. Exchange Online administrators can also recover deleted items for users through the new EAC or with PowerShell.
- The restore library feature available for SharePoint Online and OneDrive for Business allows the retrieval of deleted files for up to 30 days.
- Administrators can recover deleted items in mailboxes and sites if retention policies cover the locations or the items had retention labels. A content search can find and export copies of deleted items. Although you can use retention policies and labels to stop the permanent removal of data, these are tools for information governance and not backup. However, because policies retain data for set periods, a good chance exists that it will be possible to retrieve items deleted in error, assuming that the data are in locations covered by retention processing and the retention period has not expired.
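The condition in the last point reduces to a date comparison. Here is a minimal sketch, assuming a single policy covers the location and the retention period is counted in days from a start date (for example, when the item was created or last modified). The function and parameter names are hypothetical, and real retention processing involves more than this, such as overlapping policies and labels.

```python
from datetime import date, timedelta


def is_retrievable(retention_start: date, retention_days: int, today: date) -> bool:
    """True while the retention period that began on retention_start is still running."""
    # Simplifying assumption: one policy, period measured in whole days.
    return today < retention_start + timedelta(days=retention_days)


# Hypothetical item whose one-year retention period started on 15 January 2020.
start = date(2020, 1, 15)
print(is_retrievable(start, 365, date(2020, 6, 1)))   # still inside the period
print(is_retrievable(start, 365, date(2021, 3, 1)))   # period has expired
```

The point of the sketch is that recoverability through retention depends on the policy's clock, not on when someone notices the deletion, which is one reason retention is not a substitute for backup.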
User-centric features don’t handle large-scale recovery well. If you need to retrieve 100,000 documents or 100 mailboxes, restoring data from a backup is usually faster. That is, if you have a backup. If you don’t, you can still use the out-of-the-box tools in the knowledge that retrieval will be slower.
Gaps in Restore
Which brings us back to the issue that backup tools can handle Exchange Online and SharePoint Online but struggle with other Office 365 workloads. If someone deletes a bunch of tasks in a plan, you won’t get them back because Planner doesn’t have a recycle bin or other intermediate deletion point. If someone deletes a bunch of messages in a group chat, you might be able to retrieve the compliance records for those messages but won’t be able to insert them back into the chat. And anyway, some of the content in the messages will be missing (like reactions). If someone deletes all the registered app details from Azure AD, the consent granted to any app to use the Graph APIs to access Office 365 data is nullified.
The point is that restoring all the connections which constitute an Office 365 tenant and its active workloads is a devilishly complex undertaking. So much so that I doubt that the complete restoration of a tenant, its configuration, and all its data can be done automatically. It might be possible to demonstrate such a feat with a test tenant with a small amount of data. But once the imperfections of operational life take hold (evident in symptoms like group sprawl), the difficulties facing any restore operation mount. This doesn’t mean that an imperfect restore has no value. If your tenant is dead in the water, any restore is better than none.
Ten Years On
Office 365 is approaching its tenth anniversary. It’s odd that a situation exists where comprehensive tenant-wide backup and recovery spanning all workloads is impossible. This is especially true given that it was possible to contemplate such an operation for the original applications included in Office 365 in June 2011. The introduction of cloud-only applications and the massive growth in data since has created the challenges we now face. Microsoft has remained oddly passive in this area and left the running to ISVs, who are handicapped by the lack of suitable APIs. Let’s hope the situation improves over the next decade.