In our first two articles, we established the fundamentals of the Microsoft Graph Activity Log and applied them to investigating a mailbox breach. Now, we pivot from communication security to data security, tackling another critical threat vector: document exfiltration.
Investigating a Potential Theft of Confidential Plans
Consider a situation where a key product manager resigns. Shortly after, the organization’s confidential product features are announced by a competitor. When an internal review is conducted, it shows no obvious signs of a breach: email logs are clean, and endpoint security raised no alarms. The data leak is untraceable, yet the financial and competitive damage is real. Such a silent theft of intellectual property, be it from an insider or a stealthy external attacker, underscores the necessity for deep visibility into file-level activity.
This is where the investigative power of the Microsoft Graph Activity Log becomes essential. By providing a detailed audit trail of every file interaction within SharePoint and OneDrive, these logs allow us to trace the digital footprints of data as it moves through your tenant.
In this article, we’ll provide a practical playbook for using the Graph Activity Log and Kusto Query Language (KQL) to hunt for indicators of document exfiltration. We will cover how to spot anomalous file access, detect mass download events, and uncover suspicious sharing activities to build a clear timeline of a data breach.
Anatomy of Document Exfiltration: What to Look For
To hunt for a threat, one must first understand its nature. Document exfiltration is rarely a brute-force attack; more often it is a crime of stealth. After an initial compromise, an attacker’s primary objective is to blend in with the noise of daily business operations while quietly siphoning an organization’s most valuable data: copying, syncing, or sharing sensitive files without raising a single alarm. That said, exfiltration is not always subtle. A resigning employee, for example, may have limited time to act. This pressure can cause them to abandon stealth for speed, leading to rushed mistakes and anomalous activity that is easier to detect. Investigators should therefore hunt for both quiet, “low-and-slow” patterns and noisy, high-volume events.
Fortunately, these covert actions are not invisible. Every interaction with a file through the Microsoft 365 ecosystem, from a simple preview to a bulk download, generates a corresponding API request. The Microsoft Graph Activity Log serves as the immutable, high-fidelity witness to these events. By meticulously analyzing these digital breadcrumbs, investigators can reconstruct the attacker’s movements, piece together a timeline of compromise, and build a clear narrative of the breach. This deep understanding of the attacker’s playbook is the foundation for a focused and effective investigation.
The digital footprints of data theft often manifest as specific, telltale signs. Investigators should hunt for the following indicators of compromise (IOCs):
- Aberrant Access Locations: A user account that typically operates from one geographic region suddenly accesses sensitive SharePoint sites from a distant location or an anonymizing VPN service. This is a classic indicator of a compromised account being used to probe data stores.
- Anomalous Data Volume: A sudden, dramatic spike in file read or synchronization operations that deviates sharply from established user baselines. An account downloading gigabytes of data outside of business hours, for example, strongly suggests an automated data harvesting attempt.
- Exposure via Public Sharing: The creation of anonymous or “Anyone with the link” permissions for confidential documents or folders. This action effectively bypasses perimeter controls and makes sensitive data accessible from the public internet, a common tactic for easy exfiltration.
- Unusual Device Sync Activity: The synchronization of a sensitive document library, such as “Financial Reports” or “Legal Contracts,” to a new, previously unseen device. This could indicate an attacker preparing to take a large volume of data offline. This indicator also covers device activity that may seem normal in isolation but is unusual for a particular user. Examples include requesting Bring Your Own Device (BYOD) or remote desktop access, which may facilitate external capture; adding a new personal printer; or connecting new hardware such as HDMI-to-USB capture adapters, which can masquerade as standard monitors to record screen activity.
- Privilege Escalation for Data Access: An application being granted powerful, data-centric permissions like Sites.Read.All or Files.ReadWrite.All. If this occurs outside of a documented change control process, it may signal an attacker provisioning a rogue application to steal data tenant-wide.
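The last indicator can be approached from the usage side as well. The following is a minimal sketch, not a definitive detection: it assumes that application permissions presented on a request appear in the Roles column and that app-only calls carry a ServicePrincipalId. The seven-day window and the two permission names (taken from the bullet above) are illustrative choices.

```kql
// Sketch: apps exercising tenant-wide file permissions against drives and sites.
// Assumption: application permissions are recorded in Roles (delegated ones in Scopes).
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(7d)
| where isnotempty(ServicePrincipalId)        // app-only calls
| where Roles has_any ("Sites.Read.All", "Files.ReadWrite.All")
| where RequestUri has "/drive" or RequestUri has "/sites/"
| summarize Requests = count(),
            DistinctUris = dcount(RequestUri),
            FirstSeen = min(TimeGenerated),
            LastSeen = max(TimeGenerated)
    by AppId, ServicePrincipalId, IPAddress
| order by DistinctUris desc
```

An application that appears here with a recent FirstSeen and a large DistinctUris value is worth comparing against your change control records.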
Investigative Playbook: Core KQL Queries
With these IOCs in mind, we can use Log Analytics to pinpoint suspicious file-related activities. The following KQL queries are designed to hunt for the digital breadcrumbs of document exfiltration.
1. General File Activity Audit for a Specific User
When investigating a user’s account, the first step is to establish a picture of their normal behavior. It’s impossible to spot strange activity without first understanding what a user’s everyday actions look like.
This initial check acts like a wide net, capturing all recent file and folder interactions in SharePoint and OneDrive. These actions are logged automatically in the Microsoft 365 audit logs, which serve as a foundational source for behavioral baselining. You can learn more about the types of activities that are recorded in the audit log by visiting Audit log activities in Microsoft 365. By reviewing this history, investigators can more easily spot anomalies that may indicate a compromised account or insider threat.
The following KQL query shows a user’s file activity from the last seven days, including when it happened, the source IP address, and the application used.
// UserId holds the user's Microsoft Entra object ID (a GUID), not the UPN
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(7d)
| where UserId == "<UserObjectId>"
| where RequestUri has "/drive/items/" or RequestUri has "/sites/"
| project TimeGenerated, AppId, IPAddress, UserAgent, RequestMethod, RequestUri
| sort by TimeGenerated desc
Reviewing this output helps you identify access from strange IP addresses, usage of unapproved applications (AppId), or access to unusual files and sites.
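Spotting "strange" IP addresses by eye gets tedious for active users. One way to narrow the list is to show only addresses that were not seen for the same user earlier in the retention window. The sketch below assumes that UserId holds the user's object ID and that the workspace retains at least 30 days of data; the variable names and the 1-day and 30-day windows are hypothetical choices, not part of the original playbook.

```kql
// Sketch: IP addresses used for file access in the last day that are new for this user.
let targetUser = "<UserObjectId>";   // placeholder: the user's object ID
let recent = 1d;                     // window under review (assumption)
let baseline = 30d;                  // history to compare against (assumption)
let fileCalls = MicrosoftGraphActivityLogs
    | where TimeGenerated > ago(baseline)
    | where UserId == targetUser
    | where RequestUri has "/drive/items/" or RequestUri has "/sites/";
let knownIPs = fileCalls
    | where TimeGenerated <= ago(recent)
    | distinct IPAddress;
fileCalls
| where TimeGenerated > ago(recent)
| where IPAddress !in (knownIPs)
| summarize Requests = count(),
            FirstSeen = min(TimeGenerated),
            LastSeen = max(TimeGenerated),
            Apps = make_set(AppId)
    by IPAddress
| order by Requests desc
```

A new address is not proof of compromise (travel and home networks produce them routinely), so treat the output as a shortlist for geolocation and sign-in log checks.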
2. Uncovering the “Smash and Grab”: Detecting High-Volume Download Activity
A common tactic after an initial compromise is a digital “smash and grab,” where an attacker attempts to exfiltrate as much valuable information as possible before being discovered. While a single employee downloading files is normal, an account that suddenly accesses hundreds of files within a very short timeframe is a significant red flag. This type of high-volume activity is one of the most reliable signs of data exfiltration.
Catching this behavior requires a method to distinguish the routine noise of daily operations from the sudden spike of a potential breach. The following KQL query serves as a powerful tool for this purpose. It is engineered to act as a digital tripwire, specifically detecting these anomalous surges in download activity.
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(7d)
// Filters for file download/read operations
| where RequestUri has "/content"
| project TimeGenerated, AppId, UserId, IPAddress
| summarize RequestCount = count() by AppId, UserId, IPAddress, bin(TimeGenerated, 1h)
| where RequestCount > 200
| order by RequestCount desc
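This query returns every combination of user, application, and IP address that issued more than 200 requests to a /content endpoint within a single hour during the last seven days. Because /content is also the upload endpoint (PUT), adding a RequestMethod == "GET" filter narrows the results to downloads and reads. The threshold of 200 is a starting point; tune it to your tenant.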
Each result highlights a potential data harvesting attempt by showing the total download count, the user, their IP address, and the application used.
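A fixed threshold treats every account the same, whereas the "Anomalous Data Volume" indicator is about deviation from a user's own baseline. The following sketch shows one way to express that idea. The 14-day lookback, the three-standard-deviation rule, the minimum of 10 active hours, and the floor of 50 requests are all assumptions chosen for illustration.

```kql
// Sketch: hours in which a user's download count far exceeds that user's own hourly norm.
let lookback = 14d;
let hourly = MicrosoftGraphActivityLogs
    | where TimeGenerated > ago(lookback)
    | where RequestMethod == "GET" and RequestUri has "/content"
    | summarize RequestCount = count() by UserId, bin(TimeGenerated, 1h);
hourly
| summarize AvgPerHour = avg(RequestCount),
            StdevPerHour = stdev(RequestCount),
            ActiveHours = count()
    by UserId
| where ActiveHours >= 10                      // skip users with too little history
| join kind=inner hourly on UserId
| where RequestCount > AvgPerHour + 3 * StdevPerHour and RequestCount > 50
| project TimeGenerated, UserId, RequestCount, AvgPerHour, StdevPerHour
| order by RequestCount desc
```

Two limitations are worth keeping in mind: the baseline only counts hours in which the user was active and includes the spike itself, and accounts with little history are skipped, so the fixed-threshold query remains useful alongside this one.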
3. Hunting for Suspicious Anonymous Sharing
Creating anonymous or external sharing links is a simple yet effective technique that attackers may use to exfiltrate data by exploiting permissive file access settings in SharePoint or OneDrive. The following query identifies API calls that initiate the creation of such links, providing visibility into the users involved, the applications used, and the originating IP addresses.
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(30d)
| where RequestUri has "/createLink" and RequestMethod == "POST"
| project TimeGenerated, AppId, UserId, IPAddress, RequestUri
| sort by TimeGenerated desc
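The query returns every createLink request logged within the past 30 days. The results are sorted with the most recent entries first and include the timestamp, user ID, IP address, application ID, and the request URI, which identifies the resource that was shared. Note that the Graph Activity Log records the request URI but not the request body, so the link scope (anonymous, organization-wide, or specific people) is not visible here; confirm it by correlating with the sharing events in the Microsoft 365 audit log.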
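When the list of sharing calls is long, a per-user summary makes bulk sharing stand out. The sketch below is an illustrative extension rather than part of the original playbook: it keeps only successful calls, and it additionally includes the driveItem invite action, which shares an item with named recipients. The Action column is a hypothetical label added for readability.

```kql
// Sketch: successful sharing calls per user, app, and IP over the last 30 days.
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(30d)
| where RequestMethod == "POST"
| where RequestUri has "/createLink" or RequestUri has "/invite"
| where ResponseStatusCode >= 200 and ResponseStatusCode < 300   // succeeded
| extend Action = iff(RequestUri has "/createLink", "createLink", "invite")
| summarize Calls = count(),
            DistinctItems = dcount(RequestUri),
            FirstSeen = min(TimeGenerated),
            LastSeen = max(TimeGenerated)
    by UserId, AppId, IPAddress, Action
| order by DistinctItems desc
```

A user who shares dozens of distinct items in a short span between FirstSeen and LastSeen deserves a closer look than one who creates a single link.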
4. Detecting Attempts to Cover Tracks: The Download-and-Delete Pattern
A sophisticated attacker or a malicious insider might not just steal data; they might also try to cover their tracks by deleting the evidence. A classic pattern of this behavior is for a user to download a sensitive file and then delete the original from the corporate system shortly thereafter, hoping to obscure what was taken.
This advanced query is specifically designed to hunt for this sequence of events. It correlates file download operations with file deletion operations that are performed by the same user from the same IP address within a narrow 60-minute window. Flagging such a tight correlation provides a strong signal of intentional and highly suspicious activity that warrants immediate investigation.
Now, it’s crucial to understand that this activity isn’t always malicious. Consider an employee who is leaving the company. They might be downloading their personal information from the system, like healthcare insurance forms or pension details, and then deleting the originals to protect their privacy. This is a perfectly reasonable action, especially if they know their manager might get access to their account after they depart.
So, before escalating an alert, take a moment to check if the deleted files appear to be personal. This simple step can save you from chasing a false alarm and help you focus on what could be a genuine “download-and-delete” attack.
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(7d)
| where RequestUri has "/content" or (RequestMethod == "DELETE" and RequestUri has "/drive/items/")
| extend IsDownload = iif(RequestUri has "/content", 1, 0),
         IsDelete = iif(RequestMethod == "DELETE" and RequestUri has "/drive/items/", 1, 0)
| summarize DownloadTime = maxif(TimeGenerated, IsDownload == 1),
            DeletionTime = maxif(TimeGenerated, IsDelete == 1),
            DownloadedFile = maxif(RequestUri, IsDownload == 1),
            DeletedFile = maxif(RequestUri, IsDelete == 1)
    by IPAddress, UserId
| where isnotnull(DownloadTime) and isnotnull(DeletionTime)
| where abs(datetime_diff("minute", DeletionTime, DownloadTime)) < 60
| project DownloadTime, DeletionTime, UserId, IPAddress, DownloadedFile, DeletedFile
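The results from this query pinpoint these correlated events. Each row provides the investigator with the user and IP address involved, the timestamps of the most recent download and the most recent deletion for that pair, and the request URIs of the affected items. Two caveats apply. First, the URIs identify items by ID rather than by file name. Second, because maxif selects the greatest value in each group, the DownloadedFile and DeletedFile columns are not guaranteed to be the requests logged at those timestamps, nor to refer to the same item. Treat each row as a lead to confirm against the raw events.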
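For a tighter signal, the download and the deletion can be paired on the item itself. The following sketch assumes that items are addressed by ID in the URI (the form /items/{item-id}); requests that use path-based addressing are not matched. The ItemId column, the regular expressions, and the window variable are illustrative assumptions.

```kql
// Sketch: same user, same IP, same item ID, deletion within 60 minutes after a download.
let window = 60m;
let itemCalls = MicrosoftGraphActivityLogs
    | where TimeGenerated > ago(7d)
    | where RequestUri has "/items/"
    | extend ItemId = extract(@"/items/([^/?]+)", 1, RequestUri)
    | where isnotempty(ItemId);
let downloads = itemCalls
    | where RequestMethod == "GET" and RequestUri has "/content"
    | project UserId, IPAddress, ItemId, DownloadTime = TimeGenerated;
let deletions = itemCalls
    // Keep DELETE calls on the item itself, not on sub-resources such as permissions
    | where RequestMethod == "DELETE" and RequestUri matches regex @"/items/[^/?]+/?(\?.*)?$"
    | project UserId, IPAddress, ItemId, DeletionTime = TimeGenerated, DeleteUri = RequestUri;
downloads
| join kind=inner deletions on UserId, IPAddress, ItemId
| where DeletionTime >= DownloadTime and DeletionTime <= DownloadTime + window
| summarize DownloadTime = min(DownloadTime), DeletionTime = max(DeletionTime)
    by UserId, IPAddress, ItemId, DeleteUri
| order by DeletionTime desc
```

Because every row refers to a single item, the ItemId can be resolved to a file name through the Microsoft 365 audit log or the site's recycle bin before deciding whether to escalate.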
Automating the Hunt: A PowerShell Playbook
While KQL is powerful for interactive analysis, PowerShell scripts can standardize the investigation and make it accessible to more team members. The following script automates the four queries discussed above, prompting the analyst for input and exporting the results to a CSV file for reporting or further analysis.
# Requires the Az.Accounts and Az.OperationalInsights modules
Connect-AzAccount

# Define Workspace ID and output file path
$workspaceId = "<replace_this_with_your_workspace_id>"
$csvFilePath = ".\exfiltration_file_activity_result.csv"

# Prompt for user input
Write-Host "`nSelect the query to run:"
Write-Host "1. Establishing a Baseline: Auditing User File Interactions (last 7 days)"
Write-Host "2. Uncovering High-Volume Download Activity (last 7 days)"
Write-Host "3. Hunting for Suspicious Anonymous Sharing (last 30 days)"
Write-Host "4. Detecting the Download-and-Delete Pattern (last 7 days)"
$selection = Read-Host "Enter the query number (1-4)"

# Dynamically build the KQL query based on selection.
# Note: the closing "@ of a here-string must start at the beginning of its line.
switch ($selection) {
    '1' {
        # UserId in the log is the user's object ID (GUID), not the UPN
        $userId = Read-Host "Enter the user's object ID (GUID)"
        $query = @"
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(7d)
| where UserId == '$userId'
| where RequestUri has "/drive/items/" or RequestUri has "/sites/"
| project TimeGenerated, AppId, IPAddress, UserAgent, RequestMethod, RequestUri
| sort by TimeGenerated desc
"@
    }
    '2' {
        $query = @"
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(7d)
| where RequestUri has "/content"
| project TimeGenerated, AppId, UserId, IPAddress
| summarize RequestCount = count() by AppId, UserId, IPAddress, bin(TimeGenerated, 1h)
| where RequestCount > 200
| order by RequestCount desc
"@
    }
    '3' {
        $query = @"
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(30d)
| where RequestUri has "/createLink" and RequestMethod == "POST"
| project TimeGenerated, AppId, UserId, IPAddress, RequestUri
| sort by TimeGenerated desc
"@
    }
    '4' {
        # Same logic as the download-and-delete query in the KQL section
        $query = @"
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(7d)
| where RequestUri has "/content" or (RequestMethod == "DELETE" and RequestUri has "/drive/items/")
| extend IsDownload = iif(RequestUri has "/content", 1, 0),
         IsDelete = iif(RequestMethod == "DELETE" and RequestUri has "/drive/items/", 1, 0)
| summarize DownloadTime = maxif(TimeGenerated, IsDownload == 1),
            DeletionTime = maxif(TimeGenerated, IsDelete == 1),
            DownloadedFile = maxif(RequestUri, IsDownload == 1),
            DeletedFile = maxif(RequestUri, IsDelete == 1)
    by IPAddress, UserId
| where isnotnull(DownloadTime) and isnotnull(DeletionTime)
| where abs(datetime_diff('minute', DeletionTime, DownloadTime)) < 60
| project DownloadTime, DeletionTime, UserId, IPAddress, DownloadedFile, DeletedFile
"@
    }
    default {
        Write-Error "Invalid selection. Please run the script again and choose between 1-4."
        exit
    }
}

Write-Host "`nRunning selected Graph API activity query..."
try {
    $result = Invoke-AzOperationalInsightsQuery -WorkspaceId $workspaceId -Query $query -ErrorAction Stop
    if ($result.Results) {
        Write-Host "`nQuery successful. Displaying results:"
        $result.Results | Format-Table -AutoSize
        Write-Host "`nSaving results to '$csvFilePath'..."
        $result.Results | Export-Csv -Path $csvFilePath -NoTypeInformation -Encoding UTF8
    }
    else {
        Write-Host "Query ran successfully but returned no results."
    }
}
catch {
    Write-Error "Query failed: $($_.Exception.Message)"
}
Write-Host "`nInvestigation complete."
Before running the script, ensure the $workspaceId variable is updated with the actual Log Analytics Workspace ID for your environment. This PowerShell script automates key Microsoft Graph activity queries, making investigations faster and easier. It helps security teams quickly identify suspicious file activity, high-volume downloads, sharing link creation, and download-delete patterns. By simplifying these hunts and exporting results for review, the script improves efficiency and consistency in threat investigations.
In this article, we showed how the Microsoft Graph Activity Log can help uncover document exfiltration by highlighting suspicious behaviors like sudden file downloads, anonymous sharing links, and deletion after access. We also shared practical KQL queries and a PowerShell script to streamline investigations. Next, we’ll dive into a growing concern, OAuth app abuse, and show how to use the Graph Activity Log to detect and investigate risky or malicious app activity within your Microsoft 365 environment.