Retrieve Sensitivity Label Information for SharePoint Online Documents
In July 2022, I wrote about using Graph APIs to create a report of files stored in a SharePoint Online site. It was a fun project, and I learned a lot about navigating SharePoint Online with Graph requests using concepts like drives and drive items. I use the script to generate reports frequently, so I gained something from the exercise.
One thing I couldn’t report was whether the reported files had sensitivity labels. I’ve created scripts in the past to retrieve files with sensitivity labels (this example offers the chance to decrypt files with sensitivity labels using the Unlock-SPOSensitivityLabelEncryptedFile cmdlet) where a request like the one shown below retrieves the necessary information:
$Uri = "https://graph.microsoft.com/v1.0/sites/$($Siteid)/lists/Documents/Drive/Items/$($DriveId)/children?`$select=sensitivitylabel,weburl,name"
[array]$Files = (Invoke-RestMethod -Uri $Uri -Headers $Headers -Method Get -ContentType "application/json")
I could have continued along that path to retrieve the sensitivity label information, but I decided to use the new extractSensitivityLabels API. The big difference with this API is that when you use it, the API updates “the metadata of a drive item with the latest details of the assigned label.” As we’ll see, this aspect of the API can have an unexpected side effect.
In addition, the API can handle the extraction of “one or more sensitivity labels assigned to a drive item.” I didn’t know that SharePoint Online documents can have more than one sensitivity label, but apparently, they can if a document passes from one tenant to another and accrues labels from each tenant. However, once a user applies a label with encryption, the document can store just that label.
Changes Made to Retrieve Sensitivity Label Data
To test the API, I added some code to the UnpackFilesRecursively function in the original script (available from GitHub). If you want to try the script out, download the script and insert the following code to replace the original UnpackFilesRecursively function:
Function UnpackFilesRecursively {
# Unpack set of items (files and folders)
param (
    [parameter(Mandatory = $true)]
    $Items, # Items to unpack
    [parameter(Mandatory = $true)]
    $SiteUri, # Base site URI
    [parameter(Mandatory = $true)]
    $FolderPath, # Folder path
    [parameter(Mandatory = $true)]
    $SiteFiles,
    [parameter(Mandatory = $false)]
    [bool]$IsNextLink
)

    # Sensitivity label document types
    [array]$ValidDocumentTypes = "docx", "pptx", "xlsx", "pdf"
    # Find sub-folders that we need to check for files
    [array]$Folders = $Items.Value | Where-Object {$_.Folder.ChildCount -gt 0 }
    # And any files in the folder
    [array]$Files = $Items.Value | Where-Object {$_.Folder.ChildCount -eq $Null}

    $before = $SiteFiles.count

    # Report the files
    ForEach ($D in $Files) {
        $LabelName = $Null
        $FileSize = FormatFileSize $D.Size
        # Check Sensitivity label. Take the text after the last dot so that
        # names containing several dots (like Q1.Report.docx) resolve correctly
        $Type = $D.Name.Split(".")[-1]
        If ($Type -in $ValidDocumentTypes) {
            # Write-Host "Processing filename:" $D.Name
            $Uri = ("https://graph.microsoft.com/beta/sites/{0}/drive/items/{1}/extractSensitivityLabels" -f $Site.Id, $D.id)
            Try {
                $LabelsInfo = Invoke-MgGraphRequest -Uri $Uri -Method POST
            } Catch {
                Write-Host ("Failure reading data from file {0}" -f $D.Name)
                $LabelsInfo = $Null
            }
            # Resolve sensitivity label identifier if one is found to find label name
            If ($LabelsInfo.labels.sensitivityLabelId) {
                # Write-Host "Label Id" $LabelsInfo.labels.sensitivityLabelId
                $LabelName = $LabelsHash[$LabelsInfo.labels.sensitivityLabelId]
            } # End if Label data
        } # End if type
        $ReportLine = [PSCustomObject] @{
            FileName            = $D.Name
            Folder              = $FolderPath
            Author              = $D.createdby.user.displayname
            Created             = $D.createdDateTime
            Modified            = $D.lastModifiedDateTime
            Size                = $FileSize
            'Sensitivity Label' = $LabelName
            Uri                 = $D.WebUrl
            Id                  = $D.Id}
        $SiteFiles.Add($ReportLine)
    } # End ForEach Files

    # Follow any next link to fetch further pages of items for this folder
    $NextLink = $Items."@odata.nextLink"
    $Uri = $Items."@odata.nextLink"
    While ($NextLink) {
        $MoreData = Invoke-MgGraphRequest -Uri $Uri -Method Get
        UnpackFilesRecursively -Items $MoreData -SiteUri $SiteUri -FolderPath $FolderPath -SiteFiles $SiteFiles -IsNextLink $true
        $NextLink = $MoreData."@odata.nextLink"
        $Uri = $MoreData."@odata.nextLink"
    } # End While

    $count = $SiteFiles.count - $before
    if (-Not $IsNextLink) {
        Write-Host " $FolderPath ($count)"
    }

    # Report the files in each sub-folder
    ForEach ($Folder in $Folders) {
        $NewFolderPath = $FolderPath + "/" + $Folder.Name
        $Uri = $SiteUri + "/" + $Folder.parentReference.path + "/" + $Folder.Name + ":/children"
        $SubFolderData = Invoke-MgGraphRequest -Uri $Uri -Method Get
        UnpackFilesRecursively -Items $SubFolderData -SiteUri $SiteUri -FolderPath $NewFolderPath -SiteFiles $SiteFiles -IsNextLink $IsNextLink
    } # End Foreach Folders
}
You can see that the script declares the set of file extensions that it will check for sensitivity labels. I’ve added the extensions for Word, PowerPoint, Excel, and PDF to the array. Over time, as Microsoft Information Protection supports additional file types, the extensions for those files can be added.
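The extension check can be tried in isolation. The following is a minimal sketch rather than code from the original script: Get-FileExtension is a hypothetical helper, and the sample file names are invented. It takes the text after the last dot, which matters for names that contain more than one dot.

```powershell
# Hypothetical helper (not part of the original script): return the lower-case
# extension of a file name without the leading dot, or an empty string if none.
function Get-FileExtension {
    param ([Parameter(Mandatory = $true)] [string]$FileName)
    # GetExtension returns ".docx" for "Q1.Report.docx" and "" for "README"
    return [System.IO.Path]::GetExtension($FileName).TrimStart('.').ToLowerInvariant()
}

# Same list of supported types as the script
[array]$ValidDocumentTypes = "docx", "pptx", "xlsx", "pdf"

# Sample names, invented for illustration
$SampleNames = "Budget.xlsx", "Q1.Report.DOCX", "README", "archive.tar.gz"
ForEach ($Name in $SampleNames) {
    $Type = Get-FileExtension -FileName $Name
    $Check = $Type -in $ValidDocumentTypes
    Write-Host ("{0,-16} extension: {1,-5} check label: {2}" -f $Name, $Type, $Check)
}
```

Only the first two sample names pass the check, so only those files would trigger a request to fetch label information.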
Interpreting Sensitivity Label Data
The information returned for a file looks like this:
Name                           Value
----                           -----
tenantId                       a662313f-14fc-43a2-9a7a-d2e27f4f3478
assignmentMethod               standard
sensitivityLabelId             1b070e6f-4b3c-4534-95c4-08335a5ca610
Not everyone speaks fluent GUID, so to resolve a sensitivity label identifier to a label name, we create a hash table of label identifiers and names that the script can look up. This code must be run at the start of the script:
Connect-IPPSSession
[Array]$LabelData = Get-Label | Select-Object ImmutableId, DisplayName
$Global:LabelsHash = @{}
ForEach ($L in $LabelData) {$LabelsHash.Add([string]$L.ImmutableId,[string]$L.DisplayName) }
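Because a document can carry more than one label, and a label from another tenant will not be in the lookup table, it may help to resolve each returned identifier separately. This is a minimal sketch using sample data, not code from the original script. Resolve-LabelNames is a hypothetical helper, and the display name and the second label entry are invented.

```powershell
# Sample lookup table; in the real script, Get-Label populates it.
$LabelsHash = @{
    "1b070e6f-4b3c-4534-95c4-08335a5ca610" = "Confidential"   # display name invented
}

# Hypothetical helper: turn the 'labels' collection returned by
# extractSensitivityLabels into a single display string.
function Resolve-LabelNames {
    param (
        [Parameter(Mandatory = $true)] [hashtable]$Lookup,
        [array]$Labels
    )
    [array]$Names = ForEach ($Label in $Labels) {
        $Id = [string]$Label.sensitivityLabelId
        If ($Lookup.ContainsKey($Id)) { $Lookup[$Id] } Else { "Unknown label ($Id)" }
    }
    return ($Names -join ", ")
}

# Two sample label entries: one known to the tenant, one invented foreign label
$SampleLabels = @(
    @{ tenantId = "a662313f-14fc-43a2-9a7a-d2e27f4f3478"; assignmentMethod = "standard"; sensitivityLabelId = "1b070e6f-4b3c-4534-95c4-08335a5ca610" },
    @{ tenantId = "00000000-0000-0000-0000-000000000000"; assignmentMethod = "standard"; sensitivityLabelId = "11111111-1111-1111-1111-111111111111" }
)

Resolve-LabelNames -Lookup $LabelsHash -Labels $SampleLabels
```

With the sample data, the known identifier resolves to its display name and the unknown one is reported with its GUID, so the report line never silently drops a label.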
Everything works very nicely (Figure 1) with the caveat that the script now makes an additional Graph request for every document with a supported file type. This will inevitably slow processing down. I’ve run the script against document libraries holding thousands of files, and the performance wasn’t unacceptable. It can take a second or so to fetch the label information for a very large document (for example, a 1,400-page 38 MB Word document), but smaller files don’t cause a large delay.
Document Mismatches
As noted above, there was an unexpected side effect. Because the API updates the SharePoint site with the latest label metadata, running the script against a large document library caused ten document mismatch notifications to arrive in quick succession. The explanation was simple: every sensitivity label has a priority. If you put a document with a high-priority label into a site assigned a lower-priority (container management) label, a mismatch occurs. In my case, the priority of the sensitivity labels assigned to the documents had changed since their original assignment, and the documents now had a higher-priority label than the site.
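The mismatch rule can be expressed as a simple comparison. The sketch below is illustrative only: Test-LabelMismatch is a hypothetical helper, the label names and priority numbers are invented, and it assumes that a higher number means a higher priority (the Priority property returned by Get-Label could populate such a table, which is worth verifying in your tenant).

```powershell
# Sample priorities, invented for illustration (higher number = higher priority)
$LabelPriority = @{
    "Public"       = 0
    "General"      = 1
    "Confidential" = 2
}

# Hypothetical helper: a mismatch exists when the document's label has a
# higher priority than the label assigned to the site (container).
function Test-LabelMismatch {
    param (
        [Parameter(Mandatory = $true)] [hashtable]$Priorities,
        [Parameter(Mandatory = $true)] [string]$SiteLabel,
        [string]$DocumentLabel
    )
    # Unlabeled documents, or labels missing from the table, are not flagged
    If (-not $DocumentLabel -or -not $Priorities.ContainsKey($DocumentLabel)) { return $false }
    If (-not $Priorities.ContainsKey($SiteLabel)) { return $false }
    return ($Priorities[$DocumentLabel] -gt $Priorities[$SiteLabel])
}

# A Confidential document in a General site is a mismatch
Test-LabelMismatch -Priorities $LabelPriority -SiteLabel "General" -DocumentLabel "Confidential"
# A Public document in a General site is not
Test-LabelMismatch -Priorities $LabelPriority -SiteLabel "General" -DocumentLabel "Public"
```

A check like this could be run over the report before calling the API to predict which files are likely to generate notifications.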
Data Governance Reports
One of the benefits of the Microsoft Syntex-SharePoint Advanced Management license is that administrators can access data governance reports through the SharePoint Online admin center (Figure 2).
The data governance reports are nothing special and are certainly not a good reason to buy the advanced management license. You can do a much better job yourself with PowerShell, including output to HTML (with the PSWriteHTML module) or Excel (with the ImportExcel module). If you’re interested in the Microsoft Syntex-SharePoint Advanced Management license, focus on features like blocking downloads for Teams Meeting recordings. You’ll be happier.
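As a rough illustration of a do-it-yourself report, the sketch below summarizes files by label and writes the results using built-in cmdlets only. The sample report lines are invented and merely mimic the shape of the objects the script adds to $SiteFiles. The commented lines show where Export-Excel (ImportExcel) and Out-HtmlView (PSWriteHTML) could be used if those modules are installed.

```powershell
# Sample report lines shaped like the objects the script adds to $SiteFiles
# (file names and labels invented for illustration)
$SiteFiles = [System.Collections.Generic.List[Object]]::new()
$SiteFiles.Add([PSCustomObject]@{ FileName = "Plan.docx";   Folder = "/General"; 'Sensitivity Label' = "Confidential" })
$SiteFiles.Add([PSCustomObject]@{ FileName = "Budget.xlsx"; Folder = "/Finance"; 'Sensitivity Label' = "Confidential" })
$SiteFiles.Add([PSCustomObject]@{ FileName = "Notes.txt";   Folder = "/General"; 'Sensitivity Label' = $Null })
$SiteFiles.Add([PSCustomObject]@{ FileName = "Deck.pptx";   Folder = "/General"; 'Sensitivity Label' = "General" })

# Count files per label; files without a label are grouped as 'No label'
[array]$Summary = $SiteFiles |
    Group-Object -Property { If ($_.'Sensitivity Label') { $_.'Sensitivity Label' } Else { 'No label' } } |
    Select-Object Name, Count |
    Sort-Object -Property Count -Descending

$Summary | Format-Table -AutoSize

# Built-in outputs that need no extra modules
$OutputFolder = [System.IO.Path]::GetTempPath()
$SiteFiles | Export-Csv -Path (Join-Path $OutputFolder "SiteFiles.csv") -NoTypeInformation
$Summary | ConvertTo-Html -Title "Files by sensitivity label" |
    Out-File -FilePath (Join-Path $OutputFolder "LabelSummary.html")

# With the modules installed, richer output is possible, for example:
# $SiteFiles | Export-Excel -Path (Join-Path $OutputFolder "SiteFiles.xlsx")   # ImportExcel
# $SiteFiles | Out-HtmlView -Title "Site files"                                # PSWriteHTML
```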
Next Stop, Assigning Sensitivity Labels with a Graph API
Along with an API to retrieve sensitivity label information for SharePoint Online documents, the assignSensitivityLabel Graph API is available to assign sensitivity labels to documents. The problem is that this API is metered and protected. Metered means that you pay to use it through an Azure subscription. Protected means that Microsoft must give its consent for an app to use the API. I’ve applied for permission and prepared Azure to accept charges. Once Microsoft gives the OK, I’ll report back on using the API to assign sensitivity labels.
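For orientation only, here is an untested sketch of what a request to assign a label might look like. Everything in it is an assumption to check against the current Graph documentation: the v1.0 endpoint path, the body property names (sensitivityLabelId, assignmentMethod, justificationText), and the placeholder identifiers. The call itself is commented out because it needs a consented app and a configured Azure subscription for metering.

```powershell
# Placeholders: replace with real identifiers before use
$DriveId = "<drive-id>"
$ItemId  = "<item-id>"
$LabelId = "1b070e6f-4b3c-4534-95c4-08335a5ca610"   # label GUID from the sample output above

# Endpoint path is an assumption based on the published Graph documentation
$Uri = "https://graph.microsoft.com/v1.0/drives/{0}/items/{1}/assignSensitivityLabel" -f $DriveId, $ItemId

# Body property names are assumptions based on the published Graph documentation
$Body = @{
    sensitivityLabelId = $LabelId
    assignmentMethod   = "standard"
    justificationText  = "Label assigned by script"
} | ConvertTo-Json

Write-Host "POST $Uri"
Write-Host $Body

# Uncomment only after the app has consent and metering is configured:
# Invoke-MgGraphRequest -Uri $Uri -Method POST -Body $Body -ContentType "application/json"
```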
Very informative post Tony!
Wondering if you know of a way to get the Sensitive Info Types for each labeled file? Any Graph API or Purview API call would work. I wanted to add that information on top of your script. Thanks!
I don’t think there is a Graph API yet to extract SITs from items.
Amazing post! Thank you so much for providing all these details.
I had an open case with Microsoft, because SharePoint was not displaying the correct sensitivity information for some PDF files. After using the script, the sensitivity information was updated correctly.
Hi Tony, great post, thank you.
I’m trying the assignSensitivityLabel API. I enabled the metered API but I’m still getting the following error:
Error: 422, {"error":{"code":"notSupported","message":"Sensitivity label cannot be assigned due to unsupported feature.","innerError":{"code":"unprocessableEntity"}}}
Did you have any luck testing this API?
I never went back to try the assignSensitivityLabel API. I grew tired of waiting for Microsoft to approve my app (they probably thought it wasn’t a serious app) and other things took precedence…
It appears that the assignSensitivityLabel API is now publicly available as a metered API.
https://learn.microsoft.com/en-us/graph/metered-api-setup?tabs=azurecloudshell
I’ve followed the guide and created the app registration. However, I haven’t quite figured out how to use the API to assign the sensitivity labels yet…
Let us know when you’ve figured things out… I might have another look at this the next time I get some time.
Hi Tony, great article! I’ve noticed that there is a SharePoint property on the ListItem named ‘_IpLabelId’ that seems to contain the Sensitivity Label Id.
Do you know anything about this, and how robust it may be?
Am I also right in thinking there can be multiple sensitivity labels applied to a file, for example from different tenants? My understanding is that the extractSensitivityLabels API returns an array of objects.
A document can have multiple sensitivity labels assigned by multiple tenants if those labels do not enforce encryption. Once a label enforces encryption, it becomes the only one assigned to the document.
SharePoint does populate properties with label GUIDs. See https://office365itpros.com/2020/06/30/search-sharepoint-for-sensitivity-labels/
Hi Tony,
I managed to get it all working! 🙂
https://github.com/soundguy5566/assignSenLabel/blob/main/label
Good job!
Hi Tony, I have a challenge here. Is there a way to create a report where I can see the Sensitivity label (internal, public, confidential, etc.) of the files saved in a folder?
Can you help me and provide guidance on this?
Did you look at the script? That’s exactly what it does…