Targeted Collections Set Precise Scope for Content Searches

A reader asked if it was possible to use a content search to export the information from the Recoverable Items structure in user mailboxes. Recoverable Items holds folders like Purges, Versions, SubstrateHolds, DiscoveryHolds, and Deletions used to allow users to recover deleted items (Deletions) and retain copies of items required for retention purposes. The folder also includes the Audits folder holding Exchange mailbox audit items and Calendar Logging, used by the calendar assistant.

The request was unusual but justified because the organization was in the midst of a tenant-to-tenant migration caused by the sale of a business unit. The organization needed to transfer all its retention data to the target tenant. Apparently, the chosen migration tool didn’t process Recoverable Items.

Before discussing a solution, it’s worth underlining that Recoverable Items only capture copies of deleted or altered items. If items are unchanged, they remain in place in other mailbox folders. Thus, most mailbox data required for retention moves in regular folders.

Export Mailboxes to PSTs

Returning to the solution, it’s possible to export mailbox contents to PSTs using content searches. Because the primary intention for search exports is to accommodate investigations, exports include Recoverable Items. You could therefore export the mailboxes and discard everything except Recoverable Items by opening each PST with Outlook and deleting the unwanted folders. This approach works, but it’s mind-bendingly boring and tiresome.

Targeted Collections

A better approach is to use a technique called targeted collections. This is a method to set a precise scope for a content search against mailboxes or SharePoint Online and OneDrive for Business sites. In other words, you can define the exact mailbox folders or document library folders to search.

It would be nice to set a search scope by specifying simple folder names, but because we’re dealing with computers, some complexity arises. In this case, a content search requires that the search query contains pointers to the folders in the targeted collection. For mailboxes, the links are folder identifiers; for SharePoint Online and OneDrive for Business, they are document links (path to the folder).

All of the above is explained in Microsoft’s documentation for targeted collections and doesn’t need to be covered in more detail here. The documentation includes a PowerShell script that demonstrates how to extract folder identifiers and document links. In the remainder of this article, I explain how I adapted the Microsoft script to retrieve the contents of the Recoverable Items folders.

Creating a Targeted Collection of Mailbox Folders

In this scenario, we only need to deal with mailbox folders. The basic plan is:

  • Find details of the folders used for retention in the Recoverable Items structure and retrieve the identifiers for those folders. We’ll end up with something like this:
FolderPath                                 FolderQuery
----------                                 -----------
/Deletions                                 folderid:5EF42BB02DCD9F4CAED6E3A2F5480A7D000000DA52150000
/DiscoveryHolds                            folderid:239368965F66C840854E766CEE824F0900014434EA550000
/Purges                                    folderid:5EF42BB02DCD9F4CAED6E3A2F5480A7D000000DA521F0000
/SubstrateHolds                            folderid:37B5390C4C3298448EB307D556E7D40D0002FF60F2BC0000
/Versions                                  folderid:5EF42BB02DCD9F4CAED6E3A2F5480A7D000000DA521A0000
  • Find details of the equivalent folders in the Recoverable Items structure in the archive mailbox (if available) and retrieve the identifiers for those folders.
  • Create a Keyword Query Language (KQL) query containing the folder identifiers retrieved by the script. The KQL query includes folders from both the primary and archive mailbox.
  • Create and start a compliance search with the KQL query.
  • Report the results.

Figure 1 shows the result of the first two steps.

Figure 1: Finding folder identifiers to use with a targeted collection for a content search

After asking what mailbox to search and checking that the input is a valid mailbox, the script uses the Get-ExoMailboxFolderStatistics cmdlet to fetch details of the Recoverable Items folders.

Write-Host ("Checking primary mailbox for {0}" -f $UserAccount)
[array]$Folders = Get-ExoMailboxFolderStatistics -Identity $UserAccount -FolderScope RecoverableItems
If (!($Folders)) { Write-Host ("Unable to retrieve mailbox folder statistics for {0} - exiting" -f $UserAccount) ; break }
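
The second step of the plan covers the archive mailbox. A minimal sketch of how both lookups could be wrapped together, assuming the -Archive switch of Get-ExoMailboxFolderStatistics is used for the archive. The function name Get-RecoverableItemsFolderSet and its -IncludeArchive switch are hypothetical and do not come from the downloadable script:

```powershell
function Get-RecoverableItemsFolderSet {
    param(
        [Parameter(Mandatory)][string]$UserAccount,
        [switch]$IncludeArchive   # hypothetical switch; set it when the mailbox has an archive
    )
    # Recoverable Items folders in the primary mailbox
    [array]$Folders = Get-ExoMailboxFolderStatistics -Identity $UserAccount -FolderScope RecoverableItems
    if ($IncludeArchive) {
        # -Archive points the same cmdlet at the archive mailbox
        [array]$ArchiveFolders = Get-ExoMailboxFolderStatistics -Identity $UserAccount -FolderScope RecoverableItems -Archive
        if ($ArchiveFolders) { $Folders += $ArchiveFolders }
    }
    return $Folders
}
```

The combined set can then pass through the same identifier conversion because the folder paths (/Deletions, /Purges, and so on) are the same in both mailboxes.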

The script then loops through the set of returned folders to find the ones that we’re interested in and convert the folder identifier reported by the cmdlet to the form required by KQL. For instance, the Versions folder has an identifier of:

LgAAAADQcplWzm6tQLvNaj1faOqZAQAXnZ+HYqYBRKOW99BpIQz7AAAAAAEXAAAB

The value needed by the KQL query is:

folderid:179D9F8762A60144A396F7D069210CFB0000000001170000

I reused a modified form of the Microsoft code to generate the folder identifiers:

$Encoding = [System.Text.Encoding]::GetEncoding("us-ascii")
$Nibbler = $Encoding.GetBytes("0123456789ABCDEF")
# Retention folders of interest in Recoverable Items
$TargetFolders = "/Versions", "/Deletions", "/Purges", "/DiscoveryHolds", "/SubstrateHolds"
# Initialize once before the loop so that += appends to an array
$FolderQueries = @()
ForEach ($Folder in $Folders) {
  $FolderPath = $Folder.FolderPath
  If ($FolderPath -in $TargetFolders) {
    $FolderId = $Folder.FolderId
    $FolderIdBytes = [Convert]::FromBase64String($FolderId)
    $IndexIdBytes = New-Object byte[] 48
    $IndexIdIdx = 0
    # Skip the first 23 bytes, then write the next 24 bytes as 48 uppercase hex characters
    $FolderIdBytes | Select-Object -Skip 23 -First 24 | ForEach-Object {
      $IndexIdBytes[$IndexIdIdx++] = $Nibbler[$_ -shr 4]
      $IndexIdBytes[$IndexIdIdx++] = $Nibbler[$_ -band 0xF]
    }
    $FolderQuery = "folderid:$($Encoding.GetString($IndexIdBytes))"
    $FolderDetails = New-Object PSObject
    Add-Member -InputObject $FolderDetails -MemberType NoteProperty -Name FolderPath -Value $FolderPath
    Add-Member -InputObject $FolderDetails -MemberType NoteProperty -Name FolderQuery -Value $FolderQuery
    $FolderQueries += $FolderDetails
  } # End if
} # End Foreach
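
To check the conversion in isolation, here is a minimal sketch that applies the same rule (skip 23 bytes, take the next 24, and render them as uppercase hex) to the Versions folder identifier quoted above. The function name ConvertTo-FolderQuery is hypothetical, and the sketch formats each byte with ToString('X2') instead of the nibble lookup table:

```powershell
function ConvertTo-FolderQuery {
    param([Parameter(Mandatory)][string]$FolderId)
    # Decode the Base64 folder identifier reported by the cmdlet
    $Bytes = [Convert]::FromBase64String($FolderId)
    # Bytes 23 through 46 (24 bytes) form the value used in the search query
    $Hex = -join ($Bytes[23..46] | ForEach-Object { $_.ToString('X2') })
    "folderid:$Hex"
}

ConvertTo-FolderQuery -FolderId 'LgAAAADQcplWzm6tQLvNaj1faOqZAQAXnZ+HYqYBRKOW99BpIQz7AAAAAAEXAAAB'
# folderid:179D9F8762A60144A396F7D069210CFB0000000001170000
```

The output matches the value given earlier for the Versions folder, which is a quick way to confirm that a modified script still generates identifiers in the expected form.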

Creating a Content Search for a Targeted Collection

Before the script can work with content searches, it must connect to the compliance endpoint by running the Connect-IPPSSession cmdlet. To create a content search, we need to know the content (KQL) query and the search locations. The script computes the query, and we can add the target mailbox as the search location. Two points to remember are:

  • When a content search processes a mailbox, it always processes both the primary and archive mailbox.
  • The content search can process the KQL query against all Exchange mailboxes and will find the correct folders. However, it’s always best to be as specific as possible when composing search criteria, which is why I always specify exactly which mailboxes I want the search to process.
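
The excerpts in this article don’t show how the query string is assembled. A minimal sketch, assuming the folder identifiers are joined with the KQL OR operator so that an item in any of the listed folders matches. The sample values are the ones from the listing earlier in the article, and the variable names are illustrative and may differ from the downloadable script:

```powershell
# Sample output of the folder identifier step (values from the listing above)
$FolderQueries = @(
    [PSCustomObject]@{ FolderPath = '/Deletions';      FolderQuery = 'folderid:5EF42BB02DCD9F4CAED6E3A2F5480A7D000000DA52150000' }
    [PSCustomObject]@{ FolderPath = '/DiscoveryHolds'; FolderQuery = 'folderid:239368965F66C840854E766CEE824F0900014434EA550000' }
    [PSCustomObject]@{ FolderPath = '/Purges';         FolderQuery = 'folderid:5EF42BB02DCD9F4CAED6E3A2F5480A7D000000DA521F0000' }
    [PSCustomObject]@{ FolderPath = '/SubstrateHolds'; FolderQuery = 'folderid:37B5390C4C3298448EB307D556E7D40D0002FF60F2BC0000' }
    [PSCustomObject]@{ FolderPath = '/Versions';       FolderQuery = 'folderid:5EF42BB02DCD9F4CAED6E3A2F5480A7D000000DA521A0000' }
)

# Drop any duplicate clauses, then join them with OR to form a single query
$KQLQuery = ($FolderQueries.FolderQuery | Select-Object -Unique) -join ' OR '
$KQLQuery
```

Folder identifiers from the archive mailbox can be added to the same collection before the join, so one query covers both mailboxes.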

The script uses these commands to remove any previous content search, create a content search to process the query, and start the search:

$SearchName = "Focused Mailbox Search"
Remove-ComplianceSearch -Identity $SearchName -Confirm:$False -ErrorAction SilentlyContinue
New-ComplianceSearch -Name $SearchName -ContentMatchQuery $KQLQuery -Description ("Focused folder search for mailbox {0}" -f $UserAccount) -ExchangeLocation $UserAccount
Write-Host "Starting search"
Start-ComplianceSearch -Identity $SearchName
Do {
    Write-Host ("Waiting for search {0} to complete..." -f $SearchName)
    Start-Sleep -Seconds 5
    $ComplianceSearch = Get-ComplianceSearch -Identity $SearchName
} While ($ComplianceSearch.Status -ne 'Completed')

Write-Host ("Search found {0} items in mailbox {1}" -f $ComplianceSearch.Items, $UserAccount)

Figure 2 shows the result of the content search as viewed through the Microsoft Purview compliance portal.

Figure 2: Results of a content search using a targeted collection query

Remember, content searches always start with an estimate search conducted against the search indexes. To retrieve the actual data, you must export the search results. This action forces Purview to conduct a full-fledged search against the target locations and copy found items to an Azure location from where they can be downloaded and exported to a PST.

In addition to content searches, targeted collections work with the searches used by Microsoft Purview eDiscovery (standard and premium) cases.

Not a Problem, After All

As it turned out, the migration product could export and import Recoverable Items data. An overlooked option kicked off the exercise of figuring out how to export Recoverable Items folders from a tenant. This underlines the need to read the documentation and do comprehensive testing of software used in exercises like tenant-to-tenant migrations. But at least it gave me a good opportunity to do a deep-dive into using targeted collections with content searches. You can download the script I used from GitHub.

About the Author

Tony Redmond

Tony Redmond has written thousands of articles about Microsoft technology since 1996. He is the lead author for the Office 365 for IT Pros eBook, the only book covering Office 365 that is updated monthly to keep pace with change in the cloud. Apart from contributing to Practical365.com, Tony also writes at Office365itpros.com to support the development of the eBook. He has been a Microsoft MVP since 2004.
