Speed Up Processing of Large Numbers of Microsoft 365 Objects

Not being a professional PowerShell guy like Michel de Rooij, I hack merrily away at PowerShell to get stuff done without being too concerned about the finer points of code. Once I learn how to do something, I tend to keep on using that technique, which is why many of the scripts that I write have similarities. I suspect that I’m not the only one whose PowerShell journey has been a succession of learning experiences without the benefit of formal training. In any case, what I do works, and I enjoy grappling with PowerShell very much.

Which brings me neatly to an excellent article about optimizing Microsoft Graph PowerShell scripts by Nicola Suter. This is an article that anyone working with PowerShell in a large organization where it’s common to work with tens of thousands of objects like user accounts or mailboxes should read. What attracted my attention is the discussion about batching requests, or as Microsoft refers to the topic: Combine multiple HTTP requests using JSON batching, a title possibly not designed to attract the attention of people looking for a good read.

Batching HTTP Requests

In any case, the idea is that you can combine up to twenty individual Graph requests into a single JSON object. The object is passed using an HTTP POST request to the batch endpoint, which processes the individual requests and returns their responses together.
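To make the twenty-request limit concrete, here is a minimal sketch of how a list of request URLs might be split into batch bodies. The function name New-GraphBatchBody and the sample URLs are my own inventions for illustration, and the request identifiers are written as strings to match the examples in Microsoft's batching documentation.

```powershell
# Hypothetical helper (not part of any module): split request URLs into
# JSON batch bodies holding at most 20 requests each.
function New-GraphBatchBody {
    param(
        [Parameter(Mandatory)] [string[]] $Urls,
        [ValidateRange(1, 20)] [int] $BatchSize = 20   # Graph accepts up to 20 requests per batch
    )
    for ($start = 0; $start -lt $Urls.Count; $start += $BatchSize) {
        $end = [Math]::Min($start + $BatchSize, $Urls.Count) - 1
        $id = 0
        $requests = foreach ($url in $Urls[$start..$end]) {
            # Each request needs an id that is unique within its own batch
            @{ id = "$id"; method = 'GET'; url = $url }
            $id++
        }
        # Emit one JSON string per batch
        @{ requests = @($requests) } | ConvertTo-Json -Depth 4
    }
}

# Made-up identifiers: 45 URLs produce batches of 20, 20, and 5 requests
$urls = 1..45 | ForEach-Object { 'users/user{0}?$select=id,displayName' -f $_ }
$bodies = @(New-GraphBatchBody -Urls $urls)
```

Each string in $bodies is ready to be used as the body of a POST to the batch endpoint.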

To speed things up even further, after preparing the batches, you can leverage the parallel processing capability of PowerShell 7 to submit the batches (Michel de Rooij discusses parallel processing in this article). Each batch uses a PowerShell runspace containing variables, modules, and functions. By default, PowerShell 7 creates five runspaces for parallel execution, so the work in the batches is divided over five runspaces to speed up execution. Obviously, this isn’t something that you would do for scripts that already run acceptably quickly, but it could make a real difference in other circumstances.

A batch looks like this:

Name                           Value
----                           -----
ContentType                    application/json
Uri                            https://graph.microsoft.com/v1.0/$batch
Body                           {…
Method                         Post

The requests that the batch asks the Graph to process are contained in a single JSON object in the body (hence the name “JSON batching”). Each request has an identifier, a method (in this case, GET because we want to retrieve some information), and the URL passed to the Graph.

{
  "requests": [
    {
      "Id": 0,
      "Method": "GET",
      "Url": "users/0a6b8952-baca-4019-bdaf-450536c6ece6?$select=id,displayname,assignedLicenses,country,city,jobtitle,officelocation,userprincipalname,businessphones,employeeid,employeehiredate"
    },
    {
      "Id": 1,
      "Method": "GET",
      "Url": "users/28f205c1-95cd-4d64-b998-e5324b4c032f?$select=id,displayname,assignedLicenses,country,city,jobtitle,officelocation,userprincipalname,businessphones,employeeid,employeehiredate"
    },

The URLs in a batch are relative to the Graph endpoint. If you use a tool like the Graph Explorer to run the same request, you need to construct a full URL. For example, https://graph.microsoft.com/v1.0/users/0a6b8952-baca-4019-bdaf-450536c6ece6?$select=id,displayname,assignedLicenses,country,city,jobtitle,officelocation,userprincipalname,businessphones,employeeid,employeehiredate (Figure 1).

Figure 1: Running a query with the Graph Explorer
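As a rough sketch of how the batch listing shown earlier might be assembled, the snippet below builds a request hash table with the same four properties. The variable names are my own, the body holds a single request for brevity, and the call to Invoke-MgGraphRequest is commented out because it assumes a session already connected with Connect-MgGraph.

```powershell
# One request with a relative URL, as used inside a batch
$batchBody = @{
    requests = @(
        @{
            id     = '0'
            method = 'GET'
            url    = 'users/0a6b8952-baca-4019-bdaf-450536c6ece6?$select=id,displayName'
        }
    )
} | ConvertTo-Json -Depth 4

# The same four properties that appear in the batch listing
$batchRequest = @{
    Method      = 'Post'
    Uri         = 'https://graph.microsoft.com/v1.0/$batch'
    ContentType = 'application/json'
    Body        = $batchBody
}

# Assumes a connected Graph SDK session, so it is left commented out:
# $response = Invoke-MgGraphRequest @batchRequest
# $response.responses | ForEach-Object { '{0}: {1}' -f $_.id, $_.status }
```

Single quotes matter here: in a double-quoted string, PowerShell would try to expand $batch and $select as variables.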

Upping Parallel Execution

PowerShell supports parallel execution across more than five runspaces (the ThrottleLimit parameter of ForEach-Object -Parallel controls the number). However, this isn’t something to plunge into unless you’re sure that the workstation running PowerShell has sufficient resources to cope with the demand created by parallel processing. More information about parallel PowerShell capabilities is available in this Microsoft blog.

Because the runspaces process their batches concurrently, their results must be collected in a thread-safe manner. You can’t use an array or normal list for this purpose, but a ConcurrentBag collection works and can accept results from the threads as they run. After all the data is collected, it’s possible to convert the ConcurrentBag into a normal array, hash table, or whatever form is needed by a script.
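The following sketch shows the collection pattern in isolation. It needs PowerShell 7 for ForEach-Object -Parallel, and it doesn't call the Graph at all: the batches and user identifiers are made-up stand-ins, and the place where a real script would post a batch is marked with a comment.

```powershell
# Thread-safe collection shared by all runspaces
$bag = [System.Collections.Concurrent.ConcurrentBag[object]]::new()

# Stand-in data: three "batches" of made-up ids instead of real batch bodies
$batches = @(
    @{ Number = 1; Ids = @('user1', 'user2') }
    @{ Number = 2; Ids = @('user3', 'user4') }
    @{ Number = 3; Ids = @('user5') }
)

# ThrottleLimit 5 is the default; it is spelled out to show where to change it
$batches | ForEach-Object -ThrottleLimit 5 -Parallel {
    $results = $using:bag   # reference to the bag created in the calling scope
    # A real script would post the batch here and loop through the responses.
    # This sketch fabricates one result object per id instead.
    foreach ($id in $_.Ids) {
        $results.Add([pscustomobject]@{ Id = $id; Batch = $_.Number })
    }
}

# Convert the bag to a hash table keyed by id for quick lookups
$userLookup = @{}
foreach ($item in $bag) { $userLookup[$item.Id] = $item }
```

The order of items in a ConcurrentBag isn't predictable, which is one reason to move the results into a keyed hash table before using them.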

Testing Parallel Batching

Nicola Suter does a good job of explaining how to put together a script to test the effectiveness of parallel batching. To acquaint myself with how things work, I decided to update the test script to do some real work and demonstrate the value of the technique. I therefore created a script (available from GitHub) to do the following:

  • Find all user mailboxes.
  • Create batches to fetch user account information from the Graph users API based on the ExternalDirectoryObjectId mailbox property (the link between a mailbox and the owning Entra ID account). Each request fetches the licenses assigned to the account and properties such as the city, country, employee identifier, and hire date.
  • Submit the batches.
  • Retrieve the responses and combine them in the ConcurrentBag.
  • Convert the ConcurrentBag to a hash table for quick keyed access to the user data.
  • Create a report combining the mailbox and user data, including the translation of the SKU identifiers for assigned licenses to product names. To do this, I use the technique explained in the article about creating a user licensing report to create a hash table from product information downloaded from Microsoft.
  • Output a CSV file and display the report through the Out-GridView cmdlet (Figure 2).
Figure 2: A mixture of mailbox and account data created using PowerShell parallel batches
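The last steps in the list (keyed lookup, license name translation, and report creation) can be sketched as follows. This is not the script from GitHub: the mailbox objects, user data, and SKU table are made-up sample values chosen only to show how the pieces join together.

```powershell
# Made-up sample data standing in for mailbox output and Graph responses
$mailboxes = @(
    [pscustomobject]@{ DisplayName = 'Sample User A'; ExternalDirectoryObjectId = 'id-a' }
    [pscustomobject]@{ DisplayName = 'Sample User B'; ExternalDirectoryObjectId = 'id-b' }
)
$userLookup = @{
    'id-a' = [pscustomobject]@{ city = 'Dublin'; assignedLicenses = @([pscustomobject]@{ skuId = 'sku-1' }) }
    'id-b' = [pscustomobject]@{ city = 'Cork'; assignedLicenses = @() }
}
# Hypothetical SKU-to-product-name table
$skuNames = @{ 'sku-1' = 'Example Product One' }

$report = foreach ($mbx in $mailboxes) {
    # Keyed access to the account data that belongs to this mailbox
    $user = $userLookup[$mbx.ExternalDirectoryObjectId]
    $licenses = foreach ($license in $user.assignedLicenses) {
        # Fall back to the raw SKU identifier when no product name is known
        $name = $skuNames[$license.skuId]
        if ($name) { $name } else { $license.skuId }
    }
    [pscustomobject]@{
        Mailbox  = $mbx.DisplayName
        City     = $user.city
        Licenses = ($licenses -join ', ')
    }
}

# $report | Export-Csv -Path .\MailboxReport.csv -NoTypeInformation
# $report | Out-GridView
```

The hash table lookup is what keeps this step fast: each mailbox finds its account data directly instead of searching through an array.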

All in all, everything worked well, and I was pleased at the speed of data retrieval.

Just a Test

I don’t pretend that this example is anything other than a test to demonstrate how to use one kind of Microsoft 365 data to drive the retrieval of other information using parallel batch execution. However, learning through doing is a great way to become acquainted with new techniques and I enjoyed experimenting with batches. Will I use parallel processing in the future? Well, that all depends on the situation and whether I remember!

About the Author

Tony Redmond

Tony Redmond has written thousands of articles about Microsoft technology since 1996. He is the lead author for the Office 365 for IT Pros eBook, the only book covering Office 365 that is updated monthly to keep pace with change in the cloud. Apart from contributing to Practical365.com, Tony also writes at Office365itpros.com to support the development of the eBook. He has been a Microsoft MVP since 2004.
