New Interface and New Capabilities Make It Easier to Manage Sensitive Information Types

Sensitive information types are used by Microsoft 365 components like DLP policies and auto-label (retention) policies to locate data in messages and documents. Earlier in January, Microsoft released a set of new sensitive information types to make it easier to detect country-specific sensitive data like identity cards and driving licenses. Message center notification MC233488 (updated January 23) brings news of two additional changes rolled out in late January and early February.

Confidence Levels for Matching

First, the confidence levels used to describe match accuracy are changing from percentages (65%, 75%, 85%) to low, medium, and high. Existing policies pick up the new descriptors automatically. This is Microsoft 365 roadmap item 68915.

The change is intended to make it easier for people to understand the accuracy of matches when using sensitive information types. DLP detects matches in messages and documents by looking for patterns and other forms of evidence. The primary element (like a regular expression) defined in the pattern for a sensitive information type defines how basic matching is performed for that type. Secondary elements (like keyword lists) add evidence to increase the likelihood that a match exists. The more evidence DLP gathers, the higher the match accuracy, and the confidence level increases from low to medium to high. Policy rules use the confidence level to decide what action is necessary when matches are found. For instance, a rule might block documents and email when high confidence exists that sensitive data is present, while just warning (with a policy tip) for lower levels of confidence.
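To make the relationship between evidence, confidence, and actions concrete, here is a minimal Python sketch. It is a toy model, not how the DLP engine works internally: the function names and the assumption that each piece of supporting evidence raises confidence by one step are mine, chosen only to illustrate the idea.

```python
from typing import Optional

# Ordered from weakest to strongest, matching the new descriptors.
LEVELS = ["low", "medium", "high"]


def confidence_level(primary_match: bool, supporting_hits: int) -> Optional[str]:
    """Return a hypothetical confidence level for a candidate match.

    Assumption: no primary match means no match at all, and each piece of
    supporting evidence (for example, a keyword hit) raises confidence one step.
    """
    if not primary_match:
        return None
    return LEVELS[min(supporting_hits, len(LEVELS) - 1)]


def choose_action(level: Optional[str]) -> str:
    """Mimic a rule that blocks on high confidence and warns otherwise."""
    if level == "high":
        return "block"
    if level in ("low", "medium"):
        return "policy tip"
    return "none"


print(choose_action(confidence_level(True, 2)))   # strong evidence
print(choose_action(confidence_level(True, 0)))   # primary match only
print(choose_action(confidence_level(False, 3)))  # keywords but no primary match
```

The point of the sketch is that supporting evidence never creates a match by itself; it only strengthens one that the primary element has already found.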

Copying Sensitive Information Types

The second change is that you can copy sensitive information types, including the set of sensitive information types distributed by Microsoft. For instance, let’s assume that you don’t think that the standard definition for a credit card number works as well as it could. You can go to the Data Classification section of the Compliance Center, select the credit card number sensitive information type, and copy it to create a new type (Figure 1). The copy becomes a custom sensitive information type under your control to tweak as you see fit.

Figure 1: Copying the sensitive information type for a credit card number

Improved User Interface

The third change is the most important. Figure 1 is an example of a new user interface to manage sensitive information types (Microsoft 365 roadmap item 68916). The new interface is crisper and exposes more information about how information types work. For instance, in Figure 1, we can see that the primary element for credit card number detection is a function to detect a valid credit card number. Further evidence (supporting elements) comes from the presence of keywords like a credit card name (for example, Visa or MasterCard) and an expiration date near a detected number.

Few are likely to have the desire to tweak the standard sensitive information types. However, being able to examine how Microsoft puts these objects together is instructive and helpful when the time comes to create custom sensitive information types to meet business requirements.

Detecting Azure AD Passwords

For instance, MVP James Cussen points out that Azure AD passwords are not included in the list of sensitive information types. While some people need to send passwords around in email and Teams messages, it’s not the most secure way of transmitting credentials. In this post, he uses the old interface to define a sensitive information type to detect passwords. To test the new interface, I used his definition as the basis for a new custom sensitive information type.

The primary element to match passwords is a regular expression:

((?=[\S]*?[A-Z])(?=[\S]*?[a-z])(?=[\S]*?\d)|(?=[\S]*?[A-Z])(?=[\S]*?[a-z])(?=[\S]*?[^a-zA-Z0-9])|(?=[\S]*?[A-Z])(?=[\S]*?\d)(?=[\S]*?[^a-zA-Z0-9])|(?=[\S]*?[a-z])(?=[\S]*?\d)(?=[\S]*?[^a-zA-Z0-9]))[^\s]{8,256}

Many suggested expressions to detect passwords can be found on the internet. Most are rejected when entered into a sensitive information type because they break Microsoft’s rules for detecting illegal or inefficient expressions. Not being a regex expert, I tried several (all worked when tested against https://regex101.com/), and all failed except the one created by James.
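If you want to see what the expression matches before entering it in the Compliance Center, a quick local check is one option. The Python sketch below compiles the expression unchanged and runs it over sample text. Python's re module is not the engine Microsoft uses, so success here says nothing about whether the service will accept an expression; the helper name find_candidates is my own.

```python
import re

# James's expression, split across lines for readability only.
PASSWORD_PATTERN = re.compile(
    r"((?=[\S]*?[A-Z])(?=[\S]*?[a-z])(?=[\S]*?\d)"
    r"|(?=[\S]*?[A-Z])(?=[\S]*?[a-z])(?=[\S]*?[^a-zA-Z0-9])"
    r"|(?=[\S]*?[A-Z])(?=[\S]*?\d)(?=[\S]*?[^a-zA-Z0-9])"
    r"|(?=[\S]*?[a-z])(?=[\S]*?\d)(?=[\S]*?[^a-zA-Z0-9]))"
    r"[^\s]{8,256}"
)


def find_candidates(text: str) -> list:
    """Return every substring the expression treats as a possible password."""
    # finditer + group(0) returns the full match; findall would return the
    # (empty) lookahead group instead.
    return [m.group(0) for m in PASSWORD_PATTERN.finditer(text)]


print(find_candidates("Here's your new password: 515AbcSven!"))
print(find_candidates("Use this pwd to get into the site ExpertsAreUs33@"))
print(find_candidates("no secrets here"))
```

The expression looks for a run of 8 to 256 non-whitespace characters that combines at least three of four character classes (uppercase, lowercase, digit, other), which is why the two sample passwords match and plain lowercase text does not.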

A keyword list is a useful secondary element to add evidence that a password is found. The list contains a comma-separated set of common words that you might expect to find close to a password. For instance:

“Here’s your new password: 515AbcSven!”

“Use this pwd to get into the site ExpertsAreUs33@”

In multilingual tenants, the ideal situation is to include relevant words in different languages in the keyword list. For instance, if the tenant has Dutch and Swedish users, you could include wachtwoord (Dutch) and lösenord (Swedish). To accommodate the reality that people don’t always spell words correctly, consider adding some misspelt variations of keywords. In this instance, we could add keywords like passwrod or pword.

James’s definition allows keywords to be in a window of 300 characters anchored on the detected password (see this Microsoft article to understand how the window works). I think this window is too large and might result in many false positives. The keyword is likely to be close to the password, so I reduced the window to 80 characters.
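The effect of the window size can be sketched in a few lines of Python. This is a simplification under stated assumptions: I assume the window extends the given number of characters on either side of the match and includes the match itself, and that keywords are found by case-insensitive substring search. The Microsoft article describes the real behavior. The keyword list and function names are illustrative.

```python
import re

PASSWORD_PATTERN = re.compile(
    r"((?=[\S]*?[A-Z])(?=[\S]*?[a-z])(?=[\S]*?\d)"
    r"|(?=[\S]*?[A-Z])(?=[\S]*?[a-z])(?=[\S]*?[^a-zA-Z0-9])"
    r"|(?=[\S]*?[A-Z])(?=[\S]*?\d)(?=[\S]*?[^a-zA-Z0-9])"
    r"|(?=[\S]*?[a-z])(?=[\S]*?\d)(?=[\S]*?[^a-zA-Z0-9]))"
    r"[^\s]{8,256}"
)

# Illustrative keyword list, including translations and misspellings.
KEYWORDS = ["password", "pwd", "wachtwoord", "lösenord", "passwrod", "pword"]


def keyword_in_window(text: str, start: int, end: int, window: int) -> bool:
    """Check whether any keyword occurs within `window` characters of the match."""
    lo = max(0, start - window)
    hi = min(len(text), end + window)
    nearby = text[lo:hi].lower()
    return any(keyword in nearby for keyword in KEYWORDS)


def detect(text: str, window: int = 80) -> list:
    """Return candidate passwords that have a keyword inside the window."""
    return [
        m.group(0)
        for m in PASSWORD_PATTERN.finditer(text)
        if keyword_in_window(text, m.start(), m.end(), window)
    ]


# A keyword roughly 200 characters before the candidate password.
distant = "password" + " x" * 100 + " 515AbcSven!"
print(detect(distant, window=300))  # keyword is inside the larger window
print(detect(distant, window=80))   # keyword is outside the smaller window
```

With the 300-character window, a keyword far from the candidate still counts as evidence; with 80 characters, it does not, which is the trade-off behind the smaller setting.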

Figure 2 shows the result after inputting the regular expression, keyword list, confidence level (medium), and character proximity. It’s a less complex definition than those for Microsoft’s sensitive information types. The big question is whether it works.

Figure 2: Definition for the Azure Active Directory password custom sensitive information type

Testing

The Test option allows you to upload a file containing sample text to run against the definition to see if it works. As you can see in Figure 3, the test was successful.

Figure 3: Testing a custom sensitive information type

Using the Custom Sensitive Information Type in a Teams DLP policy

Testing gives some confidence that the custom sensitive information type will work properly when deployed in a DLP policy. After quickly creating a DLP policy for Teams, we can confirm its effectiveness (Figure 4) in chats and channel conversations.

Figure 4: Passwords blocked in a Teams chat

I deliberately chose Teams as the DLP target because organizations probably don’t want their users swapping passwords in chats and channel conversations. Before rushing to extend the DLP policy to cover email, consider the fact that it’s common to send passwords in email. For instance, I changed the policy to cover email and Teams and discovered that the policy blocks any invitation to Zoom meetings because these invitations include the word “pwd” as in:

https://us02web.zoom.us/j/9355319659?pwd=dExxYVl1N1diS0RiVG1nYmFEWlRjQT09

Although it might be an attractive idea to block Zoom to force people to use Teams online meetings instead, it’s not a realistic option. The simple solution is not to use this DLP policy for email.
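A short Python check suggests why the Zoom link trips the definition. Run locally with Python's re module (which may not behave exactly like the service's engine), the whole URL is one unbroken run of non-whitespace characters containing uppercase letters, lowercase letters, and digits, so the expression matches it, and the keyword sits inside the match itself.

```python
import re

PASSWORD_PATTERN = re.compile(
    r"((?=[\S]*?[A-Z])(?=[\S]*?[a-z])(?=[\S]*?\d)"
    r"|(?=[\S]*?[A-Z])(?=[\S]*?[a-z])(?=[\S]*?[^a-zA-Z0-9])"
    r"|(?=[\S]*?[A-Z])(?=[\S]*?\d)(?=[\S]*?[^a-zA-Z0-9])"
    r"|(?=[\S]*?[a-z])(?=[\S]*?\d)(?=[\S]*?[^a-zA-Z0-9]))"
    r"[^\s]{8,256}"
)

ZOOM_URL = "https://us02web.zoom.us/j/9355319659?pwd=dExxYVl1N1diS0RiVG1nYmFEWlRjQT09"

match = PASSWORD_PATTERN.search(ZOOM_URL)
matched_text = match.group(0) if match else ""

print(matched_text == ZOOM_URL)       # the entire URL is the candidate
print("pwd" in matched_text.lower())  # and the keyword is inside it
```

If the service treats the text the same way, shrinking the proximity window cannot help here, because the keyword and the candidate password overlap. That is consistent with the conclusion that the policy is a poor fit for email.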

False Positives and Policy Overrides

The downside of matching text in messages against keywords defined in a policy is that some false positives can happen. For instance, I have a Flow to import tweets about Office 365 into a team channel. As Figure 5 shows, some tweets are picked up as potential password violations because a keyword appears near a string which could be a valid password.

Figure 5: Tweets posted in Teams are blocked because they match the password definition

Adjusting the definition for the sensitive information type to reduce the character proximity count (from 80 to 60) reduced the number of false positives. Testing and observation will tell how effective changes like this are when exposed to real-life data.
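One possible contributor to these false positives shows up when the expression is run locally in Python (the service's engine may differ). The character class [^a-zA-Z0-9] also matches whitespace, so a capitalized word of eight or more characters that is followed by a space satisfies the "uppercase, lowercase, other" alternative. The sample text below is made up for illustration and is not one of the tweets in Figure 5.

```python
import re

PASSWORD_PATTERN = re.compile(
    r"((?=[\S]*?[A-Z])(?=[\S]*?[a-z])(?=[\S]*?\d)"
    r"|(?=[\S]*?[A-Z])(?=[\S]*?[a-z])(?=[\S]*?[^a-zA-Z0-9])"
    r"|(?=[\S]*?[A-Z])(?=[\S]*?\d)(?=[\S]*?[^a-zA-Z0-9])"
    r"|(?=[\S]*?[a-z])(?=[\S]*?\d)(?=[\S]*?[^a-zA-Z0-9]))"
    r"[^\s]{8,256}"
)


def find_candidates(text: str) -> list:
    """Return every substring the expression treats as a possible password."""
    return [m.group(0) for m in PASSWORD_PATTERN.finditer(text)]


# Hypothetical tweet text: ordinary words, no real password.
tweet = "Microsoft announces new password guidance for Office365 tenants today"
print(find_candidates(tweet))

# The same word at the very end of the text has no trailing whitespace
# to count as the third character class.
print(find_candidates("by Microsoft"))
```

In this sample, both flagged words sit within a few dozen characters of the keyword "password", so a proximity window of 60 or 80 characters would still count the keyword as supporting evidence. Reducing the window lowers the odds of such coincidences but cannot remove them.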

Apart from adjusting character proximity, two other potential solutions exist. First, amend the DLP policy to allow users to override the block and send the message, with a justification reported to the administrator. If the message is important, users will override the policy. The administrator is notified when overrides occur and can tweak the policy (if possible) to avoid further occurrences.

The second option is to exclude from the DLP policy any accounts (individually or as members of a distribution list) that have a business need to send passwords. DLP will then ignore messages sent by these individuals.

Creating Custom Sensitive Information Types a Nice to Have

Given the broad range of standard types created by Microsoft, the need to define custom sensitive information types isn’t likely to be a priority for most tenants. However, for those who need this feature for business reasons, the recent changes are both welcome and useful.

About the Author

Tony Redmond

Tony Redmond has written thousands of articles about Microsoft technology since 1996. He is the lead author for the Office 365 for IT Pros eBook, the only book covering Office 365 that is updated monthly to keep pace with change in the cloud. Apart from contributing to Practical365.com, Tony also writes at Office365itpros.com to support the development of the eBook. He has been a Microsoft MVP since 2004.

Comments

  1. Juan Jose de Leon

    The test button only shows for Global Admin?
    I’ve tried delegating many roles and the button doesn’t appear, Azure Roles and Compliance roles.

  2. Roberto Noguera

    There is a sensitive information type called “Credit Card Number”. You don’t need to create a list.

  3. Gareth

    Hi Tony,

    Does Microsoft provide a way to test whether a regex expression meets their standard before adding it to a new sensitive info type, or is the acceptability of the expression validated whilst creating the sensitive info type?

    Thanks

  4. Saravana

    Hi Tony,

    Will creating a sensitive information type prevent employees from sharing documents with people outside the company?

    If yes, how does it prevent employees from sharing the documents?

    1. Tony Redmond

      A sensitive information type is just a way for Microsoft 365 to identify documents and messages containing instances of sensitive information. It won’t do anything to stop people sharing that information unless you use the SIT in a DLP policy.

  5. Jim E

    Hi Tony. Another great read.

    For some reason I have a mental block around confidence levels and instance counts in a policy.

    I have three sensitive information types created and each one is using a library. One is company name, one is account number, and one is bank account number. For this first phase I’m not looking at exact data match.

    I’m confused about the optimal settings to use for confidence level and instance counts for each of the 3 SITs.

    Can you clarify, in “stupid person 101” terms, how these two options work together? As I understand it, the confidence level is how sure DLP is that there is a match. The instance count is “OK, I found a match, based on the number of matches do this”. Do I have that right?

    If that’s the case, am I correct that I want a higher confidence level for SITs that are highly sensitive and lower confidence levels for ones that are less sensitive?

    I have read all the articles but for some reason I can’t seem to grasp the simple concept.

    1. Tony Redmond

      I typically start with medium confidence level and adjust as necessary after testing. It’s hard to be precise until you see how a sensitive information type works in practice. For instance, the Azure AD password SIT can generate a great number of false positives if it is tuned too high. I think you’re on the right track to have high confidence in the SITs that you can define with a high degree of accuracy and a lower confidence in the ones that are more fudgeable (like the password).

  6. Thomas Guldberg

    Hi Tony
    Do you know of a way to update a “Keyword List” with powershell? I cannot find any commands and we need to update our lists through the GUI, which is not optimal as we need to automate the process.

  7. Pierre HARDY

    Hi,

    Thank you, nice article about DLP new features.
    How do you manage the detection of a list of sensitive information?

    For example, in an Excel spreadsheet, there is a column named “Credit Card” and underneath it a list of 100 credit card numbers.

    Credit Card
    xxxx-xxxx-xxxx-xxxx
    xxxx-xxxx-xxxx-xxxx
    xxxx-xxxx-xxxx-xxxx
    xxxx-xxxx-xxxx-xxxx
    ..
    ..
    ..
    xxxx-xxxx-xxxx-xxxx

    Configuring keyword evidence with a proximity of 300 will match only the first credit card numbers (those within 300 characters of the keyword “Credit Card”).

    So, how can I detect the other credit card numbers?

    Regards

    1. Tony Redmond

      I’m not sure what you want to do. The presence of a single credit card number with supporting evidence is enough to trigger a DLP violation. The presence of other credit card numbers adds evidence and might trigger another DLP rule (depending on its configuration). You’ll just have to test to get DLP to do what you want.
