Exact Data Match


Custom Sensitive Information Types in Microsoft 365
1m 16s


In this course, we look at how to create custom sensitive information types with tools like Exact Data Match classification, keyword dictionaries, and document fingerprinting.

Learning Objectives

  • Learn how to create and manage custom sensitive information types
  • Learn how to use the Exact Data Match classifier
  • Learn how to implement document fingerprinting
  • Learn how to create and utilize a keyword dictionary

Intended Audience

  • This course is designed for anyone looking to keep their data safe within Microsoft 365 using sensitive information types



Exact Data Match (EDM) is a sensitive information type that allows organizations to match exact values within their own data rather than general patterns. It can be incredibly helpful when you are experiencing a high number of false positives, and it is made to be dynamic, allowing for daily changes if needed. But how do we create one? Well, technically, there are two different ways we could do this: Microsoft has a classic experience that runs through Windows PowerShell, as well as a new experience, which is what we'll be focusing on today. This new experience takes us through a mixture of Microsoft Purview and Windows PowerShell, so let's go ahead and jump into it. With this new experience, setting up an exact data match sensitive information type is a five-step process.

Step 1 is to export source data for the exact data match based sensitive information type. Step 2 is to create a sample file. Step 3 is to create the EDM sensitive information type. Step 4 is to hash and upload the sensitive information source table for the exact data match sensitive information type. And Step 5 is to then test that exact data match sensitive information type. So, starting off with Step 1, we need to export the source data. Organizations can create and define their own source data for an exact data match sensitive information type (SIT). However, it's important that you refer to the rules Microsoft provides when doing so. Luckily for me and many organizations, Microsoft provides sample file templates that we can use. Specifically, we have healthcare documents, financial documents, and insurance documents. So, for this demo, we're going to be using the healthcare data. For specifics about the rules for creating your own dataset, and for downloading the sample datasets, I have linked Microsoft's documentation down below in the course material section.
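Before moving on, it can help to sanity-check an exported source table. The following is only a minimal sketch of that idea in Python: the column names and sample rows are hypothetical, and the checks shown here (a header row, no duplicate column names, a consistent field count per row) are assumptions for illustration, not Microsoft's actual rule set, which is in the linked documentation.

```python
import csv
import io


def check_source_table(csv_text, primary_column):
    """Return a list of problems found in a CSV source table (empty if none)."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    if len(set(header)) != len(header):
        problems.append("duplicate column names in header")
    if primary_column not in header:
        problems.append(f"missing column: {primary_column}")
    # Every data row should have exactly as many fields as the header.
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_number}: expected {len(header)} fields, got {len(row)}"
            )
    return problems


# Hypothetical sample data; the real healthcare template has its own columns.
SAMPLE = """PatientID,FirstName,LastName,SSN
1001,Ana,Lopez,123-45-6789
1002,Ben,Cho,987-65-4321
"""

print(check_source_table(SAMPLE, "SSN"))  # []
```

An empty list means none of these assumed checks failed; it does not guarantee the file meets every requirement in Microsoft's documentation.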

But with that healthcare data downloaded, I can move on to Step 2, which is to create the sample file. Since this is purely for demo purposes, we're just going to use the same file we used in Step 1 and quickly move into Step 3, which is to create the EDM sensitive information type. This is where we move into Microsoft Purview and go back to the data classification solution we used in the last lecture. From here, we click on 'Classifiers' and go to EDM Classifiers. Like I mentioned earlier, we are going to be using the new experience for this demo, and as you can see, we have that toggled on. If your page looks different from this, check that this toggle is set properly. Also, for a quick overview of creating an EDM SIT in the new experience, you can click on 'Learn the End-to-End Workflow', and it breaks everything down into a step-by-step process for you to reference.

So, if you ever get lost here, you can always refer back to this and the documentation below to help find your way back. But once we are ready, we can click on 'Create EDM Classifier' and start. Like everything, we need a name and description, so I'll just go ahead and put patient records here with a short description and then click 'Next'. This is where we define the schema for the exact data match SIT. We could manually define our structure, like I mentioned, or we could use our sample data. Microsoft gives you an easy download link here, but we already have our file, so I'll go ahead and select 'Upload a File' and click 'Next'. Now we just upload our sample data, give it a few seconds to analyze the file, and just like that, we can see how it broke down all the data.

Like every sensitive information type, we need a primary element, and you can see that I only have the option for the social security number. The reason for this is that exact data match requires easily identifiable patterns, and social security numbers all follow the exact same format and are rare in comparison to something like a birthday, so it's the perfect candidate for the primary element. So, we'll go ahead and choose social security number and hit 'Next'. Now here, we can choose the settings for all of our data at once or set them individually by column. Since it's a demo, I'll simply leave each column with the settings it currently has, but I could enable or disable case sensitivity and ignore punctuation if I like. But I think these settings look good, so I'll just go ahead and hit 'Next'. Here we can see the elements that are used to detect this sensitive information type with a medium and high confidence level. As you can see, we have all the same data that we also saw in the primary element section. That's because any column that is not used as a primary element within the dataset will be used as a supporting element. Medium confidence uses a single supporting element while high requires two supporting elements, making these both pretty strong determinations of a positive match. I can also adjust the proximity of supporting elements, meaning how close to the primary element they have to appear, but I think 300 is a good place to start. If we decide that we don't need to change anything here, we can hit 'Next' and review everything. Here, we simply review the entire sensitive information type, and if everything looks good, simply hit 'Submit', and we'll have created our EDM classifier, and that completes Step 3.
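To make the primary element, supporting elements, and proximity idea more concrete, here is a rough conceptual sketch in Python. It is not how Microsoft Purview actually implements EDM (the real service compares hashed values), and the function name, the source table, the SSN pattern, the case-insensitive comparison, and the simple substring check are all assumptions made for illustration. It does mirror the settings from the walkthrough: one supporting element near the primary element gives medium confidence, two or more give high, and "near" means within 300 characters.

```python
import re

# Assumed primary-element pattern: a US social security number as ddd-dd-dddd.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def classify(text, source_rows, proximity=300):
    """Return (ssn, confidence) pairs for SSNs in text that exist in source_rows."""
    lowered = text.lower()
    results = []
    for match in SSN_PATTERN.finditer(text):
        row = source_rows.get(match.group())
        if row is None:
            continue  # pattern matched, but the exact value is not in the source table
        # Only look for supporting elements within `proximity` characters of the match.
        window = lowered[max(0, match.start() - proximity): match.end() + proximity]
        hits = sum(1 for value in row.values() if value.lower() in window)
        if hits >= 2:
            results.append((match.group(), "high"))
        elif hits == 1:
            results.append((match.group(), "medium"))
    return results


# Hypothetical source table keyed by the primary element (SSN).
SOURCE = {
    "123-45-6789": {"first_name": "Ana", "last_name": "Lopez", "dob": "1980-02-14"},
}

print(classify("Patient Ana Lopez, SSN 123-45-6789", SOURCE))  # [('123-45-6789', 'high')]
print(classify("Lopez called about 123-45-6789", SOURCE))      # [('123-45-6789', 'medium')]
print(classify("Unknown number 111-22-3333", SOURCE))          # []
```

The last call shows why EDM helps with false positives: the text looks like a social security number, but because the exact value is not in the source table, nothing is reported.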

Now, we move on to Step 4, which is admittedly the most complicated step thus far, as it requires us to hash and upload the sensitive information source table. Step 4 actually has a few sub-steps of its own. First, we set up a custom security group and user account. Second, we set up the EDM Upload Agent tool. And third, we use the EDM Upload Agent tool to hash the sensitive information source table and then upload it. So, let's start by creating that security group. All we need to do is navigate to our Microsoft 365 Admin Center and head to Teams and Groups, then click on 'Active Teams and Groups'. From here, we need to add the user who will be using the EDM Upload tool to a security group. So, we click 'Create a New Group' and just go through this process. Choose security group, name the group EDM_DataUploaders, give it a description of what it's for, and then follow the prompts to create your group. Once the group is created, we need to add an owner and members to it. So, we go to the security groups, choose our EDM_DataUploaders group, and then go to Members. This is where we can add members and owners simply by typing their names out, and once that's completed, we can back out of here, and that's all for this admin center.

Next, we need to download the EDM Upload Agent and connect it to a Microsoft account with the proper permissions, which is the account we just added to the group. And just a reminder that links for all of this can be found in the course materials section below. Once the EDM Upload Agent has been downloaded, we need to connect it to our account. So, we open up our command prompt as an admin in Windows, direct it to our EDM tool directory with this command, and authorize the tool using this command. Once entered, you'll be asked to log into the account that was added to the EDM group we just created. Once you're logged in, we're ready to upload the data. If you haven't already downloaded your schema from the sensitive information type we created, you can do that using this command. You can see if I navigate to my folder that it downloaded my patient record schema just now. Once we verify this worked, we can now hash and upload our data. We can do that with this command, and finally, once it's all completed, we can check the status of the upload with this command. Once the upload is complete, we can go back to Microsoft Purview, right back into our EDM classifiers, and see that our patient records EDM classifier has been created and is complete.
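The commands shown on screen are not captured in this transcript. As a rough sketch only, the Python snippet below assembles command lines using the switches that Microsoft's documentation describes for the EDM Upload Agent (/Authorize, /SaveSchema, /UploadData, and /GetSession); they may not be exactly what was typed in the demo, so check them against the linked documentation before use. The data store name and every path are hypothetical placeholders, and the script only prints the commands rather than running them.

```python
AGENT = "EdmUploadAgent.exe"


def build_commands(data_store, data_file, schema_file, hash_dir, output_dir):
    """Return the agent command lines, in order, as argument lists (nothing is run)."""
    return [
        # Sign in with the account that belongs to the EDM_DataUploaders group.
        [AGENT, "/Authorize"],
        # Download the schema that was created with the EDM classifier.
        [AGENT, "/SaveSchema", "/DataStoreName", data_store, "/OutputDir", output_dir],
        # Hash the source table and upload the hashed data.
        [AGENT, "/UploadData", "/DataStoreName", data_store,
         "/DataFile", data_file, "/HashLocation", hash_dir, "/Schema", schema_file],
        # Check the status of the upload session.
        [AGENT, "/GetSession", "/DataStoreName", data_store],
    ]


# Hypothetical values: replace with your own data store name and paths.
commands = build_commands(
    data_store="patientrecords",
    data_file=r"C:\EDM\Data\patients.csv",
    schema_file=r"C:\EDM\Data\patientrecords.xml",
    hash_dir=r"C:\EDM\Hash",
    output_dir=r"C:\EDM\Data",
)

for args in commands:
    print(" ".join(args))
```

Each printed line would be run from the EDM tool directory in an elevated command prompt, in the same order as the walkthrough: authorize, save the schema, hash and upload, then check the session status.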


About the Author

Lee has spent most of his professional career learning as much as he could about PC hardware and software while working as a PC technician with Microsoft. Once COVID hit, he moved into a customer training role with the goal of getting as many people as possible prepared for remote work using Microsoft 365. Being both Microsoft 365 certified and a self-proclaimed Microsoft Teams expert, Lee continues to expand his knowledge by working through the wide range of Microsoft certifications.