Data protection is probably the central area of concern in system security. The proliferation of online systems creates tension between data privacy and usability. The key to a usable but safe data environment is knowing what level of protection needs to be applied to different data, that is, how to classify data. In the past, classification has been a predominantly manual and subjective exercise. As data volumes have grown exponentially, automated data classification systems have become necessary. This course looks at the data classification technologies available through the Microsoft 365 compliance portal.
Learning Objectives
- Get an overview of document protection and data classification
- Learn how to create a sensitive information type
- Learn how to implement Exact Data Matching
- Learn about trainable classifiers
- See how to view classified data with Content Explorer
Intended Audience
- Students working towards the MS-101 Microsoft 365 Mobility and Security exam
- Those wanting to learn about data classification and how it's implemented in the Microsoft 365 compliance environment
Prerequisites
- Have taken the Introduction to Sensitivity Labels and A Practical Guide to Sensitivity Labels courses, or have an excellent grasp of sensitivity labels and how they are created and managed within the Microsoft Purview portal
Exact data match sensitive info type, or EDM SIT, extends and builds on the sensitive info type, enabling you to protect specific data instances. You can use EDM to protect confidential information relating to specific clients, employees, patients, or the like. Exact data match functionality is available under an E5 subscription and can identify or protect up to 100 million unique values.
You start by defining the shape or format of the unique data in the form of a schema. This can be done by example, that is, by uploading a file of sample data, or by manually configuring the schema. A sample file must have a column header row, and the column names must contain only alphanumeric characters, so no spaces or underscores. When you upload the file, any data that matches a sensitive info type will be identified. Alternatively, you can define a data format manually by selecting a sensitive info type. The sensitive info types can be either predefined or ones you've created yourself.
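The header rule for a sample file can be checked before uploading. The following is a minimal sketch, not part of any Microsoft tooling: the function name `invalid_headers` and the sample column names are hypothetical, and it assumes the sample file is comma-separated with the header on the first row.

```python
import csv
import io


def invalid_headers(csv_text: str) -> list[str]:
    """Return header names containing anything other than ASCII letters and digits."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # first row is assumed to be the column header row
    return [name for name in header if not (name.isascii() and name.isalnum())]


# Hypothetical sample data: two of the three column names break the rule.
sample = "ClientId,First Name,Account_Number\n1001,Ada,123456789\n"
print(invalid_headers(sample))
```

Any names the function returns would need renaming (for example, `FirstName` instead of `First Name`) before the file is used to define the schema.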
I'll quickly go through the process of adding one column called client id that is formatted as a US bank account. Each EDM sensitive info type needs at least one primary column, which should contain unique values. I'll go with the default column settings and leave my detection confidence level as high. Click next and submit.
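Because the primary column should contain unique values, it can be worth checking the source data for duplicates before defining the schema. This is a rough sketch under assumptions: the function name `duplicate_values`, the column name `ClientId`, and the sample rows are all hypothetical, and the data is assumed to be a comma-separated file with a header row.

```python
import csv
import io
from collections import Counter


def duplicate_values(csv_text: str, column: str) -> list[str]:
    """Return the values that appear more than once in the given column, sorted."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical data: client id 1001 appears twice.
sample = "ClientId,Name\n1001,Ada\n1002,Bob\n1001,Cy\n"
print(duplicate_values(sample, "ClientId"))
```

An empty result suggests the column is a reasonable candidate for the primary column.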
Once the EDM data schema has been defined and saved, you need to upload instances of the data you want to protect. Sensitive data is uploaded using the EdmUploadAgent command line tool. Most customers will use the commercial version, but there are government and Department of Defense versions. The EdmUploadAgent tool hashes the sensitive data as it's uploaded with a built-in salt value, or you can provide your own salt value. Sensitive data, such as patient IDs, is seldom static, so you can use the EdmUploadAgent to upload and refresh the data store up to twice a day.
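To illustrate the general idea of salted hashing, here is a small conceptual sketch. It is not the EdmUploadAgent's implementation: the use of SHA-256, the way the salt is combined with the value, and the function name `salted_hash` are all assumptions made for illustration.

```python
import hashlib
import secrets


def salted_hash(value: str, salt: bytes) -> str:
    """Hash a sensitive value together with a salt (SHA-256 is assumed here)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# A randomly generated salt stands in for a built-in or user-supplied one.
salt = secrets.token_bytes(16)
digest = salted_hash("123456789", salt)

# The same value and salt always give the same digest, so hashed content can
# be compared against hashed records without exposing the original value.
print(digest == salted_hash("123456789", salt))
```

The point of the salt is that the same sensitive value hashed with a different salt yields a different digest, which makes precomputed lookup tables far less useful to an attacker.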
If you created your EDM schema through the wizard as we have, you must use that schema file with the EdmUploadAgent tool.
Hallam is a software architect with over 20 years of experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft with its deep and broad ecosystem. While Hallam has designed and crafted custom software utilizing web, mobile and desktop technologies, good quality reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how to leverage new technology to change and improve their business processes is one of the key drivers keeping Hallam coming back to the keyboard.