
Data protection is arguably the central concern of system security. The proliferation of online systems creates tension between data privacy and usability. The key to a usable but safe data environment is knowing what level of protection to apply to different data, that is, knowing how to classify it. In the past, classification was a predominantly manual and subjective exercise. As data volumes have grown exponentially, automated data classification systems have become necessary. This course looks at the data classification technologies available through the Microsoft 365 compliance portal.

Learning Objectives

  • Get an overview of document protection and data classification
  • Learn how to create a sensitive information type
  • Learn how to implement Exact Data Matching
  • Learn about trainable classifiers
  • See how to view classified data with Content Explorer

Intended Audience

  • Students working towards the MS-101 Microsoft 365 Mobility and Security exam
  • Those wanting to learn about data classification and how it's implemented in the Microsoft 365 compliance environment



Data classification in Microsoft 365 can be summarized as classifying documents and files according to their contents. The key features of Microsoft's data classification are embedding labels in documents and automatically assessing arbitrarily complex content. This functionality falls under the Microsoft Purview umbrella and can be accessed through the compliance portal.

Microsoft Purview uses sensitive info types and trainable classifiers to enable document labeling at scale. Sensitive info types are an extensive list of predefined information types that primarily use content formatting or keywords to identify the type of content. You can also create your own sensitive info type using keywords, functions, or regular expressions.
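To make the idea of a custom sensitive info type more concrete, here is a minimal Python sketch of how a pattern, a function, and supporting keywords can combine into a confidence level. It is not Purview's detection engine or rule format: the card-number regex, the keyword list, the 300-character proximity window, and the `classify` and `luhn_valid` helpers are all assumptions made for illustration. The Luhn checksum plays the role of the "function" that rejects random digit strings.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: 16 digits, optionally grouped in fours by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

# Hypothetical supporting keywords, matched case-insensitively.
KEYWORDS = ("credit card", "card number", "visa", "mastercard")

# Assumed proximity window: characters searched on either side of a match.
PROXIMITY = 300


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk digits from the right, doubling every second one.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class Match:
    value: str
    confidence: str  # "high" when a keyword is nearby, otherwise "medium"


def classify(text: str) -> list[Match]:
    """Find checksum-valid card numbers and assign each a confidence level."""
    matches = []
    lowered = text.lower()
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        # The checksum "function" filters out 16-digit strings that are not card numbers.
        if not luhn_valid(digits):
            continue
        start = max(0, m.start() - PROXIMITY)
        window = lowered[start:m.end() + PROXIMITY]
        # A supporting keyword inside the window raises confidence.
        has_keyword = any(k in window for k in KEYWORDS)
        matches.append(Match(m.group(), "high" if has_keyword else "medium"))
    return matches
```

For example, `classify("Card number: 4111 1111 1111 1111")` reports one high-confidence match, while `classify("Ref 4111-1111-1111-1111")` reports the same number at medium confidence because no supporting keyword appears nearby.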

Trainable classifiers represent the next step in content assessment complexity, using machine learning to assess content within OneDrive, Exchange, and Teams channels. As with sensitive info types, there are predefined classifiers in the form of machine learning models. These models can be adapted and retrained using custom positive and negative sample data stored in SharePoint folders. Because a trainable classifier is a machine learning model, it is not a trivial piece of technology, and it comes with some caveats. You will need an E5 subscription to access this feature, and Microsoft estimates that the predefined trainable classifiers take between 7 and 14 days to crawl your files. Training a new classifier is a time- and resource-intensive exercise. You will need at least 50 samples of positive seed data for the initial model learning phase, then at least 200 positive, negative, and ambiguous samples (the more, the better) for the training phase.
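The sketch below illustrates what learning from positive and negative samples means, using a two-class naive Bayes text classifier in Python. Microsoft does not document its trainable classifiers as naive Bayes; this is a deliberately simple stand-in technique, and the `NaiveBayesClassifier` and `check_sample_counts` names are hypothetical. The thresholds in `check_sample_counts` mirror the sample counts quoted above, and ambiguous samples are left out to keep the model two-class.

```python
import math
import re
from collections import Counter

# Sample-count thresholds quoted in this course.
MIN_SEED_POSITIVES = 50
MIN_TRAINING_SAMPLES = 200


def check_sample_counts(seed_positives: int, training_samples: int) -> list[str]:
    """Return a list of problems with the supplied sample counts (empty if OK)."""
    problems = []
    if seed_positives < MIN_SEED_POSITIVES:
        problems.append(f"need at least {MIN_SEED_POSITIVES} positive seed samples")
    if training_samples < MIN_TRAINING_SAMPLES:
        problems.append(f"need at least {MIN_TRAINING_SAMPLES} training samples")
    return problems


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, positives: list[str], negatives: list[str]) -> "NaiveBayesClassifier":
        # Word counts per class: True = positive samples, False = negative samples.
        self.counts = {True: Counter(), False: Counter()}
        for label, docs in ((True, positives), (False, negatives)):
            for doc in docs:
                self.counts[label].update(tokenize(doc))
        total_docs = len(positives) + len(negatives)
        self.log_prior = {
            True: math.log(len(positives) / total_docs),
            False: math.log(len(negatives) / total_docs),
        }
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_likelihood(self, tokens: list[str], label: bool) -> float:
        counts = self.counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        # Words never seen in training are ignored rather than penalized.
        return sum(math.log((counts[t] + 1) / denom) for t in tokens if t in self.vocab)

    def predict(self, text: str) -> bool:
        """Return True if the text scores higher for the positive class."""
        tokens = tokenize(text)
        scores = {
            label: self.log_prior[label] + self._log_likelihood(tokens, label)
            for label in (True, False)
        }
        return scores[True] > scores[False]
```

Trained on a few invoice-like positives and unrelated negatives, this model leans toward the positive class for text that shares invoice vocabulary. A toy model like this can appear to work with a handful of samples, but a production classifier needs far more, which is what the count check guards against.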

Content explorer allows you to browse your content by assessed info type. Using it requires two roles: Content explorer list viewer, which lets you see items and their locations, and Content explorer content viewer, which lets you view the contents of those items.
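As a rough model of how Content explorer separates these two roles, the Python sketch below counts items per info type for anyone holding the list viewer role and reveals item contents only to holders of the content viewer role. The `Item` type, the role strings, and both functions are hypothetical; they model the permission split described above, not any Purview API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical role names modeled on the two Content explorer roles.
LIST_VIEWER = "Content explorer list viewer"
CONTENT_VIEWER = "Content explorer content viewer"


@dataclass
class Item:
    location: str                 # e.g. a SharePoint or OneDrive path
    info_types: tuple[str, ...]   # sensitive info types detected in the item
    body: str                     # the item's content


def summarize(items: list[Item], roles: set[str]) -> Counter:
    """Count items per info type; requires the list viewer role."""
    if LIST_VIEWER not in roles:
        raise PermissionError("list viewer role required")
    return Counter(t for item in items for t in item.info_types)


def items_of_type(
    items: list[Item], info_type: str, roles: set[str]
) -> list[tuple[str, Optional[str]]]:
    """List locations of matching items; include contents only for content viewers."""
    if LIST_VIEWER not in roles:
        raise PermissionError("list viewer role required")
    show_body = CONTENT_VIEWER in roles
    return [
        (item.location, item.body if show_body else None)
        for item in items
        if info_type in item.info_types
    ]
```

A user with only the list viewer role sees where matching items live, with contents hidden as `None`; adding the content viewer role reveals the contents as well.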

My name is Hallam Webber, and we've been looking at Data Classification with Microsoft 365.

About the Author

Hallam is a software architect with over 20 years of experience across a wide range of industries. He began his software career as a Delphi/Interbase disciple but changed his allegiance to Microsoft for its deep and broad ecosystem. While Hallam has designed and crafted custom software using web, mobile, and desktop technologies, he has found that good-quality, reliable data is the key to a successful solution. The challenge of quickly turning data into useful information for digestion by humans and machines has led Hallam to specialize in database design and process automation. Showing customers how to leverage new technology to change and improve their business processes is one of the key drivers that keeps Hallam coming back to the keyboard.