Document Fingerprinting


Custom Sensitive Information Types in Microsoft 365


In this course, we look at how to create custom sensitive information types with tools like Exact Data Match classification, keyword dictionaries, and document fingerprinting.

Learning Objectives

  • Learn how to create and manage custom sensitive information types
  • Learn how to use the Exact Data Match classifier
  • Learn how to implement Document fingerprinting
  • Learn how to create and utilize a keyword dictionary

Intended Audience

  • This course is designed for anyone looking to keep their data safe within Microsoft 365 using sensitive information types



Document Fingerprinting is a feature that uses document fingerprints to apply DLP policies. Just as each person has unique fingerprints, each document has unique word patterns. Effectively, document fingerprinting is a way to identify these unique word patterns within documents. Because of how it functions, it works best with documents that share similar word patterns, which is why it is generally used with templates that an organization uses frequently. Think about it like this: an invoice typically has a place to enter the name and address of a customer, or maybe even a credit card number. While that information will be unique and every individual invoice will be different, words like name, address, and credit card will be exactly the same in every single invoice.

So, while other sensitive information types would identify a credit card number, document fingerprinting would identify the invoice template, because every invoice holds a similar fingerprint. This sensitive information type differs from others: while features like Exact Data Match search for specific sensitive information, document fingerprinting doesn't differentiate between a template that has been filled in and one that hasn't. This way, if an organization uses a template, such as an invoice that includes a credit card number, an address, and more, every document that follows that template will be flagged by document fingerprinting, regardless of whether or not it holds information. Now, the way document fingerprinting works is that it takes a blank template, like an invoice, and converts it to a Unicode XML file containing a hash value that represents the original text of the document.
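
To make the idea concrete, here is a minimal Python sketch of template matching. It is a toy illustration only and not how Microsoft 365 computes or compares fingerprints: the function names, the word-set comparison, and the use of SHA-256 are all assumptions made for the example. It shows that a blank invoice and a filled-in invoice both match the same template, while an unrelated document does not.

```python
import hashlib
import re


def template_words(text):
    """Return the set of lowercased words found in a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def make_fingerprint(template_text):
    """Build a toy fingerprint from a blank template (hypothetical format).

    The hash acts as a compact identifier for the template's wording.
    The word set is what this sketch actually compares against documents.
    """
    words = sorted(template_words(template_text))
    digest = hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()
    return {"words": frozenset(words), "hash": digest}


def matches_template(document_text, fingerprint):
    """True when every word of the template also appears in the document."""
    return fingerprint["words"] <= template_words(document_text)


blank_invoice = "Invoice\nName:\nAddress:\nCredit card number:\n"
filled_invoice = (
    "Invoice\n"
    "Name: Dana Smith\n"
    "Address: 1 Main St\n"
    "Credit card number: 0000 0000 0000 0000\n"
)
unrelated_memo = "Team lunch is moved to Friday."

fingerprint = make_fingerprint(blank_invoice)

for label, text in [
    ("blank invoice", blank_invoice),
    ("filled invoice", filled_invoice),
    ("unrelated memo", unrelated_memo),
]:
    print(label, "->", matches_template(text, fingerprint))
```

The sketch requires every template word to be present, so a document that is missing part of the original form's text would not match.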

This hash value is what document fingerprinting references. The fingerprint can be assigned to a DLP policy, and once it is associated with the policy, it can start protecting your data. As of March 2023, it is only available within Exchange Online and has a few limitations. Document fingerprinting cannot detect password-protected files, files that contain only images, documents that don't contain all the text from the original form used to create the fingerprint, or files greater than 10 megabytes. In addition, fingerprints can only be stored within a separate rule pack of up to 150 kilobytes, which roughly equates to about 50 fingerprints per tenant. In order to create a document fingerprint, organizations need to use PowerShell and the New-DlpFingerprint and New-DlpSensitiveInformationType cmdlets. For a step-by-step guide on how to implement document fingerprinting, I have linked to documentation down below in the course materials section for you to review.
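
The limits mentioned above can be written down as a small Python pre-check. This is only a sketch under stated assumptions: the function and constant names are hypothetical, the flags for password protection and image-only content are supplied by the caller rather than detected, and treating a megabyte as 1024 * 1024 bytes is my assumption. It is not part of any Microsoft tooling.

```python
# Limits as stated in the transcript (as of March 2023).
MAX_FILE_BYTES = 10 * 1024 * 1024   # 10 MB file size limit (binary MB assumed)
RULE_PACK_BYTES = 150 * 1024        # 150 KB rule pack for fingerprints
APPROX_FINGERPRINTS = 50            # rough number of fingerprints per tenant


def detection_blockers(size_bytes, password_protected=False, image_only=False):
    """List the reasons a file would not be detected by fingerprinting.

    An empty list means none of the limits covered here apply.
    """
    reasons = []
    if password_protected:
        reasons.append("password-protected")
    if image_only:
        reasons.append("contains only images")
    if size_bytes > MAX_FILE_BYTES:
        reasons.append("larger than 10 MB")
    return reasons


# Average space available per fingerprint if the rule pack holds about 50.
avg_bytes_per_fingerprint = RULE_PACK_BYTES / APPROX_FINGERPRINTS

print(detection_blockers(size_bytes=2_000_000))
print(detection_blockers(size_bytes=11 * 1024 * 1024, password_protected=True))
print(avg_bytes_per_fingerprint)
```

Dividing the rule pack size by the approximate fingerprint count suggests each fingerprint takes roughly 3 KB, which is why templates are a better fit than large one-off documents.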

About the Author

Lee has spent most of his professional career learning as much as he could about PC hardware and software while working as a PC technician with Microsoft. Once COVID hit, he moved into a customer training role with the goal of getting as many people as possible prepared for remote work using Microsoft 365. Being both Microsoft 365 certified and a self-proclaimed Microsoft Teams expert, Lee continues to expand his knowledge by working through the wide range of Microsoft certifications.