In this course, users will explore the suite of tools available in Microsoft Purview for registering and scanning data sources, connecting a business glossary, searching the data catalog, and customizing metadata with enrichments and classifications. In addition, this course will review some of the management and administrative functionality in Purview, including creating roles, managing authorizations, and using the Apache Atlas API for custom implementations. This course will also review deployment best practices and network security considerations. By completing this course, users will have a strong understanding of the suite of functionality currently available in Purview and how these tools support a larger governance initiative within an organization.
Learning Objectives
- Provision and install Microsoft Purview
- Create and manage a role
- Register and scan data sources
- Create a business glossary
- Enrich metadata with classifications
- Review data lineage tooling
- Understand deployment best practices
- Take network security considerations into account
Intended Audience
This course is designed for individuals who are responsible for setting up, monitoring, or exploring data catalog and governance programs within their organization.
Prerequisites
To get the most from this course, you should have some familiarity and experience with governance tooling as well as a basic understanding of the Azure portal.
Let's take a look at Microsoft Purview. Microsoft Purview is a unified data governance service that helps our organization manage and govern on-premises, multi-cloud, and software-as-a-service data. Using the platform, we can create a holistic, up-to-date map of the data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage. With Purview, we can enable data curators to manage and secure the data estate, as well as empower data consumers to find valuable, trustworthy data. Microsoft Purview automates data discovery by providing data scanning and classification as a service for assets across our data estate.
Metadata and descriptions of discovered data assets are integrated into a holistic data map. Atop this map, there are purpose-built apps that create environments for data discovery, access management, and insights about our data landscape. The three main components of Microsoft Purview are the data map, the data catalog, and data insights. The Microsoft Purview data map provides the foundation for data discovery and effective data governance. The data map is a cloud-native PaaS service that captures metadata about enterprise data present in analytics and operational systems, both on-premises and in the cloud. The data map is automatically kept up to date with built-in automated scanning and classification systems.
Business users can configure and use the data map through an intuitive UI and developers can programmatically interact with the data map using open-source Apache Atlas 2.0 APIs. With the Microsoft Purview data catalog, business and technical users alike can quickly and easily find relevant data using a search experience with filters based on various lenses like glossary terms, classifications, sensitivity labels, and more. For subject matter experts, data stewards and officers, the data catalog provides data curation features like business glossary management and the ability to automate tagging of data assets with glossary terms.
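As a rough illustration of what programmatic interaction through the Atlas-compatible API can look like, here is a minimal Python sketch that only builds request URLs and does not call the service. The base URL pattern, the account name `contoso`, and the helper function names are assumptions made for illustration; check the endpoint shown for your own Purview account and the current REST reference before relying on them. The paths (`entity/guid`, `glossary`, `lineage`) follow the Apache Atlas v2 REST conventions.

```python
from urllib.parse import urlencode, quote


def atlas_base_url(account_name):
    # Assumed endpoint pattern for a Purview account; confirm it against
    # the endpoint listed for your own account before using it.
    return f"https://{account_name}.purview.azure.com/catalog/api/atlas/v2"


def entity_by_guid_url(account_name, guid):
    # Atlas v2 convention: GET .../entity/guid/{guid} returns one entity.
    return f"{atlas_base_url(account_name)}/entity/guid/{quote(guid, safe='')}"


def glossary_url(account_name):
    # Atlas v2 convention: GET .../glossary lists glossaries.
    return f"{atlas_base_url(account_name)}/glossary"


def lineage_url(account_name, guid, direction="BOTH", depth=3):
    # Atlas v2 convention: GET .../lineage/{guid}?direction=...&depth=...
    query = urlencode({"direction": direction, "depth": depth})
    return f"{atlas_base_url(account_name)}/lineage/{quote(guid, safe='')}?{query}"


if __name__ == "__main__":
    # "contoso" and the GUID below are placeholder values.
    print(entity_by_guid_url("contoso", "abc-123"))
    print(glossary_url("contoso"))
    print(lineage_url("contoso", "abc-123"))
```

A real call would also need an Azure Active Directory bearer token in the `Authorization` header, which is left out here to keep the sketch self-contained.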
Data consumers and producers can also visually trace the lineage of data assets, starting from the operational systems on-premises, through movement, transformation, and enrichment with various data storage and processing systems in the cloud, to consumption in an analytics system like Power BI. With Microsoft Purview data insights, data officers and security officers can get a bird's-eye view and, at a glance, understand what data is actively scanned, where sensitive data is, and how it moves.
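To make the idea of tracing lineage concrete, here is a small Python sketch that walks a toy lineage graph from an on-premises source to a report. It is not how Purview stores or renders lineage; the asset names and the `downstream` helper are hypothetical and exist only to show the concept of following data from one system to the next.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
LINEAGE = {
    "onprem_sql.orders": ["adls.raw_orders"],
    "adls.raw_orders": ["synapse.curated_orders"],
    "synapse.curated_orders": ["powerbi.sales_report"],
}


def downstream(asset, edges):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen = {asset}
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    for step in downstream("onprem_sql.orders", LINEAGE):
        print(step)
```

Starting from the operational table, the walk passes through the storage and transformation layers and ends at the analytics report, which mirrors the path described above.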
Purview helps solve a number of discovery challenges for data consumers. Let's look at some of these discovery challenges. Because there's no central location to register data sources, users might be unaware of a data source unless they come into contact with it as part of another process. Unless users know the location of a data source, they can't connect to the data by using a client application. Data consumption experiences require users to know the connection string or path. The intended use of the data is hidden from users unless they know the location of a data source's documentation. Data sources and documentation might live in several places and be consumed through different kinds of experiences.
If users have questions about an information asset, they must locate the expert or team that's responsible for the data and engage them offline. There's no explicit connection between data and the experts that have perspective on its use. Unless users understand the process for requesting access to the data source, discovering the data source and its documentation won't help them access the data. Microsoft Purview provides a cloud-based service into which an organization can register data sources.
Uniquely, the data remains in its existing location, but a copy of its metadata is added to the Microsoft Purview catalog along with a reference to the data source location. The metadata is indexed to make each data source easily discoverable via search and understandable to the users who discover it. We can also enrich this metadata by linking to glossary terms, custom classifications, and data stewards and owners. Business users ultimately use the data catalog experience to quickly find data that matches their needs, understand the data to evaluate its fitness for the purpose, and consume the data by opening the data source in their tool of choice.
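The register-then-search idea can be sketched with a toy in-memory catalog in Python. This is an illustration of the concept only, not Purview's implementation: the `AssetMetadata` and `ToyCatalog` names, the fields, and the sample assets are all hypothetical. Note that the catalog holds only metadata and a pointer to where the data lives, never the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class AssetMetadata:
    name: str
    location: str  # reference to where the data actually lives
    glossary_terms: list = field(default_factory=list)
    classifications: list = field(default_factory=list)
    owner: str = ""


class ToyCatalog:
    """Holds metadata only; the underlying data stays at `location`."""

    def __init__(self):
        self._assets = []

    def register(self, asset):
        self._assets.append(asset)

    def search(self, keyword):
        # Case-insensitive match on name, glossary terms, and classifications.
        needle = keyword.lower()
        return [
            a for a in self._assets
            if needle in " ".join(
                [a.name, *a.glossary_terms, *a.classifications]
            ).lower()
        ]


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.register(AssetMetadata(
        name="customers",
        location="sql://sales-db/dbo.customers",  # placeholder path
        glossary_terms=["Customer"],
        classifications=["Email Address"],
        owner="data-steward@example.com",
    ))
    catalog.register(AssetMetadata(
        name="inventory",
        location="abfss://lake/raw/inventory",  # placeholder path
        glossary_terms=["Stock Level"],
    ))
    for hit in catalog.search("email"):
        print(hit.name, "->", hit.location)
```

A search hit returns the asset's location and owner, which is what lets a consumer open the source in their tool of choice or contact the responsible steward.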
Steve is an experienced Solutions Architect with over 10 years of experience serving customers in the data and data engineering space. He has a proven track record of delivering solutions across a broad range of business areas that increase overall satisfaction and retention. He has worked across many industries, both public and private, and found many ways to drive the use of data and business intelligence tools to achieve business objectives. He is a persuasive communicator, presenter, and quite effective at building productive working relationships across all levels in the organization based on collegiality, transparency, and trust.