Scanning Non-Azure Resources and Using the API

In this course, users will explore the suite of tools available in Microsoft Purview for registering and scanning data sources, connecting a business glossary, searching the data catalog, and customizing metadata with enrichments and classifications. In addition, this course will review some of the management and administrative functionality in Purview, including creating roles, managing authorizations, and using the Apache Atlas API for custom implementations. This course will also review deployment best practices and network security considerations. By completing this course, users will have a strong understanding of the suite of functionality currently available in Purview and how these tools support a larger governance initiative within an organization.  

Learning Objectives

  • Provision and install Microsoft Purview
  • Create and manage a role
  • Register and scan data sources
  • Create a business glossary
  • Enrich metadata with classifications
  • Review data lineage tooling
  • Understand deployment best practices
  • Take network security considerations into account

Intended Audience

This course is designed for individuals who are responsible for setting up, monitoring, or exploring data catalog and governance programs within their organization.  


Prerequisites

To get the most from this course, you should have some familiarity and experience with governance tooling as well as a basic understanding of the Azure portal.


Scanning non-Azure sources and using the API. The Multi-Cloud Scanning Connector for Microsoft Purview lets us explore our organizational data across cloud providers, including Amazon Web Services (AWS), in addition to Azure storage services. The Microsoft Purview scanner is deployed in a Microsoft account in AWS. To allow the scanner to read our S3 data, we must create a dedicated role for it in the Identity and Access Management (IAM) area of the AWS console. For other non-Azure sources, Microsoft Purview supports basic authentication (username and password) for scanning. One example is Snowflake.

With Snowflake, the default role of the given user is used to perform the scan. The Snowflake user must have usage rights on a warehouse and on the databases to be scanned, plus read access to system tables in order to access advanced metadata.

In August 2021, access control in Microsoft Purview moved from the Azure Identity and Access Management control plane to the Microsoft Purview collections data plane. This change gives enterprise data curators and administrators more precise, granular access control over the data sources scanned by Microsoft Purview. The change also enables organizations to audit right access and right use of their data.

Anyone who wants to submit data to Microsoft Purview, include Microsoft Purview as part of an automated process, or build their own user experience on Microsoft Purview can use the REST APIs to do so. For a REST API client to access the catalog, the client must have a service principal (application), which is an identity that the catalog recognizes and is configured to trust. REST API calls to the catalog are then made under the service principal's identity. Once the service principal is created, we need to assign it data plane roles in our Purview account. The Purview REST API is modeled on the Apache Atlas APIs.
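As a rough illustration of how a REST client might obtain a token for its service principal, here is a minimal sketch using only the Python standard library. It assumes the Microsoft identity platform's OAuth2 client-credentials flow and the scope `https://purview.azure.net/.default`; neither detail comes from this lecture, and the function names (`build_token_request`, `fetch_access_token`, `auth_header`) are made up for the example. Check the tenant ID, client ID, secret, and scope against your own environment before relying on it.

```python
import json
from typing import Dict, Tuple
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Purview tokens are requested for this resource scope.
PURVIEW_SCOPE = "https://purview.azure.net/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Tuple[str, bytes]:
    """Build the URL and form-encoded body for a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,          # the service principal's application ID
        "client_secret": client_secret,  # the service principal's secret
        "scope": PURVIEW_SCOPE,
    }
    return url, urlencode(form).encode("utf-8")


def fetch_access_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """POST the token request and return the access token (performs a network call)."""
    url, body = build_token_request(tenant_id, client_id, client_secret)
    request = Request(url, data=body, method="POST")
    with urlopen(request) as response:
        return json.load(response)["access_token"]


def auth_header(access_token: str) -> Dict[str, str]:
    """Header to attach to each REST call so it runs under the service principal's identity."""
    return {"Authorization": f"Bearer {access_token}"}
```

The request-building step is kept separate from the network call so it can be inspected or tested without credentials. A call only succeeds against the catalog once the service principal has been assigned the data plane roles mentioned above.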

The REST API supports bulk loading, custom lineage, custom type definitions, and more, either from an SDK or through Excel templates. The PyApacheAtlas package supports programmatic interaction and provides an Excel template for low-code uploads. PyApacheAtlas also supports these interactions with the Purview catalog: programmatically creating entities and types, performing partial updates of an entity, extracting entities by GUID or qualified name, creating custom lineage, and working with the glossary.
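To make the idea of custom entities and custom lineage a little more concrete, the sketch below builds the kind of JSON payload an Apache Atlas v2 bulk entity upload expects, using plain Python dictionaries rather than PyApacheAtlas itself. The helper names, the `custom://demo/...` qualified names, and the use of the generic `DataSet` and `Process` types are illustrative assumptions, not something prescribed in this lecture. Negative GUIDs follow the Atlas convention of marking entities that do not exist yet.

```python
import json
from typing import Any, Dict, Iterable, List

Entity = Dict[str, Any]


def make_entity(type_name: str, name: str, qualified_name: str, guid: int) -> Entity:
    """Build a minimal Atlas entity; a negative guid is a placeholder for a new entity."""
    return {
        "typeName": type_name,
        "guid": str(guid),
        "attributes": {"name": name, "qualifiedName": qualified_name},
    }


def make_lineage_process(name: str, qualified_name: str,
                         inputs: List[Entity], outputs: List[Entity], guid: int) -> Entity:
    """Build a Process entity whose inputs and outputs reference other entities by guid."""
    def as_reference(entity: Entity) -> Dict[str, str]:
        return {"guid": entity["guid"], "typeName": entity["typeName"]}

    process = make_entity("Process", name, qualified_name, guid)
    process["attributes"]["inputs"] = [as_reference(e) for e in inputs]
    process["attributes"]["outputs"] = [as_reference(e) for e in outputs]
    return process


def bulk_payload(entities: Iterable[Entity]) -> Dict[str, List[Entity]]:
    """Wrap entities in the envelope used for a bulk entity upload."""
    return {"entities": list(entities)}


# Hypothetical example: one source table, one target table, one process linking them.
source = make_entity("DataSet", "raw_sales", "custom://demo/raw_sales", -1)
target = make_entity("DataSet", "clean_sales", "custom://demo/clean_sales", -2)
process = make_lineage_process("clean_sales_job", "custom://demo/clean_sales_job",
                               [source], [target], -3)
payload = bulk_payload([source, target, process])
print(json.dumps(payload, indent=2))
```

In Apache Atlas, a payload of this shape is posted to the `/api/atlas/v2/entity/bulk` path. The exact base URL for a given Purview account is not covered here and should be confirmed in the Purview REST documentation. PyApacheAtlas wraps the same steps behind its own classes, so in practice you would likely use the package rather than hand-built dictionaries.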


About the Author

Steve is an experienced Solutions Architect with over 10 years of experience serving customers in the data and data engineering space. He has a proven track record of delivering solutions across a broad range of business areas that increase overall satisfaction and retention. He has worked across many industries, both public and private, and has found many ways to drive the use of data and business intelligence tools to achieve business objectives. He is a persuasive communicator and presenter, and is quite effective at building productive working relationships across all levels of the organization based on collegiality, transparency, and trust.