AWS Glue Data Catalog Primer


Course Introduction
Amazon CloudWatch
AWS Config
What is AWS Config?
AWS Control Tower
AWS Control Tower
PREVIEW19m 56s
AWS Resource Access Manager
AWS Management
AWS Service Catalog
PREVIEW10m 34s
AWS Trusted Advisor Best Practices
AWS Health Dashboard
AWS Data Visualization
Finding Compliance Data with AWS Artifact
Observability in AWS
AWS Glue Data Catalog Primer
6h 2m

This section of the AWS Certified Solutions Architect - Professional learning path introduces the AWS management and governance services relevant to the AWS Certified Solutions Architect - Professional exam. These services are used to help you audit, monitor, and evaluate your AWS infrastructure and resources and form a core component of resilient and performant architectures. 

Want more? Try a Lab Playground or do a Lab Challenge!

Learning Objectives

  • Understand the benefits of using AWS CloudWatch and audit logs to manage your infrastructure
  • Learn how to record and track API requests using AWS CloudTrail
  • Learn what AWS Config is and its components
  • Manage multi-account environments with AWS Organizations and Control Tower
  • Learn how to carry out logging with CloudWatch, CloudTrail, CloudFront, and VPC Flow Logs
  • Learn about AWS data transformation tools such as AWS Glue and data visualization services like Amazon Athena and QuickSight
  • Learn how AWS CloudFormation can be used to represent your infrastructure as code (IaC)
  • Understand SLAs in AWS

AWS Glue historically was only an ETL service. Since then, the service has turned into a suite of data integration tools. Now, AWS Glue is made up of four different services: 

  1. Glue Data Catalog

  2. Glue Studio

  3. Glue DataBrew, and 

  4. Glue Elastic Views. Glue Elastic Views is out of scope for this content, so I won’t be talking about it in this lecture. If you’re interested in Glue Elastic Views, I will link a course specifically for that topic. 

In this lecture, I’ll mainly focus on the Glue Data Catalog aspect of this service.   

AWS defines the Glue Data Catalog as a central metadata repository. This means that it stores data about your data. This includes information like data format, data location, and schema. Here’s how it works: 

You upload your data to storage like Amazon S3, or a database like Amazon DynamoDB, Amazon Redshift, or Amazon RDS. From there, you can use a Glue Crawler to connect to your data source, parse through your data, and then infer the column name and data type for all of your data. The Crawler does this by using Classifiers, which actually read the data from your storage. You can use built-in Classifiers or custom Classifiers you write to identify your schema. 

Once it infers the schema, it will create a new catalog table with information about the schema, the metadata, and where the source data is stored.  You can have many tables filled with schema data from multiple sources. These tables are housed in what’s called a database. 

Note, that your data still lives in the location where you originally uploaded it, but now you also have a representation of the schema and metadata for that data in the catalog tables. This means your code doesn’t necessarily need to know where the data is stored and can reference the Data Catalog for this information instead. 

That’s it for this one. See you soon! 

About the Author
Learning Paths

Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.