AWS Glue Data Catalog Primer

Instructor: Alana Layton

AWS Glue was historically only an extract, transform, and load (ETL) service. Since then, it has grown into a suite of data integration tools. AWS Glue is now made up of four different services:

  1. Glue Data Catalog
  2. Glue Studio
  3. Glue DataBrew
  4. Glue Elastic Views

Glue Elastic Views is out of scope for this content, so I won’t be talking about it in this lecture. If you’re interested in Glue Elastic Views, I will link a course specifically for that topic.

In this lecture, I’ll focus mainly on the Glue Data Catalog.

AWS Glue Data Catalog

AWS defines the Glue Data Catalog as a central metadata repository. This means that it stores data about your data. This includes information like data format, data location, and schema. Here’s how it works: 

You upload your data to storage like Amazon S3, or a database like Amazon DynamoDB, Amazon Redshift, or Amazon RDS. From there, you can use a Glue Crawler to connect to your data source, parse through your data, and then infer the column names and data types for all of your data. The Crawler does this by using Classifiers, which read the data from your storage. You can use built-in Classifiers or custom Classifiers you write to identify your schema.
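As a rough illustration of the crawler step, here is a minimal Python sketch. It assumes `glue` is a boto3 Glue client (for example, one created with `boto3.client("glue")`), and the crawler name, IAM role ARN, database name, and S3 path are hypothetical placeholders rather than values from this lecture. It only shows one way to define and start a crawler against an S3 location, using the built-in Classifiers by default.

```python
from typing import Any, Dict


def build_crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> Dict[str, Any]:
    """Assemble keyword arguments for the Glue create_crawler call.

    All argument values are supplied by the caller; nothing here is
    specific to a real account.
    """
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes to read the data
        "DatabaseName": database,  # catalog database that will hold the new tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue: Any, request: Dict[str, Any]) -> str:
    """Create the crawler, start a run, and return the crawler name.

    `glue` is assumed to be a boto3 Glue client (or any object exposing
    create_crawler and start_crawler with the same keyword arguments).
    """
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
    return request["Name"]
```

With real credentials, you would pass a boto3 Glue client as `glue`. The crawler run is asynchronous, so the catalog tables appear only after the run finishes.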

Once the Crawler infers the schema, it creates a new catalog table with information about the schema, the metadata, and where the source data is stored. You can have many tables filled with schema data from multiple sources. These tables are housed in what’s called a database.
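To make the table-and-database relationship more concrete, the sketch below lists the tables in one catalog database along with their inferred columns. It assumes `glue` is a boto3 Glue client and relies on the `get_tables` response shape (`TableList`, `StorageDescriptor`, `Columns`, `NextToken`); the function name and the database name you pass in are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def list_table_schemas(glue: Any, database: str) -> Dict[str, List[Tuple[str, str]]]:
    """Return {table_name: [(column_name, column_type), ...]} for one database.

    `glue` is assumed to be a boto3 Glue client. Results are paged, so the
    loop follows NextToken until no further page is returned.
    """
    schemas: Dict[str, List[Tuple[str, str]]] = {}
    kwargs: Dict[str, Any] = {"DatabaseName": database}
    while True:
        page = glue.get_tables(**kwargs)
        for table in page.get("TableList", []):
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            schemas[table["Name"]] = [(c["Name"], c.get("Type", "")) for c in columns]
        token = page.get("NextToken")
        if not token:
            return schemas
        kwargs["NextToken"] = token
```

Each entry in the result corresponds to one catalog table, so a database fed by several crawlers or sources would show up as several keys.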

Note that your data still lives in the location where you originally uploaded it, but now you also have a representation of the schema and metadata for that data in the catalog tables. This means your code doesn’t necessarily need to know where the data is stored, and can reference the Data Catalog for this information instead.
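The idea of code asking the Data Catalog where the data lives can be sketched as follows. This assumes `glue` is a boto3 Glue client and that the table has a storage descriptor with a `Location` field, as crawler-created tables over Amazon S3 typically do; the function name and the database and table names are hypothetical.

```python
from typing import Any


def resolve_table_location(glue: Any, database: str, table: str) -> str:
    """Look up where a table's source data is stored, via the Data Catalog.

    `glue` is assumed to be a boto3 Glue client. The caller needs only the
    database and table names, not a hard-coded storage path.
    """
    response = glue.get_table(DatabaseName=database, Name=table)
    return response["Table"]["StorageDescriptor"]["Location"]
```

A job could call `resolve_table_location(glue, "demo_db", "sales")` and read from whatever location comes back, so moving the underlying data would mean updating the catalog entry instead of the job code.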

That’s it for this one. See you soon!

Difficulty: Beginner
Duration: 4h 53m
Students: 600
Ratings: 4.4/5
Description

This section provides detail on the AWS management services relevant to the Solutions Architect Associate exam. These services are used to help you audit, monitor, and evaluate your AWS infrastructure and resources. These management services form a core component of running resilient and performant architectures.

Learning Objectives

  • Understand the benefits of using Amazon CloudWatch and audit logs to manage your infrastructure
  • Learn how to record and track API requests using AWS CloudTrail
  • Learn what AWS Config is and its components
  • Manage your accounts with AWS Organizations, including single sign-on with AWS SSO
  • Learn how to carry out logging with CloudWatch, CloudTrail, CloudFront, and VPC Flow Logs
  • Understand how to design cost-optimized architectures in AWS
  • Learn about AWS data transformation tools such as AWS Glue and data visualization services like Amazon Athena and QuickSight
About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.