hands-on lab

Using AWS Glue for ETL Workloads

Beginner

Up to 1h

398

5/5


Get guided in a real environment: Practice with a step-by-step scenario in a real, provisioned environment.

Learn and validate: Use validations to check your solutions every step of the way.

See results: Track your knowledge and monitor your progress.

Description

AWS Glue is a serverless data integration offering that you can use to discover, prepare, transfer, and integrate your data. AWS Glue jobs are commonly used for Extract, Transform, and Load (ETL) tasks to support analytics, data migration, and machine learning activities.

Knowing how to build and run AWS Glue jobs will make you more proficient at working with data in the AWS cloud.

In this hands-on lab, you will examine the source data, implement an AWS Glue job, and verify the results of an example ETL workload.
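To make the three ETL stages concrete, here is a minimal sketch in plain Python. It is not AWS Glue code and not the lab's actual job: the in-memory CSV sample and its column names are hypothetical stand-ins for an object in Amazon S3. A real Glue job would typically do the same work with Spark DataFrames or Glue DynamicFrames.

```python
import csv
import io
import json

# Hypothetical source data standing in for a CSV object in Amazon S3.
RAW_CSV = """order_id,amount,currency
1001,19.99,USD
1002,5.00,usd
1003,,USD
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows missing an amount, cast types, normalize currency codes."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "currency": row["currency"].upper(),
        })
    return cleaned


def load(records):
    """Load: serialize records as JSON Lines, one common output format for data lakes."""
    return "\n".join(json.dumps(record) for record in records)


output = load(transform(extract(RAW_CSV)))
print(output)
```

Keeping each stage in its own function makes it easy to test the transformation logic separately from reading and writing data.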

Learning objectives

Upon completion of this beginner-level lab, you will be able to:

Use the Amazon S3 and Amazon DynamoDB consoles to view source data
Implement an AWS Glue job using Python and Apache Spark
Run an AWS Glue job with a supplied parameter
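AWS Glue passes job parameters to the script as `--NAME value` command-line arguments, and the `awsglue.utils.getResolvedOptions(args, options)` helper turns them into a dictionary. The sketch below assumes a hypothetical parameter named `TARGET_PREFIX`; it uses the real helper when the `awsglue` package is available and otherwise falls back to a small `argparse` stand-in so it can run outside Glue. It is an illustration, not the lab's job script.

```python
import argparse

try:
    # Available inside the AWS Glue job runtime.
    from awsglue.utils import getResolvedOptions
except ImportError:
    def getResolvedOptions(args, options):
        """Local stand-in: return {name: value} for each required --name option."""
        parser = argparse.ArgumentParser(allow_abbrev=False)
        for name in options:
            parser.add_argument(f"--{name}", required=True)
        # Ignore extra arguments the runtime may add (for example --TempDir).
        known, _ = parser.parse_known_args(args[1:])
        return vars(known)


def main(argv):
    # TARGET_PREFIX is a hypothetical job parameter used only for illustration.
    args = getResolvedOptions(argv, ["JOB_NAME", "TARGET_PREFIX"])
    print(f"Job {args['JOB_NAME']} will write under prefix {args['TARGET_PREFIX']}")
    return args


# Example invocation with explicit arguments; in a Glue job you would pass sys.argv.
args = main(["job_script.py", "--JOB_NAME", "demo-job", "--TARGET_PREFIX", "output/"])
```

When starting the job, the parameter would be supplied with the `--` prefix (for example `--TARGET_PREFIX`), matching the name requested from `getResolvedOptions`.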

Intended audience

Candidates for the AWS Certified Data Engineer Associate certification
Cloud Architects
Data Engineers
DevOps Engineers
Machine Learning Engineers
Software Engineers

Prerequisites

Familiarity with the following will be beneficial but is not required:

AWS Glue
The Python scripting language
Amazon DynamoDB


[Diagram: Environment before]

[Diagram: Environment after]

About the author

Andrew Burchill

Labs Developer

Students: 68,469

Labs: 170


Andrew is a Labs Developer with previous experience in the internet service provider, audio streaming, and cryptocurrency industries. He has also worked as a DevOps Engineer and enjoys working with CI/CD and Kubernetes.

He holds multiple AWS certifications, including AWS Certified Solutions Architect Associate and Professional.