hands-on lab

Transforming Data With Apache Spark and Amazon EMR

Difficulty: Beginner
Duration: Up to 1h 30m
Lab description

Amazon EMR (formerly known as Amazon Elastic MapReduce) is a big data platform that supports many popular open-source data processing frameworks, including Apache Spark. Amazon EMR simplifies the configuration, provisioning, and scaling of clusters for data analysis and processing workloads.

Learning how to use Amazon EMR will help anyone who wants to understand how big data processing is performed in practice.

In this hands-on lab, you will tour an Amazon EMR cluster, place data and a script in a location accessible to Amazon EMR, submit a workload to an Amazon EMR cluster, and examine the results.

Please note that an Amazon EMR cluster takes approximately ten minutes to create and become usable, so ensure you have enough time available before starting the lab.

Learning objectives

Upon completion of this beginner-level lab, you will be able to:

  • Understand the configuration of an Amazon EMR cluster
  • Upload a script and data file to an Amazon S3 bucket
  • Submit work to a cluster by adding a step
  • Inspect the results of an Amazon EMR step
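The objective of submitting work by adding a step can be sketched in Python. This minimal sketch assumes the common pattern of running `spark-submit` through `command-runner.jar`, and it only builds the step definition dictionary in the shape accepted by the `Steps` parameter of boto3's EMR `add_job_flow_steps` call. The function name `build_spark_step`, the bucket `example-lab-bucket`, the script key, and the input/output arguments are hypothetical and are not taken from the lab.

```python
from typing import Any, Dict, List, Optional


def build_spark_step(
    name: str,
    script_uri: str,
    script_args: Optional[List[str]] = None,
    action_on_failure: str = "CONTINUE",
) -> Dict[str, Any]:
    """Build a hypothetical EMR step that runs a PySpark script stored in S3.

    The returned dict follows the step structure used by boto3's
    EMR add_job_flow_steps(Steps=[...]) parameter.
    """
    # The script must live somewhere the cluster can read, such as Amazon S3.
    if not script_uri.startswith("s3://"):
        raise ValueError("script_uri must be an s3:// URI")

    # command-runner.jar runs the given command on the cluster's primary node.
    args = ["spark-submit", "--deploy-mode", "cluster", script_uri]
    args.extend(script_args or [])  # extra arguments are passed to the script

    return {
        "Name": name,
        # CONTINUE keeps the cluster running if this step fails.
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


if __name__ == "__main__":
    step = build_spark_step(
        name="Transform JSON data",
        # Hypothetical bucket and keys, for illustration only.
        script_uri="s3://example-lab-bucket/scripts/transform.py",
        script_args=["s3://example-lab-bucket/input/", "s3://example-lab-bucket/output/"],
    )
    print(step["HadoopJarStep"]["Args"])
    # With boto3 installed and AWS credentials configured, the step could be
    # submitted to a running cluster (the cluster ID below is a placeholder):
    # boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

Building the definition separately from the API call keeps it easy to inspect and test; actually submitting it requires boto3, AWS credentials, and the ID of a running cluster.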

Intended audience

  • Candidates for AWS Certified Data Engineer Associate certification
  • Cloud Architects
  • Data Engineers
  • DevOps Engineers
  • Machine Learning Engineers

Prerequisites

Familiarity with the following will be beneficial but is not required:

  • Amazon EMR
  • Amazon Simple Storage Service (S3)
  • The Python scripting language
  • The JavaScript Object Notation (JSON) data format


About the author
Students: 66,553 | Labs: 164 | Courses: 2 | Learning paths: 4

Andrew is a Labs Developer with previous experience in the internet service provider, audio streaming, and cryptocurrency industries. He has also worked as a DevOps Engineer and enjoys working with CI/CD and Kubernetes.

He holds multiple AWS certifications, including Solutions Architect Associate and Professional.

Lab steps

  • Logging In to the Amazon Web Services Console
  • Touring an Amazon EMR Cluster
  • Uploading Files to Amazon S3
  • Submitting a Job to an Amazon EMR Cluster
  • Examining the Results