Hands-on lab

Getting Started with Amazon Elastic MapReduce

Difficulty: Intermediate

Duration: Up to 1h 45m


Lab description

Amazon Elastic MapReduce (Amazon EMR) is a managed service for processing large amounts of data in applications such as log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics. Amazon EMR uses Hadoop, an open-source framework, to distribute data and processing across a resizable cluster of Amazon EC2 instances.

Hadoop uses a distributed programming model called MapReduce. In the map phase, the input data is split into chunks that are processed in parallel on the cluster's servers, each producing intermediate key-value pairs. In the reduce phase, those intermediate results are grouped by key and combined into the final output data set.
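The map and reduce phases can be illustrated with the classic word-count example, including the shuffle step that groups intermediate pairs by key between the two phases. This is a minimal single-process sketch; the function names are hypothetical, and a real Hadoop job would run the map and reduce functions in parallel across the cluster's nodes rather than sequentially in one process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key, as Hadoop does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values that share a key into one output record.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, each input split would be mapped on a different node;
    # here every phase runs sequentially in a single process.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
```

Running the script counts "the" twice and every other word once.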

A high-level view of the EMR workflow is as follows:

Load the input dataset
Execute a MapReduce job
Store the job results in HDFS
View the job results from HDFS
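The four workflow stages above can be sketched as EMR step definitions. This is a minimal sketch, assuming a Hadoop streaming job whose mapper and reducer are Python scripts; the bucket name, the script names (`mapper.py`, `reducer.py`), and the HDFS output path are hypothetical placeholders, not the lab's sample application. Each dictionary follows the shape of the `Steps` parameter accepted by the EMR `RunJobFlow` and `AddJobFlowSteps` APIs, and `command-runner.jar` is the EMR-provided JAR that runs its arguments as a command on the cluster.

```python
from typing import Any, Dict, List

# Hypothetical locations: replace with your own bucket, scripts, and paths.
INPUT_URI = "s3://example-emr-lab-bucket/input/"
SCRIPTS_URI = "s3://example-emr-lab-bucket/scripts/"
OUTPUT_PATH = "hdfs:///user/hadoop/wordcount-output"


def command_step(name: str, args: List[str]) -> Dict[str, Any]:
    # One step in the shape used by the Steps parameter of RunJobFlow / AddJobFlowSteps.
    # command-runner.jar executes the Args list as a command on the cluster.
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": args},
    }


def workflow_steps() -> List[Dict[str, Any]]:
    return [
        # Load, execute, store: a streaming job reads the input dataset from S3
        # and writes its results to HDFS. The scripts themselves are hypothetical.
        command_step(
            "Run MapReduce job",
            [
                "hadoop-streaming",
                "-files", SCRIPTS_URI + "mapper.py," + SCRIPTS_URI + "reducer.py",
                "-mapper", "python3 mapper.py",
                "-reducer", "python3 reducer.py",
                "-input", INPUT_URI,
                "-output", OUTPUT_PATH,
            ],
        ),
        # View: print the reducer output files stored in HDFS.
        command_step("View results", ["hdfs", "dfs", "-cat", OUTPUT_PATH + "/part-*"]),
    ]
```

Hadoop refuses to write to an output directory that already exists, so rerunning the first step needs a new `-output` path. The second step's output should appear in that step's `stdout` log.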

The focus of this lab is configuring and launching an EMR cluster. You will be provided with sample input data sets and sample applications to process them. Treating the application and data set as a "black box" removes unneeded complexity and frees you to concentrate on configuring the cluster.

Note that this lab involves creating a new Amazon EMR cluster, which typically takes about ten minutes. Make sure you have enough time available before starting the lab.
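Because cluster creation takes several minutes, scripts that create clusters usually poll the cluster state rather than assume the cluster is ready. The following is a minimal sketch of such a polling loop; `wait_for_cluster` is a hypothetical helper, and `get_state` is any callable that returns the current cluster state string (EMR reports states such as `STARTING`, `BOOTSTRAPPING`, `RUNNING`, and `WAITING`). The default timeout and poll interval are assumptions.

```python
import time
from typing import Callable

# WAITING means the cluster is idle and ready for steps; RUNNING means it is executing one.
READY_STATES = {"WAITING", "RUNNING"}
# States from which the cluster will never become ready.
FAILED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_for_cluster(
    get_state: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 20 * 60,  # assumed limit, about twice the typical startup time
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the cluster is ready, has failed, or the timeout expires."""
    waited = 0.0
    while True:
        state = get_state()
        if state in READY_STATES:
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"cluster stopped in state {state}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"cluster still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With boto3, `get_state` could be `lambda: emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"]`, where `emr` is `boto3.client("emr")`. boto3 also provides a built-in `cluster_running` waiter (`emr.get_waiter("cluster_running")`) that serves the same purpose; the explicit loop simply makes the states visible.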

Learning objectives

Upon completion of this lab, you will be able to:

Configure and launch a cluster in two different launch modes
Submit tasks for your cluster to process
Check the status of your cluster and the tasks it processes
Terminate, clone, reconfigure, and launch a cluster
View logs and results
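The two launch modes can be sketched as request parameters for boto3's EMR `run_job_flow` call, assuming the modes in question are the console's long-running Cluster mode and its Step execution mode, which terminates the cluster once its steps finish. The release label, instance types, instance count, and log bucket below are assumptions chosen for illustration; the role names are the EMR default roles.

```python
from typing import Any, Dict, List, Optional


def cluster_request(
    name: str,
    log_uri: str,
    auto_terminate: bool,
    steps: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Keyword arguments for run_job_flow (a sketch; release and sizing are assumptions)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label
        "Applications": [{"Name": "Hadoop"}],
        "LogUri": log_uri,  # where EMR writes cluster and step logs
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance types and count
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # Cluster mode keeps running (state WAITING) after its steps finish;
            # step execution mode terminates once the last step completes.
            "KeepJobFlowAliveWhenNoSteps": not auto_terminate,
        },
        "Steps": steps or [],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

A cluster could then be launched with `boto3.client("emr").run_job_flow(**cluster_request(...))`, which returns the new cluster's ID under `JobFlowId`, and terminated later with `terminate_job_flows(JobFlowIds=[cluster_id])`.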

Intended audience

Cloud Architects
Data Engineers
DevOps Engineers
Machine Learning Engineers

Prerequisites

Familiarity with the following will be beneficial but is not required:

Amazon EMR
Amazon Simple Storage Service (S3)
Amazon Elastic Compute Cloud (EC2)

The following content can be used to fulfill the prerequisites:

Lab environment

After completing the lab instructions, the environment should look similar to:

Updates

November 29th, 2023 - Updated screenshots to reflect the latest user interface and updated the lab structure for clarity

July 26th, 2023 - Addressed user ban issue and added warning

March 30th, 2023 - Updated the instructions and screenshots to reflect the latest UI

December 27th, 2022 - Updated the instructions and screenshots to reflect the latest UI

September 13th, 2022 - Updated the instructions and screenshots to reflect the latest UI

December 13th, 2021 - Adjusted the allowed bandwidth for the lab to account for increased network usage by EMR

January 10th, 2019 - Added a validation Lab Step to check the work you perform in the Lab

About the author

Greg DeRenne

Lab Research dev


Greg has a consistent record of high performance working with pioneering technologies in the wireless web industry, and his career reflects a broad range of knowledge. At Cloud Academy, Greg also works with Microsoft Azure, but he especially enjoys evangelizing the benefits of Amazon Web Services. A dedicated and passionate professional who learns new and emerging technologies quickly, Greg ensures the highest quality and caliber in everything he produces.

Covered topics

Amazon Elastic MapReduce (EMR)

Amazon S3

Lab steps

Logging In to the Amazon Web Services Console

Creating an Amazon S3 Bucket for Amazon EMR

Creating an Amazon EMR Cluster
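The S3 bucket created in the second lab step must follow the S3 bucket naming rules, or bucket creation fails. A minimal sketch of a pre-check follows; `is_valid_bucket_name` is a hypothetical helper that covers only a subset of the rules (length, allowed characters, first and last character, adjacent periods, and IP-address format), so the official naming rules remain the final word.

```python
import ipaddress
import re

# Lowercase letters, digits, dots, and hyphens; must start and end with a letter or digit.
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 general purpose bucket naming rules."""
    if not 3 <= len(name) <= 63:
        return False
    if not _ALLOWED.fullmatch(name):
        return False
    if ".." in name:
        return False  # adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
        return False  # names formatted like an IP address (e.g. 192.168.5.4) are not allowed
    except ValueError:
        return True
```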