Introduction to EMR

This course provides an introduction to the big data processing service known as Amazon Elastic MapReduce, commonly referred to as EMR. You will learn the characteristics of the service and its base architecture.

If you have any feedback relating to this course, feel free to contact us at

Learning Objectives

This course aims to provide a foundational understanding of Amazon Elastic MapReduce: what it is, some of its characteristics, and its base architecture.

Intended Audience

This course is ideal for those looking to become data scientists or solutions architects. If you are studying for the AWS Data Analytics - Specialty certification, it also provides useful insight into EMR before you dive deeper into the service.


Prerequisites

To get the most from this course, you should have a basic knowledge of the AWS platform. Some understanding of big data processing would also be beneficial.


Hello and welcome to this lecture covering Elastic MapReduce, known as EMR.

Amazon Elastic MapReduce is a managed service designed to process and analyze vast amounts of data. Jobs can be short-running, billed per second, or long-running, in which case you can build high availability into your architecture.

EMR is based on the popular and solid Apache Hadoop framework, an open-source distributed processing framework intended for big data processing. Organizations and companies can benefit greatly from using Amazon EMR because it abstracts and reduces the complexity of the infrastructure layer used with traditional MapReduce frameworks.

Setting up a healthy Hadoop cluster is not a trivial effort. AWS therefore encapsulated all the infrastructure of the Hadoop framework into an integrated environment, so you can launch a cluster in minutes and focus on what really matters: not managing infrastructure, but getting your data processed according to your needs.

Amazon EMR securely and reliably handles your data analytics use cases, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics. Amazon EMR takes advantage of Amazon EC2 instances that are configured with the Hadoop framework to deliver petabyte-scale processing power.

Amazon EMR supports a number of other frameworks used within the field of big data and data analytics, including Spark, Presto, and HBase. Using the AWS Management Console or AWS CLI, you can quickly and easily create clusters for each of these frameworks.
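Clusters can also be created programmatically. The following is a minimal sketch, not taken from the lecture, of what a request for a cluster with Spark, Presto, and HBase might look like when using the boto3 SDK for Python and its run_job_flow call. The helper names, the release label, the instance type, the region, and the use of the default EMR roles are all assumptions chosen for illustration. The keep_alive flag loosely corresponds to the short-running versus long-running distinction mentioned earlier.

```python
from typing import Any, Dict, List


def build_cluster_request(
    name: str,
    applications: List[str],
    keep_alive: bool = False,
    release_label: str = "emr-6.15.0",  # assumed EMR release, adjust as needed
    instance_type: str = "m5.xlarge",   # assumed EC2 instance type
    core_nodes: int = 2,
) -> Dict[str, Any]:
    """Build the keyword arguments for boto3's EMR run_job_flow call.

    This hypothetical helper only assembles a dictionary; it does not
    contact AWS.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Frameworks to install on the cluster, e.g. Spark, Presto, HBase.
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": instance_type,
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": instance_type,
                    "InstanceCount": core_nodes,
                },
            ],
            # False: the cluster terminates once its steps finish (short-running).
            # True: the cluster stays up waiting for work (long-running).
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        # Assumes the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Send the request to EMR. Needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]


if __name__ == "__main__":
    request = build_cluster_request("demo-cluster", ["Spark", "Presto", "HBase"])
    # Inspect the request locally; call launch_cluster(request) to create it.
    print([app["Name"] for app in request["Applications"]])
```

Keeping the request builder separate from the call to AWS means the configuration can be reviewed before anything is launched. Running launch_cluster creates real EC2 instances and incurs cost.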

About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to cloud, reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016, Stuart was awarded the ‘Expert of the Year Award 2015’ by Experts Exchange for sharing his knowledge of cloud services with the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.