Introduction to EMR
4h 57m

This section of the AWS Certified Solutions Architect - Professional learning path introduces common AWS solution architectures relevant to the AWS Certified Solutions Architect - Professional exam and the services that support them. These services form a core component of running resilient and performant architectures. 


Learning Objectives

  • Learn how to utilize managed services and serverless architectures to minimize cost
  • Understand how to use AWS services to process streaming data
  • Discover AWS services that support mobile app development
  • Understand when to utilize serverless services within your AWS solutions
  • Learn which AWS services to use when building a decoupled architecture

Hello and welcome to this lecture covering Elastic MapReduce, known as EMR.

Amazon Elastic MapReduce is a managed service designed to process and analyze vast amounts of data through the use of jobs. These jobs can be short-running, billed per second, or long-running, in which case you can build high availability into your architecture.

EMR is based on the popular and solid Apache Hadoop framework, an open-source distributed processing framework intended for big data processing. Organizations can benefit greatly from using Amazon EMR because it abstracts and reduces the complexity of the infrastructure layer used with traditional MapReduce frameworks.

The effort involved in setting up a healthy Hadoop cluster is not trivial. So AWS encapsulated all the infrastructure of the Hadoop framework into an integrated environment, so that you can launch a cluster in minutes and focus on the part that really matters: not managing infrastructure, but getting your data processed according to your needs.

Amazon EMR securely and reliably handles your data analytics use cases, including log analysis, web indexing, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics. Amazon EMR takes advantage of Amazon EC2 instances that are configured with the Hadoop framework to deliver petabyte-scale processing power.

Amazon EMR supports a number of other frameworks used within the field of big data and data analytics, including Spark, Presto, and HBase. Using the AWS Management Console or AWS CLI, you can quickly and easily create clusters for each of these frameworks.
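The lecture mentions the Console and the CLI. As a rough illustration of what a cluster definition contains, here is a minimal sketch using the AWS SDK for Python (boto3) instead, which is a substitution on my part. The cluster name, release label, instance types, node counts, and region are all assumptions chosen for the example, and the two role names assume the EMR default roles already exist in the account. Building the request is kept separate from sending it, so the definition can be inspected without AWS credentials.

```python
"""Sketch: define an EMR cluster that runs Spark, Presto, and HBase."""


def build_cluster_request(name, applications, core_nodes=2):
    """Return keyword arguments for boto3's EMR `run_job_flow` call."""
    return {
        "Name": name,
        # Assumed release label; choose one that is available in your region.
        "ReleaseLabel": "emr-6.15.0",
        # One entry per framework to install on the cluster.
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            # True keeps the cluster running for long-lived workloads;
            # False lets it terminate once its steps have finished.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumes the EMR default roles have already been created.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Send the request to EMR and return the new cluster ID."""
    import boto3  # imported here so the request can be built without the SDK

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


# Hypothetical cluster name, used only for illustration.
request = build_cluster_request("demo-cluster", ["Spark", "Presto", "HBase"])
```

Calling `launch_cluster(request)` would start real EC2 instances and incur charges, so it is left out of the sketch.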

About the Author

Danny has over 20 years of IT experience as a software developer, cloud engineer, and technical trainer. After attending a conference on cloud computing in 2009, he knew he wanted to build his career around what was still a very new, emerging technology at the time — and share this transformational knowledge with others. He has spoken to IT professional audiences at local, regional, and national user groups and conferences. He has delivered in-person classroom and virtual training, interactive webinars, and authored video training courses covering many different technologies, including Amazon Web Services. He currently has six active AWS certifications, including certifications at the Professional and Specialty level.