In this course for the Big Data Specialty Certification, we learn how to identify the appropriate data processing technologies needed for big data scenarios. We explore how to design and architect a data processing solution, and explore and define the operational characteristics of big data processing.
Learning objectives
- Recognize and explain how to identify the appropriate data processing technologies needed for big data scenarios.
- Recognize and explain how to design and architect a data processing solution.
Intended audience
This course is intended for students wanting to extend their knowledge of the data processing options available in AWS.
Prerequisites
While there are no formal prerequisites for this course, students will benefit from having a basic understanding of cloud computing services. If you would like to gain a solid foundation in compute fundamentals, then check out our Compute Fundamentals For AWS course.
This Course Includes
75 minutes of high-definition video.
What You'll Learn
- Course Intro: What to expect from this course
- Amazon Elastic MapReduce Overview: In this lesson, we discuss how EMR allows you to store and process data
- Amazon Elastic MapReduce Architecture: In this lesson, you’ll learn about EMR’s clustered architecture.
- Amazon Elastic MapReduce in Detail: In this lesson, we’ll dig deeper into EMR storage options, resource management, and processing options.
- Amazon Elastic MapReduce Reference Architecture: Best practices for using EMR.
- Amazon Lambda Introduction: This lesson will kick off our discussion of Lambda and how it’s used in Big Data scenarios.
- Amazon Lambda Overview: This lesson discusses how Lambda allows you to run code for virtually any type of application or backend service with no administration.
- AWS Lambda Architecture: In this lesson, we’ll discuss generic Lambda architecture and Amazon’s serverless service.
- AWS Lambda in Detail: In this lesson, we’ll dig into Events and Service Limits.
- AWS Lambda Reference Architecture: In this lesson, we'll look at a real-life scenario of how lambda can be used.
Okay, so as we come to the end of this module on Amazon EMR, let's have a quick look at an example reference architecture from AWS, where Amazon MapReduce can be used. If we look at this scenario, what we're looking at is sensor data being streamed from devices such as power meters, or cellphones, through using Amazon's simple queuing services into a DynamoDB database.
From there, we could stream the data using AWS data pipeline into Amazon EMR, where we can go through and use MapReduce, or Tez, or Spark, or Presto, or the Hbase processing engine, to run some analytical routines and find some insight into that data. And from there, we could either query that data using our BI tool via SQL queries going through Hive, or we could move the data out of EMR into Amazon Redshift and store it in a relational model, and allow the native capability via ODBC or JDBC, or the BI tools that go through and query. So that brings us to the end of the Amazon ER module. I look forward to speaking with you again.
Shane has been emerged in the world of data, analytics and business intelligence for over 20 years, and for the last few years he has been focusing on how Agile processes and cloud computing technologies can be used to accelerate the delivery of data and content to users.
He is an avid user of the AWS cloud platform to help deliver this capability with increased speed and decreased costs. In fact its often hard to shut him up when he is talking about the innovative solutions that AWS can help you to create, or how cool the latest AWS feature is.
Shane hails from the far end of the earth, Wellington New Zealand, a place famous for Hobbits and Kiwifruit. However your more likely to see him partake of a good long black or an even better craft beer.