The course is part of these learning paths
Big Data Storage
Course two of the Big Data Specialty learning path focuses on storage. In this course we outline the key storage options for big data solutions, determine data access and retrieval patterns, and examine some of the use cases that suit particular data patterns, such as evaluating mechanisms for the capture, update, and retrieval of catalog entries. We learn how to determine appropriate data structures and storage formats, and how to determine and optimize the operational characteristics of a Big Data storage solution.
This course is intended for students looking to increase their knowledge of the AWS storage options available for Big Data solutions. Prerequisites - While there are no formal prerequisites, students will benefit from having a basic understanding of cloud storage solutions. Recommended courses - Storage Fundamentals, Database Fundamentals
• Recognize and explain big data access and retrieval patterns.
• Recognize and explain appropriate data structure and storage formats.
• Recognize and explain the operational characteristics of a Big Data storage solution.
This Course Includes:
Over 90 minutes of high-definition video.
Real-life scenarios using AWS reference architectures.
What You'll Learn:
- Course Intro: What to expect from this course.
- Amazon DynamoDB: How you can use Amazon DynamoDB in Big Data scenarios.
- Amazon DynamoDB Reference Architecture: A real-life model using DynamoDB.
- Amazon Relational Database Service: A look at how Amazon RDS works and how you can use it in Big Data scenarios.
- Amazon Relational Database Service Reference Architecture: A real-life model using RDS.
- Amazon Redshift: An overview of how Amazon Redshift works and how you can use it in Big Data scenarios.
- Amazon Redshift Reference Architecture: A real-life model using Redshift.
Okay. So, before we finish with this module on Amazon Redshift, let's have a quick look at an example architecture from Amazon where we use Redshift as part of an end-to-end big data services stack. So, in this scenario, we're looking at sensor data coming in from remote devices, such as power meters or cell phone clients, which is streaming through into the DynamoDB environment.
And, it's being streamed through using the Simple Queue Service. Once the data has landed, we may augment it with data from SCADA systems, for example, bringing through flow samples. And, once it's available, we'll then move it using Data Pipeline into Amazon EMR, or Elastic MapReduce, where we might run some analytical routines to find some insight in that data.
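As a rough sketch of that ingestion step, the pattern looks something like the following. This is illustrative only, not code from the reference architecture: the message shape, the key names, and the helper functions are all assumptions, and the SQS client and DynamoDB table are passed in so the sketch stays library-agnostic (with boto3, they would be `boto3.client("sqs")` and a `Table` resource).

```python
# Hypothetical sketch: sensor readings arrive as JSON messages on an SQS
# queue, then land as items in a DynamoDB table. Names are illustrative.
import json

def make_reading_message(device_id, metric, value, timestamp):
    """Serialise one sensor reading as a JSON message body for the queue."""
    return json.dumps({
        "device_id": device_id,
        "metric": metric,
        "value": value,
        "ts": timestamp,
    })

def enqueue_reading(sqs_client, queue_url, message_body):
    """Send one reading to the ingest queue (e.g. boto3's send_message)."""
    return sqs_client.send_message(QueueUrl=queue_url, MessageBody=message_body)

def store_reading(dynamodb_table, reading):
    """Persist one reading; a device-id hash key plus timestamp range key
    is a common DynamoDB layout for time-series sensor data."""
    dynamodb_table.put_item(Item={
        "device_id": reading["device_id"],
        "ts": reading["ts"],
        "metric": reading["metric"],
        "value": str(reading["value"]),  # DynamoDB numbers travel as strings/Decimals
    })
```

The queue decouples the devices from the database, so a burst of readings is buffered rather than throttled against DynamoDB's provisioned throughput.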
We then take the results of those analytical routines and stream them through into Redshift, which makes that data available to our business intelligence tools, our query tools, and our users. So, what we've seen so far is how Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service, can be used to store and process large volumes of data, and to provide access to this data using your existing business intelligence and query tools.
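A minimal sketch of that last hop, assuming the EMR job writes its results to S3: the standard bulk-load path into Redshift is the COPY command, which pulls files from S3 in parallel across the cluster's slices. The table, bucket, and IAM role names below are invented for illustration.

```python
# Hypothetical sketch: loading EMR output from S3 into Redshift via COPY.
# All identifiers (table, bucket, role ARN) are illustrative.

def copy_statement(table, s3_path, iam_role):
    """Build a Redshift COPY statement; COPY loads S3 files in parallel,
    which is far faster than row-by-row INSERTs for bulk data."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS PARQUET;"
    )

# The statement would be executed over a standard PostgreSQL-protocol
# connection (e.g. psycopg2); BI and query tools then read the table.
print(copy_statement(
    "sensor_insights",
    "s3://example-bucket/emr-output/",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
))
```

Because Redshift speaks the PostgreSQL wire protocol, existing business intelligence and query tools connect to it with their standard drivers, which is the point the module closes on.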
So, that's the end of the module. I look forward to speaking to you soon.
About the Author
Shane has been immersed in the world of data, analytics, and business intelligence for over 20 years, and for the last few years he has been focusing on how Agile processes and cloud computing technologies can be used to accelerate the delivery of data and content to users.
He is an avid user of the AWS cloud platform to help deliver this capability with increased speed and decreased cost. In fact, it's often hard to shut him up when he is talking about the innovative solutions that AWS can help you create, or how cool the latest AWS feature is.
Shane hails from the far end of the earth, Wellington, New Zealand, a place famous for Hobbits and kiwifruit. However, you're more likely to see him partake of a good long black or an even better craft beer.