Big Data Storage
Course two of the Big Data Specialty learning path focuses on storage. In this course, we outline the key storage options for big data solutions. We determine data access and retrieval patterns, and examine use cases that suit particular data patterns, such as evaluating mechanisms for the capture, update, and retrieval of catalog entries. We learn how to determine appropriate data structures and storage formats, and how to determine and optimize the operational characteristics of a Big Data storage solution.
Note that Amazon Aurora is now MySQL- and PostgreSQL-compatible.
Learning Objectives
- Recognize and explain big data access and retrieval patterns.
- Recognize and explain appropriate data structure and storage formats.
- Recognize and explain the operational characteristics of a Big Data storage solution.
This course is intended for students looking to increase their knowledge of the AWS storage options available for Big Data solutions.
While there are no formal prerequisites for this course, students will benefit from having a basic understanding of cloud storage solutions. Our courses on AWS storage fundamentals and AWS database fundamentals will give you a solid foundation for taking this present course.
This Course Includes
- Over 90 minutes of high-definition video.
- Real-life scenarios using AWS reference architectures.
What You'll Learn
- Course Intro: What to expect from this course.
- Amazon DynamoDB: How you can use Amazon DynamoDB in Big Data scenarios.
- Amazon DynamoDB Reference Architecture: A real-life model using DynamoDB.
- Amazon Relational Database Service: A look at how Amazon RDS works and how you can use it in Big Data scenarios.
- Amazon Relational Database Service Reference Architecture: A real-life model using RDS.
- Amazon Redshift: An overview of how Amazon Redshift works and how you can use it in Big Data scenarios.
- Amazon Redshift Reference Architecture: A real-life model using Redshift.
So in this scenario, we're running a number of web servers on Amazon EC2 instances, and we're using Amazon CloudFront as the content delivery network. The log files from this are streamed into Amazon Simple Storage Service (S3) buckets. From there, we load those log files into an Elastic MapReduce cluster so that we can run some analytics on user behavior as visitors surf through the website.
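As a small illustration of the kind of analytics step described above, the sketch below counts page hits from CloudFront access log data. The log content is inlined here so the example is self-contained; in the real pipeline these tab-separated log files would be read from the S3 buckets (for example with boto3), and the heavy lifting would run on the EMR cluster rather than a single process. The sample values are invented for illustration.

```python
import csv
from collections import Counter
from io import StringIO

# A sample CloudFront access log (tab-separated, with W3C-style '#' header
# lines). In the scenario above, files like this would be fetched from S3;
# here the content is inlined so the sketch is self-contained.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status
2017-05-01\t10:00:00\tSYD1\t2048\t203.0.113.10\tGET\td111.cloudfront.net\t/index.html\t200
2017-05-01\t10:00:05\tSYD1\t1024\t203.0.113.10\tGET\td111.cloudfront.net\t/products\t200
2017-05-01\t10:00:09\tSYD1\t512\t198.51.100.7\tGET\td111.cloudfront.net\t/products\t404
"""

def page_hits(log_text):
    """Count requests per URI stem, skipping '#' header lines."""
    hits = Counter()
    data_lines = (line for line in StringIO(log_text)
                  if not line.startswith("#"))
    for row in csv.reader(data_lines, delimiter="\t"):
        hits[row[7]] += 1          # column 7 is cs-uri-stem
    return hits

print(page_hits(SAMPLE_LOG))       # e.g. Counter({'/products': 2, '/index.html': 1})
```

On EMR, the same per-page aggregation would typically be expressed as a MapReduce, Hive, or Spark job over the full log set in S3.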
We take the results of that analysis and load them into Amazon RDS, which enables a number of different inline query tools to be used by the analysts to understand and visualize the user behavior as visitors went through the website. That brings us to the end of this module on Amazon RDS.
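The "load results into RDS, then query" step can be sketched as below. To keep the example self-contained, Python's built-in sqlite3 module stands in for the RDS endpoint; against a real RDS MySQL or PostgreSQL instance you would connect with a driver such as mysql-connector or psycopg2 using the instance endpoint and credentials. The table name and values are illustrative, not from the course.

```python
import sqlite3

# sqlite3 stands in for an Amazon RDS database so this sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_hits (
        uri_stem TEXT PRIMARY KEY,
        hits     INTEGER NOT NULL
    )
""")

# Load the aggregated results produced by the EMR analysis step
# (illustrative values).
conn.executemany(
    "INSERT INTO page_hits (uri_stem, hits) VALUES (?, ?)",
    [("/index.html", 1), ("/products", 2)],
)
conn.commit()

# An analyst-style ad hoc query: most-visited pages first.
top_pages = conn.execute(
    "SELECT uri_stem, hits FROM page_hits ORDER BY hits DESC"
).fetchall()
print(top_pages)   # [('/products', 2), ('/index.html', 1)]
```

Because the results land in a standard relational database, any SQL-speaking BI or visualization tool can run queries like this directly against the instance.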
As we've seen, Amazon RDS allows us to easily set up, operate, and scale a relational database in the cloud. It's primarily designed to be used as a transactional database underpinning applications, but it also provides a flexible, managed repository for reporting data where storage volumes are below six terabytes. I look forward to speaking to you soon.
Shane has been immersed in the world of data, analytics, and business intelligence for over 20 years, and for the last few years he has been focusing on how Agile processes and cloud computing technologies can be used to accelerate the delivery of data and content to users.
He is an avid user of the AWS cloud platform to help deliver this capability with increased speed and decreased costs. In fact, it's often hard to shut him up when he is talking about the innovative solutions that AWS can help you create, or how cool the latest AWS feature is.
Shane hails from the far end of the earth, Wellington, New Zealand, a place famous for Hobbits and kiwifruit. However, you're more likely to see him partake of a good long black or an even better craft beer.