Big Data Storage
In this course, we outline the key storage options for big data solutions. We examine data access and retrieval patterns, and some of the use cases that suit particular data patterns, such as evaluating mechanisms for the capture, update, and retrieval of catalog entries. We learn how to determine the appropriate data structure and storage formats, and how to determine and optimize the operational characteristics of a big data storage solution.
- Recognize and explain big data access and retrieval patterns.
- Recognize and explain appropriate data structure and storage formats.
- Recognize and explain the operational characteristics of a Big Data storage solution.
This course is intended for students looking to increase their knowledge of the AWS storage options available for Big Data solutions.
While there are no formal prerequisites for this course, students will benefit from having a basic understanding of cloud storage solutions. Our AWS Storage Fundamentals and AWS Database Fundamentals courses will give you a solid foundation for this one.
Amazon Aurora is now MySQL- and PostgreSQL-compatible.
Okay. So, before we finish with this module on Amazon Redshift, let's have a quick look at an example architecture from Amazon where we use Redshift as part of an end-to-end big data services stack. So, in this scenario, we're looking at sensor data coming in from remote devices, such as power meters or cell phone clients, streaming through into the DynamoDB environment.
It's being streamed through using Amazon Simple Queue Service (SQS). Once the data's landed, we may augment it with data from SCADA systems, for example, bringing through flow samples. Once it's available, we'll then move it using AWS Data Pipeline into Amazon EMR, the Elastic MapReduce engine, where we might run some analytical routines to find insight in that data.
We then take the results of those analytical routines and stream them through into Redshift, which makes the data available to our business intelligence tools, query tools, and users who want to query that data. So, what we've seen so far is how Amazon Redshift, a fast, fully managed, petabyte-scale data warehouse service, can be used to store and process large volumes of data, and to provide access to this data using your existing business intelligence and query tools.
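The shape of the data as it moves through that pipeline can be sketched with a little illustrative Python. The field names (`device_id`, `flow_rate`) and the simple averaging routine standing in for the EMR analytics step are assumptions for demonstration only; they are not part of the course architecture itself:

```python
import json
from statistics import mean

def to_dynamodb_item(reading):
    """Convert a raw sensor reading (as it might arrive via SQS) into
    DynamoDB's low-level attribute-value format."""
    return {
        "device_id": {"S": reading["device_id"]},
        "timestamp": {"N": str(reading["timestamp"])},
        "flow_rate": {"N": str(reading["flow_rate"])},
    }

def average_flow_by_device(readings):
    """Stand-in for the kind of aggregation an EMR analytical routine
    might run before the results are loaded into Redshift for BI queries."""
    by_device = {}
    for r in readings:
        by_device.setdefault(r["device_id"], []).append(r["flow_rate"])
    return {device: mean(values) for device, values in by_device.items()}

# A few hypothetical meter readings flowing through the pipeline.
readings = [
    {"device_id": "meter-1", "timestamp": 1, "flow_rate": 10.0},
    {"device_id": "meter-1", "timestamp": 2, "flow_rate": 14.0},
    {"device_id": "meter-2", "timestamp": 1, "flow_rate": 7.0},
]

print(json.dumps(average_flow_by_device(readings)))
```

In the real stack, the aggregated rows would typically be written to Amazon S3 and loaded into Redshift with a `COPY` command, where BI tools query them over standard SQL connections.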
So, that's the end of the module. I look forward to speaking to you soon.
Shane has been immersed in the world of data, analytics, and business intelligence for over 20 years, and for the last few years he has been focusing on how Agile processes and cloud computing technologies can be used to accelerate the delivery of data and content to users.
He is an avid user of the AWS cloud platform to help deliver this capability with increased speed and decreased cost. In fact, it's often hard to shut him up when he is talking about the innovative solutions that AWS can help you create, or how cool the latest AWS feature is.
Shane hails from the far end of the earth, Wellington, New Zealand, a place famous for Hobbits and kiwifruit. However, you're more likely to see him partake of a good long black or an even better craft beer.