AWS NoSQL databases
AWS Relational Databases
*** PLEASE NOTE *** This course has been replaced with two new courses: Database Fundamentals for AWS - Part 1 of 2 and Database Fundamentals - Part 2 of 2.
This course will provide you with an introduction to the cloud database services offered by AWS. In this course, we will first explore the fundamentals of cloud databases and outline the cloud databases provided by AWS, before exploring how to get started selecting and using the AWS database services.
This course suits anyone interested in learning more about the database services offered by AWS.
This is an introductory-level course, so no specific database skills or experience are required as a prerequisite. Having a basic understanding of cloud computing will help you gain the most from this course. I recommend completing “What is cloud computing?” first if you are new to cloud computing.
On completing this course you will have an understanding of the different types of database services available to you within the AWS cloud platform. You will be able to recognize and explain the various database services offered by AWS, and be able to identify and select which database service might suit a specific use case or requirement.
First, we learn to recognize and explain the basics of a cloud database service.
We then learn to recognize and explain the differences between non-relational and relational databases before taking a high-level pass over the family of AWS database services available.
We then dive into the non-relational databases, Amazon DynamoDB, Amazon ElastiCache, and Amazon Neptune, exploring use cases for when we might want to use a non-relational database service.
Next, we dive into Amazon RDS, the AWS Relational Database Service, exploring the database engines provided by RDS. We then examine the services and their various use cases in the context of a scenario.
The Basics - What is a Cloud Database?
Overview of the AWS Database Services
AWS Non Relational Databases
- Amazon DynamoDB
- Amazon ElastiCache
- Amazon Neptune
AWS Relational Database Service
- The RDS Service
- MySQL for RDS
- Microsoft SQL Server for RDS
- Oracle for RDS
- MariaDB for RDS
- PostgreSQL for RDS
- Amazon Aurora for RDS
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
22-01-2020: Added note about Amazon ElastiCache being used as a cache in front of Amazon RDS services
Hi, and welcome back. Let's practice matching requirements to database services to help us build our understanding of the AWS Cloud databases, recalling that we have two families of databases: the non-relational databases and the relational databases. In the non-relational family we have Amazon DynamoDB, Amazon ElastiCache, and Amazon Neptune. And then in relational databases, we've got support for a number of different database engines in the RDS platform.
So the question is often, which one suits which use case, and how do we decide which one to use? You remember our scenario was that we were creating a prototype claims management system, and our IT department suggested or encouraged us to use the Microsoft SQL Server database because that's what everybody else was using in the business. So that is a great fit for our scenario. We also need to be asking though, "Is that the right service for these particular requirements?" You need to be constantly challenging design assumptions to ensure that the services best meet the requirements, and create the very best outcome for your end user.
Now, Amazon Relational Database Service, or RDS, enables us to implement and start using Microsoft SQL Server without any upfront costs from within the console. We can choose the version of SQL Server that we want, we can deploy the database across multiple Availability Zones to be highly available if required, and we only pay for the time that we use the service.
For our scenario, we need a database to store information collected from web forms submitted by customers or from data entered by our claims processing staff. The data relationships will not be the priority for working in the forms. However, we will want to create views to help staff search for users and customers and to update information from a web form. So some type of relationship between tables will be useful, and it's likely to reduce the amount of development time required to build or modify the front-end application forms.
Now, arguably both Microsoft SQL Server and Amazon DynamoDB could handle the type of records we would be storing and working with; both provide a way to store records. DynamoDB provides encryption at rest, which is a plus, but so does Microsoft SQL Server. We will need to create relationships between tables to present data back to the application and so to our end users. Now, are those relationships enough of a constraint that managing them within our application layer would become an overhead? That is probably our key question at this point. Do we want to handle that type of structured query in the database or at the code layer? Do we need the full processing power of a relational database, or is the priority to have flexibility and scalability? And where does cost sit in this equation?
With a finite development resource available to us, it would make sense for us to favor the use of a relational database. How many table relationships will we need to support for the application to save and return data records back to our claims processing staff? And how many native processing requirements do we have? DynamoDB provides reliable performance and its automatic scaling of throughput capacity means it can handle burst activity requirements which are common with internet services like the one that we're creating.
Now, imagine we have a travel alert page within our claims management application. People check this travel alert page frequently to see the state of the weather before booking or commencing their travel plans. If there is a storm set to hit the east coast, for example, many of our customers will check this update page to see if their travel insurance includes storm cover. Over time, we might experience some slowing of this business app, because when there's a storm and the travel update page is being visited frequently, there's a lot of load on the database and on the application. The issue we have is that a high number of read requests made to the database is starting to impact the speed of writes to the database. So the database is being slowed down by a lot of people checking the same one or two fields at high volumes.
We didn't envisage this would be an issue when we designed the claims management service. At design time, it made sense to store the text and images for this travel update page in our database as a BLOB field. Now, it turns out that the claims management staff like to load a lot of images of the weather patterns and storm conditions to this update page to best inform customers. As a result, the amount of content to present for each page visit is exceeding 500 kilobytes, which takes time to render back to the browser and so slows other database requests.
We have a number of data options to tackle such an issue. The first and most practical would be to place the images in an object store such as Amazon S3, and reduce the need for using a binary field in the database. With images stored in Amazon S3, there'll be less load on the database, so that is the first optimization we should make. However, even after that improvement, there may still be a general slowing of the system under load, which suggests more can be done.
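To make that first optimization a little more concrete, here is a minimal sketch of the idea: the image bytes go to an object store and the database row keeps only the text and the object keys. The `ObjectStore` class, the `save_travel_alert` function, the key layout and the sample data are all hypothetical in-memory stand-ins for illustration; they are not the Amazon S3 API or anything shown in this course.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """In-memory stand-in for an object store such as Amazon S3 (hypothetical)."""
    objects: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> str:
        self.objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


def save_travel_alert(db: dict, store: ObjectStore, alert_id: str,
                      text: str, images: list) -> dict:
    """Put image bytes in the object store; keep only text and keys in the row."""
    image_keys = []
    for data in images:
        # A content hash gives each image a stable, unique key (an assumption).
        digest = hashlib.sha256(data).hexdigest()[:16]
        image_keys.append(store.put(f"travel-alerts/{alert_id}/{digest}", data))
    row = {"alert_id": alert_id, "text": text, "image_keys": image_keys}
    db[alert_id] = row  # the row now holds short strings, not a large BLOB
    return row


db = {}
store = ObjectStore()
row = save_travel_alert(
    db, store, "storm-east-coast", "Storm warning",
    [b"radar" + b"0" * 250_000, b"satellite" + b"1" * 250_000],
)
print(row["image_keys"])
```

With this shape, a page visit reads a small row from the database and the browser fetches the images from the object store, so the large payload no longer passes through the database.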
Now, the volume of read requests made to the system is seen as the main issue. Our default position here, with a traditional database stored in our server room or data center, might be vertical scaling, i.e. getting a bigger server. Now with a cloud database service, it's far easier for us to scale the IOPS, the memory or the machine size of our database instances. So a benefit of using the AWS service is the ability to redesign services to best meet requirements. We could implement a read replica of our Microsoft SQL Server database to handle some of the read traffic. Now, on paper, this appears to make sense. However, in practice, there is not an easy way to create a read replica for a Microsoft SQL Server database, and the claims staff are updating the pages frequently, so there would be quite a lot of changes to manage between those two replicas if we had one.
An easier way to reduce the load on our claims database would be to implement a cache between the database and the claims application. A cache is ideal for holding frequently requested data, so the web application does not need to read those components from our permanent data store. A cache will generally hold data for a finite period of time. If a record is changed, the cache will compare, flush and store the latest version of a record.
So using Amazon ElastiCache, we can easily provision a cache to sit between our main database and our web application. Data that is requested over and over will be stored temporarily within ElastiCache. ElastiCache will respond and send that frequently requested data to the web application front end, meaning our main database does not receive so many requests.
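The read path described here is often implemented as a cache-aside pattern, and a minimal sketch might look like the following. Everything in it is an assumption for illustration: `TTLCache` is an in-memory stand-in for a cache such as ElastiCache, `ClaimsDatabase` is a stand-in for the main database, and the 60-second expiry is an arbitrary value. In this sketch the application, not the cache, deletes the cached entry when a record changes.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache such as Amazon ElastiCache (hypothetical)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:  # data is held for a finite period only
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl, value)

    def delete(self, key):
        self._entries.pop(key, None)


class ClaimsDatabase:
    """Stand-in for the main database; counts reads so the offload is visible."""

    def __init__(self):
        self.rows = {"travel-alert": "Storm warning for the east coast"}
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows[key]

    def write(self, key, value):
        self.rows[key] = value


def get_page(key, cache, db):
    """Cache-aside read: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is None:
        value = db.read(key)
        cache.set(key, value)
    return value


def update_page(key, value, cache, db):
    """Write to the database, then drop the stale cached copy."""
    db.write(key, value)
    cache.delete(key)


cache = TTLCache(ttl_seconds=60)
db = ClaimsDatabase()
pages = [get_page("travel-alert", cache, db) for _ in range(1000)]
reads_before_update = db.reads
update_page("travel-alert", "Storm has passed", cache, db)
latest = get_page("travel-alert", cache, db)
print(reads_before_update, db.reads, latest)
```

In this run the thousand page views cost the database a single read, and the update triggers one more read to refill the cache with the latest version.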
So the design I would use for this scenario would be to use a relational database for the customer policyholder records, simply because most of those will be stored in a relational model already. We're likely to have to import them from existing databases, so importing them into a relational database would make more sense, I think, than having to re-engineer them to fit a non-relational database.
Now, in terms of the relational database that I'd use, I would go for Postgres, because Postgres gives me the encryption that I need. We are storing personal data, so I would want to have some headway towards achieving compliance and to ensure that the records are stored in the most secure way possible. My second choice would be to go for the Aurora platform, because Amazon Aurora just has so much durability and speed with its high availability that it would be another great choice. And both of those two, in my view, would be more operationally efficient and cost-efficient than using the Microsoft SQL Server solution that's been proposed by the IT staff. I don't think the fact that it's been used by other parts of the business is enough of a business driver for us to choose Microsoft SQL Server over these other two databases.
So in terms of how I'd manage images and things outside of the policy, I would use DynamoDB to store a key, and I would store any of the images or submitted information like weather reports or pictures in Amazon S3, with the link stored back in my DynamoDB table. Now, if for any reason we had to handle really high levels of burst activity, for example if there was a major storm and we had a million or so requests all submitted within a five-minute period, we'd need to ensure that the system was going to be able to handle that. So I would put ElastiCache in between my permanent storage database and my web front end, so that ElastiCache could offload some of the read requests if we did have a lot of requests coming in suddenly that needed to be answered quickly.
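To get a feel for that burst, here is a rough back-of-the-envelope sketch. The one million requests and the five-minute window come from the scenario; the cache hit ratios are purely assumed values for illustration, and the function names are hypothetical.

```python
def requests_per_second(total_requests, window_seconds):
    """Average request rate over the burst window."""
    return total_requests / window_seconds


def database_reads_per_second(rate, cache_hit_ratio):
    """Reads that still reach the database when a share is served from cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return rate * (1.0 - cache_hit_ratio)


# One million requests in a five-minute window, as in the scenario.
burst_rate = requests_per_second(1_000_000, 5 * 60)

# The hit ratios below are assumptions, not measurements.
for ratio in (0.0, 0.9, 0.99):
    reads = database_reads_per_second(burst_rate, ratio)
    print(f"hit ratio {ratio:.2f}: about {reads:.0f} database reads per second")
```

The burst averages a little over 3,300 requests per second, and each additional share of requests answered from the cache removes the same share of reads from the database.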
Okay, that brings our design hypothesis brainstorm to a close. I will see you in the next lecture.
About the Author
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.