AWS NoSQL databases
AWS Relational Databases
*** PLEASE NOTE *** This course has been replaced with two new courses: Database Fundamentals for AWS - Part 1 of 2 and Database Fundamentals - Part 2 of 2.
This course will provide you with an introduction to the cloud database services offered by AWS. In this course, we will first explore the fundamentals of cloud databases, then outline the cloud databases provided by AWS, before exploring how to get started selecting and using the AWS Database Services.
This course suits anyone interested in learning more about the database services offered by AWS.
This is an introductory-level course, so no specific database skills or experience are required as a prerequisite. Having a basic understanding of cloud computing will help you gain the most from this course. I recommend completing “What is cloud computing?” first if you are new to cloud computing.
On completing this course you will have an understanding of the different types of database services available to you within the AWS cloud platform. You will be able to recognize and explain the various database services offered by AWS, and be able to identify and select which database service might suit a specific use case or requirement.
First, we learn to recognize and explain the basics of a cloud database service.
We then learn to recognize and explain the differences between non-relational and relational databases before taking a high-level pass over the family of AWS database services available.
We then dive into the non-relational databases, Amazon DynamoDB, Amazon ElastiCache, and Amazon Neptune, exploring use cases for when we might want to use a non-relational database service.
Next, we dive into Amazon RDS, the AWS Relational Database Service, exploring the database services provided by RDS. We then examine the services and their various use cases in the context of a scenario.
The Basics - What is a Cloud Database?
Overview of the AWS Database Services
AWS Non Relational Databases
- Amazon DynamoDB
- Amazon ElastiCache
- Amazon Neptune
AWS Relational Database Service
- The RDS Service
- MySQL for RDS
- Microsoft SQL Server for RDS
- Oracle for RDS
- MariaDB for RDS
- PostgreSQL for RDS
- Amazon Aurora for RDS
If you have thoughts or suggestions for this course, please contact Cloud Academy at email@example.com.
22-01-2020: Added note about Amazon Elasticache being used as a cache in front of Amazon RDS services
- [Instructor] Hello and welcome back. Now there are two families of databases provided as a service by Amazon Web Services: relational databases and non-relational databases. The two families suit different use cases, so let's ensure we understand the differences between the two before we delve into the details of either.
Now relational databases provide structured tables which generally support SQL operations. SQL stands for Structured Query Language, and it is a common language and syntax for writing data to and retrieving data from a database construct. A database construct could be a table, a view, a stored procedure, or even a function. So relational databases use a schema to define the tables within their database, and data operations, for example create, read, update, or delete, can be done using Structured Query Language.
So as an example, if I go SELECT * FROM Customer.Person, we will return all the rows from the Customer.Person table. Person is the name of the table and Customer is the name of the database. So without delving too deep into the Structured Query Language, we are able to select specific rows. For example, if we go SELECT * FROM Customer.Person WHERE CustomerFirstName = 'Andrew' AND CustomerLastName = 'Smith', then we'll return all the rows from the Customer.Person table where the first name equals Andrew and the last name equals Smith. Now naturally there may be more than one Andrew Smith in our customers table, and there are a number of SQL operations we can use when writing SQL statements to help identify distinct records. But let's stay focused on our database and how the relational database works for us. A common use case for our claims processing staff at our insurance company is to find all the details about a customer when logging and researching a claim.
So claims processing staff need to quickly identify whether they have the right Andrew Smith, and to do that they often need to see the address or phone number details of all the Andrew Smith records.
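The two SELECT statements described in this part of the lecture can be tried out with a small sketch. This is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database rather than an AWS service, the table is named Person without the Customer qualifier, and the CustomerID column and the sample rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person ("
    " CustomerID INTEGER PRIMARY KEY,"
    " CustomerFirstName TEXT NOT NULL,"
    " CustomerLastName TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO Person (CustomerFirstName, CustomerLastName) VALUES (?, ?)",
    [("Andrew", "Smith"), ("Andrew", "Smith"), ("Jane", "Doe")],
)

# Return every row in the table.
all_rows = conn.execute("SELECT * FROM Person").fetchall()

# Return only the rows that match both name conditions.
# The ? placeholders are bound to the supplied values at run time.
andrew_smiths = conn.execute(
    "SELECT * FROM Person"
    " WHERE CustomerFirstName = ? AND CustomerLastName = ?",
    ("Andrew", "Smith"),
).fetchall()

print(len(all_rows), len(andrew_smiths))
```

Because two sample customers share the same name, the second query returns more than one row, which is exactly the situation the claims staff face.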
Now a person may have more than one address, so a common database design approach would be to store address details in a separate table in the customer database and relate the customer and address tables together. If the customer and address tables were unrelated, we would first need to select the name details from the customer table and then select the matching address values from the address table, based on a common field, say customer ID. Now a relational database, with its built-in schema, relationships, and constraints, makes creating this type of relationship between tables very easy. In a relational database, we can relate an address record to a person or customer record using a one-to-many relationship based on a common integer field.
Our claims processing staff are really not concerned with all of these relationships. They just want to see the person's address and post code so they can identify whether we have the correct Andrew Smith. So we need to view the results and select the correct user from a combination of both queries, and this is where relational databases really come into play. They have features like views and stored procedures which let us define a query across tables once and reuse it, which can help speed things up for us.
A benefit of the relational database is that we can create both temporary and permanent relationships between tables. Now these relationships make creating, selecting, or updating records from multiple tables a much easier exercise. We can set permanent relationships on tables using constraints and keys. And we can then use the database's native script engine to create and process a view or stored procedure to select the values from a number of tables using an early bound process. What that means in plain English is that it's generally one of the fastest ways to return rows. Now these views or functions can also be stored permanently in the database.
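A one-to-many relationship between customers and addresses, enforced with a key constraint and exposed through a view, might look like the following sketch. It again uses Python's sqlite3 module purely for illustration; the table, column, and view names and the sample data are assumptions, not the schema used in the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        FirstName  TEXT NOT NULL,
        LastName   TEXT NOT NULL
    );
    -- One customer can have many addresses (one-to-many on CustomerID).
    CREATE TABLE Address (
        AddressID  INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
        Street     TEXT NOT NULL,
        PostCode   TEXT NOT NULL
    );
    -- A stored view that joins the two tables for the claims staff.
    CREATE VIEW CustomerAddress AS
        SELECT c.CustomerID, c.FirstName, c.LastName, a.Street, a.PostCode
        FROM Customer AS c
        JOIN Address AS a ON a.CustomerID = c.CustomerID;
    """
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Andrew", "Smith"), (2, "Andrew", "Smith")],
)
conn.executemany(
    "INSERT INTO Address (CustomerID, Street, PostCode) VALUES (?, ?, ?)",
    [(1, "1 High St", "2000"), (1, "9 Beach Rd", "2026"), (2, "5 Hill Ave", "3000")],
)

# One query against the view returns name and address details together.
rows = conn.execute(
    "SELECT CustomerID, Street, PostCode FROM CustomerAddress"
    " WHERE FirstName = ? AND LastName = ?"
    " ORDER BY CustomerID, PostCode",
    ("Andrew", "Smith"),
).fetchall()

# The constraint rejects an address that points at a customer who does not exist.
try:
    conn.execute(
        "INSERT INTO Address (CustomerID, Street, PostCode)"
        " VALUES (99, 'Nowhere', '0000')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

for row in rows:
    print(row)
```

The staff member only queries the view; the relationship and the constraint are managed inside the database.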
In our insurance example, we could create a relationship between the customer table and the address table, and a query on that relationship might return rows from both the customer and address tables, so that's pretty simple. That would give us all of the records we need to make a decision about whether we have the right Andrew Smith. So one of the benefits of a relational database is that we generally have a native script engine built into the database to process queries for us. Because it runs as a native process within the database engine, it can return data faster and provide features such as rollback and versioning as well. Now the downside of having all of this in-built functionality is that the software footprint of a relational database is usually larger and more complex than that of a non-relational database. Okay, so keep that in mind.
So here is a high-level view of the relational database services provided by AWS. Now we will explore each of these services in more detail later in this course. First of all, let me explain the concept of Amazon RDS. That stands for the Amazon Relational Database Service.
Amazon RDS provides you with a managed service which takes care of the provisioning of the hardware, networking, and database software. So while you retain some control of the configuration, Amazon RDS manages the patching of the database software and the compute platform. Amazon RDS enables you to run a database in multiple availability zones so your database service is highly available. And by default, RDS also manages backing up your database for you.
So this service provides you with a managed service for the following database engines, all right? There's MySQL, originally developed by MySQL AB, a Swedish software company founded in 1995, which was acquired by Sun Microsystems in 2008, and Sun in turn was acquired by Oracle in 2010. We have the Microsoft SQL Server solution, and that's the relational database provided by Microsoft with a number of different licensing options. There's the Oracle family of databases. The Oracle Database is a common platform in corporate environments. There's MariaDB, and MariaDB is the community-developed fork of the MySQL Relational Database Management System. There's PostgreSQL, an open source database and arguably the leading open source database. And there's Amazon Aurora, which is Amazon's own cloud-native database engine, compatible with MySQL (and also available in a PostgreSQL-compatible edition), which provides significantly faster processing and availability. So that's the Amazon RDS family.
But inside relational databases, there's also Amazon Redshift. Amazon Redshift is Amazon's data warehouse solution, and it was built as a cloud-native application, so it provides speed and availability by default as well as well-priced data warehousing.
So the second family of databases provided by AWS are the non-relational databases. So in comparison to relational databases, non-relational databases provide a simple tabular structure without a processing engine built into the database software.
So the key difference with non-relational databases is the lack of a schema and transaction engine. This makes non-relational databases a little lighter, simpler, and perhaps less dependent on native database code. So let's delve deeper into the concept of non-relational databases. You might hear the phrase NoSQL used to describe non-relational databases such as DynamoDB, and you may wonder what that means. Well, NoSQL can be defined as meaning no SQL. However, it more commonly stands for Not Only SQL. That's a little confusing, so non-relational is usually an easier descriptor than NoSQL. Either way, what it means to us is that NoSQL data constructs have the benefit of not requiring a schema.
A non-relational database can still be accessed and worked with, but in a different way from the Structured Query Language we use to access a relational database. With a relational database, we have a persistent connection to the database and then we use the Structured Query Language to work with the data within it. With a non-relational database, we generally use a RESTful HTTP interface. So before your application can access a database, it must be authenticated to ensure that the application is allowed to use that database, and it must be authorized so that the application can only perform the actions for which it has permissions.
For example, how we work with DynamoDB is different from how we would work with a relational database like Microsoft SQL Server. With DynamoDB, we use the Query action to retrieve data. The DynamoDB Query action gives you efficient access to the physical location where the data is stored, so the syntax and operations are different. We can use the DynamoDB Query action with any table that has a composite primary key, which is a partition key and a sort key. In DynamoDB, you use expression attribute values as placeholders in expression parameters such as the key condition expression and the filter expression.
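As a rough sketch of how expression attribute values act as placeholders, the function below builds the parameters for a DynamoDB Query request as a plain Python dictionary. The table name Claims, the attribute names PolicyId, ClaimDate, and ClaimStatus, and the helper build_query_params are all hypothetical and chosen only for this illustration. No AWS call is made here; with the boto3 SDK installed and credentials configured, a dictionary in this shape could be passed to the low-level client as boto3.client("dynamodb").query(**params).

```python
def build_query_params(table_name, policy_id, claim_date_prefix, status):
    """Build Query parameters for a hypothetical claims table.

    Assumes a composite primary key: PolicyId (partition key) and
    ClaimDate (sort key). ClaimStatus is an ordinary, non-key attribute.
    """
    return {
        "TableName": table_name,
        # Key condition: equality on the partition key, plus an optional
        # condition on the sort key.
        "KeyConditionExpression": "PolicyId = :pid AND begins_with(ClaimDate, :d)",
        # Filter applied to the matching items before they are returned.
        "FilterExpression": "ClaimStatus = :s",
        # Placeholders (:pid, :d, :s) are replaced with these typed values.
        "ExpressionAttributeValues": {
            ":pid": {"S": policy_id},
            ":d": {"S": claim_date_prefix},
            ":s": {"S": status},
        },
    }


params = build_query_params("Claims", "P-100", "2020-01", "OPEN")
for name, value in params["ExpressionAttributeValues"].items():
    print(name, value)
```

The expressions never contain the literal values themselves, only the placeholder names that begin with a colon.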
Expression attribute values work much like bind variables in a relational database, where you substitute the actual values into the select statement at runtime. In general, the AWS non-relational databases can scale faster than relational databases. With a non-relational database, you don't need to define a schema for the tables first, which means changes to a non-relational database can be made faster. Non-relational databases suit non-structured data, so they are designed specifically for handling non-structured data types, e.g. videos, images, or data objects that are not uniform in structure. Now this is a very common situation with web-based applications.
For our claims processing database, we may need to store images uploaded by a customer to support the claim. Now in a relational database, we would need to alter our table schema to enable us to store images as binary or blob fields. In a non-relational database, support for different data types is the default behavior, so no change to the schema would be required.
By providing just the data store and keeping any code decoupled from the storage layer, non-relational databases generally require less computing resources and they're simpler in their design, which means they tend to be more flexible and scalable. If we need to join tables in a non-relational database, then we'll generally manage those relationships and joins in our application code. Where that difference would be felt is if we wanted to process logic on data using complex inner or outer joins across a large number of tables or queries. Now that type of processing can become quite complex to manage. Plus, it can take up quite a lot of compute resource where a native engine could actually make things much faster.
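Managing a join in application code, as described above, might look like the following sketch. The items are plain Python dictionaries standing in for records read from two non-relational tables; the attribute names, the sample data, and the helper join_customers_to_addresses are assumptions for illustration. Note that the second customer carries an extra ClaimPhoto attribute that the first does not, which no schema change was needed to allow.

```python
from collections import defaultdict

# Items from two hypothetical tables. Items need not share the same attributes.
customers = [
    {"CustomerID": 1, "FirstName": "Andrew", "LastName": "Smith"},
    {"CustomerID": 2, "FirstName": "Andrew", "LastName": "Smith",
     "ClaimPhoto": b"\x89PNG"},
]
addresses = [
    {"CustomerID": 1, "Street": "1 High St", "PostCode": "2000"},
    {"CustomerID": 1, "Street": "9 Beach Rd", "PostCode": "2026"},
    {"CustomerID": 2, "Street": "5 Hill Ave", "PostCode": "3000"},
]


def join_customers_to_addresses(customers, addresses):
    """Attach each customer's addresses, doing the join in application code."""
    # Index the addresses by customer ID so each lookup is a dictionary access.
    by_customer = defaultdict(list)
    for address in addresses:
        by_customer[address["CustomerID"]].append(address)
    # Build a new item per customer with its list of addresses attached.
    return [
        {**customer, "Addresses": by_customer.get(customer["CustomerID"], [])}
        for customer in customers
    ]


joined = join_customers_to_addresses(customers, addresses)
for item in joined:
    print(item["CustomerID"], [a["PostCode"] for a in item["Addresses"]])
```

This is manageable for two small collections, but the application now owns the join logic and the compute cost that a relational engine would otherwise handle.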
So if we do have a complex set of tables that we need to regularly join, then that would probably suit a relational database with a native transactional engine. So the non-relational databases provided by AWS are DynamoDB, which is a cloud-native key-value and document store, and Amazon ElastiCache, which is a cache service that runs either the Redis or Memcached cache engines.
So before we delve into them in more detail, given the differences between relational and non-relational databases, when should we use one over the other? A non-relational database just stores the data. A relational database stores the data and provides a processing engine. Non-relational databases suit situations where you just need a fast, secure, highly available data store which can manage many different types of objects. Relational databases suit data storage requirements where you have complex relationships between tables that might require a processing engine within the database to manage the processing of queries and updates. Okay, hopefully we're clear on the differences between the non-relational and relational database services. Let's get into looking at the services in detail in the next lectures.
About the Author
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.