Differences Between AWS Database Types
The course is part of these learning paths
This course provides a high-level overview of the managed database offerings available from AWS.
It covers relational and non-relational databases, how they work, their strengths, and what workloads are best suited for them.
It includes an overview and the characteristics of non-relational databases as well as what NoSQL means and why it’s important to application development.
If you have any feedback relating to this course, feel free to contact us at firstname.lastname@example.org.
- Gain a general understanding of databases within AWS
- Learn about non-relational databases and how they can be used
- Understand the characteristics of NoSQL databases and how they can be used for application development
- Gain a comprehensive understanding of the different types of managed NoSQL databases on AWS
This course is intended for people that are relatively new to relational and non-relational databases and want to gain an understanding of what types of databases are available on AWS.
To get the most out of this course, you should ideally have a very basic knowledge of relational and non-relational databases and previous experience with AWS.
Relational databases have been commercially available since the 1970s. They provide an efficient, intuitive, and flexible way to store and report on highly-structured data.
These structures, called schemas, are defined before any data can be entered into the database.
Schemas are designed and built based on reporting requirements.
This means that a database’s expected output drives the creation of the database and how data is stored inside it.
Once a schema has been defined, database administrators and programmers work backward from these requirements to define how data will be stored inside the database.
No data can be stored in a database until this work has been completed.
Schema changes to existing databases are expensive in terms of time and compute power. It also has a risk of corrupting data and breaking existing reports.
Data in a relational database is stored in tables. Each table--sometimes called a relation--contains one or more rows of data.
Each row--sometimes called a record--contains a collection of logically related data that is identified by a key.
The pieces of data stored in a row are called attributes or fields.
Visually, a table looks like a spreadsheet that has rows and columns.
Stored in one of the columns, each table has a primary key that uniquely identifies the information stored in each row.
Relationships between tables are created using these keys and there are rules that govern their behavior. The primary key in one table is a foreign key in another.
Data integrity is of particular concern in a relational database, there are a number of constraints that ensure the data contained in tables is reliable and accurate.
These reliability features--commonly referred to as ACID transactions--are atomicity, consistency, isolation, and durability.
Atomicity refers to the elements that make up a single database transaction. A transaction could have multiple parts. It is treated as a single unit that either succeeds completely or fails completely.
Consistency refers to the database’s state. Transactions must take the database from one valid state to another valid state.
Isolation prevents one transaction from interfering with another.
Durability ensures that data changes become permanent once the transaction is committed to the database.
Data in a relational database must be kept in a known and stable state.
As part of the requirements to maintain database stability, Primary and Foreign Keys are constrained--they have rules that govern them--to ensure the integrity of database tables.
Entity Integrity ensures that, in a table, the primary key is unique to the table and it has a value. Primary keys cannot be blank or null.
Referential Integrity requires that every value in a Foreign Key column exists as the Primary Key of its originating table. If four tables are related and a record is deleted in one of them, then the corresponding records in related tables must be deleted as well.
The standard user and application programming interface--or API--of relational databases is the Structured Query Language, SQL.
Pronounced as either Ess-Queue-Ell or Sequel, it can be used either interactively or programmatically to create, update, and maintain the data inside a relational database.
SQL is the dominant query language for relational databases.
SQL is an industry standard, it is interoperable between database engines and application programming languages, well-documented, and stable.
Security is one of the most important responsibilities of a database administrator.
Relational database engines have built-in features for securing and protecting data but planning and effort are required to properly implement them.
These features include user authentication, authorization, and audit logging.
As part of the structure, data stored in relational databases is highly normalized. Normalization is a process where information is organized efficiently and consistently before storing it.
Duplicate data is discarded.
Closely related fields are grouped together.
Data should only be stored one time in a relational database. Fields that are logically related, like a first and last name, should be stored in the same table.
Removing redundancy and keeping similar data close reduces storage costs and improves the efficiency of data retrieval.
Relational databases are not partition tolerant. A data partition, in this case, refers to the disk.
Adding another disk would be like creating a second copy of the database. This copy, or partition, is called a shard.
When a shard is created, it uses the original database’s schema. This is a horizontal partition of a database.
To use it, logic outside of the database must be created to direct queries to the correct database.
This is because relational databases are designed to validate how data is stored. They do not check to see if information belongs inside it.
To illustrate, here’s a weather database split into a pair of shards, Rain and Snow. They are identical except for the information stored inside them.
An application determines if data should be stored in Rain or if it should be stored in Snow.
If a record belonging in Rain ends up in Snow and it matches the database schema, it will be stored.
However, since that record belongs in Rain, the reports will be wrong and applications will break when trying to query data.
Because of this complexity, most of the time relational databases are scaled vertically.
Horizontal scaling adds a copy of the database server. Vertical scaling is growing the server; usually by adding memory, CPU, or expanding a disk volume.
Vertical scaling has limits. There are only so many resources that will fit inside a server. Once these limits have been reached, a database will either need to be redesigned or broken into shards.
AWS has six fully-managed database engines available inside the Relational Database Service, RDS.
They are Amazon Aurora, MySQL, Postgres, MariaDB, Oracle, and Microsoft SQL Server.
Amazon Aurora is AWS’s cloud-native version of MySQL and Postgres.
As a review…
Relational databases are highly-structured data stores.
The structure is called a schema.
The schema defines how data is stored in tables.
Inside tables there are rows and columns.
A row is a record and each column is an attribute or field of the record.
Tables have keys that identify data in a table.
A Primary Key uniquely identifies a row in a table.
Foreign Keys are used to connect data in a row to rows in other tables.
Scaling is usually done vertically by adding compute resources to an existing database.
Horizontal scaling is called sharding and requires logic outside of the database.
Relational databases are ideal for applications that do online transactional processing.
These OLTP applications include online banking, e-commerce sites, inventory management, human resource management, and financial services.
OLTP transactions usually perform specific tasks and involve a single record or a small selection of records.
An online banking customer might send money from a checking account to a saving account.
A transaction like this involves two accounts and no other customers of the bank.
But, what about analytical applications where hundreds, thousands, or millions of transactions need to be processed quickly, efficiently, and at a low cost?
That’s where non-relational databases are helpful. Though, unlike the various relational database engines that have similar needs around structured data, the size & shape of the unstructured or semi-structured data determine the type of non-relational database to choose.
These non-relational databases are often called NoSQL databases because, when they were first developed, they used something other than SQL to store and retrieve data.
However, over time, SQL has been adapted to be used with some of these non-relational databases. Because of this, NoSQL can also mean “Not Only SQL.”
If any type of data can be stored in a relational database, why bother with a non-relational database?
In the next lecture, let’s learn about NoSQL Databases, what they are, and what differentiates them from relational databases.
Stephen is the AWS Certification Specialist at Cloud Academy. His content focuses heavily on topics related to certification on Amazon Web Services technologies. He loves teaching and believes that there are no shortcuts to certification but it is possible to find the right path and course of study.
Stephen has worked in IT for over 25 years in roles ranging from tech support to systems engineering. At one point, he taught computer network technology at a community college in Washington state.
Before coming to Cloud Academy, Stephen worked as a trainer and curriculum developer at AWS and brings a wealth of knowledge and experience in cloud technologies.
In his spare time, Stephen enjoys reading, sudoku, gaming, and modern square dancing.