Differences Between AWS Database Types
In this section of the Cloud Practitioner learning path, we introduce you to the various Database services currently available in AWS that are relevant to the CLF-C01 exam.
- Identify and describe the various Database services available in AWS
- Understand the differences between relational and NoSQL databases
- Describe AWS-managed relational and NoSQL database services
This course is designed for anyone who is new to cloud computing, so no prior experience with AWS is necessary. While it may be helpful to have a basic understanding of AWS and its services, as well as some exposure to AWS Cloud design, implementation, and operations, this is not required as all of the concepts we will introduce in this course will be explained and reinforced from the ground up.
Relational databases have been commercially available since the 1970s. They provide an efficient, intuitive, and flexible way to store and report on highly-structured data.
These structures, called schemas, are defined before any data can be entered into the database.
Schemas are designed and built based on reporting requirements.
This means that a database’s expected output drives the creation of the database and how data is stored inside it.
Once a schema has been defined, database administrators and programmers work backward from these requirements to define how data will be stored inside the database.
No data can be stored in a database until this work has been completed.
Schema changes to existing databases are expensive in terms of time and compute power. They also risk corrupting data and breaking existing reports.
Data in a relational database is stored in tables. Each table--sometimes called a relation--contains one or more rows of data.
Each row--sometimes called a record--contains a collection of logically related data that is identified by a key.
The pieces of data stored in a row are called attributes or fields.
Visually, a table looks like a spreadsheet that has rows and columns.
Stored in one of the columns, each table has a primary key that uniquely identifies the information stored in each row.
Relationships between tables are created using these keys and there are rules that govern their behavior. The primary key in one table is a foreign key in another.
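To make the table and key vocabulary above concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a convenient stand-in here, not one of the AWS engines, and the customers and orders tables are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient stand-in engine.
conn = sqlite3.connect(":memory:")

# The schema is declared before any data is stored.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- primary key: unique per row
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
    item        TEXT NOT NULL
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(100, 1, "keyboard"), (101, 1, "monitor")],
)

# The shared key lets a query relate rows in the two tables.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [('Ana', 'keyboard'), ('Ana', 'monitor')]
```

The customer_id column is the primary key of customers and appears again as a foreign key in orders, which is what allows the join.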
Data integrity is of particular concern in a relational database. A number of constraints ensure that the data contained in tables is reliable and accurate.
These reliability features--commonly referred to as ACID transactions--are atomicity, consistency, isolation, and durability.
Atomicity refers to the elements that make up a single database transaction. A transaction can have multiple parts, but it is treated as a single unit that either succeeds completely or fails completely.
Consistency refers to the database’s state. Transactions must take the database from one valid state to another valid state.
Isolation prevents one transaction from interfering with another.
Durability ensures that data changes become permanent once the transaction is committed to the database.
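Atomicity is the easiest of these properties to see in code. The sketch below again uses Python's sqlite3 module as a stand-in engine. The accounts table, the CHECK constraint, and the transfer helper are assumptions made for illustration. A transfer that would overdraw an account fails partway through, and the part that had already run is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # rule that defines a valid state
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("checking", 100), ("savings", 50)]
)
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts").fetchall())


# The credit succeeds, the debit violates the CHECK, and both are undone.
failed = transfer(conn, "checking", "savings", 500)
after_failure = balances(conn)

# A valid transfer applies both updates together.
succeeded = transfer(conn, "checking", "savings", 30)
after_success = balances(conn)
```

After the failed transfer the balances are unchanged, and after the valid one both accounts reflect the move. The CHECK constraint also illustrates consistency: the database refuses to move into an invalid state.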
Data in a relational database must be kept in a known and stable state.
As part of the requirements to maintain database stability, Primary and Foreign Keys are constrained--they have rules that govern them--to ensure the integrity of database tables.
Entity Integrity ensures that, in a table, the primary key is unique to the table and it has a value. Primary keys cannot be blank or null.
Referential Integrity requires that every value in a Foreign Key column exists as the Primary Key of its originating table. If tables are related and a record is deleted in one of them, then the corresponding records in the related tables must be deleted as well, so that no Foreign Key refers to a record that no longer exists.
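The two integrity rules above can be sketched with Python's sqlite3 module, again as a stand-in for a full database engine. The authors and books tables and the rejected helper are hypothetical. One SQLite-specific detail is that foreign key enforcement must be switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE books (
    book_id   INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL
              REFERENCES authors(author_id) ON DELETE CASCADE,
    title     TEXT NOT NULL
);
""")
conn.execute("INSERT INTO authors VALUES (1, 'Author A')")
conn.execute("INSERT INTO books VALUES (10, 1, 'First Book')")
conn.commit()


def rejected(sql):
    """Return True if the engine refuses the statement on integrity grounds."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Entity integrity: a primary key value cannot be reused.
duplicate_key = rejected("INSERT INTO authors VALUES (1, 'Someone Else')")

# Referential integrity: author 99 does not exist, so the book is refused.
orphan_book = rejected("INSERT INTO books VALUES (11, 99, 'Orphan')")

# Deleting the author also deletes that author's books.
with conn:
    conn.execute("DELETE FROM authors WHERE author_id = 1")
remaining_books = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
```

Both invalid inserts are refused, and no book rows remain after the author is deleted. ON DELETE CASCADE is one way an engine can keep related tables in step. Other engines and schemas may instead block the delete until the dependent rows are removed.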
The standard user and application programming interface--or API--of relational databases is the Structured Query Language, SQL.
Pronounced as either Ess-Queue-Ell or Sequel, it can be used either interactively or programmatically to create, update, and maintain the data inside a relational database.
SQL is the dominant query language for relational databases.
SQL is an industry standard. It is interoperable between database engines and application programming languages, well-documented, and stable.
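As a small illustration of SQL used programmatically, the following sketch creates, updates, deletes, and reads rows from Python with the built-in sqlite3 module. The inventory table and its values are made up for the example, and other engines may differ in details of their SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (sku TEXT PRIMARY KEY, quantity INTEGER NOT NULL)"
)

with conn:
    # Create: values are passed as parameters rather than pasted into the SQL text.
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?)", [("widget", 5), ("gadget", 0)]
    )
    # Update: add ten units to one item.
    conn.execute(
        "UPDATE inventory SET quantity = quantity + ? WHERE sku = ?", (10, "widget")
    )
    # Delete: remove items that are out of stock.
    conn.execute("DELETE FROM inventory WHERE quantity = 0")

# Read: query what is left.
stock = conn.execute("SELECT sku, quantity FROM inventory ORDER BY sku").fetchall()
print(stock)  # [('widget', 15)]
```

The same statements could be typed interactively into a SQL shell; the program only supplies the parameter values.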
Security is one of the most important responsibilities of a database administrator.
Relational database engines have built-in features for securing and protecting data, but planning and effort are required to implement them properly.
These features include user authentication, authorization, and audit logging.
As part of the structure, data stored in relational databases is highly normalized. Normalization is a process in which information is organized efficiently and consistently before it is stored.
Duplicate data is discarded.
Closely related fields are grouped together.
Data should only be stored one time in a relational database. Fields that are logically related, like a first and last name, should be stored in the same table.
Removing redundancy and keeping similar data close reduces storage costs and improves the efficiency of data retrieval.
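The idea of storing each fact once can be sketched in plain Python. The flat_orders data and the normalize function are hypothetical, and identifying a customer by first and last name alone is a simplifying assumption; a real schema would use a more reliable identifier.

```python
# Flat, denormalized input: the customer's name is repeated on every order.
flat_orders = [
    {"order_id": 1, "first_name": "Ana", "last_name": "Silva", "item": "keyboard"},
    {"order_id": 2, "first_name": "Ana", "last_name": "Silva", "item": "monitor"},
    {"order_id": 3, "first_name": "Ben", "last_name": "Okoro", "item": "mouse"},
]


def normalize(rows):
    """Split flat rows into a customers table and an orders table."""
    customer_ids = {}  # (first_name, last_name) -> customer_id
    orders = []
    for row in rows:
        key = (row["first_name"], row["last_name"])
        # Store each customer once; later rows reuse the same id.
        customer_id = customer_ids.setdefault(key, len(customer_ids) + 1)
        orders.append(
            {"order_id": row["order_id"], "customer_id": customer_id, "item": row["item"]}
        )
    customers = [
        {"customer_id": cid, "first_name": first, "last_name": last}
        for (first, last), cid in customer_ids.items()
    ]
    return customers, orders


customer_table, order_table = normalize(flat_orders)
print(len(customer_table))                       # 2
print([o["customer_id"] for o in order_table])   # [1, 1, 2]
```

The first and last name stay together in one table, and each order carries only a small key that points back to the customer.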
Relational databases are not partition tolerant. A data partition, in this case, refers to the disk.
Adding another disk would be like creating a second copy of the database. This copy, or partition, is called a shard.
When a shard is created, it uses the original database’s schema. This is a horizontal partition of a database.
To use it, logic outside of the database must be created to direct queries to the correct database.
This is because relational databases are designed to validate how data is stored. They do not check whether a piece of information belongs in a particular database.
To illustrate, here’s a weather database split into a pair of shards, Rain and Snow. They are identical except for the information stored inside them.
An application determines if data should be stored in Rain or if it should be stored in Snow.
If a record belonging in Rain ends up in Snow and it matches the database schema, it will be stored.
However, since that record belongs in Rain, the reports will be wrong and applications will break when trying to query data.
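The Rain and Snow scenario can be sketched as follows. Two in-memory SQLite databases stand in for the two shards, and the observations table, the route function, and the store and total helpers are all hypothetical names for the logic that lives outside the database.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE observations ("
    " station TEXT NOT NULL, kind TEXT NOT NULL, amount_mm REAL NOT NULL)"
)

# Two shards with identical schemas; each would be a separate server in practice.
shards = {"rain": sqlite3.connect(":memory:"), "snow": sqlite3.connect(":memory:")}
for db in shards.values():
    db.execute(SCHEMA)


def route(kind):
    """Application-side routing: pick the shard from the record's kind."""
    return shards[kind]


def store(station, kind, amount_mm, shard=None):
    """Store a record, optionally forcing a shard to simulate a routing bug."""
    db = shard if shard is not None else route(kind)
    with db:
        db.execute(
            "INSERT INTO observations VALUES (?, ?, ?)", (station, kind, amount_mm)
        )


def total(kind):
    """Report total precipitation by asking only the shard for that kind."""
    row = route(kind).execute(
        "SELECT COALESCE(SUM(amount_mm), 0) FROM observations WHERE kind = ?",
        (kind,),
    ).fetchone()
    return row[0]


store("A", "rain", 12.0)
store("B", "snow", 30.0)
# A routing bug sends a rain record to Snow; the schema matches, so it is accepted.
store("C", "rain", 8.0, shard=shards["snow"])

rain_total = total("rain")  # misses the 8.0 mm stored in the wrong shard
misplaced = shards["snow"].execute(
    "SELECT COUNT(*) FROM observations WHERE kind = 'rain'"
).fetchone()[0]
```

The Snow shard accepts the misplaced row without complaint, and the rain report comes back as 12.0 instead of 20.0. Nothing inside either database can detect the mistake, which is why the routing logic has to be correct.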
Because of this complexity, most of the time relational databases are scaled vertically.
Horizontal scaling adds a copy of the database server. Vertical scaling grows the existing server, usually by adding memory or CPU or by expanding a disk volume.
Vertical scaling has limits. There are only so many resources that will fit inside a server. Once these limits have been reached, a database will either need to be redesigned or broken into shards.
AWS has six fully-managed database engines available inside the Relational Database Service, RDS.
They are Amazon Aurora, MySQL, PostgreSQL, MariaDB, Oracle, and Microsoft SQL Server.
Amazon Aurora is AWS’s cloud-native database engine, which is compatible with MySQL and PostgreSQL.
As a review…
Relational databases are highly-structured data stores.
The structure is called a schema.
The schema defines how data is stored in tables.
Inside tables there are rows and columns.
A row is a record and each column is an attribute or field of the record.
Tables have keys that identify data in a table.
A Primary Key uniquely identifies a row in a table.
Foreign Keys are used to connect data in a row to rows in other tables.
Scaling is usually done vertically by adding compute resources to an existing database.
Horizontal scaling is called sharding and requires logic outside of the database.
Relational databases are ideal for applications that do online transactional processing.
These OLTP applications include online banking, e-commerce sites, inventory management, human resource management, and financial services.
OLTP transactions usually perform specific tasks and involve a single record or a small selection of records.
An online banking customer might send money from a checking account to a savings account.
A transaction like this involves two accounts and no other customers of the bank.
But, what about analytical applications where hundreds, thousands, or millions of transactions need to be processed quickly, efficiently, and at a low cost?
That’s where non-relational databases are helpful. Unlike the various relational database engines, which have similar needs around structured data, non-relational databases come in several types, and the size and shape of the unstructured or semi-structured data determine which type to choose.
These non-relational databases are often called NoSQL databases because, when they were first developed, they used something other than SQL to store and retrieve data.
However, over time, SQL has been adapted to be used with some of these non-relational databases. Because of this, NoSQL can also mean “Not Only SQL.”
If any type of data can be stored in a relational database, why bother with a non-relational database?
In the next lecture, let’s learn about NoSQL Databases, what they are, and what differentiates them from relational databases.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.