Differences Between AWS Database Types
In this section of the Cloud Practitioner learning path, we introduce you to the various Database services currently available in AWS that are relevant to the CLF-C01 exam.
- Identify and describe the various Database services available in AWS
- Understand the differences between relational and NoSQL databases
- Describe AWS-managed relational and NoSQL database services
This course is designed for anyone who is new to cloud computing, so no prior experience with AWS is necessary. While it may be helpful to have a basic understanding of AWS and its services, as well as some exposure to AWS Cloud design, implementation, and operations, this is not required as all of the concepts we will introduce in this course will be explained and reinforced from the ground up.
Hello and welcome to this lecture about the types of managed NoSQL databases available on AWS. Managed services are those where the provisioning and maintenance are done by AWS on your behalf, according to your specifications.
This lecture will serve as an overview of four types of managed NoSQL database technologies available on AWS. I will cover Key-Value stores, Document stores, Column Family stores, and In-Memory stores. In the next lecture, I will continue my overview and discuss Graph databases, Time Series databases, Ledger databases, and Search databases.
This lecture, as well as the one that follows it, will not cover implementation details. It is mostly a discussion of what’s possible.
I’m going to provide an overview of each NoSQL technology, name the service from AWS that uses it, and provide a use case.
Let’s get started.
In a relational database, data is stored in tables composed of rows and columns. These tables and the types of data they’re going to store are defined prior to application development. This allows for storage and access patterns to be optimized.
It also means that relational databases are relatively inflexible.
Key-Value Databases, also called Key-Value Stores, are often considered to be the simplest type of NoSQL database. They are typically more flexible than relational databases and offer fast performance for reads and writes.
The AWS managed NoSQL database that is a Key-Value store is DynamoDB.
Key-Value stores are designed for storing, retrieving, and managing associative arrays and are well suited for working with large amounts of data.
An associative array, also known as a dictionary or a hash table, stores data with a unique identifier called a key. The data stored, which could be one or more items, is the value.
These are simple examples of key-value pairs.
It is also possible to store lists as the value.
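As a rough illustration of the key-value pairs just mentioned, here is a small sketch that uses a Python dictionary as a stand-in for a key-value store. The keys and values are made-up sample data, not anything from an AWS service.

```python
# Hypothetical sample data; every key and value here is illustrative only.
pairs = {
    "first_name": "Ana",
    "city": "Seattle",
    "age": 34,
    # A list stored as the value for a single key.
    "favorite_colors": ["blue", "green"],
}

# Each value is reached through its key.
print(pairs["city"])
print(pairs["favorite_colors"])
```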
Key-Value Stores have no schema that defines the structure of the data. There is only the key and its associated value.
The key in a key-value pair must be unique, because it is the identifier used to access the value associated with it.
Before using a key-value store, it helps to have a naming convention for key names. It will help keep key-value stores consistent and minimize confusion.
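One possible naming convention, shown here only as an assumption, is to join an entity type, an identifier, and an attribute with colons. The `make_key` helper and the sample keys are hypothetical.

```python
def make_key(entity: str, entity_id: str, attribute: str) -> str:
    """Build a key such as 'user:1001:cart' (a hypothetical convention)."""
    return f"{entity}:{entity_id}:{attribute}"


# A plain dictionary stands in for the key-value store.
store = {}
store[make_key("user", "1001", "profile")] = {"name": "Ana"}
store[make_key("user", "1001", "cart")] = ["sku-42", "sku-77"]

# Because every key follows the same pattern, it is easy to rebuild on read.
cart = store[make_key("user", "1001", "cart")]
```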
The value in a key-value store can be text, numbers, a list of items, documents, or another key-value pair.
In key-value stores, data is stored and retrieved using operations such as get, put, and delete.
Queries to Key-Value Stores are simple. Lookups are based on the key and retrieval is often measured in milliseconds regardless of the size of the data returned.
Key-Value Stores are not optimized for search. Scanning the store to find items by their values is very expensive in terms of time and cost.
They are not suitable for applications requiring frequent updates or complex queries involving specific data values.
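The get, put, and delete operations, and the cost of searching by value, can be sketched with a toy in-memory store. This is a minimal sketch for illustration, not a DynamoDB client; the `KeyValueStore` class and its `scan` method are hypothetical names.

```python
from typing import Any, Callable, Optional


class KeyValueStore:
    """Toy in-memory key-value store (illustrative only)."""

    def __init__(self) -> None:
        self._items: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Writing to an existing key replaces its value.
        self._items[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Direct lookup by key; returns None when the key is absent.
        return self._items.get(key)

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._items.pop(key, None)

    def scan(self, predicate: Callable[[Any], bool]) -> list[str]:
        # Searching by value has to visit every item in the store.
        return [k for k, v in self._items.items() if predicate(v)]


store = KeyValueStore()
store.put("user:1001", {"name": "Ana", "city": "Seattle"})
store.put("user:1002", {"name": "Ben", "city": "Austin"})

profile = store.get("user:1001")                             # lookup by key
in_seattle = store.scan(lambda v: v["city"] == "Seattle")    # full scan
store.delete("user:1002")
```

The lookup by key touches one entry, while `scan` walks the whole store, which is why value-based queries are a poor fit for this model.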
There are several types of data and access patterns that are well suited for Key-Value Stores.
Web applications can store user profiles, shopping cart data, and preferences in a Key-Value Store.
Real-time recommendation engines and advertising systems are often powered by Key-Value Stores.
Key-Value Stores are commonly used for in-memory data caching. They can speed up applications by minimizing reads and writes to slower disk-based systems.
Binary objects, such as pictures and other multimedia items, can be stored in key-value databases. However, a better solution, in terms of time and cost, is to save binary files in object storage and use a key-value database for lookups.
DynamoDB is a key-value database. However, since it can store key-value pairs as a value, it is also a type of NoSQL Document database.
Document Databases were invented to store semi-structured data. Instead of having the structure defined as part of the database in advance--like a relational database--each document in the database has its own unique schema that defines its structure.
The AWS managed NoSQL document database service is Amazon DocumentDB. As a document database, Amazon DocumentDB is designed to store, query, and index JSON data.
Document Databases are similar to Key-Value Databases in that they also have a key and a value. The difference is that, in a Document Database, the value contains structured or semi-structured data. This structured/semi-structured value is referred to as a Document.
In semi-structured data, there is no separation between the schema and the data. Each document stored has its own unique schema that defines what it contains.
The database engine uses this structure of the stored data to create metadata that is used for database optimization and queries.
Consider an application to track patient records in a doctor’s office. A patient--a person--does not fit in a relational database row. There is no schema that can be used to describe every person on earth.
When visiting the doctor, data is generated and entered by multiple people. There’s insurance information, billing, height, weight, blood pressure, medications, and related information.
Defining a person’s medical history in rows is impractical and inefficient.
A more efficient way is to think of patient information as a collection of documents. At every appointment, a new document is added with updated information.
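To make the patient example concrete, here is a small sketch in which each appointment is a separate document with its own set of fields. The field names and values are invented for illustration and do not come from any real schema.

```python
# Each appointment is its own document; the fields differ from visit to visit.
patient_visits = [
    {
        "patient_id": "p-001",
        "date": "2023-01-10",
        "height_cm": 170,
        "weight_kg": 68,
    },
    {
        "patient_id": "p-001",
        "date": "2023-03-02",
        "blood_pressure": {"systolic": 120, "diastolic": 80},
        "medications": ["ibuprofen"],
    },
]

# A new appointment simply adds another document, with whatever it needs.
patient_visits.append(
    {
        "patient_id": "p-001",
        "date": "2023-06-15",
        "insurance": {"provider": "ExampleCare", "plan": "basic"},
    }
)

# The set of fields per document shows that no shared schema is required.
fields_per_visit = [sorted(doc.keys()) for doc in patient_visits]
```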
Document Stores scale horizontally. Data can be spread over multiple nodes that can number in the thousands.
One benefit that document store databases have over key-value databases is that, in a document store, the data inside the document can be queried.
This is different from a Key-Value store where a query returns the value in its entirety.
In a document store, queries can be run against the structure of a document as well as the elements inside it to return only the information required.
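The difference from a key-value lookup can be sketched with a small filter-and-project function. This is plain Python over a list of dictionaries, not the Amazon DocumentDB API; the `find` helper and the catalog data are hypothetical.

```python
from typing import Any


def find(
    documents: list[dict[str, Any]],
    criteria: dict[str, Any],
    fields: list[str],
) -> list[dict[str, Any]]:
    """Return only the requested fields of documents matching all criteria."""
    results = []
    for doc in documents:
        # Match on elements inside the document, not just on its key.
        if all(doc.get(name) == value for name, value in criteria.items()):
            # Project: keep only the fields the caller asked for.
            results.append({f: doc[f] for f in fields if f in doc})
    return results


catalog = [
    {"_id": 1, "type": "book", "title": "Dune", "price": 12, "pages": 612},
    {"_id": 2, "type": "game", "title": "Chess", "price": 20},
    {"_id": 3, "type": "book", "title": "Emma", "price": 9, "pages": 474},
]

books = find(catalog, {"type": "book"}, ["title", "price"])
```

Here `books` holds just the title and price of each book, rather than the entire stored value.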
Document Databases have a variety of use cases. They are used in web applications, for managing user-generated content, shopping catalogs, gaming, and for storing sensor data from IoT devices.
Where a relational database uses rows to store similar types of data, a Column Store is a type of NoSQL database that stores data using a column-oriented model.
On AWS, the NoSQL column store available as a managed service is Amazon Keyspaces.
Using columns allows the database to precisely access data needed to answer a query without having to scan each row in a table and discard unwanted items.
Column Store databases are also referred to as:
- Column databases
- Column-Family databases
- Column-Oriented databases
- Wide-Column Stores
- Columnar databases
- Columnar stores
A column store database uses a concept called a keyspace to define the data it contains.
A keyspace is similar to a relational database’s schema. The keyspace contains a collection of column families that look like tables from a relational database.
The column families contain rows and these rows contain columns.
A closer look at a column family shows:
- A Column Family consists of multiple rows.
- Each row can contain a different number of columns.
- Each column is limited to its row.
Columns are kept in their own row. They do not span all rows as the columns of a relational table do. Each column contains a name-value pair along with a timestamp.
Here’s how each row is constructed. From left to right there is a row key and one or more columns.
The row key is a unique identifier for that row.
Each column contains a name-value pair and a timestamp.
The timestamp is the date and time the data was inserted. This is often used to determine the most recent version of the data.
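The row layout described here can be sketched with two small data classes. The `Row` and `Column` names are hypothetical, and the rule that the newest timestamp wins is an assumption used for illustration rather than the behavior of any specific engine.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Column:
    name: str
    value: object
    timestamp: datetime  # when the value was written


@dataclass
class Row:
    row_key: str  # unique identifier for the row
    columns: dict[str, Column] = field(default_factory=dict)

    def write(self, name: str, value: object, timestamp: datetime) -> None:
        current = self.columns.get(name)
        # Assumption: the write with the most recent timestamp is kept.
        if current is None or timestamp >= current.timestamp:
            self.columns[name] = Column(name, value, timestamp)


t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 6, 1, tzinfo=timezone.utc)

ana = Row("user:1001")
ana.write("email", "ana@example.com", t1)
ana.write("city", "Seattle", t1)
ana.write("city", "Portland", t2)  # newer write replaces the old value
ana.write("city", "Boston", t1)    # older write is ignored

ben = Row("user:1002")             # a row with a different number of columns
ben.write("email", "ben@example.com", t1)
```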
Some Column-Family databases have composite columns that allow for objects to be nested inside a column.
Column stores are efficient at data compression and partitioning.
Due to their structure, columnar databases excel at doing aggregation-type queries. That is, they can SUM, COUNT, and calculate AVG values easily.
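A minimal sketch of why a column-oriented layout suits aggregation: when each column is stored together, a SUM, COUNT, or AVG reads one column and ignores the rest. The order data below is invented, and Python lists only stand in for on-disk column storage.

```python
# Row-oriented view of some hypothetical orders.
rows = [
    {"order_id": 1, "region": "west", "amount": 30.0},
    {"order_id": 2, "region": "east", "amount": 20.0},
    {"order_id": 3, "region": "west", "amount": 10.0},
]

# Column-oriented view: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# The aggregates touch only the "amount" column.
amounts = columns["amount"]
total = sum(amounts)
count = len(amounts)
average = total / count
```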
Columnar databases scale well. They are suitable for workloads that do Massively Parallel Processing, where data is spread across a large cluster of compute nodes that could number in the thousands.
Columnar stores can be loaded quickly and efficiently. A one-billion row table can be loaded into a columnar store in seconds, with queries and analysis starting almost immediately.
From an end-user perspective, the metadata in a columnar database looks and feels like a relational database. Some columnar database engines are SQL compliant and support the same controls that maintain the data’s state.
NoSQL databases tend to be either Key-Value type stores or Document stores. Columnar Store databases are neither.
Columnar databases are typically used with analytical applications, data warehousing, and Big Data processing.
In-Memory data stores are used by applications that require real-time access to data. Since the data is stored in memory, In-Memory stores provide microsecond latency to applications.
These stores are used as caches and the managed NoSQL service available from AWS is Amazon ElastiCache.
Amazon ElastiCache has two NoSQL In-Memory database engines: Redis and Memcached.
Before I go much further, I think it’s important to explain that a caching system is not a database. It is something that sits in front of a database to improve throughput. It also removes the need for putting a caching layer inside an application.
The primary purpose of an in-memory key-value store is to provide inexpensive access to data with sub-millisecond latency.
Most data stores have areas of data that are frequently accessed but rarely updated. Querying a database and getting the results from disk is always slower and more expensive than locating a key in a key-value pair cache.
Some relational database queries are expensive to perform. This might be a query that requires data from multiple tables or one that does a number of calculations before returning a result.
By caching query results, the cost of the query is only incurred once. The data can be returned multiple times without needing to run the query again.
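The idea of paying for a query once can be sketched as follows. A dictionary stands in for the cache and `expensive_query` is a hypothetical placeholder for a slow database call; neither is an ElastiCache API.

```python
cache = {}
stats = {"query_runs": 0}  # counts how often the slow path is taken


def expensive_query(customer_id: str) -> dict:
    """Placeholder for a slow multi-table query (hypothetical)."""
    stats["query_runs"] += 1
    return {"customer_id": customer_id, "order_total": 125.50}


def get_order_summary(customer_id: str) -> dict:
    key = f"order_summary:{customer_id}"
    # Serve from the cache when the result is already there.
    if key in cache:
        return cache[key]
    # Otherwise run the query once and remember the result.
    result = expensive_query(customer_id)
    cache[key] = result
    return result


first = get_order_summary("c-1")   # runs the query
second = get_order_summary("c-1")  # answered from the cache
```

After both calls, `stats["query_runs"]` is 1, because the second call never reaches the query.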
An In-Memory data store keeps its entire dataset in RAM rather than on disk. The reward is speed. However, there is a downside. The risk when using an In-Memory store is that when a machine goes down the data is lost.
Some In-Memory stores, like Redis, are able to add persistence for recovery by saving a transaction log to disk and taking snapshots of datasets stored in memory.
Cached data is stale data. It is important to know, before implementing an in-memory cache, if an application can tolerate stale data and, if it can, in what context.
As an example, if an application displays stock prices, customers might be willing to accept staleness with a disclaimer saying prices are delayed by 5 minutes. However, a stockbroker will want real-time data.
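One common way to bound staleness, shown here as an assumption rather than something the lecture prescribes, is to give each cached entry a time-to-live. The `TTLCache` class, the ticker name, and the price are all hypothetical; the clock is injectable so the example can simulate five minutes passing.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Toy cache whose entries expire after ttl_seconds (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: dict[str, tuple[Any, float]] = {}

    def put(self, key: str, value: Any) -> None:
        # Store the value together with its expiry time.
        self._items[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Too stale: drop it so the caller fetches fresh data.
            del self._items[key]
            return None
        return value


now = [0.0]  # fake clock, in seconds
prices = TTLCache(ttl_seconds=300, clock=lambda: now[0])
prices.put("EXAMPLE", 100.0)

fresh = prices.get("EXAMPLE")    # within 5 minutes: served from cache
now[0] = 301.0
expired = prices.get("EXAMPLE")  # past 5 minutes: treated as a miss
```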
Caching should provide a speed or cost advantage. It doesn't make sense to cache data that is dynamic or that is seldom accessed.
For caching to provide a benefit, data should be relatively static and frequently accessed like a personal profile on a social media site.
An in-memory store is well suited to be a front end for relational databases and key-value stores.
It can provide a high-performance middle-tier for applications having high request rates or low-latency requirements.
In-memory stores can be used to cache session data, web pages, and leaderboards.
This has been a high-level overview of Key-Value stores, Document stores, Column Family stores, and In-Memory stores available on AWS. In my next lecture, I'm going to continue the discussion and cover Graph databases, Time Series databases, Ledger databases, and Search databases.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for sharing his knowledge of cloud services with the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.