Differences Between AWS Database Types
In this section of the Cloud Practitioner learning path, we introduce you to the various Database services currently available in AWS that are relevant to the CLF-C01 exam.
- Identify and describe the various Database services available in AWS
- Understand the differences between relational and NoSQL databases
- Describe AWS-managed relational and NoSQL database services
This course is designed for anyone who is new to cloud computing, so no prior experience with AWS is necessary. While it may be helpful to have a basic understanding of AWS and its services, as well as some exposure to AWS Cloud design, implementation, and operations, this is not required as all of the concepts we will introduce in this course will be explained and reinforced from the ground up.
Hello and welcome to this lecture about the types of managed NoSQL databases available on AWS. Managed services are those where AWS handles the provisioning and maintenance on your behalf, according to your specifications.
This lecture is a continuation of the previous one and will serve as an overview and discussion of Graph databases, Time Series databases, Ledger databases, and Search databases.
Like the previous lecture, it will not cover implementation details. It is mostly a discussion of what’s possible.
I will provide an overview of each NoSQL technology, name the service from AWS that uses it, and provide a use case.
Let’s get started.
A Graph database is a database that uses a graph model to represent and store data about relationships. Relationship data is important for things such as building social networking applications and recommendation engines, detecting fraud, creating knowledge graphs, and modeling the life sciences.
The AWS managed NoSQL graph database is Amazon Neptune.
Graph databases are composed of three elements: vertices, edges, and properties.
Vertices, also called nodes, are objects such as people or artifacts. Each node in a graph database has a unique identifier and a set of properties expressed as key-value pairs.
The singular of vertices is vertex. A vertex can represent data such as integers, strings, people, locations, and buildings.
Edges represent the connection, or relationship, between two objects. Each edge is defined by a unique identifier and records its starting and ending nodes along with a set of properties.
The vertices and edges can each have properties associated with them. This allows a graph database to depict complex relationships between otherwise unrelated data.
Here is a simple graph database.
The circles are the vertices and the arrows represent edges. Each edge has a property that defines the relationship.
The composer node, John Williams, has relationships with a number of movies. However, he’s got two relationships with the movie Schindler’s List because he wrote the movie’s score and won an Academy Award for his work.
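To make the vertex, edge, and property idea concrete, here is a minimal sketch in plain Python. It is not how Amazon Neptune stores or queries data; the class names, identifiers, and edge labels are hypothetical and chosen only to mirror the John Williams example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vertex_id: str                      # unique identifier
    label: str                          # kind of object, e.g. "composer"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: str                        # unique identifier
    label: str                          # the relationship this edge describes
    source: str                         # id of the starting vertex
    target: str                         # id of the ending vertex
    properties: dict = field(default_factory=dict)


# Two vertices taken from the lecture example (ids are made up).
vertices = {
    "v1": Vertex("v1", "composer", {"name": "John Williams"}),
    "v2": Vertex("v2", "movie", {"title": "Schindler's List"}),
}

# Two separate edges connect the same pair of vertices.
edges = [
    Edge("e1", "wrote_score_for", "v1", "v2"),
    Edge("e2", "won_academy_award_for", "v1", "v2"),
]


def relationships(source_id, target_id):
    """Return the label of every edge that runs from source_id to target_id."""
    return [e.label for e in edges if e.source == source_id and e.target == target_id]


print(relationships("v1", "v2"))
```

The point of the sketch is that the two relationships are stored as two edges, each with its own label, rather than as extra columns on a single row.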
As more data is added to the database, the schema changes to match the relationships.
Depending on the product, graph databases can process either transactional or analytical workloads.
Graph databases can process large sets of user profiles and interactions to build social networking applications.
Graph databases can store relationships between customer interests, friends, and purchase history to create recommendations.
Use graph databases to process financial and purchase transactions in near real-time to detect fraud patterns.
A knowledge graph stores information in a graph model and uses graph queries to enable the navigation of highly connected datasets.
Use a knowledge graph to add topical information to product catalogs, build and query complex models of regulatory rules, or model general information.
Graph databases can be used to create applications that store and navigate the life sciences.
Use a graph database to map a computer network and answer questions about hosts and application usage. If a malicious file is on a host, a graph database could be used to find the connection between the hosts that spread the malicious file and trace it back to the host that downloaded it.
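The malicious-file scenario comes down to walking edges backwards. The sketch below assumes a made-up list of "spread to" edges between hypothetical hosts and uses a breadth-first search in plain Python; a real graph database would express this as a graph query instead.

```python
from collections import deque

# Hypothetical edges: the file spread from the first host to the second.
spread_edges = [
    ("host-a", "host-b"),
    ("host-b", "host-c"),
    ("host-b", "host-d"),
    ("host-d", "host-e"),
]


def trace_to_origin(infected_host, edges):
    """Follow edges backwards from infected_host to a host with no incoming edge.

    Returns the path as a list of hosts, starting at infected_host, or None
    if no origin can be found (for example, when the edges form a loop).
    """
    received_from = {}
    for source, destination in edges:
        received_from.setdefault(destination, []).append(source)

    queue = deque([[infected_host]])
    seen = {infected_host}
    while queue:
        path = queue.popleft()
        parents = received_from.get(path[-1], [])
        if not parents:
            return path  # nothing spread the file to this host, so it is the origin
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


print(trace_to_origin("host-e", spread_edges))
```

With this sample data, the trace from host-e runs back through host-d and host-b to host-a.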
Time-series Databases efficiently collect, synthesize, and derive insights from data that changes over time.
The AWS managed NoSQL database for time-series data is Amazon Timestream.
In a Time-Series Database, data is collected at regular intervals. Each reading is stored as the value, with the time it was taken as the key.
While it’s possible to retrieve a single item from time-series data, such as the price of an item, computation is usually applied over a range of time data to return a result.
The primary purpose of a Time-Series Database is to answer questions about how values change over time. A query will process a range of data, do the appropriate computations, and return the results.
For example, a query might determine the MIN, MAX, and AVG of CPU utilization on a database server over the past seven days.
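As a rough illustration of a range query, the sketch below keeps readings in a plain Python dictionary keyed by timestamp and computes the three aggregates over a seven-day window. The sample values, the variable names, and the fixed date are invented for the example; Amazon Timestream itself is queried with SQL rather than Python.

```python
from datetime import datetime, timedelta
from statistics import mean


def summarize(series, start, end):
    """Return (min, max, avg) of the values whose timestamp is in [start, end)."""
    values = [value for timestamp, value in series.items() if start <= timestamp < end]
    if not values:
        return None
    return min(values), max(values), mean(values)


# Hypothetical daily CPU utilization readings (percent), newest first.
now = datetime(2024, 1, 8)
samples = [40, 55, 70, 35, 60, 45, 80, 95, 20, 10]
cpu = {now - timedelta(days=age): pct for age, pct in enumerate(samples, start=1)}

# Only the seven most recent readings fall inside the window.
print(summarize(cpu, now - timedelta(days=7), now))
```

The time is the key and the reading is the value, so the query is simply a filter on the key followed by a computation over the matching values.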
Time-series databases are ideal for DevOps applications that collect data millions of times per second and analyze that data in real-time to improve application performance and availability.
Use Time-Series databases to quickly analyze time-series data generated by IoT applications using analytic functions such as smoothing, approximation, and interpolation.
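Smoothing and interpolation are easy to picture with a small example. The two functions below are a plain Python sketch of a trailing moving average and of linear interpolation between two readings; the function names are hypothetical and this is not the implementation any AWS service uses.

```python
def moving_average(values, window):
    """Smooth values with a trailing moving average over the given window size."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def interpolate(t0, v0, t1, v1, t):
    """Estimate the value at time t on the straight line between two readings."""
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Smooth four sensor readings using a window of two.
print(moving_average([10, 20, 30, 40], 2))

# Estimate a missing reading halfway between two known ones.
print(interpolate(0, 10.0, 10, 20.0, 5))
```

Smoothing reduces noise in a sensor feed, while interpolation fills in a reading that a device failed to report.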
Time-series databases can be used to store and analyze clickstream data to understand user activity across applications over a period of time.
Use a time-series database to store and analyze time-series data for industrial equipment maintenance, trade monitoring, fleet management, and route planning.
Ledger Databases provide a centralized and trusted authority to maintain a scalable, immutable, and cryptographically verifiable record of transactions for an application.
The AWS managed NoSQL ledger database is Amazon Quantum Ledger Database, or Amazon QLDB.
These databases maintain their trust, in part, by being fully auditable and transparent. All transactions are recorded in a log to track activity.
Ledger databases are immutable. This means that data in the database remains unchanged once saved. Updating data does not overwrite the existing record. Instead, it creates a new version of the record.
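The idea of updating without overwriting can be sketched with an append-only history. The class below is a hypothetical, in-memory illustration in Python, not the Amazon QLDB API: every save adds a new revision and earlier revisions stay readable.

```python
class VersionedRecords:
    """Keep every revision of a record instead of overwriting it."""

    def __init__(self):
        self._history = {}  # record id -> list of revisions, oldest first

    def save(self, record_id, data):
        """Append a new revision and return its version number (starting at 1)."""
        revisions = self._history.setdefault(record_id, [])
        revisions.append(dict(data))  # store a copy so callers cannot alter history
        return len(revisions)

    def current(self, record_id):
        """Return a copy of the latest revision."""
        return dict(self._history[record_id][-1])

    def history(self, record_id):
        """Return copies of all revisions, oldest first."""
        return [dict(revision) for revision in self._history[record_id]]


ledger = VersionedRecords()
ledger.save("account-1", {"balance": 100})
ledger.save("account-1", {"balance": 75})

print(ledger.current("account-1"))
print(ledger.history("account-1"))
```

After the second save, the current balance is 75, but the original revision with a balance of 100 is still part of the history.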
Cryptographic verification is used to prove that data has not been changed. When a record is committed, the database creates a hash of it.
Hashing is an algorithm performed on data to produce a number called a checksum or hash. This hash is used to verify that data has not been modified, tampered with, or corrupted.
No matter how many times the hashing algorithm is run against the data, the hash will always be the same when the data is the same.
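This property is easy to check with Python's standard hashlib module. The sketch below uses SHA-256 as an example algorithm, which is an assumption since the lecture does not name one, and the record contents are invented.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


record = b"pay 100 to account 42"
altered = b"pay 900 to account 42"

# The same data always produces the same hash.
print(fingerprint(record) == fingerprint(record))

# Changing even one character produces a different hash.
print(fingerprint(record) == fingerprint(altered))
```

The first comparison prints True and the second prints False, which is what lets a stored hash reveal that a record has been modified.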
Amazon QLDB uses blockchain-style hash chaining when creating hashes. This means it uses two pieces of information to create a hash value: the record data and the hash of the previous record. Because each hash depends on the one before it, changing any record changes every hash that follows, so the entire chain of records can be verified.
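A hash chain of this kind can be sketched in a few lines of Python. This is a simplified illustration, not the actual Amazon QLDB journal format: SHA-256, the all-zero starting hash, and the function names are assumptions made for the example.

```python
import hashlib

# Assumed placeholder used as the "previous hash" of the very first record.
GENESIS = "0" * 64


def chain_hash(record: str, previous_hash: str) -> str:
    """Hash the record together with the hash of the previous record."""
    return hashlib.sha256((previous_hash + record).encode("utf-8")).hexdigest()


def build_chain(records):
    """Return one hash per record, each depending on all records before it."""
    hashes = []
    previous = GENESIS
    for record in records:
        previous = chain_hash(record, previous)
        hashes.append(previous)
    return hashes


def verify_chain(records, hashes):
    """Recompute the chain and check that it matches the stored hashes."""
    return len(records) == len(hashes) and build_chain(records) == hashes


records = ["open account", "deposit 100", "withdraw 25"]
hashes = build_chain(records)
print(verify_chain(records, hashes))

# Altering an earlier record breaks verification.
tampered = ["open account", "deposit 900", "withdraw 25"]
print(verify_chain(tampered, hashes))
```

The first check prints True and the second prints False. Because the altered record also changes the hash of the record after it, the change cannot be hidden by recomputing a single hash.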
Anyone can create an audit log to show how data is used, but how can they legally prove that the data has not been altered?
Even with the best user interfaces and audit tracking, a skilled programmer can change electronic records without leaving a trace.
Blockchains can be used to build trust and ensure policy, governance, and regulation of data processes.
Banks often need a centralized ledger-like application to keep track of critical data such as credit and debit card transactions.
Instead of building a custom ledger with complicated auditing functionality, a ledger database can easily store an accurate and complete record of financial transactions.
Manufacturing companies have a need to reconcile data between supply chain systems to track the manufacturing history of a product.
A ledger database can be used to record the history of each transaction and provide the details of each individual batch of a product manufactured at a facility.
Insurance applications need a way to track the history of claims.
Instead of building complex auditing functionality using relational databases, insurance companies can use a ledger database to maintain the history of claims.
When conflicts arise, a ledger database can cryptographically verify the integrity of the claims data.
HR systems have to track and maintain a record of employee details such as payroll, bonus, benefits, performance history, and insurance.
By implementing a system-of-record application using a ledger database, companies can easily maintain a trusted and complete record of the digital history of employees in a single place.
Retailers need to access information on each stage of a product's supply chain.
With a ledger database, retail companies can track the full history of inventory and supply chain transactions.
Search engines help people find the information they need. Search databases are optimized to store and retrieve search-related data and typically offer specialized methods such as full-text search, complex search expressions, and the ranking of search results.
The AWS managed NoSQL search database is Amazon Elasticsearch Service, which has since been renamed Amazon OpenSearch Service.
Search databases securely ingest unstructured data from multiple locations, store and index it, and make it searchable.
Data ingestion is the process of taking raw data from a variety of sources, then parsing, normalizing, and enriching it.
The raw data sources include logs, system metrics, and web applications.
Once ingested, the data is indexed inside the search database. An index is a collection of documents that are related to each other. Elasticsearch stores data as JSON documents, each of which has a set of keys with corresponding values.
Elasticsearch uses a data structure called an inverted index that provides fast full-text searches.
An inverted index lists every unique word that appears in any document and identifies all of the documents where each word occurs.
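An inverted index is simple enough to sketch in plain Python. The example below is a toy version, not how Elasticsearch implements it: the sample listings are invented, words are split on anything that is not a letter or digit, and a search returns only documents that contain every query word.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each unique word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


# Hypothetical property listings keyed by document id.
documents = {
    1: "Three bedroom house near the park",
    2: "Two bedroom apartment downtown",
    3: "House with a large garden",
}
index = build_inverted_index(documents)

print(search(index, "bedroom house"))
print(search(index, "house"))
```

Looking up a word is a single dictionary access, which is why full-text search stays fast even when there are many documents.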
Search databases can be used to provide a fast, personalized search experience for applications, websites, and data lake catalogs.
A real estate business could use a search database to help people find homes in a desired location, at a chosen price range, from among thousands of properties.
Search databases can be used to store, analyze, and correlate application and infrastructure log data to find and fix issues.
Use search databases to analyze network and systems logs for real-time threat detection and incident management.
Well, that covers the types of fully-managed NoSQL databases available on AWS. It was a fair amount of information. However, it should give you an idea of what’s possible in the AWS cloud.
In my next lecture, I’m going to do a quick summary of relational and non-relational databases, the AWS fully-managed options, and their use cases.
I’ll also give you some options for your next steps, depending on your needs and interests.
This has been a high-level overview of Graph databases, Time Series databases, Ledger databases, and Search databases.
NoSQL databases let developers divide complex applications into manageable pieces and create rich experiences for customers.
In the final lecture in this series, I'll summarize what I've covered about the managed database types on AWS and give you an idea of what next steps you should take.
Thanks for watching.
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.
Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016, Stuart was awarded the ‘Expert of the Year Award 2015’ by Experts Exchange for sharing his knowledge of cloud services with the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.