This course covers the core learning objectives needed to meet the requirements of the 'Designing Database solutions in AWS - Level 1' skill:
- Understand when to install databases on Amazon EC2 instances compared to using AWS managed databases
- Understand the differences between the various AWS Database Types
- Analyze when to use Amazon RDS and Amazon DynamoDB for a given workload
Relational databases are highly structured repositories of data. They use schemas to define how information is organized and that schema must exist before the database can even be created.
This fixed nature of data structures makes relational databases sub-optimal for analytical processes where data is semi-structured or unstructured.
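To make the schema-first requirement concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name customers, its columns, and the helper insert_with_nickname are hypothetical, chosen only for illustration.

```python
import sqlite3

# An in-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema has to be declared before any row can be stored.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Ana"))


def insert_with_nickname(connection):
    """Try to store a field the schema does not define; report whether it worked."""
    try:
        connection.execute(
            "INSERT INTO customers (id, name, nickname) VALUES (?, ?, ?)",
            (2, "Ben", "Benny"),
        )
        return True
    except sqlite3.OperationalError:
        # The engine rejects columns that are not part of the schema.
        return False
```

Calling insert_with_nickname(conn) fails until the table itself is altered, which is the rigidity described above.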
While relational databases are highly-structured repositories of information, non-relational databases do not use a fixed table structure. They are schema-less.
Since it doesn’t use a predefined schema that is enforced by a database engine, a non-relational database can use structured, semi-structured, and unstructured data without difficulty.
NoSQL is a general term that refers to a particular type of database model. It encompasses a wide variety of different models that don’t fit into the relational model.
Non-relational NoSQL-type databases have been around since the 1960s, but it wasn’t until the early 2000s that the NoSQL approach started to have broad appeal and a new generation of NoSQL systems began to hit the market.
Today, the term NoSQL describes a family of schema-less, non-relational, distributed data stores.
NoSQL databases are popular with developers because they do not require an upfront schema design; they are able to build code without waiting for a database to be designed and built.
It’s this flexibility, a dynamic approach to organizing data, that has been popular with companies needing to store unstructured or rapidly changing data.
The term NoSQL has two meanings. In the beginning, it described databases that used mechanisms other than SQL to manage data.
There was “No SQL” used when accessing and manipulating data.
The definition has been expanded to mean, “Not Only SQL.” Some systems use SQL along with other technologies and query languages.
There are people that argue that the one thing all NoSQL databases have in common is that they’re non-relational and that a better name would be, “NoREL.”
Personally, I don’t think I have enough free time to care that much about it.
NoSQL databases, in general, share a few basic characteristics.
They are non-relational, open-source, schema-less, horizontally scalable, and do not adhere to ACID constraints.
Most NoSQL databases access data using their own Application Programming Interface, API. However, some NoSQL databases use a subset of SQL for data management.
In many cases, the non-relational model is a good fit for an application’s requirements.
The data might be unstructured or semi-structured. The amount of data might be impractical for a relational database. Or, the data might be of one single type and doesn’t need the controls that come with a relational database.
Being open source is not a requirement of NoSQL databases. It’s more of a NoSQL observation. There are many relational and non-relational databases that are open-source projects. However, the developers of NoSQL databases lean towards providing open-source solutions.
Most NoSQL databases have no fixed schema.
Relational databases require a schema to be designed before the database is created. NoSQL databases don’t. Instead, schemas can be created dynamically as data is accessed or embedded into the data itself.
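As a rough illustration of a schema that lives in the data itself, the sketch below stores records as plain Python dictionaries and works out the field names only when the data is read. The records and the function discover_fields are hypothetical and not tied to any particular NoSQL product.

```python
# Hypothetical records: each one carries its own field names.
records = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben", "nickname": "Benny"},
    {"id": 3, "tags": ["vip", "beta"]},
]


def discover_fields(items):
    """Build the set of field names by reading the data (schema on read)."""
    fields = set()
    for item in items:
        fields.update(item.keys())
    return fields
```

A record with a brand-new field can be appended at any time, and discover_fields simply reports the extra name on the next read.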
NoSQL databases have a reputation for being more flexible with the data they can accept and support agile and DevOps philosophies.
NoSQL databases are often run in clusters of computing nodes.
Data is partitioned across multiple computers so that each computer can perform a specific task independently of the others.
Each node performs its task without having to share CPU, memory, or storage with other nodes.
This is known as a shared-nothing architecture.
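One common way to spread data across nodes is to hash each key and use the result to pick a node. The sketch below assumes that approach; the lecture does not say which partitioning scheme any given database uses, and the Node and Cluster classes are hypothetical stand-ins for real servers.

```python
import hashlib


class Node:
    """Stand-in for one machine; its storage is private to the node."""

    def __init__(self, name):
        self.name = name
        self.storage = {}


class Cluster:
    """Routes every key to exactly one node based on a hash of the key."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def _node_for(self, key):
        # A stable hash, so the same key always lands on the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key).storage[key] = value

    def get(self, key):
        return self._node_for(key).storage.get(key)
```

Because each key is owned by a single node, a read or write touches only that node's storage, which is the shared-nothing idea in miniature.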
Most NoSQL databases relax ACID constraints found in relational databases.
NoSQL solutions were developed around the purpose of providing high availability and scalability in a distributed environment.
To do this, either consistency or durability has to be sacrificed. By relaxing consistency, distributed systems can be highly available and durable.
Using a NoSQL approach, inconsistent data is expected. There’s no problem as long as it’s recognized and managed appropriately.
Currently, there is no standard query language that is supported by all NoSQL databases.
NoSQL databases are a family of non-relational databases that include Key-Value Databases, Column Family Stores, Document Stores, and Graph Stores.
Key-Value databases are the simplest NoSQL data stores to use from an API perspective. Using a RESTful API, a client can get the value for the key, put a value for a key, or delete a key from the data store.
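The three operations can be sketched with an in-memory class. This is an illustration only: KeyValueStore is a hypothetical name, and a real key-value database would expose these operations over a network API rather than as local method calls.

```python
class KeyValueStore:
    """Minimal key-value store offering get, put, and delete."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # Store the value, replacing any previous value for the key.
        self._items[key] = value

    def get(self, key, default=None):
        # Return the stored value, or the default when the key is absent.
        return self._items.get(key, default)

    def delete(self, key):
        # Remove the key; report whether it was present.
        if key in self._items:
            del self._items[key]
            return True
        return False
```

The store never inspects the value, so it can hold a string, a number, or a whole serialized object under a single key.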
A Document Store Database is a database that uses a document-oriented model to store information. Each document contains semi-structured data that can be queried. Essentially, the schema for the data is built into the document, itself, and can change as needed.
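As a small sketch of the document model, the example below parses two JSON documents with different fields and filters them by a field value. The documents and the find helper are hypothetical; real document stores provide their own query languages.

```python
import json

# Two documents with different shapes; neither follows a shared schema.
documents = [
    json.loads(text)
    for text in (
        '{"title": "Intro to NoSQL", "author": {"name": "Ana"}, "tags": ["database"]}',
        '{"title": "Graph basics", "pages": 12}',
    )
]


def find(docs, field, value):
    """Return documents whose top-level field equals value; missing fields never match."""
    return [doc for doc in docs if field in doc and doc[field] == value]
```

A query on a field that only some documents have, such as pages, simply skips the documents that lack it.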
A Graph Store is a database that uses a graph model to represent and store information. It has two primary components: vertices, which represent entities, and edges, which represent the relationships between them.
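A minimal sketch of the two components might look like this. The Graph class, its method names, and the labeled, directed edges are assumptions made for illustration and do not reflect any specific graph database.

```python
from collections import defaultdict


class Graph:
    """Vertices hold properties; edges are labeled, directed links between vertices."""

    def __init__(self):
        self.vertices = {}
        self.edges = defaultdict(list)  # source id -> list of (label, target id)

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, source, label, target):
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must be existing vertices")
        self.edges[source].append((label, target))

    def neighbors(self, source, label):
        # Follow only the edges that carry the requested label.
        return [target for (edge_label, target) in self.edges.get(source, []) if edge_label == label]
```

Questions about relationships, such as who a person follows, become a walk along edges rather than a join across tables.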
Those are some of the types of NoSQL databases that are available, and I'll eventually cover them in more detail, but why use them? What advantages do NoSQL databases have over relational databases?
Scaling a NoSQL database is easier and less expensive than scaling a relational database because the scaling is horizontal instead of vertical. In general, for relational databases to scale, they must add memory, CPU, or storage. This is vertical scaling. However, NoSQL scaling is done by adding a compute or disk node. This is horizontal scaling.
NoSQL databases generally trade consistency for performance and scalability.
Relational databases have four properties that support reliability. These properties, commonly referred to as ACID, are atomicity, consistency, isolation, and durability.
Consistency refers to the database's state. In a relational database, a transaction takes a database from one valid state to another valid state. With most NoSQL databases, it's possible for data to be inconsistent; a query might return old or stale data.
You might hear this phenomenon described as being eventually consistent. Over time, data that is spread across storage nodes will replicate and become consistent. What makes this behavior acceptable is that developers can anticipate this eventual consistency and allow for it. That said, some NoSQL databases do support strong consistency.
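The behavior can be simulated in a few lines. This sketch assumes a single primary copy that accepts writes and replicas that catch up only when replicate is called; the class and method names are hypothetical, and real systems replicate continuously in the background.

```python
class EventuallyConsistentStore:
    """Toy model: writes hit the primary first and reach replicas later."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the replicas

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, index, key):
        # May return old data if replication has not caught up.
        return self.replicas[index].get(key)

    def read_strong(self, key):
        # Reading the primary always reflects the latest write.
        return self.primary.get(key)

    def replicate(self):
        # Apply queued writes in order so every replica converges on the primary.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()
```

Between a write and the next replicate call, a replica read returns the older value while read_strong returns the new one, which is the stale-read window a developer has to plan for.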
To review, NoSQL is a general term that refers loosely to a particular type of database model, or database management system.
NoSQL databases generally share a number of characteristics. They are non-relational, open-source, schema-less, and horizontally scalable.
Additionally, NoSQL databases do not generally adhere to the ACID principles found in relational databases and most do not use SQL to access data.
This is a good time to discuss the types of fully-managed NoSQL databases available from AWS. Or, it would be, but this is the end of this lecture.
In the next lecture, I'm going to describe, in some detail, the types of managed NoSQL databases available on AWS. It won't be overly technical. It’s a discussion, really, about what’s possible and how to start thinking about your data.
Stephen is the AWS Certification Specialist at Cloud Academy. His content focuses heavily on topics related to certification on Amazon Web Services technologies. He loves teaching and believes that there are no shortcuts to certification but it is possible to find the right path and course of study.
Stephen has worked in IT for over 25 years in roles ranging from tech support to systems engineering. At one point, he taught computer network technology at a community college in Washington state.
Before coming to Cloud Academy, Stephen worked as a trainer and curriculum developer at AWS and brings a wealth of knowledge and experience in cloud technologies.
In his spare time, Stephen enjoys reading, sudoku, gaming, and modern square dancing.