In this course, we will introduce you to some common types of databases. Different data problems can be solved in a wide variety of ways, which is why so many different types of databases exist. After learning about a few different types of databases, you’ll be ready to explore specific implementations further.
During this course, I’ll introduce you to the following:
- SQL databases
- Key-value databases
- Graph databases
This course is for novice:
- Software engineers
- Data engineers
- DevOps engineers
- Site reliability engineers
You should have at least a conceptual understanding of programming and be comfortable with data structures, data types, etc.
Different types of databases serve different use cases. In this lesson, I’ll introduce you to document-oriented databases. Document-oriented databases are databases whose fundamental unit of data is called a document. Where SQL databases store data as rows of related columns, document-oriented databases store data as documents.
Documents are an example of an associative array. They consist of a collection of key-value pairs, which are often referred to as attributes or properties. I’ll use the term attributes going forward. Conceptually, document-oriented databases build on key-value databases. Key-value databases typically don’t make any considerations for the type of data being stored. They’re just a fancy lookup table: you get, set, or delete some data.
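To make that concrete, here’s a minimal sketch in Python. It assumes a document can be modeled as a dictionary of attributes and a key-value database as a plain lookup table. The `product` document, the `kv_store` name, and the `product:1` key are all made up for illustration.

```python
import json

# A document is an associative array: a collection of key-value pairs (attributes).
product = {
    "name": "Trail Backpack",
    "price": 79.99,
    "tags": ["hiking", "camping"],
}

# A key-value store is a lookup table. It doesn't care what the value contains,
# so the document is stored here as an opaque JSON string.
kv_store = {}

kv_store["product:1"] = json.dumps(product)  # set
raw = kv_store["product:1"]                  # get
restored = json.loads(raw)                   # the caller has to interpret the value
del kv_store["product:1"]                    # delete
```

Notice that the store itself never looks inside the value. Only the calling code knows it holds a product.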
Unlike key-value databases, document-oriented databases don’t just store data. They understand documents, which allows for behaviors such as individual attribute updates and attribute-based queries. Unlike SQL, though, there’s no standard query language for document-oriented databases. The syntax varies between specific database implementations.
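Here’s a rough sketch of those two behaviors, using an in-memory Python dictionary as a stand-in for a collection. The `users` data and the `update_attribute` and `find_by_attribute` helpers are hypothetical and don’t correspond to any particular database’s API.

```python
# An in-memory stand-in for a collection of documents, keyed by a made-up id.
users = {
    1: {"name": "Ada", "role": "admin"},
    2: {"name": "Grace", "role": "editor"},
    3: {"name": "Linus", "role": "admin"},
}

def update_attribute(collection, doc_id, attribute, value):
    """Change one attribute without replacing the rest of the document."""
    collection[doc_id][attribute] = value

def find_by_attribute(collection, attribute, value):
    """Return every document whose attribute equals the given value."""
    return [doc for doc in collection.values() if doc.get(attribute) == value]

update_attribute(users, 2, "role", "admin")
admins = find_by_attribute(users, "role", "admin")
admin_names = [doc["name"] for doc in admins]
```

A pure key-value store would need the caller to fetch the whole value, change it, and write it back. Here the store can work with individual attributes because it understands the document’s structure.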
Document-oriented databases are designed around the concept of a document as an atomic unit of data. Most document-oriented databases represent documents using the popular JSON data format or some similar variation. The popular MongoDB document database uses a format called BSON, which stands for Binary JSON. It follows the same basic structure as JSON with additional data types such as dates and binary data.
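One way to see why those extra data types matter: plain JSON has no date type. The sketch below uses only Python’s standard `json` module, not BSON itself, and the `reading` document is invented for the example.

```python
import json
from datetime import datetime, timezone

reading = {
    "sensor": "thermostat-7",
    "recorded_at": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
}

# Python's json module refuses to serialize a datetime value.
try:
    json.dumps(reading)
    json_accepts_dates = True
except TypeError:
    json_accepts_dates = False

# A common workaround is to store the date as an ISO 8601 string,
# which the reader then has to parse back into a date.
as_json = json.dumps({**reading, "recorded_at": reading["recorded_at"].isoformat()})
```

A format with a native date type avoids that string round trip, so the database can compare and sort dates as dates.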
Inside a document-oriented database, documents are stored in a collection. Different database implementations might use different terms for a collection. For example, they might be called tables, collections, namespaces, or something similar. Conceptually, however, they’re the same.
A collection is just a namespace for a group of related documents. For example, you might have a collection of products, articles, users, etc. Multiple collections can exist inside a single database. The name of a collection describes the type of documents being stored.
Documents inside of a collection aren’t required to use the same set of attributes. SQL databases, by contrast, enforce the constraints defined in the schema, such as columns having specific types and whether or not a column allows null values.
Document-oriented databases are considered to be schemaless because they don’t enforce any constraints when the data is saved to the database. Relational databases ensure that no data is saved if it doesn’t match what’s expected. Document-oriented databases don’t typically provide any such guarantees.
This shifts the way that developers have to work with data.
Any schema requirements need to be enforced in code.
Ensuring that a certain attribute isn’t null, or that it matches its expected data type, is up to the developer.
Also, code needs to consider the different types of documents in a collection.
If documents are going to include different attributes then developers are responsible for ensuring their code knows what to do with those attributes.
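As a minimal sketch of what enforcing a schema in code might look like, the Python below checks a document before it would be saved. The `REQUIRED_ATTRIBUTES` mapping and the `validate_user` function are hypothetical names chosen for this example, not part of any database library.

```python
# Hypothetical schema: attribute name -> expected Python type.
REQUIRED_ATTRIBUTES = {"name": str, "email": str, "age": int}

def validate_user(doc):
    """Return a list of problems; an empty list means the document is acceptable."""
    errors = []
    for attribute, expected_type in REQUIRED_ATTRIBUTES.items():
        value = doc.get(attribute)
        if value is None:
            errors.append(f"{attribute} must not be null")
        elif not isinstance(value, expected_type):
            errors.append(f"{attribute} must be of type {expected_type.__name__}")
    return errors

good = {"name": "Ada", "email": "ada@example.com", "age": 36}
bad = {"name": "Ada", "email": None, "age": "36"}

good_problems = validate_user(good)  # no problems found
bad_problems = validate_user(bad)    # null email, age stored as a string
```

The application would call a check like this before every write, because the database won’t reject the bad document on its own.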
By removing the schema from the database, developers are able to determine how concepts should be modeled in code, which turns out to be quite useful for a wide range of use cases.
Let’s review a couple of example use cases. Document-oriented databases are commonly used for internet-of-things (or IoT) applications.
Imagine having thousands of devices regularly sending some application different types of telemetry data. A document database is a good choice in this use case because the exact data might differ between devices, depending on the purpose of each device. The attributes might vary based on the device; however, the data is all still a part of the same telemetry collection.
Once saved, this collection includes different types of documents, and it’s up to developers to know which types of documents exist and the purpose of each attribute.
Gaming is another common use case. Imagine a role-playing game where defeating a digital monster rewards a player with some item. A document database could be a good fit for storing items because items can have some common attributes and some specific attributes, depending on the use case. Consider that different types of consumables might operate on different properties, for example, a health potion vs. a speed boost potion. They’re both part of the same rewards collection. However, they include different attributes based on how the data will be used.
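Here’s a small Python sketch of that idea. The `rewards` list stands in for the rewards collection, and the `type` attribute and the other attribute names are assumptions made for the example.

```python
# Two documents in the same collection with different attributes.
rewards = [
    {"type": "health_potion", "name": "Minor Healing", "health_restored": 25},
    {"type": "speed_potion", "name": "Swift Draught",
     "speed_multiplier": 1.5, "duration_seconds": 30},
]

def describe(item):
    """The consuming code decides which attributes matter for each item type."""
    if item["type"] == "health_potion":
        return f"{item['name']}: restores {item['health_restored']} health"
    if item["type"] == "speed_potion":
        return (f"{item['name']}: x{item['speed_multiplier']} speed "
                f"for {item['duration_seconds']}s")
    return f"{item['name']}: unknown item type"

descriptions = [describe(item) for item in rewards]
```

Both documents share the common `type` and `name` attributes, and each carries only the extra attributes its own behavior needs.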
This is where document-oriented databases shine, because they allow documents to be created with whichever attributes are required by the consuming code.
There are several common document-oriented databases, though I’m going to call out two standouts that are so widely used they’re worth knowing about. The first option is called MongoDB. It was released in 2009 and has since become one of the most popular document databases.
It’s known for its high-availability configurations and its ability to query documents. MongoDB allows documents to be queried by specifying an attribute, a value, and an optional operator. Consider these animal documents. The following query with a key of “animal” and a value of “cat” will return documents where the animal attribute exactly matches “cat”.
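The on-screen example isn’t part of this transcript, so here’s a stand-in sketch in Python. The query document `{"animal": "cat"}` has the same shape MongoDB’s `find` accepts, but the sample documents, the `name` and `weight` attributes, and the in-memory `find` helper are assumptions for illustration. The helper only mimics exact matching and isn’t MongoDB itself.

```python
# Made-up animal documents.
animals = [
    {"animal": "cat", "name": "Whiskers", "weight": 4},
    {"animal": "dog", "name": "Rex", "weight": 20},
    {"animal": "cat", "name": "Tom", "weight": 6},
]

# Key "animal", value "cat": match documents whose animal attribute equals "cat".
query = {"animal": "cat"}

def find(documents, query):
    """Return documents where every queried attribute exactly matches its value."""
    return [
        doc for doc in documents
        if all(doc.get(key) == value for key, value in query.items())
    ]

cats = find(animals, query)
cat_names = [doc["name"] for doc in cats]
```

Against a real MongoDB collection, the same query document would be passed to the collection’s `find` method.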
Here’s another example using an operator.
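Again as a stand-in for the on-screen example, this Python sketch uses a query document in the shape MongoDB accepts, with `$gt` nested under the attribute being compared. The sample documents, the `weight` attribute (assumed to be in kilograms), and the `matches` helper are assumptions. The helper handles only exact matches and `$gt`.

```python
# Made-up animal documents; weight is assumed to be in kilograms.
animals = [
    {"animal": "cat", "name": "Whiskers", "weight": 4},
    {"animal": "dog", "name": "Rex", "weight": 20},
    {"animal": "cat", "name": "Tom", "weight": 6},
]

# Exact match on animal, plus a greater-than condition on weight.
query = {"animal": "cat", "weight": {"$gt": 5}}

def matches(doc, query):
    """Check a document against exact-match and $gt conditions only."""
    for attribute, condition in query.items():
        value = doc.get(attribute)
        if isinstance(condition, dict) and "$gt" in condition:
            if value is None or not value > condition["$gt"]:
                return False
        elif value != condition:
            return False
    return True

heavy_cats = [doc["name"] for doc in animals if matches(doc, query)]
```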
Notice the ‘dollar-sign gt’ operator here, which stands for ‘greater than.’ This query builds on the previous query, returning all cats weighing more than 5 kilograms. MongoDB includes many different operators, allowing documents to be queried in a similar manner to SQL, though with a different syntax.
The next document-oriented database option I want to introduce you to is called Elasticsearch. It was released in 2010 and has become widely used across a range of applications.
Here’s what Elasticsearch has to say about itself.
“Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data for lightning-fast search, fine‑tuned relevancy, and powerful analytics that scale with ease.”
Elasticsearch is a document-oriented search engine with a wide range of search capabilities. Being a general-purpose search engine opens up many different use cases. Some examples of common data sets include application logs, product catalogs, map data, etc.
Documents can be searched using full-text search, with support for wildcards and date and numeric ranges, among other features.
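As a rough sketch, here’s what a search request body combining those features might look like, written as a Python dictionary. The `match`, `wildcard`, `range`, and `bool` clauses come from Elasticsearch’s Query DSL. The field names `message`, `host`, `timestamp`, and `status_code` are made up for an imagined application-log index.

```python
import json

search_body = {
    "query": {
        "bool": {
            # Full-text search on a hypothetical log message field.
            "must": [
                {"match": {"message": "connection timeout"}},
            ],
            "filter": [
                # Wildcard match on a hypothetical host field.
                {"wildcard": {"host": {"value": "web-*"}}},
                # Date range.
                {"range": {"timestamp": {"gte": "2024-01-01", "lt": "2024-02-01"}}},
                # Numeric range.
                {"range": {"status_code": {"gte": 500}}},
            ],
        }
    }
}

# Elasticsearch is RESTful, so the body is sent as JSON to an index's _search endpoint.
request_json = json.dumps(search_body)
```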
There are many other document-oriented databases available, including many cloud-based options. Document-oriented databases can provide a lot of value for specific types of applications. However, each implementation can be quite different.
Different implementations provide different data consistency guarantees and query functionality, which means engineers have to be mindful of the needs of their applications when selecting a database.
Okay, this seems like a natural stopping point. Here are your key takeaways for this lesson:
- Document-oriented databases treat a document as an atomic unit of data.
- Documents consist of attributes describing the concept being modeled.
- Documents are stored inside of collections.
- Documents are not required to adhere to a specific schema, which allows them to include whatever data is required.
- Removing the schema shifts the burden of data integrity to developers.
- Enforcing that attributes aren’t null must happen in code.
- Specific functionality varies between database implementations.
- MongoDB includes a robust query mechanism.
- Elasticsearch includes feature-rich document searching.
Okay, that's going to be all for this lesson. Thanks so much for watching. And I’ll see you in another lesson!
Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.