Azure Data Fundamentals
Microsoft Azure offers services for a wide variety of data-related needs, including ones you would expect, like file storage and relational databases, as well as more specialized services for needs such as text search and time-series data. In this course, you will learn which services to choose when implementing a data infrastructure on Azure. Two services that are especially important are Azure SQL Database and Azure Cosmos DB.
Learning objectives: Identify the most appropriate Azure services for various data-related needs
Intended audience: People who want to learn Azure fundamentals
Prerequisites: General knowledge of IT architecture, especially databases
Cosmos DB is a pretty amazing database service. It used to be called DocumentDB, but Microsoft added so much new functionality to it that they had to rename it. Cosmos DB is such an unusual database service that it can take a while to understand it. I’ll try to sum it up as briefly as I can.
Cosmos DB’s two most important features are that it’s global and it’s multi-model. Not only is it extremely easy to replicate a Cosmos DB database to multiple regions around the world, but Microsoft even provides service level agreements for latency, which isn’t something you can get with most database services.
The multi-model feature may be even more innovative. Cosmos DB supports many different types of data models, including document, key-value, graph, and columnar. It isn’t the only database system that supports multiple models, but it’s the first to be offered as a global cloud service.
As I mentioned in the NoSQL lesson, Azure Table storage now has a premium offering as part of Cosmos DB. It’s accessed using the Table API. It’s still a key/attribute store, but it offers these additional features:
- Global distribution
- Dedicated throughput worldwide
- Single-digit millisecond latencies at the 99th percentile
- Guaranteed high availability
- Automatic secondary indexing
These features are not unique to the Table API. They are inherent capabilities of all Cosmos DB data models.
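To make the "99th percentile" wording concrete, here's a minimal sketch in plain Python. The latency numbers are made up for illustration, and the nearest-rank method is just one common way to compute a percentile. It isn't necessarily how Azure measures its SLA.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the smallest sample that at least pct percent of samples do not exceed."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based nearest rank, computed so that whole-number cases stay exact.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical read latencies in milliseconds: 99 fast requests and 1 slow outlier.
latencies_ms = [3] * 60 + [6] * 38 + [9, 250]

p99 = percentile_nearest_rank(latencies_ms, 99)
print("p99 latency (ms):", p99)
print("worst latency (ms):", max(latencies_ms))
```

With this sample data, the 99th percentile is still a single-digit number even though the slowest request took 250 milliseconds. That's the point of a percentile guarantee: it covers the vast majority of requests, not every single one.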
Next, let’s look at the document data model. It’s very similar to the key/value model, but it provides richer querying capabilities. At the moment, Cosmos DB supports two options for document databases: the MongoDB API and the SQL API.
MongoDB is currently the most popular document database and it’s open source. It stores data in a JSON-like format, without a schema. With the MongoDB API, you can take applications that were written to work with a MongoDB database and point them to a Cosmos DB database instead. In many cases, you only need to change a connection string to make this work.
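Here's a rough sketch of what "only change a connection string" can look like when an application reads its connection string from configuration. The setting name MONGODB_URI, the host names, and the credentials are all hypothetical placeholders. The real Cosmos DB connection string comes from your account settings in the Azure portal.

```python
from urllib.parse import urlsplit

# Hypothetical placeholder values, not real endpoints or credentials.
ENVIRONMENTS = {
    "self_hosted": {
        "MONGODB_URI": "mongodb://app_user:PLACEHOLDER@mongo.example.internal:27017/",
    },
    "cosmos_db": {
        "MONGODB_URI": "mongodb://example-account:PLACEHOLDER@example-account.example.invalid:10255/?ssl=true",
    },
}


def load_mongo_uri(env):
    """Return the connection string from configuration after a basic sanity check."""
    uri = env["MONGODB_URI"]
    if urlsplit(uri).scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError("expected a MongoDB connection string")
    return uri


def describe_target(uri):
    """Return the (host, port) that the connection string points at."""
    parts = urlsplit(uri)
    return parts.hostname, parts.port


# The application code is identical in both cases; only the configured value differs.
for name, env in ENVIRONMENTS.items():
    print(name, describe_target(load_mongo_uri(env)))
```

In a real application, the string returned by load_mongo_uri would be handed to whatever MongoDB driver the application already uses, and the rest of the code would stay the same.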
Cosmos DB’s second document database offering is the SQL API. This used to be called the DocumentDB API. The new name is actually kind of confusing because a document database is technically a NoSQL database. So isn’t it a contradiction to call it the SQL API? Well, it’s called that because it lets you use a SQL-like language to query JSON documents (which is how Cosmos DB stores the data).
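To give you a feel for it, here's a small sketch of a SQL-like query over JSON documents. The query text uses the SQL API's style of syntax, but the documents and field names are made up, and the filtering runs in plain Python on an in-memory list so you can see what the query means without connecting to a Cosmos DB account.

```python
import json

# The kind of query text you would send to the service (hypothetical fields).
QUERY = "SELECT c.id, c.name FROM c WHERE c.city = @city"
PARAMETERS = [{"name": "@city", "value": "Seattle"}]

# Hypothetical schema-free JSON documents; note they don't all have the same fields.
RAW_DOCUMENTS = [
    '{"id": "1", "name": "Ana", "city": "Seattle", "tags": ["admin"]}',
    '{"id": "2", "name": "Ben", "city": "Austin"}',
    '{"id": "3", "name": "Chen", "city": "Seattle"}',
    '{"id": "4", "name": "Dee"}',
]
DOCUMENTS = [json.loads(raw) for raw in RAW_DOCUMENTS]


def run_locally(documents, city):
    """Plain-Python equivalent of QUERY: filter on city, project id and name."""
    return [
        {"id": doc["id"], "name": doc["name"]}
        for doc in documents
        if doc.get("city") == city
    ]


city_value = PARAMETERS[0]["value"]
results = run_locally(DOCUMENTS, city_value)
print(results)
```

Notice that the document with no city field is simply skipped. There's no schema to violate, which is what makes this a document database even though the query language looks like SQL.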
Graph databases are more complex than document databases because they also store information about relationships between entities. For example, two people may be connected to each other on LinkedIn or Facebook. A graph database stores this relationship data separately from the user entities.
Graphs are a great way to model many real-world relationships, but you can still find ways to model them using other types of databases. Where graph databases really shine is with applications that need to traverse a graph from one entity to another, with other entities in between, such as a friend of a friend. Graph databases can perform these operations orders of magnitude faster than other types of databases. Some examples of application types that can benefit from graph databases are social networking, content management, geospatial, and product recommendations.
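Here's a minimal sketch of a friend-of-a-friend traversal, written in plain Python over an in-memory adjacency map. The names and connections are made up, and this isn't how you'd query Cosmos DB itself. It just shows the kind of hop-by-hop work a graph database is built to do efficiently.

```python
# Hypothetical social graph: each person maps to the set of people they're connected to.
GRAPH = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, person):
    """Return people exactly two hops away who are not already direct friends."""
    direct = graph.get(person, set())
    result = set()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the starting person and anyone they already know directly.
            if candidate != person and candidate not in direct:
                result.add(candidate)
    return result


print(sorted(friends_of_friends(GRAPH, "alice")))
```

In a relational database, each hop like this typically turns into another self-join, which is why deeper traversals get expensive there.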
The last data model supported by Cosmos DB is the columnar model, which is also known as the wide-column or column-family model. This is the model used by the Apache Cassandra database system. Cosmos DB provides the Cassandra API for applications that are written to use a Cassandra database.
Cosmos DB also provides an SLA on availability. It guarantees 99.99% availability for both single-region and multi-region databases. It also provides a 99.999% read availability SLA on multi-region databases.
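To put those percentages in perspective, here's a quick calculation of how much downtime each one would allow. This is a sketch that assumes a 365-day year for illustration. The actual SLA terms define their own measurement period.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def max_downtime_minutes_per_year(availability_percent):
    """Return the downtime per year that a given availability percentage allows."""
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


for sla in (99.99, 99.999):
    minutes = max_downtime_minutes_per_year(sla)
    print(f"{sla}% availability allows about {minutes:.2f} minutes of downtime per year")
```

So 99.99% works out to roughly 52.56 minutes per year, and 99.999% to roughly 5.26 minutes per year.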
It can provide such strong HA guarantees because of its highly redundant architecture. Not only is the database replicated across regions, but it’s also replicated within each region.
In the unlikely event of a regional failure, Azure will automatically fail over your Cosmos DB databases to another region. You can make this process more efficient by setting a preferred regions list for each region. For example, if the West US region goes down and North Europe is the next region on its preferred regions list, then that’s where Azure will fail over to. When West US comes back online, Azure will automatically perform a recovery and bring the database back online in that region.
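The selection logic behind a preferred regions list can be sketched like this. It's only an illustration of the idea of picking the first healthy region on the list, using made-up data structures. It isn't Azure's actual implementation or API.

```python
# Hypothetical configuration: for each region, the ordered list of regions to fail over to.
PREFERRED_REGIONS = {
    "West US": ["North Europe", "Southeast Asia"],
    "North Europe": ["West US", "Southeast Asia"],
}


def next_failover_region(failed_region, preferred_regions, offline_regions):
    """Return the first region on the failed region's list that is still online."""
    for candidate in preferred_regions.get(failed_region, []):
        if candidate not in offline_regions:
            return candidate
    return None  # No healthy region is available on the list.


offline = {"West US"}
target = next_failover_region("West US", PREFERRED_REGIONS, offline)
print("Fail over to:", target)
```

If North Europe were also offline, the same function would move on to the next entry on the list.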
And that’s it for Cosmos DB.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).