Introduction to Cosmos DB
Introduction to Using Cosmos DB
Introduction to Creating an App with Cosmos DB
Cosmos DB is one of many database solutions in a crowded market. From DynamoDB to Cassandra to CockroachDB, the questions one would naturally ask when examining Cosmos DB are, “what makes it special and how can I get started with it?”
This course answers both of those questions as thoroughly and concisely as possible. This course is for anyone with an interest in database technologies or creating software using Microsoft Azure. Whether you are a DevOps engineer, a database admin, a product manager, or a sales expert, this course will help you learn how to power your technology with one of Azure's most innovative data solutions.
From this course, you will learn how to make use of Cosmos DB's incredible flexibility and performance. This course is made up of 9 comprehensive lectures to guide you from the basics to how to create your own app using Cosmos DB.
Learning Objectives
- Learn the basic components of Cosmos DB
- Learn how to use Cosmos DB via the web portal, libraries, and CLI tools
- Learn how to create an application with Cosmos DB as the data backend
Intended Audience
- People looking to build applications using Microsoft Azure
- People interested in database technologies
Prerequisites
- General knowledge of IT architecture
- General knowledge of databases
The first and most obvious prerequisite for using Cosmos DB is having a Microsoft Azure account. Once that's ready to go, you'll need to create a Cosmos DB account. This is as simple as clicking Create a resource in the top left of the web UI, then clicking Databases, and finally Cosmos DB. You'll then be prompted to input some basic information about your new Cosmos DB resource: a name to uniquely identify it within your account, a resource group name, a region, an option for a replica failover region, and, crucially, an API type, which will define what sort of data model you use. Recall that you can select from MongoDB, Cassandra, SQL, Azure Table, and Gremlin. Once everything is filled in, click Create and wait a few minutes. At this point, you have a Cosmos DB resource that's ready to use, but no actual database. So how do we actually create a database and start writing and reading data?
Well, we can take many different approaches. We could use a programming language like Python with the Cosmos DB API, or we could use PowerShell or the Azure command-line tool; we'll cover those options in the following sections. For our less technical friends, the easiest way to use Cosmos DB is through either Azure Storage Explorer or Data Explorer. So what exactly are Data Explorer and Storage Explorer, and what's the difference? Here's the simple version. Storage Explorer is a downloadable tool that provides a graphical user interface for managing Azure storage. It works not only with Cosmos DB, but also with Data Lake, Azure Queues, and other file and blob storage components. So it's a great option if you are storing data in many places and want a single, simple piece of software to track everything. Data Explorer, by contrast, is specific to Cosmos DB and works in the browser. You don't download or install anything. Through the Data Explorer web UI, you can point and click to create databases, browse data, make configuration changes to your Cosmos DB resources, or whatever else you need to do.
Data Explorer is a new-ish feature. It's still technically in preview as of the time of this course, so keep that in mind. So which should you use? Honestly, it doesn't matter a lot. The feature sets are pretty similar for basic tasks. Data Explorer is better suited to Cosmos DB specifically, though, and it has the benefit of working right in the browser. So for our purposes, we're going to focus on Data Explorer, but feel free to download Storage Explorer and play around with it. See the linked documentation for the quick start guides. Setting up your first database using Data Explorer is pretty simple. After clicking Data Explorer in the left-hand menu, you'll see a New Collection option in the middle of the page. Click that, then give a name to your database and a name to its first collection. You'll also see a storage capacity allotment in gigabytes, and you'll have to set a throughput. This latter value is really important. It's measured in RUs, request units, which we talked about in the last section. It defines the number of requests you can serve and what sort of latency you will get. Now let's take a moment to fully understand how request units work.
How many should we set for our account? What's the formula? Azure offers some good general advice in the documentation, so be sure to check out the links. They also offer a nice little request unit calculator where you can upload a sample JSON document and input some parameters, such as the expected reads, deletes, and updates per second. The calculator will then spit out an estimated optimal value for how many RUs you need. The basic formula is to multiply the number of desired reads per second by the average item size in kilobytes, perform the same calculation for desired writes per second, and then add those two numbers together. So, let's do a practical example. If your item size is, say, two kilobytes on average and you need 300 reads per second and 500 writes per second, you would get 600 and 1,000 respectively, and therefore in this scenario you would opt for roughly 1,600 RUs. There are a few other important variables to consider, though: document indexing, the complexity of your query patterns, and, critically, the desired consistency level. Recall that there are five consistency levels.
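As a rough sketch, the basic formula from the worked example above can be written in a few lines of Python. The function name `estimate_request_units` is made up for illustration, and the calculation deliberately ignores indexing, query complexity, and consistency level, so treat the result as a starting point rather than a recommendation.

```python
def estimate_request_units(item_size_kb, reads_per_second, writes_per_second):
    """Rough RU estimate using only the basic formula from the lesson.

    Hypothetical helper: it does not account for indexing, query
    complexity, or consistency level, all of which affect real RU usage.
    """
    read_rus = reads_per_second * item_size_kb
    write_rus = writes_per_second * item_size_kb
    return read_rus + write_rus


# The example from the lesson: 2 KB items, 300 reads/s, 500 writes/s.
print(estimate_request_units(2, 300, 500))  # prints 1600
```

For a real workload, the request unit calculator mentioned in the lesson remains the better source for a final number.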
The five consistency levels are strong, bounded staleness, session, consistent prefix, and eventual. All of these parameters have an impact on the proper RU setting. For this reason, we strongly recommend checking the documentation and using the RU calculator to get a good recommendation. So let's return to the collection we're creating. We have a database name, a collection name, our throughput setting in RUs, and our storage capacity. One other thing the console will ask for is a unique key. This is an optional parameter. It's meant to add more data integrity to our storage, and it lets you ensure the uniqueness of values per partition key. For our purposes in a dev environment, we can leave it blank. After you click OK, you'll have an empty database. So let's go ahead and create some sample data. We can do this right from the same portal by selecting Documents and then selecting New Document. This will pop open a text editor. Type in some text in JSON format, or copy and paste JSON from wherever, and then just click Save. Don't worry too much about having specific fields for now.
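If you'd rather not type JSON by hand, here is a minimal sketch that builds a sample document in Python and prints text you could paste into the New Document editor. The field names and values are invented for illustration, and the `id` field assumes the SQL API, where each item is identified by an `id` string.

```python
import json

# Hypothetical sample item; every field name and value here is illustrative.
sample_document = {
    "id": "customer-001",
    "name": "Ada Lovelace",
    "city": "London",
    "orders": [{"sku": "A-100", "quantity": 2}],
}


def to_editor_text(document):
    """Serialize a dict to indented JSON and confirm it parses back unchanged."""
    text = json.dumps(document, indent=2)
    if json.loads(text) != document:
        raise ValueError("document does not round-trip through JSON")
    return text


print(to_editor_text(sample_document))
```

The round-trip check is only there to catch values, such as dates or sets, that Python cannot serialize to plain JSON.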
Cosmos DB does not impose any schema on data by default. Once you click Save, you'll be able to query the data you just created. You can do that right from the Data Explorer UI as well. So congrats! You now know how to set up your Cosmos DB endpoint using a GUI. You can create the basic configuration, you can add some arbitrary data, you can query it, and you can handle basic maintenance for things like request units. Magical stuff. But of course, we can't stop there. Unless you have a very unusual app, you probably want software to handle querying your database instead of humans. So in the next lesson, we will learn all about the Cosmos DB API and how you can get computers to do all of this stuff for you. See you there, space cowboy.
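As a small preview, the point-and-click steps from this lesson could be scripted roughly as follows. This is a sketch under several assumptions rather than the course's own code: it assumes the SQL API and the `azure-cosmos` Python package (which calls a collection a "container"), and the function name, the `/city` partition key, the 400 RU default, and the endpoint and key placeholders are all hypothetical.

```python
def set_up_and_query(client, partition_key, database_name, collection_name,
                     document, throughput=400):
    """Create a database and collection, save one document, and read it back.

    `client` is expected to behave like azure.cosmos.CosmosClient, and
    `partition_key` like azure.cosmos.PartitionKey (both are assumptions).
    """
    database = client.create_database_if_not_exists(id=database_name)
    container = database.create_container_if_not_exists(
        id=collection_name,
        partition_key=partition_key,
        offer_throughput=throughput,  # throughput in RUs, as set in Data Explorer
    )
    container.upsert_item(document)
    items = container.query_items(
        query="SELECT * FROM c",
        enable_cross_partition_query=True,
    )
    return list(items)


def main():
    # Requires `pip install azure-cosmos`. The endpoint and key below are
    # placeholders; replace them with the values from your own account.
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient(
        "https://<your-account>.documents.azure.com:443/",
        credential="<your-primary-key>",
    )
    results = set_up_and_query(
        client,
        PartitionKey(path="/city"),  # hypothetical partition key path
        "sample-database",
        "sample-collection",
        {"id": "customer-001", "city": "London"},
    )
    print(results)

# Call main() yourself once the placeholders above are filled in.
```

Keeping the SDK import inside `main` means the helper works with any object that offers the same methods, which makes it easy to try out against a stand-in before pointing it at a real account.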
About the Author
Jonathan Bethune is a senior technical consultant working with several companies including TopTal, BCG, and Instaclustr. He is an experienced DevOps specialist, data engineer, and software developer. Jonathan has spent years mastering the art of system automation with a variety of different cloud providers and tools. Before he became an engineer, Jonathan was a musician and teacher in New York City. Jonathan is based in Tokyo where he continues to work in technology and write for various publications in his free time.