Creating a Collection

It's been common, if inconsistently applied, knowledge for many years that relational databases are a less-than-ideal fit for some types of software problems. Indeed, entire categories of software development tooling, such as object-relational mappers (ORMs), exist to bridge the gap between highly normalized relational data and in-memory, object-oriented representations. In practice, ORMs can create as much complexity as they alleviate, so developers began looking at the relational database itself as ripe for potential disruption.

Thus came the rise of NoSQL and databases that eschew the traditional rows/columns/tables/foreign keys metaphor for other choices like JSON document stores, graph databases that represent data and relationships as nodes with connecting edges, key/value stores that act as a glorified hashtable, and others. The wide range of options meant you could choose the right tool for your particular needs, instead of trying to squeeze a relational database square peg into your application's round hole. Solutions like MongoDB, Cassandra, Redis, and Neo4j rose to prominence and became de facto industry standards for developers interested in leveraging the power and flexibility of NoSQL.

While NoSQL was a boon to software developer productivity, the initial product offerings did little to alleviate the administrative burden of managing your database. Server provisioning, backups, data security at rest and in transit... all these challenges (and many more) remained as developers adopted NoSQL in greater numbers. Fortunately for them and all of us, the rise of the cloud and managed database service offerings like Azure DocumentDB brought us the best of both worlds: fast, flexible, infinitely scalable NoSQL with most of the administrative headaches assumed by a dedicated team of experts from Microsoft. You focus on your data and your application, and rely on a 99.99% SLA for the rest!

In this "Introduction to Azure DocumentDB" course, you’ll learn how to use Azure DocumentDB (DocDB) in your applications. You'll create DocDB accounts, databases, and collections. You'll perform ad-hoc and application-based queries, and see how features like stored procedures and MongoDB protocol support can help you. You'll also learn about ideal DocDB use cases and the pricing model. By the end of this course, you’ll have a solid foundation to continue exploring NoSQL and DocumentDB.

An Introduction to Azure DocumentDB: What You'll Learn

Lecture: What you'll learn
Intro: What to expect from this course
DocumentDB Overview: A high-level overview of the DocumentDB feature set
Overview of Managing DocumentDB: A discussion of DocumentDB features for managing resources, data, scalability, configuration, and so on
Creating an Account: Creating a top-level DocDB account in the Azure portal
Creating a Collection: Creating and configuring a DocDB collection in the Azure portal
Importing Data: Discussion and demonstration of moving data into a DocDB collection
Overview of Developing with DocumentDB: A discussion of DocumentDB features from a development point of view
SQL Queries: How to author queries in the Azure portal
Programming with DocumentDB: Reading and writing data in code, using the .NET SDK
Stored Procedures: Authoring DocDB stored procedures and executing them using the DocDB REST API
MongoDB Protocol Support: Configuring and using DocDB's MongoDB protocol support
Use Cases: A brief discussion of scenarios well-suited for DocDB use
Pricing: A review of the DocDB pricing model, and discussion of cost estimation and Total Cost of Ownership
Ecosystem Integration: A short review of DocDB integration with other Azure services
Summary: Course wrap-up

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.


Welcome back. Now that we've created our DocumentDB account, let's create a collection inside it so that we can import some data and run some queries. To do that, I'll click the Add Collection button, which opens the Add Collection blade.

First, I need to specify a collection ID. This name must be unique within the database that will contain the collection. The data I'll be working with is a set of US postal codes, so I'll use a collection ID of postal codes.

Next, we specify the storage capacity of the collection. My data set is quite modest, only about 20 megabytes, so I'll pick the smallest storage capacity available. For your own application, pick the capacity your scenario needs.

Next, we specify the reserved throughput capacity. Again, my needs are very modest, since this is just a simple demo collection, so I'll choose the smallest throughput available: 400 request units per second. For a real application, use the capacity planning tool discussed elsewhere in this course to determine your needs, and specify that value here.

Next, we can optionally specify a partition key. This is a JSON document path that DocumentDB uses to distribute the documents in the collection across its physical infrastructure. If I leave it blank, the collection will exist on a single partition. Single-partition collections are fine when your data is very small, as mine is, and your client access needs are modest, meaning you don't anticipate running many simultaneous queries per second. That's the case with this simple demo.

In most real-world cases, though, a partitioned collection is the right choice. I'm leaving mine blank here, but as a general rule, you'll want to identify a useful partition key and specify it here. Finally, we can either create a new database to hold the collection or choose an existing one.
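To make the partition key idea concrete, here's a minimal Python sketch of hash partitioning. It assumes each document has a top-level `state` property, used through the partition key path `/state`. The helper names (`resolve_path`, `assign_partition`, `partition_documents`), the sample documents, and the use of SHA-256 are illustrative assumptions; DocumentDB's actual hashing and physical partition management are internal to the service. What the sketch does show is the core property: every document with the same partition key value lands on the same partition, while different values spread across partitions.

```python
import hashlib
import json


def resolve_path(doc: dict, path: str):
    """Follow a JSON path such as "/state" or "/address/zip" into a document.
    Returns None if any segment along the path is missing."""
    value = doc
    for segment in path.strip("/").split("/"):
        if not isinstance(value, dict) or segment not in value:
            return None
        value = value[segment]
    return value


def assign_partition(doc: dict, partition_key_path: str, partition_count: int) -> int:
    """Hash the partition key value to choose one of `partition_count` partitions.
    Illustrative only: the real service uses its own internal hashing scheme."""
    key_value = resolve_path(doc, partition_key_path)
    # Serialize first so strings, numbers, and missing values hash deterministically.
    encoded = json.dumps(key_value).encode("utf-8")
    digest = hashlib.sha256(encoded).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partition_documents(docs, partition_key_path: str, partition_count: int) -> dict:
    """Group documents by the partition their key value hashes to."""
    partitions = {i: [] for i in range(partition_count)}
    for doc in docs:
        partitions[assign_partition(doc, partition_key_path, partition_count)].append(doc)
    return partitions


if __name__ == "__main__":
    # Hypothetical sample documents shaped like postal code records.
    sample = [
        {"id": "01071", "state": "MA"},
        {"id": "30301", "state": "GA"},
        {"id": "30305", "state": "GA"},
        {"id": "73301", "state": "TX"},
    ]
    for pid, docs in partition_documents(sample, "/state", 4).items():
        print(pid, [d["id"] for d in docs])
```

With `partition_count` set to 1 you get the single-partition case described in the walkthrough: every document lands on partition 0.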

Now that we've created our collection, let's explore some of the other navigation features in the DocumentDB account blade. First, the Browse tab lets us browse our databases and collections. You can see we have a single database called default, containing the new postal codes collection, along with some basic information about it. If we had more than one collection, they would all appear here, grouped by database.

The Scale tab lets us specify or change the provisioned throughput for each of our collections. If you adjust throughput at all, you'd typically do it programmatically, for example with PowerShell or the DocumentDB REST API. Adjusting throughput lets you provision more or less capacity for your collections over time: you might need additional throughput during certain hours of the day and less in the evening. The Scale tab is where you can set those values manually.

The Settings tab exposes additional per-collection settings. Notably, there's the indexing policy. By default, every JSON path in every document in a collection is indexed, but we can choose to exclude specific paths from indexing. We can also specify a per-collection time-to-live (TTL) policy, which lets us expire documents so that they're automatically purged from the collection after a certain length of time.
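The time-to-live rules can be modeled in a few lines. This Python sketch is an illustration, not the service's implementation. It assumes each document carries `_ts` (its last-modified time, in seconds since the Unix epoch) and an optional per-document `ttl` property, and that the collection default follows this convention: None means TTL is off, -1 means TTL is on but documents expire only if they set their own `ttl`, and a positive number is a default lifetime in seconds. `is_expired` and `purge_expired` are hypothetical helper names.

```python
import time
from typing import Optional


def is_expired(doc: dict, collection_default_ttl: Optional[int], now: float) -> bool:
    """Model of per-collection and per-document TTL rules (a sketch).

    collection_default_ttl:
      None -> TTL is off for the collection; nothing expires.
      -1   -> TTL is on, but documents expire only if they set their own "ttl".
      n>0  -> documents expire n seconds after their last modification.
    A document-level "ttl" overrides the collection default (-1 = never expire).
    """
    if collection_default_ttl is None:
        return False
    ttl = doc.get("ttl", collection_default_ttl)
    if ttl == -1:
        return False
    # Expired once at least `ttl` seconds have passed since last modification.
    return now - doc["_ts"] >= ttl


def purge_expired(docs, collection_default_ttl: Optional[int], now: Optional[float] = None):
    """Return only the documents that have not yet expired."""
    now = time.time() if now is None else now
    return [d for d in docs if not is_expired(d, collection_default_ttl, now)]
```

For example, with a collection default of 60 seconds, a document last modified 30 seconds ago survives unless it sets its own shorter `ttl`, and a document with `"ttl": -1` never expires.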

The Document Explorer lets us look at the data that's actually in our collections. Here's my postal codes collection again. If I click on a random document, this one with ID 01071, we can see the actual JSON that makes up that document.

The Query Explorer lets us run SQL queries against our collections, again filtered by a specific database and collection. For example, to find postal codes in Georgia, where I live, I can run SELECT * FROM c WHERE c.state = "GA". Sure enough, the result is a JSON array containing every document in this collection whose state is GA.

The Script Explorer lets us create and examine stored procedures, triggers, and user-defined functions, all of which are authored in JavaScript. We don't have any of these for our new collection yet, but if we did, we would see them here. In a later demo, we'll create some stored procedures for a collection and then execute them.

The last thing to note is the Add Azure Search tab. Azure Search is a large-scale cloud search service that can build fast indexes over very large data sets. It integrates nicely with DocumentDB, so you can build search indexes over the JSON documents in a DocumentDB database.
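The Georgia query from the Query Explorer can also be expressed in code. As a hedged sketch, the Python below builds a parameterized query body in the `{"query": ..., "parameters": [...]}` shape that DocumentDB's REST API accepts, then evaluates the same equality filter locally against a few sample documents as a stand-in for the service. The helper names and sample documents are hypothetical, and sending the request (endpoint, authorization headers) is deliberately left out. Passing the value as a parameter keeps it out of the query text, and the property name is checked against a simple identifier pattern for the same reason.

```python
import json
import re

# Only plain identifiers are accepted as property names in this sketch.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_equality_query(field: str, value) -> dict:
    """Build a parameterized query body filtering on one top-level property."""
    if not _IDENTIFIER.match(field):
        raise ValueError(f"unsupported property name: {field!r}")
    param = "@" + field
    return {
        "query": f"SELECT * FROM c WHERE c.{field} = {param}",
        "parameters": [{"name": param, "value": value}],
    }


def evaluate_locally(docs, field: str, value):
    """Local stand-in for the service: documents whose property equals value.
    Documents missing the property never match the equality filter."""
    return [doc for doc in docs if field in doc and doc[field] == value]


if __name__ == "__main__":
    spec = build_equality_query("state", "GA")
    print(json.dumps(spec))
    # Hypothetical sample documents.
    sample = [{"id": "30301", "state": "GA"}, {"id": "01071", "state": "MA"}]
    print(evaluate_locally(sample, "state", "GA"))
```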

About the Author

Josh Lane is a Microsoft Azure MVP and Azure Trainer and Researcher at Cloud Academy. He’s spent almost twenty years architecting and building enterprise software for companies around the world, in industries as diverse as financial services, insurance, energy, education, and telecom. He loves the challenges that come with designing, building, and running software at scale. Away from the keyboard you'll find him crashing his mountain bike, drumming quasi-rhythmically, spending time outdoors with his wife and daughters, or drinking good beer with good friends.