Getting the Most from DocumentDB
An Introduction to Azure DocumentDB
It's been common, if inconsistently applied, knowledge for many years that relational databases are a less-than-ideal fit for some types of software problems. Indeed, entire categories of software development tooling, such as object-relational mappers (ORMs), exist to bridge the gap between highly normalized relational data and in-memory, object-oriented representations. In practice, ORMs can create as much complexity as they alleviate, so developers began looking at the relational database itself as ripe for potential disruption.
Thus came the rise of NoSQL and databases that eschew the traditional rows/columns/tables/foreign keys metaphor for other choices: JSON document stores, graph databases that represent data and relationships as nodes with connecting edges, key/value stores that act as a glorified hashtable, and others. The wide range of options meant you could choose the right tool for your particular needs, instead of trying to squeeze the square peg of a relational database into your application's round hole. Solutions like MongoDB, Cassandra, Redis, and Neo4j rose to prominence and became de facto industry standards for developers interested in leveraging the power and flexibility of NoSQL.
While NoSQL was a boon to software developer productivity, the initial product offerings did little to alleviate the administrative burden of managing your database. Server provisioning, backups, data security at rest and in transit... all these challenges (and many more) remained as developers adopted NoSQL in greater numbers. Fortunately for them and all of us, the rise of the cloud and managed database services like Azure DocumentDB brought us the best of both worlds: fast, flexible, elastically scalable NoSQL, with most of the administrative headaches assumed by a dedicated team of experts from Microsoft. You focus on your data and your application, and rely on a 99.99% SLA for the rest!
In this "Introduction to Azure DocumentDB" course, you’ll learn how to use Azure DocumentDB (DocDB) in your applications. You'll create DocDB accounts, databases, and collections. You'll perform ad-hoc and application-based queries, and see how features like stored procedures and MongoDB protocol support can help you. You'll also learn about ideal DocDB use cases and the pricing model. By the end of this course, you’ll have a solid foundation to continue exploring NoSQL and DocumentDB.
An Introduction to Azure DocumentDB: What You'll Learn
| Lecture | What you'll learn |
|---|---|
| Intro | What to expect from this course |
| DocumentDB Overview | A high-level overview of the DocumentDB feature set |
| Overview of Managing DocumentDB | A discussion of DocumentDB features for managing resources, data, scalability, configuration, and so on |
| Creating an Account | Creating a top-level DocDB account in the Azure portal |
| Creating a Collection | Creating and configuring a DocDB collection in the Azure portal |
| Importing Data | Discussion and demonstration of moving data into a DocDB collection |
| Overview of Developing with DocumentDB | A discussion of DocumentDB features from a development point of view |
| SQL Queries | How to author queries in the Azure portal |
| Programming with DocumentDB | Reading and writing data in code, using the .NET SDK |
| Stored Procedures | Authoring DocDB stored procedures and executing them using the DocDB REST API |
| MongoDB Protocol Support | Configuring and using DocDB's MongoDB protocol support |
| Use Cases | A brief discussion of scenarios well-suited for DocDB use |
| Pricing | A review of the DocDB pricing model, and discussion of cost estimation and Total Cost of Ownership |
| Ecosystem Integration | A short review of DocDB integration with other Azure services |
| Summary | Course wrap up |
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
Welcome back. Now that we've created our DocumentDB account, let's create a collection inside that account so that we can import some data and run some queries. To do that, I'll click the Add Collection button, which takes me to the Add Collection blade. The first thing I need to do is specify a collection ID. This name needs to be unique within the database that will contain the collection. The data I'll be working with is US postal codes, so I'll use a collection ID of postal codes. Next, we need to specify the storage capacity of the collection. My data is quite modest, only about 20 megabytes, so I'll pick the smallest possible storage capacity. Of course, pick the capacity your application scenario needs. Next, we need to specify our reserved throughput capacity. Again, I have very modest needs here, as this is just a simple demo collection, so I'm going to specify the smallest possible throughput capacity: 400 request units per second. For your own application, use the capacity planning tool we discussed elsewhere in the course to determine your needs, and specify that value here. Next, we can optionally specify a partition key. This is a JSON document path that DocumentDB uses to distribute the documents in the collection across its physical partitions. If I leave it blank, my collection will exist on a single partition. Single-partition collections are fine if you have very modest data sizes, like I do, and very modest client access needs: you don't anticipate running many simultaneous queries per second, that sort of thing. Again, that's my case with this simple demo.
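To make the partition key idea more concrete, here is a minimal, hypothetical sketch of hash-based routing: a partition key path such as `/state` (an assumed field for this postal code data) is resolved against each JSON document, and the resulting value is hashed to choose one of N partitions. DocumentDB's real hashing scheme and partition management are internal to the service. The helper names, the use of SHA-256, and the fixed partition count are illustrative assumptions, not the service's actual implementation.

```python
import hashlib
import json
from collections import defaultdict
from typing import Any


def resolve_path(document: dict, path: str) -> Any:
    """Follow a JSON path like '/address/state' into a nested document.

    Hypothetical helper: returns None when any path segment is missing.
    """
    value: Any = document
    for segment in path.strip("/").split("/"):
        if not isinstance(value, dict) or segment not in value:
            return None
        value = value[segment]
    return value


def route_to_partition(document: dict, partition_key_path: str, partition_count: int) -> int:
    """Pick a partition index by hashing the document's partition key value.

    Illustrative model only; the real service uses its own internal hashing.
    """
    key_value = resolve_path(document, partition_key_path)
    # Serialize the value so strings, numbers, and None hash deterministically.
    encoded = json.dumps(key_value, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(encoded).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


if __name__ == "__main__":
    # Hypothetical postal code documents; the field names are assumptions.
    docs = [
        {"id": "10001", "city": "New York", "state": "NY"},
        {"id": "10002", "city": "New York", "state": "NY"},
        {"id": "94103", "city": "San Francisco", "state": "CA"},
        {"id": "60601", "city": "Chicago", "state": "IL"},
    ]
    partitions = defaultdict(list)
    for doc in docs:
        partitions[route_to_partition(doc, "/state", 4)].append(doc["id"])
    # Documents that share a partition key value always land on the same partition.
    for index in sorted(partitions):
        print(index, partitions[index])
```

Because both New York documents share the value `"NY"`, they always route to the same partition. This is why a partition key with many distinct, evenly accessed values tends to spread storage and request load well.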
In most real-world cases, a partitioned collection will be the right choice. I'm leaving mine blank here, but as a general rule you'll want to identify a useful partition key and specify it here. Finally, we can either create a new database or select an existing one to contain our collection.
Now that we've created our collection, let's explore some of the other navigation features in the DocumentDB Account blade. First, the Browse tab lets us browse our databases and collections. You can see we have a single database called default, containing our new collection called postal codes, along with some basic information about it. If we had more than one collection, we'd see all of them here, grouped by database.
The Scale tab lets us specify or change the provisioned throughput for each of our collections. If you adjust throughput at all, you'll typically do it programmatically, for example with PowerShell or the DocumentDB REST API. This lets you provision more or less throughput for your collections on a schedule: you may need additional throughput capacity during certain hours of the day but less in the evening, and this is where those settings live.
The Settings tab exposes additional per-collection options. The most notable is the indexing policy. By default, every JSON path in every document in a collection is indexed, but we can choose to exclude specific paths from indexing. We can also set a time-to-live (TTL) policy for each collection, which lets us expire documents so that they're automatically purged from the collection after a certain length of time.
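The time-to-live behavior can be modeled with a small function. This sketch assumes the TTL semantics described in Microsoft's documentation: each document carries a `_ts` last-modified timestamp in seconds since the Unix epoch; if the collection has no default TTL, expiration is off and any per-document `ttl` is ignored; otherwise a document's own `ttl` property overrides the collection default, and a value of -1 at either level means "never expire." The function names are hypothetical. The actual purge runs inside the service, not in your code.

```python
from typing import Optional


def effective_ttl(collection_default_ttl: Optional[int], document: dict) -> Optional[int]:
    """Return the TTL in seconds that applies to a document, or None if it never expires."""
    # No collection-level TTL: expiration is off and per-document ttl is ignored.
    if collection_default_ttl is None:
        return None
    # A per-document ttl overrides the collection default.
    ttl = document.get("ttl", collection_default_ttl)
    # -1 means "never expire" at either level.
    if ttl == -1:
        return None
    return ttl


def is_expired(collection_default_ttl: Optional[int], document: dict, now_seconds: int) -> bool:
    """True once now_seconds reaches the document's last-modified time (_ts) plus its TTL."""
    ttl = effective_ttl(collection_default_ttl, document)
    if ttl is None:
        return False
    return now_seconds >= document["_ts"] + ttl


if __name__ == "__main__":
    doc = {"id": "10001", "_ts": 1_000_000}
    print(is_expired(3600, doc, 1_000_000 + 3599))            # still live
    print(is_expired(3600, doc, 1_000_000 + 3600))            # expired
    print(is_expired(None, {**doc, "ttl": 10}, 2_000_000))    # TTL disabled on collection
    print(is_expired(3600, {**doc, "ttl": -1}, 9_999_999))    # document opts out
```

Setting the collection default to -1 is a useful middle ground under these assumed rules: nothing expires by default, but individual documents can still opt in by carrying their own `ttl`.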
About the Author
Josh Lane is a Microsoft Azure MVP and Azure Trainer and Researcher at Cloud Academy. He’s spent almost twenty years architecting and building enterprise software for companies around the world, in industries as diverse as financial services, insurance, energy, education, and telecom. He loves the challenges that come with designing, building, and running software at scale. Away from the keyboard you'll find him crashing his mountain bike, drumming quasi-rhythmically, spending time outdoors with his wife and daughters, or drinking good beer with good friends.