Getting the Most from DocumentDB
An Introduction to Azure DocumentDB
It's been common, if inconsistently applied, knowledge for many years that relational databases are a less-than-ideal fit for some types of software problems. Indeed, entire categories of software development tooling, such as object-relational mappers (ORMs), exist to bridge the gap between highly normalized relational data and in-memory, object-oriented representations. In practice, ORMs can create as much complexity as they alleviate, so developers began looking at the relational database itself as ripe for potential disruption.
Thus came the rise of NoSQL and databases that eschew the traditional rows/columns/tables/foreign keys metaphor for other choices like JSON document stores, graph databases that represent data and relationships as nodes with connecting edges, key/value stores that act as a glorified hashtable, and others. The wide range of options meant you could choose the right tool for your particular needs, instead of trying to squeeze a relational database square peg into your application's round hole. Solutions like MongoDB, Cassandra, Redis, and Neo4j rose to prominence and became de facto industry standards for developers interested in leveraging the power and flexibility of NoSQL.
While NoSQL was a boon to software developer productivity, the initial product offerings did little to alleviate the administrative burden of managing your database. Server provisioning, backups, data security at rest and in transit... all these challenges (and many more) remained as developers adopted NoSQL in greater numbers. Fortunately for them and all of us, the rise of the cloud and managed database service offerings like Azure DocumentDB brought us the best of both worlds: fast, flexible, infinitely scalable NoSQL with most of the administrative headaches assumed by a dedicated team of experts from Microsoft. You focus on your data and your application, and rely on a 99.99% SLA for the rest!
In this "Introduction to Azure DocumentDB" course, you’ll learn how to use Azure DocumentDB (DocDB) in your applications. You'll create DocDB accounts, databases, and collections. You'll perform ad-hoc and application-based queries, and see how features like stored procedures and MongoDB protocol support can help you. You'll also learn about ideal DocDB use cases and the pricing model. By the end of this course, you’ll have a solid foundation to continue exploring NoSQL and DocumentDB.
An Introduction to Azure DocumentDB: What You'll Learn
|Lecture|What you'll learn|
|Intro|What to expect from this course|
|DocumentDB Overview|A high-level overview of the DocumentDB feature set|
|Overview of Managing DocumentDB|A discussion of DocumentDB features for managing resources, data, scalability, configuration, and so on|
|Creating an Account|Creating a top-level DocDB account in the Azure portal|
|Creating a Collection|Creating and configuring a DocDB collection in the Azure portal|
|Importing Data|Discussion and demonstration of moving data into a DocDB collection|
|Overview of Developing with DocumentDB|A discussion of DocumentDB features from a development point of view|
|SQL Queries|How to author queries in the Azure portal|
|Programming with DocumentDB|Reading and writing data in code, using the .NET SDK|
|Stored Procedures|Authoring DocDB stored procedures and executing them using the DocDB REST API|
|MongoDB Protocol Support|Configuring and using DocDB's MongoDB protocol support|
|Use Cases|A brief discussion of scenarios well-suited for DocDB use|
|Pricing|A review of the DocDB pricing model, and discussion of cost estimation and Total Cost of Ownership|
|Ecosystem Integration|A short review of DocDB integration with other Azure services|
|Summary|Course wrap-up|
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
An important factor when considering any managed cloud service is the price. Let's take a look at how DocumentDB is priced and some helpful tools for focusing on cost in your specific scenario. As a managed service with a high service level agreement, DocumentDB relies on the notion of reserved capacity. That is, when you create collections in DocDB, you specify your anticipated needs in terms of storage and throughput, meaning the amount of data per second you expect to read or write. This can be a strange concept for developers and IT administrators used to purchasing "as much server capacity as the budget allows" and then hoping for the best. In the case of DocumentDB, the burden is placed on you up front to understand how your application works. Do you do many frequent reads and writes of small amounts of data, or perhaps less frequent reads and writes against larger chunks of data, or some combination thereof? Can you perform most queries against a handful of records at a time? Or does your application perform expensive scans to aggregate data in real time?
On the other hand, the DocumentDB pricing model allows savvy developers to optimize their application's data usage and potentially spend much less than they would on traditionally provisioned static server infrastructure. For example, applications that experience seasonal or periodic bursts of activity can be configured to scale up or down at up to hourly granularity. This alone can significantly reduce cost relative to the purchase of static data center resources that may otherwise sit unused for extended periods of time.
DocumentDB pricing works across two main vectors: the amount of storage consumed and the amount of throughput capacity reserved. Note that these elements are computed on a per-collection basis. In the case of reserved throughput, you are paying for the amount requested, whether it's used or not. Storage is simply billed at the rate of 25 cents in U.S. dollars per gigabyte per month, computed on an hourly basis. Throughput is measured in terms of something called request units. A request unit, or RU for short, is a logical measure of the CPU, RAM, and input-output needed to read a one-kilobyte document. All operations that DocDB performs have a deterministic cost in terms of request units. So you estimate and reserve the number of request units needed per second to fulfill your application's needs, and you're billed at the rate of eight one-thousandths of a U.S. dollar (that is, eight tenths of a cent) per hour per 100 reserved request units per second. Billing is computed on an hourly basis. You pay for a full hour's reserved capacity even when it is configured for only part of that hour.
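To make the arithmetic concrete, here is a minimal Python sketch of a per-collection cost estimate. It is an illustration under stated assumptions, not an official billing formula: the function names are hypothetical, the 730-hour month is an averaging convention, and the throughput rate of $0.008 per hour per 100 RU/s is assumed from the published price list at the time of recording. Check current pricing before relying on either rate.

```python
import math

# Assumption: an "average" month of 730 hours, used only for estimation.
HOURS_PER_MONTH = 730


def storage_cost(gb_stored, hours, rate_per_gb_month=0.25):
    """Storage: a flat rate per GB per month, prorated by the hour."""
    return gb_stored * rate_per_gb_month * hours / HOURS_PER_MONTH


def throughput_cost(reserved_rus, hours, rate_per_100_ru_hour=0.008):
    """Reserved throughput: billed per 100 RU/s for every hour reserved.

    A partial hour is billed as a full hour, so hours are rounded up.
    The default rate is an assumption, not taken from a live price sheet.
    """
    billed_hours = math.ceil(hours)
    return (reserved_rus / 100) * rate_per_100_ru_hour * billed_hours


def collection_cost(gb_stored, reserved_rus, hours=HOURS_PER_MONTH):
    """Estimated cost of one collection held at a constant reservation."""
    return storage_cost(gb_stored, hours) + throughput_cost(reserved_rus, hours)


def scheduled_throughput_cost(schedule):
    """Throughput cost for a list of (reserved_rus, hours) periods,
    for example a collection that is scaled up only during busy hours."""
    return sum(throughput_cost(rus, hours) for rus, hours in schedule)


if __name__ == "__main__":
    # Steady state: 10 GB stored, 1,000 RU/s reserved for the whole month.
    print(round(collection_cost(10, 1000), 2))

    # Bursty workload: 5,000 RU/s for 100 busy hours, 400 RU/s otherwise,
    # compared with reserving the 5,000 RU/s peak all month.
    bursty = scheduled_throughput_cost([(5000, 100), (400, 630)])
    peak_all_month = throughput_cost(5000, HOURS_PER_MONTH)
    print(round(bursty, 2), round(peak_all_month, 2))
```

The second comparison shows why the hourly scaling mentioned earlier matters: throughput is billed for what is reserved, so reserving the peak only while it is needed costs far less than holding it all month.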
As I mentioned, the reserved capacity estimates can be tricky to determine for those unfamiliar with that process. To make this job a bit easier, the DocumentDB team created an online capacity planning tool to help you estimate the number of request units you'll need to reserve. To use the tool, you upload sample JSON documents for your application and specify the estimated storage total of each as well as estimated create, read, update, and delete operations per second. The calculator will use this data to estimate the total number of request units needed to satisfy those estimates. Of course, this won't take into account fluctuations in application or user behavior, so use this as a starting point and not an absolute truth. We'll include the URL for this capacity planning tool in the course notes.
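As a rough illustration of the kind of estimate such a tool produces, the Python sketch below turns expected operations per second into a reserved RU figure. This is not the calculator's actual algorithm. The only figure taken from this lecture is that reading a one-kilobyte document costs one RU; the linear scaling with document size, the write multiplier of 5, and the 20% headroom are all hypothetical placeholders that you should replace with costs measured for your own documents.

```python
import math


def estimate_reserved_rus(doc_size_kb, reads_per_sec, writes_per_sec,
                          read_ru_per_kb=1.0, write_ru_per_kb=5.0,
                          headroom=1.2):
    """Back-of-the-envelope RU/s estimate, rounded up to a multiple of 100.

    read_ru_per_kb=1.0 follows the lecture's definition of a request unit.
    write_ru_per_kb and headroom are illustrative assumptions only; real
    costs depend on indexing, document shape, and consistency settings.
    """
    read_rus = reads_per_sec * doc_size_kb * read_ru_per_kb
    write_rus = writes_per_sec * doc_size_kb * write_ru_per_kb
    needed = (read_rus + write_rus) * headroom
    # Round up to the next block of 100 RU/s.
    return math.ceil(needed / 100) * 100


if __name__ == "__main__":
    # 2 KB documents, 100 reads/sec and 10 writes/sec.
    print(estimate_reserved_rus(2, 100, 10))
```

Treat the result the same way as the calculator's output: a starting point to refine by observing real request charges, not a final answer.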
Finally, I'd like to call your attention to a paper written by some members of the DocumentDB team highlighting the total cost of ownership benefits of DocDB in a few key data scenarios. In the paper, the TCO of DocumentDB is compared to Apache Cassandra running both in the cloud on virtual machines and in an on-premises data center environment, as well as to Amazon DynamoDB. The examined scenarios originally appeared in a white paper published by the Amazon DynamoDB team. The results indicate a significant price-performance benefit for use of a managed NoSQL service like DocumentDB or DynamoDB. The paper itself is short, approachable, and easy to read. I'd encourage you to download it and determine for yourself whether the results reflect scenarios relevant to you. Again, we'll include the URL in the course notes.
About the Author
Josh Lane is a Microsoft Azure MVP and Azure Trainer and Researcher at Cloud Academy. He’s spent almost twenty years architecting and building enterprise software for companies around the world, in industries as diverse as financial services, insurance, energy, education, and telecom. He loves the challenges that come with designing, building, and running software at scale. Away from the keyboard you'll find him crashing his mountain bike, drumming quasi-rhythmically, spending time outdoors with his wife and daughters, or drinking good beer with good friends.