Getting the Most from DocumentDB
It's been common, if inconsistently applied, knowledge for many years that relational databases are a less-than-ideal fit for some types of software problems. Indeed, entire categories of software development tooling, such as object-relational mappers (ORMs), exist to bridge the gap between highly normalized relational data and in-memory, object-oriented representations. In practice, ORMs can create as much complexity as they alleviate, so developers began looking at the relational database itself as ripe for potential disruption.
Thus came the rise of NoSQL and databases that eschew the traditional rows/columns/tables/foreign keys metaphor for other choices like JSON document stores, graph databases that represent data and relationships as nodes with connecting edges, key/value stores that act as glorified hash tables, and others. The wide range of options meant you could choose the right tool for your particular needs, instead of trying to squeeze a relational database square peg into your application's round hole. Solutions like MongoDB, Cassandra, Redis, and Neo4j rose to prominence and became de facto industry standards for developers interested in leveraging the power and flexibility of NoSQL.
While NoSQL was a boon to software developer productivity, the initial product offerings did little to alleviate the administrative burden of managing your database. Server provisioning, backups, data security at rest and in transit... all these challenges (and many more) remained as developers adopted NoSQL in greater numbers. Fortunately for them and all of us, the rise of the cloud and managed database service offerings like Azure DocumentDB brought us the best of both worlds: fast, flexible, infinitely-scalable NoSQL with most of the administrative headaches assumed by a dedicated team of experts from Microsoft. You focus on your data and your application, and rely on a 99.99% SLA for the rest!
In this "Introduction to Azure DocumentDB" course, you’ll learn how to use Azure DocumentDB (DocDB) in your applications. You'll create DocDB accounts, databases, and collections. You'll perform ad-hoc and application-based queries, and see how features like stored procedures and MongoDB protocol support can help you. You'll also learn about ideal DocDB use cases and the pricing model. By the end of this course, you’ll have a solid foundation to continue exploring NoSQL and DocumentDB.
An Introduction to Azure DocumentDB: What You'll Learn
|Lecture|What you'll learn|
|---|---|
|Intro|What to expect from this course|
|DocumentDB Overview|A high-level overview of the DocumentDB feature set|
|Overview of Managing DocumentDB|A discussion of DocumentDB features for managing resources, data, scalability, configuration, and so on|
|Creating an Account|Creating a top-level DocDB account in the Azure portal|
|Creating a Collection|Creating and configuring a DocDB collection in the Azure portal|
|Importing Data|Discussion and demonstration of moving data into a DocDB collection|
|Overview of Developing with DocumentDB|A discussion of DocumentDB features from a development point of view|
|SQL Queries|How to author queries in the Azure portal|
|Programming with DocumentDB|Reading and writing data in code, using the .NET SDK|
|Stored Procedures|Authoring DocDB stored procedures and executing them using the DocDB REST API|
|MongoDB Protocol Support|Configuring and using DocDB's MongoDB protocol support|
|Use Cases|A brief discussion of scenarios well-suited for DocDB use|
|Pricing|A review of the DocDB pricing model, and discussion of cost estimation and Total Cost of Ownership|
|Ecosystem Integration|A short review of DocDB integration with other Azure services|
|Summary|Course wrap-up|
If you have thoughts or suggestions for this course, please contact Cloud Academy at email@example.com.
Let's begin with an overview of DocumentDB features and a comparison with the typical relational database setup. Azure DocumentDB is a cloud-hosted managed database service. This means that all infrastructure needed to run and scale DocumentDB is provisioned automatically for you and maintained by Microsoft Azure datacenter personnel. You have no burden to manually back up or replicate your data for high availability or disaster recovery purposes; DocumentDB provides these capabilities automatically. Two rolling backup copies are maintained at all times. Backups are generated on a four-hour cycle and have no impact on the availability or performance of your live database. Your DocumentDB collections are replicated both locally, within a designated Azure region, and optionally across regions if you want your data to span geographic locations for disaster recovery or application performance reasons.
Another aspect of DocumentDB's managed service offering is support for horizontal scaling via automatic data partitioning. When creating a collection, you can designate a JSON document path whose value acts as a partition key. DocumentDB will then distribute your partitioned documents across physical infrastructure to maximize throughput and availability. Aside from a few small configuration settings, this partitioning is entirely transparent to you. Similar techniques can, of course, be employed with self-hosted database solutions, but in those cases, the administrative and maintenance burden of partitioning for high scale falls to you.
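To make the partition key idea concrete, here is a minimal Python sketch of hash-based partitioning. It is an illustration only and not how DocumentDB is implemented internally: the helper names (`get_partition_key`, `assign_partition`, `distribute`), the SHA-256 hash, the sample documents, and the partition count are all assumptions made for this example.

```python
import hashlib
import json
from collections import defaultdict


def get_partition_key(document: dict, path: str):
    """Walk a path such as '/customer/region' and return the value found there."""
    value = document
    for segment in path.strip("/").split("/"):
        value = value[segment]
    return value


def assign_partition(key_value, partition_count: int) -> int:
    """Map a partition key value to a partition number using a stable hash."""
    digest = hashlib.sha256(json.dumps(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


def distribute(documents, path: str, partition_count: int) -> dict:
    """Group documents by the partition their key value hashes to."""
    partitions = defaultdict(list)
    for doc in documents:
        key_value = get_partition_key(doc, path)
        partitions[assign_partition(key_value, partition_count)].append(doc)
    return dict(partitions)


# Hypothetical sample data: orders partitioned by the customer's region.
orders = [
    {"id": "1", "customer": {"region": "EU"}, "total": 20},
    {"id": "2", "customer": {"region": "US"}, "total": 35},
    {"id": "3", "customer": {"region": "EU"}, "total": 12},
]
layout = distribute(orders, "/customer/region", partition_count=4)
```

The property worth noticing is that documents sharing a partition key value (orders "1" and "3" here) always land in the same partition, which is why the choice of key path matters for spreading load evenly.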
Finally, DocumentDB pricing is computed on a per-unit cost basis across two axes: the amount of storage consumed by your data, and the throughput you reserve in anticipation of the needs of your specific application or scenario. Contrast this with a self-hosted database, where the cost is typically associated with capital infrastructure expense, software licensing, and salaries for administrative personnel. We'll cover more on pricing a bit later in the course.
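The two-axis model can be sketched as simple arithmetic. The Python function below is a rough illustration under stated assumptions: the function name, the notion of a generic "throughput unit", and both rates are placeholders invented for this example, not actual Azure prices. Check the current pricing page before estimating real costs.

```python
def estimate_monthly_cost(
    storage_gb: float,
    reserved_throughput_units: float,
    price_per_gb: float,
    price_per_throughput_unit: float,
) -> float:
    """Add the storage charge and the reserved-throughput charge for one month."""
    storage_cost = storage_gb * price_per_gb
    throughput_cost = reserved_throughput_units * price_per_throughput_unit
    return storage_cost + throughput_cost


# Placeholder rates, chosen only to keep the arithmetic easy to follow.
monthly_cost = estimate_monthly_cost(
    storage_gb=50,
    reserved_throughput_units=10,
    price_per_gb=0.25,
    price_per_throughput_unit=6.00,
)
# 50 * 0.25 = 12.50 for storage, 10 * 6.00 = 60.00 for throughput, 72.50 in total.
```

Because throughput is reserved rather than metered after the fact, the second term is paid whether or not the application actually uses that capacity, so sizing it realistically is the main lever on cost.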
Azure DocumentDB is a NoSQL database. You may be familiar with relational databases that store data across formally defined tables, rows and columns and use foreign key relationships to link data elements together. NoSQL databases come in many flavors, but most do not have formal notions of rows, columns and explicit relationships, nor do most have a notion of predefined data schema. Instead, data is grouped together in documents, key-value pairs, or as nodes in a connected graph. These data structures fit naturally with many specific business and scientific problems and for some use cases are much better suited than a classic relational model. More on NoSQL and DocumentDB use cases later on in the course.
As the name implies, a document is the atomic data unit in DocumentDB. Documents are simply arbitrary JSON text up to two megabytes in size. Documents have no predefined schema, at least none enforced by DocumentDB. All attribute-value pairs in the document are automatically indexed upon insert or update into DocumentDB. This allows SQL queries to operate efficiently against document contents.
Backing up a bit, the topmost resource in DocumentDB is an account. Accounts are a logical container of one or more databases and provide the ability to set configuration and replication across multiple databases at once. Databases are merely logical containers for document collections, nothing more, nothing less. Collections are where much of the action occurs in DocumentDB. Collections contain one or more documents and are in some sense roughly equivalent to tables in a relational database. Collections are the scope of billing within DocumentDB; that is, you configure estimated storage and throughput capacity on a per-collection basis. As noted previously, collections can be partitioned and typically are in a real-world scenario, though at this time partitioning is optional. Finally, remember that DocumentDB is a schema-less database, so there is no requirement that all documents within a collection have the same JSON shape. In fact, it's often convenient and cost-effective to store many document types, that is, documents with different shapes, within the same collection.
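As a small illustration of storing several document shapes in one collection, the Python sketch below models a collection as an in-memory list. Nothing here calls DocumentDB: the `type` attribute used to tell shapes apart, the helper names, the sample documents, and the exact byte count used for the two-megabyte limit are assumptions made for the example.

```python
import json

# Two megabytes, per the limit described above; the exact byte count is an assumption.
MAX_DOCUMENT_BYTES = 2 * 1024 * 1024


def validate_document(document: dict) -> int:
    """Return the document's serialized size in bytes, or raise if it is too large."""
    size = len(json.dumps(document).encode("utf-8"))
    if size > MAX_DOCUMENT_BYTES:
        raise ValueError(f"document is {size} bytes, over the {MAX_DOCUMENT_BYTES} byte limit")
    return size


# One collection holding two shapes, distinguished by a hypothetical "type" attribute.
collection = [
    {"id": "c-1", "type": "customer", "name": "Ada", "email": "ada@example.com"},
    {"id": "o-1", "type": "order", "customerId": "c-1", "items": [{"sku": "A1", "qty": 2}]},
    {"id": "o-2", "type": "order", "customerId": "c-1", "items": [{"sku": "B7", "qty": 1}]},
]


def find_by_type(documents, doc_type: str):
    """Filter documents by shape, much as a query with a WHERE clause on 'type' would."""
    return [doc for doc in documents if doc.get("type") == doc_type]


sizes = [validate_document(doc) for doc in collection]
orders = find_by_type(collection, "order")
```

A discriminator attribute like this is one common way to keep mixed shapes queryable. Since billing is scoped to the collection, sharing one collection also avoids reserving capacity separately for each document type.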
Here's a bit more context on some of the differences between a managed NoSQL service like DocumentDB and a typical self-hosted relational database. Some of this we've already covered, but a few key points bear repeating. As stated previously, DocumentDB auto-indexes all document contents by default. If you wish to constrain this behavior so that only some document content is indexed, you can do so. Note that, in general, index definition in a relational database is a very manual process and a frequent source of performance and scalability problems. DocumentDB also maintains data partitions on your behalf. While some relational databases support similar behavior, note that a primary benefit of the relational model, the enforcement of referential integrity via foreign keys, is not possible across partition boundaries. In many such cases, a NoSQL database like DocumentDB could be a better choice.
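To picture what "index everything by default, with optional exclusions" means, the Python sketch below flattens a JSON-like document into path/value pairs and skips any excluded subtrees. This is a conceptual model only: the `index_entries` function, the path notation, and the sample document are assumptions for illustration and do not reflect DocumentDB's actual index structures or policy syntax.

```python
def index_entries(document, excluded_prefixes=()):
    """Yield (path, value) pairs for every leaf value, skipping excluded subtrees."""

    def walk(node, path):
        if isinstance(node, dict):
            for key, child in node.items():
                yield from walk(child, f"{path}/{key}")
        elif isinstance(node, list):
            for position, child in enumerate(node):
                yield from walk(child, f"{path}/{position}")
        else:
            yield path, node

    def is_excluded(path):
        # A path is excluded if it equals a prefix or sits underneath it.
        return any(path == p or path.startswith(p + "/") for p in excluded_prefixes)

    for path, value in walk(document, ""):
        if not is_excluded(path):
            yield path, value


# Hypothetical order document with a subtree we do not want indexed.
doc = {
    "id": "o-1",
    "customer": {"region": "EU"},
    "notes": {"internal": "call back"},
    "items": [{"sku": "A1"}],
}
all_entries = list(index_entries(doc))
trimmed = list(index_entries(doc, excluded_prefixes=("/notes",)))
```

With no exclusions every leaf value is a candidate for indexing. Excluding `/notes` drops that subtree, which trades query flexibility on those attributes for less indexing work on writes.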
In DocumentDB, data consistency is configurable across the logical collection footprint, whether that's local to one region or across multiple regions. In a relational database, consistency is configurable only within the scope of a single transaction and is typically local to a given physical server. And finally, note that DocumentDB only supports transactions within a single data partition, and also does not support pessimistic data locks. These are design choices meant to optimize throughput and scale of the service. Self-hosted relational databases often support local and distributed transactions and optimistic or pessimistic locking behavior. In spite of developer familiarity with these techniques, distributed transactions and strong locking work poorly in high-scale, cloud-enabled applications and are no longer considered ideal practices.
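Without pessimistic locks, concurrent writers are usually coordinated optimistically: each writer states which version of a document it read, and the write is rejected if the document has changed since. The Python sketch below simulates that pattern in memory. It is not the DocumentDB SDK; the `InMemoryCollection` class, the `etag` attribute name, and `ConflictError` are hypothetical names chosen for this example.

```python
import uuid


class ConflictError(Exception):
    """Raised when a write is based on a stale version of a document."""


class InMemoryCollection:
    """Toy document store that checks a version tag on every replace."""

    def __init__(self):
        self._documents = {}

    def read(self, doc_id: str) -> dict:
        # Return a copy so callers cannot mutate stored state directly.
        return dict(self._documents[doc_id])

    def upsert(self, document: dict, expected_etag=None) -> dict:
        current = self._documents.get(document["id"])
        if current is not None and expected_etag != current["etag"]:
            raise ConflictError(f"document {document['id']} changed since it was read")
        # Every successful write gets a fresh version tag.
        stored = dict(document, etag=uuid.uuid4().hex)
        self._documents[document["id"]] = stored
        return dict(stored)


store = InMemoryCollection()
store.upsert({"id": "acct-1", "balance": 100})

# Two writers read the same version of the document.
first = store.read("acct-1")
second = store.read("acct-1")

# The first writer succeeds and the stored version tag changes.
first["balance"] -= 30
store.upsert(first, expected_etag=first["etag"])

# The second writer still holds the old tag, so its write is rejected.
second["balance"] -= 50
try:
    store.upsert(second, expected_etag=second["etag"])
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

No writer ever blocks another. The losing writer is expected to re-read the document and retry, which tends to suit high-scale services better than holding locks across requests.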
About the Author
Josh Lane is a Microsoft Azure MVP and Azure Trainer and Researcher at Cloud Academy. He’s spent almost twenty years architecting and building enterprise software for companies around the world, in industries as diverse as financial services, insurance, energy, education, and telecom. He loves the challenges that come with designing, building, and running software at scale. Away from the keyboard you'll find him crashing his mountain bike, drumming quasi-rhythmically, spending time outdoors with his wife and daughters, or drinking good beer with good friends.