Getting the Most from DocumentDB
An Introduction to Azure DocumentDB
It's been common, if inconsistently applied, knowledge for many years that relational databases are a less-than-ideal fit for some types of software problems. Indeed, entire categories of software development tooling, such as object-relational mappers (ORMs), exist to bridge the gap between highly normalized relational data and in-memory, object-oriented representations. In practice, ORMs can create as much complexity as they alleviate, so developers began to see the relational database itself as ripe for disruption.
Thus came the rise of NoSQL: databases that eschew the traditional rows/columns/tables/foreign keys metaphor for alternatives like JSON document stores, graph databases that represent data and relationships as nodes connected by edges, key/value stores that act as glorified hashtables, and others. The wide range of options meant you could choose the right tool for your particular needs, instead of trying to force the square peg of a relational database into the round hole of your application. Solutions like MongoDB, Cassandra, Redis, and Neo4j rose to prominence and became de facto industry standards for developers interested in leveraging the power and flexibility of NoSQL.
While NoSQL was a boon to developer productivity, the initial product offerings did little to reduce the administrative burden of managing your database. Server provisioning, backups, data security at rest and in transit... all these challenges (and many more) remained as developers adopted NoSQL in greater numbers. Fortunately for them and all of us, the rise of the cloud and managed database services like Azure DocumentDB brought us the best of both worlds: fast, flexible, massively scalable NoSQL, with most of the administrative headaches handled by a dedicated team of experts at Microsoft. You focus on your data and your application, and rely on a 99.99% availability SLA for the rest!
In this "Introduction to Azure DocumentDB" course, you’ll learn how to use Azure DocumentDB (DocDB) in your applications. You'll create DocDB accounts, databases, and collections. You'll perform ad-hoc and application-based queries, and see how features like stored procedures and MongoDB protocol support can help you. You'll also learn about ideal DocDB use cases and the pricing model. By the end of this course, you’ll have a solid foundation to continue exploring NoSQL and DocumentDB.
An Introduction to Azure DocumentDB: What You'll Learn
| Lecture | What you'll learn |
| --- | --- |
| Intro | What to expect from this course |
| DocumentDB Overview | A high-level overview of the DocumentDB feature set |
| Overview of Managing DocumentDB | A discussion of DocumentDB features for managing resources, data, scalability, configuration, and so on |
| Creating an Account | Creating a top-level DocDB account in the Azure portal |
| Creating a Collection | Creating and configuring a DocDB collection in the Azure portal |
| Importing Data | Discussion and demonstration of moving data into a DocDB collection |
| Overview of Developing with DocumentDB | A discussion of DocumentDB features from a development point of view |
| SQL Queries | How to author queries in the Azure portal |
| Programming with DocumentDB | Reading and writing data in code, using the .NET SDK |
| Stored Procedures | Authoring DocDB stored procedures and executing them using the DocDB REST API |
| MongoDB Protocol Support | Configuring and using DocDB's MongoDB protocol support |
| Use Cases | A brief discussion of scenarios well-suited for DocDB use |
| Pricing | A review of the DocDB pricing model, and discussion of cost estimation and Total Cost of Ownership |
| Ecosystem Integration | A short review of DocDB integration with other Azure services |
| Summary | Course wrap-up |
If you have thoughts or suggestions for this course, please contact Cloud Academy at email@example.com.
Let's dive next into DocumentDB's management and configuration options and take a look at how to create accounts, databases, and collections in the Azure portal. DocumentDB guarantees that 99% of reads complete within 10 milliseconds and 99% of writes within 15 milliseconds. Additionally, the service provides a four-nines (99.99%) service-level agreement for availability, one of the highest of any managed cloud service, in Azure or otherwise. These guarantees hold whether a database stores gigabytes or terabytes of data. DocumentDB's managed infrastructure, together with its replication and partitioning behavior, makes the effective size of a DocumentDB database virtually limitless.
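To put those numbers in perspective, here is a short Python sketch that converts an availability percentage into a downtime budget and checks a batch of measured request latencies against a 99th-percentile target. The 30-day month, the nearest-rank percentile method, and the function names are assumptions made for illustration; the official Azure SLA defines its own measurement rules.

```python
import math
from typing import Sequence


def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of unavailability an availability SLA tolerates over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def meets_latency_target(samples_ms: Sequence[float], target_ms: float, pct: float = 99) -> bool:
    """True if the pct-th percentile latency falls within target_ms."""
    return percentile(samples_ms, pct) <= target_ms


print(round(allowed_downtime_minutes(99.99), 2))  # 4.32
```

Four nines leaves roughly 4.3 minutes of downtime in a 30-day month, and `meets_latency_target(read_latencies_ms, 10)` expresses the read guarantee for a batch of measurements. Keep in mind that client-side timings include network time, so they are not directly comparable to the service's own figures.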
As mentioned, DocumentDB supports both local and cross-region replication. It also supports data partitioning via a configured JSON document path: the resolved value of that path in each document determines which physical partition the document is placed in. This allows application load to be balanced across provisioned infrastructure. To support fast queries, DocumentDB indexes all JSON attributes and values by default; if desired, this behavior can be configured to reduce index update latency. DocumentDB also supports an optional Time-to-Live (TTL) value that automatically purges documents once they're no longer needed. There's also a full REST API for creating, configuring, and removing resources like accounts, databases, collections, documents, and so on. And finally, it's important to understand that DocumentDB supports four data consistency levels, configurable on a per-request basis.
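Two of the behaviors described above, partition placement and TTL expiry, can be modeled in a few lines. The Python sketch below resolves a configured partition-key path (such as the hypothetical `/address/city`) against a document, maps the resolved value to a partition, and applies a simplified TTL rule. The CRC32-modulo placement is a stand-in chosen for illustration, not DocumentDB's actual hashing scheme. The TTL rule is modeled on what we understand DocumentDB's convention to be: a `ttl` in seconds counted from the document's `_ts` (last-modified) timestamp, a collection-level default, `-1` meaning "never expire", and no expiry at all when the collection has TTL disabled. Treat it as a conceptual sketch rather than the service's implementation.

```python
import json
import time
import zlib
from typing import Any, Dict, Optional


def resolve_path(document: Dict[str, Any], path: str) -> Any:
    """Resolve a slash-separated JSON path such as '/address/city' against a document."""
    value: Any = document
    for segment in path.strip("/").split("/"):
        if not isinstance(value, dict) or segment not in value:
            return None  # path absent: treated here as an undefined key
        value = value[segment]
    return value


def partition_for(document: Dict[str, Any], path: str, partition_count: int) -> int:
    """Map a document to one of partition_count partitions by its resolved key.

    Hypothetical scheme: CRC32 of the key's JSON encoding, modulo the partition
    count. It only illustrates that equal key values always land together.
    """
    key = resolve_path(document, path)
    digest = zlib.crc32(json.dumps(key, sort_keys=True).encode("utf-8"))
    return digest % partition_count


def is_expired(document: Dict[str, Any], collection_default_ttl: Optional[int],
               now: Optional[float] = None) -> bool:
    """Simplified TTL check (assumed semantics): ttl in seconds, measured from _ts."""
    if collection_default_ttl is None:
        return False  # TTL disabled for the collection: nothing expires
    ttl = document.get("ttl", collection_default_ttl)  # document value overrides default
    if ttl == -1:
        return False  # -1 means "never expire"
    current = time.time() if now is None else now
    return current >= document["_ts"] + ttl
```

For example, two documents whose `/address/city` resolves to `"Seattle"` always map to the same partition, regardless of their other fields, which is what lets the service spread distinct key values across physical partitions.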
Let's explore those a bit further now. Most databases support a single consistency model, but DocumentDB supports four, which lets you tune it to work optimally in a variety of application scenarios. The consistency level is configured globally at the account level but can be overridden on a per-request basis, as needed (a request can relax the account default, but not strengthen it). The first option is strong consistency, the model most familiar to those who know typical relational databases. This model guarantees fully consistent reads and writes for all users working on the same data: reads always return the latest committed updates, and writes are committed only when a quorum of replicas acknowledges the change. As you might guess, this model is the most expensive in terms of reserved throughput and also incurs the highest request latency. The second option, bounded staleness, allows a query to return stale data within a certain time window or up to a specified number of prior data revisions. This model intentionally accepts the possibility of stale reads, within those specified bounds, in exchange for the ability to span a given collection across multiple Azure geographic regions, something you can't do with strong consistency. The third option, session, is very useful for applications that have a formal notion of a client session, or where large subsets of application data are not shared across users or devices. This option provides full consistency guarantees, but only within a single client session. It accepts some constraints on application and database behavior in exchange for improved throughput and performance for applications that fit this scenario.
The last option, eventual consistency, is typical of many NoSQL databases. Writes are merged asynchronously to all replicas, so until every replica has seen the final changes, a client could read an older value of a piece of data after previously seeing a newer value of that same data. This model provides the lowest possible read and write latency in exchange for behavior that will work for some, but admittedly not all, application scenarios.
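The four levels differ mainly in which replicas are allowed to answer a read. The toy model below, written in Python with hypothetical names, expresses each rule in terms of log sequence numbers (LSNs): how far a replica has caught up compared with the latest committed write. It is a conceptual sketch, not DocumentDB's actual replication protocol; bounded staleness is modeled by version count only (the time window is ignored), and the majority quorum is an assumption for illustration.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    BOUNDED_STALENESS = "bounded_staleness"
    SESSION = "session"
    EVENTUAL = "eventual"


def replica_can_serve(level: Consistency, replica_lsn: int, committed_lsn: int,
                      session_lsn: int = 0, max_lag: int = 0) -> bool:
    """Toy rule: may a replica that has applied writes up to replica_lsn answer a read?

    committed_lsn is the latest committed write; session_lsn is the latest write
    this client's session has observed; max_lag is the bounded-staleness limit,
    counted in versions.
    """
    if level is Consistency.STRONG:
        return replica_lsn >= committed_lsn  # must hold the latest committed write
    if level is Consistency.BOUNDED_STALENESS:
        return committed_lsn - replica_lsn <= max_lag  # may trail by at most max_lag versions
    if level is Consistency.SESSION:
        return replica_lsn >= session_lsn  # must reflect what this session has seen
    return True  # eventual: any replica may answer, even a stale one


def write_quorum(replica_count: int) -> int:
    """Majority quorum: acknowledgments needed before a strong write commits."""
    return replica_count // 2 + 1
```

A replica that has applied writes up to 8 while write 10 is the latest commit cannot serve a strong read, but it can serve an eventual read, or a session read for a client that has only seen up to write 8. Under the eventual rule, successive reads routed to different replicas can go backward in time, which is exactly the anomaly described above.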
The take-home point here is to be aware that these options exist and to think very explicitly about your application and which model works best for you. Finding an appropriate balance between consistency and throughput is one of the most critical issues for any application team targeting the cloud. DocumentDB gives you the knobs to twist to find that proper balance.
About the Author
Josh Lane is a Microsoft Azure MVP and Azure Trainer and Researcher at Cloud Academy. He’s spent almost twenty years architecting and building enterprise software for companies around the world, in industries as diverse as financial services, insurance, energy, education, and telecom. He loves the challenges that come with designing, building, and running software at scale. Away from the keyboard, you'll find him crashing his mountain bike, drumming quasi-rhythmically, spending time outdoors with his wife and daughters, or drinking good beer with good friends.