Getting the Most from DocumentDB
An Introduction to Azure DocumentDB
It's been common, if inconsistently applied, knowledge for many years that relational databases are a less-than-ideal fit for some types of software problems. Indeed, entire categories of software development tooling, such as object-relational mappers (ORMs), exist to bridge the gap between highly normalized relational data and in-memory, object-oriented representations. In practice, ORMs can create as much complexity as they alleviate, so developers began looking at the relational database itself as ripe for potential disruption.
Thus came the rise of NoSQL and databases that eschew the traditional rows/columns/tables/foreign keys metaphor for other choices like JSON document stores, graph databases that represent data and relationships as nodes with connecting edges, key/value stores that act as a glorified hash table, and others. The wide range of options meant you could choose the right tool for your particular needs, instead of trying to squeeze a relational database square peg into your application's round hole. Solutions like MongoDB, Cassandra, Redis, and Neo4j rose to prominence and became de facto industry standards for developers interested in leveraging the power and flexibility of NoSQL.
While NoSQL was a boon to software developer productivity, the initial product offerings did little to alleviate the administrative burden of managing your database. Server provisioning, backups, data security at rest and in transit... all these challenges (and many more) remained as developers adopted NoSQL in greater numbers. Fortunately for them and all of us, the rise of the cloud and managed database service offerings like Azure DocumentDB brought us the best of both worlds: fast, flexible, elastically scalable NoSQL with most of the administrative headaches assumed by a dedicated team of experts from Microsoft. You focus on your data and your application, and rely on a 99.99% SLA for the rest!
In this "Introduction to Azure DocumentDB" course, you’ll learn how to use Azure DocumentDB (DocDB) in your applications. You'll create DocDB accounts, databases, and collections. You'll perform ad-hoc and application-based queries, and see how features like stored procedures and MongoDB protocol support can help you. You'll also learn about ideal DocDB use cases and the pricing model. By the end of this course, you’ll have a solid foundation to continue exploring NoSQL and DocumentDB.
An Introduction to Azure DocumentDB: What You'll Learn
| Lecture | What you'll learn |
|---|---|
| Intro | What to expect from this course |
| DocumentDB Overview | A high-level overview of the DocumentDB feature set |
| Overview of Managing DocumentDB | A discussion of DocumentDB features for managing resources, data, scalability, configuration, and so on |
| Creating an Account | Creating a top-level DocDB account in the Azure portal |
| Creating a Collection | Creating and configuring a DocDB collection in the Azure portal |
| Importing Data | Discussion and demonstration of moving data into a DocDB collection |
| Overview of Developing with DocumentDB | A discussion of DocumentDB features from a development point of view |
| SQL Queries | How to author queries in the Azure portal |
| Programming with DocumentDB | Reading and writing data in code, using the .NET SDK |
| Stored Procedures | Authoring DocDB stored procedures and executing them using the DocDB REST API |
| MongoDB Protocol Support | Configuring and using DocDB's MongoDB protocol support |
| Use Cases | A brief discussion of scenarios well-suited for DocDB use |
| Pricing | A review of the DocDB pricing model, and discussion of cost estimation and Total Cost of Ownership |
| Ecosystem Integration | A short review of DocDB integration with other Azure services |
| Summary | Course wrap-up |
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
Welcome back. Now, let's take a closer look at what it looks like to program against DocumentDB. Specifically, I'll use the .NET SDK here, but recall that there are other SDKs for programming environments like Node.js, Python, Java, and others. For this demonstration, in addition to showing you code, I'd also like to demonstrate the geo-replication features of DocDB. So I'll click on the 'Replicate Data Globally' tab here and show you that I've gone ahead and configured my DocDB account to geo-replicate itself across a couple of different regions. You'll see that my primary write region is Western Europe, but I also have two additional read regions in Central US and Western US. So what I'm going to do is show you some code that will read from the two read regions at the same time that I'm destroying documents in the write region. And we should see the count for the collection go down quite quickly and stay in sync as those changes are made in the write region.
So let's flip over to Visual Studio. I'm using Visual Studio 2017, but certainly this will work in other versions as well. The DocumentDB .NET SDK has been out for several years. In my project, I have two small console applications: a reader application and a remover application. These do exactly what they sound like. The reader simply reads a collection, gets the total number of documents in that collection, and writes the count to the console. And it just does that in a loop. I'll show you the code in a moment. The remover is the code that will connect to a collection and remove a document every so many seconds from the primary master replica.
So, walking through this code briefly, the first thing I need to do is get some configuration information so that I can connect to DocumentDB. I'm just pulling that from my config file here. I have this class called 'Destroyer of Worlds', and this is where my logic resides to delete the documents every few seconds. I'll show you that in just a second. This is where I'm specifying some additional connection config information for my connection to DocDB. And here's the main logic in my console app. This simply spins up a task, or a background thread, and just loops endlessly. Every three seconds, it wakes up and invokes the logic on the destroyer to go ahead and remove one of the documents from the collection. So if we skip down to that code, this is fairly simple. The first thing we do, for each iteration of the loop, is spin up a DocumentDB client so that we can connect to DocDB. Then we create the collection URI. This is the collection that we want to remove a document from.
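The wake-up-every-few-seconds loop described above can be sketched in a few lines. This is only an illustration in Python rather than the C# shown in the video, and the names `run_every` and `start_background_loop` are hypothetical, not part of any DocumentDB SDK.

```python
import threading


def run_every(interval_seconds, action, stop_event):
    """Invoke action() repeatedly, pausing interval_seconds between calls."""
    while not stop_event.is_set():
        action()
        # Event.wait doubles as an interruptible sleep: it returns early
        # as soon as stop_event is set, so shutdown does not lag.
        stop_event.wait(interval_seconds)


def start_background_loop(interval_seconds, action):
    """Run the loop on a background thread; return (thread, stop_event)."""
    stop_event = threading.Event()
    worker = threading.Thread(
        target=run_every,
        args=(interval_seconds, action, stop_event),
        daemon=True,
    )
    worker.start()
    return worker, stop_event
```

In the demo, the action would be the "remove one document" step and the interval would be three seconds; setting the returned event stops the loop cleanly.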
The first thing we do is grab just a random document. So I'm just saying, 'SELECT TOP 1 *' from the collection. Again, I'm spreading this out across all partitions of my collection. This is a multi-partition collection. So I'm just going to grab the first one that comes back, and I'm going to use that. I execute the query by calling 'client.CreateDocumentQuery', specifying the collection that I want to talk to and the SQL that I want to execute for the query. When that comes back, the results come back as an array, so I just grab the first item from the array. One other thing to note is that documents in DocumentDB have a '_self' property. This is basically a URI self-reference that I can use to refer to that same document in subsequent queries or subsequent operations, which is exactly what I'm going to do here, because this is the document that I'm going to delete next.
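To make the "grab the first result and keep its self link" step concrete, here is a small Python sketch. The query alias `c`, the sample document, and every value inside it are hypothetical placeholders; only the system property name `_self` comes from the walkthrough above.

```python
import json

# Assumption: "c" is simply a conventional alias for the collection.
QUERY = "SELECT TOP 1 * FROM c"

# Hypothetical query response: a JSON array holding one document.
# All identifiers below are made-up placeholders, not real resource ids.
SAMPLE_RESULTS = json.loads("""
[
  {
    "id": "example-1",
    "country": "US",
    "_self": "dbs/example-db/colls/example-coll/docs/example-doc/"
  }
]
""")


def first_self_link(results):
    """Return the _self link of the first result, or None if nothing came back."""
    if not results:
        return None
    return results[0]["_self"]


if __name__ == "__main__":
    print(first_self_link(SAMPLE_RESULTS))
```

Checking for an empty result matters in the demo scenario: once the remover has deleted every document, the query returns nothing and there is no self link to use.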
So, in order to delete that document, I have to specify a partition key. Because this is a partitioned collection, I have to tell DocumentDB to perform this delete operation against this specific partition. This is just one of the things that you have to specify when you're using a partitioned collection. You can't perform a write operation that might work against any possible partition. You have to specify the specific partition that you want to work against. And so finally, to delete a document, it's very simple and straightforward. I just call 'DeleteDocumentAsync'. This is an asynchronous method. Because I'm just in a console application, I'm calling 'Wait' to wait for that method to complete. Obviously, if I were in a web application or some other server application, I wouldn't use 'Wait' here. Per standard .NET programming best practices, I would instead await the task that comes back from the asynchronous method, which frees up the thread until the operation completes and then resumes execution. But in this case, it's okay for me to wait. So I specify the document URI, which is that self URI that I get from the document when I read it. I specify my request options, and then just delete the document. So again, very simple. All I'm doing here is waking up every few seconds and deleting another document. And then we'll just dump something into the console to let us know, 'Hey, we deleted another one'. So, that's the remover application.
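Two ideas from this part of the walkthrough can be sketched together: a delete must name the partition it targets, and asynchronous calls are blocked on in a console program but awaited in server code. The Python below is an in-memory stand-in under those assumptions. `PartitionedCollection`, `remove_one`, and `build_demo` are hypothetical names, and the sketch identifies documents by `id` rather than by self link to stay short.

```python
import asyncio


class PartitionedCollection:
    """In-memory stand-in for a partitioned collection (not the real SDK)."""

    def __init__(self):
        # partition key value -> {document id -> document}
        self._partitions = {}

    def add(self, partition_key, document):
        self._partitions.setdefault(partition_key, {})[document["id"]] = document

    async def delete_document(self, document_id, partition_key):
        # A delete targets exactly one partition, so the key is mandatory.
        if partition_key is None:
            raise ValueError("a partition key is required for deletes")
        await asyncio.sleep(0)  # stands in for the network round trip
        partition = self._partitions.get(partition_key, {})
        if document_id not in partition:
            raise KeyError(document_id)
        del partition[document_id]

    def count(self):
        return sum(len(docs) for docs in self._partitions.values())


async def remove_one(collection, document, partition_key_field):
    # Server-style code awaits the operation instead of blocking a thread.
    await collection.delete_document(document["id"], document[partition_key_field])


def build_demo():
    collection = PartitionedCollection()
    collection.add("US", {"id": "1", "country": "US"})
    collection.add("DE", {"id": "2", "country": "DE"})
    return collection


if __name__ == "__main__":
    demo = build_demo()
    # A console entry point can simply block until the coroutine finishes.
    asyncio.run(remove_one(demo, {"id": "1", "country": "US"}, "country"))
    print(demo.count())
```

Here `asyncio.run` plays the role that blocking on the task plays in the console demo, while the `await` inside `remove_one` is the pattern a web or server application would use throughout.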
So now let's switch over, and we'll look at the reader. The reader is similar. It essentially spins up, gets some connection configuration information, and then runs in a loop. And, again, every so many seconds, it's examining the target collection, returning the total count of documents from that collection, and dumping that to the console. There are a couple of specific differences between this and the other application. One of them, you'll notice here, is that I have the opportunity to specify the read region that I want to use. Because all I'm doing is reading and I'm not doing any writes, I can actually perform reads against any replica that I want to. And this is a great feature of DocumentDB when you're using geo-replicated accounts and databases and collections. With the .NET SDK, you can specify the location, or even the set of locations, that you are interested in reading from. So, for instance, say you have users that span multiple geographies, and you'd like to give them an optimal experience, with the lowest possible latency for reading data from your collections. Then the .NET SDK allows you to specify the locations that you'd like those specific users to target for read operations. We're going to leverage exactly that behavior in this code so that we can see the counts in the different regions go down as we're deleting from the master region, which, again, is in Western Europe. Again, this looks very similar. We just have an endless loop. Every three seconds, we wake up and execute some logic.
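As a rough mental model of "specify the set of locations you'd like to read from", the sketch below picks the first preferred region that actually hosts a replica and otherwise falls back to the write region. This is a simplification written for illustration. The function name `choose_read_region` is hypothetical, and the real SDK's selection logic may also account for availability and failover.

```python
def choose_read_region(preferred_regions, read_regions, write_region):
    """Return the first preferred region that hosts a replica.

    Falls back to the write region when no preferred region matches,
    since the write region can always serve reads too.
    """
    available = set(read_regions) | {write_region}
    for region in preferred_regions:
        if region in available:
            return region
    return write_region


if __name__ == "__main__":
    # Region names follow the demo account described in the walkthrough.
    replicas = ["Central US", "West US"]
    print(choose_read_region(["West US", "Central US"], replicas, "West Europe"))
```

Ordering the preference list by proximity to each group of users is what gives them the lower read latency mentioned above.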
We're going to connect to DocumentDB. We're going to specify the collection that we want. And then all we're doing here is leveraging the aggregate syntax support in DocDB to return the count of the collection. Because it is a partitioned collection, I have to enable cross-partition queries. So I issue the query, and I'm really returning two things from this execute method. I'm returning the actual count value that I get back, but I'm also returning the read endpoint. This is the URI of the specific replica in DocDB that I'm reading from. And, again, I'm going to dump both of these things to the console window so that you can see there's nothing up my sleeve. I'm not reading just from the master replica, I'm actually reading from the secondary replicas. Okay, so that's enough preamble. This is obviously fairly basic code, as far as reading and writing data from DocDB, but hopefully it gives you a bit of a feel for what it looks like, certainly from C#, and using the .NET SDK.
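The reader's count step can be modeled as a fan-out: each partition reports its own count, and the totals are summed, which is why the query has to be allowed to cross partitions. The Python below is an in-memory illustration only. `count_documents`, `CountResult`, and the endpoint string are hypothetical and do not mirror the SDK's types.

```python
from collections import namedtuple

# The reader reports both the count and which replica answered the query.
CountResult = namedtuple("CountResult", ["count", "read_endpoint"])


def count_documents(partitions, read_endpoint, enable_cross_partition_query=False):
    """Count documents across all partitions of an in-memory collection.

    partitions maps a partition key value to the list of documents stored
    under it. A count over more than one partition is refused unless the
    caller opts in to cross-partition queries.
    """
    if len(partitions) > 1 and not enable_cross_partition_query:
        raise ValueError("cross-partition query is required but not enabled")
    total = sum(len(documents) for documents in partitions.values())
    return CountResult(count=total, read_endpoint=read_endpoint)


if __name__ == "__main__":
    data = {"US": [{"id": "1"}, {"id": "2"}], "DE": [{"id": "3"}]}
    result = count_documents(
        data, "https://example-centralus.invalid/", enable_cross_partition_query=True
    )
    print(result.count, result.read_endpoint)
```

Returning the endpoint alongside the count is what lets the demo show that each reader instance really is talking to a different regional replica.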
So let's fire these up and actually witness some of the behavior that we're describing here. I will spin up a couple of instances of the reader application. It's asking me what region I want to read from, so I'm going to pick Central US. Recall that this is one of my secondary replicas. It's just a read-only replica. I can't actually write to this region. Writes in DocumentDB always go to the master, which in this case is Western Europe. But because I can read from anywhere I have a replica, I'm going to pick Central US in this case. So that'll start up here in a moment. I also want to read from West US. If I can type. There we go. So let's take a look at both of these. And we should see, if I zoom in a bit, that we have 29,398 documents in Central US. And we have the same 29,398 in West US as well. So, that's good. That means DocumentDB is doing the right thing. It has replicated data across all of my regions and all of my replicas so that they're all in sync. So now let's fire up the remover application, which is just going to start removing documents from Western Europe. And then we should see the count start to go down in the other regions. So we've destroyed one already. You can already see that in West US, we have 29,397. And in Central US we're at 29,396 now, since the deletes are continuing. And this will just keep going, again, as we delete more and more documents every few seconds.
About the Author
Josh Lane is a Microsoft Azure MVP and Azure Trainer and Researcher at Cloud Academy. He’s spent almost twenty years architecting and building enterprise software for companies around the world, in industries as diverse as financial services, insurance, energy, education, and telecom. He loves the challenges that come with designing, building, and running software at scale. Away from the keyboard you'll find him crashing his mountain bike, drumming quasi-rhythmically, spending time outdoors with his wife and daughters, or drinking good beer with good friends.