Getting the Most from DocumentDB
It's been common, if inconsistently applied, knowledge for many years that relational databases are a less-than-ideal fit for some types of software problems. Indeed, entire categories of software development tooling, such as object-relational mappers (ORMs), exists to bridge the gap between highly normalized relational data and in-memory, object-oriented representations. In practice, ORMs can create as much complexity as they alleviate, so developers began looking at the relational database itself as ripe for potential disruption.
Thus came the rise of NoSQL and databases that eschew the traditional rows/columns/tables/foreign keys metaphor for other choices like JSON document stores, graph databases that represent data and relationships as nodes with connecting edges, key/value stores that act as a glorified hashtable, and others. The wide range of options meant you could choose the right tool for your particular needs, instead of trying to squeeze a relational database square peg into your application's round hole. Solutions like MongoDB, Cassandra, Redis, and Neo4j rose to prominence and became de facto industry standards for developers interested in leveraging the power and flexibility NoSQL.
While NoSQL was a boon to software developer productivity, the initial product offerings did little to alleviate the administrative burden of managing your database. Server provisioning, backups, data security at-rest and in-transit... all these challenges (and many more) remained as developers adopted NoSQL in greater numbers. Fortunately for them and all of us, the rise of the cloud and managed database service offerings like Azure DocumentDB brought us the best of both worlds: fast, flexible, infinitely-scalable NoSQL with most of the administrative headaches assumed by a dedicated team of experts from Microsoft. You focus on your data and your application, and rely on a 99.99% SLA for the rest!
In this "Introduction to Azure DocumentDB" course, you’ll learn how to use Azure DocumentDB (DocDB) in your applications. You'll create DocDB accounts, databases, and collections. You'll perform ad-hoc and application-based queries, and see how features like stored procedures and MongoDB protocol support can help you. You'll also learn about ideal DocDB use cases and the pricing model. By the end of this course, you’ll have a solid foundation to continue exploring NoSQL and DocumentDB.
An Introduction to Azure DocumentDB: What You'll Learn
|Lecture||What you'll learn|
|Intro||What to expect from this course|
|DocumentDB Overview||A high-level overview of the DocumentDB feature set|
|Overview of Managing DocumentDB||A discussion of DocumentDB features for managing resources, data, scalability, configuration, and so on|
|Creating an Account||Creating a top-level DocDB account in the Azure portal|
|Creating a Collection||Creating and configuring a DocDB collection in the Azure portal|
|Importing Data||Discussion and demonstration of moving data into a DocDB collection|
|Overview of Developing with DocumentDB||A discussion of DocumentDB features from a development point of view|
|SQL Queries||How to author queries in the Azure portal|
|Programming with DocumentDB||Reading and writing data in code, using the .NET SDK|
|Stored Procedures||Authoring DocDB stored procedures and executing them using the DocDB REST API|
|MongoDB Protocol Support||Configuring and using DocDB's MongoDB protocol support|
|Use Cases||A brief discussion of scenarios well-suited for DocDB use|
|Pricing||A review of the DocDB pricing model, and discussion of cost estimation and Total Cost of Ownership|
|Ecosystem Integration||A short review of DocDB integration with other Azure services|
|Summary||Course wrap up|
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
Welcome back. Let's look a little bit now at Stored Procedures in DocDB and see what it looks like to both create them and call them. Now, you can create Stored Procedures from any of the languages or platforms that have SDK Support or any of the languages that can call the Management API, the REST API for DocDB. So you can create Stored Procedures that way if you like. I'm going to show you what it looks like in the portal.
So let me navigate down to Script Explorer from my top level DocDB account. And I'm going to click on a new collection that I've created called sales. And you can see that I have a Stored Procedure already called record sale. So I'm gonna click on this and show you what this looks like. The first thing that you'll note is that this is by no means not the most elegant editing experience that you're ever going to find, so I wouldn't necessarily recommend that you spend a ton of time authoring lots of Stored Procedures, you know very complex or very many Stored Procedures in this user interface. But certainly this is good for kind of small, kind of test scenarios where you're just trying to try a few things out. And there's also a bit of a testing experience in this as well, so if I zoom this back up, yeah, you can see that I can add inputs here and I can see results, so I can, there's a save and execute button up here that I could use if I wanted to. But we're not gonna do that, we'll just author this Stored Procedure here. I'll show you what this looks like and then we'll call it from elsewhere.
So, my record sale Stored Procedure is meant to do exactly what it sounds like. This collection is just modeling a simple kind of sales process, or the results of a sales process, where each document records a sales amount and also specifies a region, meaning north, south, east or west, where this sale occurred. So this is a partition collection and so I'm using that region property on all of the documents in the collection to do partitioning. And this is kind of like a, you know, kind of an easy way to hopefully provide a bit of kind of balanced partitioning and keep the size of my partitions relatively equal. Certainly there is a lot of guidance online for coming up with a good partition key, a partitioning strategy for your particular application scenario, and I'd encourage you to read that as you get more and more, as you dig further and further into DocDB. But just know that this, in my case, this sales collection does have one, it is partitioned by region.
So basically what the idea of this Stored Procedure is is that each time I want to record a sale, I pass in a document which corresponds to that sale, and I'm going to create the document, meaning I'm going to add it to my collection. But then I want to do a little bit of extra work as well. And so the pattern that I'm going to show you here, this is actually a fairly common pattern, if you're unfamiliar with NoSQL databases, you may be unfamiliar with this pattern. But recall that in part of our discussion earlier in the course, we discussed the notion that collections don't have to, all the documents in a collection don't have to have the same schema. In fact in some cases, it's quite useful to have multiple documents or different documents with different schemas in the same collection. So this is one of those cases, or this pattern is an example of one of those cases. So the idea is that most of the documents, in fact almost all of the documents in this collection will just be individual, kind of discreet sales documents, each representing a single sale. But what I'm also interested in doing is performing some pre-aggregation of sales information to kind of make it easier for me to go back and query aggregated information across all of my sales, at least by, all of my sales by each partition. The idea is I could, if I wanted to, if I wanted to get say, the total amount of sales that I've had in a region or the total number of sales, that sort of thing, I could obviously perform an Ad Hoc Query on a just-in-time basis and kind of compute that information. But you know, I have to scan across all of the documents, even with the fast indexing of DocumentDB, that's still a relatively expensive operation and again, remember that we're paying a per request unit cost for every query that we run in DocDB. So the idea is that even if we can do it quickly, it still may be relatively expensive and so the more we can do kind of ahead of time to pre-aggregate some of this information, then, in the long run, this will probably benefit us from a cost standpoint.
So ultimately, I've updated my stats and now I need to call replace document to, again, replace the old document, the old statistics document with this new one that has the updated information. So I call underscore, underscore replace document. And again, if I have an error then I'll throw that back to the client, otherwise I'm good, and I can stop. So some of you may be looking at this thinking well Josh, earlier in the discussion, we've talked a little bit about DocumentDB support for triggers, and this sounds like maybe something that a trigger could perform as well. And that's actually true, so I'm just showing you one example or one way that this could be done. You could certainly perform this kind of behavior using something else as well. So you don't necessarily take this as gospel, we just take this as one possible means to achieve this, this particular pattern. Okay, so I've created this Stored Procedure already, it's already in my collection and ready to go.
So the last thing I'll show you is just the body of my request. And if I zoom in, you can see that this is a JSON body, it's an array of parameters or array of data and it takes a single object. This is the document that I'm going to insert. It has a total of 200 and the region is south, so that's what I'm going to insert into my collection. So if I click send here ... Then you can see I don't get anything back in the body. If I zoom in you can see there's nothing, nothing comes back, but I did get an HTTP 200, which means that my request succeeded. So, if we go back to the portal, and I will click on document explorer. Navigate to my sales collection. And we can see that the most recent document is south and yeah, so here's our document here. DocDB's added an ID for us since we didn't add one ourselves. We're in the south region and we have a total of 200. So that's my document, that's good. So now let's check to see, did the stats document for the south actually, was that actually updated with our aggregated statistics. So if I look here I can see that this is the other document with a region of south, so this should be our statistics document. So if I click on this, then yeah, sure enough, we can see this is stats equals true, max sale is 200, sales total is 200, sales count is one. And we're in the south region. So if I go back and run this maybe one more time, we could say, make this 1300 ... And run this again. You can see I have a 200, so let's go back and refresh our stats document. There we go. We can see sales total is 1500, we've had two sales in the south region, and our max sale is 1300, which is the one that we just added just a moment ago.
Josh Lane is a Microsoft Azure MVP and Azure Trainer and Researcher at Cloud Academy. He’s spent almost twenty years architecting and building enterprise software for companies around the world, in industries as diverse as financial services, insurance, energy, education, and telecom. He loves the challenges that come with designing, building, and running software at scale. Away from the keyboard you'll find him crashing his mountain bike, drumming quasi-rythmically, spending time outdoors with his wife and daughters, or drinking good beer with good friends.