Getting the Most from DocumentDB
An Introduction to Azure DocumentDB
It's been common, if inconsistently applied, knowledge for many years that relational databases are a less-than-ideal fit for some types of software problems. Indeed, entire categories of software development tooling, such as object-relational mappers (ORMs), exists to bridge the gap between highly normalized relational data and in-memory, object-oriented representations. In practice, ORMs can create as much complexity as they alleviate, so developers began looking at the relational database itself as ripe for potential disruption.
Thus came the rise of NoSQL and databases that eschew the traditional rows/columns/tables/foreign keys metaphor for other choices like JSON document stores, graph databases that represent data and relationships as nodes with connecting edges, key/value stores that act as a glorified hashtable, and others. The wide range of options meant you could choose the right tool for your particular needs, instead of trying to squeeze a relational database square peg into your application's round hole. Solutions like MongoDB, Cassandra, Redis, and Neo4j rose to prominence and became de facto industry standards for developers interested in leveraging the power and flexibility NoSQL.
While NoSQL was a boon to software developer productivity, the initial product offerings did little to alleviate the administrative burden of managing your database. Server provisioning, backups, data security at-rest and in-transit... all these challenges (and many more) remained as developers adopted NoSQL in greater numbers. Fortunately for them and all of us, the rise of the cloud and managed database service offerings like Azure DocumentDB brought us the best of both worlds: fast, flexible, infinitely-scalable NoSQL with most of the administrative headaches assumed by a dedicated team of experts from Microsoft. You focus on your data and your application, and rely on a 99.99% SLA for the rest!
In this "Introduction to Azure DocumentDB" course, you’ll learn how to use Azure DocumentDB (DocDB) in your applications. You'll create DocDB accounts, databases, and collections. You'll perform ad-hoc and application-based queries, and see how features like stored procedures and MongoDB protocol support can help you. You'll also learn about ideal DocDB use cases and the pricing model. By the end of this course, you’ll have a solid foundation to continue exploring NoSQL and DocumentDB.
An Introduction to Azure DocumentDB: What You'll Learn
|Lecture||What you'll learn|
|Intro||What to expect from this course|
|DocumentDB Overview||A high-level overview of the DocumentDB feature set|
|Overview of Managing DocumentDB||A discussion of DocumentDB features for managing resources, data, scalability, configuration, and so on|
|Creating an Account||Creating a top-level DocDB account in the Azure portal|
|Creating a Collection||Creating and configuring a DocDB collection in the Azure portal|
|Importing Data||Discussion and demonstration of moving data into a DocDB collection|
|Overview of Developing with DocumentDB||A discussion of DocumentDB features from a development point of view|
|SQL Queries||How to author queries in the Azure portal|
|Programming with DocumentDB||Reading and writing data in code, using the .NET SDK|
|Stored Procedures||Authoring DocDB stored procedures and executing them using the DocDB REST API|
|MongoDB Protocol Support||Configuring and using DocDB's MongoDB protocol support|
|Use Cases||A brief discussion of scenarios well-suited for DocDB use|
|Pricing||A review of the DocDB pricing model, and discussion of cost estimation and Total Cost of Ownership|
|Ecosystem Integration||A short review of DocDB integration with other Azure services|
|Summary||Course wrap up|
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
Welcome back. Let's take a closer look at DocumentDB support for SQL query syntax. From inside the portal on the account tab, let's take a look at query explorer. Just click there. And going to highlight my postal codes collection. So you can see right away that we have something that looks pretty familiar to anybody who's ever done any sort of SQL programming or written SQL queries. The C in this case select star from C, the C represents the collection itself. So if we merely run this query as is, then we won't get every single document, we'll get sort of a page set of documents but, if we look on the right-hand pane here we can see the JSON that corresponds to the first 100 or so documents that are pulled up. And we can page through these if we want to. Of course, that's pretty basic and not terribly interesting from a query perspective, so let's do something a little bit more interesting. Perhaps we can filter these documents. You can see where C.state equals Georgia, You can see where C.state equals Georgia, which happens to be where I live. So we run the query, and sure, we get a filtered set. Looks like we still have more than 100, we can keep paging, but you can see that we have documents in here, where state equals GA for Georgia. Okay, so that's a bit on filtering so that's fine.
So, of course, that's fairly simple filtering so let's do something a little bit more complex. We'll use the in keyword, so we'll say select star from C, where C.state is in, from C, where C.state is in, and then we'll pick a range of states. We'll say Georgia, Florida, and Utah. We'll say Georgia, Florida, and Utah. So now if we run this, and of course what we'll see first is a bunch of states probably in, yeah in Georgia so if we, might have to page through a bit to get to some results in a different state. Keep going, yeah okay, now here we are so we've got, here are some results from the state of Florida as well. Okay, so you get the idea there. So we can pick a filter with a range of values. We can also do order bys so perhaps we can do something like we'll go back to a simpler query, we'll say where's state equals New York we'll say where's state equals New York and, oh we'll say order by. We have a population field so let's use that. order by population and we'll go descending, so we'll start with the most populated city first. So if we run this query, then yes, okay now we have the city of Brooklyn, has a population over 100,000, and you can see that the population numbers go down as we kind of go through this list. So sure, now we've kind of ordered our results.Again this all fairly standard for anybody familiar with SQL but just to give you an idea of the kinds of things that DocDB supports. Document DB doesn't support every single Document DB doesn't support every single kind of antsy SQL feature or keyword, but it does support a number of them and the documentation is very thorough online to give you a detailed kind of perspective on exactly what's supported and what isn't supported.
So continuing on we can do some additional things like some comparison operators, we can say where select star, where population is greater than, we can say 50,000, say. where population is greater than, we can say 50,000, say. So just something like that. And sure, so here's population just over 50,000. And sure, so here's population just over 50,000. Over 50,000, over 50,000, and so on. So you get that. Let's see what else do we have. Oh, we can also say top so we can say top, maybe we only want the top five, and we'll run this way. and we'll run this way. And we'll just instead of getting all the results we'll just get five of them. So you can see there are only five documents here. The other interesting thing we can do is getting into some things that you can't necessarily usually do with certainly all relational documents is we can actually return instead of kind of returning an entire document we can return subsets so we can kind of do a bit of projection if we want to. So instead of returning everything I'm going to say city, from C.city and state, say city, from C.city and state, from C.state. Oops, don't need that, there we go. And now we'll give this a name as location. We don't want this star. From C, where C.population is greater than 50,000. Sure, we'll keep that. So if we run this, then you can see instead of getting the entire document back that matches our filter criteria the entire document back that matches our filter criteria then we're just getting a projection. We're just getting a subset of the data that we care about so we have this kind of top level location element and its value is this compound element city and state.
So the last thing that I wanted to show is DocumentDB's support for aggregates, specifically, DocumentDB supports count sum, min, max, and average. count sum, min, max, and average. These are relatively new additions to the feature set for DocumentDB that wasn't something that it originally supported. So to use say count, we would say something like this So to use say count, we would say something like this to give us the number of documents or sorry, the number of cities with a population greater than 50,000 in our collection. So if we run this, then sure we get a value of 449. You can imagine that for a partition collection, where we have data across multiple physical nodes, there's a bit of extra complexity involved in actually supporting functionality like this, which is one of the reasons that DocumentDB didn't originally support it.
About the Author
Josh Lane is a Microsoft Azure MVP and Azure Trainer and Researcher at Cloud Academy. He’s spent almost twenty years architecting and building enterprise software for companies around the world, in industries as diverse as financial services, insurance, energy, education, and telecom. He loves the challenges that come with designing, building, and running software at scale. Away from the keyboard you'll find him crashing his mountain bike, drumming quasi-rythmically, spending time outdoors with his wife and daughters, or drinking good beer with good friends.