Getting the Most From Azure Storage
The Azure Storage suite of services forms the core foundation of much of the rest of the Azure services ecosystem. Blobs are low-level data primitives that can store any data type and size. Tables provide inexpensive, scalable NoSQL storage of key/value data pairs. Azure queues provide a messaging substrate to asynchronously and reliably connect distinct elements of a distributed system. Azure files provide an SMB-compatible file system for enabling lift-and-shift scenarios of legacy applications that use file shares. Azure disks provide consistent, high-performance storage for virtual machines running in the cloud.
In this Introduction to Azure Storage course you'll learn about the features of these core services, and see demonstrations of their use. Specifically, you will:
- Define the major components of Azure Storage
- Understand the different types of blobs and their intended use
- Learn basic programming APIs for table storage
- Discover how queues are used to pipeline cloud compute nodes together
- Learn to integrate Azure files with multiple applications
- Understand the tradeoffs between standard/premium storage and unmanaged/managed disks
Okay, so let's take a little bit of a deeper dive into the table storage service. Specifically, I'd like to walk you through some code that reads and writes data to and from the table storage service. This code will look fairly similar to other code you might use for, say, the blob service or the queue service or files. Some of the details are a bit different, of course, because the nature of the data is slightly different, but a lot of the patterns flow through all of the different services as far as their APIs go, so this should be a good representative sample.
Now this code is written in C# and targets the .NET Framework. However, remember that there are SDKs for Node, Java, C++, Python, and some others, and of course you could also use the REST APIs to talk directly to the storage service from any platform that can make HTTP calls. So anything I'm about to do here, you can do on any platform that can talk HTTP.
Okay, so the specific feature that I want to demonstrate here is the storage service's notion of concurrency control using ETags. Basically, every time you update a value or read a value from the storage service, you're given a unique identifier for that version of that piece of data. Then when you go to update that piece of data, you pass in that ETag, that version identifier, and the storage service compares it to the current value it's tracking for that piece of data. If those match, if yours matches the one that's most current, then the storage service will allow your update, because it can be assured that no other changes have been made since you last read that piece of data.
However, if there have been changes, the ETags don't match. In other words, you have a stale or out-of-date ETag, and the storage service will disallow your update and return an error. You can recognize that error, go read the latest value of that piece of data, and then perhaps attempt your update again, so that's what I'm going to demonstrate here.
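The ETag check described above can be sketched with a small in-memory model. This is not the Azure SDK and not the C# code from the demo; `VersionedStore`, `PreconditionFailed`, and the use of random hex strings as ETags are all assumptions made for illustration.

```python
import uuid


class PreconditionFailed(Exception):
    """Raised when the caller's ETag is stale (hypothetical stand-in for HTTP 412)."""


class VersionedStore:
    """Hypothetical in-memory store that tracks a version identifier per value."""

    def __init__(self):
        self._items = {}  # key -> (value, etag)

    def read(self, key):
        # Every read returns the value together with its current ETag.
        return self._items[key]

    def write(self, key, value, etag=None):
        current = self._items.get(key)
        # Reject the write if the key exists and the caller's ETag is not current.
        if current is not None and current[1] != etag:
            raise PreconditionFailed(f"stale ETag for {key!r}")
        new_etag = uuid.uuid4().hex  # assumption: any fresh unique string will do
        self._items[key] = (value, new_etag)
        return new_etag


store = VersionedStore()
store.write("order-1001", 50)

_, etag_a = store.read("order-1001")  # client A reads
_, etag_b = store.read("order-1001")  # client B reads the same version

store.write("order-1001", 34, etag=etag_a)  # A's ETag is current, so this succeeds

try:
    store.write("order-1001", 11, etag=etag_b)  # B's ETag is now stale
    outcome = "updated"
except PreconditionFailed:
    outcome = "rejected"

print(outcome, store.read("order-1001")[0])
```

Client B's write is refused because client A's update changed the version identifier after B last read the value, which is the situation the rest of the demo reproduces.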
I'll walk through this code a line at a time. Don't worry too much if you don't know C# or you don't know .NET or you're not even really a programmer per se. I will kind of walk line by line. I'll explain what's happening, and again just try to get the big picture of how multiple clients accessing the same data are handled by the storage service. It's just an important kind of basic concept in the Azure storage service.
Okay so let's go ahead and fire this up, and I'll step through a line at a time. These first few lines, we're just creating essentially a connection object so that we can connect to our table storage service using a secure connection string so that's all that's happening here. Now I'm asking for a reference to a table in table storage.
The table that I want to create is called PurchaseOrders, and if it doesn't already exist, then I want to create it. In this case, I believe it already exists but again that line of code there checks to see if it exists and will create it for me if it doesn't. So now I need to create an entity or if you recall, a row in table storage.
Again, an entity corresponds to a row in a table, and so an entity has two main properties that allow you to kind of uniquely identify it. The first is the partition key. This is that first argument here. I can give any value I want, any string value. In this case, I'm using something like a customer number like customer 100.
The idea is that a partition key allows the storage service, the table storage service to horizontally scale your data across multiple physical partitions. In other words, if I'm going to insert, say, 50 million entities into my table, then table storage doesn't necessarily want to store all 50 million of those entities, all 50 million pieces of data in the same part of physical storage because that won't necessarily scale very well if I have 50 million data elements there and say I have five million client application instances that are trying to hit all that data.
I'm just making up the numbers here, but the point is that, given enough requests all trying to target the same data on the same physical infrastructure, that's how you run into scalability problems. Horizontal partitioning is kind of a tried and true strategy for spreading requests across an increasing amount of hardware and infrastructure, and this partitioning is done using the partition key. Any entities that have the same partition key will be physically grouped close together. So what I'm presupposing here, what I might design, if you will, is that I want to group my data physically by customer. All of the customer 100 records, no matter how many orders customer 100 places in my system over the course of my relationship with customer 100, will be stored in close proximity to one another. So if I want to do things like aggregate all of the orders in some way to find out the total amount that customer 100 has spent with my company, those queries, those aggregations are very, very fast and very easy for table storage to resolve, precisely because all that data is going to be co-located.
It's going to be physically grouped close together. That's the partition key. The second value here, and again I'm making up a value, is an order number, order 1,001. That's just a unique value within that partition, so if I try to insert another entity after this one with the same partition key, customer 100, and the same order number, 1,001, then I would get an error from the table storage service saying, "Oh sorry, you already have a row key with the same value."
Just be aware: when you're working with table storage, the two most important pieces of information that you have to provide for your entities are that partition key and that row key. Together they let you uniquely identify this entity in your table.
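As a rough illustration of how the partition key and row key work together, here is a minimal in-memory sketch. `ToyTable`, `EntityAlreadyExists`, the key strings, and every property value other than the quantity of 50 are made up for this example; the real service's storage layout is more involved than a nested dictionary.

```python
from collections import defaultdict


class EntityAlreadyExists(Exception):
    """Raised when a (partition key, row key) pair is inserted twice."""


class ToyTable:
    """Hypothetical in-memory table: entities are grouped by partition key."""

    def __init__(self):
        # partition key -> {row key -> properties}
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, row_key, **properties):
        rows = self._partitions[partition_key]
        if row_key in rows:
            # The row key must be unique within its partition.
            raise EntityAlreadyExists((partition_key, row_key))
        rows[row_key] = properties

    def query_partition(self, partition_key):
        # All entities for one partition key live together, so this is one lookup.
        return list(self._partitions.get(partition_key, {}).values())


table = ToyTable()
table.insert("customer100", "order1001", unit_cost=2.5, quantity=50)
table.insert("customer100", "order1002", unit_cost=10.0, quantity=3)
# The same row key in a different partition is a different entity.
table.insert("customer200", "order1001", unit_cost=1.0, quantity=7)

try:
    table.insert("customer100", "order1001", unit_cost=9.9, quantity=1)
    duplicate_rejected = False
except EntityAlreadyExists:
    duplicate_rejected = True

# Per-customer aggregation only touches that customer's partition.
total_spent = sum(
    e["unit_cost"] * e["quantity"] for e in table.query_partition("customer100")
)
print(duplicate_rejected, total_spent)
```

Grouping by customer means the per-customer total reads a single partition, which is the design choice the walkthrough motivates.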
Okay so now we've got our partition key and our row key. Now we just need to fill out some of the other attributes of this entity. Again this corresponds to a purchase order so I'm just making some things up like a sales ID. This would be like the salesperson who sold the order, and then I'm just assuming there is a single element per purchase order so a description, unit cost, and quantity, and I'm just making up some values here.
They're not super important details but just know that that's what's happening. So now I want to insert this entity. I want to insert this order into my PurchaseOrders table so I create an insert operation and then call execute on my table, and if all goes well, I don't get any errors, then sure enough, my entity is written to my table so I'm just writing something to the console here so you can see my application running here on the console.
It says, "Oh, your entity was written. Please hit enter to proceed." So when I hit enter, I'm going to be prompted to keep continuously updating this same entity, the same order, with an updated quantity. This is just sort of an artificial example, of course, but what I want to demonstrate is that when you have two applications working with the same piece of data, trying to update the same piece of data in table storage, we need to make sure that we are working with the latest copy of that entity if we want to update it. Otherwise we get these concurrency violations.
This is my first application. This is my custom code that I've written here. For my second application, I want to use the storage explorer, so I'm going to go open up the storage explorer. I already have it looking at this storage account, so if I zoom in here, you can see that I've got this storage account.
I happen to have access to everything in there so you can kind of see blobs, tables, queues, et cetera, and so you can see my PurchaseOrders table is in there so when I click on that, let me zoom back out, and now you can see here is my entity in the table so you can see that my partition key is customer 100, my row key is order 1,001.
Let me zoom in here in case you can't see it, and you can see the other attributes as well, and most importantly, you can see a quantity of 50. What I want to do, I'll switch back to my custom code here and just kinda do a happy path. We'll update the quantity. I'll hit enter. Yeah, we're gonna step through.
What I'm gonna do is I'm going to hit F5 in my debugger, and that will just kinda run through the rest of this code. That way, we won't have to step through it, but the application is going to ask me, "So what do you want to change the order quantity to?" So instead of 50, I'll change it to something like 34.
It doesn't really matter. Hit enter. I didn't get an error. Basically it's just gonna keep asking me, "Okay, what do you want to change it to now"? That sort of thing. Obviously not the world's best application but you kinda get the basic idea of what's happening here. If I go back to the storage explorer and I refresh this entity, then you can see now that the quantity is 34 instead of 50.
It looks like so far so good, right? Okay so now let's make this a little bit more complicated and let's edit the quantity here in storage explorer so I'm gonna click edit and I'm going to come down to quantity. Instead of 34, I'm gonna change it here. I'll make it something like, it doesn't matter again but I'll make it 300.
I'll update it. Okay, I zoom back out, and now we can see that it's 300. I have the latest version of that entity here in storage explorer, right where I just edited that value, and it's been properly refreshed. However, it hasn't been refreshed in my custom application yet, because all that's been doing is sitting here waiting for me to update the quantity since the last time it read the value, which was prior to me updating the value in storage explorer. So I'm going to try to update the quantity yet again here in my custom application, and I'll just change it to something like 11, and this time, of course, we get a problem.
We actually get an error saying, "The remote server returned an error, 412, Precondition Failed." 412 is the HTTP status code that's returned, and "Precondition Failed" is kind of an obscure, not super friendly error message per se, but basically what it's saying is that the ETag comparison failed. I had updated the value of this entity from my custom application and had the latest version of that entity returned to me, so I had a value for that ETag. But then I updated the value in the second application, in the storage explorer, and that meant that in my custom application, I didn't have the latest value anymore.
I didn't have the latest value of that ETag in particular, so when I try a subsequent update operation from there, it fails and I get this error message. The way my code works, if I go back to the code real quick, to the part that I didn't step through, this is where I attempt the update operation, and I have an error-catching mechanism that looks for the presence of any error. If I get an error, it dumps it to the console, which is what we saw, and then I retrieve the latest version of the entity in that case.
I realize, "Okay, something has gone wrong. I need to refresh my version of the entity so then I can keep updating it if I want to." So we refresh it there in the code, and then we loop back to the top and ask again, "Okay, what do you want to change the quantity to now?"
We should be able to update it from here now that we've received the error and we've refreshed. So yeah, I changed it to 65, and sure enough, I don't get the error this time. It looks like all was well. Just to verify, I'll go back to storage explorer and refresh, and yes, you can see that the quantity is 65.
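The sequence in this demo (50, then 34, an outside edit to 300, a rejected update to 11, a refresh, then 65) can be replayed against a small in-memory model. This is a sketch, not the C# code shown in the video: `ToyTable`, `QuantityUpdater`, `PreconditionFailed`, the counter-based ETags, and the key strings are all assumptions for illustration.

```python
import itertools


class PreconditionFailed(Exception):
    """Hypothetical stand-in for the HTTP 412 error described above."""
    status_code = 412


class ToyTable:
    """Hypothetical in-memory table; not the Azure SDK."""

    def __init__(self):
        self._rows = {}                      # key -> (entity, etag)
        self._versions = itertools.count(1)  # source of fresh version ids

    def _store(self, key, entity):
        etag = f"v{next(self._versions)}"
        self._rows[key] = (dict(entity), etag)
        return etag

    def insert(self, key, entity):
        return self._store(key, entity)

    def get(self, key):
        entity, etag = self._rows[key]
        return dict(entity), etag  # hand back a copy plus its ETag

    def replace(self, key, entity, etag):
        # Refuse the update unless the caller holds the current ETag.
        if self._rows[key][1] != etag:
            raise PreconditionFailed("412 Precondition Failed")
        return self._store(key, entity)


class QuantityUpdater:
    """Plays the role of the custom application in the demo."""

    def __init__(self, table, key):
        self.table, self.key = table, key
        self.entity, self.etag = table.get(key)
        self.log = []

    def update_quantity(self, quantity):
        self.entity["quantity"] = quantity
        try:
            self.etag = self.table.replace(self.key, self.entity, self.etag)
            self.log.append(f"updated to {quantity}")
        except PreconditionFailed as err:
            self.log.append(f"error {err.status_code}")
            # Refresh the local copy so the next attempt uses the latest ETag.
            self.entity, self.etag = self.table.get(self.key)


table = ToyTable()
key = ("customer100", "order1001")
table.insert(key, {"quantity": 50})

app = QuantityUpdater(table, key)
app.update_quantity(34)               # happy path

latest, latest_etag = table.get(key)  # a second client edits the same row
latest["quantity"] = 300
table.replace(key, latest, latest_etag)

app.update_quantity(11)               # stale ETag: rejected, then refreshed
app.update_quantity(65)               # succeeds with the refreshed ETag

print(app.log, table.get(key)[0]["quantity"])
```

As in the demo, the failed update is not retried automatically. The application reports the error, refreshes its copy, and waits for the next requested quantity.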
Okay so hopefully that gives you a little bit of a sense of how concurrency control works in table storage, and again this works the same in blob storage as well. It uses the same underlying mechanism for doing concurrency control there so you can rely on the same ETag mechanism when you're talking to blobs as well.
About the Author
Josh Lane is a Microsoft Azure MVP and Azure Trainer and Researcher at Cloud Academy. He’s spent almost twenty years architecting and building enterprise software for companies around the world, in industries as diverse as financial services, insurance, energy, education, and telecom. He loves the challenges that come with designing, building, and running software at scale. Away from the keyboard you'll find him crashing his mountain bike, drumming quasi-rhythmically, spending time outdoors with his wife and daughters, or drinking good beer with good friends.