This course is focused on the portion of the Azure 70-534 certification exam that covers designing an advanced application. You will learn how to create compute-intensive and long-running applications, select the appropriate storage option, and integrate Azure services in a solution.
Welcome back. In this lesson we'll be talking about storage options. Now, this is something we have talked about at different times throughout the learning path. However, because it is an exam objective, we're going to talk about it some more. We're going to start with a discussion of the storage options that are available through the Azure Storage service, and then we'll talk about SQL and NoSQL. So let's get started with Azure Storage.
First up, object storage, also called Blob storage. Blob storage is great for unstructured data, and the typical use is for things such as videos, images, data backups, et cetera. And Azure allows us to change the storage tier, and we can have a less expensive option for data that we will access infrequently, and more expensive for data that we'll access often.
The next storage option is going to be Azure Table. Azure Table storage can store petabytes of semi-structured data, and this is a partitioned schema-less database that is exceptionally scalable. To actually use it, you need to interact with it through one of the many libraries or the REST API.
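To make the "partitioned, schema-less" idea concrete, here is a minimal in-memory sketch. It is not the Azure SDK or the REST API; the class name TableSketch and its methods are hypothetical, and the only thing it assumes about Azure Table is that entities are addressed by a partition key plus a row key and that entities in the same table can carry different properties.

```python
class TableSketch:
    """Hypothetical in-memory stand-in for a partitioned, schema-less table."""

    def __init__(self):
        # partition key -> {row key -> properties}
        self._partitions = {}

    def upsert(self, partition_key, row_key, **properties):
        # No schema is enforced: each entity keeps whatever properties it is given.
        self._partitions.setdefault(partition_key, {})[row_key] = dict(properties)

    def get(self, partition_key, row_key):
        # Point lookup by the full key; returns None when the entity is missing.
        return self._partitions.get(partition_key, {}).get(row_key)

    def query_partition(self, partition_key):
        # Return every entity in one partition, ordered by row key.
        return sorted(self._partitions.get(partition_key, {}).items())


table = TableSketch()
table.upsert("customers-us", "0001", name="Ada", tier="gold")
table.upsert("customers-us", "0002", name="Grace")  # fewer properties, still valid
table.upsert("customers-eu", "0001", name="Linus", vat_id="EU123")
```

The point of the sketch is the access pattern: lookups that supply both keys touch a single entity, and grouping related entities under one partition key keeps them together.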
Next up, let's talk about File storage. Now, this is basically a Cloud-based file share. This is going to allow you to easily lift and shift applications that require a file share.
The last option in Azure storage we're gonna talk about is the Queue storage. And now, Queue storage allows us to store lots of small messages that we can take out of the queue, process, and remove. Queue storage is going to allow us to engineer more robust systems. It allows us to queue up work to be handled by some worker at some point in the future. And because of this, we can use queues to decouple systems, because we can use a queue to communicate between different components, so that allows us to scale these independently of each other.
Microsoft has some great articles about different cloud patterns that are useful, and we've even talked about one of them already, and that's the competing consumer pattern. Though there are other patterns out there, and I recommend that you check them out at goo.gl/rX8CkB. And that's a Google shortened URL, so it is case-sensitive. Queues are exceptionally versatile and one of the go-to tools when it comes to decoupling systems.
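As a rough illustration of the competing consumer pattern, the sketch below uses Python's standard library queue in place of Azure Queue storage. The function name run_competing_consumers, the sample messages, and the "processing" step (upper-casing the text) are all made up for the example; the point is only that the producer and the workers share nothing except the queue.

```python
import queue
import threading


def run_competing_consumers(messages, worker_count=3):
    """Put messages on a queue and let several workers compete to process them."""
    work_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker(name):
        while True:
            message = work_queue.get()
            if message is None:  # sentinel: no more work for this worker
                work_queue.task_done()
                break
            processed = message.upper()  # placeholder for real work
            with results_lock:
                results.append((name, processed))
            work_queue.task_done()

    workers = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in workers:
        thread.start()

    # The producer only knows about the queue, not about the workers.
    for message in messages:
        work_queue.put(message)
    for _ in workers:
        work_queue.put(None)

    work_queue.join()
    for thread in workers:
        thread.join()
    return results


results = run_competing_consumers(
    ["resize image", "send email", "build report", "index document"]
)
```

Because the producer never talks to a worker directly, you can change worker_count without touching the code that enqueues the messages, which is the independent scaling described above.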
Now, another exam objective is understanding the performance limitations of storage options. And that can be a bit of a moving target, so what I recommend is checking out this Microsoft article, and the shortened URL is going to be goo.gl/HxKmkJ.
All right, let's shift our attention from Azure storage to database options. If we don't want to use a platform as a service option, then we can host any database we want on an IaaS VM or on-prem. Running databases on VMs means that we have to manage the servers and everything running on them, and that includes the operating system and software patches and the database backups. We have to secure everything. We have to worry about scaling and everything else that comes along with managing this sort of thing yourself. There is nothing new here. Running this stuff yourself on IaaS VMs is basically the same thing as running it on-prem. If you need to run this stuff on virtual machines, then that is an option you can use.
However, we can also use the platform as a service options that Azure offers. We can use Azure SQL, DocumentDB, and Azure Table, which we mentioned earlier. With Azure SQL, we can easily scale up and out, and we can offload the management tasks that come with hosting a SQL database on IaaS VMs. What we get is 35 days' worth of backups and active geo-replication, though we do have some limits. There is a limit on the maximum database size, which depends on the pricing tier, and the largest a database can be is currently a thousand gigs. There are also limits on the maximum number of concurrent workers, sessions, and logins. When it comes to selecting between Azure SQL and running on Azure IaaS VMs, you need to determine whether these limits are too restrictive.
Now this isn't something that we've covered so far, so let's talk about it now. When it comes to data storage and access, there are different consistency and concurrency levels. Let's start with consistency. The two most common consistency levels are strong and eventual. A data system with strong consistency guarantees that after a consumer writes data, the next read of that data is going to match what was written. This tends to be what we see with relational databases: when a record is written in a transaction, the record is locked and then released once that data has been persisted. For eventual consistency things are a bit different. When we write some data, it will eventually be persisted and replicated out to any replicas, and until it's replicated out, a user querying for that data may get some stale data. Eventual consistency is useful when the data just isn't that important. Think of something like Twitter. If a user tweets, the tweet is then replicated out to the databases and, for a brief moment, some users may not see that tweet. Then it's going to reach all of the replicas and it will be available.
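The stale-read window can be shown with a toy model. This is a sketch under simplifying assumptions, not how any particular database replicates: the class EventuallyConsistentStore is hypothetical, there is a single primary, and replication happens only when replicate() is called so that the lag is visible.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on the primary first and reach replicas later."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # writes not yet copied to the replicas

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_from_replica(self, index, key):
        # A replica may not have seen the latest write yet.
        return self.replicas[index].get(key)

    def replicate(self):
        # Copy pending writes to every replica, in the order they were made.
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.write("tweet:1", "hello world")
stale = store.read_from_replica(0, "tweet:1")  # read before replication
store.replicate()
fresh = store.read_from_replica(0, "tweet:1")  # read after replication
```

Here stale is None because the replica has not caught up yet, while fresh holds the tweet. A strongly consistent system would not acknowledge the write, or would not serve the read, until the two agree.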
Now, determining which type of consistency is right for your application comes down to reviewing the trade-offs between consistency, availability, and latency. I mentioned that in addition to consistency, there are also different concurrency models, and these include optimistic, pessimistic, and last-write wins. With optimistic, whatever is writing the data assumes that it's going to succeed, and when it attempts to save, it checks to see if the data has changed since the last time it read the data. If it has, then the system is going to let us know that a conflict has been triggered. With pessimistic, it's the opposite. The system writing the data locks the record until it's done writing. And with last-write wins, it's pretty much what it sounds like. The last data written is what's going to be read the next time the record is read.
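Here is a small sketch contrasting optimistic concurrency with last-write wins. It uses a version number as the change marker, which is one common way to detect that a record changed since it was read; the names VersionedStore and ConflictError are hypothetical, and pessimistic locking is left out to keep the example short.

```python
class ConflictError(Exception):
    """Raised when a record changed after the writer last read it."""


class VersionedStore:
    def __init__(self):
        self._records = {}  # key -> (version, value)

    def read(self, key):
        # Returns (version, value); a missing record is version 0.
        return self._records.get(key, (0, None))

    def write_optimistic(self, key, value, expected_version):
        # Save only if nobody else has written since we read the record.
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise ConflictError(
                f"{key}: expected version {expected_version}, found {current_version}"
            )
        self._records[key] = (current_version + 1, value)
        return current_version + 1

    def write_last_wins(self, key, value):
        # No check at all: whatever is written last is what readers see.
        current_version, _ = self.read(key)
        self._records[key] = (current_version + 1, value)


store = VersionedStore()
store.write_last_wins("profile", "initial")

# Two writers read the same version of the record.
version_a, _ = store.read("profile")
version_b, _ = store.read("profile")

store.write_optimistic("profile", "from writer A", version_a)
try:
    store.write_optimistic("profile", "from writer B", version_b)
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

Writer B's save is rejected because writer A got there first, so B has to re-read and decide what to do. With write_last_wins, B's value would simply have replaced A's.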
So, as I mentioned, DocumentDB is going to allow us to change the consistency level to support whatever is going to work best for our application. SQL uses strong consistency, and while the default is pessimistic concurrency, it can support optimistic as well.
We've talked a lot about different options for storage. Some options are going to allow us to run complex queries in order to filter the data down to just what we need. Others are fetched by some key and will return the corresponding data. Regardless of the type, they're not mutually exclusive. We'll usually require multiple data storage solutions for an application, and it boils down to finding the correct set of options for our particular scenario.
One final point that's worth noting while we're covering data storage is caching. Don't forget that you may want to consider caching for things like infrequently updated data, for data that's fetched by long-running queries, and for repetitive queries such as fetching a blog. And you can even prime the cache as needed to avoid cache misses.
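The caching advice can be sketched as a small read-through cache with expiry and priming. This is an illustration only: TTLCache, slow_query, and the key names are hypothetical, and in a real application the cache would more likely be a shared service than an in-process dictionary.

```python
import time


class TTLCache:
    """Read-through cache: on a miss, call the loader and keep the result for a while."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # hit: the expensive query is skipped
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def prime(self, keys):
        # Load known-hot keys ahead of time so the first reader does not miss.
        for key in keys:
            self._entries[key] = (self._clock() + self._ttl, self._loader(key))


load_calls = []


def slow_query(key):
    # Stand-in for a long-running query; it records each time it is called.
    load_calls.append(key)
    return f"result for {key}"


cache = TTLCache(slow_query, ttl_seconds=300)
cache.prime(["top-posts"])
first = cache.get("top-posts")         # served from the primed entry
second = cache.get("recent-comments")  # miss: runs the query once
third = cache.get("recent-comments")   # hit: no second query
```

The time-to-live is the knob that trades freshness for load: infrequently updated data can tolerate a long one, while data that changes often needs a shorter one or explicit invalidation.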
Okay. Let's wrap up our lesson here. In our next lesson, we're going to be looking at some of the different services that we can integrate into our apps. So, if you're ready, then let's get started.
About the Author
Ben Lambert is the Director of Engineering and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps.
When he’s not building the first platform to run and measure enterprise transformation initiatives at Cloud Academy, he’s hiking, camping, or creating video games.