This course covers the "Design an application storage and data access strategy" part of the 70-534 exam, which is worth 5–10% of the exam. The intent of the course is to help fill in any knowledge gaps that you might have and to help prepare you for the exam.
Welcome back. In this lesson we're going to cover some of the storage options available in Azure.
Let's start with the storage option aptly named Azure storage. Azure storage is an extremely flexible and massively scalable storage platform and it's a fundamental building block for Azure.
As an example of how fundamental it is, the virtual hard disks for IaaS VMs are stored in Azure storage. A storage account can hold up to 500 terabytes of data and handle up to 20,000 IO operations per second, and it's elastic, which helps ensure consistent performance.
By default, Azure storage is resilient: it keeps three replicas of your data inside a region. You can also enable geo-replication to the paired region. By default, geo-replication is for disaster recovery only, since the replicated data isn't accessible; it only becomes available if Microsoft initiates a failover.
However, you can use read-only geo-replication, which, as the name suggests, lets you use those replicas for read-only access. That can provide lower latency for end users. You can interact with Azure storage via a REST API and any of the client libraries built on top of it.
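To make the read-only replica idea concrete, here is a minimal Python sketch of how a client might route requests when read-only geo-replication is enabled. It assumes the service's convention of exposing the secondary replica at an endpoint with `-secondary` appended to the account name; the account name and latency figures are illustrative, and this is plain URL logic, not an Azure SDK call.

```python
# Sketch only: routes requests between the primary endpoint and the read-only
# secondary endpoint exposed when read-only geo-replication is enabled.


def blob_endpoints(account: str) -> dict:
    """Primary and read-only secondary blob endpoints for a storage account."""
    return {
        "primary": f"https://{account}.blob.core.windows.net",
        # The secondary replica is reached by appending "-secondary" to the account name.
        "secondary": f"https://{account}-secondary.blob.core.windows.net",
    }


def choose_endpoint(account: str, operation: str,
                    primary_latency_ms: float, secondary_latency_ms: float) -> str:
    """Send writes to the primary; send reads to whichever region is faster."""
    endpoints = blob_endpoints(account)
    if operation != "read":
        # The geo-replica is read-only, so anything else must go to the primary.
        return endpoints["primary"]
    if secondary_latency_ms < primary_latency_ms:
        return endpoints["secondary"]
    return endpoints["primary"]


if __name__ == "__main__":
    # Hypothetical latencies measured from a user closer to the paired region.
    print(choose_endpoint("moviesstorageaccount", "read", 120.0, 25.0))
    print(choose_endpoint("moviesstorageaccount", "write", 120.0, 25.0))
```

Geo-replication is asynchronous, so reads served from the secondary may be slightly stale; that is the trade-off for the lower latency.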
So let's look at some of the different storage options built into Azure storage, starting with blob storage. Azure Blob storage allows you to store unstructured data in the cloud as blobs, or objects. Blob storage can hold any type of text or binary data, such as documents, pictures, videos, and backups. You can think of it as a kind of file system in the cloud.
Blobs are available in two access tiers: hot, which is optimized for storing data that is accessed frequently, and cool, which is optimized for storing data that is infrequently accessed and long-lived. This diagram shows how blobs fit into the storage hierarchy.
We first create a storage account, and inside it we create multiple containers to organize our data. A storage account can store up to 500 terabytes of data, shared across all of its services. Inside each container you can then store multiple blobs.
In this example we have a storage account for movies, and then we have containers for different movie genres such as sci-fi, comedy, action, and romance. We can then use these containers to store videos that fit into each genre. An Azure storage account provides a unique address that you use to store and access a set of Azure storage types.
When you create a storage account, the name you select, which must be globally unique, is used to build the URL that you'll use to access the stored data. In this example we created an account called moviesstorageaccount, which generates the URL https://moviesstorageaccount.blob.core.windows.net for accessing the blobs.
There are similar URLs for tables, queues, and files. A container allows you to subdivide blobs into categories. You can create an unlimited number of containers, but their names must be lowercase. Containers only operate at a single level, so you can't create real subfolders with them.
However, when uploading blobs you can specify a folder structure in the blob name, though these folders are virtual and only help you organize blobs. The container name is used to build the URL for its blobs by simply appending a forward slash and then the container name to the storage account's URL.
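Since these URLs are simple string composition, the rules can be sketched in a few lines of Python. This assumes the public-cloud endpoint suffix `core.windows.net` and uses `moviesstorageaccount` as the example account name. The local checks mirror the storage naming rules (account names are 3 to 24 lowercase letters and digits, so no dots; container names are 3 to 63 lowercase letters, digits, and single hyphens between them), but the service itself remains the authority.

```python
import re

# Naming rules checked locally for illustration; the service enforces the real ones.
ACCOUNT_PATTERN = re.compile(r"[a-z0-9]{3,24}")               # lowercase letters and digits only
CONTAINER_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")   # single hyphens between alphanumerics
SERVICES = ("blob", "table", "queue", "file")


def service_endpoint(account: str, service: str) -> str:
    """Base URL for one of the storage services in an account."""
    if not ACCOUNT_PATTERN.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if service not in SERVICES:
        raise ValueError(f"unknown storage service: {service!r}")
    return f"https://{account}.{service}.core.windows.net"


def container_url(account: str, container: str) -> str:
    """Append '/<container>' to the account's blob endpoint."""
    if not (3 <= len(container) <= 63 and CONTAINER_PATTERN.fullmatch(container)):
        raise ValueError(f"invalid container name: {container!r}")
    return f"{service_endpoint(account, 'blob')}/{container}"


if __name__ == "__main__":
    print(container_url("moviesstorageaccount", "scifi"))
    for service in SERVICES:
        print(service_endpoint("moviesstorageaccount", service))
```

Validating names before calling the service simply surfaces mistakes such as uppercase container names earlier; it doesn't replace the service's own checks.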
In this example, that gives us /scifi. Once we have a storage account and containers, we can start uploading blobs. The most common type is the block blob, which is designed to hold any type of text or binary file.
There are two other types: append blobs and page blobs. An append blob is optimized for data that you only add to over time, such as a log file. A page blob is used for very large files that need random access, such as an entire virtual hard disk; it's optimized for frequent reads and writes, just like a hard drive.
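The three blob types differ mainly in how data gets written. The following in-memory Python model, an illustrative sketch rather than the Azure SDK, captures those differences: a block blob stages named blocks and then commits an ordered block list, an append blob only grows at the end, and a page blob is a fixed-size array of 512-byte pages written at aligned offsets. The class and method names are hypothetical, and real block IDs are Base64-encoded strings rather than the plain labels used here.

```python
PAGE_SIZE = 512  # page blobs are made of 512-byte pages


class BlockBlob:
    """Blocks are uploaded (staged) independently, then committed as an ordered list."""

    def __init__(self):
        self._staged = {}
        self.content = b""

    def stage_block(self, block_id, data):
        # Staged blocks are not part of the blob until they are committed.
        self._staged[block_id] = data

    def commit_block_list(self, block_ids):
        # The blob's content becomes the listed blocks, in the listed order.
        self.content = b"".join(self._staged[block_id] for block_id in block_ids)


class AppendBlob:
    """Data can only be added to the end, which suits logs."""

    def __init__(self):
        self.content = b""

    def append_block(self, data):
        self.content += data


class PageBlob:
    """Fixed-size, randomly writable blob, similar to a disk."""

    def __init__(self, size):
        if size % PAGE_SIZE:
            raise ValueError("page blob size must be a multiple of 512 bytes")
        self.content = bytearray(size)

    def write_pages(self, offset, data):
        # Writes must start on a page boundary and cover whole pages.
        if offset % PAGE_SIZE or len(data) % PAGE_SIZE:
            raise ValueError("page writes must be aligned to 512-byte pages")
        if offset + len(data) > len(self.content):
            raise ValueError("write extends past the end of the blob")
        self.content[offset:offset + len(data)] = data


if __name__ == "__main__":
    movie = BlockBlob()
    movie.stage_block("block-2", b"second half")
    movie.stage_block("block-1", b"first half, ")
    movie.commit_block_list(["block-1", "block-2"])
    print(movie.content)  # b'first half, second half'

    log = AppendBlob()
    log.append_block(b"upload started\n")
    log.append_block(b"upload finished\n")

    disk = PageBlob(4 * PAGE_SIZE)
    disk.write_pages(PAGE_SIZE, b"\x01" * PAGE_SIZE)  # overwrite the second page in place
```

The staging step is what lets large block blobs be uploaded in parallel pieces, while the page alignment is what lets a page blob act as a virtual disk.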
A blob's name is used to build its URL by simply appending a forward slash and the blob name to the container URL. In this example we append /totalrecall.avi. The next storage option we're going to talk about is Azure File storage.
File storage allows us to create file shares in the cloud, which can help us migrate a legacy application to Azure. Server Message Block (SMB) versions 2.1 and 3.0 are supported. Applications running in Azure can mount the file share in the same way that you would use SMB to mount a file share on an internal machine.
And since it's basically just a cloud-hosted file share, you can use existing tools and APIs. For example, the standard file system APIs work, as do familiar tools such as PowerShell cmdlets.
Apart from migrating legacy applications, other uses include sharing application settings and configuration files; storing diagnostic data such as logs, metrics, and crash dumps; and storing tools and utilities for development and administration on Azure.
This diagram illustrates how files fit into the storage hierarchy. You first create a storage account, and then you can create file shares. Inside each file share you can create a hierarchy of directories to store files in.
In this case we've repeated the blob storage example and created a file share for movies, with directories for the different genres, such as sci-fi, comedy, and action. We can then store videos that fit into those genres in those directories.
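Because a mounted share behaves like any other file system, the genre layout above can be managed with ordinary file APIs. This Python sketch uses only `pathlib`; it assumes the share is already mounted over SMB and, to keep the example runnable, substitutes a temporary directory for the mount point (on a real VM this would be a mapped drive or a path such as `/mnt/movies`, which is hypothetical here). The second file name is a placeholder.

```python
import tempfile
from pathlib import Path


def store_movie(share_root: Path, genre: str, filename: str, data: bytes) -> Path:
    """Write a video into its genre directory, creating the directory if needed."""
    genre_dir = share_root / genre
    genre_dir.mkdir(parents=True, exist_ok=True)
    target = genre_dir / filename
    target.write_bytes(data)
    return target


def list_genres(share_root: Path) -> list:
    """Directory names at the top of the share, sorted alphabetically."""
    return sorted(entry.name for entry in share_root.iterdir() if entry.is_dir())


if __name__ == "__main__":
    # Stand-in for the mounted share; replace with the real mount point.
    with tempfile.TemporaryDirectory() as mount_point:
        share = Path(mount_point)
        store_movie(share, "sci-fi", "totalrecall.avi", b"placeholder video bytes")
        store_movie(share, "comedy", "placeholder.avi", b"placeholder video bytes")
        print(list_genres(share))  # ['comedy', 'sci-fi']
```

Nothing in this code is Azure-specific, which is the point: a legacy application that already reads and writes local paths only needs its root directory pointed at the mounted share.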
Okay, up next let's cover Table storage. Tables allow you to store structured or semi-structured, non-relational data in the cloud using a schema-less design. Table storage falls into the NoSQL key/value store category.
However, it's not suitable for data sets that require complex joins, foreign keys, stored procedures, and so on. For those, it's probably better to use a SQL option such as Azure SQL. One of the main benefits of Table storage is the ability to quickly execute queries against large amounts of data, and you can use the OData protocol and related libraries to access the data.
This diagram illustrates how tables fit into the storage hierarchy. We have a storage account, and then we can create multiple tables. Each table is a collection of entities used to organize our data. Each entity is similar to a row in a SQL database and holds a set of properties, where each property is a name-value pair.
In this example we have a storage account for movies, with tables to hold details about directors and actors. In addition to the user-defined properties, each entity has three system properties. The partition key is used to segment the data and ensures that entities sharing the same partition key can be queried and updated quickly.
Then there's the row key, which uniquely identifies an entity within its partition, and the timestamp, which is a system-managed last-updated value. Tables are accessed using a URL similar to those for blobs and files: a high-level address for accessing tables within a storage account, followed by the table name. Table storage is useful for things such as user information, address books, and variable metadata.
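To show how the three system properties work together, here is an in-memory Python model of a table. It's a sketch, not the Azure Table API: entities are grouped by partition key, addressed by the combination of partition key and row key, and stamped with a last-updated time on every write. Partitioning the directors table by country is a hypothetical design choice made only for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Entity:
    partition_key: str
    row_key: str
    properties: Dict[str, object] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class Table:
    def __init__(self, name: str) -> None:
        self.name = name
        # partition key -> (row key -> entity); a row key only has to be unique within its partition
        self._partitions: Dict[str, Dict[str, Entity]] = {}

    def upsert(self, entity: Entity) -> None:
        entity.timestamp = datetime.now(timezone.utc)  # system-managed "last updated" value
        self._partitions.setdefault(entity.partition_key, {})[entity.row_key] = entity

    def get(self, partition_key: str, row_key: str) -> Entity:
        # Point lookup: the fastest query, because both keys are known.
        return self._partitions[partition_key][row_key]

    def query_partition(self, partition_key: str) -> List[Entity]:
        # Reads a single partition without scanning the rest of the table.
        return list(self._partitions.get(partition_key, {}).values())


if __name__ == "__main__":
    directors = Table("directors")
    directors.upsert(Entity("NL", "verhoeven", {"name": "Paul Verhoeven"}))
    directors.upsert(Entity("US", "spielberg", {"name": "Steven Spielberg"}))
    directors.upsert(Entity("US", "lucas", {"name": "George Lucas"}))
    print([e.properties["name"] for e in directors.query_partition("US")])
```

Choosing the partition key is the main design decision: queries that stay inside one partition are cheap, while queries across partitions have to touch many of them.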
Alright, let's move on to Queue storage. Queues offer a resilient messaging service. Messages can be accessed using HTTP and HTTPS, which allows you to develop decoupled applications that communicate with each other through queued messages.
You can also develop components that run in different environments, such as the cloud, desktops, on-premises servers, and mobile devices, and connect them together using queues.
The main uses for queues are things like scheduling a backlog of work to be processed asynchronously, passing messages between web and worker roles, supporting flexible scaling for different components, and building workflows.
This diagram shows how queues fit into the storage hierarchy. We first create a storage account, and then we can have multiple queues, each of which is a collection of messages. In this case we have a storage account for movies with a queue that holds messages instructing an application to update the tables that store the director and actor data.
So Queue storage gives you a way to pass instructions between applications as messages, which enables asynchronous workflows. This allows you to design a set of decoupled applications, including solutions that use services hosted in Azure or on-premises.
Queues are accessed using URLs similar to those for blobs, files, and tables: a high-level address for accessing queues within a storage account, followed by the queue name.
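A key detail of queue-based workflows is that reading a message doesn't remove it. In Azure Queue storage a retrieved message becomes invisible for a visibility timeout and must be explicitly deleted after processing; if the worker crashes first, the message reappears for another worker. The Python sketch below models that behavior in memory (it is not the Azure SDK); the class names, the message text, and the injectable clock are assumptions made so the example is self-contained and testable.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class _Message:
    message_id: int
    content: str
    visible_at: float = 0.0
    pop_receipt: Optional[int] = None


class FakeClock:
    """Manually advanced clock so the timeout behavior is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class SketchQueue:
    def __init__(self, clock) -> None:
        self._clock = clock
        self._messages: List[_Message] = []
        self._ids = itertools.count(1)
        self._receipts = itertools.count(1)

    def put_message(self, content: str) -> None:
        self._messages.append(_Message(next(self._ids), content))

    def get_message(self, visibility_timeout: float = 30.0) -> Optional[Tuple[int, str, int]]:
        """Return (id, content, pop_receipt) and hide the message for the timeout."""
        now = self._clock()
        for message in self._messages:
            if message.visible_at <= now:
                message.visible_at = now + visibility_timeout
                message.pop_receipt = next(self._receipts)  # a newer receipt invalidates older ones
                return message.message_id, message.content, message.pop_receipt
        return None

    def delete_message(self, message_id: int, pop_receipt: int) -> bool:
        """Delete only if the caller holds the latest pop receipt."""
        for index, message in enumerate(self._messages):
            if message.message_id == message_id and message.pop_receipt == pop_receipt:
                del self._messages[index]
                return True
        return False


if __name__ == "__main__":
    clock = FakeClock()
    queue = SketchQueue(clock)
    queue.put_message("update director: verhoeven")

    first = queue.get_message(visibility_timeout=30)   # worker A takes the message
    assert queue.get_message() is None                 # hidden from other workers
    clock.now += 31                                    # worker A crashed; timeout passes
    second = queue.get_message()                       # worker B sees it again
    assert second is not None and second[0] == first[0]
    assert queue.delete_message(second[0], second[2])  # B finishes and deletes it
```

Because a message can be delivered more than once, the work it triggers (here, a table update) should be safe to repeat.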
Alright, let's talk about some general security options for Azure storage. HTTPS is recommended; HTTP connections are supported but not recommended. Azure storage objects are private by default.
To access the data, you need an access key. However, it is possible to mark a blob container as public, in which case the blobs it contains are accessible via their URLs.
We can also give a user temporary access to part of our Azure storage data via a shared access signature (SAS). A SAS defines a start time, an end time, the resource type, the permissions, and a signature, and it grants access to that resource until the token expires.
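Conceptually, a shared access signature is an HMAC-SHA256 signature, computed with the account key, over the fields that define the grant, so the service can verify that none of them were altered. The Python sketch below illustrates that idea with the standard library only. It is deliberately simplified: the query parameter names mirror those of real SAS tokens, but the string-to-sign layout is a hypothetical stand-in for the precise, versioned format the service defines, so these tokens will not work against Azure. The demo key and resource path are placeholders.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlencode

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def _signature(key_b64: str, string_to_sign: str) -> str:
    key = base64.b64decode(key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def _string_to_sign(resource, resource_type, permissions, start, expiry):
    # Hypothetical, simplified layout; the real format is defined by the service version.
    return "\n".join([permissions, start, expiry, resource, resource_type])


def make_sas(key_b64, resource, resource_type, permissions, start, expiry):
    """Return a query string granting `permissions` on `resource` between start and expiry."""
    st, se = start.strftime(TIME_FORMAT), expiry.strftime(TIME_FORMAT)
    sig = _signature(key_b64, _string_to_sign(resource, resource_type, permissions, st, se))
    return urlencode({"sp": permissions, "st": st, "se": se, "sr": resource_type, "sig": sig})


def is_allowed(key_b64, resource, token, now, permission):
    """Server-side check: signature intact, inside the time window, permission granted."""
    params = {name: values[0] for name, values in parse_qs(token).items()}
    expected = _signature(
        key_b64, _string_to_sign(resource, params["sr"], params["sp"], params["st"], params["se"]))
    if not hmac.compare_digest(expected, params["sig"]):
        return False  # a field was tampered with, or a different key was used
    start = datetime.strptime(params["st"], TIME_FORMAT).replace(tzinfo=timezone.utc)
    expiry = datetime.strptime(params["se"], TIME_FORMAT).replace(tzinfo=timezone.utc)
    return start <= now < expiry and permission in params["sp"]


if __name__ == "__main__":
    demo_key = base64.b64encode(b"demo key, not a real account key").decode("ascii")
    blob = "/moviesstorageaccount/scifi/totalrecall.avi"
    start = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)
    token = make_sas(demo_key, blob, "b", "r", start, start + timedelta(hours=1))
    print(is_allowed(demo_key, blob, token, start + timedelta(minutes=30), "r"))  # True
    print(is_allowed(demo_key, blob, token, start + timedelta(hours=2), "r"))     # False: expired
```

The design point is that the holder of the token never sees the account key; they can only present the signed grant, and changing any field (for example, widening `sp=r` to `sp=rw`) invalidates the signature. For real tokens, use the official client libraries.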
Okay, that covers Azure storage. However, there are more options than just what's in Azure storage: we also have SQL and NoSQL databases. Let's start with SQL. Azure SQL provides a relational database platform as a service, which can be considered a database as a service.
Alternatively, you could use IaaS VMs to host and configure SQL Server, or any other database, yourself. Azure SQL offers different performance levels, determined by the service tier and the performance level you choose within that tier. Performance is measured in database transaction units (DTUs) and is available through three service tiers: Basic, Standard, and Premium.
Azure SQL allows you to select a single database, or, if you have multiple databases, you can use an elastic pool. An elastic pool gives you the ability to scale as needed based on database load. Elastic pools have the same three service tiers, and their more flexible pricing makes them a cost-effective option when database usage is highly variable.
Charges are based on elastic database transaction units (eDTUs), which correspond to DTUs, except that a database in a pool doesn't consume any eDTUs until there is actual database usage. Azure SQL is without a doubt a feature-rich database, and being fully managed makes it quite appealing.
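The reason pools suit variable workloads is easiest to see with a little arithmetic. In the Python sketch below, three hypothetical databases each peak at a different hour. Sized individually, each must be provisioned for its own peak; in a pool, only the peak of the combined load matters. The usage figures are invented for illustration, and the sketch ignores real-world details such as fixed pool sizes, per-database limits, and actual prices.

```python
from typing import Dict, List


def standalone_dtus(usage: Dict[str, List[int]]) -> int:
    """Total DTUs when every database is provisioned for its own peak."""
    return sum(max(series) for series in usage.values())


def pooled_edtus(usage: Dict[str, List[int]]) -> int:
    """eDTUs a pool needs to cover the busiest moment of the combined load."""
    combined = [sum(hour) for hour in zip(*usage.values())]
    return max(combined)


if __name__ == "__main__":
    # Hypothetical DTU usage sampled over four hours; each database peaks at a different time.
    usage = {
        "sales": [90, 10, 10, 10],
        "reporting": [10, 90, 10, 10],
        "inventory": [10, 10, 90, 10],
    }
    print(standalone_dtus(usage))  # 270
    print(pooled_edtus(usage))     # 110
```

If all three databases peaked in the same hour, the combined peak would also be 270 and the pool would save nothing, which is why the benefit depends on usage being variable.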
However, there are plenty of scenarios where we'll need something else, such as MySQL. For that, we can use the MySQL option from the Marketplace, which is provided by a Microsoft partner called ClearDB.
If you're going to lift and shift an existing MySQL-based application to Azure, this is the option to look at. We won't go into detail here; just know that MySQL, a very popular open source database, is provided on Azure as a managed platform through ClearDB.
Okay, now let's look at some of the NoSQL options available in Azure, starting with Microsoft's own offering, DocumentDB. DocumentDB is a NoSQL document database. It allows developers to store JSON documents and to omit the object-relational mapper from their application code.
Collections can span multiple partitions and can scale to handle nearly unlimited volumes of storage. A single partition can contain up to 10 GB of JSON documents; beyond that, you need more partitions. DocumentDB also lets us adjust consistency, ranging from eventual to strong, to meet different scenarios, so we can focus on our use case and tune accordingly.
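Two pieces of arithmetic follow from the 10 GB partition limit: how many partitions a collection needs, and how each document is assigned to one. DocumentDB distributes documents by hashing a partition key you choose; the Python sketch below shows the general idea using SHA-256 and a simple modulo, both of which are assumptions for illustration rather than the service's actual scheme. The `PARTITION_CAPACITY_GB` constant just restates the limit quoted in this lesson.

```python
import hashlib
import math

PARTITION_CAPACITY_GB = 10  # per-partition limit quoted in this lesson


def partitions_needed(total_gb: float) -> int:
    """Smallest number of partitions that can hold total_gb of JSON documents."""
    return max(1, math.ceil(total_gb / PARTITION_CAPACITY_GB))


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index with a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


if __name__ == "__main__":
    count = partitions_needed(25)  # 25 GB needs 3 partitions
    for genre in ("sci-fi", "comedy", "action", "romance"):
        print(genre, "->", partition_for(genre, count))
```

Since every document with the same key lands in the same partition, in practice a key with many distinct, evenly used values would spread storage and load better than the four genres used here.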
DocumentDB is designed for high availability and automatically keeps three replicas within a given region. It can also replicate geographically for worldwide read-only access. Using a document database lets us keep all of the data we need in a single object, which means we don't need to normalize the data as we would with SQL.
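The difference between the normalized and document shapes is easiest to see side by side. This Python sketch takes hypothetical relational rows for a movie, its director, and its cast, and folds them into a single JSON-ready document of the kind you could store in a document database; the field names and the `to_document` helper are illustrative, not a required schema.

```python
import json

# Normalized (relational) shape: separate rows linked by keys.
directors = {1: {"name": "Paul Verhoeven"}}
actors = {10: {"name": "Arnold Schwarzenegger"}, 11: {"name": "Sharon Stone"}}
movie_row = {"id": 100, "title": "Total Recall", "genre": "sci-fi", "director_id": 1}
cast_rows = [{"movie_id": 100, "actor_id": 10}, {"movie_id": 100, "actor_id": 11}]


def to_document(movie, directors, actors, cast_rows):
    """Embed the related rows so one read returns everything about the movie."""
    return {
        "id": str(movie["id"]),
        "title": movie["title"],
        "genre": movie["genre"],
        "director": directors[movie["director_id"]]["name"],
        "cast": [actors[row["actor_id"]]["name"]
                 for row in cast_rows if row["movie_id"] == movie["id"]],
    }


if __name__ == "__main__":
    print(json.dumps(to_document(movie_row, directors, actors, cast_rows), indent=2))
```

The trade-off is duplication: if a director's name changes, every movie document that embeds it must be updated, whereas the normalized form stores it once.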
Another NoSQL option, similar in a lot of ways to DocumentDB, is MongoDB. You can install MongoDB on-premises or on IaaS VMs, and Azure also has a managed version of it in Preview.
If you need something that's generally available rather than in Preview, you can also use the MongoLab version on the Marketplace. So if you need MongoDB, check out the Marketplace for the available options.
Alright, that wraps up not only this lesson but this entire course. This was a brief course; however, storage is only about 5 to 10% of the exam.
In the next course we'll cover the next domain objective, which is designing advanced applications. Whenever you're ready to keep learning, I'll see you in the next course.
About the Author
Ben Lambert is the Director of Engineering and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps.
When he’s not building the first platform to run and measure enterprise transformation initiatives at Cloud Academy, he’s hiking, camping, or creating video games.