Overview of Azure Storage
Difficulty: Intermediate
Duration: 50m
Students: 11861
Rating: 4.8/5
Description

Microsoft Azure offers services for a wide variety of data-related needs, including ones you would expect, like file storage and relational databases, as well as more specialized services for text searching and time-series data. In this course, you will learn how to design a data implementation using the appropriate Azure services. Two services that are especially important are Azure SQL Database and Azure Cosmos DB.

Azure SQL Database is a managed service for hosting SQL Server databases (although it’s not 100% compatible with SQL Server). Even though Microsoft takes care of the maintenance, you still need to choose the right options to scale it and make sure it can survive failures.

Azure Cosmos DB is the first multi-model database that’s offered as a global cloud service. It can store and query documents, NoSQL tables, graphs, and columnar data. To get the most out of Cosmos DB, you need to know which consistency and performance guarantees to choose, as well as how to make it globally reliable.

Learning Objectives

  • Identify the most appropriate Azure services for various data-related needs

  • Design an Azure SQL Database implementation for scalability, availability, and disaster recovery

  • Design an Azure Cosmos DB implementation for cost, performance, consistency, availability, and business continuity

Intended Audience

  • People who want to become Azure cloud architects

  • People preparing for a Microsoft Azure certification exam

Prerequisites

  • General knowledge of IT architecture, especially databases

Transcript

If you need to store data on Azure, but you don’t need a full-blown database, then Azure Storage is the way to go. First, it’s durable and highly available because it stores all data redundantly. Second, it’s secure, because all data is encrypted automatically, and you can set fine-grained access controls on it. Third, it’s scalable, because you can always add more data without having to worry about provisioning hardware to hold it. There is a 5-petabyte limit per storage account, but if you need more than that, you can contact Azure Support. Fourth, it’s a managed service, so you don’t have to worry about maintenance. And finally, it’s accessible over the web.

Azure Storage supports four types of data: blobs, files, queues, and tables. A single storage account can contain all of these types of data. There’s a separate section for each of them in a storage account. However, the blob section isn’t called “Blobs”; it’s called “Containers”. That’s because before you can upload any blobs, you have to create a container to hold them. Note that this type of container has nothing to do with containers that are used to run applications. 
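
To make that concrete, here’s a minimal sketch using the azure-storage-blob Python SDK. The container name, file name, and the connection-string environment variable are placeholders, not anything prescribed by Azure:

```python
# A minimal sketch, assuming azure-storage-blob is installed and an
# AZURE_STORAGE_CONNECTION_STRING environment variable points at your account.
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"])

# Blobs must live inside a container, so create one first.
container = service.create_container("my-container")  # placeholder name

# Upload a local file as a blob.
with open("report.pdf", "rb") as data:
    container.upload_blob(name="report.pdf", data=data, overwrite=True)
```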

So what is a blob anyway? Blob stands for binary large object, but really a blob is just a file. So why is there a distinction between blob storage and file storage? The difference is in how they’re organized. Blobs aren’t really organized at all. Sure, you can use slashes in their names, which makes it look like they have a folder structure, but they’re not actually stored that way.
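
Continuing the sketch above (reusing the same service client), here’s what those “virtual folders” look like in practice; the blob names are placeholders:

```python
# The "folders" below are only a naming convention. Listing by prefix is
# how tools simulate directory browsing over a flat namespace.
logs = service.get_container_client("my-container")

# These two blobs merely share a name prefix; no folder object exists.
logs.upload_blob(name="logs/2023/app.log", data=b"...", overwrite=True)
logs.upload_blob(name="logs/2024/app.log", data=b"...", overwrite=True)

for blob in logs.list_blobs(name_starts_with="logs/2024/"):
    print(blob.name)  # prints only blobs whose names start with the prefix
```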

An alternative is File storage, also known as Azure Files. It has the sort of hierarchical structure you’d expect in a filesystem. In fact, it’s SMB-compliant, so you can use it as a Windows file share. This makes it easy to move an on-premises file server to Azure. Even better, you can make this file share globally accessible over the web, if you want. To do that, users need a shared access signature token, which allows access to particular data for a specific amount of time.
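
Here’s a hedged sketch of generating one of those shared access signature tokens with the azure-storage-file-share package; the account name, key, share, and file path are all placeholders:

```python
# Create a time-limited, read-only SAS for a single file in an Azure Files
# share. Anyone holding the resulting URL can read the file until expiry.
from datetime import datetime, timedelta, timezone
from azure.storage.fileshare import generate_file_sas, FileSasPermissions

sas = generate_file_sas(
    account_name="mystorageacct",
    share_name="teamshare",
    file_path=["reports", "q3.xlsx"],   # path segments to the file
    account_key="<account-key>",
    permission=FileSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

url = f"https://mystorageacct.file.core.windows.net/teamshare/reports/q3.xlsx?{sas}"
```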

You might be tempted to use File storage instead of Blob storage, even when you don’t need an SMB-compliant file share, but bear in mind that it’s significantly more expensive than Blob storage. If you just need a place to put files, whether they’re documents or videos or logs or anything else, then you should use Blob storage, which is by far the cheapest of all the storage types.

Just so you know, there is a way to make Blob storage hierarchical. Azure Data Lake Storage Gen2 is a hierarchical storage system that’s built on top of Blob storage, and you can use it by simply selecting “Enable hierarchical namespace” when you create a storage account. It’s designed to be used with data analytics services, such as Azure Synapse Analytics, though, so it’s not really a general-purpose file system.
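
As a brief illustration, here’s what working with that hierarchical namespace looks like using the azure-storage-file-datalake package; the filesystem and path names are placeholders, and this assumes the account was created with the hierarchical namespace enabled:

```python
import os
from azure.storage.filedatalake import DataLakeServiceClient

lake = DataLakeServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"])

fs = lake.create_file_system("analytics")   # placeholder filesystem name

# Unlike plain Blob storage, these are real directories, so operations
# like renaming a directory are atomic rather than per-blob copies.
directory = fs.create_directory("raw/2024")
directory.create_file("events.csv")
```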

Okay, let’s go back to basic Blob storage. Although it’s very inexpensive, there are options to make it even cheaper. You can choose from three access tiers: hot, cool, and archive. Hot storage is the tier you’ll probably use the most often. It’s intended for data that gets accessed frequently. If you have data that doesn’t get accessed frequently, then you should consider the cool storage tier.

It’s optimized for data that still needs to be retrieved immediately when requested, even though it doesn’t get accessed very often. An example would be a video file that people rarely watch. The cool tier has a much lower storage cost, but a much higher cost for reads and writes. The data also needs to be in the cool tier for at least 30 days.

If you have data that will almost never be accessed and you can live with it taking up to 15 hours to access when you do need it, then the archive tier is a way to save lots of money. It’s 5 times cheaper than the cool tier for storage costs, but it’s dramatically more expensive for read operations. The data also needs to reside in the archive tier for at least 180 days.

You can move data between the tiers anytime you want, but if you do it before the minimum duration for the cool or archive tiers, then you’ll be charged an early deletion fee. For example, if you put data in the archive tier and then move it back to the cool tier 90 days later, you’ll be charged half of the early deletion fee, since you moved the data when there was still half of the 180-day minimum left to go.
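
To make that proration concrete, here’s a tiny worked example; the proportional formula is an assumption based on the description above, so check the current Azure pricing documentation for exact billing rules:

```python
# Hypothetical proration of the archive early-deletion fee.
archive_min_days = 180
days_stored = 90                      # blob moved out after 90 days
remaining_days = archive_min_days - days_stored

full_fee = 1.00                       # hypothetical full early-deletion fee
charge = full_fee * remaining_days / archive_min_days
print(charge)                         # 0.5 -> half the fee, as described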

Amazingly, you can set each individual blob to a different tier. Although that’s very flexible, it’d be cumbersome to have to set the tier for every blob. So, instead, you set a default tier of either hot or cool for your storage account, and then you have the option to change the tier for specific blobs when needed. You could also configure lifecycle management policies that would automatically move blobs to different tiers based on certain conditions, such as how often blobs are accessed. Okay, that’s it for Blob storage.
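
As a hedged illustration, here’s how changing one blob’s tier looks with the azure-storage-blob SDK, followed by the general shape of a lifecycle management rule; the rule name, prefix, and day counts are placeholders:

```python
# Move a single blob to the archive tier (names are placeholders).
import os
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"])
blob = service.get_blob_client("my-container", "report.pdf")
blob.set_standard_blob_tier("Archive")  # overrides the account's default tier

# The shape of a lifecycle management rule; this dict mirrors the JSON you
# would apply through the portal or the azure-mgmt-storage management API.
lifecycle_rule = {
    "name": "age-out-logs",
    "enabled": True,
    "type": "Lifecycle",
    "definition": {
        "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["logs/"]},
        "actions": {
            "baseBlob": {
                "tierToCool": {"daysAfterModificationGreaterThan": 30},
                "tierToArchive": {"daysAfterModificationGreaterThan": 90},
                "delete": {"daysAfterModificationGreaterThan": 365},
            }
        },
    },
}
```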

Queue storage is a very different option. It’s intended for passing messages between applications. One application pushes messages onto the queue and another application asynchronously retrieves those messages from the queue, one at a time, and processes them. 
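
Here’s a minimal producer/consumer sketch with the azure-storage-queue package; the queue name and message text are placeholders:

```python
import os
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "orders")
queue.create_queue()

# Producer side: push a message onto the queue.
queue.send_message("process order 42")

# Consumer side: retrieve, handle, then delete each message.
for msg in queue.receive_messages():
    print(msg.content)
    queue.delete_message(msg)   # remove it once processing succeeds
```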

I find Table storage to be the most surprising type of Azure Storage. It’s a NoSQL datastore with storage costs that are about the same as File storage and with way cheaper transaction costs. I think Microsoft realized what a good deal this is too, because they now have a premium version of Table storage that’s part of their Cosmos DB service.
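
As a short sketch with the azure-data-tables package: every entity needs a PartitionKey and a RowKey, and everything else is schemaless. The table name and property values here are placeholders:

```python
import os
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "customers")
table.create_table()

table.create_entity({
    "PartitionKey": "US",    # groups related entities for efficient queries
    "RowKey": "42",          # unique within the partition
    "name": "Contoso",
    "active": True,
})

entity = table.get_entity(partition_key="US", row_key="42")
print(entity["name"])
```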

I should also mention that there’s another type of Azure Storage. It’s what’s used for the disks that are attached to virtual machines. But these disks are created when you create a VM, so you don’t need to create them yourself in Azure Storage. That’s why you’ll only see options for blobs, files, queues, and tables in your storage accounts.

Although data is always stored redundantly, there are six different options, depending on your needs. Note that not all of these options are available in every region or for every type of data. We’ll go from least redundant to most redundant. 

Locally-redundant storage (or LRS) is replicated across racks in the same data center. This means that if there’s a disaster at that data center, your data could be lost. Although that’s highly unlikely, you should use locally-redundant storage only if you can easily reconstruct your data.

Zone-redundant storage (or ZRS) is replicated across three zones within one region, so if an entire zone goes down, your data will still be available.

Geo-redundant storage (or GRS) is replicated across two regions, so even if an entire region goes down, your data will still be available. However, in the event of a regional disaster, you’d have to perform a geo-failover before you could access your data in the secondary region.

That’s why you may want to consider using read-access geo-redundant storage (or RA-GRS). It’s the same as geo-redundant storage except that if there’s a disaster in your primary region, then you can read your data from the secondary region immediately. You won’t have write access, though, so if you can’t wait until Microsoft restores availability in the primary region, then you’ll have to copy your data to yet another region and point your applications to the new location.
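
With the read-access options, the secondary region is reachable at a “-secondary” endpoint. A hedged sketch, assuming the azure-identity package handles credentials and reusing the placeholder names from earlier:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

secondary = BlobServiceClient(
    "https://mystorageacct-secondary.blob.core.windows.net",
    credential=DefaultAzureCredential())

# Reads work against the secondary; writes would fail until a failover.
data = (secondary.get_blob_client("my-container", "report.pdf")
        .download_blob().readall())
```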

Geo-zone-redundant storage (or GZRS) is almost the same as GRS, but there’s a critical difference. With GRS, data is copied to three places in one location in the primary region. With GZRS, data is copied to three availability zones in the primary region. In other words, it’s a combination of zone-redundant storage and geo-redundant storage.

Similarly, read-access geo-zone-redundant storage (or RA-GZRS) is the same as RA-GRS except that data is copied to three availability zones in the primary region.

I should also mention that for all of the geo-redundant options, it takes a while for data changes in the primary region to replicate to the secondary region, so if the primary region fails, the data in the secondary region will likely be slightly out-of-date.

Naturally, each of these redundancy options has a different price, with locally-redundant storage being the cheapest and read-access geo-zone-redundant storage being the most expensive. Note that RA-GRS is actually more expensive than GZRS, even though it’s less redundant, because it provides immediate read access to your data in the secondary region.

Aside from the redundancy level and the default access tier, there’s yet another option you need to set when you create a storage account: the performance level. In most cases, you should leave it with the default, which is called “Standard”. This option will give you what’s called a “general-purpose v2” account. If you need faster performance, then you can select “Premium”. It does cost more for Premium performance, though, and it limits your redundancy options, so you should only select it if you really need it.
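
When you create an account programmatically, the SKU name encodes both of those choices, performance level plus redundancy (for example, Standard_LRS, Standard_GZRS, or Premium_LRS). Here’s a hedged sketch using the azure-mgmt-storage management SDK; the subscription ID, resource group, account name, and region are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.storage_accounts.begin_create(
    "my-resource-group",
    "mystorageacct",
    {
        "location": "eastus",
        "kind": "StorageV2",               # "general-purpose v2"
        "sku": {"name": "Standard_GZRS"},  # performance + redundancy
    },
)
account = poller.result()   # waits for the account to finish provisioning
```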

So, once you’ve created a storage account, how do you get data into it? There are many ways to do it. If you just need to upload a small number of files from your desktop, then you can go into your storage account in the Azure Portal and select “Upload”. You can also download files using the portal.

If you need to upload lots of files, then it would be faster to use AzCopy, which is a command-line utility that runs on Windows, macOS, and Linux. It lets you copy files and folders to or from a storage account, and it even lets you copy them between Azure and other cloud providers, such as AWS.
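
AzCopy is a standalone CLI, so from a script you would simply shell out to it. A rough sketch, where the local folder and SAS URL are placeholders:

```python
import subprocess

# Copy a local folder and everything under it into a blob container.
subprocess.run([
    "azcopy", "copy",
    "./local-folder",
    "https://mystorageacct.blob.core.windows.net/my-container?<SAS>",
    "--recursive",
], check=True)
```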

If you’d rather use a graphical user interface than a command-line tool, then you can use Azure Storage Explorer. It lets you do everything the AzCopy command can do, but it also lets you manage files in your storage accounts instead of just copying them. For example, you can change the access tier of a blob.

Azure File Sync gives you a very different way of copying files from a storage account to your on-premises environment. If you’re using Azure Files to host file shares, then you can create a local cache of the file share by installing the Azure File Sync agent on a Windows server. This will let your users access the file share more quickly than having to connect to the central one in Azure. That’s a very specialized use case, of course, rather than a more general way to copy files from Azure Storage.

If you’re migrating from an on-premises environment to Azure, then you should consider using Azure Migrate and Azure Data Box. Azure Migrate does a lot more than copy data. First, it discovers your on-premises servers, web apps, and databases. Then it assesses them, telling you the size and cost of the equivalent Azure services. Finally, it helps you do the migration.

If you need to send a large amount of data during the migration, the solution is to use Azure Data Box. Here’s how it works. Microsoft ships a Data Box storage device to your datacenter. Then you copy your data to the device and ship it back to Microsoft. When they receive it, Microsoft will transfer the data from the device to your Azure storage account. You can also use Data Box to do the reverse, that is, export your data from Azure to your datacenter if you need to at some point in the future. Due to the time and expense involved with Azure Data Box, you would typically use it only if you need to transfer more than 40 terabytes of data.

And that’s it for this overview of Azure Storage.

 

About the Author
Students: 202401
Courses: 97
Learning Paths: 163

Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).