Azure Data Fundamentals
Microsoft Azure offers services for a wide variety of data-related needs, including ones you would expect, like file storage and relational databases, but also more specialized services, such as those for text search and time-series data. In this course, you will learn which services to choose when implementing a data infrastructure on Azure. Two services that are especially important are Azure SQL Database and Azure Cosmos DB.
Identify the most appropriate Azure services for various data-related needs
People who want to learn Azure fundamentals
General knowledge of IT architecture, especially databases
If you need to store data, but you don’t need a full-blown database, then Azure Storage is the way to go. First, it’s durable and highly available, because it stores all data redundantly. Second, it’s secure, because all data is encrypted automatically, and you can set fine-grained access control on it. Third, it’s scalable, because you can always add more data without having to worry about provisioning hardware to hold it. There is a 500 terabyte limit, but if you need more than that, you can contact Azure Support. Fourth, it’s a managed service, so you don’t have to worry about maintenance. And finally, it’s accessible over the web.
Although data is always stored redundantly, there are four different options, depending on your needs. Note that not all of these options are available in every region or for every type of data. We’ll go from cheapest and least redundant to most expensive and most redundant.
Locally-redundant storage (LRS) is replicated across racks in the same data center. This means that if there is a disaster at that data center, your data could be lost. Although this is highly unlikely, you should only use locally-redundant storage if you can easily reconstruct your data.
Zone-redundant storage (ZRS) is replicated across three zones within one region, so if an entire zone goes down, your data will still be available.
Geo-redundant storage (GRS) is replicated across two regions, so even if an entire region goes down, your data will still be available. However, in the case of a regional disaster, you’d have to wait for Microsoft to perform a geo-failover before you could access your data in the secondary region.
That’s why you may want to consider using read-access geo-redundant storage (RA-GRS). It’s the same as geo-redundant storage except that if there’s a disaster in your primary region, then you can read your data from the secondary region immediately. You won’t have write access, though, so if you can’t wait until Microsoft restores availability in the primary region, then you’ll have to copy your data to yet another region and point your applications to the new location.
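To make the trade-off between the four options concrete, here’s a minimal Python sketch that picks the cheapest option that still protects against a given kind of failure. The names `RedundancyOption`, `OPTIONS`, and `cheapest_option` are hypothetical and are not part of any Azure SDK. The table only encodes what was described above, in the same cheapest-to-most-expensive order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundancyOption:
    name: str
    survives: frozenset          # failure scopes after which the data still exists
    readable_during_regional_outage: bool


# Hypothetical summary table, ordered from cheapest to most expensive.
OPTIONS = [
    RedundancyOption("LRS", frozenset({"rack"}), False),
    RedundancyOption("ZRS", frozenset({"rack", "datacenter", "zone"}), False),
    RedundancyOption(
        "GRS", frozenset({"rack", "datacenter", "zone", "region"}), False
    ),
    RedundancyOption(
        "RA-GRS", frozenset({"rack", "datacenter", "zone", "region"}), True
    ),
]


def cheapest_option(must_survive: str, read_during_regional_outage: bool = False) -> str:
    """Return the first (cheapest) option that meets the requirements."""
    for option in OPTIONS:
        if must_survive not in option.survives:
            continue
        if read_during_regional_outage and not option.readable_during_regional_outage:
            continue
        return option.name
    raise ValueError(f"no option survives a {must_survive!r} failure")


print(cheapest_option("datacenter"))                                  # ZRS
print(cheapest_option("region"))                                      # GRS
print(cheapest_option("region", read_during_regional_outage=True))    # RA-GRS
```

Remember that not every option is available in every region or for every type of data, so treat the result as a starting point rather than a final answer.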
You can copy data to and from Azure Storage using the AzCopy utility, Azure PowerShell, or the Azure Storage SDK, which is available for a variety of programming languages.
Azure Storage supports four types of data: blobs, files, queues, and tables. Blob stands for binary large object, but really a blob is just a file. So why is there a distinction between blob storage and file storage? The difference is in how they’re organized. Blobs aren’t really organized at all. Sure, you can use slashes in their names, which makes it look like they have a folder structure, but they’re not actually stored that way.
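Here’s a small Python sketch of how a folder view can be derived from flat names that merely contain slashes. It doesn’t call Azure at all. The function `list_virtual_folder` and the sample blob names are made up for illustration.

```python
def list_virtual_folder(blob_names, prefix="", delimiter="/"):
    """Return (subfolders, blobs) that appear directly under `prefix`.

    Nothing here is a real folder: both lists are computed purely from
    the flat blob names.
    """
    folders = set()
    blobs = []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter looks like a subfolder.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            blobs.append(name)
    return sorted(folders), sorted(blobs)


names = [
    "logs/2024/app.log",
    "logs/2024/db.log",
    "logs/readme.txt",
    "video.mp4",
]

print(list_virtual_folder(names))           # (['logs/'], ['video.mp4'])
print(list_virtual_folder(names, "logs/"))  # (['logs/2024/'], ['logs/readme.txt'])
```

The "folders" only exist as long as at least one blob name starts with that prefix, which is the practical difference from a real filesystem directory.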
File storage, on the other hand, has the sort of hierarchical structure you’d expect in a filesystem. In fact, it’s SMB-compliant, so you can use it as a file share. This makes it easy to move an on-premises file server to Azure. Even better, you can make this file share globally accessible over the web, if you want. To do that, users need a shared access signature token, which allows access to particular data for a specific amount of time.
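To illustrate the idea behind a token that grants access to particular data for a specific amount of time, here’s a generic Python sketch using an HMAC signature. This is not the actual Azure shared access signature format, and `make_token` and `is_valid` are hypothetical helpers. It only shows the general principle of signing a resource name together with an expiry time.

```python
import hashlib
import hmac


def _sign(secret: bytes, resource: str, expires_at: int) -> str:
    message = f"{resource}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_token(secret: bytes, resource: str, expires_at: int) -> str:
    """Create a token that is only valid for `resource` until `expires_at`."""
    return f"{expires_at}.{_sign(secret, resource, expires_at)}"


def is_valid(secret: bytes, resource: str, token: str, now: int) -> bool:
    """Check the signature and the expiry time (both as Unix timestamps)."""
    try:
        expires_text, signature = token.split(".", 1)
        expires_at = int(expires_text)
    except ValueError:
        return False  # malformed token
    expected = _sign(secret, resource, expires_at)
    # compare_digest avoids leaking information through timing differences.
    signature_ok = hmac.compare_digest(signature.encode(), expected.encode())
    return signature_ok and now < expires_at


secret = b"illustrative-secret"          # assumption: a key only the service knows
token = make_token(secret, "share/reports/q1.pdf", expires_at=1_000)

print(is_valid(secret, "share/reports/q1.pdf", token, now=999))    # True
print(is_valid(secret, "share/reports/q1.pdf", token, now=1_000))  # False, expired
print(is_valid(secret, "share/other.pdf", token, now=999))         # False, wrong data
```

Because the expiry time is part of the signed message, a user can’t extend the token’s lifetime without invalidating the signature.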
You might be tempted to use File storage instead of Blob storage, even when you don’t need an SMB-compliant file share, but bear in mind that it’s significantly more expensive than Blob storage. If you just need a place to put files, whether they’re documents or videos or logs or anything else, then you should use Blob storage, which is by far the cheapest of all the storage types.
There are options for making Blob storage even cheaper too. You can choose from three storage tiers: hot, cool, and archive. Hot storage is the tier you’ll probably use the most often. It’s intended for data that gets accessed frequently. If you have data that doesn’t get accessed frequently, then you should consider the cool storage tier.
The cool tier is optimized for data that still needs to be retrieved immediately when requested, even though it doesn’t get accessed very often. An example would be a video file that people rarely watch. Compared with the hot tier, it has a much lower storage cost, but a much higher cost for reads and writes. The data also needs to be in the cool tier for at least 30 days.
If you have data that will almost never be accessed and you can live with it taking up to 15 hours to access when you do need it, then the archive tier is a way to save lots of money. It’s 5 times cheaper than the cool tier for storage costs, but it’s dramatically more expensive for read operations. The data also needs to reside in the archive tier for at least 180 days.
You can move data between the tiers anytime you want, but if you do it before the minimum duration for the cool or archive tiers, then you’ll be charged an early deletion fee. The fee is prorated. For example, if you put data in the archive tier and then move it back to the cool tier 90 days later, you’ll be charged half of the full early deletion fee, since you moved the data when there was still half of the 180-day minimum left to go.
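The arithmetic in that example is easy to sketch in Python. The function `early_deletion_fraction` is a hypothetical helper that returns the share of the full early deletion fee you would owe, using the 30-day and 180-day minimums mentioned above. It doesn’t know anything about actual prices.

```python
# Minimum number of days data should stay in each tier, as described above.
MINIMUM_DAYS = {"cool": 30, "archive": 180}


def early_deletion_fraction(tier: str, days_stored: int) -> float:
    """Return the fraction of the full early deletion fee that applies."""
    minimum = MINIMUM_DAYS[tier]
    remaining = max(minimum - days_stored, 0)  # no fee once the minimum is met
    return remaining / minimum


print(early_deletion_fraction("archive", 90))   # 0.5
print(early_deletion_fraction("archive", 180))  # 0.0
print(early_deletion_fraction("cool", 0))       # 1.0
```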
Queue storage is a very different option. It’s intended for passing messages between applications. One application pushes messages onto the queue and another application asynchronously retrieves those messages from the queue, one at a time, and processes them. We’ll talk more about messaging systems in another course.
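The messaging pattern itself can be sketched with Python’s standard library. This uses an in-process `queue.Queue` rather than Azure Queue storage, and the function names are made up for illustration, but the flow is the same: one side pushes messages, and the other side retrieves and processes them one at a time.

```python
import queue
import threading

STOP = None  # sentinel that tells the consumer there are no more messages


def producer(q, messages):
    """Push each message onto the queue, then signal completion."""
    for message in messages:
        q.put(message)
    q.put(STOP)


def consumer(q, processed):
    """Retrieve messages one at a time and process them."""
    while True:
        message = q.get()
        if message is STOP:
            break
        processed.append(message.upper())  # stand-in for real processing


def run(messages):
    q = queue.Queue()
    processed = []
    worker = threading.Thread(target=consumer, args=(q, processed))
    worker.start()              # the consumer runs asynchronously
    producer(q, messages)
    worker.join()
    return processed


print(run(["order created", "order shipped"]))
# ['ORDER CREATED', 'ORDER SHIPPED']
```

The producer never waits for the consumer to finish its work, which is what lets the two applications run at different speeds.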
I find Table storage to be the most surprising type of Azure Storage. It’s a NoSQL datastore with storage costs that are about the same as File storage and with much cheaper transaction costs. I think Microsoft realized what a good deal this is too, because they now have a premium version of Table storage that’s part of their Cosmos DB service. We’ll talk about both versions of Table storage again in other courses.
I should also mention that there’s another type of Azure Storage. Disk storage is what’s used for the disks that are attached to virtual machines, but we don’t need to go into the details here.
Microsoft is very good about providing options for hybrid solutions where you can use both Azure and your existing on-premises infrastructure. StorSimple is Microsoft’s hybrid offering in the storage area. It’s a virtual array that you install at your own site and it’s primarily used to do backup, recovery, and storage tiering.
Of course, you don’t need to use StorSimple to copy or move your data to the Azure cloud, but it makes the process so much easier. For example, here’s how storage tiering works. You provision enough local storage for your current needs, and then as you need more capacity, it moves your infrequently accessed data to the cloud. It even deduplicates and compresses the data before sending it, so your Azure storage costs are as low as possible. This all happens seamlessly in the background.
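To show why deduplicating and compressing data before sending it reduces storage costs, here’s a generic Python sketch. It is not how StorSimple is implemented. The helpers `prepare_for_upload` and `restore` are hypothetical, and the sketch assumes data arrives as a list of byte chunks.

```python
import hashlib
import zlib


def prepare_for_upload(chunks):
    """Deduplicate chunks by content hash and compress each unique chunk.

    Returns (store, layout): `store` maps a hash to compressed bytes, and
    `layout` lists the hashes in the original chunk order.
    """
    store = {}
    layout = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:          # identical chunks are stored once
            store[digest] = zlib.compress(chunk)
        layout.append(digest)
    return store, layout


def restore(store, layout):
    """Rebuild the original chunks from the deduplicated, compressed store."""
    return [zlib.decompress(store[digest]) for digest in layout]


chunks = [b"a" * 1000, b"b" * 1000, b"a" * 1000]
store, layout = prepare_for_upload(chunks)

original_size = sum(len(chunk) for chunk in chunks)
uploaded_size = sum(len(data) for data in store.values())

print(len(store))                         # 2 unique chunks instead of 3
print(uploaded_size < original_size)      # True
print(restore(store, layout) == chunks)   # True
```

The layout list plays a role similar to metadata: it’s small, and it’s enough to know which pieces of data to fetch later.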
If you have a disaster locally and need to recover data from the cloud, StorSimple restores the metadata first and then restores the data itself as needed. That way, you can get up and running again very quickly.
And that’s it for Azure Storage.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).