Getting the Most From Azure Storage
The Azure Storage suite of services forms the foundation for much of the rest of the Azure services ecosystem. Blobs are low-level data primitives that can store data of any type and size. Tables provide inexpensive, scalable NoSQL storage of key/value data. Azure queues provide a messaging substrate that asynchronously and reliably connects the distinct elements of a distributed system. Azure Files provides an SMB-compatible file system that enables lift-and-shift of legacy applications that use file shares. Azure Disks provide consistent, high-performance storage for virtual machines running in the cloud.
In this Introduction to Azure Storage course, you'll learn about the features of these core services and see demonstrations of their use. Specifically, you will:
- Define the major components of Azure Storage
- Understand the different types of blobs and their intended use
- Learn basic programming APIs for table storage
- Discover how queues are used to pipeline cloud compute nodes together
- Learn to integrate Azure files with multiple applications
- Understand the tradeoffs between standard/premium storage and unmanaged/managed disks
Let's now touch on some additional topics to help you get the most from Azure Storage in your own solutions. Azure Storage provides a multilayered approach to securing your data, both at rest and in transit on the network. All Azure Storage APIs use TLS to secure data payloads between the client and the storage layer.
This includes REST APIs called directly, as well as those called indirectly through language-specific SDKs. If you use the Server Message Block (SMB) 3.0 protocol support in Azure Files, you can also use that protocol's own encryption of data in transit. Finally, the Azure Storage SDKs support client-side encryption, an additional layer in which the client application encrypts data before uploading it to Azure Storage and decrypts it after downloading it.
Note that client-side encryption can be used with TLS to provide a layered security approach for in-flight data. Data at rest in Azure Storage can be secured in a few different ways. First, you can turn on server-side encryption in your Azure Storage account to ensure that all blobs are encrypted when saved into the account.
Note that server-side encryption only works for blobs. Second, you can use the client-side encryption APIs to encrypt any data before it reaches Azure Storage and then store that data in its encrypted form in your storage account. Client-side encryption ensures that the wrapped keys and metadata needed to decrypt your data travel with it, so they're available to you when needed.
Finally, if you're using Azure Disks, you can enable the disk encryption feature to ensure that the VM disks you create and manage in your storage account are always encrypted. Beyond data encryption, Azure lets you secure both the management and the data access surface areas of your Azure Storage accounts.
Each storage account has two access keys that provide full access to all storage functionality. These keys are intended for administrative use and shouldn't be used for application-level access. They can be rotated and regenerated as needed. To provide secure access for client applications, Azure Storage supports Shared Access Signature (SAS) tokens, which grant time-limited, permission-restricted access to narrowly scoped resources within a storage account.
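To make the idea behind a SAS concrete, here is a minimal sketch of how a signed, scoped, time-limited token can be issued and verified with HMAC-SHA256 and a secret account key. It is a simplification for illustration only: the string-to-sign layout, the token fields, and the helper names `sign`, `make_token`, and `authorize` are hypothetical and do not match the real Azure SAS format, which the Azure SDKs generate for you.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timedelta, timezone


def sign(account_key: bytes, resource: str, permissions: str, expiry: datetime) -> str:
    """Return a base64 HMAC-SHA256 signature over a hypothetical string-to-sign."""
    # Hypothetical layout; the real Azure SAS string-to-sign has more fields.
    string_to_sign = "\n".join([permissions, expiry.isoformat(), resource])
    digest = hmac.new(account_key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def make_token(account_key: bytes, resource: str, permissions: str, ttl: timedelta) -> dict:
    """Issue a token scoped to one resource, a permission set, and an expiry time."""
    expiry = datetime.now(timezone.utc) + ttl
    return {
        "resource": resource,
        "permissions": permissions,  # e.g. "r" for read, "rw" for read/write
        "expiry": expiry,
        "signature": sign(account_key, resource, permissions, expiry),
    }


def authorize(account_key: bytes, token: dict, resource: str, action: str) -> bool:
    """Service-side check: valid signature, not expired, right scope and permission."""
    expected = sign(account_key, token["resource"], token["permissions"], token["expiry"])
    if not hmac.compare_digest(expected, token["signature"]):
        return False  # token was tampered with, or signed with a different key
    if datetime.now(timezone.utc) >= token["expiry"]:
        return False  # the time window has closed
    return resource == token["resource"] and action in token["permissions"]
```

Because the signature depends on the account key, regenerating that key invalidates every token signed with it, which is one more reason to keep the account keys administrative and hand out narrowly scoped tokens instead.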
On the management side, Azure Storage supports role-based access control (RBAC) to restrict which Azure Active Directory users, groups, and applications have access to a given storage account, as well as the specific privileges they have within it. This feature works for storage in the same way it works for other services within the Azure ecosystem.
For more information on security best practices with Azure Storage, follow the URL on screen. Performance guidelines for Azure Storage tend to fall into two broad categories: general advice and feature-specific guidance. Let's talk about general advice first. One of the most important things you can do to avoid performance issues with Azure Storage is to stay well below the published scalability targets for data size, bandwidth, and transaction volume.
These ceilings are publicly available values used to maintain the storage SLA for all concurrent Azure customers. As you approach these limits in your storage account, the storage subsystem may throttle your requests by returning HTTP 500 or 503 errors, which degrades the performance of your application.
The typical advice for dealing with this issue is to horizontally partition your data across multiple storage accounts to spread the load over more physical infrastructure. Another important consideration is the distance between client applications and the storage resources they access. Transmitting large datasets over a long geographic distance can result in significant latency and a poor overall user experience.
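The horizontal partitioning advice above can be sketched as a stable mapping from a partition key (say, a customer ID) to one of several storage accounts, so the same key always lands in the same account. This is a minimal sketch under assumptions: the account names are hypothetical placeholders, and `account_for` is an illustrative helper, not an Azure API.

```python
import hashlib

# Hypothetical storage account names; substitute your own.
ACCOUNTS = ["appdata0", "appdata1", "appdata2", "appdata3"]


def account_for(partition_key: str, accounts: list[str] = ACCOUNTS) -> str:
    """Pick an account deterministically from a stable hash of the key.

    Python's built-in hash() is randomized per process for strings, so a
    SHA-256 digest is used instead to keep the mapping stable across runs.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(accounts)
    return accounts[index]
```

Simple modulo hashing remaps most keys when you add an account; if you expect the number of accounts to change, a consistent-hashing scheme reduces how much data has to move.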
When possible, try to locate accessing applications close to the data they interact with. Two important considerations for SDK access are to avoid unbounded parallelism, which can swamp a storage endpoint with a significant and costly spike in requests, and to use a request retry strategy with an exponential rather than linear backoff pattern.
I'll cover retry strategies in more detail later. Turning to Azure tables specifically, the single biggest performance consideration is choosing an effective partitioning strategy that minimizes the need to query data across partitions. This advice applies to most NoSQL data stores, but that doesn't make it any easier to implement.
You need a thorough understanding of your application's data access patterns to partition the data effectively to match them. Many, maybe even most, developers don't take the time to build this knowledge. Beyond partitioning, be aware of how normalized your data is, and look for evidence of expensive table scans used to chase logical data relationships during large queries.
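To see why the partitioning choice matters, consider a simplified in-memory model of a table whose entities are grouped by `PartitionKey` and addressed by `RowKey`. If the application usually asks for "all orders for one customer," using the customer ID as the partition key turns that query into a single-partition read instead of a scan across every partition. The `InMemoryTable` class and its `partitions_touched` counter are illustrative assumptions, not the Azure Tables API.

```python
from collections import defaultdict
from typing import Callable


class InMemoryTable:
    """Toy model of a partitioned key/value table (not the Azure Tables API)."""

    def __init__(self) -> None:
        self._partitions: dict[str, dict[str, dict]] = defaultdict(dict)
        self.partitions_touched = 0  # rough proxy for query cost

    def upsert(self, partition_key: str, row_key: str, entity: dict) -> None:
        self._partitions[partition_key][row_key] = entity

    def query_partition(self, partition_key: str) -> list[dict]:
        """Efficient: reads exactly one partition."""
        self.partitions_touched += 1
        return list(self._partitions.get(partition_key, {}).values())

    def query_where(self, predicate: Callable[[dict], bool]) -> list[dict]:
        """Expensive: must visit every partition to evaluate the filter."""
        results = []
        for rows in self._partitions.values():
            self.partitions_touched += 1
            results.extend(e for e in rows.values() if predicate(e))
        return results
```

With 10 customers and 5 orders each, keying partitions by customer answers "orders for customer 3" by touching one partition, while keying by order ID forces the same question to touch all 50.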
Consider denormalizing to store commonly accessed data close together and minimize query costs. The most important performance consideration with blobs is choosing the correct blob type. Block blobs are useful general-purpose mechanisms, but for implementing application or transaction logs, an append blob is a better fit.
Page blobs have a fixed length and are optimized for random access along fixed 512-byte boundaries. This makes them more specialized and less generally applicable than block blobs. Another consideration for blobs is to use blob metadata to define key application-level properties you might want to access independently of the blob's content.
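Because page blob writes must cover whole 512-byte pages, code that updates an arbitrary byte range usually rounds that range out to the enclosing page boundaries first. A small helper written for illustration (the function names are hypothetical, not part of any Azure SDK):

```python
PAGE_SIZE = 512  # page blobs are addressed in 512-byte pages


def is_page_aligned(offset: int, length: int) -> bool:
    """True if a write of `length` bytes at `offset` covers whole pages only."""
    return offset % PAGE_SIZE == 0 and length % PAGE_SIZE == 0


def align_range(offset: int, length: int) -> tuple[int, int]:
    """Expand [offset, offset + length) outward to the enclosing page boundaries."""
    start = (offset // PAGE_SIZE) * PAGE_SIZE
    end = -(-(offset + length) // PAGE_SIZE) * PAGE_SIZE  # ceiling division
    return start, end - start
```

For example, a 20-byte change at offset 500 straddles two pages, so `align_range(500, 20)` returns `(0, 1024)`; the caller would read those two pages, patch the 20 bytes, and write both pages back.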
This avoids the data access anti-pattern of reading a blob merely to inspect header information at the start of its content and decide whether to keep reading it or move on to another one. Finally, when using Azure queues, keep your message size as small as possible. Queues typically perform and scale better with messages of a few kilobytes or less; the service caps an individual message at 64 KB.
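The blob metadata advice above can be modeled with a store that keeps each blob's small metadata dictionary separate from its content, so a client can filter on properties without transferring payloads. In Azure this corresponds to reading a blob's properties and metadata instead of downloading the blob; the `FakeBlobStore` class below is a hypothetical in-memory stand-in that simply counts the bytes downloaded.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBlobStore:
    """Hypothetical stand-in: content and metadata are fetched separately."""
    contents: dict[str, bytes] = field(default_factory=dict)
    metadata: dict[str, dict[str, str]] = field(default_factory=dict)
    bytes_downloaded: int = 0

    def put(self, name: str, data: bytes, meta: dict[str, str]) -> None:
        self.contents[name] = data
        self.metadata[name] = dict(meta)

    def get_metadata(self, name: str) -> dict[str, str]:
        return self.metadata[name]  # cheap: no payload transferred

    def download(self, name: str) -> bytes:
        data = self.contents[name]
        self.bytes_downloaded += len(data)
        return data


def download_matching(store: FakeBlobStore, key: str, value: str) -> list[bytes]:
    """Check metadata first and download only the blobs that qualify."""
    return [
        store.download(name)
        for name in store.contents
        if store.get_metadata(name).get(key) == value
    ]
```

Filtering three log blobs on a `status` metadata value downloads only the two that match, rather than pulling all three just to look inside them.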
For more information and details on Azure Storage performance considerations, follow the URL on screen. Managing access to Azure Storage data by multiple concurrent client applications is done differently for each storage type. For blobs you can choose between optimistic and pessimistic concurrency models as appropriate to your needs.
The optimistic model works through use of an ETag header value. This is a unique identifier assigned to a data element upon each update. When new updates are attempted, the blob service will allow them only if the updating request includes the current identifier for that element, meaning no other updates have happened since this client first read that data element.
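In practice the optimistic model is a conditional write: the client sends back the ETag it last read (in Azure, via an `If-Match` condition), and the service rejects the write with HTTP 412 Precondition Failed if the ETag has changed in the meantime. The in-memory sketch below mimics that behavior; the `ETagStore` class, the `PreconditionFailed` exception, and the `update_with_retry` helper are hypothetical, and the real service generates ETags its own way.

```python
import uuid
from typing import Callable, Optional


class PreconditionFailed(Exception):
    """Raised when the caller's ETag is stale (HTTP 412 in the real service)."""


class ETagStore:
    """Hypothetical in-memory store with optimistic concurrency via ETags."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[str, str]] = {}  # name -> (value, etag)

    def read(self, name: str) -> tuple[str, str]:
        return self._items[name]

    def write(self, name: str, value: str, if_match: Optional[str] = None) -> str:
        current = self._items.get(name)
        if if_match is not None and (current is None or current[1] != if_match):
            raise PreconditionFailed(name)
        etag = uuid.uuid4().hex  # a new identifier on every update
        self._items[name] = (value, etag)
        return etag


def update_with_retry(store: ETagStore, name: str,
                      change: Callable[[str], str], attempts: int = 5) -> str:
    """Read-modify-write loop: on a conflict, re-read and reapply the change."""
    for _ in range(attempts):
        value, etag = store.read(name)
        try:
            return store.write(name, change(value), if_match=etag)
        except PreconditionFailed:
            continue  # another writer got there first; retry with fresh data
    raise PreconditionFailed(name)
```

No lock is ever held: a losing writer simply learns that its view is out of date and tries again, which is why this model scales better than leases when conflicts are rare.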
The pessimistic model in blob storage uses blob or container leases to allow at most one writer at a time. The pessimistic model is conceptually a bit simpler but can impact scalability and throughput, so use that model only when absolutely needed. Table storage uses an optimistic concurrency model with ETags, similar to blob storage.
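A lease behaves like a time-limited lock: whoever acquires it receives a lease ID, and writes that don't present that ID are refused until the lease is released or expires. The sketch below models this with an injectable clock so expiry can be tested; the `LeasedBlob` class, the `LeaseConflict` exception, and the durations are assumptions for illustration, not the Azure SDK.

```python
import time
import uuid
from typing import Callable, Optional


class LeaseConflict(Exception):
    """Raised when another holder owns an active lease."""


class LeasedBlob:
    """Hypothetical model of a blob protected by a lease (pessimistic locking)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.content = b""
        self._lease_id: Optional[str] = None
        self._expires_at = 0.0

    def _lease_active(self) -> bool:
        return self._lease_id is not None and self._clock() < self._expires_at

    def acquire(self, duration: float) -> str:
        if self._lease_active():
            raise LeaseConflict("lease already held")
        self._lease_id = uuid.uuid4().hex
        self._expires_at = self._clock() + duration
        return self._lease_id

    def release(self, lease_id: str) -> None:
        if self._lease_active() and lease_id == self._lease_id:
            self._lease_id = None

    def write(self, data: bytes, lease_id: Optional[str] = None) -> None:
        if self._lease_active() and lease_id != self._lease_id:
            raise LeaseConflict("blob is leased by another writer")
        self.content = data
```

The expiry matters: if the lease holder crashes, the blob becomes writable again once the lease runs out, instead of staying locked forever.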
Note that in tables the granularity of concurrency checks is for individual entities or rows, while in blobs it can be either blobs or blob containers. There is no table-level concurrency control in Azure Table Storage. Azure queues have no explicit notions of optimistic or pessimistic concurrency. Multiple writers simply add messages to a queue as needed, without any guarantee of message ordering.
Concurrent readers consume queue messages through the use of message visibility windows. That is, a reader receives a message and starts processing it, but the message is not removed from the queue at that point; instead, it becomes invisible to other readers for a designated period. Either the reader completes processing and permanently removes the message from the queue, or the visibility timeout expires, in which case another reader will see the message and pick it up for processing.
This two-phase processing approach avoids costly queue locking and maximizes the throughput of the queue infrastructure as a whole. Azure Files and Disks have no overt concurrency controls of their own; instead, they rely on native file system locking behavior to govern who can do what, and when, with a given file.
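The visibility-window mechanism described above can be simulated with a small in-memory queue: receiving a message hides it until a deadline, deleting it requires the receipt from the latest delivery (Azure calls this a pop receipt), and an undeleted message reappears once its timeout lapses. The `VisibilityQueue` class, its method names, and the injected clock are hypothetical simplifications of the real service.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class _Message:
    body: str
    visible_at: float = 0.0
    receipt: Optional[str] = None


class VisibilityQueue:
    """Toy queue with visibility timeouts (not the Azure Queue Storage API)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._messages: list[_Message] = []

    def put(self, body: str) -> None:
        self._messages.append(_Message(body))

    def receive(self, visibility_timeout: float) -> Optional[tuple[str, str]]:
        """Return (body, receipt) for the first visible message and hide it."""
        now = self._clock()
        for msg in self._messages:
            if msg.visible_at <= now:
                msg.visible_at = now + visibility_timeout
                msg.receipt = uuid.uuid4().hex  # a new receipt per delivery
                return msg.body, msg.receipt
        return None

    def delete(self, receipt: str) -> bool:
        """Permanently remove a message, but only with its latest receipt."""
        for i, msg in enumerate(self._messages):
            if msg.receipt == receipt:
                del self._messages[i]
                return True
        return False
```

One consequence worth designing for: a slow reader whose timeout lapses can finish processing after a second reader has already picked up the same message, so delivery is effectively at-least-once and message handlers should be idempotent.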
Applications that rely on cloud native services like Azure Storage must be prepared to differentiate temporary infrastructure and connectivity issues from true application level or unrecoverable platform errors. Transient errors can and should be retried, since in many cases the desired operation can be completed successfully on a subsequent attempt.
However, care must be taken to use an appropriate retry strategy. A naive approach retries the same request over and over with a fixed delay between attempts or, worse, with no delay at all. This can, and often will, swamp an already overburdened service with additional requests.
At best, the continued stress will cause the service to take longer to recover; at worst, it will fall over in the face of a request load it simply can't handle. A better solution is to use an exponential backoff strategy where retries occur a fixed number of times with an increasing delay placed between each subsequent request.
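Here is a minimal sketch of that strategy: retry only errors classified as transient, up to a fixed number of attempts, doubling the delay each time and randomizing it (jitter) so that many clients don't retry in lockstep. The `StorageError` type, the treatment of HTTP 500 and 503 as the transient cases, and the default delays are assumptions for illustration; the Azure Storage SDKs ship their own configurable retry policies.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

TRANSIENT_STATUS = {500, 503}  # assumed transient: server timeout / server busy


class StorageError(Exception):
    """Hypothetical error carrying the HTTP status of a failed request."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return op()
        except StorageError as err:
            attempt += 1
            if err.status not in TRANSIENT_STATUS or attempt >= max_attempts:
                raise  # a permanent error, or out of retries
            # Nominal delay grows 0.5s, 1s, 2s, 4s, ... capped at max_delay;
            # the actual sleep is a random value between half and all of it.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

Passing `sleep` as a parameter keeps the function testable without real waiting, and non-transient errors such as HTTP 404 are raised immediately because retrying them cannot succeed.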
Backing off this way can give an overloaded service time to recover or autoscale and begin working efficiently again. For more information on handling transient faults in Azure, including for the storage services, follow the URL on screen. I've included here some links to additional tools you can use to manage and administer Azure Storage accounts and the data they contain.
There's also a link to the Azure Storage Emulator, a tool that runs on your local machine and simulates the blob, table, and queue storage types. The emulator is a useful developer tool for local testing and debugging of software that integrates with Azure Storage. The details of Azure Storage pricing vary depending on which service (blobs, tables, queues, and so on) you're using.
I'll direct you to the online Azure pricing calculator for up-to-date, specific details; see the URL on screen. In general, however, you can expect to be charged based on the total amount of data you store; the mix of operations you perform (reads, writes, blob container listings, and so on); the total number of those operations; the type of redundancy you configure for your storage account (locally redundant versus geo-redundant); and finally, the amount of data that leaves the region where your storage account resides. Data that crosses a data center boundary on its way out of a storage account incurs a bandwidth charge, typically a few cents per gigabyte.
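As a rough illustration of how those billing dimensions combine, the sketch below totals a monthly estimate from stored capacity, operation counts, and egress. Every rate in it is a made-up placeholder, not an Azure price; the `Rates` fields and the per-10,000-operations unit are assumptions for the example. Use the pricing calculator for real numbers, which also vary by region, tier, and redundancy option.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Placeholder unit prices (hypothetical, not actual Azure rates)."""
    per_gb_month: float   # capacity price, already reflecting the redundancy choice
    per_10k_writes: float
    per_10k_reads: float
    per_gb_egress: float  # data leaving the region


def monthly_estimate(rates: Rates, stored_gb: float, writes: int, reads: int,
                     egress_gb: float) -> float:
    """Sum capacity, operation, and bandwidth charges for one month."""
    capacity = stored_gb * rates.per_gb_month
    operations = (writes / 10_000) * rates.per_10k_writes \
        + (reads / 10_000) * rates.per_10k_reads
    bandwidth = egress_gb * rates.per_gb_egress
    return capacity + operations + bandwidth
```

With placeholder rates of 0.02 per GB-month, 0.05 per 10,000 writes, 0.004 per 10,000 reads, and 0.08 per GB of egress, storing 500 GB with one million writes, ten million reads, and 100 GB of egress comes to 10 + 5 + 4 + 8 = 27 currency units, which shows how operations and egress can rival capacity in the total.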
About the Author
Josh Lane is a Microsoft Azure MVP and Azure Trainer and Researcher at Cloud Academy. He's spent almost twenty years architecting and building enterprise software for companies around the world, in industries as diverse as financial services, insurance, energy, education, and telecom. He loves the challenges that come with designing, building, and running software at scale. Away from the keyboard you'll find him crashing his mountain bike, drumming quasi-rhythmically, spending time outdoors with his wife and daughters, or drinking good beer with good friends.