The course is part of these learning paths
Azure Data Implementation
Microsoft Azure offers services for a wide variety of data-related needs, including ones you would expect like file storage and relational databases, but also more specialized services, such as for text searching and time-series data. In this course, you will learn how to design a data implementation using the appropriate Azure services. Two services that are especially important are Azure SQL Database and Azure Cosmos DB.
Azure SQL Database is a managed service for hosting SQL Server databases (although it’s not 100% compatible with SQL Server). Even though Microsoft takes care of the maintenance, you still need to choose the right options to scale it and make sure it can survive failures.
Azure Cosmos DB is the first multi-model database that’s offered as a global cloud service. It can store and query documents, NoSQL tables, graphs, and columnar data. To get the most out of Cosmos DB, you need to know which consistency and performance guarantees to choose, as well as how to make it globally reliable.
Identify the most appropriate Azure services for various data-related needs
Design an Azure SQL Database implementation for scalability, availability, and disaster recovery
Design an Azure Cosmos DB implementation for cost, performance, consistency, availability, and business continuity
People who want to become Azure cloud architects
People preparing for Microsoft’s 70-535 exam
General knowledge of IT architecture, especially databases
SQL Server has been one of the most popular relational databases for a very long time, so it’s not surprising that Microsoft has made it available as a managed Azure service as well. But Azure supports other relational databases too. Microsoft offers a surprisingly high level of support for the most popular open-source relational databases, MySQL and PostgreSQL.
If you have existing databases running on either of these systems and you want to migrate them to Azure, then you should use Azure Database for MySQL and Azure Database for PostgreSQL. These are managed services that give you lots of extra goodies, including high availability, backups, security, and compliance. There’s even a free service that can migrate your existing databases to Azure with minimal downtime. The managed service price is quite reasonable, so there isn’t much reason to run these databases directly on virtual machines, because you’d have to manage them yourself.
OK, back to SQL Server. The most obvious way to migrate your SQL Server databases is to use Azure SQL Database. However, if you’re currently running SQL Server 2016 or higher, there’s another option you may want to consider. It’s called SQL Server Stretch Database.
It’s very similar to StorSimple, except it’s for database records instead of files. It migrates cold table rows (that is, infrequently queried rows) to Azure. This happens in the background automatically. And you can still query the data that has migrated to Azure too. The only difference is that the query will take a bit longer than usual.
Stretch Database is a good solution if your on-site data keeps growing. First, instead of buying more storage to handle the growth, you can just migrate your cold data to Azure, which is usually cheaper and easier. Second, as your data grows, backups take longer to run. Eventually, your backups may not be able to run within your backup window. By migrating your cold data to the cloud, your on-site backups run faster and your cloud backups happen automatically.
Of course, if your users almost never need to query your cold data, then it would be cheaper to store it offline than in Azure. But in most organizations, people do need to query cold data occasionally, and it would be a pain to bring that data back online every time it’s needed, so storing it in Azure is usually a better choice.
Microsoft even provides a tool called Stretch Database Advisor that tells you which databases and tables would be good candidates for Stretch Database. It also tells you about potential blocking issues, because there are a number of constraints that can prevent data from being migrated, such as tables that have more than 1,023 columns.
If you’d rather bite the bullet and just move an entire database to Azure, then there are two ways to do it. Ideally, you would move it to Azure SQL Database, which is a managed version of SQL Server. One of the big advantages is that you wouldn’t need to worry about maintenance anymore, because Microsoft takes care of all patches and updates, as well all of the underlying infrastructure.
It isn’t always possible to move an existing SQL Server database to Azure SQL Database, though, because they’re not 100% compatible with each other. Microsoft provides a Data Migration Assistant that tells you what reengineering, if any, would be required to move a database to the SQL Database service.
If the reengineering effort would be too great, then the other option is to move the database to a SQL Server instance running on a virtual machine in Azure. The disadvantage is that you would have to take care of maintaining the instance.
Microsoft has a new option called SQL Database Managed Instance that gives you the best of both worlds. It’s nearly 100% compatible with SQL Server, but it’s managed, so you don’t have to worry about maintenance.
When you create a new instance of Azure SQL Database, you have to decide which service tier to choose. There are three service tiers: Basic, Standard, and Premium.
They all have the same uptime guarantee, but there are big differences in performance and features. Basic has low CPU and very low storage capacity compared to the other two. Its maximum storage size is only 2 gig, while the other two can go up to at least a terabyte. It also lacks support for columnstore indexing, which is useful for data warehouse applications. The main advantages of Premium over Standard are much faster I/O and support for in-memory processing.
As you can see, there’s even more complexity in the service tiers because there are references to single databases, elastic pools, DTUs, and eDTUs. The difference between the single database model and the elastic pool model is that an elastic pool can contain multiple databases that share resources with each other. This model works well if the usage levels of your various databases are unpredictable.
For example, suppose that you have five databases and they all experience spikes in demand occasionally, but they don’t all experience these spikes at the same time. Putting them in a pool of shared resources would be much cheaper than provisioning five single databases because you’d have to provision enough resources for each of the single databases to handle spikes, even though they wouldn’t need that capacity most of the time. With elastic pools, you can also set minimum and maximum resources for each database to ensure that one database doesn’t consume all of the resources in the pool.
Now, to make sure you have enough capacity in either a single database or an elastic pool, you need to look at the DTUs or eDTUs. DTU stands for Database Transaction Unit. It represents a bundle of compute, storage, and I/O resources. eDTU is the same thing except it’s for elastic pools.
There’s a huge difference in the maximum number of DTUs between the Basic tier and the Standard and Premium tiers. This puts a hard limit on what you can do with the Basic tier.
The Standard and Premium tiers let you choose from lots of difference performance levels. The Standard tier performance levels go all the way from S0 to S12, each with a higher number of DTUs. One obvious result of this is the increasing number of concurrent connections supported by each performance level.
The Premium tier supports the same number of concurrent connections as the highest Standard performance level does, so it may seem like you wouldn’t want to pay more for the Premium tier unless you needed in-memory processing or more than one terabyte of storage. But there’s another reason that’s not shown in this table. Going back to the previous table, you can see that the Premium tier provides a massively higher number of I/O operations per second per DTU. That’s almost 20 times as fast as the Standard tier, so if you need high IOPS, Premium is the way to go. Why is it so much faster? Because the Premium tier stores data on direct attached disks, while the Basic and Standard tiers use Blob storage.
To offer even more flexibility, Microsoft has introduced a new vCore-based model, which lets you choose your compute, storage, and I/O resources independently. It’s still in beta at the moment, so I’m not going to cover it here.
Regardless of which service tier you choose, high availability within a region is handled automatically. SQL Database uses Always ON Availability Groups technology from SQL Server to provide HA. The good news is that you don’t have to do anything, even if there is a failure, because failover is automatic too. It may take up to 30 seconds to recover, but that should be acceptable in most cases.
To protect against regional failures, you need to configure active geo-replication. This lets you create up to four read replicas in different regions. If the primary database becomes unavailable, then you (or your application) can failover to one of the secondary databases. That secondary then becomes the new primary.
To make this work, you need to ensure that your secondaries have user authentication configured in the same way as the primary. One way to do this is to use Azure Active Directory so you don’t need to manage credentials manually on your databases.
You should also use the same firewall rules for secondaries as you do for your primary. One way to do this is to use database-level firewall rules, which are replicated with the database.
Active geo-replication also has the side benefit that you can use the secondary databases to make queries faster for users in other regions. This is possible because the secondaries are read replicas, so as long as a user doesn’t need to write to the database, they can connect to the nearest secondary instead of the primary.
Even with high availability, backups are still important, so you can recover from cases of accidental corruption or deletion. Luckily, SQL Database automatically takes care of backups too. It saves them in read-access geo-redundant storage (RA-GRS) to protect against regional failures. Transaction log backups happen about every 5 or 10 minutes, so you won’t lose much data if you do a point-in-time recovery.
Backups are saved for 7 days in the Basic tier and 35 days in the Standard and Premium tiers. If you need to save them for longer than that, then you can configure a long-term backup retention policy for each database. You can keep backups for up to 10 years.
Normally, you would restore data to the same server where it was originally located, but if you need to restore a database to a different server, then you can use Geo-restore. As the name implies, this is usually used to restore a database to another geographic region. If you’re not using active geo-replication, then this is the way to failover to another region. It typically takes about 12 hours, though, so it’s not a good solution for mission-critical applications. It can also result in the loss of up to an hour’s worth of data, since it can take up to an hour for the backup copy in the primary region to be replicated to the secondary region. If the primary region goes down, then the secondary region may not have the latest backup.
And that’s it for relational database storage.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).