Azure Data Implementation
Microsoft Azure offers services for a wide variety of data-related needs. These include the ones you would expect, such as file storage and relational databases, as well as more specialized services for text search and time-series data. In this course, you will learn how to design a data implementation using the appropriate Azure services. Two services that are especially important are Azure SQL Database and Azure Cosmos DB.
Azure SQL Database is a managed service for hosting SQL Server databases (although it’s not 100% compatible with SQL Server). Even though Microsoft takes care of the maintenance, you still need to choose the right options to scale it and make sure it can survive failures.
Azure Cosmos DB is the first multi-model database that’s offered as a global cloud service. It can store and query documents, key/value tables, graphs, and column-family (wide-column) data. To get the most out of Cosmos DB, you need to know which consistency and performance guarantees to choose, as well as how to make it globally reliable.
Learning Objectives
Identify the most appropriate Azure services for various data-related needs
Design an Azure SQL Database implementation for scalability, availability, and disaster recovery
Design an Azure Cosmos DB implementation for cost, performance, consistency, availability, and business continuity
Intended Audience
People who want to become Azure cloud architects
People preparing for a Microsoft Azure certification exam
Prerequisites
General knowledge of IT architecture, especially databases
I hope you enjoyed learning about Azure’s data services. Let’s do a quick review of what you learned.
Azure Storage has four different redundancy options. Locally redundant storage is replicated across racks in the same data center. Zone-redundant storage is replicated across three zones within one region. Geo-redundant storage is replicated across two regions. Read-access geo-redundant storage is the same as geo-redundant storage, except that you can also read your data from the secondary region at any time, so reads can continue immediately even if there’s a disaster in your primary region.
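The choice among these four options can be sketched as a small decision helper. This is a simplified illustration, assuming durability is the only requirement; the function name `choose_redundancy` is hypothetical, and newer options such as geo-zone-redundant storage (GZRS) are deliberately not modeled.

```python
def choose_redundancy(survive_zone_failure: bool,
                      survive_region_failure: bool,
                      read_from_secondary: bool) -> str:
    """Return the least redundant of the four options that meets the needs.

    Hypothetical helper: it only encodes the descriptions in this course and
    ignores cost, latency, and newer options such as GZRS.
    """
    if read_from_secondary:
        # Only read-access geo-redundant storage lets you read the secondary copy.
        return "RA-GRS"
    if survive_region_failure:
        # Geo-redundant storage keeps a copy of the data in a second region.
        return "GRS"
    if survive_zone_failure:
        # Zone-redundant storage spreads copies across three zones in one region.
        return "ZRS"
    # Locally redundant storage keeps all copies in a single data center.
    return "LRS"


print(choose_redundancy(survive_zone_failure=True,
                        survive_region_failure=False,
                        read_from_secondary=False))  # ZRS
```

The order of the checks encodes an assumption: a copy in a second region is treated as also covering a single-zone outage, at the cost of a failover.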
Azure Storage supports five types of data: blobs, files, queues, tables, and disks. Blob storage holds data objects. You can choose from three access tiers. Hot storage is for data that gets accessed frequently. Cool storage is for data that is accessed no more than about once every 30 days but still needs to be available immediately when requested. Archive storage is for data that is accessed no more than about once every 180 days and that can take up to 15 hours to retrieve when you do need it.
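The tier rules above can be expressed as a rough selection function. This is a sketch under the assumption that access frequency and tolerated retrieval delay are the only inputs; the function name and parameters are hypothetical, and real decisions should also weigh storage and access pricing.

```python
def choose_blob_tier(days_between_accesses: float, max_wait_hours: float) -> str:
    """Suggest a blob access tier from expected access frequency.

    days_between_accesses: typical gap between reads of the data.
    max_wait_hours: how long a caller can wait for the data to become readable.
    Thresholds (30 days, 180 days, 15 hours) come from the course text.
    """
    if days_between_accesses >= 180 and max_wait_hours >= 15:
        # Archive data can take up to 15 hours to retrieve.
        return "Archive"
    if days_between_accesses >= 30:
        # Cool data is read rarely but must be available immediately.
        return "Cool"
    return "Hot"


print(choose_blob_tier(1, 0))     # Hot
print(choose_blob_tier(365, 1))   # Cool: rarely read, but no time to wait
print(choose_blob_tier(365, 24))  # Archive
```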
File storage supports the SMB protocol, so you can use it as a file share. Queue storage is for passing messages between applications. Table storage is a very simple and inexpensive NoSQL datastore. Disk storage is for the disks that are attached to virtual machines.
StorSimple is a virtual array that you install at your own site. It moves your infrequently used data to the cloud and lets you access that data seamlessly when you need it.
Azure Purview is a catalog of all of an organization’s data. Each data source has to be registered manually in the catalog.
Azure Synapse Analytics includes both a SQL-based data warehouse and a Spark-based data lake. SQL pools are used primarily for business reporting, and Spark pools are used primarily for data exploration, processing, and analytics.
Two other services that can run Spark jobs are Azure Databricks and HDInsight.
With Azure Data Factory, you can create data processing pipelines. This lets you automate data movement and transformation.
Azure Analysis Services lets you create data models to make sense of existing data. It sits between databases and business intelligence clients, such as Power BI.
Azure Database for MySQL and Azure Database for PostgreSQL are managed services that provide high availability, backups, security, and compliance for MySQL and PostgreSQL.
SQL Server Stretch Database migrates cold table rows to Azure, but still lets you query them. This saves you money and helps your backups run faster.
Azure SQL Database is the preferred option for moving from SQL Server to Azure, but it’s not 100% compatible with SQL Server, so some reengineering may be required. It offers three service tiers. Basic is for databases with less than 2 GB of data. Standard can hold up to 1 TB, and Premium can hold up to 4 TB. Premium offers the fastest I/O and support for in-memory processing.
Standard and Premium offer a number of different performance levels, measured in DTUs, which stands for Database Transaction Units. A DTU represents a bundle of compute, storage, and I/O resources.
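Picking a service tier by database size can be sketched as a lookup over the limits above. This is an illustration only: the helper name is hypothetical, it assumes 1 TB = 1024 GB, and it does not choose a DTU performance level within the tier.

```python
from typing import List, Tuple

# DTU-based service tiers and their maximum sizes in GB, per the text above.
# Assumption: 1 TB = 1024 GB.
TIERS: List[Tuple[str, int]] = [
    ("Basic", 2),
    ("Standard", 1024),
    ("Premium", 4096),
]


def smallest_tier_for(size_gb: float, needs_in_memory: bool = False) -> str:
    """Return the first (smallest) tier whose size limit fits the database."""
    for name, max_gb in TIERS:
        if needs_in_memory and name != "Premium":
            continue  # per the text, in-memory processing requires Premium
        if size_gb <= max_gb:
            return name
    raise ValueError(f"{size_gb} GB exceeds every tier listed here")


print(smallest_tier_for(1.5))   # Basic
print(smallest_tier_for(300))   # Standard
print(smallest_tier_for(3000))  # Premium
```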
You also need to choose between the single database model and the elastic pool model. An elastic pool can contain multiple databases that share resources with each other. This model works well if the usage levels of your various databases are unpredictable.
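Why pooling helps with unpredictable usage can be shown with a little arithmetic: separate databases must each be sized for their own peak, while a pool only has to cover the peak of the combined load. The usage numbers below are made up for illustration, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def separate_vs_pooled_peak(usage: Dict[str, List[float]]) -> Tuple[float, float]:
    """Compare capacity needed for single databases vs one shared pool.

    usage maps each database name to DTU consumption sampled at the same
    points in time (hypothetical numbers).
    """
    # Single databases: each one is provisioned for its own peak.
    sum_of_peaks = sum(max(samples) for samples in usage.values())
    # Elastic pool: add up the load at each time point, then take the peak.
    combined = [sum(point) for point in zip(*usage.values())]
    return sum_of_peaks, max(combined)


usage = {
    "sales":   [80, 10, 10, 10],
    "billing": [10, 80, 10, 10],
    "reports": [10, 10, 10, 80],
}
separate, pooled = separate_vs_pooled_peak(usage)
print(separate, pooled)  # 240 100
```

Because the three databases peak at different times, the pool needs far less headroom than three individually sized databases. If they all peaked at once, the two numbers would converge and pooling would save little.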
SQL Database is automatically highly available within a region. To make it highly available across regions, you need to configure active geo-replication, which lets you create up to four readable secondary replicas in different regions.
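One common way applications use readable secondaries is to send writes to the primary and spread reads across the replicas. The class below is a hypothetical client-side sketch of that pattern; the server names are placeholders, and only the limit of four secondaries comes from the text.

```python
import itertools
from typing import List


class GeoReplicaRouter:
    """Hypothetical client-side router for a geo-replicated database.

    Writes go to the primary; reads rotate across readable secondaries.
    """

    MAX_SECONDARIES = 4  # limit on geo-replicas stated in the text

    def __init__(self, primary: str, secondaries: List[str]) -> None:
        if len(secondaries) > self.MAX_SECONDARIES:
            raise ValueError("active geo-replication allows at most four secondaries")
        self.primary = primary
        # Fall back to the primary for reads if no secondaries are configured.
        self._read_targets = itertools.cycle(secondaries or [primary])

    def route(self, is_write: bool) -> str:
        return self.primary if is_write else next(self._read_targets)


router = GeoReplicaRouter("sql-westeurope", ["sql-northeurope", "sql-eastus"])
print(router.route(is_write=True))   # sql-westeurope
print(router.route(is_write=False))  # sql-northeurope
print(router.route(is_write=False))  # sql-eastus
```

Geo-replication is asynchronous, so a read from a secondary can return slightly stale data; reads that must see the latest write should also go to the primary.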
SQL Database also takes care of backups automatically, and it even protects them against regional failures out of the box. Transaction log backups happen about every 5 to 10 minutes. Geo-restore lets you restore a database to a different geographic region from the one where it was backed up.
Azure Table storage is a key/attribute store. It’s schemaless, and it automatically creates a primary index. Azure Redis Cache is a simple key/value store. It runs in memory, which is why it’s used as a cache. Azure Data Lake Storage is a repository for all kinds of data, designed for big data analytics. Azure Search creates an index of text data so your users can run searches. Time Series Insights collects time-stamped data, such as data from IoT devices, and lets you run queries on it.
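The key/attribute model of Table storage can be illustrated with a toy in-memory table. This is not the Azure SDK (the real Python client lives in the `azure-data-tables` package); it is a hypothetical sketch that only shows that each entity is addressed by its PartitionKey and RowKey, which form the automatic primary index, and that entities in one table need not share a schema.

```python
from typing import Any, Dict, List, Tuple

Entity = Dict[str, Any]


class InMemoryTable:
    """Toy key/attribute store in the style of Azure Table storage."""

    def __init__(self) -> None:
        # The (PartitionKey, RowKey) pair acts as the primary index.
        self._rows: Dict[Tuple[str, str], Entity] = {}

    def upsert(self, entity: Entity) -> None:
        key = (entity["PartitionKey"], entity["RowKey"])
        self._rows[key] = dict(entity)  # any other attributes are allowed

    def get(self, partition_key: str, row_key: str) -> Entity:
        # Point lookup on the primary key.
        return self._rows[(partition_key, row_key)]

    def query_partition(self, partition_key: str) -> List[Entity]:
        # Entities in one partition, returned in RowKey order.
        items = sorted(self._rows.items(), key=lambda kv: kv[0])
        return [e for (pk, _), e in items if pk == partition_key]


table = InMemoryTable()
table.upsert({"PartitionKey": "sensors", "RowKey": "001", "temp": 21.5})
table.upsert({"PartitionKey": "sensors", "RowKey": "002", "humidity": 40})  # different attributes
print(table.get("sensors", "002")["humidity"])  # 40
```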
Cosmos DB is globally distributed and it provides SLAs for latency, throughput, consistency, and availability.
It has APIs to support various data models, including Table storage, MongoDB, SQL, Gremlin (graph), and Cassandra.
It offers five data consistency levels. Strong consistency guarantees that every read returns the most recent committed write. Bounded staleness guarantees that reads may lag behind writes by at most a configured number of versions or a configured time interval. Session consistency guarantees consistency within each client session, so a client always reads its own writes. Consistent Prefix guarantees that read operations will never see out-of-order writes. Eventual consistency guarantees that if no new writes are made to an item, then eventually all the replicas will have the same value for that item.
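Session consistency can be modeled with a per-client token that records the latest write sequence number the client has seen. Cosmos DB clients do pass session tokens, but the code below is a deliberately simplified, hypothetical model, not the service’s implementation: a replica may serve a session’s read only if it has caught up to that session’s token.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Replica:
    lsn: int = 0  # sequence number of the last write applied here
    data: Dict[str, str] = field(default_factory=dict)


@dataclass
class Session:
    token: int = 0  # highest sequence number this client has written or read


def write(primary: Replica, session: Session, key: str, value: str) -> None:
    primary.lsn += 1
    primary.data[key] = value
    session.token = primary.lsn  # remember our own write


def replicate(source: Replica, target: Replica) -> None:
    # Toy replication: copy the whole state in one step.
    target.data = dict(source.data)
    target.lsn = source.lsn


def session_read(replica: Replica, session: Session, key: str) -> Optional[str]:
    if replica.lsn < session.token:
        raise RuntimeError("replica is behind this session; retry elsewhere")
    session.token = max(session.token, replica.lsn)  # reads never go backwards
    return replica.data.get(key)


primary, secondary = Replica(), Replica()
alice = Session()
write(primary, alice, "cart", "3 items")
# session_read(secondary, alice, "cart") would raise: the secondary is behind.
print(session_read(secondary, Session(), "cart"))  # None: other sessions may see older data
replicate(primary, secondary)
print(session_read(secondary, alice, "cart"))  # 3 items
```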
You need to tell Cosmos DB how many request units (RUs) per second to provision. This throughput capacity is applied to each region associated with your database account.
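Because the provisioned throughput applies to every region, the total throughput you provision across the account grows linearly with the number of regions. A minimal worked example, with a hypothetical helper name:

```python
def total_provisioned_rus(ru_per_second: int, regions: int) -> int:
    """Total RU/s provisioned across an account whose throughput setting
    is applied to every associated region (per the text above)."""
    if ru_per_second <= 0 or regions <= 0:
        raise ValueError("both values must be positive")
    return ru_per_second * regions


print(total_provisioned_rus(10_000, 1))  # 10000
print(total_provisioned_rus(10_000, 3))  # 30000
```

Adding regions for availability therefore multiplies the throughput you pay for, which is worth factoring into cost planning.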
Cosmos DB guarantees 99.99% availability for both single-region and multi-region databases. It also provides a 99.999% read availability SLA on multi-region databases. To make the automatic failover process more efficient, set a preferred regions list for the application in each region, so it knows which regions to fall back to.
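The idea behind a preferred regions list is that the client walks the list in order and uses the first region that is currently available. The Azure SDKs expose a setting along these lines, but the function below is a hypothetical sketch of the selection logic, not an SDK call, and the region names are only examples.

```python
from typing import List, Set


def pick_region(preferred: List[str], available: Set[str]) -> str:
    """Return the first region in the application's preferred list that is up."""
    for region in preferred:
        if region in available:
            return region
    raise RuntimeError("none of the preferred regions is available")


# An application deployed in West Europe lists its nearest regions first.
preferred = ["West Europe", "North Europe", "East US"]
print(pick_region(preferred, {"West Europe", "North Europe", "East US"}))  # West Europe
print(pick_region(preferred, {"North Europe", "East US"}))                 # North Europe
```

Ordering the list by proximity keeps latency low in normal operation and makes the fallback region predictable during a regional outage.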
By default, Cosmos DB automatically takes snapshots of your databases every four hours. Only the last two snapshots are retained.
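A short calculation shows what this schedule means for recovery. The sketch below assumes snapshots are taken on a fixed four-hour grid aligned to midnight; the service’s actual schedule may be aligned differently, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def retained_snapshots(now: datetime, interval_hours: int = 4, keep: int = 2) -> List[datetime]:
    """Times of the snapshots still retained at `now`, newest first.

    Assumes snapshots on a fixed grid aligned to midnight.
    """
    step = timedelta(hours=interval_hours)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    slots = (now - midnight) // step  # completed intervals since midnight
    latest = midnight + slots * step
    return [latest - i * step for i in range(keep)]


for snap in retained_snapshots(datetime(2024, 1, 1, 13, 30)):
    print(snap)
# 2024-01-01 12:00:00
# 2024-01-01 08:00:00
```

Under this assumption, the older retained snapshot is always between 4 and 8 hours old, so data that is deleted or corrupted and not noticed for more than about 4 hours may already be outside the restorable window.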
You can change the indexing behavior on a Cosmos DB collection by configuring a custom indexing policy. The three options for the index update mode are Consistent, Lazy, and None.
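A custom indexing policy is a JSON document. The helper below builds one as a Python dictionary using the policy’s `indexingMode`, `includedPaths`, and `excludedPaths` fields; the helper itself is hypothetical, and the choice to list no paths when indexing is turned off is an assumption made for this sketch.

```python
import json
from typing import Any, Dict, List, Optional

VALID_MODES = {"consistent", "lazy", "none"}


def make_indexing_policy(mode: str = "consistent",
                         excluded_paths: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build an indexing policy document as a plain dictionary."""
    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown indexing mode: {mode}")
    if mode == "none":
        # Assumption: with indexing turned off, no paths are listed.
        return {"indexingMode": "none"}
    return {
        "indexingMode": mode,
        "includedPaths": [{"path": "/*"}],  # index every property by default
        "excludedPaths": [{"path": p} for p in (excluded_paths or [])],
    }


print(json.dumps(make_indexing_policy("consistent", ["/rawPayload/*"]), indent=2))
```

Excluding large properties that are never queried, such as a raw payload, keeps the index smaller and can reduce the cost of writes.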
Now you know how to identify the most appropriate Azure services for various data-related needs; design an Azure SQL Database implementation for scalability, availability, and disaster recovery; and design an Azure Cosmos DB implementation for cost, performance, consistency, availability, and business continuity.
To learn more about Azure’s data services, you can read Microsoft’s documentation. Also watch for new Microsoft Azure courses on Cloud Academy, because we’re always publishing new courses. Please give this course a rating, and if you have any questions or comments, please let us know. Thanks and keep on learning!
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).