The course is part of these learning paths
Azure Data Implementation
Microsoft Azure offers services for a wide variety of data-related needs, including ones you would expect like file storage and relational databases, but also more specialized services, such as for text searching and time-series data. In this course, you will learn how to design a data implementation using the appropriate Azure services. Two services that are especially important are Azure SQL Database and Azure Cosmos DB.
Azure SQL Database is a managed service for hosting SQL Server databases (although it’s not 100% compatible with SQL Server). Even though Microsoft takes care of the maintenance, you still need to choose the right options to scale it and make sure it can survive failures.
Azure Cosmos DB is the first multi-model database that’s offered as a global cloud service. It can store and query documents, NoSQL tables, graphs, and columnar data. To get the most out of Cosmos DB, you need to know which consistency and performance guarantees to choose, as well as how to make it globally reliable.
Identify the most appropriate Azure services for various data-related needs
Design an Azure SQL Database implementation for scalability, availability, and disaster recovery
Design an Azure Cosmos DB implementation for cost, performance, consistency, availability, and business continuity
People who want to become Azure cloud architects
People preparing for Microsoft’s 70-535 exam
General knowledge of IT architecture, especially databases
In this lesson, we’ll cover services that don’t fall into the usual categories of basic storage, transactional databases, or NoSQL datastores. These services help you work with data in other ways, such as finding, transforming, and analyzing it.
First up is Azure Data Catalog. Organizations typically have so much data in so many different places that it’s hard to find what you’re looking for. The purpose of Data Catalog is to act as an index to all of those data sources, so you can discover them. Of course, in order for this to work, your employees need to register their data sources in the catalog. The data itself stays where it is, but its location and the metadata about it get added to the catalog. The metadata includes things like column names and data types. Users can also add additional information about a data source, such as a description or some tags.
Once various data sources are registered, people can search the catalog to find what they’re looking for. They still need to open the data using another tool, though, since this is just a catalog.
Another way to deal with pockets of data is to collect it in either a data lake or a data warehouse. These serve two different, but related, needs.
Azure SQL Data Warehouse gives us a clue about its purpose in the name itself. It’s intended for SQL queries. That also implies that it stores data in structured, relational tables. If you have raw data that’s not in a nicely structured format, then you’ll probably need to process it before you store it in SQL Data Warehouse.
Azure Data Lake Store, on the other hand, will store any kind of data, whether it’s structured or not. For example, you could store everything from documents to images to social media streams.
Data warehouses are generally used for business reporting, while data lakes are more often used for data analytics and exploration. In fact, one common setup is to process data in the data lake and then export it to the data warehouse.
The two services are designed to work with different types of software, too. SQL Data Warehouse is built on SQL Server, so it works well with that ecosystem of software. Data Lake Store, in contrast, is built to work with Hadoop. That’s because Hadoop excels at processing unstructured data.
One final difference is that SQL Data Warehouse is certified for compliance with over 20 standards, including HIPAA. Data Lake Store does not have regulatory compliance. This is another reason why it makes sense to use SQL Data Warehouse to serve data to a wider audience.
Both services are designed for performing massive queries at high speed. With SQL Data Warehouse, you write queries using T-SQL. With Azure Data Lake Store, you can run queries using Hadoop, but you’ll get higher performance by using Azure Data Lake Analytics. It’s a pay-as-you-go service that lets you write queries using U-SQL, which is a combination of SQL and C#.
If you do want to use Hadoop to process your data, then you can use Azure HDInsight. It supports a wide variety of open-source big data frameworks, including Hadoop, Spark, Hive, Storm, and many others. You don’t even have to use Data Lake Store if you don’t want to because it also supports Azure Storage.
One difference between Data Lake Analytics and HDInsight is ease of use. With HDInsight, you need to spin up a cluster to run a processing job. So if you need to run jobs quite often, you can leave a cluster running all the time, which could be expensive, or you could spin clusters up and down as you need them, which would be kind of a pain.
One way to make HDInsight work more like Data Lake Analytics is to use yet another service, Azure Data Factory. It lets you create workflows to automate data movement and data transformation. One of its many capabilities is spinning up and down HDInsight clusters as needed, but it can do far more than that.
With Data Factory, you can create data processing pipelines. For example, a pipeline could copy data from SQL Server to Data Lake Store, run a Spark job on the data using an HDInsight cluster, and store the results in SQL Data Warehouse, all without any human intervention. It can even automate Data Lake Analytics queries and machine learning jobs. It’s a great data processing tool.
One more data analytics tool is Azure Analysis Services. It lets you create data models that make sense of existing data. One of the problems with the multitude of data in organizations is that it can be hard to understand how all of that data relates to the real world. Using a data model is easier than working with the raw data. Analysis Services also makes browsing large amounts of data faster because it uses in-memory caching.
However, end users don’t browse directly through Analysis Services. Instead, they use one of the supported client tools, such as Power BI, Tableau, or Excel.
And that’s it for Azure Data Services.
About the Author
Guy launched his first training website in 1995 and he's been helping people learn IT technologies ever since. He has been a sysadmin, instructor, sales engineer, IT manager, and entrepreneur. In his most recent venture, he founded and led a cloud-based training infrastructure company that provided virtual labs for some of the largest software vendors in the world. Guy’s passion is making complex technology easy to understand. His activities outside of work have included riding an elephant and skydiving (although not at the same time).