Designing Data Flows in Azure
Data Flow Basics
Designing a Data Flow Solution
This Designing Data Flows in Azure course will enable you to implement best practices for data flows in your own team. Starting from the basics, you will learn how data flows work from beginning to end. Though we do recommend having an idea of what data flows are and how they are used, this course contains some demonstration lectures to make sure you have got to grips with the concept. By better understanding the key components available in Azure to design and deploy efficient data flows, you will allow your organization to reap the benefits.
This course is made up of 19 comprehensive lectures including an overview, demonstrations, and a conclusion.
Learning Objectives
- Review the features, concepts, and requirements that are necessary for designing data flows
- Learn the basic principles of data flows and common data flow scenarios
- Understand how to implement data flows within Microsoft Azure
Intended Audience
- IT professionals who are interested in obtaining an Azure certification
- Those looking to implement data flows within their organizations
Prerequisites
- A basic understanding of data flows and their uses
Related Training Content
For more training content related to this course, visit our dedicated MS Azure Content Training Library.
Because a data flow will rarely consist of a single component, in this lesson, I want to touch on the collection of components and considerations that are necessary when designing a data flow.
Let's start with Infrastructure as a Service versus Platform as a Service. When designing a data flow in Azure, you need to consider that almost any service that will be part of the data flow is likely available in Azure as an Infrastructure as a Service-based solution. Generally speaking, this would be mostly virtual machines. What this means is that you could very easily lift-and-shift your resources from an on-prem environment to Azure. However, by doing so, you also burden yourself with all of the support overhead that comes with an on-prem solution, including patching, antivirus, management, et cetera. If you move to a Platform as a Service-based solution instead, you're going to reduce or eliminate those tedious management and support tasks for your data flow solution, while also improving functionality and scalability. As such, in cases where there is a Platform as a Service-based solution available, the general rule of thumb is to leverage it whenever possible.
An example of this would be to leverage Azure Data Factory instead of using SSIS from SQL Server.
When it comes to HA and DR, your data flow infrastructure is no different from any other IT solution. You need to ensure its availability and resiliency, which, by the way, takes us back to Infrastructure versus Platform, because when you leverage Platform as a Service-based solutions where possible, much of the necessary HA is going to be provided by the service itself, which takes it off your plate. DR, however, is a slightly different animal. For starters, when thinking about DR, you might find yourself wondering what DR you actually need. For example, instead of deploying an engineered DR solution, your actual solution might be as simple as recreating any resources that experience an issue, especially in the case of any compute or processing resources that are in play. Ultimately, HA and DR solutions will depend largely on how much Infrastructure as a Service you're using, and how much Platform as a Service is in play.
Now while HA and DR might be less of a concern, either due to the intrinsic availability of Platform as a Service offerings or the ability to easily rebuild compute and processing Infrastructure as a Service resources, data protection is critical. When considering data protection for data being processed in your data flow, you need to think about where the data is stored, and whether replication is available.
Network bandwidth and network latency are two key considerations when designing a data flow, especially when data is flowing from an on-prem environment to the cloud, and vice versa. I mean, sure, moving data within the cloud, or moving data from on-prem to on-prem, is generally not an issue. However, when you start traversing both environments with large amounts of data, latency and bandwidth become important. To address potential latency issues between on-prem and cloud, you might want to consider ExpressRoute, because ExpressRoute offers the ability to maintain consistent latency. As far as bandwidth goes, you can typically purchase any bandwidth you need. You just need to be cognizant of what you need. This is obviously based on the amount of data you'll be moving to and from the cloud, and of course, at what velocity.
Speaking of velocity, you'll also want to think about how you can automate processes such as scaling, scheduling, and even triggering processes. And of course, let's not forget about backups and monitoring. I mean, if you're dealing with data, you need to consider a backup strategy that can protect the data that you're dealing with.
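To put rough numbers on the bandwidth point above, here is a minimal Python sketch of the arithmetic: how long a given volume of data takes to move over a link, and what link speed a given transfer window implies. The function names, the decimal unit convention (1 GB = 8,000 megabits), and the default 80% usable-throughput figure are assumptions made for illustration; none of them come from the lecture or from Azure documentation.

```python
def transfer_hours(data_gb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate hours needed to move data_gb gigabytes over a link_mbps link.

    utilization is the assumed fraction of the link that is actually usable
    for this transfer (an illustrative default, not a measured value).
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("data_gb must be >= 0, link_mbps > 0, 0 < utilization <= 1")
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * utilization)
    return seconds / 3600


def required_mbps(data_gb: float, window_hours: float, utilization: float = 0.8) -> float:
    """Estimate the link speed (Mbps) needed to move data_gb within window_hours."""
    if data_gb < 0 or window_hours <= 0 or not 0 < utilization <= 1:
        raise ValueError("data_gb must be >= 0, window_hours > 0, 0 < utilization <= 1")
    megabits = data_gb * 8 * 1000
    return megabits / (window_hours * 3600 * utilization)


if __name__ == "__main__":
    # Hypothetical workload: 450 GB per nightly load over a 1,000 Mbps link.
    print(f"{transfer_hours(450, 1000):.2f} hours at 80% usable throughput")
    print(f"{required_mbps(450, 1):.0f} Mbps needed to finish within 1 hour")
```

An estimate like this only covers bandwidth. Latency, protocol overhead, and throttling on either end can all stretch the real transfer time, so treat the result as a starting point for sizing rather than a guarantee.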
The ability to effectively monitor the entire solution, as well as the individual pieces of the data flow from start to finish, is also important for ensuring a smooth data flow.
Another key consideration when working with data flows is control flow. Think about what components or services are going to be ingesting the data initially, what components are going to process or transform the data, and what processes are going to write the data to its destination storage.
Security is another key consideration when designing a data flow. Because you're dealing with data, it should be a no-brainer that the data needs to be secured. It's important, especially in cases where data is coming from multiple sources, that people don't have access to data that they shouldn't have access to. That being said, it's critical that access control lists (ACLs) are maintained.
As far as pricing goes, because "money is no object" is typically reserved for the mega-rich, the cost of things is always going to be a consideration. The best way to deal with cost and pricing is to determine what the needs are versus the wants as they relate to your data flow solution. With that information in hand, choose the cheapest solution that meets all your needs.
So as you can see, there are quite a few components and considerations that you need to think about when designing a data flow.
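The needs-versus-wants approach to pricing described above can be sketched in a few lines of Python: filter out any option that misses a need, then take the cheapest of what remains. Everything in this sketch is hypothetical, including the `Option` type, the option names, the feature labels, and the monthly costs, which are placeholder numbers and not real Azure prices.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Option:
    """A candidate solution (hypothetical structure, for illustration only)."""
    name: str
    monthly_cost: float
    features: frozenset


def cheapest_meeting_needs(options: Iterable[Option], needs: Iterable[str]) -> Optional[Option]:
    """Return the lowest-cost option whose features cover every need, or None."""
    required = frozenset(needs)
    # Keep only options that satisfy all needs; wants play no part in this filter.
    qualifying = [o for o in options if required <= o.features]
    if not qualifying:
        return None
    return min(qualifying, key=lambda o: o.monthly_cost)


if __name__ == "__main__":
    # Placeholder options and costs, invented for this example.
    options = [
        Option("option-a", 100.0, frozenset({"scheduling", "monitoring"})),
        Option("option-b", 60.0, frozenset({"scheduling"})),
        Option("option-c", 80.0, frozenset({"scheduling", "monitoring", "autoscale"})),
    ]
    choice = cheapest_meeting_needs(options, needs={"scheduling", "monitoring"})
    print(choice.name if choice else "no option meets all needs")
```

Keeping wants out of the filter is a deliberate choice here: they only matter once you have a shortlist of options that already meet every need.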
About the Author
Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40k seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.
In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.
In his spare time, Tom enjoys camping, fishing, and playing poker.