Assessing Data Flow Requirements
Difficulty
Intermediate
Duration
1h 5m
Students
1274
Ratings
4.7/5
Description

In this course, we're going to review the features, concepts, and requirements that are necessary for designing data flows, and how to implement them in Microsoft Azure. We’re also going to cover the basics of data flows, common data flow scenarios, and what’s involved in designing a typical data flow.

Learning Objectives

  • Understand the key components available in Azure that can be used to design and deploy data flows
  • Know how these components fit together

Intended Audience

This course is intended for IT professionals who are interested in earning Azure certification and for those who need to work with data flows in Azure.

Prerequisites 

To get the most from this course, you should have at least a basic understanding of data flows and what they are used for.

Transcript

Welcome to Assessing Data Flow Requirements. Because a data flow will rarely consist of a single component, in this lecture, I want to touch on the collection of components and considerations that are necessary when designing a data flow.

Let’s start with IaaS versus PaaS. When designing a data flow in Azure, you need to consider that almost any service that will be part of the data flow is likely available in Azure as an IaaS-based solution – mostly virtual machines. What this means is that you could very easily lift-and-shift your resources from on-prem to Azure. However, by doing so, you also take on all the support baggage that comes with an on-prem solution, including patching, antivirus, management, etc. If you move to a PaaS-based solution instead, you are going to reduce or eliminate those tedious management and support tasks for your data flow solution, while improving functionality and scalability. As such, in cases where a PaaS-based solution is available, the rule of thumb is to leverage it whenever possible. An example of this would be to leverage Azure Data Factory instead of SQL Server Integration Services (SSIS).
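
To make that concrete, here’s a minimal sketch of standing up a Data Factory instance with the azure-mgmt-datafactory Python SDK – the PaaS route, with no VM to patch. The subscription, resource group, factory name, and region below are all placeholder values, not anything from the course.

```python
# Minimal sketch (assumes the azure-identity and azure-mgmt-datafactory
# packages). All resource names below are illustrative placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

client = DataFactoryManagementClient(
    DefaultAzureCredential(), "<your-subscription-id>"
)

# Creating the factory is essentially the whole "deployment" --
# there is no VM to patch, update, or run antivirus on.
factory = client.factories.create_or_update(
    resource_group_name="dataflow-rg",   # placeholder resource group
    factory_name="dataflow-adf-demo",    # placeholder factory name
    factory=Factory(location="eastus"),
)
print(f"Provisioned factory: {factory.name}")
```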

When it comes to HA and DR, your data flow infrastructure is no different from any other IT solution. You need to ensure its availability and resiliency, which, by the way, takes us back to IaaS vs PaaS – because by leveraging PaaS-based solutions where possible, much of the necessary HA is going to be provided by the service itself, taking it off your plate.

DR, however, is a slightly different animal. For starters, when thinking about DR, you might find yourself wondering what DR you actually need. For example, instead of deploying an engineered DR solution, your answer might be as simple as recreating any resources that experience an issue – especially in the case of any compute or processing resources that are in play.
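
As a sketch of that “recreate rather than fail over” idea, here’s some hypothetical Python – the resource names and both helper functions are illustrative stand-ins, not real SDK calls:

```python
# Hypothetical "DR by recreation" sketch: if a stateless compute resource
# is unhealthy, rebuild it from its template instead of failing over.
# check_health() and redeploy_from_template() are illustrative stand-ins.

COMPUTE_RESOURCES = ["ingest-vm-01", "transform-cluster", "load-vm-01"]  # placeholders

def check_health(resource_name: str) -> bool:
    """Stand-in for a real probe (e.g., an Azure Monitor metric query)."""
    return resource_name != "transform-cluster"   # pretend one resource is down

def redeploy_from_template(resource_name: str) -> None:
    """Stand-in for redeploying the resource from an ARM/Bicep template."""
    print(f"Redeploying {resource_name} from its template")

def recover_data_flow() -> None:
    for resource in COMPUTE_RESOURCES:
        if not check_health(resource):
            # Stateless compute can simply be rebuilt; only the data
            # itself needs a true backup/replication strategy.
            redeploy_from_template(resource)
```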

Ultimately, your HA and DR approach will depend largely on how much IaaS you are using and how much PaaS is in play.

While HA and DR might be less of a concern, either due to the intrinsic availability of PaaS offerings or the ability to easily rebuild compute and processing IaaS resources, data protection is critical. When considering data protection for data being processed in your data flow, you need to think about where the data is stored and whether replication is available.
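
One concrete way to get replication handled for you is to pick a geo-redundant SKU when you create the storage account that backs the data flow. Here’s a hedged sketch using the azure-mgmt-storage Python SDK – account name, resource group, and region are placeholders:

```python
# Sketch (assumes the azure-mgmt-storage package): create a storage account
# with geo-redundant storage (GRS), so the platform replicates the data
# to a paired region for you. All names are illustrative placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountCreateParameters, Sku, Kind

client = StorageManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

poller = client.storage_accounts.begin_create(
    resource_group_name="dataflow-rg",    # placeholder
    account_name="dataflowstoredemo",     # placeholder; must be globally unique
    parameters=StorageAccountCreateParameters(
        sku=Sku(name="Standard_GRS"),     # geo-redundant replication
        kind=Kind.STORAGE_V2,
        location="eastus",
    ),
)
account = poller.result()
print(account.secondary_location)  # the paired region holding the replica
```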

Network bandwidth and network latency are two key considerations when designing a data flow – especially when data is flowing from an on-prem environment to the cloud, and vice versa. Sure, moving data within the cloud, or from on-prem to on-prem, is generally not an issue. However, when you start traversing both environments with large amounts of data, latency and bandwidth become important.

To address potential latency issues between on-prem and cloud, you might want to consider ExpressRoute, because ExpressRoute offers the ability to maintain consistent latency. As far as bandwidth goes, you can typically purchase whatever bandwidth you need – you just have to be cognizant of what that is, which is obviously based on the amount of data you’ll be moving to and from the cloud, and at what velocity. And speaking of velocity, you’ll also want to think about how you can automate processes such as scaling, scheduling, and triggering.
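
To put rough numbers on bandwidth sizing, suppose – purely as an example, these figures aren’t from the course – you need to move 5 TB from on-prem to the cloud inside a nightly six-hour window:

```python
# Back-of-the-envelope bandwidth sizing. The 5 TB / 6 hour figures are
# illustrative assumptions, not values from the course.
data_tb = 5                      # data to move per night, in terabytes
window_hours = 6                 # transfer window

bits = data_tb * 1e12 * 8        # decimal terabytes -> bits
seconds = window_hours * 3600
required_gbps = bits / seconds / 1e9

print(f"Required throughput: {required_gbps:.2f} Gbps")
# ~1.85 Gbps -- so a 1 Gbps circuit won't cut it; you'd size an ExpressRoute
# circuit at 2 Gbps or more, plus headroom for protocol overhead.
```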

And let’s not forget about backups and monitoring. If you are dealing with data, you need to consider a backup strategy that can protect that data. The ability to effectively monitor the entire data flow, and its individual pieces, from start to finish, is also important for ensuring a smooth data flow.
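
As one hedged example of the monitoring piece: if you’ve gone the Data Factory route, the same management SDK can query recent pipeline runs, which you could fold into an alerting script. The resource group and factory names are placeholders:

```python
# Sketch (assumes the azure-mgmt-datafactory package): query the last
# 24 hours of pipeline runs in a factory and flag failures.
# Resource names are illustrative placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

client = DataFactoryManagementClient(
    DefaultAzureCredential(), "<your-subscription-id>"
)

now = datetime.now(timezone.utc)
runs = client.pipeline_runs.query_by_factory(
    resource_group_name="dataflow-rg",     # placeholder
    factory_name="dataflow-adf-demo",      # placeholder
    filter_parameters=RunFilterParameters(
        last_updated_after=now - timedelta(hours=24),
        last_updated_before=now,
    ),
)
for run in runs.value:
    if run.status == "Failed":
        print(f"Pipeline {run.pipeline_name} failed: {run.message}")
```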

Another key consideration when working with data flows is control flow. Think about which components or services are going to ingest the data initially, which components are going to process or transform the data, and which processes are going to write the data to its destination storage.
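
In code terms, control flow is the orchestration layer that sequences those three stages. Here’s a deliberately simplified, hypothetical sketch – none of these function names come from an Azure SDK:

```python
# Hypothetical control-flow sketch: the orchestrator decides the ordering
# of ingest -> transform -> load, plus any error handling in between.

def ingest() -> list[dict]:
    """Stand-in for the ingestion component (e.g., Event Hubs or a copy activity)."""
    return [{"id": 1, "value": "raw"}]   # dummy record for illustration

def transform(records: list[dict]) -> list[dict]:
    """Stand-in for processing/transformation (e.g., Databricks or an ADF data flow)."""
    return [{**r, "value": r["value"].upper()} for r in records]

def load(records: list[dict]) -> None:
    """Stand-in for writing to destination storage (e.g., ADLS or Azure SQL)."""
    print(f"Wrote {len(records)} records")

def run_data_flow() -> None:
    # The control flow is this orchestration: the ordering, and whatever
    # branching or retry logic you add between the stages.
    load(transform(ingest()))
```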

Security is another key consideration when designing a data flow. Because you are dealing with data, it should be a no-brainer that the data needs to be secured. It’s important, especially in cases where data is coming from multiple sources, that people don’t have access to data that they shouldn’t have access to. As such, maintaining ACLs is essential.
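
As one hedged example of maintaining ACLs: if your destination is Azure Data Lake Storage Gen2, POSIX-style ACLs can be set per directory with the azure-storage-file-datalake Python SDK. The account, filesystem, and directory names here are placeholders:

```python
# Sketch (assumes the azure-storage-file-datalake package): restrict a
# directory so only its owning user and group can read it.
# Account, filesystem, and directory names are illustrative placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://dataflowstoredemo.dfs.core.windows.net",  # placeholder
    credential=DefaultAzureCredential(),
)

directory = (
    service.get_file_system_client("curated")   # placeholder filesystem
           .get_directory_client("finance")     # placeholder directory
)

# POSIX-style ACL string: owner rwx, owning group r-x, everyone else nothing.
directory.set_access_control(acl="user::rwx,group::r-x,other::---")
```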

 

Because “money is no object” is typically reserved for the mega-rich, cost is always going to be a consideration. The best way to deal with cost and pricing is to determine what the “needs” and “wants” are for your data flow solution. With that information in hand, choose the cheapest solution that meets all of your “needs”.
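
That “cheapest option that meets every need” rule is simple enough to express directly. A toy sketch – the options, capabilities, and prices are entirely made up for illustration:

```python
# Toy cost-selection sketch; the candidates, capabilities, and prices
# are entirely fabricated for illustration.
candidates = [
    {"name": "VM + SSIS",      "monthly_usd": 450, "capabilities": {"etl", "scheduling"}},
    {"name": "Data Factory",   "monthly_usd": 300, "capabilities": {"etl", "scheduling", "monitoring"}},
    {"name": "Premium bundle", "monthly_usd": 900, "capabilities": {"etl", "scheduling", "monitoring", "ml"}},
]

needs = {"etl", "scheduling", "monitoring"}   # the non-negotiables

# Keep only options that cover every "need", then take the cheapest.
viable = [c for c in candidates if needs <= c["capabilities"]]
best = min(viable, key=lambda c: c["monthly_usd"])
print(f"Pick: {best['name']} at ${best['monthly_usd']}/mo")
# Pick: Data Factory at $300/mo
```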

About the Author
Students
84531
Courses
82
Learning Paths
62

Tom is a 25+ year veteran of the IT industry, having worked in environments as large as 40k seats and as small as 50 seats. Throughout the course of a long and interesting career, he has built an in-depth skillset that spans numerous IT disciplines. Tom has designed and architected small, large, and global IT solutions.

In addition to the Cloud Platform and Infrastructure MCSE certification, Tom also carries several other Microsoft certifications. His ability to see things from a strategic perspective allows Tom to architect solutions that closely align with business needs.

In his spare time, Tom enjoys camping, fishing, and playing poker.