This course covers the Design advanced applications part of the 70-534 exam, which is worth 20–25% of the exam. The intent of the course is to help fill in any knowledge gaps that you might have, and to help prepare you for the exam.
Welcome back. In this lesson, we'll be talking about compute-intensive applications.
There are a lot of tasks that require massive amounts of computational power to complete. To name just a few: engineering simulations; genome analysis; financial modeling; and video rendering. And, with tasks such as these, the more computation that you can throw at them, the faster they'll complete.
Now, as you know, the performance of a single server is constrained by its hardware, and what I mean by that is that any individual server can only perform as well as the best currently available hardware allows. So, that means you may need to combine the compute power of multiple servers, all working on the same problem at the same time, to accomplish your task.
When it comes to actually doing this, there are a lot of different options; however, there are two main categories that we'll cover in this lesson, and those are Embarrassingly Parallel and Tightly Coupled.
Embarrassingly Parallel applications consist of separate executables or distinct services that can work on their own jobs without needing to communicate with each other.
This method allows you to add and remove instances as needed, and that would increase or decrease the total required computation time. Tasks such as software testing, media encoding, and image processing are examples of Embarrassingly Parallel tasks.
Tightly Coupled applications require compute nodes to interact or exchange intermediate results, so unlike the Embarrassingly Parallel method, the nodes need to be able to communicate with each other, and they tend to communicate via the Message Passing Interface, which is typically abbreviated MPI.
MPI is the de facto standard to exchange messages between nodes in parallel computing. When you need to exchange information between nodes, this can become a bottleneck, so you can use RDMA, or Remote Direct Memory Access to improve performance. Tightly Coupled tasks include things such as weather forecasting and engineering design and analysis.
So the logical question is: how does Azure help with computationally intensive tasks? The reality is that cloud platforms provide so much computing power that there are a lot of potential ways to handle High-Performance Computing needs. High-Performance Computing is typically abbreviated HPC, so I'm going to use that abbreviation throughout.
Now, while there are a lot of options, the ones that I'm going to cover are Hybrid HPC clusters, Azure-based HPC clusters, and custom solutions using the Competing Consumer pattern.
Let's start with the Hybrid HPC Cluster option. Microsoft has long offered the HPC Pack so that you can run your own High-Performance Computing tasks on premises. The only problem is that if you need more computational power for your on-premises cluster, you need to buy additional hardware.
Now, that would require an up-front purchase and it can be a blocker, especially if you're only planning on using this for a one-off task. So this is where a hybrid approach helps. It allows us to add in additional resources as needed without buying new servers. The head node will live on-prem, and the cloud nodes will just serve as extra compute. If you already have an on-prem HPC cluster, this will help you to extend that.
Alright, the next solution is Azure-based HPC Clusters. This is basically the same thing as the hybrid approach in terms of setup, except that you'd install the HPC Pack in Azure. This runs inside of an Azure Virtual Network, and everything lives in Azure.
So this allows you to shut everything down when you're done and no longer need the cluster. And since this doesn't require any up-front hardware investment, it's fairly cost-friendly. Now, it will require some investment in setup and maintenance, but that's a fair tradeoff.
Okay, the final option is a custom cloud solution based on the Competing Consumer pattern. Now, there are other patterns you could use, but this is the one I want to cover here. The idea is that you have one or more producers putting units of work onto a queue, and then you have a pool of one or more consumers pulling that work off of the queue.
To give you a better example, you could take a collection of images, and put them into a queue, and then you'd have a pool of consumers that would be responsible for grabbing the images off of the queue and performing some sort of image processing.
If for some reason a consumer fails, the failed image remains in the queue, and it can be picked up by another consumer. This pattern allows you to scale the consumer pool independently of the producers that put new work onto the queue.
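Here's a minimal single-machine sketch of that pattern using Python's standard-library queue and threading modules. The image names and the thumbnail step are hypothetical; the key points are that consumers compete for items from a shared queue, a failed item can be put back for another consumer to retry, and the size of the consumer pool is independent of whoever produced the work.

```python
import queue
import threading

def consumer(work_queue, results, lock):
    # Each consumer competes for the next unit of work; items are
    # independent, so consumers never need to talk to each other.
    while True:
        try:
            image = work_queue.get_nowait()
        except queue.Empty:
            return  # no more work; this consumer shuts down
        try:
            processed = f"{image}-thumbnail"  # stand-in for real processing
            with lock:
                results.append(processed)
        except Exception:
            # On failure, return the item to the queue so another
            # consumer can pick it up and retry.
            work_queue.put(image)
        finally:
            work_queue.task_done()

work_queue = queue.Queue()
for name in ["cat.png", "dog.png", "bird.png"]:
    work_queue.put(name)  # producers enqueue units of work

results, lock = [], threading.Lock()
# The consumer pool scales independently of the producers:
# change range(2) to grow or shrink it.
workers = [threading.Thread(target=consumer, args=(work_queue, results, lock))
           for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(results))
```

In Azure you'd typically swap the in-process queue for a durable service such as Azure Storage queues or Service Bus, so the queue survives consumer failures.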
So, that's just three potential options out of the many possible options. Now you may have wondered about the actual virtual machines used for HPC. Azure has a set of machines that are tailored for High-Performance Computing.
There are four options in the "A" family that are designed for HPC: the A8, A9, A10, and A11. The difference is that the A8 and A9 implement RDMA, which is Remote Direct Memory Access, to allow for very fast inter-node communication.
Alright, that's gonna wrap up this lesson. In the next lesson, we'll cover integrating some of the different Azure services into your solution. So, if you're ready to keep learning, then let's get started in the next lesson.
About the Author
Ben Lambert is the Director of Engineering and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps.
When he’s not building the first platform to run and measure enterprise transformation initiatives at Cloud Academy, he’s hiking, camping, or creating video games.