MWAA Environment Classes

Overview

Difficulty: Intermediate
Duration: 15m
Students: 64
Ratings: 5/5
Description

This course delves into Amazon Managed Workflows for Apache Airflow (MWAA). It is a great service for anyone already using Apache Airflow who wants a better way to set up, schedule, and manage their workflows.

Learning Objectives

  • Understand how Amazon Managed Workflows for Apache Airflow is implemented within AWS
  • Learn about DAGs (Directed Acyclic Graphs), which Apache Airflow uses to run your workflows
  • Understand the key components required to set up your own Managed Airflow environment

Intended Audience

This course is intended for anyone already using Apache Airflow who wants a better way to set up, schedule, and manage their workflows.

Prerequisites

To get the most out of this course, you should have a decent understanding of cloud computing and cloud architectures, specifically with Amazon Web Services. Some background knowledge of Apache Airflow is helpful, but it is not a hard requirement. Basic knowledge of ELT pipelines and state machines would also be beneficial.

Transcript

Each MWAA environment runs its worker, scheduler, and web server on AWS Fargate containers. These containers run the Celery executor, which scales out the number of workers your workflow requires. Your choice of environment class determines the size and power of these components, as well as of the Amazon Aurora PostgreSQL metadata database.

You have a choice of three classes that provide increasing levels of performance.

There is the mw1.small class, which supports up to 50 DAGs; the mw1.medium, which supports up to 250 DAGs; and finally, the mw1.large, which supports up to 1,000 DAGs. Each of these classes has progressively more backend horsepower to handle the increased load.

These numbers are based on typical usage, so if your workloads are heavier than usual, consider opting for a larger size.
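
To make this concrete, here is a minimal sketch of choosing a class at creation time using boto3. The environment name, role ARN, bucket, and network IDs below are placeholders for resources you would already have in place:

    import boto3

    mwaa = boto3.client("mwaa")

    # Create an environment on the smallest class; swap in "mw1.medium"
    # or "mw1.large" as your DAG count grows. All names, ARNs, and IDs
    # here are placeholders.
    mwaa.create_environment(
        Name="my-airflow-env",
        AirflowVersion="2.5.1",
        EnvironmentClass="mw1.small",
        ExecutionRoleArn="arn:aws:iam::123456789012:role/my-mwaa-execution-role",
        SourceBucketArn="arn:aws:s3:::my-dag-bucket",
        DagS3Path="dags",
        NetworkConfiguration={
            "SubnetIds": ["subnet-aaaa1111", "subnet-bbbb2222"],
            "SecurityGroupIds": ["sg-cccc3333"],
        },
    )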

You will have to set a maximum worker count, which acts as an auto-scaling cost control mechanism. Airflow will create new workers up to this number in order to deal with the required throughput, and any excess throughput will be queued using Airflow's native queuing mechanisms.

You can also set a minimum worker count, anywhere from one up to your maximum. Setting this higher than one is worthwhile if you want your workflow to always be ready for higher load without waiting for auto scaling to kick in.
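
Both counts can also be adjusted after the fact. As an illustrative sketch against the same placeholder environment from above, using boto3:

    import boto3

    mwaa = boto3.client("mwaa")

    # Keep two workers warm so bursts of tasks start immediately, while
    # capping scale-out at ten workers as a cost control.
    mwaa.update_environment(
        Name="my-airflow-env",
        MinWorkers=2,
        MaxWorkers=10,
    )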

Finally, you can select the number of schedulers you want your environment to use. It defaults to two, which is fine for most workloads; however, you can increase it up to five if your solution demands it.
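
The scheduler count can be set the same way. As a sketch, again against the placeholder environment used above:

    import boto3

    mwaa = boto3.client("mwaa")

    # Raise the scheduler count from the default of 2 to the maximum of 5.
    mwaa.update_environment(
        Name="my-airflow-env",
        Schedulers=5,
    )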

About the Author

William Meadows is a passionately curious human currently living in the Bay Area in California. His career has included working with lasers, teaching teenagers how to code, and creating classes about cloud technology that are taught all over the world. His dedication to completing goals and helping others is what brings meaning to his life. In his free time, he enjoys reading Reddit, playing video games, and writing books.