AWS Batch



This course covers the core learning objectives needed to meet the requirements of the 'Designing Compute instances solutions in AWS - Level 1' skill.

Learning Objectives:

  • Understand there are different Amazon EC2 compute families
  • Understand the different services that provide compute resources, and how AWS Lambda compares to Amazon EC2 or the Amazon Elastic Container Service
  • Understand that elasticity can be achieved through AWS Auto Scaling
  • Understand the purpose of AWS Elastic Load Balancers



Hello, and welcome to this lecture where I'll provide a high-level overview of AWS Batch. As the name suggests, this service is used to manage and run batch computing workloads within AWS. Before we go any further, I just want to quickly clarify what batch computing is.

Batch computing is primarily used in specialist use cases that require a vast amount of compute power across a cluster of compute resources to execute a series of jobs or tasks. Outside of a cloud environment, it can be very difficult to maintain and manage a batch computing system. It requires specific software and the capacity to supply the compute resources needed, which can be very costly. However, with AWS Batch, many of these constraints, administration activities and maintenance tasks are removed. You can seamlessly create a cluster of compute resources that is highly scalable, taking advantage of the elasticity of AWS to cope with any level of batch processing while optimizing the distribution of the workloads. All provisioning, monitoring, maintenance and management of the clusters themselves is taken care of by AWS, meaning there is no software for you to install.

There are effectively five components that make up the AWS Batch service, and understanding them will help you to start using it. These are jobs, job definitions, job queues, job scheduling, and compute environments.

Jobs. A job is classed as a unit of work that is to be run by AWS Batch. For example, this can be a Linux executable file, an application within an ECS cluster or a shell script. The jobs themselves run on EC2 instances as a containerized application. At any one time, each job is in one of a number of different states, for example, submitted, pending, running or failed, among others.

Job definitions. These define specific parameters for the jobs themselves. They dictate how the job will run and with what configuration. Some examples are how many vCPUs to use for the container, which data volumes and mount points should be used, and which IAM role should be used to allow the job to access other AWS services.
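To make the job definition parameters a little more concrete, here is a minimal sketch of the kind of request body a job definition could be built from. The field names follow the shape of the AWS Batch RegisterJobDefinition API as I understand it. The definition name, container image, role ARN, account ID and paths are all hypothetical placeholders that do not come from the lecture, and the code only builds and checks a dictionary; it does not call AWS.

```python
import json

# Hypothetical values throughout: the name, image, role ARN and paths are
# placeholders for illustration only.
job_definition = {
    "jobDefinitionName": "risk-model-job",
    "type": "container",
    "containerProperties": {
        "image": "public.ecr.aws/amazonlinux/amazonlinux:latest",
        "command": ["sh", "-c", "echo processing batch item"],
        # How many vCPUs and how much memory (in MiB) the container gets.
        "resourceRequirements": [
            {"type": "VCPU", "value": "2"},
            {"type": "MEMORY", "value": "2048"},
        ],
        # IAM role the container can assume to reach other AWS services.
        "jobRoleArn": "arn:aws:iam::123456789012:role/example-batch-job-role",
        # A data volume on the host, and where it is mounted in the container.
        "volumes": [{"name": "scratch", "host": {"sourcePath": "/data/scratch"}}],
        "mountPoints": [
            {"sourceVolume": "scratch", "containerPath": "/scratch", "readOnly": False}
        ],
    },
}


def volume_names_are_consistent(definition):
    """Return True if every mount point refers to a declared volume."""
    props = definition["containerProperties"]
    declared = {volume["name"] for volume in props.get("volumes", [])}
    return all(
        mount["sourceVolume"] in declared for mount in props.get("mountPoints", [])
    )


print(json.dumps(job_definition, indent=2))
print("volumes consistent:", volume_names_are_consistent(job_definition))
```

If boto3 is installed and credentials are configured, a dictionary like this could be passed as keyword arguments to the Batch client's register_job_definition call. That step is left out here so that the sketch runs offline.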

Job queues. Jobs that are scheduled are placed into a job queue until they run. It's also possible to have multiple queues with different priorities if needed. One queue could be used for On-Demand EC2 instances, and another queue could be used for Spot Instances. Both On-Demand and Spot Instances are supported by AWS Batch, allowing you to optimize cost, and AWS Batch can even bid on your behalf for those Spot Instances.
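As a rough illustration of the two-queue setup just described, the sketch below builds request bodies for one queue backed by On-Demand capacity and one backed by Spot capacity. The field names follow the shape of the CreateJobQueue API as I understand it, where a larger priority number is evaluated first. The queue names and compute environment names are hypothetical, and nothing here calls AWS.

```python
def make_job_queue(name, priority, compute_environment):
    """Build a job queue request body (shape assumed from CreateJobQueue)."""
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        # Assumed convention: a larger number means a higher priority.
        "priority": priority,
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": compute_environment},
        ],
    }


# Hypothetical queue and environment names.
job_queues = [
    make_job_queue("bulk-spot", priority=1, compute_environment="spot-env"),
    make_job_queue("urgent-on-demand", priority=10, compute_environment="on-demand-env"),
]

# The order in which a priority-aware scheduler would look at the queues.
evaluation_order = [
    queue["jobQueueName"]
    for queue in sorted(job_queues, key=lambda q: q["priority"], reverse=True)
]
print(evaluation_order)
```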

Job scheduling. The Job Scheduler takes control of when a job should be run and from which Compute Environment. Typically it will operate on a first-in-first-out basis, and it will look at the different job queues that you have configured, ensuring that jobs in higher-priority queues are run first, provided that all dependencies of a job have been met.
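The scheduling behaviour described here can be sketched as a small simulation. This is a deliberately simplified model and not how AWS Batch is actually implemented: it runs one job at a time, always prefers the highest-priority queue, takes jobs first-in-first-out within a queue, and skips any job whose dependencies have not finished yet. The Job and Queue classes and all job names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    depends_on: list = field(default_factory=list)


@dataclass
class Queue:
    name: str
    priority: int  # assumed convention: larger number is served first
    jobs: deque = field(default_factory=deque)


def schedule(queues):
    """Return job names in the order this simplified scheduler runs them."""
    finished = []
    done = set()
    while True:
        runnable = None
        # Look at queues from highest to lowest priority.
        for queue in sorted(queues, key=lambda q: q.priority, reverse=True):
            # Oldest job in this queue whose dependencies have all finished.
            runnable = next(
                (job for job in queue.jobs
                 if all(dep in done for dep in job.depends_on)),
                None,
            )
            if runnable is not None:
                queue.jobs.remove(runnable)
                break
        if runnable is None:
            # Nothing left that can run (queues are empty or blocked).
            return finished
        done.add(runnable.name)
        finished.append(runnable.name)


high = Queue("high", 10, deque([Job("report", ["extract"]), Job("alert")]))
low = Queue("low", 1, deque([Job("extract"), Job("cleanup")]))
print(schedule([low, high]))
```

In this run, "report" sits at the front of the high-priority queue but has to wait for "extract" from the low-priority queue, so the resulting order is alert, extract, report, cleanup.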

Compute Environments. These are the environments containing the compute resources to carry out the job. The environment can be defined as managed or unmanaged. A managed environment means that the service itself will handle provisioning, scaling and termination of your compute instances based on the configuration parameters that you enter, such as the instance type and the purchase method (On-Demand or Spot). This environment is then created as an Amazon ECS cluster. Unmanaged environments are provisioned, managed and maintained by you, which gives greater customization. However, this does require greater administration and maintenance, because you have to provide and look after the compute resources in the underlying Amazon ECS cluster yourself, work that a managed environment would have done on your behalf.
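For a managed environment, the configuration parameters mentioned above might look something like the following sketch. The field names follow the shape of the CreateComputeEnvironment API as I understand it, including bidPercentage for Spot capacity. The helper function, environment names, subnet ID, security group ID and instance role are hypothetical placeholders, the helper covers only the two purchase methods mentioned in this lecture, and no AWS call is made.

```python
def managed_environment(name, purchase, instance_types, max_vcpus, bid_percentage=None):
    """Build a managed compute environment request body.

    `purchase` is "EC2" for On-Demand or "SPOT" for Spot capacity.
    """
    if purchase not in ("EC2", "SPOT"):
        raise ValueError("purchase must be 'EC2' (On-Demand) or 'SPOT'")
    resources = {
        "type": purchase,
        "minvCpus": 0,           # allow scaling down to nothing when idle
        "maxvCpus": max_vcpus,   # upper bound on how far it may scale out
        "instanceTypes": instance_types,
        # Placeholder network and role settings, not real identifiers.
        "subnets": ["subnet-EXAMPLE"],
        "securityGroupIds": ["sg-EXAMPLE"],
        "instanceRole": "ecsInstanceRole",
    }
    if purchase == "SPOT" and bid_percentage is not None:
        # Maximum Spot price as a percentage of the On-Demand price.
        resources["bidPercentage"] = bid_percentage
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": resources,
    }


on_demand_env = managed_environment("on-demand-env", "EC2", ["m5.large"], max_vcpus=16)
spot_env = managed_environment("spot-env", "SPOT", ["m5.large", "c5.large"],
                               max_vcpus=64, bid_percentage=60)
print(spot_env["computeResources"]["type"], spot_env["computeResources"]["bidPercentage"])
```

An unmanaged environment would use "UNMANAGED" as its type and leave the compute resources for you to supply separately.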

If you have a requirement to run multiple jobs in parallel using batch computing, for example, to analyze financial risk models, perform media transcoding or run engineering simulations, then AWS Batch would be a perfect solution.

About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for sharing his knowledge of cloud services with the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.