Amazon Elastic Inference


Compute 2019

This is a short refresher on the four AWS Compute services announced at re:Invent 2018: Firecracker, AWS Outposts, AWS License Manager, and Amazon Elastic Inference.

Learning Objective

  • This course aims to provide an awareness of what each of the Compute services is used for and the benefits each can bring to your organization

Intended Audience

  • This course would be beneficial to anyone who is responsible for implementing, managing, and securing Compute services within AWS


Prerequisites

  • You should have a basic understanding of AWS core services to help you understand how each of these services fits into the AWS landscape

Related Training Content

Compute fundamentals for AWS

Understanding AWS Lambda to Run and Scale your code

Introduction to Amazon ECS

Introduction to AWS EKS



So you've just kicked off the training phase of your multi-layered deep neural network. The training phase is leveraging Amazon EC2 P3 instances to keep the training time to a minimum, but it's still going to take a while. With time in hand, you begin to contemplate what infrastructure you'll use to run your inferences. You're already familiar with the merits of using GPUs for the training phase. GPUs can parallelize massive numbers of simple math computations, which makes them perfect for training neural networks. GPUs are more expensive to run than CPUs, but because they can parallelize the number crunching, you don't need to run them for as long as you would for the equivalent training performed on CPUs. In fact, training on GPUs can be orders of magnitude quicker. So it may cost you more per hour to run a GPU, but you won't need to run it anywhere near as long as you would a CPU. Besides factoring in cost, training your models faster allows you to get them into production quicker to perform inferences. So in terms of the training phase, it makes complete sense to go with GPUs.
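The cost argument above comes down to simple arithmetic: total cost is the hourly rate multiplied by the hours of runtime. The sketch below works through it with made-up numbers. The rates, training time, and speedup are all hypothetical values chosen for illustration, not AWS prices or measured benchmarks.

```python
def training_cost(hourly_rate_usd: float, hours: float) -> float:
    """Total cost of a training run: price per hour times hours of runtime."""
    return hourly_rate_usd * hours


# Hypothetical figures for illustration only (not real AWS pricing or benchmarks).
CPU_RATE_USD = 1.00    # assumed hourly price of a CPU instance
GPU_RATE_USD = 12.00   # assumed hourly price of a GPU instance
CPU_HOURS = 200.0      # assumed time to train the model on the CPU instance
GPU_SPEEDUP = 50.0     # assumed speedup gained from GPU parallelism

cpu_cost = training_cost(CPU_RATE_USD, CPU_HOURS)
gpu_hours = CPU_HOURS / GPU_SPEEDUP
gpu_cost = training_cost(GPU_RATE_USD, gpu_hours)

print(f"CPU: {CPU_HOURS:.0f} hours, ${cpu_cost:.2f}")
print(f"GPU: {gpu_hours:.0f} hours, ${gpu_cost:.2f}")
```

Under these assumed numbers the GPU run is both faster and cheaper. More generally, the GPU works out cheaper whenever its speedup exceeds the ratio of the two hourly rates.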

So your contemplation now focuses on whether to use GPU or CPU infrastructure to perform inferencing once the training completes and your model is ready. We know that GPUs cost more per hour to run, and that performing inferences through a trained neural network is far less taxing in terms of the computation required and the volume of data that needs to be ingested and processed. Therefore, CPUs seem to be the way to go. However, you know from past experience that over time your CPU-hosted inferencing tends to bottleneck due to overwhelming demand, and this makes you reconsider running the inferencing on GPUs. But you now need to budget for the extra cost as a project consideration. This dilemma of whether to use GPUs or CPUs for inferencing, with respect to both cost and performance, is all too familiar for many organizations. The choice of using a GPU or a CPU was a fairly mutually exclusive, upfront decision made when using EC2. However, this is no longer the case.

Amazon Elastic Inference is a service from AWS which allows you to complement your EC2 CPU instances with GPU acceleration, which is perfect for hosting your inferencing models. You can now select an appropriately sized CPU-based EC2 instance and boost its number crunching ability with GPU processing. As with many other AWS services, you only pay for the accelerator hours you actually use. What this means is that you get the GPU processing power at up to 75% less than the cost of running an equivalent GPU-sized EC2 instance. Now for starters, Amazon Elastic Inference is launching with three types of accelerator, each rated in teraflops of mixed-precision performance. Amazon Elastic Inference has been seamlessly integrated into both the Amazon EC2 console and the AWS CLI. In the EC2 console, attaching GPU acceleration is as simple as enabling the Add an Elastic Inference accelerator option. The equivalent AWS CLI command uses the existing API, which has been extended with a new optional elastic-inference-accelerator parameter.
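As a rough illustration of the extended launch API, here is a minimal Python sketch using boto3, where the equivalent of the CLI's elastic-inference-accelerator option is the `ElasticInferenceAccelerators` parameter of `run_instances`. The AMI ID, subnet, security group, and instance profile name are hypothetical placeholders, and the accelerator type `eia1.medium` is an assumption about the available sizes rather than something stated in this course.

```python
def build_run_instances_request(ami_id, instance_type, accelerator_type,
                                subnet_id, security_group_id, instance_profile_name):
    """Build keyword arguments for EC2 run_instances with an accelerator attached."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "SubnetId": subnet_id,
        "SecurityGroupIds": [security_group_id],
        "IamInstanceProfile": {"Name": instance_profile_name},
        # The accelerator is requested alongside the normal launch settings.
        "ElasticInferenceAccelerators": [{"Type": accelerator_type}],
    }


def launch(request, region_name):
    """Send the request to EC2. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request can be built without boto3 installed

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.run_instances(**request)
    return response["Instances"][0]["InstanceId"]


# All identifiers below are hypothetical placeholders, not real resources.
example_request = build_run_instances_request(
    ami_id="ami-0example",
    instance_type="c5.large",
    accelerator_type="eia1.medium",  # assumed accelerator size
    subnet_id="subnet-0example",
    security_group_id="sg-0example",
    instance_profile_name="example-ei-profile",
)
print(example_request["ElasticInferenceAccelerators"])
```

Calling `launch(example_request, region_name=...)` would submit the request. It is kept separate here so the request can be inspected without touching an AWS account.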

Several prerequisites need to be in place to leverage Amazon Elastic Inference. An AWS PrivateLink endpoint configured for Elastic Inference must be present. You need an IAM role with the necessary policies to connect to the Elastic Inference accelerator. You must build your models using TensorFlow, Apache MXNet, and/or ONNX. And you must use the latest AWS Deep Learning AMIs, which have been updated with Amazon Elastic Inference support baked directly into the TensorFlow and Apache MXNet deep learning frameworks. As you can see, with a few extra configuration options in place you can have the best of both worlds: CPU-hosted inferencing with GPU acceleration. You no longer need to spend the time contemplating CPUs over GPUs. Take both.
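The first two prerequisites can be sketched in Python as plain request data. This is a minimal sketch under assumptions: the IAM action name `elastic-inference:Connect` and the endpoint service name pattern `com.amazonaws.<region>.elastic-inference.runtime` are recalled from the AWS documentation and should be verified there before use, and every resource ID shown is a hypothetical placeholder.

```python
import json


def build_connect_policy():
    """IAM policy document allowing a role to connect to an accelerator.

    The action name is an assumption to verify against the AWS documentation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["elastic-inference:Connect"],
                "Resource": "*",
            }
        ],
    }


def build_endpoint_request(region, vpc_id, subnet_ids, security_group_ids):
    """Keyword arguments for EC2 create_vpc_endpoint (an interface endpoint).

    The service name pattern is an assumption to verify for your region.
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.elastic-inference.runtime",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": True,
    }


policy_json = json.dumps(build_connect_policy(), indent=2)
endpoint_request = build_endpoint_request(
    region="us-west-2",
    vpc_id="vpc-0example",              # hypothetical placeholder
    subnet_ids=["subnet-0example"],      # hypothetical placeholder
    security_group_ids=["sg-0example"],  # hypothetical placeholder
)
print(policy_json)
print(endpoint_request["ServiceName"])
```

With boto3, the policy JSON would typically be passed as `PolicyDocument` to the IAM client's `create_policy` call, and the endpoint request to the EC2 client's `create_vpc_endpoint` call.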

That now brings me to the end of this course covering Firecracker, AWS Outposts, AWS License Manager, and Amazon Elastic Inference. You should now have a basic understanding of these four compute services, allowing you to determine whether any of them is a service that you could use and benefit from, now or in the future. If you have any feedback on this course, positive or negative, please contact us by email. Your feedback is greatly appreciated.

Thank you for your time, and good luck with your continued learning of cloud computing.

About the Author

Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data center and network infrastructure design, to cloud architecture and implementation.

To date, Stuart has created 150+ courses relating to Cloud reaching over 180,000 students, mostly within the AWS category and with a heavy focus on security and compliance.

Stuart is a member of the AWS Community Builders Program for his contributions towards AWS.

He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.

In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for sharing his knowledge of cloud services with the community.

Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.