New at AWS re:Invent: Monday Night Live with Peter DeSantis

This year's re:Invent saw the return of the Monday Night Live session for the first time in three years. This keynote was dedicated to providing visibility into the underlying components of AWS and how the team at AWS is constantly innovating to enhance its services. The biggest theme of the keynote was improving performance without sacrificing security or increasing cost. 

The session this year was led by Peter DeSantis, the Senior Vice President of AWS Utility Computing. In this post, I’ll go through each of the new announcements and what they can offer you as a customer. 

Nitro v5

The first new announcement was focused on a new generation of the Nitro hypervisor. Over the years, Nitro has continuously evolved and provided customers with considerable improvements in bandwidth and packet rate. This year, DeSantis unveiled the fifth generation of Nitro.

This chip provides another huge leap in performance, offering twice the transistors, twice the computational power, and 50% faster memory performance. By doubling the transistors and computational power, this chip supports 60% higher packets per second (PPS) than the previous generation, 30% lower latency, and 40% better performance per watt. 

A new Nitro chip generally brings a new instance type – and this year was no exception. 

The C7gn Instance Type


The C7gn instance type is a new instance type powered by Nitro v5 and Graviton processors, created specifically for network-intensive workloads. This new instance is said to offer up to 200 Gbps of network bandwidth, 2x higher packets per second, and up to 50% better packet processing performance compared to its predecessor – the C6gn instance type. 

The workloads that benefit most from this new network-optimized instance type are high performance computing (HPC) workloads such as weather forecasting, fluid dynamics, and more. However, other network-intensive workloads such as data analytics may benefit from this instance type as well. 

Graviton3E Processors


Annapurna Labs has certainly been busy, as we have yet another launch of their custom hardware. AWS customers can now take advantage of the latest generation of processors in the Graviton family. The Graviton3E processors build on the success of the Graviton3 processors, but increase performance for workloads that demand high levels of floating-point and vector math. 

The Graviton3E processors offer up to 60% better performance per watt versus comparable x86 instances and up to 25% better performance than the Graviton2 processors. When it comes to vector instructions, the Graviton3E holds a considerable edge over the Graviton3, providing up to a 35% improvement. This performance boost makes the processor a natural fit for HPC workloads. 

So, what do we get when we have a new Nitro chip and a purpose-built HPC processor? Another new instance type!

The HPC7g Instance Type 

This new instance type is coming soon and is powered by both the new Nitro v5 chip and the purpose-built Graviton3E processor. It promises the best price-performance, combined with improved energy efficiency, for HPC applications and distributed computing workloads. 

A number of AWS customers are excited about the performance enhancements this will bring to workloads in the fields of genomics, manufacturing, weather predictions, computational fluid dynamics, and more. 

EBS io2 Volume Improvements

A common phrase from Peter DeSantis tonight was “tail latency” – otherwise known as high-percentile latency, the latency that only a small fraction of requests experience. With EBS, this latency often shows up on writes, due to the replication required for durability. For example, on previous io2 volumes, 1 in every 100,000 requests took 35 ms to complete. While only a small share of requests face this latency, unlucky customers had to factor it into their overall latency. 
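To make "tail latency" concrete, here is a small illustrative sketch (not AWS code) that computes a high-percentile latency from synthetic request timings, mirroring the 1-in-100,000 example above. The workload and numbers are made up for illustration:

```python
import math
import random

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    k = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(k - 1, 0)]

random.seed(1)
# Synthetic request latencies echoing the io2 example from the keynote:
# most requests finish in 1-2 ms, but 1 in every 100,000 takes 35 ms.
latencies = [1.0 + random.random() for _ in range(1_000_000)]
for i in range(0, len(latencies), 100_000):
    latencies[i] = 35.0

print(percentile(latencies, 50))       # median: ~1.5 ms
print(percentile(latencies, 99.9999))  # the tail: 35.0 ms
```

The median looks healthy, but the p99.9999 figure exposes the slow outliers – exactly the latency a customer issuing millions of I/O requests will eventually hit.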

This led AWS to work on improving the performance of io2 volumes, and tonight they announced that they did just that. The improvement comes in part from AWS’s own datagram protocol, the Scalable Reliable Datagram (SRD) protocol. 

The Scalable Reliable Datagram (SRD) protocol

This protocol supports multi-pathing, can detect dropped and delayed packets faster than TCP, retransmits in microseconds, and runs on the Nitro controller instead of the guest operating system, so it does not impact your application. 


By using SRD with EBS, AWS achieved far greater performance on io2 volumes: a 90% reduction in tail latency and a fourfold increase in throughput. 

Early next year, all EBS io2 volumes will run on SRD, providing you lower latency and higher throughput at no additional cost. All you have to do is create a new io2 volume. 

Elastic Network Adapter (ENA) Express

Another new feature built on SRD is Elastic Network Adapter (ENA) Express. This launch brings the benefits of SRD to any network interface, enabling higher bandwidth and lower tail latency. With ENA Express, you don’t need to install any additional software or change your instance; you simply toggle it on for your ENA.

For example, let’s say you want to use ENA Express for communication between two instances in an Availability Zone. Both instances will need ENA Express turned on. Once ENA Express detects that both instances have the feature enabled, it establishes an SRD connection, allowing the traffic to take advantage of SRD’s performance benefits. 

ENA Express additionally works with both TCP and UDP, with TCP over ENA Express providing up to 25 Gbps of single-flow throughput. 

Soon, this feature will be supported on other services that often rely on ENAs, such as ElastiCache. ENA Express with ElastiCache will offer 44% lower tail latency. 

The Trn1n Instance Type

The existing Trn1 instance type provides high-performance deep learning training and supports 16 Trainium processors, 512 GB of memory per instance, and 800 Gbps of network bandwidth.

This year, we have yet another exciting instance type launch that builds on the Trn1 while taking advantage of performance improvements from SRD and the Elastic Fabric Adapter (EFA). The new instance type is called the Trn1n, a network-optimized version of the Trn1. It provides double the network performance, with 1,600 Gbps of EFA network bandwidth. 

This performance improvement will be most beneficial to machine learning workloads that require ultra-fast distributed training of large-scale ML models. 

Machine Learning Improvements

Machine learning took up a large portion of the session, with few official feature launches but plenty of behind-the-scenes improvements to increase scalability, improve communication, and remove potential bottlenecks. 

The two big performance improvements that Peter DeSantis spoke of were: 

  1. Stochastic rounding, which enables practitioners to use 16-bit data types for speed while getting accuracy comparable to 32-bit data types for parameters. This has come up at past re:Invent conferences, as stochastic rounding is already used in existing instance types such as the Trn1.
  2. The Ring of Rings algorithm, which enables compute processors to exchange information more efficiently after each iteration of a model, leading to 75% faster synchronization between processors. This is available in PyTorch. 
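To illustrate the first idea, here is a toy sketch of stochastic rounding in Python – a software model for intuition only, not the Trainium hardware implementation. A value is rounded up or down at random, weighted by its fractional part, so the result is unbiased on average; plain round-to-nearest would systematically discard small updates:

```python
import random

def stochastic_round(x, step=1.0):
    """Round x to a multiple of `step`, choosing the upper neighbor with
    probability equal to x's fractional distance from the lower neighbor.
    The expected value of the result equals x (unbiased), unlike
    round-to-nearest, which loses small updates entirely."""
    lower = (x // step) * step
    frac = (x - lower) / step
    return lower + step if random.random() < frac else lower

random.seed(0)
# Accumulate 10,000 tiny updates of 0.3 on a grid with step 1.0.
# Round-to-nearest would add round(0.3) = 0 every time and never move;
# stochastic rounding adds 1 about 30% of the time, tracking the true sum.
acc = 0.0
for _ in range(10_000):
    acc += stochastic_round(0.3)

print(acc)  # close to the true sum of 3000
```

This is why a low-precision accumulator with stochastic rounding can behave, on average, like a higher-precision one for parameter updates.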

AWS Lambda SnapStart

Last, but certainly not least is perhaps the most exciting announcement of the night: AWS Lambda SnapStart. 

Lambda SnapStart is the latest improvement to the pesky Lambda cold start issue. This new feature reduces cold start latency by up to 90% at no additional cost to you. So how does it work? 

It does this by changing the initialization process. When you publish a function version with SnapStart enabled, the service initializes your function and caches a snapshot of the initialized state. When your function is invoked, Lambda retrieves the required portions of the cached snapshot and uses them to run your function. 

Currently, using SnapStart may require some application changes. For example, one of the biggest factors to consider is any unique content you generate during initialization, such as random numbers. To maintain uniqueness with SnapStart enabled, you’ll now have to generate this content after initialization. 
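As a sketch of that guidance (illustrative only – the handler name and return shape are placeholders, not anything AWS mandates), unique values should be generated inside the handler rather than at module scope, so every invocation resumed from the same snapshot still gets a fresh value:

```python
import uuid

# BAD with SnapStart: module-scope code runs once, when the snapshot is
# created, so every environment resumed from that snapshot would share
# this same value. Shown commented out as the anti-pattern:
# SNAPSHOT_SCOPED_ID = uuid.uuid4().hex

def handler(event, context):
    # GOOD: generated per invocation, so values stay unique even when many
    # execution environments are restored from one cached snapshot.
    request_token = uuid.uuid4().hex
    return {"request_token": request_token}
```

Calling the handler twice yields two distinct tokens, which is exactly the property that snapshot-time initialization would break.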

This feature is available now for Java functions and is recommended for general-purpose Lambda functions and functions that require low latency. 
