The AWS Network - How Does It Actually Work!?

In this tablet talk we are going to discuss how the underlying AWS network actually works. This will be an advanced conversation that will describe how traffic moves through the VPC, from the internal IP spaces to the external world, as well as how instance to instance communication takes place. There are a few hidden AWS services that make all the magic happen behind the scenes, and we are going to talk about them in this tablet talk.

Learning objectives

  • Learn how an IP packet is able to make it from one virtual host on its own physical server, to another virtual host on a different physical server
  • Learn how the Mapping Service is able to facilitate communication within the VPC, as well as how Blackfoot is able to route traffic outside of the VPC

Intended Audience

This course is intended for anyone who wants to get a more advanced understanding of how the AWS network works.


Prerequisites

To get the most out of this course, you should have general knowledge about the following AWS services: VPC, EC2, Internet gateway, Direct Connect, Route 53, S3, and VPC Peering. You should also understand the seven layers of the OSI model, at least at a high level.


Hello and welcome to this tablet talk. My name is Will Meadows. This is me as per usual. And today I want to talk about how does the AWS network actually work!? You know you're serious when we have both an exclamation mark and a question mark. So how does the AWS network work? And this is gonna be a little bit of an advanced topic. And so if you're not super familiar with networking as it is already in AWS, go check out this other tablet talk that I've got here, and it will be able to help you get the understanding of the basics.

So in this tablet talk, we're going to discuss what's involved with getting a piece of traffic, this is from the internet, into AWS, into your VPC, into an EC2 instance. There's a lot of little things that go into making all of these little hops happen, especially when you think about where do all these things actually live? Because if we think about the whole AWS network, right, we know that somewhere there is a data center. In some parts of the data center, there are a bunch of server racks, right? In each of these server racks, there's, you know, again, there's a bunch of servers. And then the EC2 instances themselves.

Let's look at this server down here. Like we have multiple colors. This server down here can then again, be broken up into a bunch of different EC2 instances. And this is all virtualization. This is all that good stuff. It's fairly well known. But some part of it that might not be really well-known is how does a piece of external internet traffic that's never before really seen the front side of your VPC make it from the internet, know which VPC to go into, aka how does it know what data center it goes into? How does it know which box in a server rack it goes into? How does it know which EC2 instance that piece of data is supposed to live on or supposed to travel to? So that's kind of what I'm gonna discuss today. Not any of the infrastructure, like global infrastructure, none of that stuff, but some of the underlying nitty gritties and the question marks that you may have had, but never realized.

So the best place I think to get started with this is to take a look at one of these. We gotta look at a physical host. So our physical hosts in AWS are these, they're the servers in the server rack, right? So here's our rack. And it's got a bunch of servers on it.

Now, if we take one of these out, we have a box, and this is our physical host. And this is all hardware. It's just like your computer at home, except for it's nothing like your computer at home. It's much flatter, it's sleeker. It's made to be used in a business environment, but it's basically a computer. Now, this physical host is virtualizing EC2 instances and making those available for people to use for their workloads. And what that means is we are abstracting away the physical hardware layer, and we're creating sort of a different virtual layer where we're sort of filling up as many EC2 instances that will fit inside this physical host's memory and CPU power. And all of this stuff is sort of controlled by something called a hypervisor. It sits like right here. And it helps these virtual appliances interact with the physical hardware underneath. And it helps manage all of the goodies. This is the hypervisor.

All right, and so that's happening on every single physical host within the AWS network. And some amount of this is going on. Well, at least the ones concerned with EC2. So let's take a copy of this and let's put that over here. And in fact, let's make another copy of it. 'Cause our question now is, how do we get this EC2 instance, who is in its own VPC, talking to this EC2 instance, who's also in their own VPC. I think this is a good place to start as it will help us build up the formula for how everything else works in connectivity land for AWS.

So inside the AWS data center, obviously there is going to be some level of networking between the physical hosts. So we've got a router here, and it knows where this host is in its rack, and it knows where this host is in this rack. And you can imagine there's hundreds of thousands of these everywhere throughout the world. And this is the physical layer of the infrastructure. So in order to send a packet from the red instance over to the yellow instance, we'll at least have to know sort of the physical encapsulation of where it is in the AWS network.

The next thing our sort of jumbo packet is gonna need is where is it VPC-wise? And this would be like, is it in the red VPC? Is it in the yellow VPC? Which VPC is it on that physical host? And the next bit of information we'll need is, what is the IP address of that instance within the VPC? So I'll call it IP internal. And then finally, we have whatever it is that your data was. Maybe it's a picture. Look, there's a tree. Maybe the red instance was trying to send a picture of a, I don't know, was that a gravestone next to a tree, over to the yellow instance for reasons.
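To make that jumbo packet a bit more concrete, here's a minimal sketch in Python. It's purely illustrative: the class name, the field names, and the addresses are all made up for this example, and the real AWS encapsulation format isn't described in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncapsulatedPacket:
    """Hypothetical layout of the 'jumbo packet' described above."""

    physical_host_ip: str  # where the destination host sits on the physical network
    vpc_id: str            # which VPC on that physical host the packet belongs to
    internal_ip: str       # the instance's private IP address inside the VPC
    payload: bytes         # the actual data, e.g. a picture of a tree


# Example values only; the addresses come from documentation ranges.
packet = EncapsulatedPacket(
    physical_host_ip="192.0.2.20",
    vpc_id="vpc-yellow",
    internal_ip="10.1.0.5",
    payload=b"picture-of-a-tree",
)
```

Each layer answers one question from the talk: which physical host, which VPC on that host, which instance in that VPC, and finally what the data is.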

So this is when we get to introduce a new service that you might not be aware of, which is the mapping service. And the mapping service is responsible for learning and understanding where everybody lives. 'Cause when this instance is trying to create this ginormous packet, it's not aware of the physical IP address of its destination. It knows who it wants to talk to, and it knows what VPC it's in, but other than that, it doesn't know where it is in the world. It's the Carmen Sandiego of networking.

So what will happen, in order to form this packet, is that the red instance in the red VPC will be like, "Hey, mapping service, where in the heck can I find the yellow instance in the yellow VPC?" And mapping service would be like, "I've got you. Here's that information." There's a lot of ARPing. It's like fake ARPing. The mapping service is like a virtual layer 2 network switch. So once we get that information back from the mapping service, that gets wrapped up with the physical IP, the VPC info, and the internal information for where it's gonna go. And of course our actual packet information we want to send across. So that gets shoved over the wire and hits over into the physical host.
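Here's a rough sketch of that lookup-then-wrap step, assuming the mapping service behaves like a simple lookup table. The table contents and the function names (ask_mapping_service, encapsulate) are hypothetical and only there to show the flow, not how AWS actually builds it.

```python
# Hypothetical mapping table: (VPC id, internal IP) -> physical host IP.
MAPPING_TABLE = {
    ("vpc-red", "10.0.0.4"): "192.0.2.10",
    ("vpc-yellow", "10.1.0.5"): "192.0.2.20",
}


def ask_mapping_service(vpc_id, internal_ip):
    """Return the physical host IP for an instance, or None if it is unknown."""
    return MAPPING_TABLE.get((vpc_id, internal_ip))


def encapsulate(dst_vpc, dst_internal_ip, payload):
    """Wrap the payload with the physical IP, the VPC info, and the internal IP."""
    physical_host_ip = ask_mapping_service(dst_vpc, dst_internal_ip)
    if physical_host_ip is None:
        raise LookupError(f"no mapping for {dst_internal_ip} in {dst_vpc}")
    return {
        "physical_host_ip": physical_host_ip,
        "vpc_id": dst_vpc,
        "internal_ip": dst_internal_ip,
        "payload": payload,
    }


wrapped = encapsulate("vpc-yellow", "10.1.0.5", b"picture-of-a-tree")
print(wrapped["physical_host_ip"])  # prints 192.0.2.20
```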

Now, at this point, we are going to have a little check here, 'cause we can't just believe everything that comes through. Otherwise people could start spoofing. And it asks the mapping service, "Hey, I just got some information. It's from some red instance from the red VPC. Do you have any information about that?" And then the mapping service says, "Yeah, actually. The red VPC has checked in with us. We know that they exist on a certain physical host, which you have specified. So everything's good." And that packet, boop, gets routed where it's supposed to go.

If that check never succeeds, if the mapping service has no idea what's going on, that packet just gets dumped. It gets thrown into the ether, and it won't be inspected by the instance. There's a trash can. It's filled with dead packets. Now it's more trash can like.
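The receiving-side check can be sketched like this. It's a toy model under the assumption that the mapping service keeps a record of which physical host each instance has checked in from; the CHECKED_IN table, the packet field names, and the functions are invented for illustration.

```python
# Hypothetical record of where each instance has checked in:
# (source VPC id, source internal IP) -> physical host IP.
CHECKED_IN = {("vpc-red", "10.0.0.4"): "192.0.2.10"}


def source_is_valid(packet):
    """True only if the claimed source really lives on the sending host."""
    known_host = CHECKED_IN.get((packet["src_vpc"], packet["src_internal_ip"]))
    return known_host is not None and known_host == packet["src_host_ip"]


def receive(packet, delivered, trash_can):
    """Deliver a validated packet; otherwise drop it without inspecting it."""
    if source_is_valid(packet):
        delivered.append(packet)
    else:
        trash_can.append(packet)


delivered, trash_can = [], []
good = {"src_host_ip": "192.0.2.10", "src_vpc": "vpc-red", "src_internal_ip": "10.0.0.4"}
spoofed = {"src_host_ip": "192.0.2.99", "src_vpc": "vpc-red", "src_internal_ip": "10.0.0.4"}
receive(good, delivered, trash_can)
receive(spoofed, delivered, trash_can)
```

The spoofed packet claims to come from the red instance but arrives from a host the red instance never checked in on, so it ends up in the trash can.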

All right, so this was a good instance to examine where both of these VPCs were friends. They knew each other, they had VPC peering going on. And we'll demonstrate that with a little smiley face here and a little smiley face here. They were connected in some fashion.

Now let's imagine we had a different VPC and it's not friendly. Okay, we're gonna put it, we're gonna put it over here. And this instance in this VPC is trying to do something naughty. Let's say that it also wants to send some packet over to the yellow instance in the yellow VPC, but they are not friends. They are, they're enemies. Look at this. These people don't like each other. But it goes, "Hey, mapping service. Can you give me the information for the yellow instance in the yellow VPC?" And the mapping service will take a quick look at its records and be like, "Hm, let's see. I know that red and yellow are friends because they've got a VPC peering connection, but there is no connection between you and that other VPC. So no." And it just dumps that.
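As a minimal sketch of that "are you two friends?" check, assume the mapping service can see a set of peering connections. The PEERINGS set and the may_talk function are hypothetical names for this example.

```python
# Hypothetical set of VPC peering connections; each one is an unordered pair.
PEERINGS = {frozenset({"vpc-red", "vpc-yellow"})}


def may_talk(src_vpc, dst_vpc):
    """Allow traffic within one VPC or between two peered VPCs."""
    return src_vpc == dst_vpc or frozenset({src_vpc, dst_vpc}) in PEERINGS


print(may_talk("vpc-red", "vpc-yellow"))   # prints True
print(may_talk("vpc-evil", "vpc-yellow"))  # prints False
```

Using a frozenset for each pair means the check doesn't care which side asks first.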

But let's say for some reason that you're able to break apart that packet you sent to the mapping service, because it's possible, you can, and you're like, "Ha-ha. I'm gonna pretend to be in the same VPC as the other guy, and I'm gonna pretend to be their friend." Like squiggly evil eyes. And it says to the mapping service, "Hey, now that I'm friendly, in the same VPC, can I get that information for the yellow guy?" And the mapping service goes, "Wait a second. Everybody who's on this physical host that registered with me earlier, none of you were yellow before. Why are you yellow now?"

So you get two types of checks from the mapping service on both the receiving side and the sending side, which is pretty cool. And any time that you get one of these nos, the mapping service is gonna be like, "Hey, AWS, something funky is going on. Send a human to come check this out for me."
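The sending-side check could look something like this sketch, assuming the mapping service remembers which VPCs registered on each physical host. HOST_REGISTRATIONS, handle_lookup, and the alerts list are made up for illustration; how AWS actually raises the alarm isn't covered here.

```python
# Hypothetical registrations: physical host IP -> VPCs that checked in on it.
HOST_REGISTRATIONS = {
    "192.0.2.10": {"vpc-red"},
    "192.0.2.30": {"vpc-evil"},
}


def handle_lookup(host_ip, claimed_vpc, alerts):
    """Answer a lookup only if the claimed VPC is registered on that host."""
    if claimed_vpc not in HOST_REGISTRATIONS.get(host_ip, set()):
        alerts.append((host_ip, claimed_vpc))  # flag it for a human to check out
        return False
    return True


alerts = []
ok = handle_lookup("192.0.2.10", "vpc-red", alerts)
suspicious = handle_lookup("192.0.2.30", "vpc-yellow", alerts)
```

The second request comes from a host where nobody was ever yellow, so it gets refused and recorded.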

All right, we're getting a bit messy here. So let's clear all this up. So one thing you might've noticed with this situation is that if everybody has to talk to the mapping service, all of the physical hosts that want to talk to each other, like these guys over here, that we had, there's gonna be some built-in extra latency in here. Every time the host has to ask the mapping service where it should go and the mapping service returns that information. And the same thing on this side, when the physical host is checking in to make sure that the information it is receiving from other physical hosts is correct.

So the way we get around that added latency is that the mapping service actually ends up caching a lot of this information directly on the host, or near enough at least. So each of these physical hosts either has a hardware router, an extra one, sort of sitting right underneath it, or it can be a software router, and some of the older hosts are set up like that. This router here is more related to Nitro, which is a newer service that came out fairly recently. And Nitro helps to offload some of the work from the hypervisor into a hardware appliance. And so what will happen with the mapping service is this cache gets updated very frequently, and everything that is the mapping service is positioned right here, as close to the physical host as possible. And it's constantly being updated. In fact, there are zero cache misses. The physical host never really even talks to the mapping service itself. It will never miss the cache. As far as it's concerned, this cache is always accurate. And the mapping service is constantly flushing and putting new data in there as often as it can. And that's what helps keep the network as fast and as quick as it possibly can be.
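One way to picture that push-style cache is the sketch below. It assumes the mapping service overwrites the cache wholesale and the host only ever reads from it; the HostCache class and its method names are hypothetical.

```python
class HostCache:
    """Toy model of the mapping cache that sits next to a physical host."""

    def __init__(self):
        self._entries = {}

    def push(self, entries):
        """Called by the mapping service: replace the cache contents wholesale."""
        self._entries = dict(entries)

    def resolve(self, vpc_id, internal_ip):
        """Called by the host: a purely local lookup with no network round trip."""
        return self._entries.get((vpc_id, internal_ip))


cache = HostCache()
cache.push({("vpc-yellow", "10.1.0.5"): "192.0.2.20"})
first = cache.resolve("vpc-yellow", "10.1.0.5")

# The instance moves to another host, and the mapping service pushes an update.
cache.push({("vpc-yellow", "10.1.0.5"): "192.0.2.21"})
second = cache.resolve("vpc-yellow", "10.1.0.5")
```

The host never calls out to the mapping service in this model. It treats whatever is in the cache as the truth, and keeping it fresh is entirely the mapping service's job.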

All right, so we now have a pretty good idea of how everything works within VPC. Everything is just sort of bouncing around off the mapping service and kind of figures out where everything needs to go. But now the question is, how does that traffic get out of the VPC? How does it move to let's say the internet? Or how does it even get back on prem?

Let's say, you've got your buildings over here. And this is connected either via Direct Connect or maybe through a VPN. And another question you might have is how do we get this over to something like S3 with like an S3 endpoint? How does that all work? Well for that, we're gonna have to talk about something called Blackfoot. Blackfoot is an edge device that is kind of like another big router and helps deal with removing this encapsulation.

So if we take a look at a VPN connection, what will happen is a packet will go from the VPC. It'll hit the Blackfoot edge device. The Blackfoot device will unencumber that packet from the internal information, the VPC information, and will wrap it in the IPsec tunnel information as required, and will send it out over the internet wherever it needs to go to get back to your corporate site or wherever you might be. And a very similar process happens when we talk about Direct Connect.
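Here's a toy sketch of that unwrap-and-rewrap step for the VPN case. The dictionary layout, the vpn_egress function, and the tunnel_id value are all assumptions for illustration, and no real IPsec processing (encryption, headers) is modeled.

```python
def vpn_egress(vpc_packet, tunnel_id):
    """Drop the VPC wrapper and wrap the inner packet for the tunnel instead."""
    inner = vpc_packet["inner"]  # the original IP packet from the instance
    return {"encapsulation": "ipsec", "tunnel_id": tunnel_id, "inner": inner}


vpc_packet = {
    "physical_host_ip": "192.0.2.10",
    "vpc_id": "vpc-red",
    "inner": {"src": "10.0.0.4", "dst": "172.16.0.9", "payload": b"hello"},
}
tunneled = vpn_egress(vpc_packet, tunnel_id="tunnel-example")
```

The inner packet comes out unchanged; only the wrapper around it is swapped.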

Again, your packets will get sent to Blackfoot. Blackfoot will unwrap it from its normal VPC encapsulation, and will instead wrap it up in a VLAN type encapsulation, send that out towards the Direct Connect router. And then one of the neat parts is when you're actually trying to send traffic out to the internet, like we don't want any encapsulation. We want to remove all of the VPC garbage 'cause it needs to go to an actual host somewhere out on the internet that doesn't know what a VPC is. We have to do a little bit more work here. We need to send it out towards Blackfoot and Blackfoot's gonna get this packet again that's got all this extra encapsulation about VPC and internal information. Specifically, like we have our destination out to the internet and that's gonna be some sort of real, you know, host number, but it's our source that needs to change 'cause internally we're gonna have something like 10.0.whatever. And we need that to be a real address that our destination from the internet can send back traffic to.

So this is where Blackfoot will remove this and put our elastic IP address into the packet. And then that gets shuffled off to the internet. And then when things come back in, let's say we're trying to return information, as we just talked about, towards the source, it comes back in, it hits Blackfoot, Blackfoot removes the elastic IP, and it puts your internal 10.0.whatever address back into this and re-encapsulates it back in its VPC encapsulation, and puts it right back where it's supposed to go, doing all the kind of reverse mapping process with the mapping service as well.
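That address swap is basically one-to-one network address translation, and it can be sketched as below. The lookup tables, function names, and addresses are made up for this example. Note that on the way back in, the elastic IP sits in the destination field of the reply, so that's the field that gets rewritten.

```python
# Hypothetical one-to-one association: private IP -> elastic IP.
ELASTIC_IP_FOR = {"10.0.0.4": "203.0.113.7"}
PRIVATE_IP_FOR = {public: private for private, public in ELASTIC_IP_FOR.items()}


def to_internet(packet):
    """Outbound: replace the private source address with the elastic IP."""
    return {**packet, "src": ELASTIC_IP_FOR[packet["src"]]}


def from_internet(packet):
    """Inbound: replace the elastic IP in the destination with the private IP."""
    return {**packet, "dst": PRIVATE_IP_FOR[packet["dst"]]}


request = to_internet({"src": "10.0.0.4", "dst": "198.51.100.8", "payload": b"GET /"})
reply = from_internet({"src": "198.51.100.8", "dst": request["src"], "payload": b"200 OK"})
```

After the inbound rewrite, the packet would be wrapped back up in its VPC encapsulation, which is left out here to keep the sketch short.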

And then the final thing to look at is, how do we use like S3 endpoints that would be attached to your VPC. And it kind of does the same thing where the packet comes with all of its VPC encapsulation and gets sent over to Blackfoot, and then Blackfoot encapsulates that with the VPC endpoint ID, and doesn't change any of the underlying source. It actually stays the same. And so that gets shoved over to S3, stays within the internal network. That information gets unwrapped and used, and it can be sent right on back through for you to do whatever it is that you wish. Maybe you were trying to download an image of a puppy. And that's pretty much it. That's at least the basics of the advanced part of how the networking works for VPC, the mapping service, and Blackfoot.
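For contrast with the internet case, here's a small sketch of the S3 endpoint path, where the wrapper changes but the source address doesn't. The to_s3_endpoint function, the packet layout, and the endpoint ID are hypothetical and only illustrate the idea.

```python
def to_s3_endpoint(vpc_packet, endpoint_id):
    """Re-wrap the inner packet with the endpoint ID; the source is left as-is."""
    inner = dict(vpc_packet["inner"])  # copy so the original packet is untouched
    return {"encapsulation": "vpc-endpoint", "endpoint_id": endpoint_id, "inner": inner}


vpc_packet = {
    "vpc_id": "vpc-red",
    "inner": {"src": "10.0.0.4", "dst": "s3", "payload": b"GET puppy.jpg"},
}
wrapped_for_s3 = to_s3_endpoint(vpc_packet, endpoint_id="vpce-example")
```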

So I hope you enjoyed this little bit of a deep dive into these kind of cute features that you may not have known anything about. Like how does VPC actually work? It won't help you too much on any tests or anything, but I think it's good. It helps build the underlying knowledge of the infrastructure. All right, well, thank you so much for your time. If you have any questions, please send me an email at Cheers.

About the Author

William Meadows is a passionately curious human currently living in the Bay Area in California. His career has included working with lasers, teaching teenagers how to code, and creating classes about cloud technology that are taught all over the world. His dedication to completing goals and helping others is what brings meaning to his life. In his free time, he enjoys reading Reddit, playing video games, and writing books.