This course covers a few strategies for isolating your EC2 instances in response to a security event and explores the pros and cons of those strategies.
- Learn how to isolate an EC2 instance's network communication with various levels of granularity
- Understand the positives and negatives associated with each technique
I would recommend this course for any solutions architects, developers, system administrators, and network administrators who are responsible for the security of their architectures.
To get the most out of this course, you should have a decent understanding of cloud computing and cloud architectures, specifically with Amazon Web Services. You should know about VPCs, security groups, NACLs, and the basic networking concepts for AWS. It would be helpful if you had some background in IT or network security, but it's not required.
When you develop and create an incident response strategy, it is crucial that containment is one of your primary focuses. Appropriately containing compromised resources, such as EC2 instances, allows you to perform forensic investigations, remove harmful elements from your architectures, and recover any important data that may be on that instance.
That is why it is very important to understand what you should do before an incident occurs. This means having a game plan ready. I would recommend having an incident response playbook already created for these kinds of situations. Having a step-by-step guide of what to do when an incident occurs can greatly reduce recovery time, reduce human error, and return a little bit of your peace of mind that everything will be ok.
For this lecture, we are going to cover isolation as a containment mechanism. Isolation of affected instances is fairly easy to understand for people just starting their incident response journey and provides a great deal of actual security when implemented well. Isolation is the concept of limiting the visibility and scope of an element so that its actions only affect itself. This means the instance will not be able to see other instances or nodes on the network, nor will it be able to reach out to the internet. It is a lot like putting the EC2 instance in a padded room so that it can't hurt itself or others.
Starting off, in order to actually isolate an instance, we first need to detect that something is wrong.
AWS has created and implemented a number of very powerful services that can help you detect when something is wrong with your environment. The two obvious services that can help you detect issues with your EC2 instances are Amazon GuardDuty and Amazon Inspector.
Amazon GuardDuty is a threat detection service that continually monitors and protects your AWS accounts, workloads, and data. It functions by monitoring and analyzing the metadata streams that come from AWS CloudTrail events and VPC Flow Logs. Using this data, with the help of some machine learning, GuardDuty is able to watch for anomalies within your architectures. As an example, GuardDuty is able to detect compromised EC2 instances that have been set up to serve malware to your users, or to mine for bitcoin.
If GuardDuty detects such a threat, it will notify you through detailed and actionable alerts that can be integrated into event management and other workflow systems.
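As a rough illustration of plugging those alerts into a workflow, the sketch below pulls the affected instance ID out of a GuardDuty finding as it might arrive through Amazon EventBridge. The function name and the sample event are hypothetical, and the sample is trimmed down to the handful of fields the code reads, so treat it as a starting point rather than a complete handler.

```python
from typing import Optional


def instance_id_from_finding(event: dict) -> Optional[str]:
    """Return the EC2 instance ID named in a GuardDuty finding event, if any."""
    # Only look at events that GuardDuty itself published.
    if event.get("source") != "aws.guardduty":
        return None
    resource = event.get("detail", {}).get("resource", {})
    # Findings can also describe other resource types; skip those.
    if resource.get("resourceType") != "Instance":
        return None
    return resource.get("instanceDetails", {}).get("instanceId")


# Made-up, heavily trimmed example event (assumed shape).
sample_event = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {
        "type": "CryptoCurrency:EC2/BitcoinTool.B!DNS",
        "severity": 8,
        "resource": {
            "resourceType": "Instance",
            "instanceDetails": {"instanceId": "i-0123456789abcdef0"},
        },
    },
}

print(instance_id_from_finding(sample_event))
```

The returned ID is what the isolation steps later in this lecture would act on.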
Amazon Inspector is another automated security service that can assess your network and the accessibility of your Amazon EC2 instances. Additionally, Amazon Inspector can assess the security state of the applications running on those instances. You can automate security testing against your fleets to make sure they are all running according to plan, and if it does find any issues, Inspector can notify you directly by email, or it can message any service that accepts SNS notifications.
If you want some more education on GuardDuty, please take a look at this course over here: https://cloudacademy.com/course/understanding-amazon-guardduty/introduction-60/
And if you want to learn more about Amazon Inspector, please check out this course: https://cloudacademy.com/course/amazon-inspector/introduction-82/
I just wanted to point out a few services that will help you detect when something is wrong with an instance, and that it might be time to isolate it. With that out of the way, let's talk about what we should do when we find out something is wrong with an instance.
When you have determined that it is time to isolate an EC2 instance, there are a number of ways we can complete this task. Each of them has its pros and cons, which include things like how easy it is to implement, the scope of its isolation, and use-case-specific hang-ups.
Let's start off this part of the conversation by looking at isolating an EC2 instance with a security group.
This method of isolation is probably the one you would think of first when building out your incident response playbook. Security groups are quite easy to add onto an existing EC2 instance, and can be configured to limit traffic ingress quite easily - seeing as how their default state is an implicit deny all for traffic. There are however a number of things to keep in mind when using security groups in this manner.
First off, when using security groups, you have to explicitly allow traffic using rules. These rules tell the security group what type of traffic, and on what ports, should be allowed into or out of the system. Now if an instance has multiple security groups active that overlap in their coverage, Amazon EC2 will apply the most permissive rule to that instance. Security groups are also stateful, which means they remember their connections and automatically allow response traffic back to the EC2 instance or out of it (regardless of any outbound or inbound rules).
So what that inherently means is that you can never shut off traffic to an instance by adding a security group. You can only allow specific traffic this way. So in order to begin isolating an instance, it is very important to remove any existing security groups from the instance or delete all the rules from any security groups attached to it. These rule changes can happen at any time and take effect immediately.
You could then attach an isolation security group, a blank security group (with no rules), to that instance to enforce its lack of connectivity. Keep in mind that a newly created security group comes with a default rule allowing all outbound traffic, so that rule has to be deleted before the group is truly blank.
However, we still have a problem. Since security groups are stateful, they keep track of certain connections in order to allow response traffic back through. This is the same connection tracking that makes possible the stateful behavior everyone loves about security groups.
So now I get to introduce you to the horrid problem of tracked vs untracked connections.
Untracked connections come from traffic that is allowed by a 0.0.0.0/0 (all traffic) rule in one direction AND a 0.0.0.0/0 (all traffic) rule for all ports (0-65535) in the other direction. This applies whichever way around the pair is written, inbound or outbound.
Any traffic that fits this category will be immediately interrupted when a security group change would normally stop the flow of traffic, such as removing a rule, updating a rule, or deleting a security group.
However, there are some types of connections that security groups track, and these do not follow that precedent. Tracked connections apply to any traffic that has a specific IP or CIDR rule within the security group. This would be something like allowing 203.0.113.1/32, for example: a specific IP address that has been allowed on the security group to do a thing.
This type of traffic will NOT be immediately interrupted if a rule that has previously allowed its traffic to flow is removed.
So you could imagine that if a bad actor had access to an instance with this specific type of tracked connection available to it, even if you removed all the old security groups and placed your instance in the isolation security group, they could still maintain access to that affected unit, through that connection.
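To make the distinction a little more concrete, here is a deliberately simplified model of the tracked vs untracked rule as described in this lecture. It is not how AWS implements connection tracking, and the `Rule` class and function names are made up for the illustration; it only checks whether a group has a wide-open rule in both directions.

```python
from dataclasses import dataclass

ANYWHERE = "0.0.0.0/0"


@dataclass(frozen=True)
class Rule:
    direction: str          # "inbound" or "outbound"
    cidr: str               # e.g. "0.0.0.0/0" or "203.0.113.1/32"
    protocol: str = "-1"    # "-1" stands for all protocols
    from_port: int = 0
    to_port: int = 65535


def is_wide_open(rule: Rule) -> bool:
    """True for a 0.0.0.0/0 rule covering all traffic or all ports."""
    all_ports = rule.from_port == 0 and rule.to_port == 65535
    return rule.cidr == ANYWHERE and (rule.protocol == "-1" or all_ports)


def connections_are_tracked(rules: list) -> bool:
    """Simplified: untracked only if both directions have a wide-open rule."""
    inbound_open = any(r.direction == "inbound" and is_wide_open(r) for r in rules)
    outbound_open = any(r.direction == "outbound" and is_wide_open(r) for r in rules)
    return not (inbound_open and outbound_open)


wide_open_group = [Rule("inbound", ANYWHERE), Rule("outbound", ANYWHERE)]
specific_group = [
    Rule("inbound", "203.0.113.1/32", protocol="tcp", from_port=22, to_port=22),
    Rule("outbound", ANYWHERE),
]

print(connections_are_tracked(wide_open_group))  # False
print(connections_are_tracked(specific_group))   # True
```

In this model the second group is the dangerous one: its connections are tracked, so they may survive the removal of the rule that allowed them.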
In order to be 100% sure that there are no tracked connections still available to the instance using security groups, you will have to do the following.
- Create a dedicated “isolation” security group.
- Create a single rule of 0.0.0.0/0 for all traffic in both the inbound and outbound rules.
- Remove any existing security groups attached to the instance.
- Associate the Isolation security group to the instance.
- Finally, delete both the inbound and outbound rules you created for the isolation security group.
This will convert all traffic to untracked and then terminate those untracked connections from the instances.
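The steps above could be scripted roughly as follows. This is a minimal sketch that assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`) and that the instance has a single network interface; the function name and the default group name are hypothetical. Because a newly created security group in a VPC already includes an allow-all outbound rule, only the inbound rule is added in step two, and because an instance must always keep at least one security group, steps three and four are done as a single replacement.

```python
# Permission that matches all protocols and ports from/to anywhere.
ALL_TRAFFIC = [{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]


def isolate_instance(ec2, instance_id, vpc_id, group_name="isolation-sg"):
    """Walk through the five isolation steps; returns the isolation group ID."""
    # Step 1: create the dedicated isolation security group.
    group_id = ec2.create_security_group(
        GroupName=group_name,
        Description="Incident response isolation",
        VpcId=vpc_id,
    )["GroupId"]

    # Step 2: allow all inbound traffic. The allow-all outbound rule is
    # assumed to exist already, since new VPC security groups include it.
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=ALL_TRAFFIC)

    # Steps 3 and 4: swap every existing group for the isolation group.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[group_id])

    # Step 5: delete both rules so that nothing is allowed any more.
    ec2.revoke_security_group_ingress(GroupId=group_id, IpPermissions=ALL_TRAFFIC)
    ec2.revoke_security_group_egress(GroupId=group_id, IpPermissions=ALL_TRAFFIC)
    return group_id
```

It is worth recording the instance's original security groups before running something like this, so they can be restored after the investigation.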
You could also break this up into steps and have these as security groups ready to deploy instead of having to add and remove rules on the fly.
This would involve having a Step1 security group that already has the 0.0.0.0/0 all traffic all ports rules applied to it. You will also need a Step2 security group with zero rules applied to it.
The order of operations would be to remove any existing security groups from the affected instance. Then add the Step1 security group. Then you would remove the Step1 security group. Then you would associate the Step2 security group with that instance.
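With the two groups prepared ahead of time, that order of operations shrinks to two calls. The sketch below again assumes `ec2` is a boto3 EC2 client and that both group IDs already exist; the function name is hypothetical. Since an instance must always have at least one security group attached, each "remove, then add" pair is expressed as a single replacement.

```python
def isolate_with_prebuilt_groups(ec2, instance_id, step1_group_id, step2_group_id):
    """Swap in the wide-open Step1 group, then the empty Step2 group."""
    # Replace all existing groups with Step1 (0.0.0.0/0, all traffic, both ways).
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[step1_group_id])
    # Replace Step1 with Step2, which has no rules at all.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[step2_group_id])
```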
Neither way is very pretty, but both work - and quite frankly it's all we have. It would be nice if AWS were to implement a one-button solution for this.
The next level on the EC2 network isolation chain would have to be the NACL (the network ACL). This handy security device helps you control traffic into and out of your network at the subnet level.
NACLs work by explicitly allowing or denying access to a subnet based on rules you establish. NACLs are stateless, which means that there must be an explicit rule that allows response traffic back into the network or out of the network (unlike security groups, which are stateful and do this for you).
All NACL rules are based on external IP addresses or CIDR blocks, and are not relative to any internal destinations. For example, we could allow all traffic from an IP address like 18.104.22.168, which exists outside our network. However, we are unable to, for example, deny all traffic to 192.168.0.1, which exists inside our network.
It is also important to note that a subnet can only be associated with one NACL at a time, although a single NACL can be associated with more than one subnet.
Just like with security groups, I am going to go over a few of the benefits of using a NACL to isolate an EC2 instance, as well as some of the problems.
First off, it is extremely easy to stop both inbound and outbound communications using a NACL. With just a single inbound and outbound rule, you can terminate existing connections as well as prevent any future connections to the affected instance. There is no multiple-step process involved here like you will have with a security group.
Unfortunately, using a NACL like this is very much like trying to cut butter with an axe. You will most assuredly chop through your dairy adversary, however you will also hit the table below it in the process. What I mean with this analogy is that NACLs cannot be used in a targeted way like the security group.
When you change a NACL, it will hit every single instance within that subnet. That is good, I suppose, if you want that kind of thing, but if you only have one affected instance it will obviously cause issues.
Those thoughts aside, how do you actually go about isolating your EC2 instances within a subnet using a NACL?
Well, if you are already using a NACL, simply add a DENY ALL rule for all traffic (0.0.0.0/0) to both the inbound and outbound rules as the very first rule - rule number 1.
If for some reason your NACL is already full, then you will have to delete an existing rule to make room for this new one. You should record the details of that rule so you can restore it sometime down the road.
If you are creating a new NACL, add the rules as stated earlier, and just associate your new NACL with the subnet that hosts the EC2 instance that you wish to isolate.
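Adding the two deny rules could look something like the sketch below, which assumes `ec2` is a boto3 EC2 client and that rule number 1 is still free on the NACL in both directions (the call fails if an entry with that number already exists). The function name is hypothetical.

```python
def deny_all_on_nacl(ec2, network_acl_id, rule_number=1):
    """Add a DENY ALL entry for 0.0.0.0/0 in both directions."""
    # Egress=False is the inbound entry, Egress=True is the outbound entry.
    for egress in (False, True):
        ec2.create_network_acl_entry(
            NetworkAclId=network_acl_id,
            RuleNumber=rule_number,   # lowest number, so it is evaluated first
            Protocol="-1",            # all protocols
            RuleAction="deny",
            Egress=egress,
            CidrBlock="0.0.0.0/0",
        )
```

Remember that this affects every instance in every subnet associated with that NACL, not just the compromised one.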
We can now move up the network chain to the next logical level, which is the route table. A route table is associated with a subnet, just like the network ACLs we just spoke about.
The route table helps the subnet direct traffic around your VPC. Every VPC comes with a main route table, and any subnet that you do not explicitly associate with a route table of your own will use that main one.
Now if you have a public subnet, your route table has a route to an internet gateway, which allows communication outside of your VPC. Since we are concerned about isolation, this is probably an important thing to remove.
In order to provide isolation from the outside world, or anything else for that matter, we simply need to remove all the routes within a route table that lead out of the VPC (these could point to an internet gateway, a Direct Connect connection, or a VPN connection). Whatever it is, we need to get rid of it.
Now you could go through and obliterate your beautiful route tables that are already set up the way you like them, or you could simply create a new route table (a new one contains only the local route for traffic inside the VPC) and just associate it with the subnet that you wish to isolate.
This will stop all external subnet communications, isolating all of the EC2 instances within. However, be warned: the instances will still be able to communicate with each other and with anything else within that VPC.
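One way to script the "new route table" approach is sketched below, assuming `ec2` is a boto3 EC2 client; the function name is hypothetical. If the subnet already has an explicit route table association, that association is replaced; otherwise the subnet was relying on the main route table and a new association is created.

```python
def isolate_subnet_routes(ec2, vpc_id, subnet_id):
    """Point a subnet at a fresh route table that only has the local route."""
    new_table_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]

    # Look for an existing explicit association for this subnet.
    existing = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    for table in existing["RouteTables"]:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                ec2.replace_route_table_association(
                    AssociationId=assoc["RouteTableAssociationId"],
                    RouteTableId=new_table_id,
                )
                return new_table_id

    # No explicit association: the subnet was using the main route table.
    ec2.associate_route_table(RouteTableId=new_table_id, SubnetId=subnet_id)
    return new_table_id
```

Keeping the old route table untouched makes it easy to switch the subnet back once the incident is resolved.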
As we have worked our way up the network chain, we have seen how each piece of the puzzle might be shut down or limited in order to isolate an affected EC2 instance. Now we have reached the top of the chain, the internet gateway.
So if all else fails, could we just remove the internet gateway from the VPC to stop all outside communication with our affected instance? No. No you can’t. AWS will not let you remove an internet gateway from your VPC if there are any EC2 dependencies within your VPC that require the internet gateway. You would have to first remove each and every dependency from the network (aka shut down every instance) in order to actually remove the internet gateway.
However if you would like the same effectiveness as removing the internet gateway, you would need to remove all the internet gateway routes from all your route tables like we just discussed in the previous section or attach a custom route table with no routes to all those subnets.
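Removing the internet gateway routes across a whole VPC might be sketched like this, assuming `ec2` is a boto3 EC2 client and that the VPC has few enough route tables for a single `describe_route_tables` call (pagination is ignored here). The function name is hypothetical. It returns what it removed so the routes can be recreated later.

```python
def remove_internet_gateway_routes(ec2, vpc_id):
    """Delete every route that targets an internet gateway in the VPC."""
    removed = []
    response = ec2.describe_route_tables(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )
    for table in response["RouteTables"]:
        for route in table.get("Routes", []):
            # Internet gateway targets have IDs starting with "igw-";
            # the local route and other targets are left alone.
            if not (route.get("GatewayId") or "").startswith("igw-"):
                continue
            for key in ("DestinationCidrBlock", "DestinationIpv6CidrBlock"):
                if key in route:
                    ec2.delete_route(
                        RouteTableId=table["RouteTableId"], **{key: route[key]}
                    )
                    removed.append((table["RouteTableId"], route[key]))
                    break
    return removed
```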
Overall, this isn't a real answer to your problem. I would recommend using any of the previous methods we discussed, but I guess it's still technically available.
William Meadows is a passionately curious human currently living in the Bay Area in California. His career has included working with lasers, teaching teenagers how to code, and creating classes about cloud technology that are taught all over the world. His dedication to completing goals and helping others is what brings meaning to his life. In his free time, he enjoys reading Reddit, playing video games, and writing books.