Immutable Infrastructure



As modern software expectations increase the burden on DevOps delivery processes, engineers working on AWS need increasingly reliable and rapid deployment techniques. This course aims to teach students how to design their deployment processes to best fit business needs.

Learn how to: 
- Work with immutable architecture patterns
- Reason about rolling deployments
- Design Canary deployments using AWS ELB, EC2, Route53, and AutoScaling
- Deploy arbitrarily complex systems on AWS using Blue-Green deployment
- Enable low-risk deployments with fast rollback using a number of techniques
- Run pre-and-post-deploy whole-cloud test suites to validate DevOps deployment and architectural logic



Hello and welcome back to Cloud Academy's course on Deployment on AWS. I'm your instructor, Andrew Templeton. Let's get started.

So today in this lecture we're going to talk about immutable infrastructure. So what does that mean? We're going to be looking at immutable and mutable infrastructure: what the differences between them are, and what exactly is mutable about the infrastructure we've dealt with before. We're going to talk about the problems associated with mutable infrastructure, mutable infrastructure being the normal way of doing things that you're probably used to. We're going to talk about why immutability is better and why we like it. We're going to talk about what server immutability means, so specifically with regard to EC2, how do we think about immutability versus mutability? And then, because we're dealing with the cloud and we have software-defined architecture, we can also look at stack immutability, or what happens when we go larger than a single server and want to start doing deploys in a consistent manner. For that, we want stack immutability. Then I'll show you a very simple manual immutable deploy so you can see the generic process that needs to happen and that you need to script as a DevOps engineer trying to leverage immutable architecture.

So let's look at immutable infrastructure, starting with mutable infrastructure. What is mutable infrastructure? Well, it's infrastructure that's changed after provisioning. So this means that after you deploy a piece of infrastructure, you're altering some aspect of it. Now that sounds a little funny, but here is what it means. Let's pretend you're deploying a Ruby on Rails server. If you deploy a Ruby on Rails server and it's running V1 of your application, and then you turn the Rails process off and update in place to the latest version of the software by doing, for instance, a git pull to bring it to V2, then you are doing mutable infrastructure. You're changing that server, that VM, to Version 2 after it has already been launched and deployed. So you started on Version 1 with a server, then you did something to it, and now that same server is Version 2. You mutated it, so that's mutable infrastructure.

Now you might be asking, "Okay, well what is immutable infrastructure? What's the opposite if I'm not allowed to go in and alter code on machines to create new deployments?" Well, you can do a deployment without changing anything after provisioning, because we're in the public cloud on Amazon Web Services and can simply provision new infrastructure instead.

So how might we actually do that? Well, let's take a peek really quickly at mutable infrastructure first. We can see that for mutable, the flow is: old version, update script, and then the same piece of infrastructure but on a different version. Just run it through one little update script, right? Okay, so we're used to that kind of thing. Now, this is how you might do an immutable deployment on Amazon. We have a couple of extra steps here, and it looks different, but we still end up with a newer version, even though it's on a new object. So you can take a peek here and see that on the left-hand side we have our old version that we provisioned somehow. Then on the right-hand side, all we need to do is run through that same provisioning step for the new version, redirect the traffic to that newer version that we just completely redeployed, and then kill the old version as the third step. So it's a simple three-step process that changes it from mutable to immutable. Rather than running some script on the same piece of infrastructure, we want to create a totally new version, redirect traffic, and then kill the old one.
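To make the create, redirect, and kill sequence concrete, here is a minimal Python sketch. It is only an illustration: the function name `immutable_deploy` and the three callables are hypothetical, and the in-memory stand-ins below replace whatever real provisioning and traffic-switching tooling you would actually use.

```python
from typing import Callable, List


def immutable_deploy(
    provision: Callable[[str], str],
    redirect: Callable[[str], None],
    destroy: Callable[[str], None],
    old_resource: str,
    new_version: str,
) -> str:
    """Create a fresh resource, move traffic to it, then remove the old one."""
    new_resource = provision(new_version)  # step 1: brand-new infrastructure
    redirect(new_resource)                 # step 2: traffic now hits the new version
    destroy(old_resource)                  # step 3: the old version is killed, never patched
    return new_resource


# In-memory stand-ins (assumptions) so the sketch runs without a cloud account.
log: List[str] = []


def fake_provision(version: str) -> str:
    log.append(f"provision {version}")
    return f"server-{version}"


def fake_redirect(resource: str) -> None:
    log.append(f"redirect -> {resource}")


def fake_destroy(resource: str) -> None:
    log.append(f"destroy {resource}")


live = immutable_deploy(fake_provision, fake_redirect, fake_destroy, "server-v1", "v2")
print(live)
print(log)
```

Notice that the old resource is only ever destroyed, never modified. That is the whole difference from the update-script approach.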

Okay, so why do we want to take on the extra complexity of that three-step process? Well, because mutability is poison left over from on-site data centers. It doesn't work well at all for what we want to do in the cloud. So imagine what it looks like if we do a deployment onto a piece of infrastructure, i.e., a server, and then we patch it all the way up to some new version, Version N or Version 4 or whatever. We have to create it with a script that deploys the original version, then run these patches that change the server or its file system as we go newer and newer.

Okay, well, what happens if I need to scale out or have a recovery event? Say, for instance, that my VN box up there in the top right dies. How do I replace it as quickly and as consistently as possible, so I know that I won't have problems? Well, that's actually a pretty tough question if we're looking at mutability, because mutability was designed with on-site data centers in mind. That is, it doesn't leverage any of the benefits that we get from having easily provisioned VMs in the cloud. Because running through all these patches would be really, really slow, this doesn't meet our needs for the scale-out or the recovery scenarios. Nor does it even necessarily work. How are we sure that we can go directly to V4 or VN? We might not have even developed or tested a script to do a direct deployment to the newer version. We don't even know if V4 will be the same after we do this deployment. So we don't like this at all.

So let's recap what we just learned about why we don't like mutability. Mutable doesn't scale out well because patch chains are too slow; that would mean running through the same set of actions that we did to deploy the original blocks. We can try skipping the patch chain, but direct deployments to new versions are very hard to verify and engineer. That is, I need to have a script on hand that'll do a direct deployment to Version 4 if I want to do this. And what happens if we go backwards, right? Think about doing a downgrade. A downgrade would be pretty painful if we have to roll back some change and we never actually designed a script to go from Version 3 to Version 2. Most importantly, in the AWS Cloud, we don't need to reuse old resources. We don't have to stick ourselves with this constraint; we could just make new ones and make this whole process a lot simpler and cleaner.
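A small Python model can make the patch-chain problem easier to see. This is a sketch under stated assumptions: the script names in `PATCHES` are made up, and the only point is to compare how many tested steps a replacement server needs in each approach.

```python
from typing import Dict, List, Tuple

# Hypothetical patch scripts, keyed by (from_version, to_version).
# Only forward, one-step upgrades were ever written and tested.
PATCHES: Dict[Tuple[int, int], str] = {
    (1, 2): "patch_1_to_2.sh",
    (2, 3): "patch_2_to_3.sh",
    (3, 4): "patch_3_to_4.sh",
}


def patch_chain(start: int, target: int) -> List[str]:
    """Scripts a mutable server must replay to get from start to target."""
    step = 1 if target >= start else -1
    scripts: List[str] = []
    current = start
    while current != target:
        key = (current, current + step)
        if key not in PATCHES:
            # No one designed or tested this transition (for example a downgrade).
            raise LookupError(f"no tested script for v{key[0]} -> v{key[1]}")
        scripts.append(PATCHES[key])
        current += step
    return scripts


def fresh_provision(target: int) -> List[str]:
    """Immutable approach: one provisioning step, whatever the history was."""
    return [f"provision_v{target}"]


print(patch_chain(1, 4))   # three scripts to replay for a replacement box
print(fresh_provision(4))  # one step
print(fresh_provision(2))  # going backwards is also one step
```

In this model, asking for `patch_chain(3, 2)` raises an error because nobody wrote the downgrade script, while provisioning Version 2 fresh costs the same as provisioning any other version.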

Okay, so why would we do immutable to solve all these problems? We saw that the mutable version was a single script to do an upgrade, immutable was a three step process and it seemed more complex but really immutable is very simple. It is a single script where we provision a new version, redirect it and optionally destroy the old version at the end. So we start with a thing that we provisioned from Amazon, telling it to provision Version XYZ. We have traffic hitting it from other systems. Now if I want to deploy a new version, I just tell Amazon to deploy Version ABC onto more infrastructure. This could be a server or a stack. I then begin by redirecting traffic to the new ABC version, and then just destroy the old infrastructure.

So this process, because we're in the cloud and can deploy infrastructure very quickly, may only take a minute or two or five, and usually less than an hour, so we're not incurring very much cost even though we're running two versions of the infrastructure for a brief period. And we've dramatically reduced the operational complexity. So why do I say that we've reduced the complexity when this diagram looks more complicated than the direct patching? Because I could run this process in reverse. Because the switchover is just a traffic redirection, all I would need to do to roll back to Version XYZ if there's a problem is deploy XYZ, redirect the traffic back to XYZ, and then delete ABC. So I can do rollbacks, and if I want to do another scale-out or a recovery, all I have to do is add another Version ABC and direct some traffic to it, since we natively support a fresh provision of each system.
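The claim that a rollback is just the same deployment run with the versions swapped can be sketched in a few lines of Python. `FakeCloud` is a hypothetical in-memory stand-in, not an AWS API; it only tracks which resources exist and which one receives traffic.

```python
from typing import Dict, Optional


class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud account."""

    def __init__(self) -> None:
        self.resources: Dict[str, str] = {}  # resource id -> deployed version
        self.live: Optional[str] = None      # resource currently receiving traffic
        self._counter = 0

    def provision(self, version: str) -> str:
        self._counter += 1
        resource_id = f"res-{self._counter}"
        self.resources[resource_id] = version
        return resource_id

    def redirect(self, resource_id: str) -> None:
        self.live = resource_id

    def destroy(self, resource_id: str) -> None:
        del self.resources[resource_id]

    def live_version(self) -> str:
        assert self.live is not None
        return self.resources[self.live]


def deploy(cloud: FakeCloud, version: str) -> str:
    """Provision the requested version, switch traffic, destroy the previous one."""
    old = cloud.live
    new = cloud.provision(version)
    cloud.redirect(new)
    if old is not None:
        cloud.destroy(old)
    return new


cloud = FakeCloud()
deploy(cloud, "XYZ")   # initial deployment
deploy(cloud, "ABC")   # upgrade
after_upgrade = cloud.live_version()
deploy(cloud, "XYZ")   # rollback: the very same function, old version as the target
after_rollback = cloud.live_version()
print(after_upgrade, after_rollback, len(cloud.resources))
```

There is no separate rollback code path here, which is the operational simplification being described.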

Okay, so what are our steps when we're doing an immutable server script? Well, we need to provision the new version, so in the case of servers, we'd be doing some version bootstrapping. That could be done through user data or AMIs or any other script that runs on startup. Then we would redirect traffic to the fresh servers after testing. For instance, we could switch the ELB pointer or make the new server that's got the newer version join an Elastic Load Balancer that is serving traffic. And then we would start destroying the old servers, which would be just a terminate-instances call once we have enough new instances to support the traffic. So it's simple, it supports rollbacks because we can just switch the order in which we deploy the versions, and it's consistent in that every single deployment works exactly the same.
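Here is one way the redirect and destroy steps for servers might look in Python. It is a sketch, assuming a Classic Elastic Load Balancer and clients shaped like boto3's `elb` and `ec2` clients; the method names used (`register_instances_with_load_balancer`, `deregister_instances_from_load_balancer`, `terminate_instances`) are the boto3 ones, but `swap_servers`, the load balancer name, and the instance IDs are hypothetical. A recording stand-in replaces the real clients so the example runs without AWS credentials.

```python
from typing import Any, Dict, List, Tuple


def swap_servers(elb: Any, ec2: Any, lb_name: str,
                 new_ids: List[str], old_ids: List[str]) -> None:
    """Redirect traffic to already-provisioned new instances, then destroy the old ones."""
    # Redirect: the new servers join the load balancer that is serving traffic.
    elb.register_instances_with_load_balancer(
        LoadBalancerName=lb_name,
        Instances=[{"InstanceId": i} for i in new_ids],
    )
    # Take the old servers out of rotation before terminating them.
    elb.deregister_instances_from_load_balancer(
        LoadBalancerName=lb_name,
        Instances=[{"InstanceId": i} for i in old_ids],
    )
    # Destroy: a single terminate call for the old servers.
    ec2.terminate_instances(InstanceIds=old_ids)


class RecordingClient:
    """Stand-in for a boto3 client (assumption): records calls instead of calling AWS."""

    def __init__(self, log: List[Tuple[str, Dict[str, Any]]]) -> None:
        self.log = log

    def __getattr__(self, name: str):
        def method(**kwargs: Any) -> Dict[str, Any]:
            self.log.append((name, kwargs))
            return {}
        return method


calls: List[Tuple[str, Dict[str, Any]]] = []
swap_servers(RecordingClient(calls), RecordingClient(calls),
             "web-elb", ["i-new1"], ["i-old1"])
print([name for name, _ in calls])
```

With real clients you would pass `boto3.client("elb")` and `boto3.client("ec2")` instead, and you would probably want to wait until the new instances pass the load balancer's health checks before removing the old ones.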

Now, here's the magic of this. That consistency works beyond anything that you could have possibly gotten on mutable. Say I rewrite my whole microservice from Ruby to JavaScript. Now in the old world, writing that patch script to change from Ruby to JavaScript without any downtime would probably be pretty painful. Whereas because I'm just doing a traffic redirection for immutable architecture or infrastructure, all I would need to do is deploy the new version running JavaScript and start sending traffic to it instead of the old Ruby boxes. That is, I can make really big changes like that and never have to worry about updating my patch script or changing how it works. So we like this.

Now, because we're in the cloud and we can provision entire software-defined data centers or software-defined architecture, we can also do this at the stack level, or the system component level. We'll usually do this with CloudFormation, but you can do it with any scripts that you want. So I can launch a stack with some new version's template, that is, I can launch the Version ABC CloudFormation stack. I redirect the traffic to that fresh stack after testing, and then I destroy the old stack. So the process is exactly the same as for servers; I just use a different kind of tooling to do the deployment.

Again, CloudFormation is the way of choice for most people, but you can do this with manual API inputs as well. Again, it's simple, it supports rollbacks and it's consistent, that is I could completely change the way that my new version of my stack is made up and nobody would notice or care because all I'm doing is a traffic redirection. So the external systems would not really know that anything changed necessarily. And my deploy scripts don't need to change no matter how large my changes become. So we like that as well.
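As a rough illustration of the stack-level version, the Python sketch below walks through launch, wait, redirect, and destroy against a client shaped like boto3's CloudFormation client. The calls `create_stack`, `get_waiter("stack_create_complete")`, and `delete_stack` exist in boto3; everything else is an assumption made for the example, including the `Version` template parameter, the template URL, the stack names, and the `redirect` callback. A fake client is used so the example runs offline.

```python
from typing import Any, Callable, List, Tuple


def deploy_stack(cfn: Any, new_stack: str, old_stack: str, template_url: str,
                 version: str, redirect: Callable[[str], None]) -> None:
    """Launch a fresh stack, move traffic to it, then delete the old stack."""
    cfn.create_stack(
        StackName=new_stack,
        TemplateURL=template_url,
        # "Version" is a hypothetical template parameter for this sketch.
        Parameters=[{"ParameterKey": "Version", "ParameterValue": version}],
    )
    # Block until CloudFormation reports the new stack as created.
    cfn.get_waiter("stack_create_complete").wait(StackName=new_stack)
    redirect(new_stack)                     # test, then switch traffic
    cfn.delete_stack(StackName=old_stack)   # destroy the old stack


class FakeWaiter:
    def __init__(self, events: List[Tuple[str, str]]) -> None:
        self.events = events

    def wait(self, **kwargs: Any) -> None:
        self.events.append(("wait", kwargs["StackName"]))


class FakeCloudFormation:
    """Offline stand-in (assumption) that records the order of operations."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def create_stack(self, **kwargs: Any) -> dict:
        self.events.append(("create", kwargs["StackName"]))
        return {"StackId": "fake-stack-id"}

    def get_waiter(self, name: str) -> FakeWaiter:
        return FakeWaiter(self.events)

    def delete_stack(self, **kwargs: Any) -> None:
        self.events.append(("delete", kwargs["StackName"]))


cfn = FakeCloudFormation()
deploy_stack(cfn, "immutable-2", "immutable-1",
             "https://example.com/app.template", "2",
             redirect=lambda stack: cfn.events.append(("redirect", stack)))
print(cfn.events)
```

Because the deploy function never looks inside the template, the new stack can be built completely differently from the old one without this script changing.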

So let's take a quick look at how I might do an immutable server deployment by hand so we can at least understand the steps associated with doing an immutable deployment; that is the create, redirect and destroy.

Okay, so looking at my very simple application here, all I have is a super simple app that deploys and says "Hello, world" effectively with the version number attached to it. So it's a single server and it's addressed by an elastic IP address right now, but it's one of the simplest possible deployments that I could do and demonstrate to you guys how this works. So remember our first step is that we need some kind of script to create a completely fresh version of this that we can begin redirecting traffic to.

Now I've picked CloudFormation as my scripting method of choice, but you can actually do this any way you want, and in fact, you don't even need to script it necessarily. You could go in and manually create the new infrastructure, but we want automation, and CloudFormation is my favorite way to do that. So I'm going to call this "Immutable Stack Number 2" since I have Number 1 running. And I'm going to say I need it to pull down Version 2 of the code. So I'm going to run through and acknowledge all of these things and hit go, because that's what I need to do for CloudFormation, but again this could be you running any kind of script you wanted. So I'm going to stop the recording here for a second and wait for the stack to finish, since it's not exactly the most interesting process in the world.

Okay, so rather than bore you to death, we waited for that script to finish and now I have "Immutable 2" done. Now I need to verify that "Immutable 2" is working and do a redirect as the second step of my immutable deployment practice. So I can verify that this thing is working by going over to a URL for that system, and we can see that, oh wow, we're on Version 2 instead of Version 1. This is the direct DNS address of the EC2 instance; it's got a public IP and a direct DNS address. But because I'm trying to deploy on my IP address to my users, I want them to see this the next time they reload the page, and I can't make them switch to this address.
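The "verify before you redirect" check can also be scripted instead of done in a browser. This is a minimal Python sketch, assuming the app's response body contains a recognizable version marker such as "Version 2"; the function names and the example URL are hypothetical, and the `fetch` parameter exists only so the check can be tried without a running server.

```python
from typing import Callable
from urllib.request import urlopen


def http_get(url: str) -> str:
    """Fetch a URL and return the body as text."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8", errors="replace")


def serves_version(url: str, expected_marker: str,
                   fetch: Callable[[str], str] = http_get) -> bool:
    """True if the new server answers and its page mentions the expected version."""
    try:
        body = fetch(url)
    except OSError:
        # Connection failures and URL errors mean the new server is not ready.
        return False
    return expected_marker in body


# Stand-in fetcher (assumption) that mimics the demo app's response.
def fake_fetch(url: str) -> str:
    return "Hello, world. Version 2"


ready = serves_version("http://new-server.example", "Version 2", fetch=fake_fetch)
print(ready)
```

Only when a check like this passes against the new server's direct address would you go on to the redirect step.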

So I need to do my redirect now that I've verified that this is working. In the case of EC2, all I need to do for this kind of setup is reassign my Elastic IP address. So let me do a reassociation here. What was it? IF77 was my newer server, and I can say it is a reassociation so it doesn't yell at me for trying to remove an address from an existing resource. Okay, so we've finished our reassociation here. If I do a reload in my EC2 console, I should be able to see my IF77 here, and I can see that it has the new Elastic IP address. So hopefully, when I reload here, I should be able to see that we have this Hello Version 2, upgraded and fancier. So we know that the new service is now running off of this IP address. In reality, you would probably have a domain pointing to this, and you would start seeing the new version of the code once you do what I just did.
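The console reassociation above could be scripted along these lines. This is a sketch, assuming an EC2 client shaped like boto3's: `associate_address` with `AllocationId`, `InstanceId`, and `AllowReassociation` is a real boto3 call, while `reassign_elastic_ip`, the fake client, and the IDs are made up for the example so it runs without AWS.

```python
from typing import Any, Dict


def reassign_elastic_ip(ec2: Any, allocation_id: str, new_instance_id: str) -> str:
    """Point an Elastic IP at the new instance, even if it is currently in use."""
    response = ec2.associate_address(
        AllocationId=allocation_id,
        InstanceId=new_instance_id,
        # Without this flag, moving an address that is already associated is refused.
        AllowReassociation=True,
    )
    return response["AssociationId"]


class FakeEC2:
    """Offline stand-in (assumption) that tracks which instance owns each address."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}

    def associate_address(self, AllocationId: str, InstanceId: str,
                          AllowReassociation: bool = False) -> Dict[str, str]:
        if AllocationId in self.owner and not AllowReassociation:
            raise RuntimeError("address is already associated")
        self.owner[AllocationId] = InstanceId
        return {"AssociationId": f"eipassoc-{InstanceId}"}


ec2 = FakeEC2()
ec2.owner["eipalloc-demo"] = "i-old"  # the address starts on the Version 1 server
association = reassign_elastic_ip(ec2, "eipalloc-demo", "i-new")
print(association, ec2.owner)
```

With a real account you would pass `boto3.client("ec2")` and the actual allocation and instance IDs.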

So next I need to go in and destroy the old infrastructure. In my case, the destroy action is just to delete the stack. And that actually is the full process for me doing an immutable infrastructure deploy. So I didn't have to do very much for this to work, and I was fairly confident that it would work. Plus, because I wasn't doing an in-place modification, I was able to verify that the system was working before I went and redirected traffic to the new system. So now we have this lovely Version 2 that's upgraded and fancier running, and I had a very low-friction, low-risk, and low-stress deployment. So that's why we like immutable architecture.

Okay, so now that we've had a chance to look at how we might do immutable infrastructure deploys on Amazon using a very simple script, let's take a look at two popular ways to do immutable infrastructure deploys in our next lecture, Rolling and Canary.

About the Author

Nothing gets me more excited than the AWS Cloud platform! Teaching cloud skills has become a passion of mine. I have been a software and AWS cloud consultant for several years. I hold all 5 possible AWS Certifications: Developer Associate, SysOps Administrator Associate, Solutions Architect Associate, Solutions Architect Professional, and DevOps Engineer Professional. I live in Austin, Texas, USA, and work as development lead at my consulting firm, Tuple Labs.