Whole Cloud Tests


1h 11m

As modern software expectations increase the burden on DevOps delivery processes, engineers working on AWS need increasingly reliable and rapid deployment techniques. This course aims to teach students how to design their deployment processes to best fit business needs.

Learn how to: 
- Work with immutable architecture patterns
- Reason about rolling deployments
- Design Canary deployments using AWS ELB, EC2, Route53, and AutoScaling
- Deploy arbitrarily complex systems on AWS using Blue-Green deployment
- Enable low-risk deployments with fast rollback using a number of techniques
- Run pre- and post-deploy whole-cloud test suites to validate DevOps deployment and architectural logic

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.


Okay, so welcome back to the Cloud Academy Deployment on AWS course. In this lecture, we're going to be talking about whole-cloud tests, and we'll be looking at the analog to normal code testing. If you've been doing any kind of software engineering for any period of time, I'm sure you've heard of automated testing or test-driven development. The whole-cloud test is a perfect analog to the normal code testing that we might do for different versions. We'll also talk about software-defined architecture, what it gives us, and why it's what actually enables us to do whole-cloud tests. Then we'll do a cloud deployment test demo: we'll get out of the slides really quickly here, and I'll actually show you what one of these things might look like and what we can hope to achieve using whole-cloud tests.

So let's get into it. You wouldn't deploy code without automated testing, would you? Now, I hesitate to ask this question because I've seen a lot of older companies that haven't undergone technical modernization efforts, and newer startups racing against the clock, that don't automate testing of their code. But this is a quality process, right? You have to do automated testing if you want high-confidence delivery of software.

We wouldn't want to deploy mission-critical code without some sort of automated testing; we want to make sure that code works before we run it. So why can't we do the same for our deployment logic, or for our entire software stacks? Placing so much value on automated testing of our code logic is great, but the majority of the problems we'll have delivering our software, say for a software-as-a-service product or an e-commerce store, won't come from our actual code. They'll come from our data center logic and our management of our infrastructure. We're more likely to have downtime due to mismanagement of infrastructure, so why don't we set up automated testing to validate that we're managing it correctly?

The key realization is that our AWS system is just a bunch of API calls: we no longer have any physical systems we can access; all of that has been abstracted away. Given that the whole system is modeled as a series of API calls, our system is just code, or at least just calls against code. That means we can model the entire system in software, and since we like to automatically test software, we should do exactly that here. This is why we call it software-defined architecture: we can define all of the architecture using only software.

So let's see how we might actually do one of these tests. Right now, we're looking at a CloudFormation template that I wrote for this demonstration. All it does is create a software stack that lets me query a DynamoDB table by executing against a Lambda. My Lambda is going to let me put objects into a DynamoDB table, and I connect my compute layer to my database layer using an IAM execution role. I also provide some outputs, like the table name and the Lambda name.
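A template along those lines might look roughly like this. To be clear, this is a hypothetical sketch, not the course's actual template: the resource names, table schema, runtime version, and output keys are all my assumptions.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Hypothetical sketch of the demo stack (Lambda + DynamoDB + execution role)

Resources:
  # The database layer: a simple table keyed on a string id.
  ItemsTable:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH
      ProvisionedThroughput:
        ReadCapacityUnits: 1
        WriteCapacityUnits: 1

  # The IAM execution role that connects the compute layer to the database layer.
  ExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: 'sts:AssumeRole'
      Policies:
        - PolicyName: TableAccess
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'dynamodb:PutItem'
                  - 'dynamodb:GetItem'
                  - 'dynamodb:DeleteItem'
                Resource: '*'

  # The compute layer: a Lambda that PUTs its incoming event into the table.
  PutItemFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs12.x
      Role: !GetAtt ExecutionRole.Arn
      Environment:
        Variables:
          TABLE_NAME: !Ref ItemsTable
      Code:
        ZipFile: |
          var AWS = require('aws-sdk');
          var db = new AWS.DynamoDB.DocumentClient();
          exports.handler = function (event, context, callback) {
            db.put({ TableName: process.env.TABLE_NAME, Item: event },
                   function (err) { callback(err, event); });
          };

# Outputs let the test script discover the generated resource names.
Outputs:
  TableName:
    Value: !Ref ItemsTable
  LambdaName:
    Value: !Ref PutItemFunction
```

Note that deploying a template containing an `AWS::IAM::Role` requires acknowledging the `CAPABILITY_IAM` capability, which is exactly what the deploy script below passes.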

So what should my automated testing do? It should validate that the PUT operation for my sample Lambda works, and that all of my permissions and connections are wired up correctly. How might we do that kind of validation? Well, let's look first at a shell script I have. This is on Linux, or a Unix-like system, so it would look a little different if you wrote it for Windows. Effectively, all I'm doing is saying: exit with an error if we hit any problems, get the information about where we are in the system, and run a CloudFormation deployment using the Create Stack call. I give it a stack name, select the template body as the example template we were just looking at, and allow the IAM capabilities so it can create that execution role. Then I run a wait function that prevents the bash script from proceeding until CloudFormation reports success, and then run some integration tests. So: deploy the stack, wait for it to finish and for CloudFormation to tell us it was successful, and then run our actual business-logic tests using a test script.
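A minimal sketch of what that driver script could look like. This is hypothetical, not the course's actual script: the file names `example-template.yaml` and `test.js` are assumptions, and it uses the AWS CLI's built-in `wait` subcommand in place of the hand-rolled wait function from the demo.

```shell
#!/usr/bin/env bash
# Hypothetical deploy-and-test driver sketching the flow from the lecture.
# Assumes example-template.yaml and test.js exist alongside this script.
set -euo pipefail   # fail loudly if any step has a problem

deploy_and_test() {
  local stack_name="$1"

  # Kick off stack creation; CAPABILITY_IAM permits creating the execution role.
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-body file://example-template.yaml \
    --capabilities CAPABILITY_IAM

  # Block until CloudFormation reports CREATE_COMPLETE (errors out on rollback).
  aws cloudformation wait stack-create-complete --stack-name "$stack_name"

  # Finally, run the whole-cloud integration tests against the live stack.
  node test.js "$stack_name"
}

# Only deploy when a stack name is actually supplied on the command line.
if [ "$#" -gt 0 ]; then
  deploy_and_test "$1"
fi
```

The `aws cloudformation wait` command does the same polling the lecture's custom wait function does; writing your own loop (shown later) just gives you control over the interval and the failure handling.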

So let's look at what that index.js and our test.js look like. The wait script is just a wait function, so we'll skip it, but we have our Lambda here with the same logic, now with a little syntax highlighting so we can see what to expect: a DynamoDB PUT of objects. So let's look at our test real quickly. I actually wrote these tests manually, rather than using any particular testing framework, because I didn't want to look like I was endorsing one, but this should look relatively familiar if you've watched my CloudFormation course before. We're using the same scripting style, and CloudFormation is still the mechanism we're driving everything through; we're just looking at it from a slightly different perspective here, in that we want to run a whole-cloud test after we do a deploy. We just happen to be using CloudFormation for it.

So effectively: I acquire my packages, set up a test object that I can test against, and log some straightforward messages to standard out to tell me what's going on. I describe the stacks and try to find the one I just deployed; if there's an error, I exit with a non-zero exit code. If I did find the stack, I read its output values and use the Lambda name from them, because I need to figure out where that Lambda was created, since CloudFormation assigns its own names. Then I run a request-response-style invocation against it. After that invocation finishes, I run a DynamoDB Get to make sure the table is actually connected, that I didn't hit any exceptions, and that the object exists. Finally, I clean up after my test by deleting the object back out.
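The same describe-invoke-get-delete flow can also be sketched with the AWS CLI instead of Node. Again, this is a hypothetical sketch: the output keys `LambdaName` and `TableName`, the test item shape, and the file paths are assumptions, not the course's actual names.

```shell
#!/usr/bin/env bash
# Hypothetical whole-cloud test sketch using the AWS CLI.
set -euo pipefail

run_whole_cloud_test() {
  local stack_name="$1"
  local lambda_name table_name

  # Describe the stack we just deployed and read its outputs. The output
  # keys below are assumed names, matching the hypothetical template above.
  lambda_name=$(aws cloudformation describe-stacks --stack-name "$stack_name" \
    --query "Stacks[0].Outputs[?OutputKey=='LambdaName'].OutputValue" --output text)
  table_name=$(aws cloudformation describe-stacks --stack-name "$stack_name" \
    --query "Stacks[0].Outputs[?OutputKey=='TableName'].OutputValue" --output text)

  # Request-response invocation: the Lambda should PUT this item.
  # (With AWS CLI v2 you may also need --cli-binary-format raw-in-base64-out.)
  aws lambda invoke --function-name "$lambda_name" \
    --invocation-type RequestResponse \
    --payload '{"id": "whole-cloud-test"}' /tmp/invoke-out.json

  # The item should now exist in the table; grep fails the test if it doesn't.
  aws dynamodb get-item --table-name "$table_name" \
    --key '{"id": {"S": "whole-cloud-test"}}' --output json \
    | grep -q 'whole-cloud-test'

  # Clean up the test item so the test leaves no residue behind.
  aws dynamodb delete-item --table-name "$table_name" \
    --key '{"id": {"S": "whole-cloud-test"}}'
}

# Only run against AWS when a stack name is actually supplied.
if [ "$#" -gt 0 ]; then
  run_whole_cloud_test "$1"
fi
```

Because any of these commands exits non-zero on failure, `set -e` turns the whole function into a pass/fail integration test without needing a framework.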

So there we have it: that's how we might do a whole-cloud test. Just create a CloudFormation stack, wait for it to finish creating, and run some tests against it. I wrote a very simple test script that only runs one test; you could use a testing framework instead. All we need to do now is look at an example usage. I also have multiple Amazon accounts, so I need to make sure to set my profile.

Okay, so we got a stack ID back, and we can see that creation has started because I ran my Create Stack call in the CLI. While we're waiting, what's going on here is that I'm running requests against the Describe Stacks API to see what the state is. As long as the stack is still in "Create in Progress" and hasn't switched over to a rollback or a "Delete in Progress", we know that it's still working. I'm doing this ping cycle because the CLI does not wait for the stack to finish creating before returning on the bash command line; it returns as soon as stack creation has been initiated. So we have these transition states, like "Create in Progress", and we need to wait for the stack to exit the transition state to make sure the create finished correctly. Here, all I've done is set an interval to ping at, to make sure it's still "Create in Progress" and hasn't transitioned into a failure state.
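That ping cycle might look roughly like the sketch below. The status names come from CloudFormation's documented stack states; the function name and default interval are my own assumptions, not the lecture's actual script.

```shell
#!/usr/bin/env bash
# Hypothetical version of the hand-rolled wait/ping function described above.
set -euo pipefail

wait_for_stack() {
  local stack_name="$1" interval="${2:-10}" status

  while true; do
    # Ask the Describe Stacks API for the current state of the stack.
    status=$(aws cloudformation describe-stacks --stack-name "$stack_name" \
      --query 'Stacks[0].StackStatus' --output text)

    case "$status" in
      CREATE_COMPLETE)
        echo "Stack create finished successfully."
        return 0 ;;
      CREATE_IN_PROGRESS)
        # Still in the transition state; keep pinging on our interval.
        sleep "$interval" ;;
      *)
        # Anything else (ROLLBACK_*, DELETE_*) means the create failed.
        echo "Stack entered failure state: $status" >&2
        return 1 ;;
    esac
  done
}

# Only poll AWS when a stack name is actually supplied.
if [ "$#" -gt 0 ]; then
  wait_for_stack "$@"
fi
```

Exiting non-zero on any non-creating, non-complete status is what lets the surrounding deploy script stop before running tests against a rolled-back stack.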

So we can see that we exited successfully from our ping cycle and our test began running. It ran fairly quickly there. We exited with success from the ping cycle once we got the "Create Complete" stabilization. We grabbed the Lambda and DynamoDB table names by doing another Describe Stacks call and checking the outputs; we saw from our template design that we should expect the table name and the Lambda name, which are exactly the resources we need to test, and we successfully got them back. We ran our Lambda invocation, checked that our Dynamo object existed, saw that it matched, and then cleaned it out with a delete. Once we finished cleaning up, we had just run a full-stack integration test.

Now imagine this with our blue-green deploy from the previous lectures. This is a very simple software system, but it is a functioning system that lets me create, read, update, and delete against the table. I just validated an entire stack, including the permissions and the entire deployment logic, totally automatically, which is pretty great. This is something fairly unique to Amazon; only now are some other cloud providers catching up with the ability to do these kinds of highly orchestrated automation systems, and Amazon certainly makes it easiest for us. So whenever we're thinking about doing our deploys, we should be thinking about running full-stack cloud tests if we can.

So to recap, this was the last lecture in the course. Throughout this course, we learned the primary ways we can do code deployments for individual servers, as well as for entire sub-stacks or entire stacks in our blue-green deployments, and we learned how to do whole-cloud tests to validate that our deployments work correctly. We also learned about the tools we have for doing DNS switches and traffic rebalancing: Auto Scaling groups, Auto Scaling launch configurations, Elastic Load Balancer configurations, and Route 53 configurations.

So hopefully you've learned a whole bunch of different techniques. You can reference back to these slides, and to the graphics I put together for you, as you define your deployment and DevOps processes for any new software initiatives. Thanks.

About the Author

Nothing gets me more excited than the AWS Cloud platform! Teaching cloud skills has become a passion of mine. I have been a software and AWS cloud consultant for several years. I hold all 5 possible AWS Certifications: Developer Associate, SysOps Administrator Associate, Solutions Architect Associate, Solutions Architect Professional, and DevOps Engineer Professional. I live in Austin, Texas, USA, and work as development lead at my consulting firm, Tuple Labs.