Auto Scaling
1h 48m

Many cloud platforms offer managed options that are well suited to running specific workloads such as web applications: Google App Engine, AWS Elastic Beanstalk, and Azure App Service Web Apps, among others.

However, there are still plenty of times when we need to set up our own infrastructure, so cloud vendors offer IaaS (infrastructure as a service) options. Google provides Compute Engine, which allows us to create virtual machines, custom images, snapshots, networks, autoscalers, and load balancers.

If we're going to build and run an application on the Google Cloud Platform, then understanding these services is going to help us create highly available, highly scalable applications.

All the major cloud providers offer the ability to set up virtual machines, networks, autoscalers, and load balancers. Where Google Cloud differs is in the speed of creating and starting up virtual machine instances, as well as its massively scalable, software-based global load balancer, which doesn't require pre-warming. Google also offers per-minute billing for VM instances after the first 10 minutes.

So Google has a lot to offer. And if you're looking to learn more about the Google Cloud systems operations, then this may be the course for you.

What exactly will we cover in this course?

Course Objectives: Google Cloud Platform system operations 

By the end of this course, you'll know:

How to use Compute Engine to create virtual machines
How to create disk snapshots
How to create images
How to create instance templates and groups
How to create networks
How to use the autoscaler and load balancer

Intended Audience

This is an intermediate level course because it assumes:

You have at least a basic understanding of the cloud
You’re at least familiar with general IT concepts

What You'll Learn

Lecture: What you'll learn
Intro: What will be covered in this course
Getting Started: An introduction to the Google Cloud Platform
Networking: How to create and secure Cloud Networks
Disks and Images: An overview of disk types and images
Authorization and IAM: How to authenticate and authorise users
Disk Snapshots: How to use snapshots for point-in-time backups
Cloud Storage Overview: A refresher on Cloud Storage
Instance Groups: How to manage instances with managed and unmanaged groups
Cloud SQL Overview: A quick primer on how to use Cloud SQL
Startup and Shutdown Scripts: Using startup scripts to provision machines at boot time
Autoscaling: How to automatically add and remove instances
Load Balancing: How to balance traffic across instances
Putting It All Together: A demo of how to use some of the services we've learned about
Summary: A review of the course



Welcome back. In this lesson we'll talk about auto scaling. We'll start with an overview and then we'll move on to talking about policies, and then we'll set up auto scaling.

So, what is auto scaling? Again, this is a case where the name is pretty descriptive. Auto scaling is the ability to add and remove instances automatically as the workload demands, so that you don't need to do it manually. This can help manage cost, because you don't need extra instances running when they're not really required. It also makes for a better application, because users and processes aren't left waiting on resources or requests when the servers are overtaxed. Autoscalers work with managed instance groups, whose instances are all based on an instance template. The autoscaler adds and removes instances as needed, going no lower than the group's minimum and no higher than its maximum.

In order to determine when new instances should be created we need to set up a policy. There are three different auto scaling policies that we can use. Those are CPU utilization, HTTP requests per second, and Stackdriver metrics.

For the CPU option, we can scale out if the average usage of the total virtual CPU cores in the instance group exceeds the threshold that we've set. As an example, we can set a value of 70%. If the average CPU usage hits 70% or higher, the autoscaler is going to add an instance, assuming that we haven't already reached the maximum allowed instances for that group.
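To make the CPU policy a bit more concrete, here's a minimal Python sketch of the idea. It is a simplified model for illustration, not Google's actual algorithm: the function name `recommended_size` and its parameters are hypothetical, and it assumes we already have one utilization reading (between 0 and 1) per running instance.

```python
import math

def recommended_size(cpu_utilizations, target, min_instances, max_instances):
    """Suggest a group size that would bring average CPU back to the target.

    Hypothetical helper: a simplified model, not the real autoscaler logic.
    """
    current = len(cpu_utilizations)
    average = sum(cpu_utilizations) / current
    # Total load, expressed as "instances running at exactly the target".
    needed = math.ceil(current * average / target)
    # Never go below the group's minimum or above its maximum.
    return max(min_instances, min(max_instances, needed))

# Two instances averaging 85% against a 70% target: scale out to three.
print(recommended_size([0.80, 0.90], target=0.70, min_instances=1, max_instances=10))  # 3
# Two mostly idle instances: one instance would be enough.
print(recommended_size([0.20, 0.30], target=0.70, min_instances=1, max_instances=10))  # 1
```

Notice that the maximum acts as a hard cap: even if the load would justify more instances, the sketch never returns more than `max_instances`.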

The HTTP load balancing serving capacity option scales based on the requests per second per instance. This works because the load balancer allows us to specify the maximum requests per second. So we can tell the autoscaler to scale if the requests go over a certain percentage of that maximum.
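The same kind of arithmetic applies to the HTTP option. The sketch below is only an illustration under stated assumptions: `size_for_requests` is a hypothetical function, `max_rps_per_instance` stands in for the maximum rate configured on the load balancer, and `target_fraction` is the percentage of that maximum we'd want each instance to serve.

```python
import math

def size_for_requests(total_rps, max_rps_per_instance, target_fraction,
                      min_instances, max_instances):
    """Suggest a group size from the total request rate (simplified model)."""
    # Each instance should serve at most this many requests per second.
    per_instance_target = max_rps_per_instance * target_fraction
    needed = math.ceil(total_rps / per_instance_target)
    # Stay within the group's minimum and maximum.
    return max(min_instances, min(max_instances, needed))

# 450 requests/second, 100 max per instance, scale at 80% of that maximum.
print(size_for_requests(450, 100, 0.8, min_instances=1, max_instances=10))  # 6
# A quieter period: 100 requests/second needs two instances at 80 each.
print(size_for_requests(100, 100, 0.8, min_instances=1, max_instances=10))  # 2
```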

If neither the CPU nor the HTTP option works for us, we can use Stackdriver custom metrics, which allow us to specify the metric and the target range. We won't be getting into Stackdriver because it needs to be its own course. However, it is a cross-cloud monitoring tool that's integrated with many of the Google Cloud Platform services.

And if we need more in the way of options, we can use the multiple metrics option, which allows us to add up to five policies based on the three options above. If you have more than one policy, the autoscaler will select the one that leaves the largest number of instances running.
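As a rough illustration of how that selection could work, here's a short Python sketch. It assumes each policy has already produced its own recommended instance count; the `choose_size` function and the policy names in the dictionary are hypothetical.

```python
def choose_size(recommendations, min_instances, max_instances):
    """Pick the policy recommendation that keeps the most instances running."""
    largest = max(recommendations.values())
    # The group's minimum and maximum still apply to the final answer.
    return max(min_instances, min(max_instances, largest))

# Hypothetical per-policy recommendations for one evaluation.
recommendations = {"cpu": 3, "http": 5, "custom_metric": 2}
print(choose_size(recommendations, min_instances=1, max_instances=10))  # 5
```

Taking the largest recommendation is the cautious choice: no single policy ends up short of capacity because another one happened to be satisfied.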

Let's jump into the console, so we can actually create an autoscaler. Let's start off by editing the instance group we created in the previous lesson. And we'll edit it to allow auto scaling. So from the instance group page, we'll click on our group. And notice we have one server running. That's because we set it to use just one in our previous lesson. Let's click on the details tab, so we can see that auto scaling is disabled.

Alright, let's edit this group by clicking on the edit link at the top of the page. And we'll set auto scaling to on. And now, we can use the different auto scaling options that we talked about previously to determine how we're going to scale out. You can see the default is for CPU, and it's set to 60%. It has a default minimum of one instance, and a default max of 10. Looking at the HTTP option the form is identical.

Changing to the metric-based option, you can see it's a bit different. We have two text boxes and a drop-down. This is where we specify a metric, something like read or write ops. Whatever metric you select has to quantify how busy an instance is, and it needs to be one that will change when instances are added or removed. Then we need to set a target, which is the value we want to maintain; with a CPU metric, for example, it could be a percentage of utilization. And then there's the target type, which defines how the autoscaler computes the data collected from instances. There are three possible target types. Gauge is where the autoscaler computes the average value of the data collected in the last couple of minutes. Delta per minute is where the autoscaler calculates the average rate of growth per minute, and compares that to the target utilization. And delta per second is where the autoscaler calculates the average rate of growth per second, and compares that to the target utilization. So, it's more advanced; however, it's going to be useful if the other options don't work for your use case.
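The three target types are easier to compare side by side. The sketch below is one simplified reading of the descriptions above, not the autoscaler's real implementation. The function names and the sample format, a list of `(timestamp_seconds, value)` pairs, are assumptions made for the example.

```python
def gauge_value(samples):
    """Gauge: average of the values collected over the window."""
    return sum(value for _, value in samples) / len(samples)

def delta_per_second(samples):
    """Delta per second: average growth rate of a cumulative counter."""
    (first_time, first_value) = samples[0]
    (last_time, last_value) = samples[-1]
    return (last_value - first_value) / (last_time - first_time)

def delta_per_minute(samples):
    """Delta per minute: the same growth rate, expressed per minute."""
    return delta_per_second(samples) * 60

# A gauge-style metric sampled once a minute for two minutes.
gauge_samples = [(0, 40), (60, 60), (120, 80)]
print(gauge_value(gauge_samples))        # 60.0

# A cumulative counter over the same two minutes.
counter_samples = [(0, 1000), (60, 1600), (120, 2200)]
print(delta_per_second(counter_samples))  # 10.0
print(delta_per_minute(counter_samples))  # 600.0
```

Whichever value comes out is what gets compared against the target we entered in the form.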

And then we have the multiple metrics option, which allows us to build up something more complex than any single metric alone. For example, we can say 50% CPU utilization and high disk IO. We're going to use CPU for this demo, and I'm going to set it to 75%. And while we're at it, let's change our min and max values. We'll set the minimum to two instances, and the maximum to three. And let's save this.

Okay, notice how it picked up on our new minimum, and it's adding another instance. So if the average CPU across the existing instances increases to 75%, then the autoscaler is going to add another server. And after a few minutes of the instances being below that threshold, they're going to start to be removed one at a time. Google determines how long to wait before removing machines, and it tries to ensure that new instances stay around for as long as they're needed.
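That "scale out quickly, scale in slowly" behavior can be sketched in a few lines of Python. This is purely illustrative: Google decides the real waiting period, so the `ScaleInDelay` class and its `window` of three evaluations are hypothetical stand-ins. The example uses the demo's minimum of two and maximum of three, so at most one instance is ever removed.

```python
from collections import deque

class ScaleInDelay:
    """Hold on to extra instances until demand has stayed low for a while."""

    def __init__(self, window):
        # Keep only the most recent `window` recommendations.
        self.recent = deque(maxlen=window)

    def decide(self, recommendation):
        self.recent.append(recommendation)
        # A higher recommendation takes effect right away; a lower one only
        # wins once every recommendation in the window has dropped.
        return max(self.recent)

delay = ScaleInDelay(window=3)
# Recommendations over five evaluations: a brief spike to 3, then back to 2.
sizes = [delay.decide(r) for r in [2, 3, 2, 2, 2]]
print(sizes)  # [2, 3, 3, 3, 2]
```

The third instance arrives as soon as the spike is seen, but it is only removed after three consecutive evaluations agree that two instances are enough.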

Alright, that's going to wrap up our lesson on auto scaling. In our next lesson we're going to cover load balancing. So, if you're ready to talk about load balancing, then let's get started with the next lesson.

About the Author

Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.