Adding Resiliency to Google Cloud Container Engine Clusters (GKE)
Resiliency should be a key component of any production system. GKE provides numerous features focused on adding resiliency to the containerized applications and services you deploy on Google Cloud Platform, allowing them to serve your users efficiently and reliably.
This course is for developers or operations engineers looking to expand their knowledge of GKE beyond the basics and start deploying resilient, production-quality containerized applications and services.
Viewers should have a good working knowledge of creating GKE container clusters and deploying images to those clusters.
- Understand the core concepts that make up resiliency.
- Maintain availability when updating a cluster.
- Make an existing cluster scalable.
- Monitor running clusters and their nodes.
This Course Includes
- 60 minutes of high-definition video
- Hands-on demos
What You'll Learn
- Course Intro: What to expect from this course
- Resiliency Defined: The key components of resiliency.
- Cluster Management: In this lesson, we’ll cover Rolling Updates, Resizing a Cluster, and Multi-zone Clusters.
- Scalability: The topics covered in this lesson include Load Balancing Traffic and Autoscaling a Cluster.
- Container Operations: In this lesson, we’ll demo Stackdriver monitoring and take a look at the Kubernetes dashboard.
- Summary: A wrap-up and summary of what we’ve learned in this course.
Earlier, we defined scalability as the ability to match capacity to demand, and we also stated that scalability is inextricably linked to resiliency. The example that we used was a scalable web application that works just as well with one user as with a million users, and that gracefully handles peaks and dips in traffic automatically by adding and removing nodes only when needed. This allows it to consume only the resources necessary to meet demand.
So now we want to look at how this is possible within Container Engine. The cluster autoscaler enables users to automatically resize clusters so that all scheduled pods have a place to run. If there are no resources in the cluster to schedule a recently created pod, then a new node is added.
On the other hand, if some node is underutilized, and all pods running on it can be easily moved elsewhere, then that node can be deleted. This feature allows users to pay only for resources that are actually needed, and to get new resources when demand actually increases. Now, as you can see from the bullets on the slide, there are caveats around using the autoscaler while it's still in beta. And I assume that most of these, such as the noted possibility of disruption, will be either completely remedied or greatly reduced once the autoscaler is actually released. Now this is really one of the cooler, and really one of the most applicable, pieces of functionality that GKE provides.
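The add-a-node and remove-a-node rules described above can be sketched as a toy decision function. This is only an illustration under simplifying assumptions, not how the real cluster autoscaler is implemented: it counts abstract "pod slots" instead of CPU and memory requests, and the names `Node`, `autoscaler_decision`, and the 50% `low_utilization` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: int  # number of pod slots on the node (a simplification; assumed > 0)
    pods: int = 0  # pods currently running on the node


def autoscaler_decision(nodes, pending_pods, min_nodes, max_nodes, low_utilization=0.5):
    """Return 'scale_up', 'scale_down', or 'none' for a toy cluster."""
    free_slots = sum(n.capacity - n.pods for n in nodes)

    # Pods are waiting and nothing has room for them: add a node if allowed.
    if pending_pods > free_slots:
        return "scale_up" if len(nodes) < max_nodes else "none"

    # Nothing is waiting: look for an underutilized node whose pods fit elsewhere.
    if pending_pods == 0 and len(nodes) > min_nodes:
        for node in nodes:
            free_elsewhere = free_slots - (node.capacity - node.pods)
            underutilized = node.pods / node.capacity < low_utilization
            if underutilized and node.pods <= free_elsewhere:
                return "scale_down"

    return "none"


busy = [Node("node-1", 4, 4), Node("node-2", 4, 4)]
quiet = [Node("node-1", 4, 1), Node("node-2", 4, 1), Node("node-3", 4, 0)]

print(autoscaler_decision(busy, pending_pods=1, min_nodes=2, max_nodes=10))
print(autoscaler_decision(quiet, pending_pods=0, min_nodes=2, max_nodes=10))
```

The minimum and maximum node counts act as hard limits here, which mirrors the min and max settings you give the autoscaler when you enable it.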
That being said, the demo on how to configure it is a little underwhelming, given how simple Google has made it. You can do this via the CLI, as you see in the command on the screen, where you take an existing cluster, enable autoscaling on it, and give it a minimum number of nodes, a maximum number of nodes, and a zone. In addition to being able to do this via the CLI, we are also going to look at how to do this within the portal.
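The command on the slide is not reproduced in this transcript, so the following is a hedged sketch of what such a call typically looks like, written as a small Python helper that builds the argument list. The cluster name `gke-resiliency`, the zone `us-central1-a`, and the node pool name `default-pool` are assumptions for illustration; the helper name `autoscaling_command` is hypothetical.

```python
import shlex


def autoscaling_command(cluster, zone, min_nodes, max_nodes, node_pool="default-pool"):
    """Build a gcloud command that enables autoscaling on an existing cluster."""
    if not 0 <= min_nodes <= max_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    return [
        "gcloud", "container", "clusters", "update", cluster,
        "--enable-autoscaling",
        f"--min-nodes={min_nodes}",
        f"--max-nodes={max_nodes}",
        f"--zone={zone}",
        f"--node-pool={node_pool}",
    ]


# Assumed values: cluster name, zone, and pool name are placeholders.
cmd = autoscaling_command("gke-resiliency", "us-central1-a", 2, 10)
print(shlex.join(cmd))
```

Building the command as a list keeps it easy to hand to `subprocess.run` in a build script, should you want to automate the change rather than paste it into a terminal.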
And it's important to note that every change that you make to the cluster autoscaler causes the Kubernetes master to restart, which takes several minutes to complete. We're back over in the GCP portal, and we're going to look at how easy it is to enable the autoscaler for an existing cluster. You can certainly do this as you're creating a cluster from scratch as well.
But we're going to go in and take a look at our GKE resiliency cluster. Right now, we see the total size is two, and this is set for the node pool. This is just a normal cluster; we haven't done anything special to it, we just created it. What we're going to do now is edit this cluster, go down, and take a look at autoscaling, which is obviously noted as being in beta. Currently it shows as being off, and this is set at the node pool level.
So we look down at this node pool, and we want to say yes, we do want to autoscale. It's as easy as turning autoscaling on and setting the min and max size. Let's put the maximum at 10 to allow us to autoscale, and we'll start at two, so we always have at least two nodes. Then it will scale all the way up to 10 as necessary. To commit that change, we're just going to click save, and this will take a little bit to set. What it's doing underneath the covers is actually updating the VMs, the compute instances, and the compute instance groups associated with our container cluster.
We've talked all about scalability, resiliency, having a ton of resources at our disposal, and having those distributed. But there's another side to this coin, another part of the conversation, and that's load balancing. If you're exposing an HTTP service hosted on Container Engine, HTTP load balancing is the recommended method for that functionality. It will route traffic to the appropriate resources based on both availability and geographic proximity, depending on how you have your load balancer set up.
Using load balancing, regardless of the specific technologies or implementation, allows your incoming traffic, or load, to be directed and spread across your various nodes. They share the burden, which increases your scalability, availability, and resiliency.
Specifically, the way we're going to implement this using Kubernetes and GKE is with something called an ingress, which is one of the main components of a Container Engine load balancing solution. It's a Kubernetes resource that encapsulates a collection of rules and configurations for routing your external HTTP traffic to your internal services.
You can see in the image that this is what's going to punch through and allow the public internet to access your services, specifically, in this case, your services running on Container Engine. This is achieved via a combination of the ingress resource and Google Cloud load balancing. When you create an ingress in your cluster, Container Engine creates an HTTP load balancer and configures it to route traffic to your application. Another note: while the Kubernetes ingress is technically a beta resource, the cloud load balancers that Container Engine provisions to implement the ingress are production ready. And this is an example of how we configure a very basic ingress, as a matter of fact I named it basic ingress, for an existing container cluster.
As with all other Kubernetes config, you'll see the boilerplate API version, kind, and metadata fields. The ingress spec has all of the information needed to configure a load balancer or proxy server. Most importantly, it contains a list of rules matched against all incoming requests, and currently the ingress resource only supports HTTP rules.
So what we've got here is our back end, which is a service and port combination, as described in the services doc. Ingress traffic is typically sent directly to the endpoints matching a back end. Here we have a service name and a service port, so this is nginx on service port 80. The command we use from the command line to actually implement this ingress is very simple: it's just kubectl apply, and we give it the ingress yaml.
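The slide itself is not part of this transcript, so here is a hedged reconstruction of what a basic ingress like the one described might contain, built as a Python dictionary and printed as JSON (which `kubectl apply -f` accepts alongside YAML). It assumes the beta-era `extensions/v1beta1` API that matches the "beta resource" remark in the lesson; current Kubernetes versions serve Ingress from `networking.k8s.io/v1` with a different field layout. The function name `basic_ingress` and the file name in the comment are hypothetical.

```python
import json


def basic_ingress(name, service_name, service_port):
    """Return a minimal single-backend Ingress manifest (beta-era schema)."""
    return {
        "apiVersion": "extensions/v1beta1",  # assumption: API version of that era
        "kind": "Ingress",
        "metadata": {"name": name},
        "spec": {
            # One default back end: a service name and port combination.
            "backend": {
                "serviceName": service_name,
                "servicePort": service_port,
            }
        },
    }


manifest = basic_ingress("basic-ingress", "nginx", 80)
print(json.dumps(manifest, indent=2))

# Saved to a file (for example basic-ingress.json, a placeholder name),
# it would be applied with: kubectl apply -f basic-ingress.json
```

With only a default back end and no rules, every incoming HTTP request is sent to the one service, which is what makes this the simplest possible ingress.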
Now I've got a sample that we're going to look at in real time and apply to one of the container clusters that we have created in GCP. We're back over in the portal, and we're in our Container Engine section, of course. But now we want to look at the load balancing portion of this, which is another tab that has not been there long.
So we want to look at discovery and load balancing, and on this tab, we also see this is in beta. We're going to see the discovery and load balancing information for our container clusters. In this case it's going to show the service type, and right now, we have both of our clusters exposed via load balancers, via Kubernetes. We're going to change one of those to an ingress. And just so you know, we've got our endpoints mapped here, so this is a public API. If you're in this view and you want to test your endpoint, you can just click on it, and it's going to open up another browser window.
And we're serving our service that gives us hello world. So back to the ingress: instead of having a load balancer, we want to apply that ingress. If we look over at our ingress, this is just what we saw on the slide a second ago, except it has our service name, GKE resiliency, which is going to point to one of our existing container clusters, and the service port of 80.
So if you remember, it's extremely simple to perform this operation. I'm going to cheat: I've got a script text, so I'm just going to copy the kubectl apply command and paste it in our terminal window. This is going to read that ingress, and then immediately apply it to the appropriate container cluster.
So we see, if we look back at our yaml file, that it is going to go to GKE resiliency. If we execute that, we get a created message, so the ingress is created now, and a lot of things are actually happening behind the scenes. If we go back to the portal, we should be able to see some of those changes happening right now. If we refresh, we should now see an ingress being created, called basic ingress, which we just looked at. This is in the process of being created, so we can dig in and look at the information for that ingress object. Right now, I'm not going to get too much, because it's just being created. But we'll see a little bit of metadata associated with it while it's being created.
We can also see, in addition to this view, that there are some other components within GCP that are going to be created as part of this. They may not be created yet, but I can at least show you where they will be. If we look over in our menu, we want to go down to networking. If we look at network services, and we look at load balancing, this is part of the functionality that is going to be created by GCP for us as we're creating our ingress. And here you'll see that GCP has now created for us a load balancer that is associated with our container cluster.
And we can go in and look at the information for that, and it'll give us some information about the ports, the host and path rules, as well as the instance group that this is all mapped to. This is a load balancer that is the same as if we had created it ourselves. It just happened to get created behind the scenes, and it is associated with the ingress object within Kubernetes that exists within our container cluster.
When we go through all of this to create a load balancer, and we want to expose our web server, or our web services, or whatever, on a domain name, we need a static external IP address that does not change. By default, Container Engine is going to allocate an ephemeral external IP address, and that is what gets exposed through the ingress when we create it by default. This means that the ephemeral address is subject to change.
So for services or an application that you're planning to host for a long time, you really should use a static address that does not change, that you can have mapped to a domain name. So you have the option to either create a new static IP address, or you can take that ephemeral address that it has created by default, and just convert it over to a static address.
Of course, like just about everything within GCP, you can do this via either the command line or the portal, so you can automate this as part of a build process. But let's take a look at how easy it is to flip this over in the portal. Probably the hardest part is just finding where the external static IP is and then changing it. We just need to find which one's associated with our container cluster, specifically with the ingress that we just created for the load balancing, and then flip that over from ephemeral to static.
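For the command-line route mentioned above, which the lesson does not demo, the two options (reserve a new static address, or promote the ephemeral one) might look roughly like the following. This is a sketch under assumptions: the address name `my-static` is borrowed from the portal demo, the IP `203.0.113.10` is a documentation placeholder rather than a real address, and the helper names are hypothetical.

```python
import shlex


def reserve_static_ip(name):
    """Build a gcloud command that reserves a new global static external IP."""
    return ["gcloud", "compute", "addresses", "create", name, "--global"]


def promote_ephemeral_ip(name, ip_address):
    """Build a gcloud command that reserves an in-use ephemeral IP under a name."""
    return [
        "gcloud", "compute", "addresses", "create", name,
        f"--addresses={ip_address}",
        "--global",
    ]


# Placeholder values for illustration only.
print(shlex.join(reserve_static_ip("my-static")))
print(shlex.join(promote_ephemeral_ip("my-static", "203.0.113.10")))
```

The `--global` flag is used here on the assumption that the address belongs to an HTTP load balancer created for an ingress; a regional address would take `--region` instead.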
So let's take a look at where this is. If we scroll almost all the way down, we'll look under networking, and it's not under network services, it's actually under VPC networks, and we want to look at external IP addresses. Once here, we need to look for the one that is in use by our load balancer. We know the name, so we could map this out if we wanted to and get here a different way, but we'll see that here's our basic ingress that we just created, so we know that this is in use by that. All we want to do, and this is a breeze, is take that type and change it from ephemeral to static. We're going to give it a new name, my static, nope, all lowercase.
Now we've got all lowercase, my static. The description is optional, so we don't have to worry about it. Once we click reserve, that's going to give us a static IP address that will not change, that we can count on to map to a domain name. We had that address that was ephemeral, that would change periodically, depending on when GCP needed to rotate it. But now we've got it static, and it will never change on us. As an alternative to the multi-step process that we just went through, we have the option to set up our static IP address at the same time that we create our ingress.
We can just include that as part of our configuration yaml, and then run that kubectl apply command, where we just apply the ingress yaml to our container cluster. And as part of the creation of that ingress, GCP will also associate a named static IP address with it. Once you run that, it will wait until the IP address of the application changes to use the reserved IP address of the resource that you specified. It may take a couple of minutes to update the existing ingress resource, reconfigure the load balancer, and propagate the load balancer rules across the globe. But once this operation completes, Container Engine releases the ephemeral IP address previously associated with your application.
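A hedged sketch of what "including that as part of the configuration" can look like: on GKE, an ingress can name a reserved global address through the `kubernetes.io/ingress.global-static-ip-name` annotation. The example below assumes an address named `my-static` has already been reserved, assumes the beta-era `extensions/v1beta1` schema of the lesson, and uses a hypothetical helper name and service name.

```python
import json

# Annotation GKE reads to attach a reserved global static IP by name.
STATIC_IP_ANNOTATION = "kubernetes.io/ingress.global-static-ip-name"


def ingress_with_static_ip(name, service_name, service_port, address_name):
    """Return an Ingress manifest that references a named static IP."""
    return {
        "apiVersion": "extensions/v1beta1",  # assumption: API version of that era
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "annotations": {STATIC_IP_ANNOTATION: address_name},
        },
        "spec": {
            "backend": {
                "serviceName": service_name,
                "servicePort": service_port,
            }
        },
    }


# Assumed names: the service and address names are placeholders.
static_manifest = ingress_with_static_ip("basic-ingress", "gke-resiliency", 80, "my-static")
print(json.dumps(static_manifest, indent=2))
```

Applying this over an existing ingress of the same name is what triggers the switch from the ephemeral address to the reserved one.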
To actually demo migrating workloads would be a little bit of a course in and of itself, or it would certainly be too long for this section. But I wanted to go ahead and mention this, because it's important to know that this is possible, and why you might want to do it. There will be times when you will want to migrate your workloads from one node pool to another that has different machine types. Remember that a node pool is a group of machines that all have the same configuration, including machine type, and that a cluster can contain multiple node pools.
For instance, you may realize, after running your application or services for several months, that each node really needs a higher amount of memory to handle in-memory processing. In this case, the only difference between the node pools is going to be that underlying machine type, where you need to switch to a higher-memory machine type to handle a different type of load, or a different type of processing, that you didn't necessarily know was going to happen before.
And the way you would do this is to create a new node pool to migrate to, and then essentially cycle through your nodes, using some command line operations to get the nodes and then migrate each one over to the new node pool. You do this, again, like a rolling update, so you do not have any downtime. This is a little different, because you're actually migrating to another node pool, as opposed to doing a rolling update within an existing node pool.
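Since the lesson does not demo the migration, here is a rough sketch of the command sequence it describes, generated by a small Python helper. Everything named here is an assumption for illustration: the cluster, zone, pool names, machine type, and node names are placeholders, and the helper names are hypothetical. The drain flags shown are those of recent kubectl versions; older versions used `--delete-local-data` in place of `--delete-emptydir-data`.

```python
import shlex

# Label GKE puts on each node to record which node pool it belongs to.
POOL_LABEL = "cloud.google.com/gke-nodepool"


def create_pool_command(cluster, zone, pool, machine_type, num_nodes):
    """Build a gcloud command that creates the node pool to migrate to."""
    return [
        "gcloud", "container", "node-pools", "create", pool,
        f"--cluster={cluster}", f"--zone={zone}",
        f"--machine-type={machine_type}", f"--num-nodes={num_nodes}",
    ]


def list_nodes_command(pool):
    """Build a kubectl command that lists the nodes in one node pool."""
    return ["kubectl", "get", "nodes", "-l", f"{POOL_LABEL}={pool}"]


def migration_commands(old_pool_nodes):
    """Cordon every old node first, then drain them one at a time."""
    cordon = [["kubectl", "cordon", node] for node in old_pool_nodes]
    drain = [
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data", "--force"]
        for node in old_pool_nodes
    ]
    return cordon + drain


# Placeholder names for illustration only.
steps = [create_pool_command("gke-resiliency", "us-central1-a", "highmem-pool", "n1-highmem-2", 2)]
steps.append(list_nodes_command("default-pool"))
steps += migration_commands(["old-node-1", "old-node-2"])
for step in steps:
    print(shlex.join(step))
```

Cordoning all old nodes before draining any of them keeps evicted pods from being rescheduled onto another node that is about to be drained, so they land in the new pool instead.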
Okay, now for our next section, we're going to talk about container operations. This is the section where we're going to look at monitoring, alerting on, and reacting to issues that may occur within our container cluster.
About the Author
Steve is a consulting technology leader for Slalom Atlanta, a Microsoft Regional Director, and a Google Certified Cloud Architect. His focus for the past 5+ years has been IT modernization and cloud adoption with implementations across Microsoft Azure, Google Cloud Platform, AWS, and numerous hybrid/private cloud platforms. Outside of work, Steve is an avid outdoorsman spending as much time as possible outside hiking, hunting, and fishing with his family of five.