Today, we’ll be building on our recent coverage of the Kubernetes Ecosystem to talk more in depth about Kops. This post is a complement to our Kubernetes webinar earlier this year and follows previous posts that cover deploying applications with Helm and creating and maintaining Kubernetes clusters with Kops. Let’s begin by addressing a basic question: What is Kops?
What is Kops?
Kops is an official Kubernetes project for managing production-grade Kubernetes clusters. Kops is currently the best tool to deploy Kubernetes clusters to Amazon Web Services. The project describes itself as kubectl for clusters.
If you’re familiar with kubectl, then you’ll feel at home with Kops. It has commands for creating clusters, updating their settings, and applying changes. Kops uses declarative configuration, so it’s smart enough to know how to apply infrastructure changes to existing clusters. It also has support for cluster operational tasks like scaling up nodes or horizontally scaling the cluster. Kops automates a large part of operating Kubernetes on AWS.
Before moving on to examples, let’s look at its key features:
- Deploy clusters to existing virtual private clouds (VPC) or create a new VPC from scratch
- Supports public & private topologies
- Provisions single or multiple master clusters
- Configurable bastion machines for SSH access to individual cluster nodes
- Built on a state-sync model for dry-runs and automatic idempotency
- Direct infrastructure manipulation, or works with CloudFormation and Terraform
- Rolling cluster updates
- Supports heterogeneous clusters by creating multiple instance groups
Check out this short ASCII cast demo for more info.
Now, we’ll tackle a common scenario: Create a cluster and configure it for your use case.
Creating Your First Kubernetes Cluster on AWS
You’ll need to configure IAM permissions and an S3 bucket for the KOPS_STATE_STORE. The KOPS_STATE_STORE is the source of truth for all clusters managed by Kops. You’ll need appropriate IAM permissions so that Kops can make API calls on your behalf. I won’t cover that in this post, but you can follow the instructions here.
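As a rough sketch of what pointing Kops at the state store looks like, the snippet below exports KOPS_STATE_STORE and checks that it has the s3:// form this post relies on. The bucket name example-kops-state and the check_state_store helper are hypothetical and exist only for illustration; substitute a bucket you own.

```shell
# Hypothetical bucket name; replace it with an S3 bucket you own.
STATE_BUCKET="example-kops-state"

# Kops reads the state store location from this environment variable.
export KOPS_STATE_STORE="s3://${STATE_BUCKET}"

# Illustrative guard (not part of Kops): fail early when the variable
# does not look like an s3://<bucket> URL.
check_state_store() {
  case "${KOPS_STATE_STORE:-}" in
    s3://?*) return 0 ;;
    *) echo "KOPS_STATE_STORE must look like s3://<bucket>" >&2; return 1 ;;
  esac
}

check_state_store && echo "state store: ${KOPS_STATE_STORE}"
```

Putting the export in your shell profile saves retyping it for every kops command.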
You’ll also need to configure DNS. Kops supports a variety of configurations. Each has its own setup instructions. AWS Route53 with an existing HostedZone is the easiest. We’ll assume that there is an existing AWS Route53 HostedZone for slashdeploy.com in these examples.
Kops cluster names must be valid DNS names. Let’s create the demo.slashdeploy.com cluster. Kops will also create DNS records for the Kubernetes API server at api.demo.slashdeploy.com and, if a bastion is enabled, at bastion.demo.slashdeploy.com. Keep in mind that DNS names may only be so long, so don’t use base cluster names that are too long.
Everything starts with kops create. You can pass options directly to the command or write a cluster spec file. We’ll use the command line options for this exercise. Using a dedicated file is great for source control and other forms of configuration management.
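Because the cluster name has to be a valid DNS name, a quick pre-flight check can catch a name that is too long before kops create runs. The helper below is a minimal sketch, not something Kops provides: is_valid_dns_name is a hypothetical function that applies the usual DNS limits (253 characters overall, 63 per label) and, as a simplifying assumption, only accepts lowercase letters, digits, and hyphens.

```shell
# Hypothetical helper (not part of Kops): returns 0 when the argument looks
# like a valid DNS name, 1 otherwise.
is_valid_dns_name() {
  name="$1"
  # Overall length must be between 1 and 253 characters.
  [ -n "$name" ] || return 1
  [ "${#name}" -le 253 ] || return 1
  # Allow only lowercase letters, digits, hyphens, and dots; reject empty labels.
  case "$name" in
    *[!a-z0-9.-]*|.*|*.|*..*) return 1 ;;
  esac
  # Walk the dot-separated labels one at a time.
  rest="$name"
  while [ -n "$rest" ]; do
    label="${rest%%.*}"
    case "$rest" in
      *.*) rest="${rest#*.}" ;;
      *)   rest="" ;;
    esac
    # Each label is at most 63 characters and has no leading or trailing hyphen.
    [ "${#label}" -le 63 ] || return 1
    case "$label" in
      -*|*-) return 1 ;;
    esac
  done
  return 0
}

if is_valid_dns_name "demo.slashdeploy.com"; then
  echo "cluster name looks valid"
fi
```

Remember that Kops prepends prefixes such as api. to the cluster name, so leave some headroom below the limits.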
kops create accepts many options. We’ll start with the simplest case by only supplying the required options.
$ kops create cluster \
    --yes \
    --zones=eu-west-1a,eu-west-1b,eu-west-1c \
    demo.slashdeploy.com
There are two required values.
--zones specifies the zones in which to create the infrastructure (availability zones on AWS). Here, eu-west-1a, eu-west-1b, and eu-west-1c are specified. This instructs Kops to create infrastructure in each of these eu-west-1 availability zones. This is important because Kops aims to create high availability production clusters. Multiple availability zones make the cluster more reliable by protecting against failures in one availability zone.
You must also specify the cluster name.
--yes confirms operations that normally prompt for confirmation.
kops create adds a kubectl configuration entry for the new cluster so you’re ready to use it right away. The command is asynchronous: it triggers infrastructure creation but does not block until creation completes. Luckily, Kops includes a command to validate a cluster. You can rerun this command until it succeeds.
$ kops validate cluster demo.slashdeploy.com
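Rerunning the validation by hand gets tedious, so a small retry loop can do it for you. This is a sketch under stated assumptions: retry_until_ok is a hypothetical helper, and the attempt count and delay in the commented example are arbitrary values rather than anything Kops recommends.

```shell
# Hypothetical helper: retry a command until it succeeds or the attempt
# limit is reached.
# Usage: retry_until_ok <max_attempts> <delay_seconds> <command> [args...]
retry_until_ok() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  # Run the command; on failure either give up or wait and try again.
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example (not run here): poll every 30 seconds, up to 20 times.
# retry_until_ok 20 30 kops validate cluster demo.slashdeploy.com
```

The loop relies only on the command's exit status, so it works with any command that exits non-zero while the cluster is not ready.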
When everything is complete, you should see something similar to the following:
$ kops validate cluster demo.slashdeploy.com
Validating cluster demo.slashdeploy.com

INSTANCE GROUPS
NAME               ROLE    MACHINETYPE  MIN  MAX  SUBNETS
master-eu-west-1a  Master  m3.medium    1    1    eu-west-1a
master-eu-west-1b  Master  m3.medium    1    1    eu-west-1b
master-eu-west-1c  Master  m3.medium    1    1    eu-west-1c
nodes              Node    t2.medium    2    2    eu-west-1a,eu-west-1b,eu-west-1c

NODE STATUS
NAME                                          ROLE    READY
ip-172-20-120-240.eu-west-1.compute.internal  master  True
ip-172-20-50-132.eu-west-1.compute.internal   master  True
ip-172-20-66-106.eu-west-1.compute.internal   master  True
ip-172-20-75-89.eu-west-1.compute.internal    node    True

Your cluster demo.slashdeploy.com is ready
Now, you’re ready to run any kubectl command such as kubectl get pods -n kube-system. The cluster is a bit strange because it has three masters and only a single worker. Let’s update the node instance group.
Modifying Cluster Infrastructure
kops behaves like kubectl. This means that you can use kops edit to edit the configuration files in your editor. The next step is to run kops update. This applies configuration changes, but does not modify running infrastructure. kops rolling-update manages updating or recreating infrastructure.
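The three commands above, run in order, can be sketched as a single shell function. This is an illustration rather than an official workflow: apply_instancegroup_change and the KOPS override variable are hypothetical names, and the kops invocations are the same ones used elsewhere in this post.

```shell
# KOPS defaults to the real binary. Setting KOPS="echo kops" prints the
# commands instead of running them (a convention of this sketch only).
KOPS="${KOPS:-kops}"

# Hypothetical wrapper: edit an instance group, then apply and roll the change.
apply_instancegroup_change() {
  ig="$1"
  # 1. Edit the instance group spec in your editor.
  # 2. Apply the new configuration to the cloud.
  # 3. Replace running instances if the change requires it.
  $KOPS edit instancegroup "$ig" &&
    $KOPS update cluster --yes &&
    $KOPS rolling-update cluster --yes
}

# Example (not run here): apply_instancegroup_change nodes
```

Chaining with && stops the sequence as soon as one step fails, so a cancelled edit never triggers an update.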
This process applies to all sorts of configuration changes: first edit, then update, and finally rolling-update. Let’s take this for a spin by editing the node instance group to increase the number of worker nodes.
$ kops edit instancegroup nodes
That will open a YAML file in your editor. You’ll see something similar to the following:
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2017-04-05T15:33:52Z"
  labels:
    kops.k8s.io/cluster: demo.slashdeploy.com
  name: nodes
spec:
  image: kope.io/k8s-1.5-debian-jessie-amd64-hvm-ebs-2017-01-09
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c
All we need to do is replace minSize and maxSize with appropriate values. I’ll set both values to 3 and save the file. This writes the updated file back to the KOPS_STATE_STORE. Now, we need to update the cluster. Again, we’ll supply --yes to confirm the changes.
$ kops update cluster --yes
Using cluster from kubectl context: demo.slashdeploy.com

I0422 07:34:58.458492 26834 executor.go:91] Tasks: 0 done / 114 total; 35 can run
I0422 07:34:59.990241 26834 executor.go:91] Tasks: 35 done / 114 total; 26 can run
I0422 07:35:01.211466 26834 executor.go:91] Tasks: 61 done / 114 total; 36 can run
I0422 07:35:04.215344 26834 executor.go:91] Tasks: 97 done / 114 total; 10 can run
I0422 07:35:04.845173 26834 dnsname.go:107] AliasTarget for "api.demo.slashdeploy.com." is "api-demo-1201911436.eu-west-1.elb.amazonaws.com."
I0422 07:35:05.045363 26834 executor.go:91] Tasks: 107 done / 114 total; 7 can run
I0422 07:35:05.438759 26834 executor.go:91] Tasks: 114 done / 114 total; 0 can run
I0422 07:35:05.438811 26834 dns.go:140] Pre-creating DNS records
I0422 07:35:06.707548 26834 update_cluster.go:204] Exporting kubecfg for cluster
Wrote config for demo.slashdeploy.com to "/home/ubuntu/.kube/config"
Kops has set your kubectl context to demo.slashdeploy.com

Cluster changes have been applied to the cloud.

Changes may require instances to restart: kops rolling-update cluster
Finally, apply the rolling-update.
$ kops rolling-update cluster --yes
Using cluster from kubectl context: demo.slashdeploy.com

NAME               STATUS  NEEDUPDATE  READY  MIN  MAX  NODES
bastions           Ready   0           1      1    1    0
master-eu-west-1a  Ready   0           1      1    1    1
master-eu-west-1b  Ready   0           1      1    1    1
master-eu-west-1c  Ready   0           1      1    1    1
nodes              Ready   0           3      3    3    3

No rolling-update required
That’s a bit strange. Kops says that there is no rolling-update required. This is true because we only changed the minimum and maximum number of instances in the nodes auto scaling group. This does not require any changes to the existing infrastructure. AWS simply triggers creation of two instances.
Let’s make another change that requires changing infrastructure. Imagine that the existing t2.medium instances are not cutting it. We need to scale up to meet workload requirements. To do that, we need to change the instance type. The same edit, update, and rolling-update process applies here. Let’s upgrade to m4.large. Repeat the exercise and replace t2.medium with m4.large, then apply the rolling-update. Now, Kops kills each node in order to trigger creation of an up-to-date node.
$ kops rolling-update cluster --yes
Using cluster from kubectl context: demo.slashdeploy.com

NAME               STATUS       NEEDUPDATE  READY  MIN  MAX  NODES
bastions           Ready        0           1      1    1    0
master-eu-west-1a  Ready        0           1      1    1    1
master-eu-west-1b  Ready        0           1      1    1    1
master-eu-west-1c  Ready        0           1      1    1    1
nodes              NeedsUpdate  3           0      3    3    3

I0422 07:42:31.615734 659 rollingupdate_cluster.go:281] Stopping instance "i-038cbac0aeaca24d4" in AWS ASG "nodes.demo.slashdeploy.com"
I0422 07:44:31.920426 659 rollingupdate_cluster.go:281] Stopping instance "i-046fe9866a3b51fe6" in AWS ASG "nodes.demo.slashdeploy.com"
I0422 07:46:33.539412 659 rollingupdate_cluster.go:281] Stopping instance "i-07f924becaa46d2ab" in AWS ASG "nodes.demo.slashdeploy.com"
Caution though! Current versions (<= 1.6) do not yet perform a real rolling update. Kops just shuts down machines in sequence with a delay, so there will be downtime (Issue #37). A new, experimental feature that drains and validates nodes has been implemented; you can enable it by setting export KOPS_FEATURE_FLAGS="+DrainAndValidateRollingUpdate". The downtime should be fixed in a future release.
This same process applies to infrastructure and configuration (such as kubelet flags or API server flags). The documentation covers specific cases:
- Changing root volume size
- Using spot instances
- Configuring kubelet flags
- Configuring api-server flags
As always, you can refer to the documentation for complete information.
Custom Cluster Infrastructures
Our example covered the simplest case, but this does not apply to all scenarios. Let’s walk through the different options available to kops create cluster.
$ kops create cluster --help
Creates a k8s cluster.

Usage:
  kops create cluster [flags]

Flags:
      --admin-access stringSlice             Restrict access to admin endpoints (SSH, HTTPS) to this CIDR. If not set, access will not be restricted by IP. (default [0.0.0.0/0])
      --associate-public-ip                  Specify --associate-public-ip=[true|false] to enable/disable association of public IP for master ASG and nodes. Default is 'true'.
      --bastion                              Pass the --bastion flag to enable a bastion instance group. Only applies to private topology.
      --channel string                       Channel for default versions and configuration to use (default "stable")
      --cloud string                         Cloud provider to use - gce, aws
      --dns string                           DNS hosted zone to use: public|private. Default is 'public'. (default "Public")
      --dns-zone string                      DNS hosted zone to use (defaults to longest matching zone)
      --image string                         Image to use
      --kubernetes-version string            Version of kubernetes to run (defaults to version in channel)
      --master-count int32                   Set the number of masters. Defaults to one master per master-zone
      --master-security-groups stringSlice   Add precreated additional security groups to masters.
      --master-size string                   Set instance size for masters
      --master-zones stringSlice             Zones in which to run masters (must be an odd number)
      --model string                         Models to apply (separate multiple models with commas) (default "config,proto,cloudup")
      --network-cidr string                  Set to override the default network CIDR
      --networking string                    Networking mode to use. kubenet (default), classic, external, cni, kopeio-vxlan, weave, calico. (default "kubenet")
      --node-count int32                     Set the number of nodes
      --node-security-groups stringSlice     Add precreated additional security groups to nodes.
      --node-size string                     Set instance size for nodes
      --out string                           Path to write any local output
      --project string                       Project to use (must be set on GCE)
      --ssh-public-key string                SSH public key to use (default "~/.ssh/id_rsa.pub")
      --target string                        Target - direct, terraform (default "direct")
  -t, --topology string                      Controls network topology for the cluster. public|private. Default is 'public'. (default "public")
      --vpc string                           Set to use a shared VPC
      --yes                                  Specify --yes to immediately create the cluster
      --zones stringSlice                    Zones in which to run the cluster
- --network-cidr can be used when deploying to an existing AWS VPC.
- --bastion generates a dedicated SSH jump host for SSH access to cluster instances. This is best used with --topology=private.
- --master-zones specifies all of the zones where masters run. This is key for HA setups.
- --networking sets the default network. Note that your particular choice depends on your requirements and may not work with every --topology.
- --topology sets the network topology of the cluster (public or private). I prefer --bastion --topology=private --associate-public-ip=false --networking=weave to keep the clusters inaccessible on the public internet.
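Put together, the preferred flag combination above might look like the following sketch. The function name create_private_cluster and the KOPS override variable are hypothetical; every flag comes from the help output shown earlier, and reusing the same zones for the masters is an assumption made for this example.

```shell
# KOPS defaults to the real binary. Setting KOPS="echo kops" prints the
# command instead of running it (a convention of this sketch only).
KOPS="${KOPS:-kops}"

# Hypothetical wrapper around kops create cluster for a private topology.
# $1: cluster name, $2: comma-separated zones (also used for the masters).
create_private_cluster() {
  cluster_name="$1"
  zones="$2"
  # --yes is left out on purpose so the result can be reviewed first.
  $KOPS create cluster \
    --bastion \
    --topology=private \
    --associate-public-ip=false \
    --networking=weave \
    --zones="$zones" \
    --master-zones="$zones" \
    "$cluster_name"
}

# Example (not run here):
# create_private_cluster demo.slashdeploy.com eu-west-1a,eu-west-1b,eu-west-1c
```

Per the help text, --yes is what creates the cluster immediately, so add it once you are happy with the configuration.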
Kops is one of the best tools that we have right now to manage Kubernetes clusters. Kops, like everything else in the Kubernetes ecosystem, is changing rapidly. The #kops channel on the Kubernetes Slack team is the best place to interact with other users. The people behind it are actively fixing bugs, introducing new features, and accepting proposals from the community. They also set aside an hour every other week to offer help and guidance to the community. They work with newcomers, help with PRs, and discuss new features; anything goes.
Add something to the agenda. They hold office hours (on Zoom video conferences) on Fridays at 5 p.m. UTC/9 a.m. US Pacific Time every other week, on odd weeks. I also recommend that you read through the issue tracker to get a feel for the known issues and more importantly the missing features.
Kops can do a lot, but it may not do everything for your use case, so be sure to do your research before diving in head first. One notable omission is the lack of pre/post install hooks for node configuration. This is required for things like pre-pulling images or installing software on nodes. This was recently fixed in a pull request, but there is no timeline for the next release at this point in time.
Kops, Kubernetes, containers, Docker and more are also discussed in the CloudAcademy 2017 office hours.
Stay tuned on the Cloud Academy blog for more Kubernetes!