Kubernetes Operations with Kops

Today, we’ll be building on our recent coverage of the Kubernetes ecosystem to talk in more depth about Kops. This post is a complement to our Kubernetes webinar earlier this year and follows previous posts that cover deploying applications with Helm and creating and maintaining Kubernetes clusters with Kops. Let’s begin by addressing a basic question: What is Kops?

What is Kops?

Kops is an official Kubernetes project for managing production-grade Kubernetes clusters. Kops is currently the best tool to deploy Kubernetes clusters to Amazon Web Services. The project describes itself as kubectl for clusters.

If you’re familiar with kubectl, then you’ll feel at home with Kops. It has commands for creating clusters, updating their settings, and applying changes. Kops uses declarative configuration, so it’s smart enough to know how to apply infrastructure changes to existing clusters. It also has support for cluster operational tasks like scaling up nodes or horizontally scaling the cluster. Kops automates a large part of operating Kubernetes on AWS.
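
If you’ve used kubectl’s verb-style commands, the core Kops workflow will feel familiar. Here’s a rough map of the commands used throughout this post, with flags and cluster names omitted; kops delete is included for completeness but isn’t covered below.

$ kops create cluster          # declare and create a new cluster
$ kops edit cluster            # edit the cluster spec in your editor
$ kops edit instancegroup      # edit an instance group spec
$ kops update cluster          # apply spec changes to cloud resources
$ kops rolling-update cluster  # replace instances so they pick up changes
$ kops validate cluster        # check that the cluster is healthy
$ kops delete cluster          # tear the cluster down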

Before moving on to examples, let’s look at its key features:

  • Deploy clusters to existing virtual private clouds (VPC) or create a new VPC from scratch
  • Supports public & private topologies
  • Provisions single-master or multi-master clusters
  • Configurable bastion machines for SSH access to individual cluster nodes
  • Built on a state-sync model for dry-runs and automatic idempotency
  • Manipulates infrastructure directly, or works with CloudFormation and Terraform
  • Rolling cluster updates
  • Supports heterogeneous clusters by creating multiple instance groups

Check out this short asciicast demo for more info.
Now, we’ll tackle a common scenario: Create a cluster and configure it for your use case.

Creating Your First Kubernetes Cluster on AWS

You’ll need to configure IAM permissions and an S3 bucket for the KOPS_STATE_STORE. The KOPS_STATE_STORE is the source of truth for all clusters managed by Kops. You’ll need appropriate IAM permissions so that Kops can make API calls on your behalf. I won’t cover that in this post, but you can follow the instructions here.
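
For reference, the state store setup usually boils down to a few AWS CLI calls. This is only a sketch, and the bucket name here is just an example:

# Create a dedicated S3 bucket for the Kops state store.
$ aws s3api create-bucket \
	--bucket slashdeploy-kops-state \
	--region eu-west-1 \
	--create-bucket-configuration LocationConstraint=eu-west-1
# Versioning lets you recover previous cluster state if something goes wrong.
$ aws s3api put-bucket-versioning \
	--bucket slashdeploy-kops-state \
	--versioning-configuration Status=Enabled
# Point Kops at the bucket.
$ export KOPS_STATE_STORE=s3://slashdeploy-kops-state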

You’ll also need to configure DNS. Kops supports a variety of configurations. Each has its own setup instructions. AWS Route53 with an existing HostedZone is the easiest. We’ll assume that there is an existing AWS Route53 HostedZone for slashdeploy.com in these examples.
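
A quick way to confirm the hosted zone exists and that its name servers resolve, assuming the AWS CLI and dig are installed:

$ aws route53 list-hosted-zones-by-name --dns-name slashdeploy.com
$ dig ns slashdeploy.com
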
Kops cluster names must be valid DNS names. Let’s create the demo.slashdeploy.com cluster. Kops will also create DNS records for the Kubernetes API server at api.demo.slashdeploy.com and for bastion.demo.slashdeploy.com. Keep in mind that DNS names have length limits, so don’t choose a base cluster name that is too long.

Everything starts with kops create. You can pass options directly on the command line or write a cluster spec file. We’ll use command line options for this exercise; a dedicated file is great for source control and other forms of configuration management. kops create accepts many options, so we’ll start with the simplest case by only supplying the required ones.

$ kops create cluster \
	--yes \
	--zones=eu-west-1a,eu-west-1b,eu-west-1c \
	demo.slashdeploy.com

There are two required values. --zones specifies the AWS availability zones where Kops should create infrastructure. Here, eu-west-1a, eu-west-1b, and eu-west-1c are specified, which instructs Kops to create infrastructure in each eu-west-1 availability zone. This matters because Kops aims to create highly available production clusters: spreading across multiple availability zones makes the cluster more resilient to the failure of any single zone.

You must also specify the cluster name. --yes confirms operations that would normally prompt for confirmation. kops create adds a kubectl configuration entry for the new cluster, so you’re ready to use it right away. The command is asynchronous: it triggers infrastructure creation but does not wait for it to complete. Luckily, Kops includes a command to validate a cluster. You can rerun it until it succeeds.

$ kops validate cluster demo.slashdeploy.com

When everything is complete, you should see something similar to the following:

$ kops validate cluster demo.slashdeploy.com
Validating cluster demo.slashdeploy.com
INSTANCE GROUPS
NAME                    ROLE    MACHINETYPE     MIN     MAX     SUBNETS
master-eu-west-1a       Master  m3.medium       1       1       eu-west-1a
master-eu-west-1b       Master  m3.medium       1       1       eu-west-1b
master-eu-west-1c       Master  m3.medium       1       1       eu-west-1c
nodes                   Node    t2.medium       2       2       eu-west-1a,eu-west-1b,eu-west-1c
NODE STATUS
NAME                                            ROLE    READY
ip-172-20-120-240.eu-west-1.compute.internal    master  True
ip-172-20-50-132.eu-west-1.compute.internal     master  True
ip-172-20-66-106.eu-west-1.compute.internal     master  True
ip-172-20-75-89.eu-west-1.compute.internal      node    True
Your cluster demo.slashdeploy.com is ready

Now, you’re ready to run any kubectl command against the new cluster.
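
For example, a couple of quick checks (node and pod names will differ in your account):

$ kubectl get nodes
$ kubectl get pods -n kube-system

The cluster is a bit strange because it has three masters and only a single worker. Let’s update the node instance group.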

Modifying Cluster Infrastructure

Remember that kops behaves like kubectl. This means you can run kops edit to modify the configuration files in your editor. The next step is to run kops update, which applies configuration changes but does not modify running infrastructure. kops rolling-update manages updating or recreating infrastructure.

This process applies to all sorts of configuration changes. First edit, then update, and finally rolling-update. Let’s take this for a spin by editing the node instance group to increase the number of worker nodes.

$ kops edit instancegroup nodes

That will open a YAML file in your editor. You’ll see something similar to the following:

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2017-04-05T15:33:52Z"
  labels:
    kops.k8s.io/cluster: demo.slashdeploy.com
  name: nodes
spec:
  image: kope.io/k8s-1.5-debian-jessie-amd64-hvm-ebs-2017-01-09
  machineType: t2.medium
  maxSize: 1
  minSize: 1
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c

All we need to do is replace minSize and maxSize with appropriate values. I’ll set both values to 3 and save the file, which writes the updated spec back to the KOPS_STATE_STORE.
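
If you’d rather keep instance group specs in source control (as mentioned earlier), a rough non-interactive equivalent looks like the following. This is only a sketch: it assumes kops get and kops replace are available in your version and that the cluster name is picked up from your kubectl context, as in the other examples.

# Export the current instance group spec to a file.
$ kops get instancegroup nodes -o yaml > nodes.yaml
# Bump minSize and maxSize from 1 to 3, then write the spec back to the state store.
$ sed -i 's/minSize: 1/minSize: 3/; s/maxSize: 1/maxSize: 3/' nodes.yaml
$ kops replace -f nodes.yaml

Either way, the next step is to update the cluster. Again, we’ll supply --yes to confirm the changes.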

$ kops update cluster --yes
Using cluster from kubectl context: demo.slashdeploy.com
I0422 07:34:58.458492   26834 executor.go:91] Tasks: 0 done / 114 total; 35 can run
I0422 07:34:59.990241   26834 executor.go:91] Tasks: 35 done / 114 total; 26 can run
I0422 07:35:01.211466   26834 executor.go:91] Tasks: 61 done / 114 total; 36 can run
I0422 07:35:04.215344   26834 executor.go:91] Tasks: 97 done / 114 total; 10 can run
I0422 07:35:04.845173   26834 dnsname.go:107] AliasTarget for "api.demo.slashdeploy.com." is "api-demo-1201911436.eu-west-1.elb.amazonaws.com."
I0422 07:35:05.045363   26834 executor.go:91] Tasks: 107 done / 114 total; 7 can run
I0422 07:35:05.438759   26834 executor.go:91] Tasks: 114 done / 114 total; 0 can run
I0422 07:35:05.438811   26834 dns.go:140] Pre-creating DNS records
I0422 07:35:06.707548   26834 update_cluster.go:204] Exporting kubecfg for cluster
Wrote config for demo.slashdeploy.com to "/home/ubuntu/.kube/config"
Kops has set your kubectl context to demo.slashdeploy.com
Cluster changes have been applied to the cloud.
Changes may require instances to restart: kops rolling-update cluster

Finally, apply the rolling-update.

$ kops rolling-update cluster --yes
Using cluster from kubectl context: demo.slashdeploy.com
NAME                    STATUS  NEEDUPDATE      READY   MIN     MAX     NODES
bastions                Ready   0               1       1       1       0
master-eu-west-1a       Ready   0               1       1       1       1
master-eu-west-1b       Ready   0               1       1       1       1
master-eu-west-1c       Ready   0               1       1       1       1
nodes                   Ready   0               3       3       3       3
No rolling-update required

That’s a bit strange: Kops says that no rolling update is required. This is true because we only changed the minimum and maximum number of instances in the nodes Auto Scaling group, which does not require any changes to the existing instances. AWS simply launches two additional instances.
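
Once the new instances register (this can take a few minutes), you can confirm the extra workers by re-running validation or listing nodes:

$ kops validate cluster demo.slashdeploy.com
$ kubectl get nodes
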
Let’s make another change that does require modifying infrastructure. Imagine that the existing t2.medium instances are not cutting it and we need to scale up to meet workload requirements. To do that, we need to change the instance type. The same edit, update, and rolling-update process applies here. Let’s upgrade to m4.large: repeat the exercise, replace t2.medium with m4.large, and then apply the rolling-update. This time Kops terminates each node in turn so that an up-to-date replacement is created.
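
For reference, the steps before the rolling update look like this (the machineType change itself happens in your editor):

# Change machineType from t2.medium to m4.large in the spec.
$ kops edit instancegroup nodes
# Apply the configuration change; running nodes are only replaced by the rolling update.
$ kops update cluster --yes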

$ kops rolling-update cluster --yes
Using cluster from kubectl context: demo.slashdeploy.com
NAME                    STATUS          NEEDUPDATE      READY   MIN     MAX     NODES
bastions                Ready           0               1       1       1       0
master-eu-west-1a       Ready           0               1       1       1       1
master-eu-west-1b       Ready           0               1       1       1       1
master-eu-west-1c       Ready           0               1       1       1       1
nodes                   NeedsUpdate     3               0       3       3       3
I0422 07:42:31.615734     659 rollingupdate_cluster.go:281] Stopping instance "i-038cbac0aeaca24d4" in AWS ASG "nodes.demo.slashdeploy.com"
I0422 07:44:31.920426     659 rollingupdate_cluster.go:281] Stopping instance "i-046fe9866a3b51fe6" in AWS ASG "nodes.demo.slashdeploy.com"
I0422 07:46:33.539412     659 rollingupdate_cluster.go:281] Stopping instance "i-07f924becaa46d2ab" in AWS ASG "nodes.demo.slashdeploy.com"

A word of caution though: current versions (<= 1.6) do not yet perform a true rolling update. Kops simply shuts down machines in sequence with a delay, so there will be downtime (see issue #37). A new feature that drains and validates nodes has been implemented. It is still experimental; you can enable it by setting export KOPS_FEATURE_FLAGS="+DrainAndValidateRollingUpdate". This should be fixed in a future release.
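
If you want to try the experimental behavior, set the feature flag from above before running the rolling update:

$ export KOPS_FEATURE_FLAGS="+DrainAndValidateRollingUpdate"
$ kops rolling-update cluster --yes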

This same process applies to other infrastructure and configuration changes (such as kubelet flags or API server flags). The documentation covers the specific cases.

As always, you can refer to the documentation for complete information.

Custom Cluster Infrastructures

Our example covered the simplest case, but that does not fit all scenarios. Let’s walk through the different options available to kops create cluster.

$ kops create cluster --help
Creates a k8s cluster.
Usage:
  kops create cluster [flags]
Flags:
      --admin-access stringSlice             Restrict access to admin endpoints (SSH, HTTPS) to this CIDR.  If not set, access will not be restricted by IP. (default [0.0.0.0/0])
      --associate-public-ip                  Specify --associate-public-ip=[true|false] to enable/disable association of public IP for master ASG and nodes. Default is 'true'.
      --bastion                              Pass the --bastion flag to enable a bastion instance group. Only applies to private topology.
      --channel string                       Channel for default versions and configuration to use (default "stable")
      --cloud string                         Cloud provider to use - gce, aws
      --dns string                           DNS hosted zone to use: public|private. Default is 'public'. (default "Public")
      --dns-zone string                      DNS hosted zone to use (defaults to longest matching zone)
      --image string                         Image to use
      --kubernetes-version string            Version of kubernetes to run (defaults to version in channel)
      --master-count int32                   Set the number of masters.  Defaults to one master per master-zone
      --master-security-groups stringSlice   Add precreated additional security groups to masters.
      --master-size string                   Set instance size for masters
      --master-zones stringSlice             Zones in which to run masters (must be an odd number)
      --model string                         Models to apply (separate multiple models with commas) (default "config,proto,cloudup")
      --network-cidr string                  Set to override the default network CIDR
      --networking string                    Networking mode to use.  kubenet (default), classic, external, cni, kopeio-vxlan, weave, calico. (default "kubenet")
      --node-count int32                     Set the number of nodes
      --node-security-groups stringSlice     Add precreated additional security groups to nodes.
      --node-size string                     Set instance size for nodes
      --out string                           Path to write any local output
      --project string                       Project to use (must be set on GCE)
      --ssh-public-key string                SSH public key to use (default "~/.ssh/id_rsa.pub")
      --target string                        Target - direct, terraform (default "direct")
  -t, --topology string                      Controls network topology for the cluster. public|private. Default is 'public'. (default "public")
      --vpc string                           Set to use a shared VPC
      --yes                                  Specify --yes to immediately create the cluster
      --zones stringSlice                    Zones in which to run the cluster
  • --vpc and --network-cidr can be used when deploying to an existing AWS VPC.
  • --bastion generates a dedicated SSH jump host for SSH access to cluster instances. This is best used with --associate-public-ip=false.
  • --master-zones specifies all of the zones where masters run. This is key for HA setups.
  • --networking sets the cluster’s networking mode. Note that your particular choice depends on your requirements, and not every networking mode works with every --topology.
  • --topology controls whether the cluster uses public or private networking. I prefer --bastion --topology=private --associate-public-ip=false --networking=weave to keep clusters inaccessible from the public internet; see the sketch after this list.
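
Putting those flags together, a private-topology cluster along those lines could be created with something like the following. This is a sketch; the zones, node count, and cluster name are illustrative.

$ kops create cluster \
	--topology=private \
	--networking=weave \
	--bastion \
	--associate-public-ip=false \
	--master-zones=eu-west-1a,eu-west-1b,eu-west-1c \
	--zones=eu-west-1a,eu-west-1b,eu-west-1c \
	--node-count=3 \
	demo.slashdeploy.com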

Next Steps

Kops is one of the best tools we have right now to manage Kubernetes clusters. Kops, like everything else in the Kubernetes ecosystem, is changing rapidly. The #kops channel on the Kubernetes Slack team is the best place to interact with other users. The people behind it are actively fixing bugs, introducing new features, and accepting proposals from the community. They also set aside an hour every other week to offer help and guidance to the community. They work with newcomers, help with PRs, and discuss new features; anything goes.

They hold office hours on Zoom video conferences every other week (on odd weeks) on Fridays at 5 p.m. UTC / 9 a.m. US Pacific Time, so add something to the agenda if there is anything you’d like to discuss. I also recommend that you read through the issue tracker to get a feel for the known issues and, more importantly, the missing features.

Kops can do a lot, but it may not do everything for your use case, so be sure to do your research before diving in head first. One notable omission is the lack of pre/post-install hooks for node configuration, which are needed for things like pre-pulling images or installing software on nodes. This was recently fixed in a pull request, but there is no timeline for the next release at this point.

Kops, Kubernetes, containers, Docker, and more are also discussed in the Cloud Academy 2017 office hours.
Stay tuned on the Cloud Academy blog for more Kubernetes!


Written by

Adam Hawkins

Passionate traveler (currently in Bangalore, India), Trance addict, Devops, Continuous Deployment advocate. I lead the SRE team at Saltside where we manage ~400 containers in production. I also manage Slashdeploy.

