Kubernetes: Ecosystem & Production Operations

If you have been following our introductory webinar series on Kubernetes, we recently wrapped up part two, Ecosystem & Production Operations, which you can watch here.
In part two, we covered production preparedness, application packaging and cluster ops, and the wider Kubernetes ecosystem and tools. In this recap, I’ll also expand on some of the areas covered in the live event.
Let’s start by digging into the Kubernetes features that are important in a production environment.

Production Resources

  • DaemonSet: A DaemonSet ensures that a copy of a pod is automatically scheduled on every node in the cluster. This is especially useful for running per-node agents, such as a monitoring agent or a log collector like Fluentd. This is my preferred way to collect telemetry data.
  • StatefulSet: Prior to Kubernetes 1.5, this was referred to as PetSet. A StatefulSet manages a group of pods, and it adds ordering and other stateful guarantees. You can use a StatefulSet to run a database setup. Consider MongoDB: you can define a StatefulSet to bring up an arbiter, primary, and secondary in that order. However, I would recommend that you focus on stateless workloads for now. This area is actively developing and will naturally mature over time.
  • Ingress: As of Kubernetes 1.5, this is a beta API. You can think of an Ingress as a sort of proxy for pods. You may use it as a firewall, to handle virtual hosting, or even as a quasi API gateway. I recommend reading the Ingress guide. Ingress resources will be big going forward and will certainly change the way you deploy applications with Kubernetes!
  • Job & CronJob: What would an application be without some batch processing and recurring reports? A Job creates one or more pods and tracks the execution of the job across all of their containers. Jobs can also be run on a schedule with the CronJob resource.
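
To make two of these resources concrete, here is a minimal sketch in Python that builds a DaemonSet and a CronJob manifest as plain dictionaries and prints them as JSON, which kubectl accepts alongside YAML. It assumes a current cluster where these kinds are served from apps/v1 and batch/v1 (on the 1.5-era clusters discussed here they lived under earlier, pre-stable API versions). The names, images, and schedule are hypothetical placeholders.

```python
import json


def daemonset(name: str, image: str) -> dict:
    """Return a DaemonSet manifest that runs one agent pod on every node."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",  # assumption: a current cluster
        "kind": "DaemonSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def cronjob(name: str, image: str, schedule: str) -> dict:
    """Return a CronJob manifest that runs a batch container on a cron schedule."""
    return {
        "apiVersion": "batch/v1",  # assumption: a current cluster
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,  # standard five-field cron expression
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [{"name": name, "image": image}],
                            # Job pods must use OnFailure or Never.
                            "restartPolicy": "OnFailure",
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names and images, for illustration only.
    items = [
        daemonset("log-agent", "example.com/log-agent:1.0"),
        cronjob("nightly-report", "example.com/report:1.0", "0 2 * * *"),
    ]
    # Wrapping the manifests in a List lets both be applied from one file.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Saving the output to a file and passing it to kubectl apply -f would create both resources in one step.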

Odds are, you’ll need one or two of these to build a full-featured application. Now, let’s take a look at production practices to keep everything in tip-top shape.

Production Practices

  • Set resource requests and limits: Setting requests ensures that pods either get the compute resources they need or fail to schedule. Setting limits ensures that containers cannot consume more compute resources than expected.
  • Separate critical and non-critical workloads: This practice increases resource utilization. Non-critical containers set a limit and may be thrown anywhere in the cluster. Critical loads can be guaranteed their minimum compute resources and may be scheduled appropriately.
  • Node selectors and node name: This point is a continuation of the previous two. These two settings tell the scheduler which nodes a pod may run on. For example, one entire CPU on a c4.xlarge instance type is not the same as one on a t2.small. You can use these settings to place containers on an appropriate node. This is especially useful for heterogeneous clusters or when workloads are divided into critical/non-critical.
  • Set liveness and readiness probes: This is mandatory for production environments! Liveness probes can automatically restart broken containers. Readiness probes test that a container is ready to receive traffic. Together, they ensure that container health is judged by the checks you define, and not by generic container semantics.
  • Add telemetry: Remember, it’s not in production until it is being monitored, and you cannot have monitoring without telemetry data. Given the large variety of metric collection tools, it doesn’t matter which one you use, just pick one and go with it. It’s easy to deploy a DaemonSet for your chosen agent. Be sure to collect cluster CPU/memory percentage and the same for pods. Decide on a headroom and create an action plan for when the threshold is breached.
  • Prep for Cluster Admin: Read and understand the administration guide. It covers Kubernetes version upgrades, node maintenance, and managing API versions. I can assure you that you’ll need the first two at some point during production operations. It’s best to prepare before something happens so that you have a general idea of how it may impact your system.
  • Plan for Availability: You may consider a multi-master setup or even federated clusters. You will want to make sure that you have a solid understanding of your availability requirements, the potential risks, the failure modes that you can expect, and a plan for resolving issues.
  • Keep resource definitions (YAML/JSON) under source control: These are important files, and changes must be tracked. Ideally, they are kept in the same repo as the source code. Remember, these are just YAML/JSON files, so they can be linted and verified during CI. You don’t want mistakes in these breaking your deployments!
  • Secure your cluster: The Kubernetes official documentation provides in-depth coverage of this topic. There are multiple ways to implement authorization. Choose one and set it up before going to production!
  • Back up etcd data: Kubernetes stores all data in etcd. Trust me; you don’t want to lose this.
  • Configure centralized logging: You can install Fluentd or similar to collect logs from all containers and ship them off to a central system. You can even run that system (ELK, for example) on Kubernetes! Just make sure you have a strategy in place to collect logs from all nodes and containers in a central place. You will need this!
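
Several of the practices above (requests and limits, node selectors, probes, and verifying definitions in CI) can be sketched together. The following is one possible check, not an official tool: a Deployment manifest with a hypothetical name, image, probe paths, and node label, plus a small lint_containers function that a CI step could run to flag containers that are missing requests, limits, or probes.

```python
from typing import List


def lint_containers(manifest: dict) -> List[str]:
    """Return a list of problems found in a Deployment-style manifest."""
    problems = []
    pod_spec = manifest["spec"]["template"]["spec"]
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources", {})
        # Every container should declare CPU and memory requests and limits.
        for section in ("requests", "limits"):
            for resource in ("cpu", "memory"):
                if resource not in resources.get(section, {}):
                    problems.append(f"{name}: missing resources.{section}.{resource}")
        # Every container should define both probes.
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                problems.append(f"{name}: missing {probe}")
    return problems


# Hypothetical Deployment; names, image, paths, and the node label are placeholders.
DEPLOYMENT = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                # Only schedule onto nodes labeled workload=critical.
                "nodeSelector": {"workload": "critical"},
                "containers": [
                    {
                        "name": "web",
                        "image": "example.com/web:1.0",
                        "ports": [{"containerPort": 8080}],
                        "resources": {
                            "requests": {"cpu": "250m", "memory": "256Mi"},
                            "limits": {"cpu": "500m", "memory": "512Mi"},
                        },
                        "livenessProbe": {
                            "httpGet": {"path": "/healthz", "port": 8080},
                            "initialDelaySeconds": 10,
                            "periodSeconds": 10,
                        },
                        "readinessProbe": {
                            "httpGet": {"path": "/ready", "port": 8080},
                            "periodSeconds": 5,
                        },
                    }
                ],
            },
        },
    },
}

if __name__ == "__main__":
    issues = lint_containers(DEPLOYMENT)
    for issue in issues:
        print(issue)
    # A non-zero exit code fails the CI step when problems are found.
    raise SystemExit(1 if issues else 0)
```

A real pipeline would load the manifests from the repository instead of defining them inline, but the check itself stays the same.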

Kubernetes Ecosystem

There is a vibrant ecosystem around Kubernetes. Here, we’ll focus on tools for cluster infrastructure management and application packaging.

Cluster operations, or cluster ops, generally refers to the work required to provision, maintain, and scale Kubernetes clusters. I’ll be honest, this is one of my favorite technical areas, and I think you’ll like it too. Let’s start at the beginning. Clusters don’t just spring into existence. They must be created.
Kubernetes is a distributed system in itself, and it’s non-trivial to build from scratch. Kelsey Hightower’s tutorial “Kubernetes the Hard Way” covers everything you need to build and run K8S from scratch. This is a fabulous resource if you want to get really down and dirty and learn it all. Most of us, myself included, consider this a reference manual rather than a tutorial. Check it out, but be aware that it’s detailed and long. Building clusters for real use is better served with automation.
Kops (short for “Kubernetes Operations”) is as official as you can get for open-source Kubernetes tools. It is “the way” to bootstrap clusters on AWS. It is essentially kubectl for clusters. If you want to bootstrap and manage a new cluster, this is the place to start. Kube-Up is a popular script for bootstrapping a new cluster that you will see referenced in documentation and old posts. However, it has been deprecated, and you’ll want to stick with kops going forward.
If you don’t want to run Kubernetes yourself, there are a variety of hosted solutions available. Google Cloud Platform provides Google Container Engine. It’s the easiest and most straightforward way to get Kubernetes in production. I recommend this option if you’re using GCP. I would even recommend switching cloud providers just to use Google Container Engine. Tectonic is a Kubernetes solution from CoreOS. It wraps the official Kubernetes releases in a tight package that covers setup, scaling, and general administration. It also includes the kube-aws CLI for managing Kubernetes clusters on AWS.
Kismatic from Apprenda is a useful suite of tools for provisioning, maintaining, and testing Kubernetes clusters. It includes kuberang, a tool that may be used to smoke test a Kubernetes cluster (especially useful if you’re following “Kubernetes the Hard Way”!). Again, this is a small sample. You’ll find plenty more by doing a Google search or following the ecosystem. KubeWeekly can send you the latest news, projects, tutorials, and other good stuff in their weekly newsletter.
There are also many projects that target end users (and not system administrators). Kubernetes’ package manager Helm is the most useful and important of these projects. You write packages called “charts,” and then you can use the helm CLI to install/upgrade charts as “releases” on your cluster. Charts contain all of the resources required to run a particular application, including services and deployments.
Here are a couple of examples. You can use the MySQL chart to run mysql for application A, and deploy it again with a different release name for application B. Or, you can create a chart for an entire microservice application and deploy it that way. The official charts repository is one of the most active repositories in the Kubernetes organization. It’s also a fantastic resource for tips and tricks, such as learning how other users write various Kubernetes resources. The Helm docs list a bunch of related tools and tutorials to help you get started. Odds are, you’ll find a few blog posts about Helm in each KubeWeekly issue. Keep an eye on Helm, and try it out for yourself.
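
The “same chart, different release” idea can be sketched as follows. This assumes the helm CLI is installed and uses helm upgrade --install, which installs a release if it does not exist yet and upgrades it otherwise. The chart reference example/mysql, the release names, and the database value key are hypothetical placeholders, not the real MySQL chart’s settings. The script only builds and prints the commands; it does not run them.

```python
import shlex
from typing import Dict, List, Optional


def helm_release_command(
    release: str, chart: str, values: Optional[Dict[str, str]] = None
) -> List[str]:
    """Build the argv for installing or upgrading one release of a chart."""
    cmd = ["helm", "upgrade", "--install", release, chart]
    # Each override becomes its own --set key=value pair, in a stable order.
    for key, value in sorted((values or {}).items()):
        cmd += ["--set", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    # One chart, two independent releases (hypothetical names and values).
    releases = {
        "app-a-db": {"database": "app_a"},
        "app-b-db": {"database": "app_b"},
    }
    for release, values in releases.items():
        cmd = helm_release_command(release, "example/mysql", values)
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each printed line could be pasted into a shell, or the list could be handed to subprocess.run, once the chart reference and values are swapped for real ones.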

How to get involved

The ecosystem is a product of the wonderful Kubernetes community. Here are some other ways that you can get involved:
  • Stay connected: The community and many of those who maintain it are active on the Kubernetes Slack channel. This is the best place to be if you are even remotely interested in Kubernetes and the ecosystem.
  • Get involved: As with any rapidly changing technology, staying involved and talking with others is the best way to succeed. Kubernetes is no exception. Look into local meet-ups and conferences. You may also join the various planning meetings and weekly calls with Kubernetes developers. These are great forums to voice your opinion on project direction, collaborate on issues, and learn what other people are up to. You will learn something and you’ll help others in the process.
  • Kubernetes guides: The official Kubernetes guides are a fantastic supplement to the webinars in our two-part series. They provide more information on use cases and functionality and a great abstract overview of Kubernetes. I also suggest you watch the previous KubeCon videos. These will help you get a handle on how people are using Kubernetes in the field, and you’ll learn about the cool stuff that the larger Kubernetes ecosystem is working on.

Coming soon: Helm and Kops

I hope you have enjoyed this webinar series. If you want a refresher on Kubernetes and its main features, check out part one, “Hands on Kubernetes,” by viewing the webinar and the recap.
We’ve received lots of positive feedback from these events, so I am planning a second two-part series specifically on Helm and Kops. Stay tuned to the Cloud Academy webinars page for scheduling information. Until then, you can find me @ahawkins on the Kubernetes Slack. Good luck out there, and happy shipping!

Written by

Adam Hawkins

Passionate traveler (currently in Bangalore, India), Trance addict, Devops, Continuous Deployment advocate. I lead the SRE team at Saltside where we manage ~400 containers in production. I also manage Slashdeploy.
