
Resiliency Defined

Contents

Course Introduction
Adding Resiliency to Google Cloud Container Engine Clusters (GKE)
Course Summary

Overview

Difficulty: Intermediate
Duration: 55m

Course Description:

Resiliency should be a key component of any production system. GKE provides numerous features focused on adding resiliency to your deployed containerized applications and services, allowing them to serve requests efficiently and reliably.

Intended audience:

This course is for developers or operations engineers looking to expand their knowledge of GKE beyond the basics and start deploying resilient, production-quality containerized applications and services.

Prerequisites:

Viewers should have a good working knowledge of creating GKE container clusters and deploying images to those clusters.

Learning objectives:

  • Understand the core concepts that make up resiliency.
  • Maintain availability when updating a cluster.
  • Make an existing cluster scalable.
  • Monitor running clusters and their nodes.

This Course Includes:

  • 60 minutes of high-definition video
  • Hands-on demos

What You'll Learn:

  • Course Intro: What to expect from this course
  • Resiliency Defined: The key components of resiliency.
  • Cluster Management: In this lesson we’ll cover Rolling Updates, Resizing a Cluster, and Multi-zone Clusters.
  • Scalability: The topics covered in this lesson include Load Balancing Traffic and Autoscaling a Cluster.
  • Container Operations: In this lesson we'll demo Stackdriver monitoring and take a look at the Kubernetes dashboard.
  • Summary: A wrap-up and summary of what we’ve learned in this course.

Transcript

Hello, and welcome back. In this section we're going to talk about resiliency: what it is, and why it's important for our GKE deployments. Site Reliability Engineering, or SRE for short, is a discipline that incorporates aspects of software engineering and applies them to operations, with the goal of creating ultra-scalable and highly reliable software systems.

As defined by Ben Treynor, founder of Google's SRE team, SRE is "what happens when a software engineer is tasked with what used to be called operations." So creating applications that are both resilient and scalable is an essential part of any enterprise architecture. A well-designed application should be able to scale seamlessly as demand increases or decreases, and also be resilient enough to withstand the loss of one or more resources.

First off, I'd like to talk a little bit about scalability, which is the ability to match capacity to demand. Scalability is inextricably linked to resiliency.

And when talking about the cloud, you will often hear the term elasticity. Elasticity is the ability to increase or decrease resources as needed to meet the current capacity needs of your application or services. So, for example, a scalable web application is one that works well with one user or a million users, and gracefully handles peaks and dips in traffic automatically.
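To make that kind of elasticity concrete, here's a minimal sketch (my illustration, not part of the course) that uses the official Kubernetes Python client to attach a HorizontalPodAutoscaler to a hypothetical Deployment named "web", so pod counts grow during peaks and shrink during dips:

```python
from kubernetes import client, config

# Assumes kubeconfig already points at the GKE cluster,
# e.g. after running `gcloud container clusters get-credentials`.
config.load_kube_config()

autoscaling_v1 = client.AutoscalingV1Api()

# Hypothetical Deployment "web" in the default namespace.
hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web"
        ),
        min_replicas=2,                        # keep a baseline for dips in traffic
        max_replicas=10,                       # cap growth during peaks
        target_cpu_utilization_percentage=70,  # add pods when average CPU passes 70%
    ),
)

autoscaling_v1.create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```

The min and max replica counts and the CPU target are placeholder values; in practice you'd tune them to your workload.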

By adding and removing nodes only when needed, scalable apps consume only the resources necessary to meet demand. For an application to be resilient, it needs to be able to automatically replace instances that have failed or become unavailable. In our diagram, we show a load-balanced container cluster attached to replicated instances of Cloud SQL.
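As a rough illustration of that "automatically replace failed instances" idea at the Kubernetes level (again my sketch, not from the course), a Deployment's replica count tells the cluster how many copies must always be running; if a pod or its node fails, the controller schedules a replacement. The image name and project below are hypothetical:

```python
from kubernetes import client, config

config.load_kube_config()  # kubeconfig from `gcloud container clusters get-credentials`

apps_v1 = client.AppsV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # the controller keeps 3 pods running, replacing any that fail
        selector=client.V1LabelSelector(match_labels={"app": "web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "web"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="web",
                        image="gcr.io/my-project/web:1.0",  # hypothetical image
                        ports=[client.V1ContainerPort(container_port=8080)],
                    )
                ]
            ),
        ),
    ),
)

apps_v1.create_namespaced_deployment(namespace="default", body=deployment)
```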

Our cluster is configured for resiliency, and therefore availability, by distributing itself across regions as well as replicating across multiple nodes within the cluster itself. Now for Cloud SQL, I'm showing two instances set up for read replication, where one would be the master and the other would be a read replica. Google Cloud SQL provides the ability to replicate a master instance to one or more read replicas.

A read replica is a copy of the master that reflects changes to the master instance in almost real time.
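For a hedged sketch of what creating such a replica might look like programmatically (my example, not shown in the course), the Cloud SQL Admin API lets you insert a new instance whose masterInstanceName points at the existing master; the project, instance names, region, and tier below are placeholders:

```python
from googleapiclient import discovery

# Cloud SQL Admin API client using Application Default Credentials.
sqladmin = discovery.build("sqladmin", "v1beta4")

replica_body = {
    "name": "mydb-replica-1",             # placeholder replica name
    "masterInstanceName": "mydb-master",  # existing master instance (placeholder)
    "region": "us-central1",              # placeholder region
    "settings": {"tier": "db-n1-standard-1"},  # placeholder machine tier
}

operation = sqladmin.instances().insert(
    project="my-project",  # placeholder project ID
    body=replica_body,
).execute()

print(operation["name"])  # long-running operation you can poll for completion
```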

Okay, in the next section we're going to talk about cluster management: the ability to update our clusters based on the needs of our applications and services.

About the Author


Steve is a consulting technology leader for Slalom Atlanta, a Microsoft Regional Director, and a Google Certified Cloud Architect. His focus for the past 5+ years has been IT modernization and cloud adoption with implementations across Microsoft Azure, Google Cloud Platform, AWS, and numerous hybrid/private cloud platforms. Outside of work, Steve is an avid outdoorsman spending as much time as possible outside hiking, hunting, and fishing with his family of five.