Availability

Beginner

10m

What happens once your software is actually running in production? Ensuring that it stays up-and-running is important. Depending on what the system does and how much traffic it needs to handle, that may not be particularly easy.

There are systems that allow developers to run their code without needing to think about operations. Platform-as-a-service offerings like Google’s App Engine go a long way toward reducing and, in some companies, removing operations work. However, not every system can or will run on such platforms, which means that having qualified operations engineers is important.

The role of an operations engineer is continually evolving, which isn’t a surprise since the pace of change in technology never slows down.

So, if the job falls on you to keep a system up-and-running, where do you start? What needs to happen? These are the questions this Lesson aims to answer.

In this Lesson, we take a look at some of the tasks that operations engineers need to address. I use the term operations engineer as an umbrella to cover a wide variety of job titles. Titles such as ops engineer, operations engineer, site reliability engineer, and DevOps engineer, among others, all fall under this umbrella.

Regardless of the title, the responsibilities involve keeping a system up-and-running, with little or no downtime. That’s a tough thing to do because there are a lot of moving parts.

If you’re just starting out, and are interested in one of those roles, then the fundamentals in this Lesson may be just what you need. These fundamentals will prepare you for more advanced Lessons on specific cloud providers and their certifications.

Topics such as high availability are often covered in advanced Lessons; however, they tend to be specific to a cloud provider. This Lesson will help you learn the basics without needing to know a specific cloud provider.

If this all sounds interesting, check it out! :)

Lesson Objectives

By the end of this Lesson, you'll be able to:

Identify some of the aspects of being an ops engineer
Define why availability is important to ops
Define why scalability is important to ops
Identify some of the security concerns
Define why monitoring is important
Define why practicing failure is important

Intended Audience

This is a beginner-level Lesson for anyone who wants to learn, though it will probably be easier if you have either:

Development experience
Operations experience

Optional Pre-Requisites

What You'll Learn

Lecture	What you'll learn
Intro	What will be covered in this Lesson
Intro to Operational Concerns	What sort of things do operations engineers need to focus on?
Availability	What does availability mean in the context of a web application?
High Availability	How do we make systems more available than the underlying platform?
Scalability	What is scalability and why is it important?
Security	What security issues do ops engineers need to address?
Infrastructure as code	What is IaC and why is it important?
Monitoring	What things need to be monitored?
System Performance	Where are the bottlenecks?
Planning and Practicing Failure	How can you practice failure?
Summary	A review of the Lesson

About the Author

Ben Lambert

Software Engineer

Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.

Covered Topics

Operations

Availability
