How to Become a DevOps Engineer

The DevOps Handbook introduces DevOps as a framework for improving the process for converting a business hypothesis into a technology-enabled service that delivers value to the customer. This process is called the value stream. Accelerate finds that applying DevOps principles of flow, feedback, and learning to the value stream results in more successful businesses and happier employees.

The so-called “DevOps Engineer” internalizes these three principles and their relation to the business and other members of the value stream. The DevOps Engineer’s goal is to improve multiple facets of the software development life cycle (SDLC) process using a mix of practices, tools, and technologies. Kelsey Hightower described DevOps Engineers as the “Special Forces” inside an organization.

The DevOps engineer encapsulates depth of knowledge and years of hands-on experience. [They’re] battle tested. This person blends the skills of the business analyst with the technical chops to build the solution—plus they know the business well, and can look at how any issue affects the entire company.

How does one become a member of these elite special forces and earn the title of DevOps Engineer?

Read the basics in this blog, use The DevOps Handbook as a guide to the practices of flow, feedback, and learning, and then take some DevOps courses to become a certified DevOps professional. Cloud Academy’s DevOps Engineer Learning Path is specifically designed to help you prepare for the AWS DevOps Engineer – Professional certification. Cloud Academy’s DevOps Institute Certification Preparation Learning Path, developed in partnership with the DevOps Institute, provides a common understanding of DevOps goals, business value, vocabulary, concepts, and practices, and prepares students to sit the industry-recognized DevOps Institute Foundation Certification Exam.


Focus on velocity

Velocity is central to DevOps. The premise is simple: businesses that ship software faster are more likely to succeed in the marketplace. Faster iterations mean quicker adaptation to changing market conditions, faster validation of business hypotheses, and faster recovery from outages. Thus, it is in the business’s best interest to accelerate its software delivery value stream.

Trunk-based development and continuous delivery are the best ways to accelerate software delivery. The DevOps Handbook sets a clear goal:

At the end of each development interval, we must have integrated, tested, working, and potentially shippable code, demonstrated in a production-like environment, created from trunk using a one-click process, and validated with automated tests.

Achieving this goal requires a mix of technical skills. The DevOps Engineer needs to wire up an automated deployment pipeline — which in and of itself requires many different skills — to support the “one-click process” requirement. They also need the software engineering chops to properly test throughout the deployment pipeline.
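To make the “one-click” idea concrete, here is a minimal sketch of a trunk-based deploy script, assuming a containerized application with a `make test` target and `kubectl` access to the target cluster (the image name and manifest path are hypothetical):

```python
#!/usr/bin/env python3
"""Minimal one-click deploy sketch: build, test, then deploy from trunk.

Assumes a containerized app with a `make test` target and kubectl access
to the target cluster. The image name and manifest path are hypothetical.
"""
import subprocess
import sys

IMAGE = "registry.example.com/myapp:latest"  # hypothetical image name

def run(step, *cmd):
    """Run one pipeline step; abort the whole pipeline on failure."""
    print(f"==> {step}: {' '.join(cmd)}")
    if subprocess.run(cmd).returncode != 0:
        sys.exit(f"Pipeline failed at step: {step}")

run("build", "docker", "build", "-t", IMAGE, ".")
run("test", "make", "test")          # automated tests gate the deploy
run("push", "docker", "push", IMAGE)
run("deploy", "kubectl", "apply", "-f", "k8s/deployment.yaml")
print("Deployed. Trunk is shippable.")
```

In practice this logic lives in a CI system such as Jenkins, CircleCI, or AWS CodePipeline rather than a local script, but the shape is the same: build, test, push, and deploy, with each step gating the next.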

Our DevOps Engineer must be able to deploy different types of applications to different infrastructure using infrastructure-as-code and configuration management. Common tools in this area include Docker, Kubernetes, Packer, and Ansible, along with a cloud provider such as AWS or Azure.

It’s important to understand that every application is different, so there is no one-size-fits-all approach to continuous integration/continuous delivery (CI/CD). Start by understanding the ideas and how various tools fit together in an automated deployment pipeline. Cloud Academy offers multiple resources on deployment pipelines. The CI/CD Tools & Services Learning Path covers the high-level concepts along with technical implementations. There’s also a similar Learning Path targeting CI/CD using only AWS services. The Terraform Learning Path is a great introduction to infrastructure-as-code as well.

The SDLC doesn’t end in production. It begins in production. Production is different from other environments in that the team must pay careful attention to operations.

A new take on telemetry

Telemetry is any form of data (such as time series metrics, alerts, or logs) used to understand the current operational state. The DevOps Engineer takes a holistic view of telemetry by focusing on its relevance to everyone in the value stream. Consider this quote from The DevOps Handbook:

Every member of our value stream will use telemetry in a variety of ways. For example, developers may temporarily create more telemetry in their application to better diagnose problems on their workstation, while Ops engineers may use telemetry to diagnose a production problem. In addition, Infosec and auditors may review the telemetry to confirm the effectiveness of a required control, and a product manager may use them to track business outcomes, feature usage, or conversion rates.

Our DevOps Engineer must internalize this and implement telemetry systems that support all members of the value stream. This requires the technical chops to work with a variety of time series data, alerting, and log ingestion systems to create a unified telemetry solution for the entire value stream.
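As a minimal sketch of what application-side telemetry can look like, here is how a service might expose both operational and business metrics with Python’s prometheus_client library (the metric names and the signup counter are hypothetical):

```python
from prometheus_client import Counter, Histogram, start_http_server
import random
import time

# Operational telemetry for Ops: request latency and error counts.
REQUEST_LATENCY = Histogram("app_request_latency_seconds", "Request latency")
REQUEST_ERRORS = Counter("app_request_errors_total", "Failed requests")

# Business telemetry for product managers: a hypothetical conversion counter.
SIGNUP_CONVERSIONS = Counter("app_signup_conversions_total", "Completed signups")

def handle_request():
    with REQUEST_LATENCY.time():       # records duration into the histogram
        if random.random() < 0.05:     # simulate an occasional failure
            REQUEST_ERRORS.inc()
            return
        if random.random() < 0.2:      # simulate a signup conversion
            SIGNUP_CONVERSIONS.inc()

if __name__ == "__main__":
    start_http_server(8000)            # Prometheus scrapes /metrics here
    while True:
        handle_request()
        time.sleep(0.1)
```

The same /metrics endpoint then feeds Ops dashboards and alerts as well as the product manager’s conversion tracking, which is exactly the shared-telemetry goal described above.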

There is a plethora of tools in this space. Common setups mix open source tools and vendor products to create a unified telemetry solution. Prometheus is a great tool for time series data and alerting. Grafana can visualize just about any type of data. The ELK Stack is a complete solution for time series data, log ingestion, and visualization. More complex systems can benefit from ingestion, transformation, and routing projects such as Fluentd and Riemann.

Telemetry data isn’t limited to applications or infrastructure. Auditing, compliance, and other info-sec data is equally important. Cloud Academy’s AWS Monitoring and Auditing Learning Path demonstrates how to leverage this type of telemetry.

Ultimately, the DevOps Engineer can wrangle together a telemetry system that supports all members of the value stream, including the value stream itself. Accelerate provides four metrics for value stream performance: lead time, deployment frequency, mean time to restore, and change failure rate. The DevOps Engineer champions these metrics and uses them as input for improvement experiments.
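As a rough sketch of how those four metrics fall out of raw records, assume a hypothetical data model where each deploy records its commit time, deploy time, and whether it caused a failure, and each incident records its start and resolution times:

```python
from datetime import datetime, timedelta

# Hypothetical records; in practice these come from the deployment
# pipeline and the alerting or incident-tracking system.
deploys = [
    # (commit time, deploy time, caused a failure?)
    (datetime(2019, 10, 1, 9), datetime(2019, 10, 1, 15), False),
    (datetime(2019, 10, 2, 10), datetime(2019, 10, 3, 11), True),
    (datetime(2019, 10, 4, 8), datetime(2019, 10, 4, 9), False),
]
incidents = [
    # (start, resolved)
    (datetime(2019, 10, 3, 11), datetime(2019, 10, 3, 13)),
]

days_observed = 7
lead_times = [deployed - committed for committed, deployed, _ in deploys]
lead_time = sum(lead_times, timedelta()) / len(lead_times)
deploy_frequency = len(deploys) / days_observed            # deploys per day
mttr = sum((r - s for s, r in incidents), timedelta()) / len(incidents)
change_failure_rate = sum(failed for *_, failed in deploys) / len(deploys)

print(f"Lead time: {lead_time}, deploys/day: {deploy_frequency:.2f}")
print(f"MTTR: {mttr}, change failure rate: {change_failure_rate:.0%}")
```

Tracking these four numbers over time turns “are we improving?” from a debate into a measurement.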

DevOps has no end state

DevOps establishes feedback loops: first from development to production, then from production back into development, and finally an outer loop that drives improvement across the others. Here’s how The DevOps Handbook describes it:

The Third Way enables the creation of a generative, high-trust culture that supports a dynamic, disciplined, and scientific approach to experimentation and risk-taking, facilitating the creation of organizational learning, both from our successes and failures.

The DevOps Engineer understands that improvement is always possible through scientific experimentation and learning, and strives to spread and support this culture throughout the organization.

Hypothesis-driven development is a good example. The idea is simple: form a hypothesis and confirm it with data. This is evident in A/B testing small changes before committing to larger work. Deciding which features to ship can become a data-driven exercise instead of coming from gut feelings and hunches.
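For example, a two-proportion z-test is one simple way to confirm an A/B hypothesis with data. Here is a minimal sketch using only Python’s standard library (the conversion numbers are invented):

```python
from math import sqrt, erf

# Hypothetical A/B results: conversions out of visitors per variant.
conv_a, n_a = 120, 2400   # control
conv_b, n_b = 150, 2380   # variant with the proposed change

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)               # pooled conversion rate
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
z = (p_b - p_a) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided normal tail

print(f"control: {p_a:.1%}, variant: {p_b:.1%}, z = {z:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Hypothesis supported: commit to the larger work.")
else:
    print("No significant difference: rethink before committing.")
```

The statistics can get more sophisticated, but the principle stays the same: the data, not the hunch, decides what ships.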

The DevOps Engineer also advocates for dedicated time for organizational learning. This may come from attending conferences, conducting internal workshops, or running post-mortems. These exercises reinforce the idea that value stream participants care about improving their daily work just as much as (if not more than) the daily work itself.

This line of thinking birthed “chaos engineering” as a way to improve system reliability. Purposely breaking production may have seemed crazy at first, but the practice has since gained widespread acceptance as a useful technical discipline. Organizations must experiment and take risks to learn and stay competitive. Ralph Loura, HP’s CIO, puts it wonderfully:

Internally, we described our goal as creating “buoys, not boundaries.” Instead of drawing hard boundaries that everyone has to stay within, we put buoys that indicate deep areas of the channel where you’re safe and supported. You can go past the buoys as long as you follow the organizational principles. After all, how are we ever going to see the next innovation that helps us win if we’re not exploring and testing at the edges?
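In that spirit of testing at the edges, here is a toy sketch of fault injection, the core idea behind chaos engineering (the decorator and failure rate are hypothetical; real chaos tooling such as Chaos Monkey operates on infrastructure rather than function calls):

```python
import functools
import random

def chaos(failure_rate=0.1):
    """Toy fault injector: randomly fail a call to exercise error handling.

    Real chaos engineering targets infrastructure (killed instances,
    injected network latency); this decorator only illustrates the idea.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if random.random() < failure_rate:
                raise RuntimeError(f"chaos: injected failure in {fn.__name__}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@chaos(failure_rate=0.2)
def fetch_recommendations(user_id):
    return ["item-1", "item-2"]   # stand-in for a downstream service call

# Callers must tolerate the failure, e.g. by degrading gracefully.
try:
    items = fetch_recommendations(42)
except RuntimeError:
    items = []                    # fall back to an empty result
```

Running experiments like this reveals whether the system, and the team, can absorb failure before a real outage forces the question.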

Joining the “special forces”

Nailing down the specifics of what makes a DevOps Engineer is difficult. There’s a definite mix of technical skills backed by a strong understanding of value streams and software development philosophy. In fact, the technical skills aren’t worth much without understanding the principles of flow, feedback, and learning. However, if you can embody these principles and bring the technical skills along, then you’re likely to become a very valuable team member.

The DevOps Playbook provides an ideal starting point for anyone looking to quickly absorb and start using the fundamental practices of DevOps, Agile, and CI/CD. Then build up software engineering, infrastructure engineering, and configuration management skills with objective-driven learning. Cloud Academy has a library of DevOps Learning Paths comprising courses, quizzes, hands-on labs, and exams, delivering the theory, technical knowledge, and hands-on practice to help you gain industry-leading DevOps certifications.


A good sample project is building a small application, creating a deployment pipeline with automated tests, and creating a telemetry system with deploy dashboards, alerts, and the four value stream metrics from Accelerate. Then experiment with different languages, frameworks, and infrastructure solutions. If your first take used AWS, then try Google Cloud Platform. Try Packer for a VM-based infrastructure instead of deploying containers to Kubernetes.

A good DevOps Engineer can work in multiple technical contexts, so don’t assume the stack will always be the same. Just don’t get lost in the tools. Remember the principles of flow, feedback, and learning as measured by lead time, deployment frequency, mean time to restore, and change failure rate. Let those guide you and the rest will follow.

 


Written by

Adam Hawkins

Passionate traveler (currently in Bangalore, India), trance addict, and advocate for DevOps and continuous deployment. I lead the SRE team at Saltside, where we manage ~400 containers in production. I also manage Slashdeploy.

