Introduction to Operations
What happens once your software is actually running in production? Keeping it up and running is important, and depending on what the system does and how much traffic it needs to handle, that may not be particularly easy.
There are systems that allow developers to run their code without having to think about operations. Platform-as-a-service options like Google’s App Engine go a long way toward reducing and, in some companies, removing operations work. However, not every system can or will run on such platforms, which means that having qualified operations engineers is important.
The role of an operations engineer is continually evolving, which isn’t a surprise since changes in technology never slow down.
So, if the job falls on you to keep a system up-and-running, where do you start? What needs to happen? These are the questions this course aims to answer.
In this course, we take a look at some of the tasks that operations engineers need to address. I use the term operations engineer as an umbrella to cover a wide variety of job titles. Titles such as ops engineer, operations engineer, site reliability engineer, and DevOps engineer, among others, all fall under this umbrella.
Regardless of the title, the responsibilities involve keeping a system up and running with little or no downtime. And that’s a tough thing to do because there are a lot of moving parts.
If you’re just starting out, and are interested in one of those roles, then the fundamentals in this course may be just what you need. These fundamentals will prepare you for more advanced courses on specific cloud providers and their certifications.
Topics such as high availability are often covered in advanced courses; however, they tend to be specific to a cloud provider. This course will help you learn the basics without needing to know a specific cloud provider.
If this all sounds interesting, check it out! :)
By the end of this course, you'll be able to:
- Identify some of the aspects of being an ops engineer
- Define why availability is important to ops
- Define why scalability is important to ops
- Identify some of the security concerns
- Define why monitoring is important
- Define why practicing failure is important
This is a beginner-level course for anyone who wants to learn, though it will probably be easier if you have either:
- Development experience
- Operations experience
What You'll Learn
|Lecture|What you'll learn|
|Intro|What will be covered in this course|
|Intro to Operational Concerns|What sort of things do operations engineers need to focus on?|
|Availability|What does availability mean in the context of a web application?|
|High Availability|How do we make systems more available than the underlying platform?|
|Scalability|What is scalability and why is it important?|
|Security|What security issues do ops engineers need to address?|
|Infrastructure as code|What is IaC and why is it important?|
|Monitoring|What things need to be monitored?|
|System Performance|Where are the bottlenecks?|
|Planning and Practicing Failure|How can you practice failure?|
|Summary|A review of the course|
Welcome back to Introduction to Operations. I'm Ben Lambert and I'll be your instructor for this lesson.
In this lesson, we're going to talk about some of the security concerns that operations engineers find themselves having to deal with. Security is quite a broad topic, and one that we'll likely cover in its own future course. So this lesson will be a brief overview of a couple of security-related issues.
If you've taken some of the other introductory DevOps courses, Intro to DevOps, Intro to CI, or Intro to CD, you've probably noticed that I mention security a lot. That's because it's an important topic that often gets ignored, or it gets assigned to a team that ends up stuck in a silo, working in isolation. Now, while security engineers are a very important part of the team, they can't secure everything alone.
So, security needs to become everyone's job, and because operations has to keep systems up and running, a lot of that burden falls on them. So what kind of security concerns do operations engineers find themselves dealing with? One is preventing outages caused by distributed denial of service attacks. We talked about this a bit earlier. We'll describe some of the best practices for mitigating denial of service attacks.
It's worth noting that many of the cloud providers understand and know how to mitigate distributed denial of service attacks. They've had to do that for their own services for years. So certain components are more resistant than others. Here's a diagram of a system that's based on the AWS best practices for resilient systems.
Here we have CloudFront serving content from edge locations, which means cached versions of our assets (the HTML, CSS, JavaScript, et cetera) can be served to a user from the point closest to them. This spreads the traffic out across different endpoints. And depending on the type of content, a web application firewall may help to filter out some of the requests. Having a virtual private cloud is another way to help mitigate an attack, because it allows you to keep some components hidden behind a private network and prevents requests from going directly to those servers. Using a load balancer will allow you to distribute whatever load makes it through to an auto scaling group of servers.
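To make the layering a bit more concrete, here is a rough back-of-the-envelope sketch in Python of how much traffic might still reach the auto scaling group after edge caching and a web application firewall have taken their share. The function name `servers_needed` and every number used with it are hypothetical and chosen only for illustration; real cache hit rates, filter rates, and server capacity have to be measured on your own system.

```python
def servers_needed(peak_rps, cache_hit_pct, waf_block_pct, per_server_rps):
    """Estimate how many origin servers are needed for the traffic that
    gets past the edge cache and the web application firewall.

    All inputs are assumptions for illustration:
      peak_rps        total requests per second arriving at the edge
      cache_hit_pct   percent of requests answered from the edge cache (0-100)
      waf_block_pct   percent of the remaining requests the WAF filters (0-100)
      per_server_rps  requests per second a single server can handle
    """
    if not (0 <= cache_hit_pct <= 100 and 0 <= waf_block_pct <= 100):
        raise ValueError("percentages must be between 0 and 100")
    if peak_rps < 0 or per_server_rps <= 0:
        raise ValueError("peak_rps must be >= 0 and per_server_rps must be > 0")

    # Integer math keeps the estimate exact and avoids float rounding surprises.
    remaining_scaled = peak_rps * (100 - cache_hit_pct) * (100 - waf_block_pct)
    capacity_scaled = per_server_rps * 100 * 100

    # Ceiling division, and always keep at least one server running.
    return max(1, -(-remaining_scaled // capacity_scaled))


if __name__ == "__main__":
    # Hypothetical attack: 100,000 requests/second at the edge.
    print(servers_needed(100_000, cache_hit_pct=90, waf_block_pct=50, per_server_rps=500))
    # The same traffic with no edge cache and no WAF in front.
    print(servers_needed(100_000, cache_hit_pct=0, waf_block_pct=0, per_server_rps=500))
```

With these made-up numbers, the protected setup needs far fewer servers than the unprotected one, which is the point of shrinking the attack surface before relying on scale-out.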
The ELB will filter out TCP requests that are not properly formatted, mitigating certain types of attack. All told, the goal is to minimize the attack surface and further distribute the responses via edge caching. And finally, make sure your servers are able to scale out to handle the remaining load. Another issue that commonly comes up when running web applications is web attacks such as SQL injection and cross-site scripting. Developers should be taking care to ensure that these issues don't pop up. However, sometimes they do. Maybe you're hosting a third-party app or something else.
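As a small illustration of the developer-side care mentioned above, the following Python sketch contrasts a query built by string concatenation with a parameterized one, and shows escaping user text before placing it in HTML. It uses only the standard library (`sqlite3` and `html.escape`). The table layout, the sample users, and the function names are hypothetical and exist only for this example.

```python
import sqlite3
from html import escape


def build_demo_db():
    """Create a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the user's input becomes part of the SQL text itself.
    query = "SELECT name, role FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the input is passed as data, never parsed as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


def render_comment(comment):
    # Escaping turns markup characters into harmless text before output.
    return "<p>" + escape(comment) + "</p>"


if __name__ == "__main__":
    conn = build_demo_db()
    payload = "' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # every row leaks
    print(len(find_user_safe(conn, payload)))    # no rows match
    print(render_comment("<script>alert(1)</script>"))
```

The unsafe version returns every user for the injected input, while the parameterized version treats the same input as an ordinary (non-matching) name.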
Whether the application is your own or a third party's, having a web application firewall in front can serve as an additional line of defense. It can filter out a lot of the requests before they make it through to your application. So, web application firewalls are a great way to add another layer of protection. However, they are not a replacement for ensuring that your applications are secure. Another issue that ops engineers face is patch management.
If your servers are not up to date, then you risk ending up with servers that are vulnerable to exploits. According to the US Computer Emergency Readiness Team, as many as 85% of targeted attacks are preventable by keeping up on software patches. Managing the patch levels for your servers and the software running on them can be difficult. However, it becomes a lot easier if you're using some sort of configuration management tool to report on and manage that sort of thing. And that'll tie into infrastructure management, which is the topic of our next lesson.
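The kind of patch-level report described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular configuration management tool works: the host names, package versions, and function names are all hypothetical, and it assumes simple dotted numeric version strings, which real package versions often are not.

```python
def parse_version(text):
    """Turn a dotted numeric version like '3.0.13' into (3, 0, 13).

    Assumption: versions contain only digits and dots.
    """
    return tuple(int(part) for part in text.split("."))


def find_unpatched(inventory, minimum_versions):
    """Return (host, package, installed, required) for out-of-date packages.

    inventory         {host: {package: installed_version}}
    minimum_versions  {package: lowest_acceptable_version}
    """
    findings = []
    for host, packages in sorted(inventory.items()):
        for package, installed in sorted(packages.items()):
            required = minimum_versions.get(package)
            # Compare as number tuples so that 3.0.9 sorts before 3.0.13.
            if required and parse_version(installed) < parse_version(required):
                findings.append((host, package, installed, required))
    return findings


if __name__ == "__main__":
    # Hypothetical inventory, as a configuration management tool might report it.
    inventory = {
        "web-1": {"openssl": "3.0.13", "nginx": "1.24.0"},
        "web-2": {"openssl": "3.0.9", "nginx": "1.24.0"},
    }
    minimum_versions = {"openssl": "3.0.13"}
    for finding in find_unpatched(inventory, minimum_versions):
        print(finding)
```

Comparing versions as tuples of numbers rather than as plain strings matters here, because a string comparison would wrongly rank "3.0.9" above "3.0.13".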
So, if you're ready, let's get started.
Ben Lambert is a software engineer and was previously the lead author for DevOps and Microsoft Azure training content at Cloud Academy. His courses and learning paths covered Cloud Ecosystem technologies such as DC/OS, configuration management tools, and containers. As a software engineer, Ben’s experience includes building highly available web and mobile apps. When he’s not building software, he’s hiking, camping, or creating video games.