As spending on the public cloud is increasing globally, companies are looking for ways to reduce cost and increase efficiency. Financial Operations, or FinOps, is similar to DevOps, which enables companies to accelerate technology delivery. FinOps is a new operating model that maximizes the value of an organization's cloud investment.
In this course, you are going to learn about FinOps Principles and how to build FinOps Teams, as well as the three phases of the FinOps Lifecycle. Specifically, you will learn how to apply FinOps processes and practices to reduce rates and avoid unnecessary cloud costs.
If you have any feedback on this course, please get in touch with us at support@cloudacademy.com.
Learning Objectives
- Understand what makes the cloud so powerful and why it is changing how businesses operate
- Understand what makes cloud challenging from a technology, management, and financial perspective
- Learn about the six FinOps Principles and how to build successful FinOps Teams
- Learn about FinOps capabilities and how to build a common language within your organization
- Learn about the anatomy of a cloud bill and how to take advantage of the Basic Cloud Equation
- Learn about the three phases of the FinOps Lifecycle and how to build successful processes and practices to reduce rates and avoid cost
Intended Audience
This course is for engineers, operations, and Finance people looking to understand how to improve efficiency and reduce cost in the cloud.
Prerequisites
To get the most out of this course, you should have a foundational understanding of cloud concepts, specifically how compute and storage are provisioned and billed in the cloud. Some familiarity with rate reduction and cost avoidance methods in the cloud would also be helpful but is not essential.
References
The FinOps Lifecycle section of this course references materials from:
The Anatomy of a Cloud Bill lecture references materials from:
- Cloud FinOps: Collaborative, Real-Time Cloud Financial Management, O'Reilly Media; 1st edition (January 7, 2020)
In the Operate Phase, we define the process, the workflow, and the responsibility for carrying out the choices we made during the Optimize Phase. The Operate Phase makes use of the FinOps Principles number 1: "Teams need to collaborate", number 2: "The business value of cloud drives decisions", and number 6: "Take advantage of the variable cost model of the cloud".
In the previous Optimize Phase, we set goals and measured their outcomes, while in the Operate Phase we take action to achieve these goals. Specifically, I am going to teach you how to align teams to business goals, use automation to scale execution, build Metrics-Driven Cost Optimization, and handle shared environments, including containers.
Let's start with aligning teams to business goals. Any recommendations the FinOps team communicates to leadership and engineers will be in addition to what these teams are already doing. And because the FinOps team is not directly affecting employees' quarterly or annual performance evaluations, these recommendations will naturally be prioritized last. This means that engineers will occasionally work on cost avoidance with no substantial progress made.
To break this cycle the FinOps team needs to get executive support where cloud cost becomes a part of everyone's performance evaluation. And the way to get the executive support is to build a realistic business case of how much money can be saved over what time period and have Finance validate and support the business case.
Change is hard for everyone and the best way to facilitate a culture shift toward cost awareness is when executives mandate and reward positive behavior. However, mistakes can happen and I discourage punishing when something doesn't go according to plan. Instead, do an honest evaluation of what was not ideal so that everyone can learn from it.
Once the FinOps program has executive support, the FinOps team will need to align shared goals between engineers, Finance, and the business. We do that by building processes and assigning SMART deliverables to people. SMART here stands for specific, measurable, achievable, relevant, and time-bound. For goals to be relevant, the FinOps team will need to quantify the financial gain and put it in relation to the effort. For example, saving one thousand dollars may not be economically viable if it takes multiple engineers several weeks to accomplish that.
Communication is key for finding the balance between Cost Avoidance and business impact. Specifically, the FinOps team will need to balance focus on cost versus focus on growth. For example, simply sorting opportunities from the highest to lowest financial gain empowers leadership and engineers to choose their own cut-off line. This will make teams go for the low-hanging fruit first, which will result in quick wins.
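The sorting idea can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the `Opportunity` class, the function names, and all of the numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    name: str
    monthly_savings: float  # estimated dollars saved per month


def rank_by_savings(opportunities):
    """Return opportunities ordered from highest to lowest estimated savings."""
    return sorted(opportunities, key=lambda o: o.monthly_savings, reverse=True)


def above_cutoff(opportunities, cutoff):
    """Keep only the opportunities at or above the cut-off a team has chosen."""
    return [o for o in rank_by_savings(opportunities) if o.monthly_savings >= cutoff]


# Illustrative data only.
backlog = [
    Opportunity("rightsize batch cluster", 1_200.0),
    Opportunity("delete unattached volumes", 300.0),
    Opportunity("stop idle dev databases", 4_500.0),
]

for item in above_cutoff(backlog, cutoff=1_000.0):
    print(f"{item.name}: {item.monthly_savings:,.0f} dollars per month")
```

Each team can pass its own `cutoff`, which keeps the decision about where to draw the line with leadership and engineers rather than with the FinOps team.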
The FinOps team can take these early wins to build credibility and showcase successful optimization examples across the organization. Using a metric-driven approach, for example by building a waste KPI and making it visible to everyone, will gamify Cost Avoidance across the organization. This encourages good behavior through visibility instead of trying to force it.
Some people are more competitive and will see this as a scoreboard they can influence. It is also something engineers can identify with and be proud of. However, some people are less competitive or will not be motivated by a top ten list. Instead of nagging or shaming the stragglers, the FinOps team will need to partner with the late adopters.
Simply having a conversation with the teams that are not performing well will bring visibility into their problems and motivations. For example, after approaching a team responsible for a small but still substantial portion of the company's budget, I learned that there were only two people on the team, which is why progress was slow.
By simply providing the late adopters with the data they needed, explaining where the largest opportunities were, and asking them how I could help, I was able to make substantial progress, again by just focusing on the low-hanging fruit and positive business outcomes.
Eventually, the low-hanging fruit will be exhausted and the company may still have substantial opportunities in aggregate. This means that each opportunity will be small but the sum of these opportunities will be worth going after. A simple way of determining when automation needs to be leveraged is to check whether the money an engineer saves is less than what the engineer gets paid for the time spent on it.
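That rule of thumb can be written down as a one-line comparison. The sketch below is only an illustration; the function name, the hourly cost, and the savings figures are assumptions chosen for the example.

```python
def is_automation_candidate(savings, engineer_hours, hourly_cost):
    """True when doing the work by hand costs more than it saves."""
    return savings < engineer_hours * hourly_cost


# Hypothetical: a 150 dollar saving that needs 4 hours at 75 dollars per hour.
print(is_automation_candidate(150.0, engineer_hours=4, hourly_cost=75.0))    # True
# Hypothetical: a 5,000 dollar saving that needs the same 4 hours.
print(is_automation_candidate(5_000.0, engineer_hours=4, hourly_cost=75.0))  # False
```

Opportunities that come back `True` are too small to justify manual work one at a time, which is exactly where automation pays off.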
At this stage, we want to build guardrails instead of gatekeepers. Think of self-service processes where engineers are allowed to build but get notified when something is not ideal. This promotes innovation compared to having to go through an approval process.
From a FinOps perspective, there are different types of automation, specifically policies that enforce guardrails and policies that eliminate waste. A guardrail policy will allow an engineer to start a service but send a notification that the workload is out of compliance. A waste elimination policy will periodically scan for inefficient workloads and terminate them.
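To make the difference between the two policy types concrete, here is a small sketch in plain Python. It does not call any cloud provider API; the `Workload` class, the required tags, and the 5 percent idle threshold are all assumptions made for the example, and setting `running = False` stands in for a real terminate call.

```python
from dataclasses import dataclass, field

REQUIRED_TAGS = {"owner", "cost-center"}  # assumed tagging standard
IDLE_CPU_PERCENT = 5.0                    # assumed definition of "idle"


@dataclass
class Workload:
    name: str
    tags: dict = field(default_factory=dict)
    avg_cpu_percent: float = 0.0
    running: bool = True


def guardrail_check(workload):
    """Guardrail: never block the launch, only report what is out of compliance."""
    missing = sorted(REQUIRED_TAGS - set(workload.tags))
    return [f"{workload.name}: missing tag '{tag}'" for tag in missing]


def waste_sweep(workloads):
    """Waste elimination: stop running workloads below the idle threshold."""
    stopped = []
    for w in workloads:
        if w.running and w.avg_cpu_percent < IDLE_CPU_PERCENT:
            w.running = False  # stand-in for a real terminate call
            stopped.append(w.name)
    return stopped


fleet = [
    Workload("api", {"owner": "team-a", "cost-center": "42"}, avg_cpu_percent=61.0),
    Workload("old-test-env", {"owner": "team-b"}, avg_cpu_percent=0.4),
]

for w in fleet:
    for note in guardrail_check(w):
        print(note)
print(waste_sweep(fleet))
```

The guardrail only produces messages, while the sweep changes state. In practice the sweep would run on a schedule, and the thresholds would be agreed on with the engineers who own the workloads.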
The FinOps team needs to balance notification frequency based on urgency. For example, an immediate termination due to a security violation needs to inform the engineer right away, while most other notifications can be aggregated into a weekly summary email.
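One way to picture this routing is a function that separates urgent notifications from everything else and groups the rest by owner for a weekly digest. The field names (`urgency`, `owner`, `message`) and the sample messages are hypothetical.

```python
from collections import defaultdict


def route_notifications(notifications):
    """Split notifications into immediate alerts and per-owner weekly digests."""
    immediate = []
    weekly_digest = defaultdict(list)
    for note in notifications:
        if note["urgency"] == "immediate":
            immediate.append(note)
        else:
            weekly_digest[note["owner"]].append(note["message"])
    return immediate, dict(weekly_digest)


# Illustrative notifications only.
queue = [
    {"urgency": "immediate", "owner": "team-a", "message": "instance terminated: security violation"},
    {"urgency": "routine", "owner": "team-a", "message": "volume unattached for 30 days"},
    {"urgency": "routine", "owner": "team-b", "message": "instance missing cost-center tag"},
]

alerts, digest = route_notifications(queue)
print(len(alerts), "immediate alert(s)")
for owner, messages in digest.items():
    print(owner, "weekly summary:", messages)
```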
Automation is a key contributor during the Operate Phase because it handles a large number of small opportunities and it does so reliably and continuously. Waste will naturally happen, and it works better to clean it up periodically instead of relying on everyone not to be messy.
Next, let's look at Metrics-Driven Cost Optimization. The goal is to eventually move away from financial numbers and have the business drive decisions. Using data to drive decisions works much better than making assumptions. But the data we need to build metrics may need to be collected first.
Reach out to the business units and ask them how they measure success. Are they using any KPIs that the FinOps team can help automate? Think of KPIs like cost per revenue, cost per active customer, cost per item sold, cost per streaming hour, tax return, or concert ticket.
In addition to business KPIs, the FinOps team will need to showcase their success. KPIs that work well are cost per core hour, cost per memory hour, or cost per Dollar saved. And when we go down the engineering stack KPIs like cost per feature or microservice provide visibility into efficiency of code.
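All of these KPIs share the same shape: a cost divided by a unit of business or engineering output. The sketch below computes cost per active customer for two months; the function name and every number are invented for illustration.

```python
def unit_cost(cloud_cost, units):
    """Cost per unit of output, for example per active customer or per core hour."""
    if units <= 0:
        raise ValueError("units must be positive")
    return cloud_cost / units


# Illustrative data: (label, monthly cloud cost in dollars, active customers).
monthly = [
    ("month 1", 120_000.0, 40_000),
    ("month 2", 126_000.0, 45_000),
]

for label, cost, customers in monthly:
    print(label, round(unit_cost(cost, customers), 2), "dollars per active customer")
```

In this made-up example the total bill goes up from one month to the next while the cost per active customer goes down, which is the kind of insight a unit-cost KPI gives the business that a raw cost number does not.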
Once the data for the KPIs has been automated, providing reports with trending KPIs is the next step. Some trial and error may be required to find the right balance and tune the KPIs. For example, does the business want to see gross or net cloud cost, and which services are layered into the cost calculation? Does the business want to include licensing cost, managed services, or even headcount cost?
Let the stakeholders drive the decisions as they are the primary consumers of the data. The goal is to enable better decisions using all the data available to have better business outcomes.
Now let's switch gears a little and look at how we handle shared environments, including containers. A shared environment is something that is used by multiple teams at the same time, for example a container service that hosts applications from across the organization. All FinOps capabilities, like attributing cost to owners, consistent reporting, guardrails and waste elimination, and KPI tracking, need to be enabled for these shared environments as well.
The simplest shared environment is an account or project that hosts multiple different applications. Even if all taggable resources have been properly tagged, there will be untaggable resources that incur a cost which needs to be attributed to the owners.
The simplest way to deal with these untaggable resources is to apportion the cost to existing workloads based on each workload's total cost. The example I gave earlier is for two workloads, one with 1,000 dollars and the other with 2,000 dollars of monthly spend, plus 300 dollars of untaggable monthly cost. In this case, you can apportion 100 dollars to the first and 200 dollars to the second workload.
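The arithmetic behind this example is a proportional split: each workload receives the untaggable cost multiplied by its share of the total tagged spend. A minimal Python sketch, with a hypothetical function name and workload labels, looks like this:

```python
def apportion_by_spend(untaggable_cost, workload_costs):
    """Split an untaggable cost across workloads in proportion to their own spend."""
    total = sum(workload_costs.values())
    if total <= 0:
        raise ValueError("total workload spend must be positive")
    return {
        name: untaggable_cost * cost / total
        for name, cost in workload_costs.items()
    }


# The numbers from the example: 1,000 and 2,000 dollars of monthly spend,
# plus 300 dollars of untaggable monthly cost.
shares = apportion_by_spend(300.0, {"workload-1": 1_000.0, "workload-2": 2_000.0})
print(shares)  # {'workload-1': 100.0, 'workload-2': 200.0}
```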
Over time the FinOps team may refine this apportioning approach when more detailed data becomes available. For example, when the untaggable resource is outbound network traffic and it is known that the second workload is responsible for 90 percent of all outbound traffic, the apportioning algorithm can be updated to be more accurate.
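As a sketch of such a refinement, the split can be driven by a measured usage weight instead of total spend. The example below assumes, purely for illustration, that the untaggable cost is 300 dollars of outbound traffic and that the first workload accounts for the remaining 10 percent; the function name is hypothetical.

```python
def apportion_by_weight(shared_cost, weights):
    """Split a shared cost in proportion to a usage weight, e.g. outbound traffic share."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    return {name: shared_cost * weight / total for name, weight in weights.items()}


# Assumed traffic shares in percent: workload-2 causes 90 percent of outbound traffic.
traffic_share = {"workload-1": 10, "workload-2": 90}
print(apportion_by_weight(300.0, traffic_share))  # {'workload-1': 30.0, 'workload-2': 270.0}
```

Compared with a split by total spend, the second workload now carries most of the traffic cost, which matches who actually causes it.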
For container-based workloads we can tag the containers, also called pods, or use so-called namespaces to attribute cost to the owners. A namespace is a collection of applications and can span across different types of worker nodes. Engineers control which applications are allowed in which namespace.
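Namespace-based attribution can be pictured as a simple roll-up: sum the cost of every pod by its namespace, then map each namespace to an owner. The sketch below uses plain Python data rather than any container platform API; the namespaces, team names, and costs are invented, and costs in a namespace without a known owner are collected under an "unattributed" label.

```python
from collections import defaultdict


def cost_by_owner(pod_costs, namespace_owner):
    """Roll up per-pod costs to the owner of each pod's namespace."""
    totals = defaultdict(float)
    for namespace, cost in pod_costs:
        owner = namespace_owner.get(namespace, "unattributed")
        totals[owner] += cost
    return dict(totals)


# Illustrative data: (namespace, monthly pod cost in dollars).
pods = [("checkout", 120.0), ("checkout", 80.0), ("search", 50.0), ("sandbox", 10.0)]
owners = {"checkout": "payments-team", "search": "discovery-team"}

print(cost_by_owner(pods, owners))
# {'payments-team': 200.0, 'discovery-team': 50.0, 'unattributed': 10.0}
```

Keeping an explicit "unattributed" bucket makes gaps in the namespace-to-owner mapping visible in reports instead of silently dropping that cost.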
A container service will have so-called master nodes that manage the containers and there will be network traffic between the master and worker nodes. The cost for these has to be apportioned as well. Talk to the engineers to find out which apportioning method makes the most sense.
The FinOps team will need to expand the reporting capabilities built in the Inform Phase to account for these shared environments. Additional cost attribution dimensions, like namespace, project, application, team, and stack, will likely need to be surfaced. Reports with too many options can be overwhelming, in which case specialized reports for these shared environments make more sense.
Before we wrap up, I want to give you step-by-step instructions for how to get started on your FinOps journey. First, you will need to figure out what needs to be done with regard to FinOps in your company. Reach out to your peers and managers and interview them about their challenges and what an ideal state would look like. Build a wish list, as this will become your roadmap later on. Ask for feedback on how to prioritize this wish list, as this will tell you what to work on first and what comes later.
Based on what you learn about the FinOps needs of your company, build a business case. Determine what needs to be done in 3 months, in 12 months, and in 2 years from now. Work with the stakeholders to quantify the financial opportunities and have Finance validate these. Once Finance is comfortable with the numbers, build a proposal for a roadmap. You will need scope, activities, milestones, and potential savings opportunities. Scope here means how much effort it will take. For example, can you do it by yourself or will you need help?
Once you have the first draft of the roadmap refine it by talking with your manager, their manager, and their peers. After you incorporate all the feedback ask the stakeholders to confirm that this makes sense to them before presenting it to executives. Having alignment will ensure that you get support from the stakeholders during the executive meeting.
Your goal is that executives will promote your roadmap and make it a mandatory part of the organizational goals. Eventually, these goals will trickle down and become part of everyone's performance evaluation. This ensures that FinOps is at the forefront of everyone's thinking and that the program will get the commitment to be successful.
Last but not least, don't think you are alone. There are many people within and outside your company who can help, so be creative. Reach out to the cloud vendor team and ask if they can help train your engineers. Use brown bags and tech talks to raise awareness of how to use the cloud efficiently. Schedule hackathons around cost and efficiency. Organize meetups where engineers from other companies can share their success stories. Use regular informal knowledge-sharing sessions with companies in your industry to learn about their best practices and where you can improve.
To summarize the FinOps Operate Phase: Collaboration with engineers, Finance, and the business is the key to successful execution of guardrails and waste elimination. To scale, we need to automate successful manual processes. Shared environments will require additional tooling which expands existing reporting. The ideal outcome is to use Metrics-Driven Cost Optimization and have the business drive decisions in the cloud.
Please note: this lecture references materials from:
Dieter Matzion is a member of Intuit’s Technology Finance team supporting the AWS cost optimization program.
Most recently, Dieter was part of Netflix’s AWS capacity team, where he helped develop Netflix’s rhythm and active management of AWS including cluster management and moving workloads to different instance families.
Prior to Netflix, Dieter spent two years at Google working on the Google Cloud offering focused on capacity planning and resource provisioning. At Google he developed demand-planning models and automation tools for capacity management.
Prior to that, Dieter spent seven years at PayPal in different roles ranging from managing databases, network operations, and batch operations, supporting all systems and processes for the corporate functions at a daily volume of $1.2B.
A native of Germany, Dieter has an M.S. in computer science. When not at work, he prioritizes spending time with family and enjoying the outdoors: hiking, camping, horseback riding, and cave exploration.