As businesses grow their spend in the public cloud to accelerate innovation and build a competitive advantage, accurately predicting cloud growth over the short to long term becomes increasingly important for leadership. Finance and executives need to know what funds will be available several years into the future to build their innovation roadmap.
In this course, you are going to learn about cloud forecasting and how to align forecasting models with the maturity of your FinOps / Cloud Financial Management practice. You will learn about the relevant terms and concepts as well as how to identify ownership and accountability. We will break down the challenge into addressable parts and walk you through solution approaches at each step.
Learning Objectives
- Understand what cloud forecasting is and why it's important
- Understand what challenges exist in cloud forecasting and how to address them
- Learn about the different ways you can forecast in the cloud
- Learn about what you can do to improve cloud forecasting
- Learn about the role forecasting plays in FinOps
Intended Audience
- This course is for FinOps / Cloud Financial Management and Finance professionals looking to understand how to improve cloud forecasting and increase forecast accuracy.
Prerequisites
A basic understanding of how the cloud works, including compute and storage services and their pricing models, as well as an understanding of the financial processes around forecasting, budgeting, procurement, and allocations.
So far we have learned the relevant terms and concepts and identified ownership and accountability around cloud forecasting. Now let's break down the cloud forecasting challenge into addressable parts and go over its essential components.
Account and Project Structure.
Let's look at how we utilize the Account and Project Structure for cloud forecasting. One of the key components of cloud forecasting is attributing cloud cost to owners. Our first line of defense is to use dedicated accounts, or projects on the GCP side, for each business unit, cost center, and major engineering team. When doing so, we need to anticipate organizational changes, acquisitions, and divestitures. A divestiture is when a company sells off a portion of its business.
Organizational changes may move workloads between accounts or leave some accounts with shared owners. Acquisitions and divestitures essentially add and remove accounts from the cloud forecast. Our forecasting methodology needs to incorporate these changes over time. Specifically, we need the ability to add and remove accounts from the forecast during a forecast period. Additionally, leadership and finance need to be made aware of these changes before interpreting forecast data to avoid follow-up questions.
Tagging and Cost Allocation.
Now let's look at tagging and cost allocation. Our second line of defense for attributing cost to owners is our tagging strategy. Workloads that are shared across multiple owners require tagging to support cost attribution. For containerized workloads we use labels or namespaces to achieve the same.
When building a tagging strategy, favoring consistency over complexity is more successful. Start with one or two mandatory cost tags that all workloads must implement to support cost attribution. Handle the remaining business logic externally, for example in your bill visualization or forecasting system.
When building our forecasting logic, we need to account for tags changing over time, for example when monolithic applications are decomposed into smaller services, or when organizational changes result in tags being renamed. Our forecasting methodology needs to handle these changes in tags over time. Specifically, we need the ability to map new tags to old tags to provide continuity in our forecasts. Leadership and finance need to be aware of these changes when reviewing forecast data.
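As an illustration, here is a minimal sketch of such a mapping, assuming a hypothetical rename table and cost record layout; this is not a provider API:

```python
# Minimal sketch: map renamed tags back to their historical names so cost
# time series stay continuous across reorganizations. The rename table and
# record layout are hypothetical examples.
TAG_RENAMES = {
    "team-payments-v2": "team-payments",   # tag renamed during a reorg
    "svc-checkout": "app-monolith",        # service split out of a monolith
}

def canonical_tag(tag: str) -> str:
    """Follow renames (possibly chained) back to the canonical tag."""
    seen = set()
    while tag in TAG_RENAMES and tag not in seen:
        seen.add(tag)
        tag = TAG_RENAMES[tag]
    return tag

def normalize_costs(cost_records):
    """Re-key cost records so old and new tags roll up together."""
    totals = {}
    for record in cost_records:  # e.g. {"tag": "svc-checkout", "cost": 12.5}
        key = canonical_tag(record["tag"])
        totals[key] = totals.get(key, 0.0) + record["cost"]
    return totals
```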
Even in a best case scenario where everything taggable has been tagged, not all cloud resources support tagging. This means that untaggable costs, such as network traffic, need to be apportioned to the workloads responsible for incurring them.
A simple first step is to evenly spread the untagged and untaggable cost across the workloads. However, this can cause bad behavior. Think of an apartment complex using an even split across units to apportion utilities versus having individual meters for each unit.
A better way to apportion these costs is weighting by each workload's cost. For example, if three workloads in a shared account have a cost split of 60, 30, and 10 percent, we use these percentages to apportion the untagged and untaggable cost. However, this has the drawback that, for example, the 10 percent workload could be responsible for the majority of untagged and untaggable costs.
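Here is a minimal sketch of cost-weighted apportionment, using the hypothetical 60/30/10 split from above:

```python
# Minimal sketch: apportion untagged/untaggable cost (e.g. shared network
# traffic) to workloads in proportion to their tagged cost. Workload names
# and numbers are hypothetical.
def apportion_untagged(tagged_costs: dict, untagged_total: float) -> dict:
    """Spread untagged_total across workloads, weighted by tagged cost."""
    total = sum(tagged_costs.values())
    return {
        workload: cost + untagged_total * (cost / total)
        for workload, cost in tagged_costs.items()
    }

tagged = {"web": 600.0, "api": 300.0, "batch": 100.0}  # 60/30/10 split
print(apportion_untagged(tagged, 50.0))
# {'web': 630.0, 'api': 315.0, 'batch': 105.0}
```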
In an ideal world we would use additional telemetry from an observability platform, like a logging system, that captures the origin of data transfers, to provide additional insights to accurately apportion untagged and untaggable costs.
Some organizations may require additional tags to identify ownership and attribute cost back to teams, such as cost center, vice president, business unit, department, or owner. Which of these tags your organization needs depends on your tagging standard and your organizational structure.
Cost tags need to be activated as cost allocation tags in Amazon Web Services before they appear in the billing data. Review the differences in the billing exports for Azure and Google Cloud Platform. On the GCP side, you can also use automatic labeling of resources to have them identified in the billing data.
Tagging or labeling is the foundation of telling apart workloads in the cloud, identifying ownership, and attributing costs to teams. Depending on the maturity of the organization, tagging may be manual, use automated tag hygiene monitoring, or be part of a continuous integration and continuous delivery or CI/CD pipeline with tag-or-terminate policies in place.
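For organizations automating tag hygiene, a minimal sketch might look like the following; the mandatory tag names and resource records are hypothetical examples:

```python
# Minimal sketch of automated tag hygiene: flag resources missing mandatory
# cost tags so a CI/CD "tag-or-terminate" policy can act on them.
MANDATORY_TAGS = {"cost-center", "owner"}

def missing_tags(resource: dict) -> set:
    """Return the mandatory tags a resource is missing."""
    return MANDATORY_TAGS - set(resource.get("tags", {}))

def audit(resources):
    """Yield (resource_id, missing) pairs for non-compliant resources."""
    for resource in resources:
        missing = missing_tags(resource)
        if missing:
            yield resource["id"], missing

resources = [
    {"id": "i-0abc", "tags": {"cost-center": "cc-42", "owner": "team-api"}},
    {"id": "i-0def", "tags": {"owner": "team-web"}},  # missing cost-center
]
for rid, missing in audit(resources):
    print(f"{rid} is missing tags: {sorted(missing)}")
```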
Communication and Accountability.
Now let's look at the role of communication and accountability in cloud forecasting. As I explain in my course titled "Cloud Financial Management — Beyond Just Optimization", collaboration is the engine of the practice of FinOps.
FinOps enables communication between executives, finance, business, and engineers. FinOps practitioners need to strive to build a culture of communication to enable fast and high quality decision making.
A common challenge in cloud forecasting related to communication is that the people working on a forecast are not included in decisions that substantially impact the forecast, for example project scope changes that also affect cloud spend.
Everything we do within the FinOps practice needs to fulfill a business purpose. The downstream effect of activities needs to be well understood by everyone. For example when an engineer provides a forecast estimate, the engineer needs to understand that their budget will be impacted if they don't let us know when the estimate changes.
A good method for holding leadership and engineers accountable for their cloud spend is to hold regular actuals-versus-forecast reviews and identify substantial over- or underspend. A common cause of being under target is being short on staff, and a common cause of being over target is having started a project earlier than planned, or experiments that engineers forgot to turn off.
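A minimal sketch of such a review follows, assuming hypothetical team names and numbers, and a 10 percent threshold agreed with Finance:

```python
# Minimal sketch of an actuals-versus-forecast review: flag teams whose
# variance exceeds a threshold agreed with Finance. All values are
# hypothetical.
THRESHOLD = 0.10  # flag anything more than 10% over or under forecast

def variance_report(forecast: dict, actuals: dict, threshold=THRESHOLD):
    """Return {team: variance_ratio} for teams outside the threshold."""
    flagged = {}
    for team, forecast_cost in forecast.items():
        actual = actuals.get(team, 0.0)
        variance = (actual - forecast_cost) / forecast_cost
        if abs(variance) > threshold:
            flagged[team] = variance
    return flagged

forecast = {"payments": 10_000, "search": 5_000, "ml": 8_000}
actuals = {"payments": 10_400, "search": 6_200, "ml": 6_500}
for team, v in variance_report(forecast, actuals).items():
    print(f"{team}: {v:+.1%} vs forecast")  # search: +24.0%, ml: -18.8%
```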
A more automated way of holding leadership and engineers accountable is to set up budget alerts in AWS or spending quotas in Azure. This requires that leadership and engineers are aware of these budget alerts and have a process for managing them.
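On the AWS side, such an alert can be created through the AWS Budgets API via boto3; the sketch below is a hedged example, and the account ID, budget name, amount, and email address are placeholders:

```python
# Hedged sketch: create a monthly cost budget with an email alert at 80%
# of the budgeted amount, using the AWS Budgets API via boto3.
import boto3

client = boto3.client("budgets")
client.create_budget(
    AccountId="111122223333",  # placeholder account ID
    Budget={
        "BudgetName": "team-api-monthly",
        "BudgetLimit": {"Amount": "10000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,            # percent of budgeted amount
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "team-api@example.com"}
            ],
        }
    ],
)
```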
Forecast Frequency, Models, and Accuracy.
Now let's look at cloud forecast frequency, models, and accuracy. Finance will have specific requirements for when forecasts are due and how frequently forecast updates are needed. Most common is an annual forecast that is due close to the end of the fiscal year.
Intermediate forecasts may be necessary to update budgets based on business drivers. Reach out to your finance team and ask them about their existing processes and work with finance on how FinOps can improve the forecasting process.
Next let's look at cloud forecasting models. Depending on the maturity of an organization, specific prediction models will be easier to implement. For example, trend-based forecasting produces good results in the early maturity phases, while driver-based forecasting produces more accurate results but relies on a more mature process.
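Here is a minimal sketch of trend-based forecasting that fits a straight line to hypothetical monthly spend and extrapolates; a real model would also handle seasonality and anomalies:

```python
# Minimal sketch of trend-based forecasting: fit a line to monthly spend
# and extrapolate six months out. The spend figures are hypothetical.
import numpy as np

monthly_spend = [92_000, 96_500, 101_200, 104_800, 110_300, 115_900]
months = np.arange(len(monthly_spend))

slope, intercept = np.polyfit(months, monthly_spend, deg=1)

horizon = np.arange(len(monthly_spend), len(monthly_spend) + 6)
forecast = slope * horizon + intercept  # next six months
print([round(f) for f in forecast])
```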
In addition Finance will have requirements around forecast granularity that are driven by the fiscal reporting needs. Examples here are a cost of goods sold or COGS versus operating expense or OPEX breakdown, or a breakdown by specific business entities. You will need to work closely with your Finance team to learn about their specific needs.
Next let's look at cloud forecast accuracy. The forecast accuracy depends on the quality of the data used and the models applied to it. Reach out to Finance to learn what forecast variance is within acceptable limits. You will also need to work with Finance on requirements for layering in discounts, cost avoidance, and savings instruments like savings plans, reservations, and committed use discounts on the GCP side.
When comparing forecasts to actuals, you will be able to identify workloads that are performing substantially over or under their targets. You will need to reach out to these workload owners to identify root causes and how to mitigate them.
When using driver-based forecasting, it is important to understand why workloads scale differently from their drivers. A common cause is that the driver is not a good match for the workload's growth. This may require short-term adjustments, or building additional drivers that are a better match in the long term.
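Here is a minimal sketch of driver-based forecasting, assuming monthly active users as a hypothetical driver:

```python
# Minimal sketch of driver-based forecasting: derive cost per driver unit
# from history, then scale by the projected business driver. All numbers
# are hypothetical.
historic_cost = 120_000.0          # last month's cloud cost for a workload
historic_driver = 2_000_000        # last month's monthly active users

cost_per_unit = historic_cost / historic_driver  # $0.06 per user

projected_users = [2_100_000, 2_250_000, 2_400_000]  # next three months
forecast = [round(cost_per_unit * users) for users in projected_users]
print(forecast)  # [126000, 135000, 144000]
```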
Cloud spend materiality defines where the organization focuses their efforts. The business needs to drive cloud forecasting accuracy. A lack of accuracy may not be addressed until it becomes a larger problem that has executive attention and sponsorship.
Showback & Chargeback.
Now let's look at the showback and chargeback process in cloud forecasting. To be able to take ownership of their cloud workloads, leadership and engineers need to understand their total cost of ownership and what activities drive changes in cost. This means that cost visibility needs to be accessible, timely, and accurate, and that cost attribution is accurate and fair.
Shared costs need to be allocated back to the business units and engineering teams. Examples of shared costs are: enterprise discounts, enterprise support, 3rd party licensing, shared services like logging and security, and shared resources like container orchestration and data transfer charges.
Engineering leaders will support the apportioning of untagged or untaggable costs if the apportioning algorithm is well understood and socialized across the organization.
KPI-Driven Decisions.
Now let's look at KPI-driven decisions in the cloud. In an ideal world, unit economics drive business decisions in the cloud. Key performance indicators or KPIs may differ between cloud providers and between workloads within a cloud provider. Using unit economics will expose Total Cost of Ownership or TCO advantages of technologies between cloud vendors.
Engineers and leadership need actionable insights to take ownership and make targeted adjustments to their cloud workloads. This requires accurate attribution of ownership and identifying which cloud usage needs to be adjusted in which way. Unit economics across cloud providers need to be available in an aggregate or rollup view for long-term tracking and allow drill-downs into specific teams, workloads, and cloud products for troubleshooting.
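As an illustration, here is a minimal sketch of both views, assuming hypothetical workloads, costs, and transaction counts:

```python
# Minimal sketch of unit economics: cost per transaction per workload for
# drill-downs, plus a blended rollup for long-term tracking.
workloads = {
    "checkout":  {"cost": 42_000.0, "transactions": 10_500_000},
    "search":    {"cost": 18_000.0, "transactions": 30_000_000},
    "reporting": {"cost": 9_000.0,  "transactions": 1_200_000},
}

# Drill-down view: unit cost per workload.
for name, w in workloads.items():
    print(f"{name}: ${w['cost'] / w['transactions']:.4f} per transaction")

# Rollup view: blended unit cost across all workloads.
total_cost = sum(w["cost"] for w in workloads.values())
total_tx = sum(w["transactions"] for w in workloads.values())
print(f"blended: ${total_cost / total_tx:.4f} per transaction")
```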
How to Estimate Cloud Cost for Future Workloads.
Now let's look at how to estimate cloud cost for future workloads. New workloads that do not exist in the cloud yet, or new features of existing workloads that have a substantial impact on cost, require manual estimation of these new costs. Examples include adding high availability and disaster recovery, or adding new persistence models such as databases.
All major cloud providers offer web-based cost calculators that allow new workloads to be modeled and provide reasonably high-quality cost estimates. The people best suited to building these models are the engineers who are going to launch the new workloads, as they have in-depth subject matter knowledge.
The challenge is that the engineer may not have a perfect view of what the actual cloud workload will look like once it is launched. Common mistakes are forgetting to model a specific aspect of the workload, like data transfer, or overprovisioning compute resources because utilization in the cloud is not yet known.
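Here is a minimal calculator-style sketch that includes the often-forgotten data transfer line item; all rates and quantities are hypothetical placeholders, not current list prices:

```python
# Minimal sketch of a calculator-style estimate for a new workload. Rates
# and quantities are hypothetical placeholders.
estimate = {
    "compute": 4 * 730 * 0.10,        # 4 instances x 730 hrs x $0.10/hr
    "storage": 500 * 0.023,           # 500 GB x $0.023/GB-month
    "data_transfer": 2_000 * 0.09,    # 2,000 GB egress x $0.09/GB
}

for line_item, cost in estimate.items():
    print(f"{line_item}: ${cost:,.2f}/month")
print(f"total: ${sum(estimate.values()):,.2f}/month")
```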
An iterative approach is recommended where the engineer revises the initial model and shares the updated estimates with the forecasting team so they can update the numbers in the forecast and layer in the new estimate.
Rate Reduction.
Now let's look at how Rate Reduction affects cloud forecasting. As I explain in my course titled "Cloud Financial Management — Beyond Just Optimization", the basic cloud equation is Usage times Rate equals Cost. The usage can be a duration in time or a quantity, while the rate is a price in the local currency.
The fourth FinOps capability is "Rate Reduction". Rate Reduction refers to anything that reduces cost from public or list pricing, such as Enterprise Agreements, Private Pricing Agreements, and savings instruments. When building cloud forecasts, we need to layer in the various rate reduction effects to produce more accurate forecasts.
Enterprise Agreements and Private Pricing Agreements typically provide either a flat discount rate for specific cloud products, or a tiered discount rate based on usage. We will need to estimate the effects of these discounts when building cloud forecasts.
Enterprise Agreements and Private Pricing Agreements will need to be renewed over time. Future agreements may have slightly different discount structures which we will need to model and layer into our cloud forecasts.
Savings instruments are things like reserved instances in AWS, savings plans also in AWS, sustained use discounts in GCP, committed use discounts also in GCP, and reservations in Azure. Depending on the growth of cloud workloads, we need to estimate future savings instrument purchases and layer in the associated discounts when building cloud forecasts.
A simple first step for layering in rate reduction is to calculate historic rate reduction effects on cloud cost and apply these ratios to our cloud forecast, as sketched below. A more advanced approach is to model individual rate reduction initiatives and their effects on cloud cost.
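Here is a minimal sketch of the ratio approach, assuming hypothetical historic list and billed costs:

```python
# Minimal sketch: derive the historic effective discount from list vs.
# billed cost and apply it to a list-price forecast. Numbers are
# hypothetical.
historic_list_cost = 1_000_000.0   # last quarter's cost at list price
historic_billed_cost = 780_000.0   # what was actually billed after discounts

effective_discount = 1 - historic_billed_cost / historic_list_cost  # 22%

list_price_forecast = [350_000, 365_000, 380_000]  # next three months at list
forecast = [round(m * (1 - effective_discount)) for m in list_price_forecast]
print(forecast)  # [273000, 284700, 296400]
```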
Cost Avoidance.
Now let's look at how cost avoidance affects cloud forecasting. The fifth FinOps capability is "Cost Avoidance". Cost Avoidance refers to any effort that reduces usage like right-sizing, waste reduction, cloud parking, auto scaling, and re-architecting workloads to be more cloud native to take advantage of containers or serverless technologies.
Flexera's 2021 State of the Cloud report estimates that 30 percent of cloud spend is unused or underutilized, which makes cost avoidance our second largest cost savings lever. When forecasting cloud cost, we need to model the effects of future cost avoidance efforts and layer these into the forecast to produce accurate results.
A simple first step for layering in cost avoidance is to estimate the effects of historic cost avoidance initiatives and apply these ratios to our cloud forecast. A more advanced approach is to model specific future cost avoidance initiatives and their effects on cloud cost, as sketched below.
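Here is a minimal sketch of the more advanced approach, modeling a single hypothetical right-sizing initiative that lands partway through the forecast period:

```python
# Minimal sketch: layer a specific future cost avoidance initiative into a
# forecast. All numbers are hypothetical.
base_forecast = [100_000, 104_000, 108_000, 112_000]  # monthly, pre-savings

initiative = {
    "workload_share": 0.40,  # right-sizing targets 40% of the spend
    "savings_rate": 0.25,    # expected 25% reduction on that share
    "start_month": 2,        # initiative lands in the third month (index 2)
}

adjusted = [
    month_cost * (1 - initiative["workload_share"] * initiative["savings_rate"])
    if i >= initiative["start_month"] else month_cost
    for i, month_cost in enumerate(base_forecast)
]
print([round(m) for m in adjusted])  # [100000, 104000, 97200, 100800]
```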
Training and Improving Maturity.
Now let's look at how training can improve maturity and affect cloud forecasting. FinOps is primarily a culture shift where cloud cost moves to the forefront of everyone's thinking. With leadership and engineers receiving ongoing updates on technology advances in the cloud, some workloads will be re-architected to avoid technology debt. When older legacy technologies are replaced with more cloud native ones, we may see improvements in total cost of ownership over time.
To estimate these changes in cloud cost, we need to model technology migrations over time. Migrations typically have a ramp-up during which both the old and new workloads are running, with a ramp-down of the old workload after some observation time. We will need to reach out to the engineering teams to get a better picture of the behavior during these phases and the duration of the migration to be able to layer these into our cloud forecasts.
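Here is a minimal sketch of such a migration model, assuming hypothetical costs and ramp durations:

```python
# Minimal sketch of a migration cost model: the new workload ramps up while
# the old one keeps running, then the old workload ramps down after an
# observation period. Costs and month counts are hypothetical.
OLD_MONTHLY = 50_000.0
NEW_MONTHLY = 30_000.0   # cheaper once fully migrated

def migration_cost(month, ramp_up_end=3, ramp_down_start=5, ramp_down_end=7):
    """Combined old + new cost for a given month of the migration."""
    new_share = min(1.0, month / ramp_up_end)        # new workload ramps up
    if month < ramp_down_start:
        old_share = 1.0                              # overlap/observation
    else:
        remaining = ramp_down_end - ramp_down_start
        old_share = max(0.0, 1 - (month - ramp_down_start) / remaining)
    return OLD_MONTHLY * old_share + NEW_MONTHLY * new_share

# Costs peak during the overlap, then settle at the cheaper new workload.
print([round(migration_cost(m)) for m in range(9)])
```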
To summarize "breaking the challenge into addressable parts", we use account and project structures as well as a tagging strategy to allocate cloud cost to owners. We work closely with Finance to get the requirements for our cloud forecasting frequency, models, and accuracy. And we need to model and layer in effects of rate reduction and cost avoidance to build more accurate cloud forecasts.
Dieter Matzion is a member of Intuit’s Technology Finance team supporting the AWS cost optimization program.
Most recently, Dieter was part of Netflix’s AWS capacity team, where he helped develop Netflix’s rhythm and active management of AWS including cluster management and moving workloads to different instance families.
Prior to Netflix, Dieter spent two years at Google working on the Google Cloud offering focused on capacity planning and resource provisioning. At Google he developed demand-planning models and automation tools for capacity management.
Prior to that, Dieter spent seven years at PayPal in roles ranging from managing databases to network operations and batch operations, supporting all systems and processes for the corporate functions at a daily volume of $1.2B.
A native of Germany, Dieter has an M.S. in computer science. When not at work, he prioritizes spending time with family and enjoying the outdoors: hiking, camping, horseback riding, and cave exploration.