As businesses grow their spend in the public cloud to accelerate innovation and build a competitive advantage, accurately predicting cloud growth over the short to long term becomes increasingly important for leadership. Finance and executives need to know what funds will be available several years into the future to build their innovation roadmap.
In this course, you are going to learn about cloud forecasting and how to align forecasting models with the maturity of your FinOps / Cloud Financial Management practice. You will learn about the relevant terms and concepts as well as how to identify ownership and accountability. We will break down the challenge into addressable parts and walk you through solution approaches at each step.
Learning Objectives
- Understand what cloud forecasting is and why it's important
- Understand what challenges exist in cloud forecasting and how to address them
- Learn about the different ways you can forecast in the cloud
- Learn about what you can do to improve cloud forecasting
- Learn about the role forecasting plays in FinOps
Intended Audience
- This course is for FinOps / Cloud Financial Management practitioners and finance professionals looking to understand how to improve cloud forecasting and increase forecast accuracy.
Prerequisites
A basic understanding of how the cloud works, including compute and storage services and their pricing models, as well as an understanding of the financial processes around forecasting, budgeting, procurement, and allocations.
In the previous section, we learned how to "Align forecasting methodologies with practice maturity" and looked in detail at how to perform trend-based and driver-based forecasting. Now we are going to look at how tools can help with cloud forecasting.
Tools are very helpful when it comes to cloud forecasting because they apply sophisticated algorithms to cost and usage data. Let's start by looking at cloud-native tools. Here I am going to focus on products offered by Amazon Web Services; however, similar products are offered by all major cloud providers.
AWS Cost Explorer.
Let's start with AWS Cost Explorer. It is a bill visualization tool available in the AWS console. Hourly and resource-level granularity costs one cent per one thousand UsageRecords per month. Using the AWS Cost Explorer application programming interface, or API, costs one cent per request.
The tool offers very detailed filtering and grouping capabilities, although groups cannot be nested, meaning we are limited to a single grouping dimension. There is a built-in forecasting capability; however, it tends to over-forecast in my experience.
Users need to learn AWS-specific terms like charge type or purchase option before they can fully utilize the capabilities of the tool. AWS Cost Explorer is a powerful tool and provides more targeted access to billing data compared to viewing invoices in the billing interface as these can be hundreds of pages for larger cloud footprints.
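To illustrate the API access mentioned above, here is a minimal sketch using the boto3 Cost Explorer client to pull monthly unblended cost grouped by service and to request a three-month forecast; the dates are placeholders, and the forecast window would need to start no earlier than today.

```python
import boto3

# The Cost Explorer API is served out of us-east-1.
ce = boto3.client("ce", region_name="us-east-1")

# Actual monthly unblended cost, grouped by service (a single grouping, as noted above).
actuals = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-07-01"},  # placeholder dates
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

# Built-in forecast for the next three months (start date must not be in the past).
forecast = ce.get_cost_forecast(
    TimePeriod={"Start": "2024-07-01", "End": "2024-10-01"},  # placeholder dates
    Metric="UNBLENDED_COST",
    Granularity="MONTHLY",
)

print(forecast["Total"]["Amount"], forecast["Total"]["Unit"])
```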
AWS Budgets and Budget Reports.
Next, let's look at AWS Budgets and Budget Reports. Both are available in the AWS console. AWS Budgets costs ten cents per day starting with the third action-enabled budget; the first two are free. AWS Budget Reports costs one cent per delivered report.
AWS Budgets lets you define a fixed or monthly budget which can be recurring or expiring. It supports percentage-based or absolute thresholds and can send alerts via email, text message, Amazon Chime, and Slack. Automated actions can be defined that stop some types of cloud workloads without user intervention.
Budget Reports are an automated way to send daily, weekly, or monthly budget summaries via email to keep everyone informed of how their cloud spend is trending.
AWS Budgets and Budget Reports allow leadership and engineers to take ownership of their cloud workloads by tracking actual spend against their budgets.
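As a sketch of how such a budget could be set up programmatically, the example below uses the boto3 Budgets client to create a recurring monthly cost budget with an alert at 80 percent of actual spend; the account ID, budget amount, and email address are placeholders.

```python
import boto3

budgets = boto3.client("budgets", region_name="us-east-1")

budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "team-a-monthly",  # placeholder name
        "BudgetType": "COST",
        "TimeUnit": "MONTHLY",  # recurring monthly budget
        "BudgetLimit": {"Amount": "10000", "Unit": "USD"},
    },
    NotificationsWithSubscribers=[
        {
            # Alert when actual spend exceeds 80% of the budgeted amount.
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "owner@example.com"}  # placeholder
            ],
        }
    ],
)
```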
AWS Cost Anomaly Detection.
Next, let's look at AWS Cost Anomaly Detection. This is a relatively new offering that is available for free in the AWS console. It provides the ability to create cost monitors where thresholds, frequencies, and recipients for alerts are defined.
AWS uses machine learning to find anomalous spend behavior which is then aggregated in the detection history. Here anomalies are listed by detection date, severity, duration, and the AWS product that experienced the anomalous spend. Further analysis is possible via links that show the anomaly in AWS Cost Explorer.
AWS Cost Anomaly Detection is a powerful first step in surfacing cloud cost spikes. The tool is capable of detecting very subtle cost variances, something that is likely to be overlooked using a manual process.
As with all tools that rely on historical data, the detection algorithm cannot account for activities that have no record in the past. A common cause of false positives is new workloads being added or existing workloads being scaled manually.
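Here is a minimal sketch of setting up a cost monitor and a daily email subscription with the boto3 Cost Explorer client, and of listing the detection history; the monitor name, alert threshold, email address, and dates are placeholders.

```python
import boto3

ce = boto3.client("ce", region_name="us-east-1")

# A dimensional monitor that evaluates spend per AWS service.
monitor = ce.create_anomaly_monitor(
    AnomalyMonitor={
        "MonitorName": "per-service-monitor",  # placeholder name
        "MonitorType": "DIMENSIONAL",
        "MonitorDimension": "SERVICE",
    }
)

# Daily email alerts for anomalies whose total impact exceeds $100 (placeholder threshold).
ce.create_anomaly_subscription(
    AnomalySubscription={
        "SubscriptionName": "finops-daily-alerts",  # placeholder name
        "MonitorArnList": [monitor["MonitorArn"]],
        "Subscribers": [{"Type": "EMAIL", "Address": "finops@example.com"}],  # placeholder
        "Frequency": "DAILY",
        "Threshold": 100.0,
    }
)

# Review the detection history for the last quarter (placeholder dates).
anomalies = ce.get_anomalies(
    DateInterval={"StartDate": "2024-04-01", "EndDate": "2024-07-01"}
)
```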
Amazon QuickSight.
Next, let's look at Amazon QuickSight. This is a relatively new business intelligence, or BI, offering available in the AWS console. It is available in a Standard or Enterprise edition, and pricing is based on authors and readers.
The tool has a wide range of visualization capabilities, offers data acceleration via the Super-fast, Parallel, In-memory Calculation Engine or SPICE, allows the programmatic creation of dashboards via APIs, and offers machine learning insights.
Amazon QuickSight is a good first step for creating business intelligence dashboards and is well integrated within the AWS ecosystem. However, there are no default dashboards provided out-of-the-box, so authors have to start from scratch, which involves a learning curve.
While the tool has deep capabilities, the graphs tend to look a little bland. This is due to the default color palette's muted pastel tones and to thin annotations that do not stand out.
Using Amazon Athena to access the AWS Cost and Usage Report or CUR.
Lastly, let's look at how we can use Amazon Athena to access AWS billing data. The tool charges for the number of bytes scanned, rounded up to the nearest megabyte, with a ten megabyte minimum per query. Additionally, standard Amazon Simple Storage Service or S3 rates are charged for requests and data transfer. By default, query results are stored in an S3 bucket and are also billed at standard S3 rates.
We get started by configuring AWS billing data to be delivered via the AWS Cost and Usage Report, or CUR, into Amazon S3. Here it is important to configure the data to be stored in Apache Parquet format so that Amazon Athena can access it.
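As a sketch of that configuration, the example below uses the boto3 Cost and Usage Reports client to define an hourly, Parquet-formatted report with the Athena integration enabled; the report name, bucket, and prefix are placeholders.

```python
import boto3

# The Cost and Usage Reports API is served out of us-east-1.
cur = boto3.client("cur", region_name="us-east-1")

cur.put_report_definition(
    ReportDefinition={
        "ReportName": "athena-cur",  # placeholder name
        "TimeUnit": "HOURLY",
        "Format": "Parquet",        # Parquet is required for the Athena integration
        "Compression": "Parquet",
        "AdditionalSchemaElements": ["RESOURCES"],
        "S3Bucket": "my-cur-bucket",  # placeholder bucket
        "S3Prefix": "cur/",           # placeholder prefix
        "S3Region": "us-east-1",
        "AdditionalArtifacts": ["ATHENA"],
        "RefreshClosedReports": True,
        "ReportVersioning": "OVERWRITE_EXISTING",  # the Athena integration overwrites prior versions
    }
)
```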
AWS generally delivers CUR data three times per day; however, the data can lag several hours behind, depending on when cloud products report their billing data and also on the time of the month. Expect delays at the beginning of a month as AWS systems are stretched building end-of-month invoices.
Amazon Athena provides a simple web-based structured query language or SQL query editor that uses a modified Presto syntax. CUR data is available in a single table with hundreds of columns and potentially millions of rows depending on the size of the cloud footprint. Query results can be downloaded in comma-separated value or CSV format and imported into spreadsheets.
Many examples of CUR queries can be found online. For deeper insights, users will need to learn the meaning of the most commonly used CUR fields, for example line_item_usage_account_id, line_item_usage_type, and line_item_unblended_cost. Grouping by the first two fields and summing the third shows unblended cost by account and usage type.
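As a sketch of that grouping, the query below sums unblended cost by account and usage type and submits it through the boto3 Athena client; the database, table, output bucket, and date range are placeholders.

```python
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Unblended cost by account and usage type for June 2024 (placeholder dates and table name).
query = """
SELECT line_item_usage_account_id,
       line_item_usage_type,
       SUM(line_item_unblended_cost) AS unblended_cost
FROM cur_table
WHERE line_item_usage_start_date >= DATE '2024-06-01'
  AND line_item_usage_start_date <  DATE '2024-07-01'
GROUP BY 1, 2
ORDER BY unblended_cost DESC
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "cur_database"},  # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # placeholder bucket
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then fetch the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
```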
Amazon Athena is a powerful tool that allows for deeper insights that AWS Cost Explorer cannot deliver, such as nested grouping. However, using the tool requires hands-on SQL skills and detailed knowledge of the CUR fields.
In-house tools.
Now let's switch gears and look at what in-house tools we can build to help us with cloud forecasting.
Spreadsheet automation.
Let's start with spreadsheet automation. Spreadsheets can be a powerful data analytics and visualization tool, and one might argue that most businesses rely on them in one form or another. One commonly overlooked capability is that data imports into a spreadsheet can be automated.
For example, Google Sheets offers import functions for comma-separated values or CSV, tab-separated values or TSV, HyperText Markup Language or HTML, Extensible Markup Language or XML, Really Simple Syndication or RSS, and Atom Syndication Format feeds. You simply specify a Uniform Resource Locator or URL pointing at a data source, which can be another Google Sheet, and the data will be imported and updated automatically.
This provides data analytics and visualization capabilities with low engineering effort that can be implemented relatively quickly. For example, we can use a Lambda function to generate a data export and use a spreadsheet to provide custom reports across the organization.
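Here is a minimal sketch of such a Lambda function, assuming boto3, Cost Explorer as the data source, and a placeholder S3 bucket the spreadsheet is allowed to read; in the sheet itself, an import formula such as =IMPORTDATA pointed at the exported file's URL keeps the data refreshed without manual steps.

```python
import csv
import io

import boto3

ce = boto3.client("ce", region_name="us-east-1")
s3 = boto3.client("s3")

BUCKET = "my-cost-exports"              # placeholder bucket readable by the spreadsheet
KEY = "monthly-cost-by-service.csv"     # placeholder object key


def handler(event, context):
    # Pull monthly unblended cost grouped by service (placeholder date range).
    response = ce.get_cost_and_usage(
        TimePeriod={"Start": "2024-01-01", "End": "2024-07-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )

    # Flatten the response into month, service, cost rows.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["month", "service", "unblended_cost"])
    for period in response["ResultsByTime"]:
        month = period["TimePeriod"]["Start"]
        for group in period["Groups"]:
            service = group["Keys"][0]
            cost = group["Metrics"]["UnblendedCost"]["Amount"]
            writer.writerow([month, service, cost])

    # Publish the CSV where the spreadsheet's import function can reach it.
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=buffer.getvalue(), ContentType="text/csv")
```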
One drawback of this method is that customizations of the code in the spreadsheet can get complex quickly and maintaining the spreadsheet code will become challenging over time. I recommend keeping things as simple as possible and switching to other tools for more complex applications.
ChatOps (integration with Slack, Microsoft Teams, Google Chat, and so forth)
Next, let's look at ChatOps. Here we use chat applications for development and operations. All common chat applications like Slack, Microsoft Teams, Amazon Chime, and Google Chat provide application programming interfaces, or APIs, to integrate automation functions. For example, when an employee opens a helpdesk ticket, the ticket can be posted into the helpdesk team's chat to start their workflow.
When it comes to cloud forecasting, a good first step is to post budget alerts either as a direct message to the owner or in the channel of the engineering team responsible for the workload. These communications need to provide actionable insights, meaning the recipient needs to know which account or project is affected, what the allocated budget is, and how actual spend is trending.
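Here is a minimal sketch of posting such an alert to a Slack incoming webhook using only the Python standard library; the webhook URL, account name, and dollar figures are placeholders.

```python
import json
import urllib.request

# Placeholder incoming-webhook URL issued by the Slack workspace administrator.
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def post_budget_alert(account: str, budget: float, actual: float, forecast: float) -> None:
    """Post an actionable budget alert into the owning team's channel."""
    message = (
        f":warning: *Budget alert for {account}*\n"
        f"Budget: ${budget:,.0f} | Actual to date: ${actual:,.0f} | "
        f"Trending to: ${forecast:,.0f}"
    )
    request = urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps({"text": message}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)


# Example: placeholder account, $10k budget, $8.5k actual, trending to $12k.
post_budget_alert("team-a-prod", 10000, 8500, 12000)
```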
A more advanced technique is to build custom chat commands that allow direct access to billing data or waste-reduction opportunities. For example, an engineer could submit a chat command to see their current spend broken down by workload to investigate the cost impact of experiments without having to learn other tools. Another example is to build a custom chat command to surface orphaned resources, so-called zombie assets, which incur a cost but are not actively used.
ChatOps is a powerful tool as it provides these capabilities in a single pane without context switching or learning other tools. However, building ChatOps capabilities requires engineering effort and ties critical business capabilities to a specific chat vendor. Switching chat vendors will require some of the code to be rewritten.
Building custom dashboards
Lastly, let's look at building custom dashboards in-house. As an organization matures in their cloud journey, it may become unavoidable to support specialized business processes that cannot be performed by 3rd party tooling.
Building in-house dashboards requires a more substantial engineering effort. Such an effort requires deep technical knowledge on the front-end visualization side, the extract, transform, load or ETL layer, and the database side. These specialized skills rarely overlap in a single person, making it challenging to find full-stack engineers. Building and maintaining in-house dashboarding capabilities may require a small team of analytics specialists.
There is a large number of 3rd party dashboarding vendors available, the most common being Tableau, Looker, now owned by Google, and Microsoft Power BI. These tools typically come with substantial licensing fees, and the choice of vendor will likely be driven by the business.
When designing dashboards we need to provide actionable, accurate, consistent, near real-time insights for engineers, leadership, and finance. This may necessitate different views or dashboards depending on the customer. For example, Finance may need to see an annual actuals versus budget view while engineers are only interested in how the workloads they own perform.
3rd party tools.
Now let's switch gears again and look at 3rd party tools. The cloud tools market offers a wide array of capabilities, with each vendor claiming their features are superior. Here we are going to focus on the vendors with the longest track record that have distinguished themselves from their competition.
CloudHealth by VMware.
Let's start with CloudHealth, now owned by VMware. Founded in 2012, CloudHealth is a multi-cloud management platform that provides visibility, optimization, and automation across five cloud providers, specifically AWS, GCP, Azure, Oracle Cloud, and VMware cloud.
It is a good first step when shopping for a feature-rich product with high-quality support that is intuitive to use. CloudHealth primarily excels in spend monitoring but also offers spend forecasting, tracking, and optimization. As with all 3rd party tools I recommend running a short-term proof of concept before committing to a specific vendor.
Effectively using any tool requires learning its proprietary capabilities. Large tools offer a vast array of features that require in-depth training. CloudHealth provides self-paced online training courses that explain well how to use their features effectively. Additionally, their support staff is very knowledgeable and responsive.
Cloudability by Apptio.
Next let's look at Cloudability, now owned by Apptio. Founded in 2011, Cloudability is a cloud financial management tool for multi-cloud, hybrid, and software as a service or SaaS infrastructures. It supports three cloud providers, specifically AWS, GCP, and Azure.
Cloudability works closely with the FinOps Foundation and its parent company Apptio is a member of the Technology Business Management or TBM council. The offered features are comparable to CloudHealth when it comes to spend monitoring, forecasting, tracking, and optimization. However, Cloudability's visualizations have a more up-to-date look and feel and it integrates well with other Apptio products specifically centered around TBM.
Anaplan's Planning, Budgeting, and Forecasting datasheet.
Lastly, let's look at Anaplan. Founded in 2006, it is a business performance orchestration tool that connects teams, systems, and insights for agile scenario modeling, continuous forecasting, and connected planning.
At the time this video was made, this was the only commercially available tool that offered a comprehensive planning, budgeting, and forecasting module capable of driver-based forecasting out-of-the-box. I worked with the Anaplan team to help them build this functionality.
As with all 3rd party tools, there is a learning curve associated with effectively using the forecasting module. Anaplan does not offer the forecasting module as a stand-alone component, meaning other components have to be purchased before driver-based forecasting becomes available.
To summarize "How tools can help with cloud forecasting": tools offered by cloud providers require less engineering effort compared to in-house tools. Tools built in-house are more customizable to support specialized business processes than 3rd party tools. And 3rd party tools are the most feature-rich but require specialized training to be used properly.
Dieter Matzion is a member of Intuit’s Technology Finance team supporting the AWS cost optimization program.
Most recently, Dieter was part of Netflix’s AWS capacity team, where he helped develop Netflix’s rhythm and active management of AWS including cluster management and moving workloads to different instance families.
Prior to Netflix, Dieter spent two years at Google working on the Google Cloud offering focused on capacity planning and resource provisioning. At Google he developed demand-planning models and automation tools for capacity management.
Prior to that, Dieter spent seven years at PayPal in different roles ranging from managing databases, network operations, and batch operations, supporting all systems and processes for the corporate functions at a daily volume of $1.2B.
A native of Germany, Dieter has an M.S. in computer science. When not at work, he prioritizes spending time with family and enjoying the outdoors: hiking, camping, horseback riding, and cave exploration.