Next Steps
1h 15m

Cloud computing providers like Amazon Web Services, Microsoft Azure, and Google Cloud Platform are claiming a growing share of IT budgets, making it necessary to understand their costs. We may even be surprised to find our public cloud bills higher than expected. In this course, I take a closer look at the top cost contributors and what we can do to reduce overall spending while maintaining innovation velocity.

In this course, you'll learn what makes the cloud such an attractive solution, what drives cloud adoption, and what the typical costs of cloud computing are. You'll learn about a wide range of cloud cost optimization techniques, best practices for cost management, and how to gamify the cloud cost experience.

If you have any feedback relating to this course, please let us know at

Learning Objectives

  • Understand what makes cloud attractive and how adoption will drive cost
  • Learn how to gain visibility into cloud cost and how to hold departments accountable for their spending
  • Learn about cloud cost drivers and how to get the most out of your budget
  • Discover how to establish best practices and build a culture of cost-consciousness

Intended Audience

This course is for executives, architects, and technical leads looking to understand what drives public cloud cost and to learn about best practices of cloud cost optimization.


To get the most out of this course, you should have a basic understanding of cloud concepts. Some familiarity with cloud services like compute and storage would also be helpful but is not required.


Welcome back to our Cost Optimization Strategies for the Cloud course. I'm Dieter Matzion and I will be your instructor for this lecture. In this lecture we're going to talk about cost optimization next steps: some ideas for what to do once your cost optimization program has been underway for a while. Gamifying the cloud cost experience does not mean building top-10 lists of the applications that cost the most. Such a list would be mostly static and of little use.

For example, the number one cost application is likely one of your business's primary offerings. It would take a lot of engineering resources to move it off the number one spot, potentially with only marginal benefit to your business. A good example of gamification is when a developer approached me with a launch request for a new feature that would need 5,000 cores in the cloud. I showed him where to find the monthly production launch budget and what feature requests were queued up for the current month. I explained that his new feature had very little impact on revenue, which was simply the nature of the things his team worked on. And because his feature was so resource-intensive, I wasn't even sure whether he would be able to launch the next month.

The developer came back two days later with the same feature now needing only 680 cores, which put him in the middle of the current month's queue. Having cost reporting available to everyone within my company allowed me to easily share information, and it allowed a developer to understand what it would take for his code to make it into production. The developer learned how to win the game by following the rules. The use cases we discussed in the managing cost drivers lecture were driven by the intention to reduce cost.

And people sometimes ask me where the cut-off point is. When is it not worth working on something? For me, the answer is: when the time it takes me to work on a reduction effort costs the company more than I'm saving. That said, this is somewhat of a rule of thumb. A better way is to quantify the amount of revenue a section of code produces and compare it against the cost it takes to run that section of code. A relatively simple way to obtain both metrics is by performing a canary test against your baseline. This assumes you already have the ability to quantify the revenue from a production deployment.
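As a rough sketch of that rule of thumb, one could compare the cost of the engineering effort against the savings it produces over some horizon. All figures below (hours, rate, savings) are hypothetical and only illustrate the comparison:

```python
def optimization_roi(engineer_hours, hourly_rate, monthly_savings, horizon_months=12):
    """Net benefit of a cost-reduction effort over the given horizon.

    Positive means the savings outweigh the engineering time spent;
    negative means the effort costs the company more than it saves.
    """
    effort_cost = engineer_hours * hourly_rate
    total_savings = monthly_savings * horizon_months
    return total_savings - effort_cost

# Hypothetical: 40 engineer-hours at $150/h to save $300/month.
# Over 12 months that is $3,600 saved vs. $6,000 spent -- not worth it.
net = optimization_roi(40, 150, 300)  # -2400
```

The choice of horizon matters: the same effort breaks even at 20 months, so a team planning to run the workload for years may reach a different conclusion than one expecting to retire it.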

A canary release is essentially a fractional production deployment where some portion of your customers use the new code version. This allows you to extrapolate not only the cost of a full-scale production deployment but also the change in revenue the new code version provides. For cost, you will likely get percentage changes of sub-systems: for example, a 3% increase in the API layer alongside a 2% decrease in the persistence layer. This will allow your development leads to make more informed decisions around software releases.
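The per-subsystem comparison above can be sketched as a simple calculation. The subsystem names and cost figures below are made up for illustration, and both sides are assumed to be normalized to the same traffic share:

```python
def subsystem_deltas(baseline_costs, canary_costs):
    """Percent cost change per subsystem between baseline and canary.

    Both arguments map subsystem name -> cost rate for an equal slice
    of traffic; a positive result means the new code costs more there.
    """
    return {
        name: 100.0 * (canary_costs[name] - baseline_costs[name]) / baseline_costs[name]
        for name in baseline_costs
    }

# Hypothetical rates per 1% of traffic, in dollars per hour.
deltas = subsystem_deltas(
    {"api": 100.0, "persistence": 50.0},
    {"api": 103.0, "persistence": 49.0},
)
# {"api": 3.0, "persistence": -2.0}
```

Multiplying each canary rate by the full traffic volume then gives the extrapolated full-scale cost, which can be set against the measured revenue change from the same canary slice.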

While we're on the subject, let's touch on how system performance optimization impacts cost. Some feature releases may not change revenue, for example refactoring efforts where the same functionality uses fewer cloud resources. These releases will add a credit to your release budget, which allows development leads to save credits for upcoming experiments or bigger releases. This is another way of gamifying the cloud cost experience. Somewhat related to the previous lecture, where we touched on cultural changes, I'd like to bring up how failure as a service can help with cloud cost.
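The credit mechanism could be modeled as simply as the following sketch, where a release that reduces cost (a negative cost delta) refunds budget that later launches can spend. The class and the numbers are hypothetical, not a description of any real tool:

```python
class ReleaseBudget:
    """Monthly launch budget where cost-saving releases earn credits."""

    def __init__(self, monthly_budget):
        self.remaining = monthly_budget

    def record_release(self, cost_delta):
        """Positive delta consumes budget; a negative delta (e.g. a
        refactor that shrinks the footprint) credits budget back."""
        self.remaining -= cost_delta

budget = ReleaseBudget(1000)
budget.record_release(680)   # a resource-intensive feature launch
budget.record_release(-200)  # a refactor that saves cost -> credit
# budget.remaining == 520
```

The point of the game is visible in the arithmetic: a team that ships efficiency work buys itself room for its next big launch.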

There are several open-source examples of failure as a service. One well-known example is Netflix's Chaos Monkey, essentially a script that periodically traverses your compute infrastructure in the cloud and randomly terminates virtual machines.
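The core idea can be sketched in a few lines. This is a simulation only, not Chaos Monkey itself: the real tool respects opt-in groups, schedules, and safety limits, all of which are omitted here:

```python
import random

def chaos_round(instances, termination_rate=0.05, rng=None):
    """One round of Chaos Monkey-style random termination (simulated).

    Each instance is independently selected for termination with the
    given probability. Returns (survivors, terminated).
    """
    rng = rng or random.Random()
    terminated = [i for i in instances if rng.random() < termination_rate]
    survivors = [i for i in instances if i not in terminated]
    return survivors, terminated

fleet = [f"i-{n:04d}" for n in range(20)]
survivors, terminated = chaos_round(fleet, termination_rate=0.1)
```

In a real deployment the termination call would go to the cloud provider's API instead of a list, but the principle is the same: any instance can disappear at any time, so services must be built to tolerate it.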

This may seem a bit much at the early stages of your cloud journey. However, consider the advantages of random termination at later stages. Not only is it a forcing function for your developers to build resilient services that can survive random termination, it also helps you clean up forgotten compute resources. In this lecture we talked about some of the next steps your cloud cost optimization team can work on, for example what gamifying the cloud cost experience can look like.

About the Author

Dieter Matzion is a member of Intuit’s Technology Finance team supporting the AWS cost optimization program.

Most recently, Dieter was part of Netflix’s AWS capacity team, where he helped develop Netflix’s rhythm and active management of AWS including cluster management and moving workloads to different instance families.

Prior to Netflix, Dieter spent two years at Google working on the Google Cloud offering focused on capacity planning and resource provisioning. At Google he developed demand-planning models and automation tools for capacity management.

Prior to that, Dieter spent seven years at PayPal in roles spanning database management, network operations, and batch operations, supporting all systems and processes for the corporate functions at a daily volume of $1.2B.

A native of Germany, Dieter has an M.S. in computer science. When not at work, he prioritizes spending time with family and enjoying the outdoors: hiking, camping, horseback riding, and cave exploration.