This course provides an introduction to cost management in AWS. It starts by looking at the economics of the cloud in general, including economies of scale and total cost of ownership, and you'll also learn why cost optimization is important.
We'll also cover the AWS Pricing Calculator and the AWS Well-Architected Framework, and how they help you optimize your AWS environment and estimate what it will cost. We round off the course by taking a look at terminology across areas including software development, DevOps, finance, and general AWS terminology.
- Get a foundational understanding of cost optimization in AWS
- Learn the fundamentals of cloud economics including economies of scale, total cost of ownership, and why cost optimization is important
- Learn about the AWS Pricing Calculator
- Learn about the AWS Well-Architected Framework and how it can help to make your AWS environment more efficient and cost-effective
- Understand a range of terminology linked to cost management in AWS
This course is intended for cloud architects, business management, or anyone looking to manage their costs effectively in AWS.
To get the most out of this course, you should already have some experience with the AWS platform.
Strive for excellence by applying as many best practices as possible to your workloads, with the aim of reducing human error, saving time and resources through automation, and continually improving processes.
The entire workload, i.e., the applications together with the infrastructure, should be defined entirely as code. This reduces operational errors and allows execution, updates, and changes to be automated, which saves time and resources.
Frequent but small and reversible changes
The development and infrastructure should be designed for small and light updates that are made more frequently and should be easily reversible if necessary, without breaking anything.
Evolve procedures alongside the workload
As software development progresses, so should the associated processes. Regular, dedicated routines help identify improvement opportunities, such as processes that can be automated, and validate the effectiveness of the existing procedures.
Identify sources of failures and remove the cause before issues reoccur by performing preventive exercises. Regular events and simulated exercises will increase awareness inside the team and improve reaction times.
Learning from operational failures
Every incident and failure is a lesson to analyze and learn from. Lessons and appropriate improvements should be well documented and shared throughout the whole company.
As customer needs and business demands can change rapidly, it is wise to design the development infrastructure to support agility and possible changes in advance.
Additionally, keeping lessons on success and failure well documented and easily accessible helps maintain the best possible performance and reduces the time spent on recurring decisions.
That was the first pillar, operational excellence. The next one will be Security.
The goal of the security pillar is to provide the highest possible security for data, systems, and components.
Implementation of a strong identity foundation
Going by the principle of least privilege, users are granted only as much access as they need to fulfill a task. Appropriate authorization should be a major requirement for all resources in the cloud system.
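As a rough illustration of least privilege, the sketch below builds an IAM policy document in Python that allows only read access to a single S3 bucket. The helper `read_only_bucket_policy` and the bucket name `example-reports-bucket` are hypothetical, and the choice of S3 is an assumption made for the example, not something the course prescribes.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document that grants read-only access to one bucket.

    Hypothetical helper for illustration only.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the actions needed to list and read objects.
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # Scoped to a single bucket and the objects inside it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


# "example-reports-bucket" is a made-up bucket name.
policy = read_only_bucket_policy("example-reports-bucket")
print(json.dumps(policy, indent=2))
```

Anything the policy does not explicitly allow, such as writing or deleting objects, stays denied by default.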
Actions and changes in the environment should be monitored, alerted on, and audited in real time. They should also be logged and made available for automatic investigation and action routines.
Apply security at all levels
Defensive strategies should be applied to all possible levels.
Security by automation
By using automated security mechanisms, architectures gain the ability to scale faster and more cost-effectively.
Make the use of data encryption, tokenization, and access control mandatory.
Reduce the risk of users mishandling or modifying sensitive data by preventing direct access when it is not needed.
Prepare incident management and investigation policies in advance so that incidents can be handled when they occur.
When designing the system architecture, identity and access control on multiple layers should be prioritized from the beginning to avoid major changes at a later stage.
Recognize changes in demand as well as disruptions at an early stage in order to acquire resources in time and recover from failures automatically.
Identify potential outages by using Key Performance Indicators (KPIs) on the workload that will trigger the monitoring system. This allows you to take appropriate action to either prevent the outage or automatically begin remediation.
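The idea of a KPI triggering either an alert or automatic remediation can be sketched as a simple threshold check. The class `KpiThreshold`, the function `evaluate`, and the error-rate figures below are all hypothetical; a real setup would rely on a monitoring service rather than hand-written checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiThreshold:
    """Two-level threshold for one KPI (hypothetical structure)."""
    name: str
    alert_at: float      # value at which people are notified
    remediate_at: float  # value at which automatic remediation starts


def evaluate(threshold: KpiThreshold, value: float) -> str:
    """Return the action for the current KPI value: ok, alert, or remediate."""
    if value >= threshold.remediate_at:
        return "remediate"
    if value >= threshold.alert_at:
        return "alert"
    return "ok"


# Assumed example: error rate in percent of requests.
error_rate = KpiThreshold(name="error_rate_percent", alert_at=1.0, remediate_at=5.0)

for observed in (0.2, 1.5, 7.0):
    print(f"{error_rate.name}={observed}: {evaluate(error_rate, observed)}")
```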
Test recovery procedures
Develop recovery strategies by testing different failure scenarios. Unlike an on-premises environment, the cloud allows you to simulate failure scenarios at various scales.
Horizontal scaling for better availability
Prevent common points of failure by scaling the infrastructure horizontally, i.e., distributing requests across multiple resources rather than concentrating the entire workload on a single resource.
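A minimal sketch of distributing requests across several resources is a round-robin balancer that skips resources marked as unhealthy. The class `RoundRobinBalancer` and the target names are hypothetical; in practice a managed load balancer would take this role.

```python
class RoundRobinBalancer:
    """Hand out targets in turn, skipping any that are marked unhealthy."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._index = 0

    def mark_unhealthy(self, target):
        """Take a target out of rotation."""
        self.healthy.discard(target)

    def mark_healthy(self, target):
        """Return a known target to rotation."""
        if target in self.targets:
            self.healthy.add(target)

    def next_target(self):
        """Return the next healthy target, or raise if none is left."""
        for _ in range(len(self.targets)):
            target = self.targets[self._index % len(self.targets)]
            self._index += 1
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets available")


# Made-up target names for illustration.
balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
print([balancer.next_target() for _ in range(4)])

# If one resource fails, requests keep flowing to the others.
balancer.mark_unhealthy("server-b")
print([balancer.next_target() for _ in range(3)])
```

With a single resource, the same failure would take the whole workload down; with several, it only reduces capacity.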
Stop guessing capacity
Use monitoring as a tool to detect demand and scale the environment accordingly by automatically adding or removing resources.
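Scaling on measured demand instead of guessing can be sketched as a proportional rule: size the fleet so that average utilization moves back toward a target. The function `desired_capacity` and all numbers are assumptions for illustration; this is a simplification and not the algorithm of any specific AWS scaling feature.

```python
import math


def desired_capacity(current: int, utilization: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Return the instance count that brings utilization back toward the target.

    Hypothetical helper: scales the current count by utilization / target,
    rounds up, and clamps the result to the allowed range.
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    proposed = math.ceil(current * utilization / target)
    return max(minimum, min(maximum, proposed))


# 4 instances at 90% CPU with a 60% target: scale out.
print(desired_capacity(4, 90.0, 60.0, minimum=1, maximum=10))
# 4 instances at 30% CPU with a 60% target: scale in.
print(desired_capacity(4, 30.0, 60.0, minimum=1, maximum=10))
```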
Manage changes in automation
Changes to the infrastructure should only be made by automated and trackable actions.
Monitoring and logging are the foundation of a reliable system. By analyzing logged metric data and responding in time, failures can be detected early and remediated automatically.
Use computing resources efficiently to meet system requirements, and maintain that efficiency as demand changes and technologies evolve.
Make use of advanced technologies
In the era of cloud computing, many software solutions come as a service. Using software as a service instead of hosting, operating, and managing a tool yourself frees up time and resources for the development team.
Global in minutes
With AWS Regions, the workload can be deployed all around the world and at various scales, which allows the reduction of latency for customers.
Today, many traditional computing activities can be handled by serverless solutions, which remove the need to run and maintain physical servers and thus lower operational and management costs.
Various tests can be performed with virtual resources to determine which service or resource and which type of configuration is best suited for individual requirements.
Know the options - make the right choices
Get to know the services that align best with your workload goals. For example, when making decisions, consider which database or storage option best suits your needs.
Use a data-driven approach to select high-performance designs. Collect data on all areas of your architecture, monitor deviations from expected performance, and then take action.
Deliver business value at the lowest possible price. Reduce upfront fixed costs and benefit from small, controllable ongoing expenses.
Implement Cloud Financial Management
Build capabilities to manage and spread awareness of costs and expenses in the cloud environment.
Adopt a consumption model
Pay as you go. Analyze the actual business needs and match resources to current requirements.
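The effect of matching resources to actual demand can be shown with simple arithmetic. The hourly rate, instance counts, and daily demand pattern below are invented for illustration and do not reflect real AWS prices.

```python
HOURLY_RATE_CENTS = 10  # hypothetical price per instance-hour, in cents
DAYS_PER_MONTH = 30     # simplifying assumption


def monthly_cost_cents(instance_hours_per_day: int) -> int:
    """Monthly cost in cents for a given number of instance-hours per day."""
    return instance_hours_per_day * DAYS_PER_MONTH * HOURLY_RATE_CENTS


# Provisioned for peak: 10 instances running around the clock.
peak_hours = 10 * 24

# Matched to demand: 10 instances for 8 busy hours, 2 for the other 16.
matched_hours = 10 * 8 + 2 * 16

peak_cost = monthly_cost_cents(peak_hours)
matched_cost = monthly_cost_cents(matched_hours)

print(f"Provisioned for peak: ${peak_cost / 100:.2f}")
print(f"Matched to demand:    ${matched_cost / 100:.2f}")
print(f"Monthly difference:   ${(peak_cost - matched_cost) / 100:.2f}")
```

Under these assumed figures, paying only for what is used costs $336.00 a month instead of $720.00.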
Measure overall efficiency
Measure workload business performance and the costs associated with deployment. Use these metrics to determine the gains you make by increasing performance and reducing costs.
Stop spending money on data center operations
AWS takes over traditional data center tasks, and managed services remove the operational burden of managing and updating operating systems and applications.
Analyze and attribute expenses
AWS makes it easy to identify system usage and costs and transparently allocate IT costs to individual workload owners based on this data. This helps you measure return on investment (ROI) and enables workload owners to optimize resources and reduce costs.
Getting the best performance and value while spending as little as possible can be achieved by applying the whole AWS Well-Architected Framework correctly to the individual business and cloud environment. The most vital part is to spread awareness of expenses and correct resource usage across the team to eliminate inefficiency.
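Allocating costs to workload owners usually comes down to grouping cost records by an ownership tag. The sketch below assumes a made-up record format with a `tags` dictionary and a `cost_cents` field; the function `costs_by_owner`, the `owner` tag key, and the sample data are hypothetical.

```python
from collections import defaultdict


def costs_by_owner(records):
    """Sum costs per value of the 'owner' tag; untagged costs get their own bucket."""
    totals = defaultdict(int)
    for record in records:
        owner = record.get("tags", {}).get("owner", "untagged")
        totals[owner] += record["cost_cents"]
    return dict(totals)


# Invented sample data in an assumed format.
sample_records = [
    {"resource": "web-server", "tags": {"owner": "team-a"}, "cost_cents": 12000},
    {"resource": "database", "tags": {"owner": "team-b"}, "cost_cents": 8000},
    {"resource": "cache", "tags": {"owner": "team-a"}, "cost_cents": 3000},
    {"resource": "old-volume", "tags": {}, "cost_cents": 500},
]

for owner, cents in sorted(costs_by_owner(sample_records).items()):
    print(f"{owner}: ${cents / 100:.2f}")
```

The `untagged` bucket is worth watching: costs that nobody owns are hard to optimize.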
Oliver Gehrmann is a FinOps Consultant and CEO of kreuzwerker Frankfurt, a German consulting firm with a strong focus on AWS, software engineering, and cloud financial management. He's worked in IT for over 10 years, facilitating the migration from physical servers in data centers to modern cloud infrastructures.
He and his team saw first-hand that cloud costs were becoming an increasing challenge when, about 2.5 years ago, more and more customers approached them about the topic. Costs were running out of control and could not be attributed to business value.
Since then, he and his team have worked extensively on cloud financial management and have already been able to save their customers many millions of dollars. He now shares this knowledge in order to help others.