This course from Kevin McGrath, VP of Architecture at Spotinst.com, explains how to leverage excess cloud capacity from providers such as AWS, Microsoft Azure, and Google Cloud Platform to optimize the costs of cloud computing using spot instances.
Intended Audience
Everyone working with Cloud compute workloads, from start-ups to large corporations.
Prerequisites
A basic understanding of cloud computing and cloud computing billing models. If you are new to cloud computing, we recommend completing our What is Cloud Computing course first.
Learning Objectives
This course will enable you to:
- Recognize and explain how to run and manage workloads on excess cloud capacity using Spot Instances.
- Recognize and explain the risks and benefits of the spot market.
- Recognize and implement Spot Instances to reduce cloud compute costs.
- [Instructor] How to Minimize Risks and Take Advantage of Spot.
Topic one, AWS Spot Instances, the Spot lifecycle and strategy. Spot Instances are known for being preempted within a short period of time. However, this can be avoided by understanding markets, regions, and instance capacity. With this understanding, a workload-based strategy, which we will cover in depth in section three, can be enacted to take full advantage of what Spot capacity has to offer.
As we already mentioned, Spot prices fluctuate depending on the supply and demand in each region's availability zones. This creates small, independent markets that have their own instance prices and availability rates. It also means that the region and availability zone are crucial parameters when considering Spot Instance purchasing.
The regions with the largest pools of Spot Instances are us-east-1 (Northern Virginia) and eu-west-1 (Ireland). It might be somewhat surprising that these regions hold spare compute capacity, considering that they are likely the most popular of all AWS regions. Based on our experience, we can safely say that bidders in these regions enjoy hundreds of Spot Instances that remain available for weeks before being reclaimed by EC2. This means fewer price fluctuations and higher certainty.
Spotinst tests found that the least utilized regions, with the smallest Spot Instance pools, are eu-central-1 (Frankfurt), ap-south-1 (Mumbai), and ap-southeast-1 (Singapore). Our research shows that, in comparison to the most utilized regions, these data centers hold tens of Spot Instances that are available for only several days before being reclaimed by EC2. Understanding how large a region and its availability zones are is important to understanding how long an instance will run and where the best prices for Spot capacity can be found.
Topic two, Spot Instance management strategies.
Due to variation in Spot Instance availability, attention must be paid to bidding and persistence models in order to maximize value. Therefore, to develop a proper strategy, we must first understand how Spot prices are calculated and the concepts of Capacity Pools and Spot Pools.
A Capacity Pool is a set of EC2 instances that share the same region, availability zone, operating system (Linux or Windows), and instance type. A Capacity Pool can have both on-demand and excess, or Spot, Instances. The excess capacity is referred to as the Spot Pool. Each Spot Pool has its own availability and price, as determined by current supply and demand conditions. Since on-demand and Spot Instances share a Capacity Pool, Spot prices can fluctuate with usage changes of both Spot and on-demand instances. Combining multiple Capacity and Spot Pools into a single strategy enhances availability across the combined set. Generally speaking, applications that run across more than one Spot Pool will persist for longer periods and get better prices.
With any Spot strategy, instances will eventually be reclaimed. Applications should be capable of handling single instance termination, meaning that no manual changes should be required when an instance is launched. In addition, workloads should be able to remain available automatically when an interruption occurs.
There are three primary ways to handle these interruptions. First, use Spot only for transient workloads where downtime isn't an issue. This is great for development and staging environments or very short-lived data processing. Second, companies can build an internal system for automatically managing Spot interruptions, scaling rules, and instance replacement. Lastly, you can take advantage of third-party Spot management platforms.
Let's take a look at each of these options. First, it's worth noting that without a management strategy, downtime will occur.
This downtime, even for QA and test environments, can be quite painful and cause incredible inefficiencies for an R&D team. Still, for running data processing workloads where downtime and speed are less important, this could be a viable option.
When managing Spot yourself, follow these rules.
First, build price-aware applications. Start by investigating the full range of capacity pools that are available to you within the region or regions the application will be deployed to. High prices and a high degree of price variance over time indicate that demand is high and that bidding for capacity in that pool may be difficult. You will need to search for pools that have both lower prices and more stable current and historical pricing. These pools will yield lower interruption rates.
Second, check the price history. Access historical prices on a per-pool basis going back 90 days. Instances and instance families that are currently very popular tend to have Spot prices that are somewhat more volatile. Older generations, such as instances in the C3 and M3 families, tend to be much more stable. In general, picking a previous generation of an instance family will result in lower net prices and fewer interruptions.
Third, use multiple capacity pools. Applications can usually run, or be adapted to run, across multiple capacity pools. With the ability to run across multiple pools, the application's sensitivity to price spikes is reduced. In general, there is little correlation between prices in different capacity pools. Spotinst has found that by running across up to five different pools, price swings and interruptions can be cut by 80%.
And last, if you're running on AWS, always check out the Spot Advisor page at aws.amazon.com/ec2/Spot/instance-advisor.
If going it alone is not an option, there are managed solutions available.
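For those who do manage Spot themselves, the rules above (favor cheap pools, favor stable price history, spread across several pools) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method described in the course: the `CapacityPool` class, the `rank_pools` function, the scoring formula (mean price plus standard deviation), and the sample prices are all hypothetical. Real price history would have to be fetched from the provider separately.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class CapacityPool:
    """Hypothetical key for one pool: region, zone, OS, and instance type."""
    region: str
    availability_zone: str
    operating_system: str
    instance_type: str


def rank_pools(price_history, max_pools=5):
    """Return up to max_pools pools, cheapest and most stable first.

    price_history maps a CapacityPool to a list of historical Spot prices.
    The score (an assumption of this sketch) is mean price plus standard
    deviation, so volatile pools are penalized.
    """
    scored = []
    for pool, prices in price_history.items():
        if not prices:
            continue  # no history, so the pool cannot be scored
        scored.append((mean(prices) + pstdev(prices), pool))
    scored.sort(key=lambda item: item[0])
    return [pool for _, pool in scored[:max_pools]]


# Made-up sample data: two stable older-generation pools, one volatile pool.
m3_pool = CapacityPool("us-east-1", "us-east-1a", "Linux", "m3.large")
m4_pool = CapacityPool("us-east-1", "us-east-1b", "Linux", "m4.large")
c3_pool = CapacityPool("eu-west-1", "eu-west-1a", "Linux", "c3.large")

sample_history = {
    m3_pool: [0.030, 0.031, 0.030, 0.029],
    m4_pool: [0.028, 0.090, 0.025, 0.120],
    c3_pool: [0.035, 0.035, 0.036, 0.034],
}

best_two = rank_pools(sample_history, max_pools=2)
for pool in best_two:
    print(pool.availability_zone, pool.instance_type)
```

With this sample data the volatile m4.large pool is ranked last even though its lowest price is the cheapest of the three, which is the point of checking history rather than only the current price.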
Spotinst, for instance, uses big data and machine learning to automatically provision and manage workloads on the best mix of well-priced and highly available Spot Instances, given the application's performance needs. Spotinst is also able to provide a 100% availability SLA on any workload that does not have a single point of failure. An Elastigroup managed by Spotinst will automatically fall back to on-demand to ensure that workloads never experience downtime. The platform will automatically deploy resources to optimize the balance between cost and performance, continually analyzing the diversity of clouds, regions, zones, and resource types to ensure high availability of clusters. Later, we will demonstrate the advantages Spotinst brings for production and mission-critical workloads.
Topic three, Spot Fleet, what is it? A Spot Fleet is a collection, or fleet, of Spot Instances. The Spot Fleet attempts to launch the number of Spot Instances that are required to meet the target capacity that was specified in a Spot Fleet request. The fleet will also attempt to maintain target capacity if Spot Instances are interrupted due to a change in price or available capacity. Spot Fleet will attempt to launch instances that will result in the lowest cost, while diversifying the instances across different capacity pools.
When launching a cluster using Spot Fleet, you can choose between two allocation strategies. The first is the default option, lowest price. Spot Instances will be launched solely based on the price of the market. The second is diversified. With this option, Spot Fleet will launch instances evenly across the capacity pools that were specified in your request. For example, if an m4.large instance price rises over that of on-demand and another family or instance type is specified in the fleet, the workload will be given capacity from the second pool.
Topic four, Spot Management Platforms and how they help and differ from Spot Fleet.
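Before moving on, the two Spot Fleet allocation strategies described above can be illustrated with a toy model. The sketch below does not call AWS and is not Spot Fleet's actual algorithm: the `allocate` function, its `max_price` parameter, the way leftover instances are handed to the cheapest pools first, and the prices are all assumptions made for this example. Only the strategy names `lowestPrice` and `diversified` mirror the values used in a Spot Fleet request.

```python
def allocate(target_capacity, pool_prices, strategy="lowestPrice", max_price=None):
    """Toy model: split target_capacity across pools, as {pool_name: count}.

    pool_prices maps a pool name to its current Spot price. Pools priced
    above max_price (for example, the on-demand price) are skipped.
    """
    if target_capacity < 0:
        raise ValueError("target_capacity must not be negative")
    eligible = {
        name: price
        for name, price in pool_prices.items()
        if max_price is None or price <= max_price
    }
    if not eligible:
        raise ValueError("no pool is priced at or below max_price")

    if strategy == "lowestPrice":
        # Everything goes to the single cheapest pool.
        cheapest = min(eligible, key=eligible.get)
        return {cheapest: target_capacity}

    if strategy == "diversified":
        # Spread evenly; any remainder goes to the cheapest pools first
        # (an assumption of this sketch).
        ordered = sorted(eligible, key=eligible.get)
        base, extra = divmod(target_capacity, len(ordered))
        return {
            name: base + (1 if index < extra else 0)
            for index, name in enumerate(ordered)
        }

    raise ValueError(f"unknown strategy: {strategy}")


prices = {"m4.large": 0.030, "c4.large": 0.025, "r4.large": 0.040}

print(allocate(10, prices, "lowestPrice"))
# {'c4.large': 10}
print(allocate(10, prices, "diversified"))
# {'c4.large': 4, 'm4.large': 3, 'r4.large': 3}

# If m4.large rises above a 0.10 limit, capacity comes from the other pool.
print(allocate(4, {"m4.large": 0.12, "c4.large": 0.03}, "diversified", max_price=0.10))
# {'c4.large': 4}
```

The diversified result shows why the strategy reduces exposure: losing any single pool interrupts at most four of the ten instances, whereas under lowest price one reclaimed pool takes all ten.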
The Spotinst management platform uses predictive algorithms to manage Spot market behavior, capacity trends, pricing, and interruption rates. Whenever there is a risk of interruption, Elastigroup acts accordingly to seamlessly balance capacity, ensuring 100% availability and no risk of downtime. This means that your application will always run on the most cost-efficient collection of instances: the best-priced Spot Instances when available, falling back to on-demand when Spot Pools are exhausted. In addition, Spotinst prioritizes any reserved instances you may already own, maximizing cost efficiency.
This differs from Spot Fleet in several ways, most notably the SLA, strategic use of the bidding system, third-party integrations, automatic backup, and stateful support.
Spotinst offers a 99.99% availability SLA for the Spotinst service, which is the biggest differentiator between Spotinst and Spot Fleet. This is very important for everything from development and QA to production and mission-critical workloads. Spotinst prediction algorithms will start launching new capacity 15 minutes prior to market termination. For instances with state, Spotinst will migrate the data between the instances so that the transition is seamless. When working with Spot Fleet, there is no SLA for Spot Instances. The instances can be terminated with a two-minute notice.
Regarding the bidding system, Spot Management Platforms will normally offer automatic bidding that is invisible to the user. In Spotinst, the AI algorithm is based on data from thousands of customers and will place bids for the cheapest instances while guaranteeing the highest availability.
In terms of third-party integrations, one of the most important questions is how easy it is to integrate Spot Instances into an existing tool set. When working with Fleet, services need to be configured independently to work with changing Spot capacity.
Spot Management providers, however, will provide native integration with tools such as Chef, Ansible, CodeDeploy, OpsWorks, Kubernetes, ECS, and many others.
Lastly, Spotinst provides automatic backup and stateful support. As Spot Instances terminate frequently, persistence is one of the larger challenges when migrating workloads to Spot Instances. Spotinst provides support for stateful applications by scheduling snapshots of the root and attached EBS volumes. With the Auto Backup feature, data is persisted within the cluster. In case of instance replacement, Elastigroup will use the latest recorded snapshot to keep data available to newly launched instances.
Kevin McGrath is the VP of Architecture for Spotinst, specializing in Cloud Native and Serverless product designs. Kevin is responsible for researching and evaluating innovative technologies and processes leaning on his extensive background in DevOps and delivering Software as a Service within the communications and IT infrastructure industry.
Kevin started his career at USi, the first Application Service Provider (ASP), 20 years ago. It was here he began delivering enterprise applications as a service in one of the first multi-tenant shared datacenter environments. After USinternetworking was acquired by AT&T, Kevin served in the office of the CTO at Sungard Availability Services, where he specialized in migrating legacy workloads to cloud native and Serverless architectures.
Kevin holds a B.A. in Economics from the University of Maryland and a Master's in Technology Management from University of Maryland University College.