Teams often choose to reserve capacity instead of taking advantage of spot instances. But reserved capacity is a path to vendor lock-in and paying more in the long term.
It doesn’t make sense to pass up savings of up to 90% off the on-demand price just because AWS can reclaim your instance with only a two-minute warning.
You can use spot instances effectively even for production workloads.
Read on for six expert tips on how to handle spot instances and achieve dramatic savings on your cloud bill.
#1: Know when to use spot instances
The first step is having a general idea about which services can benefit from spot instances most.
For example, if your service is stateless and can be scaled out (that is, it can run more than one replica), it’s a good candidate for spot instances. Today, most services in modern architectures are stateless.
Here are some example workloads that can benefit from spot instances:
- Batch processing jobs that are fault-tolerant and instance-flexible.
- Containers and microservices, which are typically self-contained, highly available, and fault-tolerant enough to handle interruptions.
- High-performance computing (HPC) and machine learning applications that need high compute capabilities, lots of memory, super-fast storage, and high network performance. Spot instances can help here via bursting or even serve as their primary compute infrastructure.
- CI/CD operations, no matter what tools you use – spot instances can help in your deployment process.
- Distributed databases such as Elasticsearch or MongoDB, which can cope with interruptions without losing data or affecting the service when they are set up with replication.
- Any application in an orchestrated environment.
#2: Check if your workload is spot-ready – here’s how
When looking at a potential candidate for spot instances, you need to know a few things about it.
Here are a few questions to get you a step closer:
- How much time does your workload need to finish the job?
- Is it mission- and time-critical?
- Can it handle interruptions?
- Is it tightly coupled between instance nodes?
- What tools are you going to use to move your workload when AWS pulls the plug?
Answer them and you’ll know whether spot instances are a good match for your workload.
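The questions above can be turned into a quick checklist. The sketch below is only an illustration: the `Workload` fields, the `spot_readiness` helper, and the 24-hour threshold for a "long" job are all assumptions made for this example, not an AWS rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    """Answers to the checklist questions (field names are illustrative)."""
    expected_runtime_hours: float
    time_critical: bool
    handles_interruptions: bool
    tightly_coupled: bool
    has_rescheduling_tooling: bool


def spot_readiness(w: Workload, long_job_hours: float = 24.0) -> List[str]:
    """Return a list of concerns; an empty list suggests a good spot candidate.

    long_job_hours is an assumed threshold; tune it for your own workloads.
    """
    concerns = []
    if w.expected_runtime_hours > long_job_hours:
        concerns.append("long-running job")
    if w.time_critical:
        concerns.append("mission- or time-critical")
    if not w.handles_interruptions:
        concerns.append("cannot handle interruptions")
    if w.tightly_coupled:
        concerns.append("tightly coupled across nodes")
    if not w.has_rescheduling_tooling:
        concerns.append("no tooling to move the workload")
    return concerns


batch_job = Workload(2, False, True, False, True)
print(spot_readiness(batch_job))  # an empty list means no concerns were found
```

A workload that raises one or two concerns isn’t automatically ruled out, but each concern is something to address before moving it to spot instances.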
#3: Pick the right spot instance
Take a look at what AWS has to offer. Choose less popular instance types: they are less likely to be interrupted and may run stably for longer.
When searching for the best instance type for the job, check its frequency of interruption. It’s the rate at which AWS reclaimed capacity from that instance type during the trailing month.
AWS displays it in the Spot Instance Advisor in the following ranges: <5%, 5-10%, 10-15%, 15-20%, and >20%.
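Here is a minimal sketch of how you might shortlist instance types by their interruption range. The sample rows and the `rank_candidates` helper are made up for illustration; real values come from the Spot Instance Advisor.

```python
# Interruption-frequency ranges as shown in the Spot Instance Advisor, best first.
RANGES = ["<5%", "5-10%", "10-15%", "15-20%", ">20%"]


def rank_candidates(candidates, max_range="5-10%"):
    """Keep instance types at or below max_range, ordered by range, then savings."""
    limit = RANGES.index(max_range)
    kept = [c for c in candidates if RANGES.index(c["interruption"]) <= limit]
    return sorted(
        kept,
        key=lambda c: (RANGES.index(c["interruption"]), -c["savings_pct"]),
    )


# Hypothetical sample data, not real Spot Instance Advisor figures.
sample = [
    {"type": "m5.large", "interruption": "5-10%", "savings_pct": 60},
    {"type": "c5.xlarge", "interruption": ">20%", "savings_pct": 75},
    {"type": "r5.large", "interruption": "<5%", "savings_pct": 55},
    {"type": "m5a.large", "interruption": "<5%", "savings_pct": 62},
]

print([c["type"] for c in rank_candidates(sample)])
# → ['m5a.large', 'r5.large', 'm5.large']
```

Sorting by interruption range first and savings second reflects the advice above: stability comes before the last few percent of discount.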
It’s possible to use spot instances for production workloads with a specific type of spot instance. AWS has offered a defined-duration option (Spot blocks) that guarantees uninterrupted time for up to six hours, calculated in hourly increments. You pay a little more for it, but you can still get a discount of 30-50% compared to on-demand pricing. Note that AWS stopped offering Spot blocks to new customers in July 2021, so check whether the option is available to your account.
#4: Set your price
Once you know which spot instances you’ll be using, it’s time to set the maximum price you’re willing to pay for them.
Here’s a good rule of thumb: Set your maximum price to match the on-demand price.
Don’t forget that your spot instance will run only while the current spot price is at or below your maximum price. If you set your maximum below the on-demand price and the spot price rises above it, you risk getting interrupted.
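As a rough sketch of this rule of thumb, the snippet below builds the `InstanceMarketOptions` argument that boto3’s `ec2.run_instances` accepts, capping the spot price at the on-demand price. The helper names and the example prices are assumptions for illustration, and nothing here calls AWS.

```python
def spot_market_options(on_demand_price_usd: float) -> dict:
    """Build InstanceMarketOptions with the max price set to the on-demand price."""
    return {
        "MarketType": "spot",
        "SpotOptions": {
            # The EC2 API expects the price as a string, in USD per hour.
            "MaxPrice": f"{on_demand_price_usd:.4f}",
            "SpotInstanceType": "one-time",
        },
    }


def keeps_running(spot_price: float, max_price: float) -> bool:
    """Price check only: the instance can run while spot price <= max price."""
    return spot_price <= max_price


# Hypothetical on-demand price of $0.096 per hour.
options = spot_market_options(0.096)
print(options["SpotOptions"]["MaxPrice"])  # → 0.0960
print(keeps_running(spot_price=0.10, max_price=0.08))  # → False
```

If you leave `MaxPrice` out, AWS uses the on-demand price as the default maximum, which matches the rule of thumb above.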
#5: Manage spot instances in groups
Managing spot instances in groups is a smart move because it lets you request multiple instance types at the same time, which increases your chances of getting capacity for your workload and reduces the impact of interruptions.
Another benefit is that you can set a maximum price per hour for the entire fleet of instances rather than for a given spot pool. A spot pool is a group of instances with the same type, OS, availability zone, and network.
In AWS, this feature is called Spot Fleet. It allows you to manage a large fleet of spot instances using various allocation strategies, for example, taking only the lowest price into account or favoring the pools with the most available capacity (capacity-optimized).
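To make this more concrete, here is a sketch that builds a `SpotFleetRequestConfig` dictionary of the kind boto3’s `ec2.request_spot_fleet` accepts. The role ARN, AMI ID, and subnet ID are placeholders, and the `spot_fleet_config` helper is a hypothetical name for this example. The snippet only builds the dictionary and doesn’t call AWS.

```python
ALLOWED_STRATEGIES = {
    "lowestPrice",
    "diversified",
    "capacityOptimized",
    "capacityOptimizedPrioritized",
    "priceCapacityOptimized",
}


def spot_fleet_config(instance_types, target_capacity, max_total_price_usd,
                      strategy="capacityOptimized"):
    """Build a Spot Fleet request config covering several instance types."""
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"unknown allocation strategy: {strategy}")
    return {
        # Placeholder ARN; replace with your own fleet role.
        "IamFleetRole": "arn:aws:iam::123456789012:role/example-fleet-role",
        "AllocationStrategy": strategy,
        "TargetCapacity": target_capacity,
        # Hourly price cap for the whole fleet, as a string in USD.
        "SpotMaxTotalPrice": f"{max_total_price_usd:.2f}",
        # "maintain" asks the fleet to replace interrupted instances.
        "Type": "maintain",
        "LaunchSpecifications": [
            {
                "ImageId": "ami-00000000000000000",  # placeholder AMI
                "InstanceType": instance_type,
                "SubnetId": "subnet-00000000",  # placeholder subnet
            }
            for instance_type in instance_types
        ],
    }


config = spot_fleet_config(["m5.large", "m5a.large", "m4.large"], 10, 1.5)
print(config["AllocationStrategy"], len(config["LaunchSpecifications"]))
```

Each launch specification adds another spot pool the fleet can draw from, so listing several similar instance types is what gives the allocation strategy room to work.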
Note: Prepare for a lot of manual configuration, setup, and maintenance tasks.
#6: Automate spot instances
How do you avoid downtime from lost spot instances? Your best strategy is implementing an automation tool that manages your cloud infrastructure for you using policies and autoscaling.
By using an automated cloud cost optimization solution, you can choose how much of your workload should run on spot instances and then automatically fall back to on-demand instances if an interruption happens.
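The fallback logic can be sketched in a few lines. This is a simplified illustration, not how any particular tool works: the `plan_capacity` helper and its parameters are assumptions made for this example.

```python
def plan_capacity(total_nodes: int, spot_percent: int, spot_available: int) -> dict:
    """Split nodes between spot and on-demand, falling back when spot is short.

    spot_percent is the share of the workload you want on spot (0-100).
    spot_available is how many spot nodes can actually be obtained right now.
    """
    wanted_spot = total_nodes * spot_percent // 100
    spot = min(wanted_spot, spot_available)
    # Whatever cannot run on spot falls back to on-demand.
    return {"spot": spot, "on_demand": total_nodes - spot}


print(plan_capacity(10, 70, spot_available=20))  # → {'spot': 7, 'on_demand': 3}
print(plan_capacity(10, 70, spot_available=4))   # → {'spot': 4, 'on_demand': 6}
```

In the second call, three spot nodes were lost or unavailable, so the plan shifts them to on-demand and the total stays at ten.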
Your workload will always have a place to run. Thanks to EC2 instance rebalance recommendations, it’s possible to mitigate the risk even before you receive the two-minute interruption notice.
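Both signals are exposed through the EC2 instance metadata service. The sketch below, which assumes it runs on an EC2 instance with IMDSv2 enabled, reads a metadata document and works out how much time is left before an interruption. The helper names are hypothetical. The metadata paths are `spot/instance-action` for the interruption notice and `events/recommendations/rebalance` for the rebalance recommendation.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def fetch_metadata(path, timeout=1.0):
    """Return the metadata document at path, or None if it doesn't exist (404)."""
    # IMDSv2: request a short-lived session token first.
    token_request = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()
    request = urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode()
    except urllib.error.HTTPError as error:
        if error.code == 404:  # no notice has been issued
            return None
        raise


def seconds_until_interruption(body, now):
    """Parse a spot/instance-action document and return the seconds remaining."""
    notice = json.loads(body)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    return (deadline.replace(tzinfo=timezone.utc) - now).total_seconds()


# Example document in the shape the interruption notice uses.
example = '{"action": "terminate", "time": "2024-01-01T12:02:00Z"}'
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until_interruption(example, now))  # → 120.0
```

On a real instance, you would call `fetch_metadata("spot/instance-action")` every few seconds and start draining the node as soon as it returns something other than `None`.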
To achieve the best results, get a solution like CAST AI that takes automated actions based on predictive analytics.
It’s the safest approach to using spot instances in production and slashing your cloud bill by up to 90%.
Editor’s note: This article is in association with CAST AI.