Keeping Disruptions in Check when using Karpenter

By default, Karpenter can be very aggressive in disrupting your workloads: as soon as it detects a cheaper way to run your pods, it starts to consolidate. If you are running on Spot Instances, a Spot interruption that arrives while Karpenter is consolidating your workloads can lead to trouble. Here are some settings and tips that I have found to be very effective at keeping Karpenter's disruptions in check.

1. Give Karpenter many instance types to choose from

This is probably the most common trap I see people fall into with Karpenter: they create too many NodePools, each allowing only a small set of instance types. This usually comes from wanting to pick the perfect instance for every workload, but the road to hell is paved with good intentions.

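Start with a single NodePool that allows as many instance types as possible, put all of your workloads into it, and only create additional NodePools when you have a concrete problem to solve. The more instance types a NodePool allows, the more options Karpenter has when it needs to launch or replace capacity. With Spot, it can also draw on more capacity pools, so it can still find replacement capacity when one pool is being reclaimed.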
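As a rough illustration, the sketch below builds a single NodePool manifest in Python and prints it as JSON, which `kubectl apply -f` accepts just like YAML. It assumes the Karpenter v1 API (`karpenter.sh/v1`) on AWS and an existing `EC2NodeClass` named `default`. The NodePool name `general-purpose` and the specific requirement values are illustrative assumptions, not a recommendation for every cluster. Rather than listing individual instance types, it constrains a few broad labels so Karpenter can pick from every instance type that matches.

```python
import json


def build_nodepool(name: str = "general-purpose", node_class: str = "default") -> dict:
    """Return a Karpenter v1 NodePool manifest (as a dict) that allows a broad
    range of instance types. Names and values here are illustrative assumptions."""
    requirements = [
        # Allow both Spot and On-Demand capacity.
        {"key": "karpenter.sh/capacity-type", "operator": "In",
         "values": ["spot", "on-demand"]},
        # Allow x86 and ARM (Graviton); only useful if your images are multi-arch.
        {"key": "kubernetes.io/arch", "operator": "In",
         "values": ["amd64", "arm64"]},
        # Compute-, general- and memory-optimized families.
        {"key": "karpenter.k8s.aws/instance-category", "operator": "In",
         "values": ["c", "m", "r"]},
        # Exclude very old generations instead of pinning specific instance types.
        {"key": "karpenter.k8s.aws/instance-generation", "operator": "Gt",
         "values": ["2"]},
    ]
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Assumed pre-existing EC2NodeClass named "default".
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                    "requirements": requirements,
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_nodepool(), indent=2))
```

Note that there is no `node.kubernetes.io/instance-type` requirement at all. Every matching size and family stays available, and you only narrow things down once a concrete problem calls for it.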

2. Increase consolidateAfter

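By default, Karpenter sets consolidateAfter to 0s. This setting controls how long Karpenter waits after a pod has been added to or removed from a node before it considers that node for consolidation. With 0s, a node can be consolidated as soon as its pods change.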
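To make the timing concrete, here is a simplified, hypothetical model of the rule `consolidateAfter` expresses: a node only becomes a consolidation candidate once `consolidateAfter` has elapsed since a pod was last added to or removed from it. The function name and its inputs are illustrative. This is not Karpenter's actual implementation, which also takes disruption budgets, PodDisruptionBudgets and the availability of a cheaper configuration into account.

```python
from datetime import datetime, timedelta


def is_consolidation_candidate(last_pod_event: datetime, now: datetime,
                               consolidate_after: timedelta) -> bool:
    """Hypothetical simplification: True once consolidate_after has passed
    since the last pod was scheduled onto or removed from the node."""
    return now - last_pod_event >= consolidate_after


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    just_changed = now - timedelta(seconds=30)  # pod churn 30 seconds ago
    # With the 0s default, the node is a candidate immediately.
    print(is_consolidation_candidate(just_changed, now, timedelta(0)))  # True
    # With 3 minutes, the same node is left alone for now.
    print(is_consolidation_candidate(just_changed, now, timedelta(minutes=3)))  # False
```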

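I recommend setting consolidateAfter to 3 minutes (3m). A node that has just gained or lost pods is then left alone for a few minutes instead of being consolidated right away, which cuts down on churn while workloads are scaling.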
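In a Karpenter v1 NodePool, this setting lives under `spec.disruption`. The sketch below assumes the v1 API and, like the earlier example, uses Python that emits JSON. It shows only that block: `WhenEmptyOrUnderutilized` is the v1 consolidation policy that consolidates underutilized nodes as well as empty ones, and the duration is written as a string such as `3m`. The helper name `disruption_block` is hypothetical. The regular expression is a simplified stand-in for the CRD's own validation pattern, not a copy of it.

```python
import json
import re

# Simplified check: durations like "30s", "3m", "1h30m", or the literal "Never".
_DURATION = re.compile(r"^(?:[0-9]+[smh])+$|^Never$")


def disruption_block(consolidate_after: str = "3m") -> dict:
    """Return the spec.disruption section of a v1 NodePool (hypothetical helper)."""
    if not _DURATION.match(consolidate_after):
        raise ValueError(f"unsupported duration: {consolidate_after!r}")
    return {
        "disruption": {
            "consolidationPolicy": "WhenEmptyOrUnderutilized",
            "consolidateAfter": consolidate_after,
        }
    }


if __name__ == "__main__":
    print(json.dumps(disruption_block(), indent=2))
```

Merge this dict into the NodePool's `spec` alongside `template`. Once you include a `disruption` block, set `consolidateAfter` explicitly rather than relying on the default.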

3. Use Disruption Budgets