How to Scale Down AWS EKS Clusters Nightly to Lower EC2 Cost
Learn how to scale down non-production EKS clusters on EC2 instances nightly, reducing your cloud costs efficiently in our guide.
Amazon Elastic Kubernetes Service (EKS) enables organizations to run and scale Kubernetes applications. The main ways to run EKS nodes are on EC2 instances, on AWS Fargate, or on-premises with AWS Outposts. This post covers only the scale-down process for EKS nodes running on EC2.
For each cluster you run, you pay a base hourly rate per cluster ($0.10 per hour) plus the cost of the EC2 instances that serve as nodes and any associated volumes. Hourly EC2 costs vary depending on instance type.
If you have non-production clusters for testing or QA purposes, you might not need them to be available 24 hours a day. By setting up a process to scale down these clusters on a nightly basis, you can reduce your hourly EC2 computing costs.
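As a rough illustration of the arithmetic, the sketch below multiplies node count, hourly price, and hours scaled down. The function name and every input value are hypothetical; substitute the on-demand price of your own instance type. The per-cluster fee is not included because it is charged whether or not nodes are running.

```shell
# Rough nightly saving from scaling worker nodes to 0.
# All inputs are hypothetical placeholders, not real AWS prices.
nightly_savings() {
  nodes=$1          # number of worker nodes in the cluster
  hourly_rate=$2    # price per node in USD per hour
  hours_off=$3      # hours per night the node groups stay at 0
  awk -v n="$nodes" -v r="$hourly_rate" -v h="$hours_off" \
    'BEGIN { printf "%.2f\n", n * r * h }'
}

# Example: 4 nodes at a hypothetical $0.25/hour, scaled down for 12 hours.
# nightly_savings 4 0.25 12    # prints 12.00
```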
In this guide, we’ll show you how you can reduce your cloud costs by using the AWS CLI to scale down your EKS clusters nightly.
You can’t pause an EKS cluster: the control plane is managed by AWS and has no concept of pausing or stopping. Instead, you get a similar result by setting the node group sizes to 0 when the cluster isn’t in use, which removes the EC2 cost of the worker nodes. The hourly per-cluster fee still applies.
Scaling an EKS cluster down disrupts any jobs running on it, so it’s important to send an alert to anyone working with the cluster to warn them about the scale-down scheduled for the night. If a team is working past normal hours, this alert gives them the opportunity to intervene and prevent the scale-down.
After receiving an OK to scale down, list all node groups in the cluster. EKS clusters can have both managed and unmanaged (self-managed) node groups. By using the following CLI command, you can list both types of node groups:
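The original command is not reproduced here, so the following is a minimal sketch that assumes eksctl is the CLI in use, as in the later steps. The cluster name and region are placeholders, and the `run` helper and function name are illustrative. The helper only prints the command unless `DRY_RUN=0` is set.

```shell
# Dry-run helper: prints the command; set DRY_RUN=0 to actually execute it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# List the node groups eksctl can see for a cluster.
list_nodegroups() {
  cluster=$1   # placeholder cluster name
  region=$2    # placeholder AWS region
  run eksctl get nodegroup --cluster "$cluster" --region "$region"
}

# Example (prints the command in dry-run mode):
# list_nodegroups qa-cluster us-east-1
```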
Next, scale all node groups to 0 to shut down the worker nodes. Node groups are backed by EC2 Auto Scaling groups (ASGs), and the Kubernetes Cluster Autoscaler, which is not installed by default and must be deployed into the cluster, resizes those ASGs to scale managed node groups.
If you want to scale unmanaged node groups to and from 0, note that by default the Cluster Autoscaler doesn’t discover the ASGs of unmanaged node groups. You’ll have to tag each ASG so that the Cluster Autoscaler can detect it, and then deploy the Cluster Autoscaler.
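One way to apply discovery tags is with the AWS CLI, sketched below. It assumes the Cluster Autoscaler is configured for tag-based auto-discovery using the conventional `k8s.io/cluster-autoscaler/enabled` and `k8s.io/cluster-autoscaler/<cluster-name>` keys; if your deployment uses other keys, adjust accordingly. The ASG and cluster names are placeholders, and deploying the Cluster Autoscaler itself is not shown.

```shell
# Dry-run helper: prints the command; set DRY_RUN=0 to actually execute it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Add the two conventional auto-discovery tags to one ASG.
# Assumption: the autoscaler's discovery settings match these tag keys.
tag_asg_for_autoscaler() {
  asg=$1       # placeholder Auto Scaling group name
  cluster=$2   # placeholder EKS cluster name
  base="ResourceId=$asg,ResourceType=auto-scaling-group,PropagateAtLaunch=true"
  run aws autoscaling create-or-update-tags --tags \
    "$base,Key=k8s.io/cluster-autoscaler/enabled,Value=true" \
    "$base,Key=k8s.io/cluster-autoscaler/$cluster,Value=owned"
}

# Example (prints the command in dry-run mode):
# tag_asg_for_autoscaler my-unmanaged-asg qa-cluster
```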
With this CLI command, you can scale node groups to 0. The syntax is the same for both managed and unmanaged node groups:
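A minimal sketch of such a command, assuming eksctl and placeholder cluster and node group names. It also lowers the minimum to 0, because the desired size has to stay within the group's minimum and maximum. The helper prints the command unless `DRY_RUN=0` is set.

```shell
# Dry-run helper: prints the command; set DRY_RUN=0 to actually execute it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Scale one node group to 0 nodes.
scale_nodegroup_to_zero() {
  cluster=$1     # placeholder cluster name
  nodegroup=$2   # placeholder node group name
  run eksctl scale nodegroup --cluster "$cluster" --name "$nodegroup" \
    --nodes 0 --nodes-min 0
}

# Example (prints the command in dry-run mode):
# scale_nodegroup_to_zero qa-cluster ng-1
```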
You can also scale a node group by passing a config file with --config-file and naming the node group to scale with --name. eksctl searches the config file for that node group and uses its scaling configuration values.
An error will occur if the desired number of nodes is not within the current minimum and maximum range. You can pass the --nodes-min and --nodes-max flags to change that range in the same command.
eksctl can also scale all the node groups found in a config file. The same validation rules apply to each node group as when scaling a single one.
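The range rule can be sketched as a small pre-check to run before calling eksctl. This is purely illustrative: eksctl performs its own validation, and the function name is hypothetical.

```shell
# Succeeds when min <= desired <= max; otherwise prints a message to
# stderr and returns 1. Arguments are whole numbers.
check_scaling_range() {
  desired=$1
  min=$2
  max=$3
  if [ "$desired" -lt "$min" ] || [ "$desired" -gt "$max" ]; then
    echo "desired=$desired is outside the range [$min, $max]" >&2
    return 1
  fi
  return 0
}

# Example: scaling to 0 fails the check while the minimum is still 1,
# which is why the minimum is lowered together with the desired size.
# check_scaling_range 0 1 3   # fails
# check_scaling_range 0 0 3   # passes
```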
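The config-file approach might look like the sketch below. The cluster name, region, node group names, and sizes in the generated file are hypothetical; in practice you would likely keep a night-time copy of your real cluster config. The helper prints each eksctl command unless `DRY_RUN=0` is set.

```shell
# Dry-run helper: prints the command; set DRY_RUN=0 to actually execute it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Write a hypothetical night-time config in which every node group is sized to 0.
write_night_config() {
  cat > "$1" <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: qa-cluster
  region: us-east-1
managedNodeGroups:
  - name: ng-managed
    minSize: 0
    maxSize: 3
    desiredCapacity: 0
nodeGroups:
  - name: ng-unmanaged
    minSize: 0
    maxSize: 3
    desiredCapacity: 0
EOF
}

# Scale one named node group using the values found in the config file.
scale_one_from_config() { run eksctl scale nodegroup --config-file "$1" --name "$2"; }

# Scale every node group found in the config file.
scale_all_from_config() { run eksctl scale nodegroup --config-file "$1"; }

# Example (prints the commands in dry-run mode):
# write_night_config night.yaml
# scale_one_from_config night.yaml ng-managed
# scale_all_from_config night.yaml
```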
To restart worker nodes in the morning, you need to scale all node groups back to their normal sizes. If you have labels defined on your node groups, you’ll need matching ASG tags so that the Cluster Autoscaler can scale the groups up from 0. For instance, a node group has the following labels:
You need to add the following ASG tags:
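The article's original label and tag listings are not reproduced here, so this sketch uses a hypothetical label, `environment: qa`. It relies on the Cluster Autoscaler's `k8s.io/cluster-autoscaler/node-template/label/<label-name>` tag convention for AWS. The ASG name is a placeholder, and the helper prints the AWS CLI command unless `DRY_RUN=0` is set.

```shell
# Dry-run helper: prints the command; set DRY_RUN=0 to actually execute it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Map a node label key to the ASG tag key the Cluster Autoscaler reads.
label_to_tag_key() {
  echo "k8s.io/cluster-autoscaler/node-template/label/$1"
}

# Tag an ASG so the autoscaler knows which label new nodes will carry.
tag_asg_with_label() {
  asg=$1     # placeholder Auto Scaling group name
  key=$2     # label key, e.g. the hypothetical "environment"
  value=$3   # label value, e.g. the hypothetical "qa"
  run aws autoscaling create-or-update-tags --tags \
    "ResourceId=$asg,ResourceType=auto-scaling-group,PropagateAtLaunch=true,Key=$(label_to_tag_key "$key"),Value=$value"
}

# Example (prints the command in dry-run mode):
# tag_asg_with_label my-asg environment qa
```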
The Cluster Autoscaler assumes all nodes in a group to be equivalent. So, for zone-specific workloads, you’ll need to create a separate node group for each particular zone for scaling back.
Use the same scaling command you used for the scale-down to return each node group to its normal size, but remember to set the desired, minimum, and maximum values to their daytime settings.
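A minimal sketch of the morning scale-up, assuming eksctl. The cluster name, node group name, and sizes are placeholders for whatever your daytime configuration is. The helper prints the command unless `DRY_RUN=0` is set.

```shell
# Dry-run helper: prints the command; set DRY_RUN=0 to actually execute it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Restore one node group to its daytime size.
scale_nodegroup_up() {
  cluster=$1     # placeholder cluster name
  nodegroup=$2   # placeholder node group name
  desired=$3     # daytime desired node count
  min=$4         # daytime minimum
  max=$5         # daytime maximum
  run eksctl scale nodegroup --cluster "$cluster" --name "$nodegroup" \
    --nodes "$desired" --nodes-min "$min" --nodes-max "$max"
}

# Example (prints the command in dry-run mode):
# scale_nodegroup_up qa-cluster ng-1 3 1 5
```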
You can consistently lower your costs by pausing non-production clusters at night, but only if the process isn’t time-intensive. In the method described above, you need to run these commands for each cluster and node pool for each region. If pausing your clusters takes too much time, it might not be worth doing it every day.
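To give a sense of the repetition involved, the sketch below loops over a hypothetical inventory with one "region cluster nodegroup" triple per line and emits a scale-down command for each. How you build that inventory is up to you. The helper prints each command unless `DRY_RUN=0` is set.

```shell
# Dry-run helper: prints the command; set DRY_RUN=0 to actually execute it.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Read "region cluster nodegroup" lines from stdin and scale each group to 0.
scale_down_all() {
  while read -r region cluster nodegroup; do
    # Skip blank or incomplete lines.
    [ -n "$nodegroup" ] || continue
    # </dev/null keeps the command from consuming the inventory on stdin.
    run eksctl scale nodegroup --region "$region" --cluster "$cluster" \
      --name "$nodegroup" --nodes 0 --nodes-min 0 </dev/null
  done
}

# Example with a hypothetical two-line inventory:
# printf 'us-east-1 qa-a ng-1\nus-west-2 qa-b ng-2\n' | scale_down_all
```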
With Blink, you can either import this pre-built automation from the Blink Library, or easily generate a custom workflow by typing a prompt into Blink Copilot.
When this automation runs, it executes the following actions:
With one simple automation, you could start saving on your cloud costs.
You can also try typing your own prompts in Blink Copilot. Automation has never been easier.
Get started with Blink today to see how easy automation can be.