How to Stop a Kubernetes Cluster Nightly to Save Costs
Reduce cloud costs by pausing non-production GKE clusters at night. Learn how to pause and restart them to save money.
Google Kubernetes Engine (GKE) clusters consist of a control plane and the nodes that run your workloads. GKE lets you deploy, manage, and scale containerized applications on Google's infrastructure for a recurring charge of $0.10 per cluster per hour, billed in one-second increments. You also pay for the computing and storage resources running on that cluster.
If your organization has non-production clusters, for example for testing or QA, you can pause them every night and restart them in the morning to save costs.
In this guide, we’ll walk you through the steps of pausing and restarting your GKE cluster to reduce your cloud costs, showing the steps for both the gcloud CLI and the Console options.
The first step to pausing a cluster for the night is sending a notification to the cluster owner alerting them of the shutdown. This gives the cluster owner and anyone who relies on the cluster an opportunity to stop or postpone the shutdown.
Pausing a GKE cluster shuts down all of the cluster's Compute Engine VMs, so any jobs still running at that point will fail. Sending a notification ensures that you are not disrupting anyone’s work by pausing the cluster for the night.
After you have received the "go ahead" to pause your GKE cluster, you need to list all of its node pools. A node pool (sometimes called a node group) is a set of nodes in the cluster that share the same configuration, defined by a NodeConfig specification.
You can view your node pools in gcloud using the "gcloud container node-pools list" command with the "--cluster" flag set to the cluster name ("non-prod-cluster" in this example).
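As a minimal sketch of that listing step, the function below wraps the command so it prints one node pool name per line. The region "us-central1" used in the usage note and the GCLOUD override variable are assumptions made for illustration, not part of the original guide; for a zonal cluster you would pass "--zone" instead of "--region".

```shell
# List the node pool names of one GKE cluster, one name per line.
# GCLOUD is a hypothetical override: set GCLOUD=echo to print the
# arguments instead of calling the real gcloud CLI (a dry run).
list_pools() {
  # $1 = cluster name, $2 = region (use --zone for a zonal cluster)
  ${GCLOUD:-gcloud} container node-pools list \
    --cluster "$1" \
    --region "$2" \
    --format "value(name)"
}
```

Calling "list_pools non-prod-cluster us-central1" should return the pool names for the example cluster, assuming gcloud is installed and authenticated for the right project.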
You can access the Google Kubernetes Engine page using these steps:
Next, you need to set all of the node pool sizes to 0.
You can resize a cluster's node pools by running the "gcloud container clusters resize" command. Follow this command with the name of the cluster, then name the pool with the "--node-pool" flag and give the node count with the "--num-nodes" flag. For regional and multi-zonal clusters, that count applies to each zone the pool is in. Set the number of nodes to "0".
You will need to repeat this command for each node pool. If your cluster has only one node pool, you don’t need to specify which pool in the command.
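The resize step could look roughly like the sketch below. The pool name "default-pool" and the region "us-central1" in the usage note are placeholders, and the GCLOUD override is a hypothetical helper for dry runs. The "--quiet" flag skips gcloud's interactive confirmation prompt, which matters once the command runs unattended.

```shell
# Resize a single node pool to a given node count.
# GCLOUD is a hypothetical override: GCLOUD=echo prints the arguments
# instead of running gcloud, which is useful for checking the command first.
resize_pool() {
  # $1 = cluster, $2 = node pool, $3 = region, $4 = nodes per zone
  ${GCLOUD:-gcloud} container clusters resize "$1" \
    --node-pool "$2" \
    --region "$3" \
    --num-nodes "$4" \
    --quiet </dev/null   # never read from the caller's stdin
}
```

To pause one pool of the example cluster, you would call "resize_pool non-prod-cluster default-pool us-central1 0" and repeat the call for every other pool.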
These are the steps in the Google Cloud Console:
To restart your GKE cluster the following morning, you simply have to resize all of your node pools back to their original sizes. Repeat steps 2 and 3 at the beginning of the day: list your GKE cluster's node pools again, then resize them to their original values instead of "0". Do this for every node pool in every GKE cluster you paused, which means you need to note the original sizes before you pause.
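One way to keep track of the original sizes is a small text file with one "pool size" pair per line, written before the nightly pause. The sketch below assumes such a file exists; the file format, the function name, and the GCLOUD dry-run override are all illustrative choices rather than anything GKE prescribes.

```shell
# Restore each node pool to the size recorded in a sizes file.
# File format (an assumption of this sketch): "<pool-name> <node-count>"
# per line; blank lines and lines starting with "#" are ignored.
restore_pools() {
  # $1 = cluster, $2 = region, $3 = path to the sizes file
  while read -r pool size; do
    case "$pool" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    ${GCLOUD:-gcloud} container clusters resize "$1" \
      --node-pool "$pool" \
      --region "$2" \
      --num-nodes "$size" \
      --quiet </dev/null     # keep gcloud from consuming the file on stdin
  done < "$3"
}
```

With a file containing a line such as "default-pool 3", calling "restore_pools non-prod-cluster us-central1 sizes.txt" would resize that pool back to three nodes per zone.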
By pausing non-production clusters at night, you can consistently lower your costs, but only if the process isn’t time-intensive. In the method described above, you need to run these commands for each cluster and node pool for each region. At a certain point, it can feel like it’s too time-consuming to be worth doing.
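To give a sense of what that repetition involves, here is a rough sketch that loops over several clusters and pauses every node pool in each one. It assumes a hand-maintained file of "cluster region" pairs, which is an invention of this example, and it leaves out the notification, approval, and scheduling steps entirely.

```shell
# Pause every node pool of every cluster named in a clusters file.
# File format (an assumption of this sketch): "<cluster> <region>" per line.
# GCLOUD is a hypothetical override so the loop can be exercised with a stub.
pause_all() {
  # $1 = path to the clusters file
  while read -r cluster region; do
    case "$cluster" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    # List the pool names, then resize each pool to zero nodes.
    ${GCLOUD:-gcloud} container node-pools list \
      --cluster "$cluster" --region "$region" \
      --format "value(name)" </dev/null |
    while IFS= read -r pool; do
      ${GCLOUD:-gcloud} container clusters resize "$cluster" \
        --node-pool "$pool" --region "$region" \
        --num-nodes 0 --quiet </dev/null
    done
  done < "$1"
}
```

Even in this reduced form, someone still has to maintain the cluster list, record the original pool sizes, and run the script at the right time each night and morning.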
With Blink Copilot, you can easily automate this process so it kicks off at a scheduled time, automatically sends notifications, waits for approvals, and restarts everything in the morning.
Just type a prompt to create this automated workflow. It executes the following steps:
You can find this workflow pre-built in the Blink Library, or you can generate it by typing a prompt.
Get started with Blink today and see how easy automation can be.