Running Global Health Checks on Kubernetes Clusters
Learn how to run global health checks on Kubernetes clusters to optimize performance, prevent disruptions, and improve scalability.
Working through a Kubernetes global health checklist can go a long way toward preventing errors before they cause disruptions, and it can help you tune container performance to current scalability needs.
In this guide, we outline why monitoring your Kubernetes cluster health should be part of your DevOps strategy, and the steps you can take to check your cluster health.
Consisting of a master node, at least one worker node, and all of the containers and Pods inside, a cluster comprises an entire workload for a given app development team — or for the entire project. The ability to configure multiple clusters according to the needs of each department enables developers to optimize the resources that they invest into creating their apps.
For example, a machine learning application may require a graphics processing unit (GPU) to function, which would not be necessary for other operations like web service. Configuring a Kubernetes cluster to the needs of each department would enable developers to use only the resources they need for each project, and none that they don't. That means failure to customize the operation of each Kubernetes cluster can result in suboptimal configuration, which can hinder app development.
Following a Kubernetes global health checklist can help DevOps teams monitor their clusters' health, ensuring that each one runs at optimum capacity. Here are a few cluster events to watch for:
Containers in a Pod can be assigned minimum and maximum amounts of CPU and memory, and each node has a finite amount of both to hand out. The minima are called requests. The scheduler uses requests to decide which node has room for a Pod, and the kubelet takes them into account when it selects Pods for eviction from a node under pressure. The maxima, called limits, are enforced at the container runtime level. They prevent the container from using more than that limit; a container that keeps running into its memory limit is killed and restarted, which often ends in a CrashLoopBackOff.
CPU is considered a compressible resource, so a container that reaches its CPU limit is only throttled. Memory is not compressible, so a container that exceeds its memory limit is terminated (OOM-killed). Therefore, it is important to assign an appropriate request and limit for both CPU and memory to each container in every Pod within a cluster. Otherwise, a container may be throttled or terminated.
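As a rough illustration of checking that requests and limits are set sensibly, the sketch below parses Kubernetes-style quantities and flags missing or inverted values. The function names (parse_cpu, parse_memory, check_resources) are hypothetical helpers written for this example, the input mirrors the resources section of a container spec, and only the most common unit suffixes are handled.

```python
# Common suffixes used by Kubernetes memory quantities (not exhaustive).
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' into cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' or '1G' into bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


def check_resources(resources: dict) -> list:
    """Return a list of problems found in one container's resources block."""
    problems = []
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    for name, parse in (("cpu", parse_cpu), ("memory", parse_memory)):
        if name not in requests:
            problems.append(f"{name}: no request set")
        if name not in limits:
            problems.append(f"{name}: no limit set")
        if name in requests and name in limits:
            if parse(requests[name]) > parse(limits[name]):
                problems.append(f"{name}: request exceeds limit")
    return problems


if __name__ == "__main__":
    container = {
        "requests": {"cpu": "250m", "memory": "128Mi"},
        "limits": {"cpu": "500m"},  # memory limit left out on purpose
    }
    print(check_resources(container))
```

Running the check over every container in a cluster gives a quick list of workloads that could be throttled or terminated without warning.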
Once you have established the request and limit ranges for both CPU and memory use, it is important to identify how much is consumed by each node and Pod. This can be done by evaluating three parameters for both CPU and memory use: percent usage, percent requested, and percent limits.
A low usage rate means that you have allotted more CPU or memory than needed, and could save by scaling back your requests and limits. A higher usage percentage means that you may be operating close to full capacity, and could struggle to scale or keep up with greater loads. If the percent limit is lower than the percent requested, then you may not have assigned a limit to all of your Pods.
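The three percentages can be computed and interpreted along these lines. This is a minimal sketch: the function names are hypothetical, all inputs are assumed to be in the same unit (for example millicores or bytes) for a single node, and the 30% and 80% thresholds are illustrative values rather than recommendations from Kubernetes.

```python
def node_percentages(usage, requested, limits, allocatable):
    """Express usage, requests, and limits as a share of allocatable capacity."""
    return {
        "percent_usage": 100 * usage / allocatable,
        "percent_requested": 100 * requested / allocatable,
        "percent_limits": 100 * limits / allocatable,
    }


def interpret(percentages, low=30.0, high=80.0):
    """Turn the percentages into findings; low/high are assumed thresholds."""
    findings = []
    if percentages["percent_usage"] < low:
        findings.append("over-provisioned: consider scaling back")
    elif percentages["percent_usage"] > high:
        findings.append("near capacity: little room to scale")
    if percentages["percent_limits"] < percentages["percent_requested"]:
        findings.append("limits below requests: some Pods may have no limit")
    return findings


if __name__ == "__main__":
    # Example node with 4000 millicores allocatable.
    cpu = node_percentages(usage=3600, requested=2000, limits=1000, allocatable=4000)
    print(cpu)
    print(interpret(cpu))
```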
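The kube_node_status_allocatable metric, exposed by kube-state-metrics, reports how much CPU, memory, and Pod capacity each node can allocate to workloads. Compared against current CPU and memory usage trends, it helps developers identify how many additional Pods can be added. That way, developers will know how much room they have to scale.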
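One way to turn allocatable capacity into a headroom estimate is to divide the unrequested capacity by the requests of a typical Pod and take the tightest resource. The sketch below assumes the allocatable and requested figures have already been collected (for example from kube_node_status_allocatable and the sum of Pod requests); the function name and the sample numbers are hypothetical.

```python
def additional_pods(allocatable: dict, requested: dict, per_pod: dict) -> int:
    """Estimate how many more Pods of a given size fit on one node.

    All three dicts map a resource name to an integer amount in the same
    unit, e.g. CPU in millicores and memory in bytes.
    """
    fits = []
    for resource, need in per_pod.items():
        free = max(allocatable[resource] - requested[resource], 0)
        fits.append(free // need)
    # The scarcest resource decides how many Pods actually fit.
    return min(fits)


if __name__ == "__main__":
    gib = 2**30
    print(additional_pods(
        allocatable={"cpu": 4000, "memory": 16 * gib},
        requested={"cpu": 2500, "memory": 4 * gib},
        per_pod={"cpu": 500, "memory": 1 * gib},
    ))
```

In this example CPU is the limiting resource, even though memory alone would leave room for many more Pods.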
In addition to making sure that each node and Pod is operating within the assigned computing limits, DevOps teams should also keep Pods relatively evenly distributed across all nodes.
An uneven distribution can result in some nodes being overloaded and their containers possibly terminated, while the computing power available in other nodes goes unused. This can be due to node affinity, where a certain node property, like having a GPU or specific security features, causes a disproportionate number of Pods to be scheduled to that node. Conversely, taints on a node repel Pods that do not tolerate them, which can leave the node with fewer Pods than its capacity allows.
To get the most out of the computing power available, check your affinity settings to make sure no Pods are disproportionately scheduled to certain nodes.
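A simple way to spot uneven scheduling is to count Pods per node and flag nodes that sit far from the average. The sketch below assumes you already have a list of (Pod, node) pairs, such as the NODE column of a wide Pod listing; the function names, node names, and the 50% tolerance are illustrative assumptions.

```python
def pods_per_node(assignments, nodes):
    """Count Pods on each node, keeping nodes that run no Pods at all."""
    counts = {node: 0 for node in nodes}
    for _pod, node in assignments:
        counts[node] = counts.get(node, 0) + 1
    return counts


def skewed_nodes(counts, tolerance=0.5):
    """Return nodes whose Pod count deviates from the mean by more than
    tolerance * mean (tolerance is an assumed, tunable threshold)."""
    mean = sum(counts.values()) / len(counts)
    return {
        node: n for node, n in counts.items()
        if abs(n - mean) > tolerance * mean
    }


if __name__ == "__main__":
    assignments = (
        [(f"web-{i}", "node-a") for i in range(6)]
        + [(f"api-{i}", "node-b") for i in range(3)]
    )
    counts = pods_per_node(assignments, ["node-a", "node-b", "node-c"])
    print(counts)
    print(skewed_nodes(counts))
```

Nodes flagged as overloaded are candidates for an affinity review, while nodes flagged as nearly empty are worth checking for taints.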
The Kubernetes API server exposes three endpoints that can be used during a global health check: healthz, livez, and readyz.
If a machine checks the healthz, livez, or readyz endpoint of the API server, it should examine the HTTP status code. A status code of 200 indicates that the API server is healthy, live, or ready, depending on the endpoint called.
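A machine-driven check of these endpoints might look like the sketch below. It assumes the API server is reachable at a base URL you supply without extra authentication, for example through a local kubectl proxy (http://127.0.0.1:8001 is used here purely as an example). The function names are hypothetical, and the fetch function is injectable so the logic can be exercised without a cluster.

```python
import urllib.error
import urllib.request

ENDPOINTS = ("healthz", "livez", "readyz")


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx responses still carry a status code
    except urllib.error.URLError:
        return None


def check_api_server(base_url, fetch=fetch_status):
    """Map each endpoint name to True when it answers with HTTP 200."""
    return {name: fetch(f"{base_url}/{name}") == 200 for name in ENDPOINTS}


if __name__ == "__main__":
    # Assumed address of a local kubectl proxy; adjust for your cluster.
    print(check_api_server("http://127.0.0.1:8001"))
```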
When developers want to manually debug the status of the API server, they can query any of these endpoints with the verbose parameter.
The output then shows the full status details for the endpoint, listing each individual check and whether it passed.
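If you want to process verbose output in a script, a small parser can pick out the failing checks. This sketch assumes the line format shown in the Kubernetes documentation, where passing checks start with [+] and failing checks start with [-]; the function names are hypothetical and the sample text is illustrative rather than captured from a real cluster.

```python
def parse_verbose(text: str) -> dict:
    """Map each check name in verbose endpoint output to True (ok) or False."""
    checks = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[+]"):
            checks[line[3:].split()[0]] = True
        elif line.startswith("[-]"):
            checks[line[3:].split()[0]] = False
    return checks


def failing_checks(text: str) -> list:
    """Return the names of the checks that did not pass."""
    return [name for name, ok in parse_verbose(text).items() if not ok]


if __name__ == "__main__":
    # Illustrative sample, assuming the documented [+]/[-] format.
    sample = """\
[+]ping ok
[+]log ok
[-]etcd failed: reason withheld
livez check failed
"""
    print(failing_checks(sample))
```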
For more information on this type of debugging, see the Kubernetes documentation on API server health endpoints.
Keeping your Kubernetes cluster at optimum performance will prevent you from wasting allotted computing power, and valuable business resources too. It will also improve scalability, and will enhance efficiency across the board. Integrate this Kubernetes global health checklist into your DevOps strategy, and improve your applications today.
Running a health check with kubectl commands isn't hard, but it requires context-switching. If you want to run a health check regularly or trigger one after an alert, a little automation can save you significant time.
For example, you can use this automation out-of-the-box from the Blink library.
This Kubernetes health check automation does the following steps:
It's a simple automation, and that makes it easy to customize. For example, you can schedule it to run regularly or send the report information to a Slack channel or email.
You can get started with over 5K automations in the Blink library, or build your own custom automations to fit your unique workflow.
Get started with Blink today to see how easy automation can be.