Optimise compute utilisation on Kubernetes

What is compute utilisation?

Compute utilisation refers to how efficiently computational resources in a given system or infrastructure are used. It is a metric that measures the degree to which the available computing resources, such as CPUs, GPUs, and memory, are effectively utilised to perform useful work.

In other words, it measures how much of the available computing power is being used for actual computations and how much is idle or wasted. A high compute utilisation means that a system is being used efficiently, while a low compute utilisation indicates that there is a lot of idle or wasted computing power.

Compute utilisation is often used in cloud computing environments, where resources are shared among multiple users or applications. By optimising compute utilisation, you can maximise the efficiency of your infrastructure and reduce costs while still delivering high-quality services.

How to calculate compute utilisation

On a Kubernetes cluster, you can measure memory, CPU and disk utilisation. You can also measure network traffic. However, network traffic usually costs very little or nothing if it is contained within the cluster. 

Kubernetes schedules pods on each node based on the node’s capacity (CPU, memory, number of disk attachments, etc.) and the pods’ requests. If no existing node has enough free CPU or memory for a new pod, the pod cannot be placed on those nodes; if the Cluster Autoscaler is managing the cluster, a new node may be spun up for it. We calculate the percentage of the requests on a given node as:

requests percentage = sum of pods' requests / node capacity × 100%
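As a rough illustration of the formula, the sketch below computes the requests percentage for one node. The node size and the pod requests are made-up values, and the helper names (parse_cpu, parse_memory, requests_percentage) are hypothetical. The parsers handle only the most common Kubernetes quantity suffixes ("m" for millicores and Ki/Mi/Gi/Ti for memory), not the full quantity syntax.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "2" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


# Binary suffixes only; decimal suffixes (K, M, G) are not handled in this sketch.
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "512Mi" or "4Gi" to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-2]) * factor)
    return int(quantity)  # no suffix: plain bytes


def requests_percentage(pod_requests, capacity) -> float:
    """Sum of the pods' requests as a percentage of the node capacity."""
    return 100 * sum(pod_requests) / capacity


# Hypothetical node with 4 vCores and 16Gi of memory, running three pods.
cpu_pct = requests_percentage(
    [parse_cpu(q) for q in ["500m", "250m", "1"]], parse_cpu("4")
)
mem_pct = requests_percentage(
    [parse_memory(q) for q in ["2Gi", "4Gi", "6Gi"]], parse_memory("16Gi")
)

print(f"CPU requests: {cpu_pct:.2f}%")     # CPU requests: 43.75%
print(f"Memory requests: {mem_pct:.2f}%")  # Memory requests: 75.00%
```

CPU and memory are kept as two separate calls because the percentage is tracked per resource.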

The percentage is calculated separately for CPU and memory, and for each node in the cluster. If a cluster has a few dozen nodes or more, a histogram is much more helpful than a list of per-node values. Split the requests percentage into bins, e.g. ten bins covering the range from 0% to 100%, and visualise the number of nodes in each bin, for example:
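The binning step can be sketched in a few lines. The per-node percentages below are invented for illustration, and histogram and render are hypothetical helper names. The text rendering assumes the number of bins divides 100 evenly.

```python
def histogram(percentages, bins=10):
    """Count how many nodes fall into each equal-width bin between 0% and 100%."""
    counts = [0] * bins
    width = 100 / bins
    for pct in percentages:
        # A node at exactly 100% belongs in the last bin, not in a bin of its own.
        index = min(int(pct // width), bins - 1)
        counts[index] += 1
    return counts


def render(counts):
    """Render the bin counts as a simple text bar chart."""
    width = 100 // len(counts)
    lines = []
    for i, count in enumerate(counts):
        label = f"{i * width:3d}-{(i + 1) * width:3d}%"
        lines.append(f"{label} | {'#' * count} ({count})")
    return "\n".join(lines)


# Hypothetical memory requests percentages, one value per node.
node_memory_pct = [31.0, 72.5, 74.0, 76.1, 78.9, 99.0, 100.0]

counts = histogram(node_memory_pct)
print(render(counts))
```

In practice the same counts would normally be fed to a plotting library; the text rendering is only there to keep the sketch self-contained.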

[Figure: K8s memory utilisation - requests]

As you can see, more than 20 nodes have memory requests around 75%, which is fine. Two nodes have requests around 100% (perfect) and one node has requests around 30% (too low).

At the same time, the same cluster has the following CPU requests:

[Figure: K8s CPU utilisation - requests]

Most nodes have their CPU requests at around 30%, which is quite low. Taken together, these CPU and memory request levels suggest that the node pool (or cluster) should use nodes with more memory per CPU, e.g. 8 GB per vCore rather than 4 GB per vCore.
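A back-of-the-envelope check of that suggestion, assuming a typical node sits at roughly 30% CPU requests and 75% memory requests, as in the histograms. The function names are hypothetical, and the projection assumes the memory per node stays the same while the number of vCores shrinks.

```python
def requested_memory_per_vcore(cpu_pct, mem_pct, node_gb_per_vcore):
    """GB of memory requested for every vCore requested on the node."""
    return (mem_pct / cpu_pct) * node_gb_per_vcore


def projected_cpu_pct(cpu_pct, current_gb_per_vcore, new_gb_per_vcore):
    """CPU requests percentage after switching to a node shape with the same
    memory but fewer vCores (assumption: the workload itself does not change)."""
    return cpu_pct * new_gb_per_vcore / current_gb_per_vcore


ratio = requested_memory_per_vcore(cpu_pct=30, mem_pct=75, node_gb_per_vcore=4)
projected = projected_cpu_pct(cpu_pct=30, current_gb_per_vcore=4, new_gb_per_vcore=8)

print(f"Workload requests about {ratio:.1f} GB per vCore")
print(f"CPU requests on 8 GB per vCore nodes: about {projected:.0f}%")
```

Under these assumptions the workload asks for about 10 GB of memory per vCore, so moving from 4 GB to 8 GB per vCore would roughly double the CPU requests percentage, to about 60%, while the memory requests percentage stays unchanged.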

Note: The Kubernetes scheduler never places a pod on a node if the combined requests of the pods on that node would exceed the node’s allocatable capacity.

What about actual utilisation? Developers tend to over-provision pods’ requests, i.e. they request more than a pod would ever use. This practice leads to heavily under-utilised Kubernetes clusters. The diagram below juxtaposes CPU and memory requests (lower row) with their utilisation (upper row).

[Figure: K8s overall utilisation - matrix]

You can easily see that CPU utilisation is very low. CPU is usually the most expensive resource in cloud environments, so it is the first thing we should optimise.
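The gap between requests and actual usage can be expressed with three percentages per node. The sketch below uses invented numbers for a single node, and utilisation_report is a hypothetical helper. In a real cluster the usage figures would come from a metrics source, which is not shown here.

```python
def utilisation_report(requested, used, capacity):
    """Compare what pods request with what they actually use on one node."""
    return {
        # Share of the node reserved by pod requests.
        "requests_pct": 100 * requested / capacity,
        # Share of the node actually used.
        "utilisation_pct": 100 * used / capacity,
        # Share of the requested amount that is actually used.
        "used_share_of_requests_pct": 100 * used / requested,
    }


# Hypothetical 4-vCore node: pods request 1.2 cores but use only 0.3 cores.
report = utilisation_report(requested=1.2, used=0.3, capacity=4.0)

for name, value in report.items():
    print(f"{name}: {value:.1f}%")
```

A low value for the last figure points at over-provisioned requests: the capacity is reserved, and paid for, but not used.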

Optimising compute utilisation

The optimal compute utilisation depends on the specific use case and requirements. In general, the goal is to achieve the highest possible compute utilisation while maintaining acceptable performance and application stability. We do not want the application to be slowed down by CPU throttling or, even worse, killed because it ran out of memory (OOMKilled).
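One possible way to balance these goals is to derive requests from observed usage and add some headroom. The sketch below is an assumption about how that could be done, not a method prescribed here: it takes a high percentile of CPU usage (brief throttling is tolerable), the maximum of memory usage (an out-of-memory kill is not), and a 20% margin. The samples, the percentile choices, the margin, and the function names are all hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_request(samples, pct, headroom):
    """Suggested request: the chosen percentile of usage plus a safety margin."""
    return percentile(samples, pct) * (1 + headroom)


# Hypothetical usage samples for one container.
cpu_cores = [0.10, 0.12, 0.15, 0.11, 0.40, 0.13, 0.14, 0.12, 0.16, 0.11]
memory_mib = [410, 420, 415, 450, 430, 445, 440, 425, 435, 460]

# CPU: 90th percentile, so a single short spike (0.40) does not drive the request.
cpu_request = suggest_request(cpu_cores, pct=90, headroom=0.2)
# Memory: maximum observed usage, because exceeding the limit kills the container.
memory_request = suggest_request(memory_mib, pct=100, headroom=0.2)

print(f"Suggested CPU request: {cpu_request:.3f} cores")
print(f"Suggested memory request: {memory_request:.0f} MiB")
```

The right percentile and margin depend on how sensitive the workload is to latency and how representative the sampling period is.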

For example, in a cloud computing environment, the optimal compute utilisation may differ between types of workloads. Some applications require more computing resources than others, and some are more sensitive to latency or network bandwidth limitations.

In addition, the optimal compute utilisation may change over time as the workload and the demand on the system fluctuate. Therefore, it is important to monitor and optimise compute utilisation continually to ensure that resources are used effectively.

Overall, the optimal compute utilisation is a balance between maximising resource usage and meeting the performance and availability requirements of the system or application.