After a pod is scheduled on a node, it immediately starts consuming compute resources. Left unchecked, a single pod can starve the other workloads running on the cluster of the resources they need, and can ultimately destabilize or even bring down the whole cluster.
Efficient resource management guarantees that applications obtain the required resources for proper functionality, all while optimizing cluster utilization and minimizing expenses.
In this article, you will learn how to use Kubernetes to manage resources effectively and efficiently. You will learn about the following topics:
- Resource requests and limits: how to specify the amount of CPU and memory that each container needs and the maximum amount that each container can use.
- Resource types: how to choose the appropriate type of resource for your workload, such as CPU, memory, storage, network, or ephemeral storage.
- Resource allocation: how Kubernetes allocates resources to pods and containers based on their requests and limits, and the available capacity of the nodes.
- Resource monitoring: how to monitor the resource usage and performance of your pods and containers using Kubernetes tools and metrics.
By the end of this article, you will have a better understanding of how Kubernetes works and how to use its features to manage resources well.
Resource requests and limits
One of the key aspects of resource management in Kubernetes is to specify the amount of CPU and memory that each container needs and the maximum amount that each container can use. These are called resource requests and resource limits, respectively.
Resource requests and limits serve two main purposes:
- They help Kubernetes schedule pods onto nodes that have enough resources to meet their needs. Pods with resource requests are guaranteed the amount of resources they request, while pods without resource requests are among the first to be evicted if the node runs out of resources.
- They help Kubernetes to ensure resource isolation and prevent resource starvation. Pods with resource limits are restricted to use the amount of resources they specify, while pods without resource limits may consume more resources than they need and affect other pods on the same node.
By setting resource requests and limits, you can improve the scheduling, stability, and quality of service of your pods and containers.
There are two ways to set resource requests and limits for pods and containers: using kubectl or using YAML files.
Using kubectl
Use the `kubectl` command-line tool to set resource requests and limits for existing pods and containers. For example, to set a resource request of 0.5 CPU and 256 MiB of memory, and a resource limit of 1 CPU and 512 MiB of memory, for a container named `web` in a pod named `my-pod`, use the following command:
$ kubectl set resources pod my-pod -c=web --requests='cpu=0.5,memory=256Mi' --limits='cpu=1,memory=512Mi'
Use the `kubectl edit` command to edit the resource requests and limits of a pod or a container interactively. For example, to edit the resource requests and limits of a pod named `my-pod`, use the following command:
$ kubectl edit pod my-pod
This will open a text editor where you can modify the resource requests and limits of the pod or its containers.
Using YAML files
You can also use YAML files to set resource requests and limits for pods and containers when you create them. For example, to create a pod named `my-pod` with two containers named `web` and `db`, each with a resource request of 0.5 CPU and 256 MiB of memory and a resource limit of 1 CPU and 512 MiB of memory, use the following YAML file:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: web
    image: nginx
    resources:
      requests:
        cpu: 0.5
        memory: 256Mi
      limits:
        cpu: 1
        memory: 512Mi
  - name: db
    image: mysql
    resources:
      requests:
        cpu: 0.5
        memory: 256Mi
      limits:
        cpu: 1
        memory: 512Mi
Use the `kubectl create` command to create the pod from the YAML file:
$ kubectl create -f my-pod.yaml
Use the `kubectl apply` command to update the resource requests and limits of an existing pod from a YAML file:
$ kubectl apply -f my-pod.yaml
Setting resource requests and limits is a necessary part of resource management in Kubernetes. It helps you optimize the performance, availability, and cost of your pods and containers, as well as the stability and security of your cluster. Set requests and limits that match your workload type, your desired QoS class, and your cluster capacity, then monitor actual resource usage and adjust them accordingly. Tools such as the Vertical Pod Autoscaler, the Horizontal Pod Autoscaler, the Cluster Autoscaler, and Node Auto Provisioning can automate and simplify these tasks.
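As an illustration of the autoscaling tools just mentioned, here is a minimal, hypothetical HorizontalPodAutoscaler manifest (the names `my-hpa` and `my-deployment` are placeholders, and the 70% CPU target is an arbitrary example) that scales a Deployment between 1 and 5 replicas based on average CPU utilization:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa               # hypothetical name
spec:
  scaleTargetRef:            # the workload to scale; assumed to exist
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment      # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # target average CPU, as a % of requests
```

Note that utilization is computed as a percentage of each pod's CPU request, so the HPA only works for pods that set resource requests.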
Resource types
CPU
CPU, measured in cores or millicores, signifies the processing power available to a container. One core equals one physical CPU core, while a millicore is a thousandth of a core (e.g., 0.5 CPU means half a core, and 500m CPU means 500 millicores).
Containers receive CPU allocation based on their resource requests and limits. Those with higher CPU requests reserve more CPU time and are scheduled to nodes with sufficient CPU capacity. Containers with lower CPU requests might be scheduled to nodes with less spare CPU but risk being throttled when the node's CPU is exhausted; because CPU is a compressible resource, containers are throttled rather than evicted for it. Containers with CPU limits are confined to the specified amount and are throttled if they try to exceed it.
CPU consumption by containers depends on their workload and the node’s available CPU resources. Containers may use more or less CPU than requested, influenced by CPU demand and the shares of other containers. While containers can utilize up to their CPU limits, they cannot surpass the node’s CPU capacity.
To define CPU specifications for pods and containers, use the `cpu` field in the `resources` section of the YAML file. For instance, to create a pod named `my-pod` with a container named `web` that requests 0.5 CPU and is limited to 1 CPU, use the following YAML file:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: web
    image: nginx
    resources:
      requests:
        cpu: 0.5
      limits:
        cpu: 1
Use the `kubectl set resources` command to set CPU requests and limits for existing pods and containers. For example, to set a CPU request of 0.5 and a CPU limit of 1 for a container named `web` in a pod named `my-pod`, use the following command:
$ kubectl set resources pod my-pod -c=web --requests='cpu=0.5' --limits='cpu=1'
Memory
Memory, measured in bytes or power-of-two units like KiB, MiB, GiB, etc., represents the RAM available to a container (e.g., 256 MiB equals 256 mebibytes, and 1 GiB equals 1 gibibyte).
Containers receive memory allocation based on their resource requests and limits. Those with higher memory requests gain priority and are scheduled to nodes with sufficient memory capacity. Containers with lower memory requests may be scheduled to nodes with less memory capacity but face termination or eviction if the node depletes memory resources. Containers adhering to memory limits are confined to the specified amount and may be terminated if they surpass these limits.
Memory consumption by containers is influenced by their workload and the node’s available memory resources. Containers may use more or less memory than requested, depending on the memory demand and pressure from other containers on the node. Containers can utilize memory up to their limits but cannot exceed the node’s memory capacity.
To designate memory specifications for pods and containers, use the `memory` field in the `resources` section of the YAML file. For instance, to create a pod named `my-pod` with a container named `web` that requests 256 MiB of memory and is limited to 512 MiB of memory, use the following YAML file:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: web
    image: nginx
    resources:
      requests:
        memory: 256Mi
      limits:
        memory: 512Mi
Use the `kubectl set resources` command to set memory requests and limits for existing pods and containers. For example, to set a memory request of 256 MiB and a memory limit of 512 MiB for a container named `web` in a pod named `my-pod`, you can use the following command:
$ kubectl set resources pod my-pod -c=web --requests='memory=256Mi' --limits='memory=512Mi'
Storage
Storage, measured in bytes or power-of-two units like KiB, MiB, GiB, etc., denotes the available disk space for a container (e.g., 1 GiB equals 1 gibibyte, and 10 GiB equals 10 gibibytes).
Containers get storage through volumes and persistent volume claims. A volume is storage that can be attached to pods and containers; a persistent volume claim (PVC) is a request for storage resources that is bound to a persistent volume (PV). Persistent volumes are provisioned statically by the cluster administrator or dynamically by a storage class. For instance, a persistent volume claim requesting 10 GiB can be bound to a persistent volume that provides at least that amount.
Storage consumption by containers depends on workload and node storage availability. Containers may use more or less storage than requested, influenced by demand and node capacity. Containers can utilize storage up to their limits but cannot exceed the node’s capacity.
To specify storage for pods and containers, the YAML file can include the `volumeMounts` and `volumes` fields. For example, to create a pod named `my-pod` with a container named `web` that mounts a volume named `my-volume` at the `/data` directory, use this YAML file:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: web
    image: nginx
    volumeMounts:
    - name: my-volume
      mountPath: /data
  volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: my-pvc
Use the `kubectl create` command to create a persistent volume claim that requests 10 GiB of storage, and a storage class that dynamically provisions a persistent volume providing that storage. For example, to create a persistent volume claim named `my-pvc` and a storage class named `my-sc`, you can use the following commands:
$ kubectl create -f my-pvc.yaml
$ kubectl create -f my-sc.yaml
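The article does not show the contents of these files, so the following is a hedged sketch of what `my-pvc.yaml` and `my-sc.yaml` might contain; the `provisioner` value is environment-specific and must be replaced with the CSI driver available in your cluster:

```yaml
# my-pvc.yaml -- claim 10 GiB from the my-sc storage class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: my-sc
  resources:
    requests:
      storage: 10Gi
---
# my-sc.yaml -- storage class for dynamic provisioning
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-sc
provisioner: ebs.csi.aws.com   # example only; depends on your cloud/CSI driver
volumeBindingMode: WaitForFirstConsumer
```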
Network
Network, measured in bits per second or power-of-ten units like Kbps, Mbps, Gbps, etc., represents the bandwidth available to a container (e.g., 1 Mbps equals 1 megabit per second, and 10 Gbps equals 10 gigabits per second).
Containers receive network access based on network policy and service specifications. A network policy defines rules for how pods and containers may communicate with each other and with external entities, while a service is a logical abstraction over a set of pods that provides a stable network endpoint. For instance, you can create a network policy that allows only specific pods to access a service exposing a web application.
Network consumption by containers relies on workload and node network resource availability. Containers may use more or less network than requested, influenced by demand and node network congestion. Containers can utilize networks up to their limits but cannot surpass the node’s network capacity.
Network behavior is configured through separate NetworkPolicy and Service objects rather than fields in the pod spec. For example, to create a network policy named `my-np` that permits only pods with the label `app=web` to reach the pods behind a service named `my-svc` on port 80, use this YAML file:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-np
spec:
  podSelector:
    matchLabels:
      app: web
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web
    ports:
    - protocol: TCP
      port: 80
Use the `kubectl create` command to create a service that exposes port 80 and selects pods with the label `app=web`. For example, to create a service named `my-svc`, use the following command:
$ kubectl create service clusterip my-svc --tcp=80:80 --selector=app=web
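The same service can also be declared in YAML; this is a sketch that mirrors the `kubectl create service clusterip` command above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-svc
spec:
  type: ClusterIP
  selector:
    app: web         # routes traffic to pods carrying this label
  ports:
  - protocol: TCP
    port: 80         # port the service exposes
    targetPort: 80   # port the pods listen on
```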
Ephemeral
Ephemeral storage is the amount of temporary disk space that a container can use, measured in bytes or power-of-two units such as KiB, MiB, or GiB. It holds data that does not need to persist across container restarts, such as logs, caches, or temporary files, and is allocated from the local disk of the node where the container runs. Each container can set a request and a limit for ephemeral storage, which the scheduler uses to place the container on a suitable node and to avoid overcommitting the node's resources. Ephemeral storage is configured using the `ephemeral-storage` resource name in the container spec.
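As a sketch, ephemeral storage is requested and limited just like CPU and memory; the pod and container names below reuse the article's examples, while the 1 GiB and 2 GiB amounts are arbitrary:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: web
    image: nginx
    resources:
      requests:
        ephemeral-storage: 1Gi   # scheduler reserves this on the node's local disk
      limits:
        ephemeral-storage: 2Gi   # exceeding this can get the pod evicted
```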
Resource allocation
The scheduler
The scheduler determines pod-to-node assignments, employing algorithms and rules based on pod resource requests, node capacity, and factors like node affinity, pod priority, taints, and tolerations.
Node affinity designates preferred or required nodes for pod placement, for instance requiring a node with a specific label like `zone=us-east-1`. Pod priority establishes relative pod importance; higher-priority pods are scheduled first and can preempt lower-priority ones. Taints mark nodes to repel pods, while tolerations let specific pods be scheduled onto tainted nodes; for example, only pods that tolerate the taint `key=value:NoSchedule` can be placed on a node carrying it.
To specify resource allocation, use the `affinity`, `priorityClassName`, and `tolerations` fields in the YAML file. For instance, to create a pod named `my-pod` with a `web` container requesting 0.5 CPU and 256 MiB of memory, node affinity for nodes labeled `zone=us-east-1`, a pod priority of `high-priority`, and a toleration for nodes with the taint `key=value:NoSchedule`, use this YAML file:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: web
    image: nginx
    resources:
      requests:
        cpu: 0.5
        memory: 256Mi
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: zone
            operator: In
            values:
            - us-east-1
  priorityClassName: high-priority
  tolerations:
  - key: key
    operator: Equal
    value: value
    effect: NoSchedule
You can also use the `kubectl create` command to create the pod from the YAML file:
$ kubectl create -f my-pod.yaml
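The `high-priority` class referenced above must exist before the pod can use it. A minimal, hypothetical PriorityClass definition (the `value` of 1000000 is an arbitrary example) might look like this:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000          # higher values are scheduled (and preempt) first
globalDefault: false    # do not apply to pods that omit priorityClassName
description: "For workloads that should be scheduled ahead of default-priority pods."
```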
The kubelet
The kubelet ensures resource isolation and prevents starvation on the node by using pod resource requests and limits. It employs cgroups to establish a hierarchy, restricting CPU, memory, and other container resources. Utilizing eviction, the kubelet reclaims resources during node resource pressure, guided by thresholds and signals like memory.available, nodefs.available, imagefs.available.
Cluster administrators configure kubelet’s resource allocation settings through command-line flags or a configuration file. For instance, setting a memory eviction threshold of 100 MiB, a nodefs eviction threshold of 10%, and an imagefs eviction threshold of 15% can be achieved with the following command-line flags:
kubelet --eviction-hard=memory.available<100Mi,nodefs.available<10%,imagefs.available<15%
or the following configuration file:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: 100Mi
  nodefs.available: "10%"
  imagefs.available: "15%"
The container runtime
The container runtime executes containers on the node, managing their lifecycle, resources, and interactions with the kubelet and scheduler. It creates, starts, stops, and deletes containers, communicating commands, status, and resource usage.
Using pod resource requests and limits, the container runtime allocates and limits CPU, memory, and other resources for each container. It adjusts resource shares and priorities based on these specifications and employs Quality of Service (QoS) to categorize pods into classes like Guaranteed, Burstable, and BestEffort. QoS guides resource allocation, limitation, and handling of resource contention and throttling on the node.
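To make the QoS classes concrete, here is a hedged sketch: a pod whose containers set limits exactly equal to requests for every resource is classed Guaranteed, requests lower than limits (or only some resources set) yield Burstable, and no requests or limits at all yield BestEffort. The pod name below is hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo        # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:         # requests == limits for every resource
        cpu: 500m       #   -> QoS class: Guaranteed
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 256Mi
```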
Cluster administrators configure the container runtime's resource allocation settings using command-line flags or a configuration file. For instance, a 2-CPU limit corresponds to a CPU period of 100,000 microseconds and a CPU quota of 200,000 microseconds (the container may consume twice the period's worth of CPU time each period):
docker run --cpu-period=100000 --cpu-quota=200000 ...
or the following configuration file:
cpuPeriod: 100000
cpuQuota: 200000
The admission controller
The admission controller validates and adjusts pod and container specs before creation and scheduling, enforcing resource allocation policies like quotas, limits, and default requests.
Quotas and limits set maximum resource usage for pods, containers, or groups, ensuring compliance with CPU, memory, or other resource constraints. Default requests and limits establish preset resource amounts for pods or containers in the absence of user specifications, guaranteeing minimum or maximum CPU, memory, or other resource levels.
Resource allocation policies for pods and containers can be specified using `ResourceQuota` and `LimitRange` objects (the older `PodPreset` object served a similar defaulting role but was removed in Kubernetes 1.20). For instance, to create a resource quota named `my-rq` that restricts the total CPU and memory usage of all pods and containers in the namespace `my-ns`, use the following YAML file:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-rq
  namespace: my-ns
spec:
  hard:
    requests.cpu: 4
    requests.memory: 8Gi
    limits.cpu: 8
    limits.memory: 16Gi
Use the `kubectl create` command to create the resource quota from the YAML file:
$ kubectl create -f my-rq.yaml
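A LimitRange complements the quota by filling in defaults for containers that omit requests or limits. This is a sketch with arbitrary values; the name `my-lr` is hypothetical:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: my-lr
  namespace: my-ns
spec:
  limits:
  - type: Container
    defaultRequest:      # applied when a container sets no requests
      cpu: 250m
      memory: 128Mi
    default:             # applied when a container sets no limits
      cpu: 500m
      memory: 256Mi
```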
Resource monitoring
Resource monitoring involves different tools and metrics that Kubernetes provides to monitor resource usage and performance, such as kubectl, dashboard, node, pod, container, and custom metrics. Each of these tools and metrics has a different role and responsibility in resource monitoring.
kubectl
kubectl is the command-line tool for Kubernetes, enabling API server interaction and operations like pod and container management, scaling, and metrics visualization. Commands like `kubectl top`, `kubectl describe`, or `kubectl get` offer access to resource metrics.
`kubectl top` displays current CPU and memory usage for pods and containers (it requires the metrics-server to be installed in the cluster). To view usage in a namespace named `my-ns`, use the command:
$ kubectl top pod --containers --namespace=my-ns
Output:
NAME           CONTAINER   CPU(cores)   MEMORY(bytes)
my-pod         web         0m           10Mi
my-pod         db          1m           20Mi
my-other-pod   app         2m           30Mi
`kubectl describe` displays detailed information and status for your pods and containers, including their resource requests, limits, and QoS class. For example, to display the information and status of a pod named `my-pod` in a namespace named `my-ns`, you can use the following command:
$ kubectl describe pod my-pod --namespace=my-ns
Output:
Name:         my-pod
Namespace:    my-ns
Priority:     0
Node:         node-1/10.0.0.1
Start Time:   Mon, 22 Jan 2024 07:07:26 GMT+00:00
Labels:       app=web
Annotations:  <none>
Status:       Running
IP:           10.0.0.2
IPs:
  IP:  10.0.0.2
Containers:
  web:
    Container ID:   docker://1234567890abcdef
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:abcdef1234567890
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 22 Jan 2024 07:07:28 GMT+00:00
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     0.5
      memory:  256Mi
    Limits:
      cpu:     1
      memory:  512Mi
    Environment:  <none>
    Mounts:
      /data from my-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xyz (ro)
  db:
    Container ID:   docker://abcdef1234567890
    Image:          mysql
    Image ID:       docker-pullable://mysql@sha256:1234567890abcdef
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 22 Jan 2024 07:07:29 GMT+00:00
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     0.5
      memory:  256Mi
    Limits:
      cpu:     1
      memory:  512Mi
    Environment:  <none>
    Mounts:
      /data from my-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xyz (ro)
Volumes:
  my-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  my-pvc
    ReadOnly:   false
  default-token-xyz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-xyz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
`kubectl get` displays basic information and status for your pods, such as name, age, ready state, and restart count. For example, to display the status of all pods in a namespace named `my-ns`, use the following command:
$ kubectl get pod --namespace=my-ns
Output:
NAME           READY   STATUS    RESTARTS   AGE
my-pod         2/2     Running   0          10m
my-other-pod   1/1     Running   0          5m
Dashboard
Dashboard is a web-based interface for interacting with the Kubernetes API server, facilitating operations like pod and container management, scaling, and metrics visualization. You can access Dashboard via a web browser using the dashboard service URL (e.g., https://<master-ip>:<port>/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/). Alternatively, the `kubectl proxy` command creates a proxy server for accessing Dashboard through the Kubernetes API server. For example, use the following command to access the dashboard:
$ kubectl proxy
Output:
Starting to serve on 127.0.0.1:8001
Then, launch a web browser and navigate to http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/.
Dashboard displays the resource metrics for your pods and containers in various ways, such as graphs, charts, and tables.
Node, Pod, Container, and Custom Metrics
Kubernetes collects and exposes metrics, categorized into node, pod, container, and custom types, for resource monitoring. These numerical values reflect resource usage and performance, covering CPU, memory, disk, network, etc., and can be sourced from the node, pod, container, or the application.
Node Metrics
- Source: Gathered by the kubelet.
- Exposed Through: Metrics-server.
- Examples: Node CPU and memory.
- Access and Visualization: Tools like kubectl, dashboard, or Prometheus.
Pod Metrics
- Source: Collected by the kubelet.
- Exposed Through: Metrics-server.
- Examples: Whole-pod CPU and memory usage, aggregated across the pod's containers.
- Access and Visualization: Tools like kubectl, dashboard, or Prometheus.
Container Metrics
- Source: Collected by the kubelet.
- Exposed Through: Metrics-server.
- Examples: Per-container CPU and memory usage.
- Access and Visualization: Tools like kubectl, dashboard, or Prometheus.
Custom Metrics
- Reflecting: Application or service performance.
- Examples: Requests per second and latency.
- Collection and Exposure: By the application or service, leveraging the custom metrics API, extending the Kubernetes API.
- Access and Visualization: Tools like Prometheus or Grafana (Heapster, formerly used for this purpose, is deprecated).
Conclusion
In this article, you learned how to use Kubernetes to manage resources effectively and efficiently. You learned about the following topics:
- Resource requests and limits: how to specify the amount of CPU and memory that each container needs and the maximum amount that each container can use.
- Resource types: how to choose the appropriate type of resource for your workload, such as CPU, memory, storage, network, or ephemeral storage.
- Resource allocation: how Kubernetes allocates resources to pods and containers based on their requests and limits, and the available capacity of the nodes.
- Resource monitoring: how to monitor the resource usage and performance of your pods and containers using Kubernetes tools and metrics.
By applying the concepts and techniques from this article, you can optimize the performance, availability, and cost of your pods and containers, as well as the stability and security of your cluster.
Resources
- Collabnix.com
- Official Kubernetes documentation: The official documentation for Kubernetes.
- Microsoft Learn module: A free online learning module that teaches you how to optimize resource utilization in Kubernetes using Azure Kubernetes Service (AKS).
- How I learned to love Kubernetes with resource and cost optimization – Kubernetes Optimization