Kubernetes is an open-source platform that automates the deployment, scaling, and management of containerized applications and services. Kubernetes is designed to run on any infrastructure, from physical servers to cloud providers, and to support a variety of workloads, such as web applications, microservices, batch processing, machine learning, and more.
Kubernetes follows the principles of declarative configuration, desired state management, and modularity. Users define the desired state of their applications in configuration files, and Kubernetes continuously works to make the actual state match it. Its modular architecture lets individual components be extended or replaced.
Key Kubernetes components include nodes (the machines that run applications), pods (the smallest deployable units, each wrapping one or more containers that share network and storage), containers (isolated environments for application code), and the control plane (which manages cluster state and communicates with the nodes).
The objective of this article is to provide a comprehensive and clear understanding of the Kubernetes architecture, and how it enables scalable, reliable, and efficient distributed systems.
Cluster Architecture
A Kubernetes cluster consists of two parts: the control plane and the compute machines, or nodes. The control plane is responsible for managing the cluster state and configuration, and the nodes are the machines that run the applications and services.
Control Plane
The control plane is the brain of the cluster. It maintains a record of all the objects in the cluster, such as pods, services, deployments, and secrets, and ensures that the actual state of the cluster matches the desired state specified by the user. The control plane also provides an interface for the user to interact with the cluster, either through the Kubernetes API, the command-line tool kubectl, or the web dashboard.
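The idea of matching actual state to desired state can be sketched as a small reconciliation function. This is an illustrative model only, not Kubernetes code: the Action type, the reconcile function, and the "web" name prefix are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One step the reconciler wants to take (hypothetical type)."""
    verb: str  # "create" or "delete"
    name: str  # pod name the step applies to


def reconcile(desired_replicas: int, actual_pods: list[str], prefix: str = "web") -> list[Action]:
    """Return the actions that move the actual pod set toward the desired count."""
    actions: list[Action] = []
    missing = desired_replicas - len(actual_pods)
    if missing > 0:
        # Too few pods: create new ones, skipping names already in use.
        taken = set(actual_pods)
        index = 0
        while missing > 0:
            candidate = f"{prefix}-{index}"
            if candidate not in taken:
                actions.append(Action("create", candidate))
                taken.add(candidate)
                missing -= 1
            index += 1
    elif missing < 0:
        # Too many pods: delete the ones beyond the desired count.
        for name in actual_pods[desired_replicas:]:
            actions.append(Action("delete", name))
    return actions


if __name__ == "__main__":
    # One pod exists, three are desired: two creations are proposed.
    print(reconcile(3, ["web-0"]))
```

A real controller runs a loop like this continuously, reading both states through the API rather than taking them as arguments.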
The control plane consists of four core components:
- kube-apiserver: This is the front-end of the control plane. It exposes the Kubernetes API, which is the primary way of communicating with the cluster. The kube-apiserver validates and processes the requests from the user, and updates the cluster state accordingly.
- kube-scheduler: This is the component that decides which node should run each pod. The kube-scheduler watches for new pods that have no node assigned, and selects the best node for them based on various factors, such as resource availability, affinity, anti-affinity, and taints and tolerations.
- kube-controller-manager: This is the component that runs various controllers, which are background processes that handle routine tasks in the cluster. For example, the ReplicaSet controller (the successor to the older replication controller) ensures that the number of running pods matches the desired replica count, and the node controller monitors the health of the nodes. The service controller, which creates cloud load balancers for services, runs in the separate cloud-controller-manager in current releases.
- etcd: This is the distributed key-value store that serves as the single source of truth for the cluster. It stores the configuration data and the state of the cluster. Only the kube-apiserver reads and writes etcd directly; the other components reach cluster state through the API. etcd is designed to be consistent, secure, and reliable.
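The node selection performed by kube-scheduler can be pictured as a filter step followed by a scoring step. The sketch below is a deliberately simplified model that considers only free CPU and memory; the Node and PodRequest types, the pick_node function, and the scoring rule are assumptions made for illustration, and the real scheduler weighs many more factors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_millis: int   # unreserved CPU, in millicores
    free_memory_mib: int   # unreserved memory, in MiB


@dataclass
class PodRequest:
    cpu_millis: int
    memory_mib: int


def pick_node(pod: PodRequest, nodes: list[Node]) -> Optional[str]:
    """Return the name of the chosen node, or None if no node fits."""
    # Filter: keep only nodes with enough free CPU and memory for the pod.
    feasible = [
        n for n in nodes
        if n.free_cpu_millis >= pod.cpu_millis and n.free_memory_mib >= pod.memory_mib
    ]
    if not feasible:
        return None  # in a real cluster the pod would stay Pending

    # Score (toy rule): prefer the node left with the most CPU, then memory.
    def score(n: Node) -> tuple[int, int]:
        return (n.free_cpu_millis - pod.cpu_millis, n.free_memory_mib - pod.memory_mib)

    return max(feasible, key=score).name


if __name__ == "__main__":
    cluster = [Node("a", 500, 1024), Node("b", 2000, 4096), Node("c", 4000, 256)]
    print(pick_node(PodRequest(cpu_millis=1000, memory_mib=512), cluster))
```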
Nodes
The nodes are the machines that run the applications and services in the cluster. Each node runs an operating system, typically Linux (Windows worker nodes are also supported), and a set of components that allow it to communicate with the control plane and to run pods and containers.
The node components are:
- kubelet: This agent runs on each node, registers the node with the cluster, reports node status and resources to the control plane, and follows instructions from the control plane to create or remove pods and containers.
- kube-proxy: This is the network proxy that runs on each node. It maintains the network rules and enables the communication between the pods and the services, both within and outside the cluster. It also performs load balancing for the services.
- container runtime: This software runs containers. Kubernetes works with runtimes that implement the Container Runtime Interface (CRI), such as containerd and CRI-O. Docker Engine can still be used through the cri-dockerd adapter, because the built-in dockershim was removed in Kubernetes 1.24.
- cAdvisor: This component gathers and exposes metrics and performance data of nodes, pods, and containers. It is built into the kubelet and offers details like CPU, memory, disk, and network usage.
An example of a Kubernetes cluster is the one that you can create using Minikube, a tool that lets you run a single-node cluster on your local machine. Minikube creates a virtual machine or container that acts as both the control plane and the node, and installs the necessary components for you. You can use Minikube to learn and experiment with Kubernetes, and to test your applications and services.
Node Architecture
Each node in a Kubernetes cluster runs an operating system, typically Linux, and a set of components that allow it to communicate with the control plane and to run pods and containers. In this section, we describe the Linux environment of a node and how it runs pods and containers. We also explain how the node components, such as kubelet, kube-proxy, the container runtime, and cAdvisor, interact with the control plane and the pods. Finally, we discuss the resource management and isolation mechanisms of the nodes, such as resource requests and limits, taints and tolerations, and pod security policies.
Linux Environment
The Linux environment of each node consists of the following elements:
- Kernel: The core of the operating system manages hardware like CPU, memory, disk, and network. It provides essential features for running pods and containers, such as namespaces, cgroups, iptables, and seccomp.
- Namespaces: Kernel features that create isolated environments for processes. Each namespace offers a distinct view of system resources like network, filesystem mounts, process IDs, user IDs, and hostnames. Container runtimes use them to isolate pods and containers from each other and from the host system. These Linux namespaces are unrelated to the Kubernetes namespaces that group API objects.
- Cgroups: Kernel features that limit and account for process resource usage. Each cgroup has parameters that cap or weight the CPU, memory, and disk I/O its processes may consume. Kubernetes uses cgroups to enforce resource limits and to apply resource requests for pods and containers.
- Iptables: The interface to the kernel’s packet-filtering framework. Its tables hold chains of rules that decide whether a packet is filtered, forwarded, or modified. Kubernetes can use iptables for service routing, and some network plugins use it to implement network policies.
- Seccomp: Kernel feature restricting allowed system calls for processes. Each seccomp profile lists permitted and denied system calls, along with actions for disallowed calls. Kubernetes uses seccomp to enhance security and isolation for pods and containers.
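To make the cgroups item more concrete, the sketch below converts Kubernetes CPU quantities into the kind of values a cgroup would hold. The function names are hypothetical, and the constants (a 100 ms scheduling period, 1024 shares per CPU, and the two minimums) are assumptions based on commonly documented kubelet defaults, so treat the output as illustrative rather than authoritative.

```python
CPU_PERIOD_US = 100_000  # assumed default CFS period: 100 ms, in microseconds
MIN_QUOTA_US = 1_000     # assumed smallest quota written to the cgroup
MIN_SHARES = 2           # assumed smallest CPU shares value


def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "1.5" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def cpu_limit_to_quota_us(limit: str) -> int:
    """CPU time (microseconds) the container may use per period."""
    return max(parse_cpu(limit) * CPU_PERIOD_US // 1000, MIN_QUOTA_US)


def cpu_request_to_shares(request: str) -> int:
    """Relative CPU weight derived from the request (cgroup v1 style shares)."""
    return max(parse_cpu(request) * 1024 // 1000, MIN_SHARES)


def cpu_max_line(limit: str) -> str:
    """Format quota and period the way the cgroup v2 cpu.max file expects."""
    return f"{cpu_limit_to_quota_us(limit)} {CPU_PERIOD_US}"


if __name__ == "__main__":
    print(cpu_max_line("500m"))           # half a CPU per period
    print(cpu_request_to_shares("250m"))  # a quarter of one CPU's shares
```

The limit becomes a hard ceiling on CPU time per period, while the request only sets a relative weight that matters when CPUs are contended.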
Node Components
The node components are the software that runs on each node and enables communication and coordination with the control plane and the pods. The node components are:
- Kubelet: Node agent that registers the node with the cluster and reports its status and resources to the control plane. It watches the kube-apiserver for pods assigned to its node and creates or removes their containers accordingly. It drives the container runtime through the Container Runtime Interface (CRI) and tracks pod and container events with its internal Pod Lifecycle Event Generator (PLEG).
- Kube-proxy: Node network proxy that maintains the rules that route service traffic to pods, for clients inside and outside the cluster, and spreads that traffic across the backing pods. It programs the Linux kernel in iptables or IPVS mode; eBPF-based service routing is offered by some network plugins as a kube-proxy replacement rather than by kube-proxy itself. It watches the kube-apiserver for changes to services and their endpoints.
- Container runtime: Software that runs containers. Kubernetes supports CRI-compatible runtimes such as containerd and CRI-O; Docker Engine requires the cri-dockerd adapter, and rkt is no longer maintained. These runtimes typically delegate to a low-level runtime such as runc, built on the libcontainer library, which provides the low-level functions for container management. The runtime communicates with kubelet through CRI, a standard API that abstracts the differences between runtimes.
- cAdvisor: Component that collects and exposes metrics and performance data of nodes, pods, and containers, including CPU, memory, disk, and network usage. It is built into the kubelet and reads process and system details from the proc and sys filesystems. Its data reaches the control plane indirectly: metrics-server scrapes each kubelet and serves the metrics to components like the Horizontal Pod Autoscaler (HPA) and Kubernetes Dashboard.
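The load balancing that kube-proxy performs in iptables mode is commonly described as a chain of rules, one per backend pod, where each rule matches with a fixed probability and the last rule always matches. The sketch below, which assumes that rule layout, checks that the per-rule probabilities 1/n, 1/(n-1), ..., 1 give every backend the same overall share. The function names are hypothetical.

```python
from fractions import Fraction


def rule_probabilities(backends: int) -> list[Fraction]:
    """Match probability of each rule in order; rule i uses 1 / (backends - i)."""
    if backends < 1:
        raise ValueError("at least one backend is required")
    return [Fraction(1, backends - i) for i in range(backends)]


def effective_shares(backends: int) -> list[Fraction]:
    """Overall fraction of traffic each backend receives."""
    shares = []
    remaining = Fraction(1)  # traffic that has not matched an earlier rule
    for probability in rule_probabilities(backends):
        shares.append(remaining * probability)
        remaining *= 1 - probability
    return shares


if __name__ == "__main__":
    print(rule_probabilities(3))
    print(effective_shares(3))
```

Using exact fractions instead of floats makes it easy to confirm that the shares are identical and sum to one.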
Resource Management and Isolation
The resource management and isolation mechanisms of the nodes keep pods and containers running efficiently and securely. They are:
- Resource requests and limits: Per-container parameters that set the CPU and memory a container is reserved (request) and the most it may use (limit). Users set them in the pod specification, and the kubelet enforces them through cgroups. They are used for:
  - Scheduling: kube-scheduler picks a node whose unreserved resources cover the pod’s requests, while also honoring affinity rules.
  - Quality of Service (QoS): Each pod is assigned a QoS class based on its resource requests and limits:
    - Guaranteed: Every container has CPU and memory requests equal to its limits. These pods are the last to be evicted under node pressure, although a container is still throttled at its CPU limit and killed if it exceeds its memory limit.
    - Burstable: At least one container sets a request or limit, but the pod does not meet the Guaranteed criteria. The pod may burst above its requests but can be throttled, and it is evicted before Guaranteed pods under node pressure.
    - BestEffort: No container sets a CPU or memory request or limit. These pods have the lowest priority and are the first candidates for eviction.
  - Isolation: Enforcing resource limits via cgroups to prevent interference between pods or containers.
- Taints and tolerations: Taints let a node repel pods, and tolerations let specific pods run on a tainted node anyway. (Attracting pods to particular nodes is done with node affinity, not taints.) A taint consists of a key, an optional value, and an effect (NoSchedule, PreferNoSchedule, or NoExecute), and can mark a node as, for example, dedicated or under maintenance. A toleration is declared in the pod specification and matches taints by key, value, and effect. They are used for:
  - Scheduling: kube-scheduler filters out nodes that carry a NoSchedule or NoExecute taint the pod does not tolerate. A PreferNoSchedule taint is only a soft preference.
  - Eviction: When a NoExecute taint is added to a node, kube-controller-manager evicts running pods that lack a matching toleration.
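- Pod security policies: Policies that define security settings and restrictions for pods and containers, set by the cluster administrator and enforced by the kube-apiserver at admission time. The PodSecurityPolicy API was deprecated in Kubernetes 1.21 and removed in 1.25; its built-in replacement is Pod Security Admission, which enforces the Pod Security Standards. Used for:
  - Validation: kube-apiserver uses policies to validate pod and container specifications, rejecting requests that violate them.
  - Mutation: With PodSecurityPolicy, kube-apiserver could also apply default or required values for security settings. Pod Security Admission only validates and does not mutate.
  - Admission: A pod that fails policy is rejected by kube-apiserver before it reaches a node. The kubelet does not evaluate policies itself; it applies the admitted pod’s security settings through the container runtime.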
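The QoS classes listed above follow a simple decision rule that can be sketched in a few lines. The qos_class function and its input shape are hypothetical, and the sketch compares quantities as plain strings, so it treats "500m" and "0.5" as different values even though Kubernetes considers them equal.

```python
QOS_RESOURCES = ("cpu", "memory")


def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' requests and limits.

    Each container is a dict with optional "requests" and "limits" maps,
    for example {"requests": {"cpu": "100m"}, "limits": {"cpu": "200m"}}.
    """
    any_set = False     # does any container set a CPU or memory value?
    guaranteed = True   # stays True only if every container qualifies
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in QOS_RESOURCES:
            limit = limits.get(resource)
            # A missing request defaults to the limit, as Kubernetes does.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    equal = {"cpu": "500m", "memory": "128Mi"}
    print(qos_class([{"requests": equal, "limits": equal}]))
    print(qos_class([{"requests": {"cpu": "100m"}}]))
    print(qos_class([{}]))
```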
An example of a node in a Kubernetes cluster is the one that you can inspect using the kubectl describe node command, which shows the details of the node, such as its roles, labels, annotations, taints, conditions, addresses, capacity, allocatable resources, and the pods running on it with their resource requests and limits. You can use this command to monitor and troubleshoot the node’s performance and health, and to verify its taints and how much of its capacity is already allocated.
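The taint and toleration rules described in this section can be sketched as a small matching function. The Taint and Toleration classes and the two functions are hypothetical names, and the sketch models only the matching fields (key, operator, value, effect), not timing options such as how long a pod may keep running on a tainted node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        # An empty key with Exists tolerates every taint.
        return toleration.key == "" or toleration.key == taint.key
    return toleration.key == taint.key and toleration.value == taint.value


def can_schedule(taints: list[Taint], tolerations: list[Toleration]) -> bool:
    """A pod fits only if it tolerates every hard (non-preference) taint."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in hard)


if __name__ == "__main__":
    node_taints = [Taint("dedicated", "gpu", "NoSchedule")]
    print(can_schedule(node_taints, []))
    print(can_schedule(node_taints, [Toleration("dedicated", "Equal", "gpu", "NoSchedule")]))
```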
Pod and Service Architecture
Pods and services are fundamental concepts in Kubernetes. Pods are the smallest and simplest unit of deployment, and services are the abstraction that provides network access and load balancing for pods. In this section, we describe the pod and service architecture, and how pods and services work together to enable distributed applications in Kubernetes.
Pods
A pod is a group of one or more containers sharing the same network, storage, and lifecycle. It’s the fundamental deployment unit in Kubernetes, representing a single instance of an application or service. A pod can run a single container or multiple containers working together.
The control plane creates a pod from a user-provided specification that defines its desired state, including container images, environment variables, ports, volumes, and resource requests and limits. Each pod has a name that is unique within its namespace. kube-scheduler assigns the pod to a node that meets its requirements, and the pod receives its own IP address when it starts there.
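As an illustration of such a specification, the sketch below builds a minimal pod manifest as a Python dictionary and prints it as JSON. The field names follow the core v1 Pod API; the pod_manifest helper, the LOG_LEVEL variable, and the resource values are placeholders chosen for the example.

```python
import json


def pod_manifest(name: str, image: str, port: int = 80) -> dict:
    """Build a single-container Pod manifest (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                    # Placeholder environment variable for illustration.
                    "env": [{"name": "LOG_LEVEL", "value": "info"}],
                    # Example values; requests feed scheduling, limits cap usage.
                    "resources": {
                        "requests": {"cpu": "100m", "memory": "64Mi"},
                        "limits": {"cpu": "250m", "memory": "128Mi"},
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(pod_manifest("nginx-pod", "nginx"), indent=2))
```

The printed JSON can be saved to a file and submitted with kubectl apply -f, which accepts JSON as well as YAML.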
When a pod is no longer needed, the control plane deletes it. Kubernetes also monitors pod health and status and acts on the specification and the cluster state: the kubelet restarts failed containers according to the pod’s restart policy, and controllers replace lost pods or scale their number up or down.
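Restarts of a repeatedly failing container are not immediate. The Kubernetes documentation describes an exponential back-off that starts at 10 seconds, doubles after each restart, and is capped at five minutes. The sketch below reproduces that schedule; the restart_delays function is hypothetical, and the defaults are taken from that documented behavior and may differ between versions or configurations.

```python
def restart_delays(restarts: int, base_seconds: int = 10, cap_seconds: int = 300) -> list[int]:
    """Delay before each successive restart: doubling from base, capped at cap."""
    return [min(base_seconds * 2**attempt, cap_seconds) for attempt in range(restarts)]


if __name__ == "__main__":
    print(restart_delays(7))
```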
Pods communicate with each other directly by IP address and port, or, when a service fronts the target pods, by the service’s DNS name, which is built from the service name and its namespace. They can also communicate with the control plane via the Kubernetes API exposed by the kube-apiserver component.
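Name-based communication relies on cluster DNS records for services. The sketch below builds the conventional fully qualified name for a service; the service_dns_name function is hypothetical, and "cluster.local" is the usual default cluster domain, which an administrator may have changed.

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Return the conventional DNS name of a service inside the cluster."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    print(service_dns_name("backend", "prod"))
```

A pod in the same namespace can usually reach the service by its short name alone, because the pod’s DNS search path fills in the remaining parts.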
An example of a pod in a Kubernetes cluster is the one that you can create using the kubectl run command, which creates a pod that runs a single container based on the specified image. You can use this command to quickly test and debug your applications and services in a pod. For example, the following command creates a pod named nginx-pod that runs the nginx image:
kubectl run nginx-pod --image=nginx
You can inspect the pod using the kubectl describe pod command, which shows the details of the pod, such as the name, namespace, IP address, status, events, containers, and volumes. You can also interact with the pod using the kubectl exec command, which executes a command in the pod’s container, or the kubectl logs command, which shows the logs of the pod’s container.
Conclusion
In this article, we learned about Kubernetes architecture, understanding how it facilitates scalable, reliable, and efficient distributed systems. Key topics covered include:
- Cluster Architecture: Exploring the collaboration between the control plane and nodes to form a Kubernetes cluster. We delved into the roles and functions of core control plane components.
- Node Architecture: Understanding how nodes execute pods and containers. We examined the roles and functions of node components like kubelet, kube-proxy, container runtime, and cAdvisor.
- Pod and Service Architecture: Learning about pod scheduling, creation, and destruction orchestrated by the control plane. We also discussed how pods communicate internally and externally, along with the network access and load balancing features provided by services. Service types like ClusterIP, NodePort, and LoadBalancer, as well as Ingress, which is a separate resource rather than a service type, were explored.
We observed that Kubernetes architecture adheres to the principles of declarative configuration, desired state management, and modularity. Users can define the desired state of their applications and services through configuration files, and Kubernetes ensures the actual state aligns with these specifications. The modular and extensible architecture allows for the integration and replacement of different components as necessary.
Kubernetes architecture offers many benefits for running distributed applications and services at scale, such as:
- Scalability: Kubernetes can scale up or down the number of pods and nodes based on the demand and the available resources, and can handle high availability and load balancing for the services.
- Reliability: Kubernetes can monitor and manage the health and status of the pods and nodes, and can perform actions such as restarting, rescheduling, or evicting the pods based on the pod’s specification and the cluster’s state.
- Efficiency: Kubernetes can optimize the resource usage and allocation of the pods and nodes, and can enforce the resource limits and requests, taints and tolerations, and pod security policies of the pods and containers.
- Portability: Kubernetes can run on any infrastructure, from physical servers to cloud providers, and can support a variety of workloads, such as web applications, microservices, batch processing, machine learning, and more.