Abraham Dahunsi Web Developer 🌐 | Technical Writer ✍️| DevOps Enthusiast👨‍💻 | Python🐍 |

Kubernetes Node Not Ready Error and How to Fix It

The “Node Not Ready” error is a common problem faced by Kubernetes operators. When a node enters this state, it cannot accept new pods because of an underlying issue. In this troubleshooting guide, you will learn what causes the error, how it affects pods already running on the affected node, and how to fix it.

Understanding Node States

Kubernetes nodes can exist in several states, with each one indicating the node’s operational status. Here are the four primary states:

  1. Ready: A node in the “Ready” state is fully operational and capable of running pods. It meets all the necessary conditions (resources, network connectivity, etc.) to host workloads.
  2. NotReady: When a node transitions to the “NotReady” state, it indicates that the node is experiencing issues preventing it from accepting new pods. Existing pods on the node continue to run, but no new ones can be scheduled.
  3. SchedulingDisabled: This state occurs when the node is intentionally marked as unschedulable. It won’t accept any new pods, even if it’s otherwise healthy. Administrators might set this state during maintenance or troubleshooting.
  4. Unknown: The “Unknown” state typically arises when the Kubernetes control plane loses communication with the node. It lacks information about the node’s status, making it impossible to determine whether it’s ready or not.
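As a rough illustration of how these states show up in practice, the sketch below filters the output of kubectl get nodes and prints every node whose STATUS column is anything other than plain “Ready”. The function name not_ready_nodes is hypothetical, and the sketch assumes the default column layout (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Print "<name> <status>" for every node that is not plainly Ready.
# Reads `kubectl get nodes --no-headers` output on stdin.
# Cordoned nodes ("Ready,SchedulingDisabled") are listed too, because
# their STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 " " $2 }'
}
```

On a live cluster this would be used as kubectl get nodes --no-headers | not_ready_nodes. An empty result means every node reports “Ready”.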

Impact on Pods

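A “NotReady” node affects pod scheduling. While existing pods on the node may continue to run, the scheduler assigns no new pods to it. Pods that can only run on that node (for example, because of a node selector) stay in the Pending state until the node returns to “Ready” status. If the node stays “NotReady” longer than the pods’ toleration period (five minutes by default), Kubernetes evicts its pods so that their controllers can recreate them on other nodes.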
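To see which workloads are exposed when a node goes “NotReady”, you can list the pods currently bound to it. The following is a minimal sketch; the function name pods_on_node and the node name worker-1 in the usage note are placeholders, not names from a real cluster.

```shell
# List every pod (in all namespaces) that is bound to the given node.
# $1: node name as shown by `kubectl get nodes`.
pods_on_node() {
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$1"
}
```

For example, pods_on_node worker-1 shows each pod on that node together with its namespace and current status.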

Now let’s look into the common causes of the Node NotReady error.

Common Causes of Node Not Ready Error

  1. Lack of System Resources:
    • Memory: Insufficient memory can lead to a node being marked as “NotReady.” Pods may fail to start due to memory constraints.
    • Disk Space: Running out of disk space impacts the node’s ability to function properly.
    • Excessive Processes: Too many processes competing for resources can render the node non-operational.
  2. kubelet Issues:
    • kubelet Crashes: A crash or stoppage of the kubelet process causes the node to become “NotReady.”
    • Misconfiguration: Errors in the kubelet configuration can stop the node from achieving the “Ready” state.
  3. Network-Related Problems:
    • Network Partition: Isolation from the cluster network can cause a node to be marked as “NotReady.”
    • DNS Resolution Issues: Nodes unable to resolve DNS names may remain in the “NotReady” state.
  4. Configuration Issues:
    • CNI Plugin Misconfiguration: Problems with Container Network Interface (CNI) plugins can impact node readiness.
    • Node Labels and Taints: Incorrect labels or taints may prevent pod scheduling.

Diagnosing and Troubleshooting

There are several ways to troubleshoot the “Node NotReady” error. Some of your options include the following:

Use kubectl describe node

Step 1: Run kubectl describe node <node-name> to get detailed information about the node’s status.

kubectl describe node <node-name>

Step 2: Look for conditions like MemoryPressure, DiskPressure, or PIDPressure. These indicate resource shortages that might cause the node to be “NotReady.”
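The full describe output can be long. As a sketch, the helper below reduces a list of node conditions to only those that look unhealthy: a Ready condition that is not True, or any other condition (MemoryPressure, DiskPressure, PIDPressure, and so on) that is not False. The function name failing_conditions is hypothetical, and it expects one Type=Status pair per line.

```shell
# Print only the node conditions that signal a problem.
# Expects lines such as "MemoryPressure=False" on stdin.
failing_conditions() {
  awk -F= '
    NF != 2 { next }                                  # skip blank or malformed lines
    $1 == "Ready" { if ($2 != "True") print; next }   # Ready should be True
    $2 != "False" { print }                           # every other condition should be False
  '
}

# Usage on a live cluster (not executed here):
#   kubectl get node <node-name> \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | failing_conditions
```

No output means all reported conditions are in their healthy state.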

Investigate kubelet logs

Step 1: Check the kubelet logs (journalctl -u kubelet or /var/log/kubelet.log) for any errors or warnings.

sudo journalctl -u kubelet

Step 2: Look for clues related to connectivity issues, configuration problems, or component failures.
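Because kubelet logs are verbose, it can help to narrow them to recent lines that mention a failure. The sketch below is one possible filter; the function name kubelet_errors and the match patterns are assumptions chosen for illustration, and real log messages may need different patterns.

```shell
# Keep only log lines that mention an error or failure, and show the
# most recent ones.
# $1: maximum number of lines to print (defaults to 20).
kubelet_errors() {
  grep -iE 'error|fail|unable to' | tail -n "${1:-20}"
}

# Usage on the affected node (not executed here):
#   sudo journalctl -u kubelet --since "1 hour ago" --no-pager | kubelet_errors 50
```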

Verify network connectivity

Ensure that the node can communicate with the control plane and other nodes.

Step 1: Check Node Communication with Control Plane

kubectl get nodes

Step 2: Check Node Communication with Control Plane Using Ping

Ensure nodes and control plane are reachable via ping. Successful ping replies indicate good network connectivity.

ping -c 4 control-plane

Step 3: Check Node Communication with Another Node

ping -c 4 node-456

Check DNS resolution, firewall rules, and network routes.

Option 1: Check DNS Resolution

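Verify that service names resolve correctly using nslookup. Proper resolution means DNS is functioning. Note that cluster-internal names such as the one below resolve only where the cluster DNS service is used as the resolver, for example inside a pod.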

nslookup kubernetes.default.svc.cluster.local

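Option 2: Check Firewall Rules

Confirm that the firewall is not blocking traffic between the node and the control plane. On hosts that use ufw, list the active rules:

sudo ufw status

Option 3: Check Network Routes

Confirm correct routes are in place using ip route. Correct routes ensure network traffic flows properly between nodes and the control plane.

ip route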
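A successful ping only proves ICMP reachability. A firewall can still block the TCP ports Kubernetes relies on, so a port-level check is a useful complement. The sketch below assumes the default ports (6443 for the API server, 10250 for the kubelet) and a netcat build that supports the -z and -w options; the function name check_port and the host names are placeholders.

```shell
# Report whether a TCP port on a host accepts connections.
# $1: host name or IP address, $2: port number.
# Returns non-zero when the port cannot be reached within 3 seconds.
check_port() {
  if nc -z -w 3 "$1" "$2"; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 unreachable"
    return 1
  fi
}

# Usage (not executed here):
#   check_port control-plane 6443   # API server, from the node
#   check_port node-456 10250       # kubelet, from the control plane
```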

Resolution Strategies

Address System Resource Issues

Option 1: Shut Down Non-Kubernetes Processes

Identify any non-essential processes consuming resources on the node. Shut them down or move them to other nodes.

Step 1: List Running Processes and Their Resource Usage

top -b -n 1 | head -n 20
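Disk usage deserves the same attention as processes, since a nearly full filesystem can trigger the DiskPressure condition. As a sketch, the helper below flags filesystems at or above a usage threshold. The function name full_filesystems and the default of 85 percent are arbitrary choices for illustration, not kubelet settings.

```shell
# Print "<mount point> <usage>" for each filesystem at or above a threshold.
# Reads `df -P` output on stdin.
# $1: usage threshold in percent (defaults to 85).
full_filesystems() {
  threshold="${1:-85}"
  awk -v t="$threshold" '
    NR > 1 {                      # skip the header line
      use = $5
      sub(/%/, "", use)           # "91%" -> "91"
      if (use + 0 >= t) print $6 " " $5
    }
  '
}

# Usage on the affected node (not executed here):
#   df -P | full_filesystems 85
```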
    

Step 2: Identify and Shut Down Non-Essential Processes

sudo systemctl stop apache2
sudo systemctl stop mysql

Option 2: Run Malware Scans

Ensure the node is free from malware or malicious processes that might impact its performance.

Install and Run ClamAV

sudo apt-get update
sudo apt-get install clamav
sudo freshclam
sudo clamscan -r / --log=/var/log/clamav/scan.log

Option 3: Upgrade the Node

Consider upgrading the node’s hardware (CPU, memory, storage) if resource constraints persist.

Restart Components

Option 1: kubelet

Restart the kubelet service using sudo systemctl restart kubelet.

Option 2: kube-proxy

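Similarly, if kube-proxy runs as a systemd service on the node, restart it using sudo systemctl restart kube-proxy. In clusters where kube-proxy runs as a DaemonSet pod instead (the default for kubeadm-based clusters), delete the kube-proxy pod on the affected node so that it is recreated.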

Option 3: Docker

If you’re using Docker as the container runtime, restart it as well: sudo systemctl restart docker.
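After restarting a component, it is worth confirming that the node actually recovers rather than assuming it did. A minimal sketch using kubectl wait follows; the function name wait_for_ready and the default timeout of 120 seconds are assumptions for illustration.

```shell
# Block until the node reports the Ready condition, or give up after a timeout.
# $1: node name, $2: timeout such as "120s" (defaults to 120s).
# Exits non-zero if the node does not become Ready in time.
wait_for_ready() {
  kubectl wait --for=condition=Ready "node/$1" --timeout="${2:-120s}"
}
```

For example, wait_for_ready <node-name> 300s waits up to five minutes. If it times out, return to the diagnosis steps above.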

Consider Using a Higher Service Tier

If you’re using managed Kubernetes services (like AKS, EKS, or GKE), consider upgrading to a higher service tier. This often provides better performance, reliability, and resource availability.
