Adesoji Alu brings a proven ability to apply machine learning (ML) and data science techniques to solve real-world problems. He has experience with a variety of cloud platforms, including AWS, Azure, and Google Cloud Platform, has strong skills in software engineering, data science, and machine learning, and is passionate about using technology to make a positive impact on the world.

Mastering Kubernetes Scaling: From Manual Adjustments to Intelligent Automation in 8 Steps



Scaling applications in Kubernetes is essential for maintaining optimal performance, ensuring high availability, and managing resource utilization effectively. Whether you’re handling fluctuating traffic or optimizing costs, understanding how to scale your Kubernetes deployments is crucial. In this blog, we’ll delve into the intricacies of scaling in Kubernetes, explore manual and automated scaling techniques using kubectl and KEDA, and highlight how StormForge can elevate your scaling strategy with advanced optimization features.

Understanding Kubernetes Scaling

Scaling in Kubernetes involves adjusting the number of pod replicas or modifying the resource allocations of pods to align with your application’s demand. Kubernetes supports two primary types of scaling:
  • Horizontal Scaling: Increasing or decreasing the number of pod replicas to handle varying loads.
  • Vertical Scaling: Adjusting the CPU, memory, or other resource requests of existing pods to better fit their workloads.
Proper scaling ensures that your applications remain responsive and efficient, whether you’re experiencing high traffic spikes or aiming to reduce costs during low-usage periods.
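While the rest of this post focuses on horizontal scaling, a vertical adjustment is simply a change to a pod's resource requests and limits. As a minimal sketch (the deployment name and values are hypothetical placeholders), kubectl set resources can patch them in place, which triggers a rolling restart of the pods:

kubectl set resources deployment my-app --requests=cpu=500m,memory=256Mi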

Manual Horizontal Scaling with kubectl

The kubectl command-line tool provides a straightforward way to manually scale your Kubernetes resources. Using the scale subcommand, you can adjust the number of replicas for a deployment or StatefulSet. Here’s the basic syntax:

kubectl scale --replicas=<number_of_replicas> <resource_type>/<resource_name> [options]
Examples:

Scale a Deployment to 3 Replicas:

kubectl scale --replicas=3 deployment/nginx-deployment
Scale Deployments with a Specific Label to 2 Replicas:

kubectl scale --replicas=2 deployment -l app=nginx
Scale All Deployments and StatefulSets to 2 Replicas:

kubectl scale statefulset,deployment --all --replicas=2
Conditionally Scale a Deployment:

kubectl scale --current-replicas=2 --replicas=3 deployment/mysql
  • Using the --current-replicas option ensures that the scaling action only occurs if the current number of replicas matches the specified value, preventing race conditions.
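If you want to confirm the current replica count before a conditional scale, you can read it straight from the deployment spec (using the mysql deployment from the example above):

kubectl get deployment/mysql -o jsonpath='{.spec.replicas}'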

Scaling to Zero: When and How

There are scenarios where you might want to scale a deployment down to zero replicas, effectively stopping all running pods. This can be useful for:
  • Cost Savings: Eliminating resource usage when the application isn’t needed, especially in non-production environments.
  • Maintenance: Simplifying updates or performing blue-green deployments without affecting live traffic.
  • Testing: Ensuring a clean state by scaling down applications during test runs.
Command to Scale a Deployment to Zero:

kubectl scale --replicas=0 deployment/my-app
Consequences of Scaling to Zero:
  • Application Unavailability: No pods are running, making the application inaccessible.
  • Service Disruption: Active sessions and connections are terminated.
  • Impact on Stateful Applications: Ensure that state is preserved externally to avoid data loss.
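Bringing the application back is the same scale command with a non-zero replica count:

kubectl scale --replicas=1 deployment/my-app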

The Limitation of HPA with Zero Replicas

The Horizontal Pod Autoscaler (HPA) in Kubernetes automatically adjusts the number of replicas based on metrics like CPU utilization. However, HPA struggles when scaling a deployment to zero because no pods are available to provide the necessary metrics. This means HPA cannot scale up from zero on its own, requiring manual intervention to restart the deployment.
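For reference, here is a minimal HPA manifest (the names and thresholds are hypothetical). Note that minReplicas must be at least 1 unless the alpha HPAScaleToZero feature gate is enabled, which is why HPA on its own cannot park a deployment at zero:

kubectl apply -f - <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
EOF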

Introducing KEDA: Bridging the Scaling Gap

Kubernetes Event-Driven Autoscaling (KEDA) enhances the capabilities of HPA by allowing scaling based on a wide range of external metrics and event sources. Unlike HPA, KEDA can scale deployments down to zero and back up based on triggers such as message queue depth, HTTP requests, or custom metrics.

Key Features of KEDA:
  • Event-Driven Scaling: React to external events like queue lengths or database queries.
  • Scale to Zero: Automatically reduce replicas to zero when there are no events to process.
  • Flexibility: Support for various metrics and integration with services like Azure Monitor, Prometheus, AWS CloudWatch, and Kafka.
Demo: Scaling Down to Zero with KEDA and RabbitMQ

Let's walk through an example of using KEDA to scale a RabbitMQ consumer deployment based on queue length.

Prerequisites: a running Kubernetes cluster and Helm installed.

Step 1: Install KEDA

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
Or apply the release manifest directly with kubectl (use microk8s kubectl on MicroK8s):

kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.4.0/keda-2.4.0.yaml

The KEDA pods should be running and ready within a few minutes.
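To confirm, list the pods in the keda namespace:

kubectl get pods -n keda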

Step 2: Deploy RabbitMQ

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install rabbitmq bitnami/rabbitmq
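The Bitnami chart generates a password and stores it in a Kubernetes secret. Assuming the default secret name rabbitmq and key rabbitmq-password (the same reference the consumer manifest in Step 4 uses), you can retrieve it with:

kubectl get secret rabbitmq -o jsonpath="{.data.rabbitmq-password}" | base64 -d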
Step 3: Verify RabbitMQ Installation

kubectl get pods -l app.kubernetes.io/name=rabbitmq

Now the RabbitMQ pod is up and running 😊


It may take a few minutes before the pod is running and ready.

To confirm a successful startup, you can inspect the RabbitMQ pod logs.
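A quick check, assuming the chart's first StatefulSet pod is named rabbitmq-0 (the same pod used for port-forwarding in Step 5):

kubectl logs rabbitmq-0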
Step 4: Deploy a RabbitMQ Consumer Application

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-consumer
  namespace: default
  labels:
    app: rabbitmq-consumer
spec:
  selector:
    matchLabels:
      app: rabbitmq-consumer
  template:
    metadata:
      labels:
        app: rabbitmq-consumer
    spec:
      containers:
        - name: rabbitmq-consumer
          image: ghcr.io/kedacore/rabbitmq-client:v1.0
          imagePullPolicy: Always
          env:
            - name: RABBITMQ_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: rabbitmq
                  key: rabbitmq-password
          command:
            - receive
          args:
            - "amqp://user:$(RABBITMQ_PASSWORD)@rabbitmq.default.svc.cluster.local:5672"
EOF

Kubernetes expands $(RABBITMQ_PASSWORD) in args from the container's environment, and the quoted heredoc delimiter keeps your local shell from interpreting it.
Step 5: Access RabbitMQ UI

kubectl port-forward pod/rabbitmq-0 15672:15672

Navigate to http://localhost:15672 and log in with the username user and the retrieved password.

Step 6: Configure KEDA ScaledObject

Now we define and apply a KEDA ScaledObject to automatically scale the consumer deployment based on the RabbitMQ queue length. For emphasis: a KEDA ScaledObject is a custom resource that defines the triggers and scaling behavior KEDA uses to scale target resources such as Deployments, StatefulSets, and other custom resources.


echo "
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-worker-scaledobject
spec:
  scaleTargetRef:
    name: rabbitmq-consumer
  minReplicaCount: 0
  maxReplicaCount: 5
  triggers:
    - type: rabbitmq
      metadata:
        host: \"amqp://user:@rabbitmq.default.svc.cluster.local:5672\"
        queueName: \"hello\"
        queueLength: '1'
" | kubectl apply -f -

Ensure you replace <RABBITMQ_PASSWORD> with the actual password or use a TriggerAuthentication object for secure handling.
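As a minimal sketch of the TriggerAuthentication approach, assuming you store the full connection string in a secret (the names rabbitmq-conn and rabbitmq-trigger-auth are hypothetical):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-conn
type: Opaque
stringData:
  host: amqp://user:<RABBITMQ_PASSWORD>@rabbitmq.default.svc.cluster.local:5672
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-trigger-auth
spec:
  secretTargetRef:
    - parameter: host
      name: rabbitmq-conn
      key: host
EOF

The trigger then drops host from its metadata and references the TriggerAuthentication instead:

  triggers:
    - type: rabbitmq
      metadata:
        queueName: "hello"
        queueLength: "1"
      authenticationRef:
        name: rabbitmq-trigger-auth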

Step 7: Verify Scaling to Zero

kubectl get deployment rabbitmq-consumer

With an empty queue, the deployment should scale down to zero replicas.
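You can also inspect the ScaledObject itself; its READY and ACTIVE columns indicate whether the trigger is healthy and currently firing:

kubectl get scaledobject rabbitmq-worker-scaledobject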

Step 8: Test Scaling Up

Add messages to the RabbitMQ queue via the UI, or with the publisher Job sketched below. KEDA will detect the queue length and scale the consumer deployment up accordingly.
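If you prefer the command line over the UI, the same client image also ships a send command. The sketch below is adapted from the KEDA RabbitMQ sample, where the second argument is the number of messages to publish (300 here is arbitrary):

kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: rabbitmq-publish
spec:
  template:
    spec:
      containers:
        - name: rabbitmq-client
          image: ghcr.io/kedacore/rabbitmq-client:v1.0
          imagePullPolicy: Always
          command:
            - send
          args:
            - "amqp://user:$(RABBITMQ_PASSWORD)@rabbitmq.default.svc.cluster.local:5672"
            - "300"
          env:
            - name: RABBITMQ_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: rabbitmq
                  key: rabbitmq-password
      restartPolicy: Never
  backoffLimit: 4
EOF

Then watch the consumer deployment scale up in response:

kubectl get deployment rabbitmq-consumer -w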

Enhancing Resource Management with StormForge

While HPA and KEDA offer robust scaling capabilities, achieving optimal resource utilization can be challenging. Incorrect metric thresholds or inefficient scaling policies can lead to wasted resources or performance bottlenecks. This is where StormForge Optimize Live comes into play.

Before using StormForge, the CPU request graph shows the default, manually set request with no autoscaling.

After applying StormForge's recommendations, the graph shows the optimized CPU requests.
StormForge Optimize Live Features:
  • Machine Learning-Based Recommendations: Analyze resource usage patterns to provide optimal settings for HPA and vertical scaling.
  • Automated Resource Optimization: Dynamically adjust resource requests and limits based on real-time data, ensuring efficient performance and cost management.
  • Trend Detection: Forecast resource needs by identifying usage patterns, enabling proactive scaling adjustments.
  • Integration with GitOps: Apply recommendations through GitOps workflows, allowing for controlled and automated deployments.
Benefits:
  • Improved Performance: Optimize resource allocations to prevent contention and ensure smooth application operations.
  • Cost Reduction: Minimize waste by accurately sizing resources based on actual usage.
  • Enhanced Efficiency: Leverage machine learning to continuously refine scaling strategies, adapting to changing workloads seamlessly.

Ready to optimize your Kubernetes scaling? Experience the power of StormForge in their sandbox environment—no email required. Access the Sandbox here

Similar to StormForge are CAST AI (a Kubernetes automation platform) and Datadog. With CAST AI, you have the option to add extra overhead for CPU and RAM, modify their percentile values, and establish a threshold for automatically applying the recommendations you receive; for instance, adjustments can be applied to a workload only after it surpasses specific thresholds. The CAST AI team is working on enhancing automated workload optimization: in the near future, the platform will introduce seasonal models for resource management, allowing it to predict hourly, daily, weekly, and monthly usage patterns to improve responsiveness and availability.

Exciting updates are on the horizon, so stay tuned!

Conclusion

Effective scaling is pivotal for maintaining the balance between performance and cost in Kubernetes environments. While kubectl scale offers manual control and HPA provides automated scaling based on metrics, KEDA enhances these capabilities by enabling event-driven scaling, including scaling down to zero. To further optimize your scaling strategy, StormForge’s advanced machine learning and automation features ensure that your resources are utilized efficiently, delivering both performance and cost benefits. Embrace these scaling techniques and tools to achieve a resilient, cost-effective, and high-performing Kubernetes infrastructure. Continue exploring advanced autoscaling solutions with KEDA and node scaling options like Karpenter to fully harness the power of Kubernetes in your deployments.

Have Queries? Join https://launchpass.com/collabnix
