
MLOps on Kubernetes: CI/CD for Machine Learning Models in 2024


Machine Learning Operations (MLOps) has become the cornerstone of deploying and maintaining ML models at scale. As organizations increasingly adopt Kubernetes for orchestration, understanding how to build robust CI/CD pipelines for ML models has become essential. This comprehensive guide explores production-ready MLOps practices on Kubernetes, complete with practical examples and battle-tested configurations.

Why Kubernetes for MLOps?

Kubernetes provides a strong foundation for MLOps workflows thanks to its scalability, resource management, and declarative configuration model. Compared with traditional deployment methods, Kubernetes offers:

  • Automated scaling: Handle variable inference loads efficiently
  • Resource isolation: GPU/CPU allocation for training and serving
  • Declarative rollouts: Versioned deployments with straightforward rollbacks
  • Multi-tenancy: Support multiple teams and experiments
  • Cloud-agnostic: Deploy across any cloud provider or on-premises

Core Components of ML CI/CD Pipeline

A production-grade MLOps pipeline on Kubernetes consists of several interconnected components working in harmony. Let’s break down each component and implement them step-by-step.

1. Container Registry Setup

First, we need a container registry to store our ML model images. Here’s a Dockerfile for packaging an ML model as a serving API:

FROM python:3.9-slim

WORKDIR /app

# Install curl for the health check (not included in python:3.9-slim)
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model artifacts
COPY models/ /app/models/
COPY src/ /app/src/

# Expose API port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost:8080/health || exit 1

# Run the model server
CMD ["python", "src/serve.py"]

Build and push the image:

docker build -t your-registry.io/ml-model:v1.0.0 .
docker push your-registry.io/ml-model:v1.0.0

2. Model Serving with Kubernetes Deployment

Deploy your ML model using a Kubernetes Deployment with proper resource allocation and health checks. Note that the PersistentVolumeClaim mounted at /app/models takes precedence over the artifacts baked into the image, and the claim (model-pvc) must already exist in the mlops namespace:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
  namespace: mlops
  labels:
    app: ml-model
    version: v1.0.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
        version: v1.0.0
    spec:
      containers:
      - name: model-server
        image: your-registry.io/ml-model:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        env:
        - name: MODEL_PATH
          value: "/app/models/model.pkl"
        - name: LOG_LEVEL
          value: "INFO"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 5
        volumeMounts:
        - name: model-storage
          mountPath: /app/models
      volumes:
      - name: model-storage
        persistentVolumeClaim:
          claimName: model-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
  namespace: mlops
  labels:
    app: ml-model
spec:
  selector:
    app: ml-model
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080
  - name: metrics   # scraped by the ServiceMonitor in the Monitoring section
    protocol: TCP
    port: 8000
    targetPort: 8000
  type: LoadBalancer
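
After applying these manifests, it is worth a quick smoke test before wiring up any automation. A minimal Python client, assuming the /predict route from the server sketch above and a local kubectl port-forward svc/ml-model-service 8080:80 -n mlops:

# smoke_test.py -- quick check against the deployed model service
# Assumes: kubectl port-forward svc/ml-model-service 8080:80 -n mlops
import requests

BASE_URL = "http://localhost:8080"

# The service should report healthy and ready
assert requests.get(f"{BASE_URL}/health", timeout=5).status_code == 200
assert requests.get(f"{BASE_URL}/ready", timeout=5).status_code == 200

# Send a sample payload (shape depends on your model's features)
payload = {"instances": [[5.1, 3.5, 1.4, 0.2]]}
response = requests.post(f"{BASE_URL}/predict", json=payload, timeout=10)
response.raise_for_status()
print("Predictions:", response.json()["predictions"])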

3. Implementing CI/CD with Tekton Pipelines

Tekton provides Kubernetes-native CI/CD capabilities well suited to MLOps. Here’s a complete pipeline for model training, testing, and deployment; git-clone, kaniko, and kubernetes-actions are standard Tekton catalog tasks, while python-task stands in for a custom Task that runs the inline script you pass it:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: ml-model-pipeline
  namespace: mlops
spec:
  params:
  - name: git-url
    type: string
    description: Git repository URL
  - name: git-revision
    type: string
    default: main
  - name: image-name
    type: string
    description: Target image name
  - name: model-version
    type: string
  workspaces:
  - name: shared-workspace
  - name: docker-credentials
  tasks:
  - name: fetch-repository
    taskRef:
      name: git-clone
    workspaces:
    - name: output
      workspace: shared-workspace
    params:
    - name: url
      value: $(params.git-url)
    - name: revision
      value: $(params.git-revision)
  
  - name: train-model
    taskRef:
      name: python-task
    runAfter:
    - fetch-repository
    workspaces:
    - name: source
      workspace: shared-workspace
    params:
    - name: script
      value: |
        pip install -r requirements.txt
        python train.py --output-dir ./models
  
  - name: test-model
    taskRef:
      name: python-task
    runAfter:
    - train-model
    workspaces:
    - name: source
      workspace: shared-workspace
    params:
    - name: script
      value: |
        python test.py --model-path ./models/model.pkl
  
  - name: build-push-image
    taskRef:
      name: kaniko
    runAfter:
    - test-model
    workspaces:
    - name: source
      workspace: shared-workspace
    - name: dockerconfig
      workspace: docker-credentials
    params:
    - name: IMAGE
      value: $(params.image-name):$(params.model-version)
  
  - name: deploy-model
    taskRef:
      name: kubernetes-actions
    runAfter:
    - build-push-image
    params:
    - name: script
      value: |
        kubectl set image deployment/ml-model-deployment \
          model-server=$(params.image-name):$(params.model-version) \
          -n mlops
        kubectl rollout status deployment/ml-model-deployment -n mlops
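
The test-model task treats test.py as a quality gate: it loads the freshly trained artifact, evaluates it on held-out data, and exits non-zero so the pipeline stops before building an image if performance is too low. One reasonable shape for that script, assuming a pickled scikit-learn model and a held-out CSV with a label column:

# test.py -- model quality gate executed by the test-model task (sketch)
import argparse
import pickle
import sys

import pandas as pd
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.85  # example gate; tune for your use case

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--test-data", default="data/holdout.csv")  # assumed location
    args = parser.parse_args()

    with open(args.model_path, "rb") as f:
        model = pickle.load(f)

    holdout = pd.read_csv(args.test_data)
    X, y = holdout.drop(columns=["label"]), holdout["label"]

    accuracy = accuracy_score(y, model.predict(X))
    print(f"Holdout accuracy: {accuracy:.4f}")

    # A non-zero exit fails the Tekton task and stops the pipeline
    sys.exit(0 if accuracy >= ACCURACY_THRESHOLD else 1)

if __name__ == "__main__":
    main()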

Create a PipelineRun to execute the pipeline:

kubectl create -f - <<EOF
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: ml-model-pipeline-run-
  namespace: mlops
spec:
  pipelineRef:
    name: ml-model-pipeline
  params:
  - name: git-url
    value: https://github.com/your-org/ml-model-repo.git
  - name: image-name
    value: your-registry.io/ml-model
  - name: model-version
    value: v1.0.1
  workspaces:
  - name: shared-workspace
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  - name: docker-credentials
    secret:
      secretName: docker-credentials
EOF

Model Versioning and A/B Testing

Canary deployments let you test a new model version on a small slice of traffic with minimal risk. The Istio configuration below splits traffic 90/10 between two subsets and lets you pin a request to v2 via the x-model-version header; it assumes both versions run as separate Deployments behind the same Service, distinguished by their version pod labels:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ml-model-vs
  namespace: mlops
spec:
  hosts:
  - ml-model-service
  http:
  - match:
    - headers:
        x-model-version:
          exact: v2
    route:
    - destination:
        host: ml-model-service
        subset: v2
  - route:
    - destination:
        host: ml-model-service
        subset: v1
      weight: 90
    - destination:
        host: ml-model-service
        subset: v2
      weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: ml-model-dr
  namespace: mlops
spec:
  host: ml-model-service
  subsets:
  - name: v1
    labels:
      version: v1.0.0
  - name: v2
    labels:
      version: v1.0.1
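
You can verify both routing rules from inside the mesh. A small Python check, assuming the model server reports its version in a model_version field of the JSON response (not part of the earlier sketch):

# canary_check.py -- exercise the Istio routing rules (sketch)
# Assumes the model server includes a `model_version` field in its JSON response.
from collections import Counter

import requests

URL = "http://ml-model-service.mlops/predict"  # in-mesh service DNS
PAYLOAD = {"instances": [[5.1, 3.5, 1.4, 0.2]]}

# 1. Pinned routing: the x-model-version header should always hit the v2 subset
pinned = requests.post(URL, json=PAYLOAD, headers={"x-model-version": "v2"}, timeout=10)
print("Pinned request served by:", pinned.json().get("model_version"))

# 2. Weighted routing: roughly a 90/10 split across v1/v2 for plain requests
versions = Counter()
for _ in range(100):
    resp = requests.post(URL, json=PAYLOAD, timeout=10)
    versions[resp.json().get("model_version")] += 1
print("Observed traffic split:", dict(versions))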

Monitoring and Observability

Monitoring ML models requires tracking both infrastructure and model-specific metrics. Here’s a Python example for exposing custom metrics:

from prometheus_client import Counter, Histogram, Gauge, start_http_server
import time

# Define metrics
prediction_counter = Counter('model_predictions_total', 
                             'Total predictions made',
                             ['model_version', 'status'])

prediction_latency = Histogram('model_prediction_latency_seconds',
                               'Prediction latency in seconds',
                               ['model_version'])

model_accuracy = Gauge('model_accuracy',
                       'Current model accuracy',
                       ['model_version'])

def predict(input_data, model_version='v1.0.0'):
    # `model` is assumed to be loaded at startup (e.g., unpickled from MODEL_PATH)
    start_time = time.time()

    try:
        # Your prediction logic here
        result = model.predict(input_data)
        
        # Record metrics
        prediction_counter.labels(
            model_version=model_version,
            status='success'
        ).inc()
        
        latency = time.time() - start_time
        prediction_latency.labels(
            model_version=model_version
        ).observe(latency)
        
        return result
        
    except Exception as e:
        prediction_counter.labels(
            model_version=model_version,
            status='error'
        ).inc()
        raise e

# Start the metrics server on port 8000, separate from the serving API on 8080
start_http_server(8000)

Deploy a Prometheus ServiceMonitor to scrape these metrics. Its selector matches the app: ml-model label on the Service, and the endpoint references the named metrics port (8000) defined on the Service earlier:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ml-model-metrics
  namespace: mlops
spec:
  selector:
    matchLabels:
      app: ml-model
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
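
Once Prometheus is scraping the endpoint, the same metrics can drive dashboards, alerts, or ad-hoc analysis. For example, here is a sketch that pulls p95 prediction latency per model version through the Prometheus HTTP API, assuming a Prometheus Operator installation reachable at the prometheus-operated service in the monitoring namespace:

# latency_report.py -- query model latency from Prometheus (sketch)
import requests

# Assumed in-cluster Prometheus URL; adjust to your installation
PROMETHEUS_URL = "http://prometheus-operated.monitoring:9090"

# p95 prediction latency per model version over the last 5 minutes
query = (
    "histogram_quantile(0.95, "
    "sum(rate(model_prediction_latency_seconds_bucket[5m])) by (le, model_version))"
)

resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    version = series["metric"].get("model_version", "unknown")
    p95_seconds = float(series["value"][1])
    print(f"{version}: p95 latency {p95_seconds * 1000:.1f} ms")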

Model Drift Detection

Implement automated model drift detection with a CronJob that runs every six hours:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: model-drift-checker
  namespace: mlops
spec:
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: drift-checker
            image: your-registry.io/drift-checker:latest
            env:
            - name: MODEL_ENDPOINT
              value: "http://ml-model-service"
            - name: DRIFT_THRESHOLD
              value: "0.15"
            - name: ALERT_WEBHOOK
              valueFrom:
                secretKeyRef:
                  name: alert-config
                  key: webhook-url
          restartPolicy: OnFailure
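
What the drift-checker image does is up to you; a common pattern is to compare recent production features against the training (reference) data with a statistical test. Below is a minimal sketch using a two-sample Kolmogorov-Smirnov test per feature, assuming both datasets are mounted as CSV files and reusing the DRIFT_THRESHOLD and ALERT_WEBHOOK environment variables from the CronJob:

# drift_check.py -- simple feature drift detector (sketch)
# Assumes reference (training) and recent production features are available as CSVs.
import os

import pandas as pd
import requests
from scipy.stats import ks_2samp

DRIFT_THRESHOLD = float(os.environ.get("DRIFT_THRESHOLD", "0.15"))
ALERT_WEBHOOK = os.environ.get("ALERT_WEBHOOK")

reference = pd.read_csv("/data/reference_features.csv")  # assumed mount path
recent = pd.read_csv("/data/recent_features.csv")        # assumed mount path

drifted = []
for column in reference.columns:
    # KS statistic ranges from 0 (identical distributions) to 1 (fully separated)
    statistic, _p_value = ks_2samp(reference[column], recent[column])
    if statistic > DRIFT_THRESHOLD:
        drifted.append((column, round(statistic, 3)))

if drifted and ALERT_WEBHOOK:
    requests.post(
        ALERT_WEBHOOK,
        json={"text": f"Model drift detected on features: {drifted}"},
        timeout=10,
    )

print("Drifted features:", drifted or "none")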

Best Practices and Troubleshooting

Best Practices

  • Immutable artifacts: Always version and tag your model artifacts and container images
  • Resource limits: Set appropriate CPU/memory limits to prevent resource exhaustion
  • Gradual rollouts: Use canary deployments for production model updates
  • Model registry: Implement a centralized model registry such as MLflow to track and promote model versions
  • Data versioning: Track training data versions alongside model versions (e.g., with DVC)
  • Automated testing: Include unit tests, integration tests, and model performance tests

Common Issues and Solutions

Issue: Pod OOMKilled errors during inference

# Check memory usage
kubectl top pod -n mlops

# Increase memory limits
kubectl set resources deployment ml-model-deployment \
  --limits=memory=8Gi -n mlops

Issue: Slow model loading times

Use an init container to pre-load model artifacts into a volume shared with the serving container:

initContainers:
- name: model-loader
  image: your-registry.io/model-loader:latest
  volumeMounts:
  - name: model-cache
    mountPath: /models
  command:
  - sh
  - -c
  - |
    aws s3 cp s3://your-bucket/models/model.pkl /models/

Issue: Pipeline failures due to dependency conflicts

# Use virtual environments in pipeline tasks
python -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r requirements.txt

GitOps for MLOps

Implement GitOps practices using ArgoCD for declarative ML deployments. The Application below assumes an ArgoCD AppProject named mlops and a Git repository containing your rendered Kubernetes manifests:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ml-model-app
  namespace: argocd
spec:
  project: mlops
  source:
    repoURL: https://github.com/your-org/ml-manifests.git
    targetRevision: main
    path: k8s/production
  destination:
    server: https://kubernetes.default.svc
    namespace: mlops
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true

Conclusion

Building a robust MLOps pipeline on Kubernetes requires careful consideration of infrastructure, automation, and monitoring. By implementing the patterns and practices outlined in this guide, you can create a production-ready ML deployment system that scales with your organization’s needs.

The key to successful MLOps is treating models as first-class citizens in your CI/CD pipeline, with proper versioning, testing, and monitoring at every stage. Start small with a single model deployment, then gradually expand your pipeline to handle multiple models, A/B testing, and advanced deployment strategies.

Remember that MLOps is an evolving discipline—continuously iterate on your pipelines, monitor performance metrics, and adapt to new tools and best practices as they emerge in the Kubernetes ecosystem.
