Machine Learning Operations (MLOps) has become the cornerstone of deploying and maintaining ML models at scale. As organizations increasingly adopt Kubernetes for orchestration, understanding how to build robust CI/CD pipelines for ML models has become essential. This comprehensive guide explores production-ready MLOps practices on Kubernetes, complete with practical examples and battle-tested configurations.
Why Kubernetes for MLOps?
Kubernetes provides the perfect foundation for MLOps workflows due to its scalability, resource management, and declarative configuration approach. Unlike traditional deployment methods, Kubernetes offers:
- Automated scaling: Handle variable inference loads efficiently (see the HorizontalPodAutoscaler sketch after this list)
- Resource isolation: GPU/CPU allocation for training and serving
- Version control: Seamless model versioning and rollbacks
- Multi-tenancy: Support multiple teams and experiments
- Cloud-agnostic: Deploy across any cloud provider or on-premises
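As a concrete illustration of the autoscaling point, here is a minimal HorizontalPodAutoscaler sketch for the model Deployment defined later in this guide. The 70% CPU target and 3-10 replica range are illustrative assumptions, and the cluster needs metrics-server installed:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-hpa
  namespace: mlops
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70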
Core Components of ML CI/CD Pipeline
A production-grade MLOps pipeline on Kubernetes consists of several interconnected components working in harmony. Let’s break down each component and implement them step-by-step.
1. Container Registry Setup
First, we need a container registry to store our ML model images. Here’s a Dockerfile for packaging an ML model as a serving container:
FROM python:3.9-slim

# curl is needed by the HEALTHCHECK below and is not included in the slim base image
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model artifacts
COPY models/ /app/models/
COPY src/ /app/src/

# Expose API port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost:8080/health || exit 1

# Run the model server
CMD ["python", "src/serve.py"]
Build and push the image:
docker build -t your-registry.io/ml-model:v1.0.0 .
docker push your-registry.io/ml-model:v1.0.0
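The Dockerfile assumes a model server at src/serve.py, which the guide does not show. Here is a minimal Flask-based sketch exposing the /health and /ready endpoints used by the probes in the next section plus a /predict route; Flask would need to be in requirements.txt, and the pickled scikit-learn-style model is an assumption:

# src/serve.py -- minimal sketch of the model server assumed by the Dockerfile
import os
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the model once at startup; MODEL_PATH is set via the Deployment env
MODEL_PATH = os.environ.get("MODEL_PATH", "/app/models/model.pkl")
with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)

@app.route("/health")
def health():
    # Liveness: the process is up and responding
    return jsonify(status="ok")

@app.route("/ready")
def ready():
    # Readiness: the model object has been loaded
    return jsonify(ready=model is not None)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"instances": [[...], [...]]}
    payload = request.get_json(force=True)
    predictions = model.predict(payload["instances"])
    return jsonify(predictions=predictions.tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)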
2. Model Serving with Kubernetes Deployment
Deploy your ML model using a Kubernetes Deployment with proper resource allocation and health checks:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
  namespace: mlops
  labels:
    app: ml-model
    version: v1.0.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
        version: v1.0.0
    spec:
      containers:
      - name: model-server
        image: your-registry.io/ml-model:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        env:
        - name: MODEL_PATH
          value: "/app/models/model.pkl"
        - name: LOG_LEVEL
          value: "INFO"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 5
        volumeMounts:
        - name: model-storage
          mountPath: /app/models
      volumes:
      - name: model-storage
        persistentVolumeClaim:
          claimName: model-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
  namespace: mlops
spec:
  selector:
    app: ml-model
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer
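The Deployment mounts its model artifacts from a PersistentVolumeClaim named model-pvc, which the guide does not define elsewhere; a minimal claim sketch (size and access mode are assumptions) would be:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
  namespace: mlops
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi

Note that with ReadWriteOnce all three replicas must schedule onto the same node; use a ReadWriteMany storage class, or bake the model into the image, if they may spread across nodes.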
3. Implementing CI/CD with Tekton Pipelines
Tekton provides Kubernetes-native CI/CD capabilities perfect for MLOps. Here’s a complete pipeline for model training, testing, and deployment:
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: ml-model-pipeline
  namespace: mlops
spec:
  params:
  - name: git-url
    type: string
    description: Git repository URL
  - name: git-revision
    type: string
    default: main
  - name: image-name
    type: string
    description: Target image name
  - name: model-version
    type: string
  workspaces:
  - name: shared-workspace
  - name: docker-credentials
  tasks:
  - name: fetch-repository
    taskRef:
      name: git-clone
    workspaces:
    - name: output
      workspace: shared-workspace
    params:
    - name: url
      value: $(params.git-url)
    - name: revision
      value: $(params.git-revision)
  - name: train-model
    taskRef:
      name: python-task
    runAfter:
    - fetch-repository
    workspaces:
    - name: source
      workspace: shared-workspace
    params:
    - name: script
      value: |
        pip install -r requirements.txt
        python train.py --output-dir ./models
  - name: test-model
    taskRef:
      name: python-task
    runAfter:
    - train-model
    workspaces:
    - name: source
      workspace: shared-workspace
    params:
    - name: script
      value: |
        python test.py --model-path ./models/model.pkl
  - name: build-push-image
    taskRef:
      name: kaniko
    runAfter:
    - test-model
    workspaces:
    - name: source
      workspace: shared-workspace
    - name: dockerconfig
      workspace: docker-credentials
    params:
    - name: IMAGE
      value: $(params.image-name):$(params.model-version)
  - name: deploy-model
    taskRef:
      name: kubernetes-actions
    runAfter:
    - build-push-image
    params:
    - name: script
      value: |
        kubectl set image deployment/ml-model-deployment \
          model-server=$(params.image-name):$(params.model-version) \
          -n mlops
        kubectl rollout status deployment/ml-model-deployment -n mlops
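The git-clone, kaniko, and kubernetes-actions tasks referenced above are commonly installed from Tekton Hub; python-task is a custom Task that this guide does not define. A minimal sketch, assuming only a script parameter and a source workspace as used in the pipeline, might be:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: python-task
  namespace: mlops
spec:
  params:
  - name: script
    type: string
    description: Shell commands to run inside a Python container
  workspaces:
  - name: source
    description: Checked-out repository
  steps:
  - name: run
    image: python:3.9-slim
    workingDir: $(workspaces.source.path)
    script: |
      #!/bin/sh
      set -e
      $(params.script)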
Create a PipelineRun to execute the pipeline:
kubectl create -f - <<EOF
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: ml-model-pipeline-run-
  namespace: mlops
spec:
  pipelineRef:
    name: ml-model-pipeline
  params:
  - name: git-url
    value: https://github.com/your-org/ml-model-repo.git
  - name: image-name
    value: your-registry.io/ml-model
  - name: model-version
    value: v1.0.1
  workspaces:
  - name: shared-workspace
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  - name: docker-credentials
    secret:
      secretName: docker-credentials
EOF
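If the Tekton CLI (tkn) is installed, you can follow the most recent run's logs while it executes:

tkn pipelinerun logs --last -f -n mlops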
Model Versioning and A/B Testing
Implementing canary deployments for ML models allows you to test new versions with minimal risk. Here’s an Istio configuration for traffic splitting:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ml-model-vs
  namespace: mlops
spec:
  hosts:
  - ml-model-service
  http:
  - match:
    - headers:
        x-model-version:
          exact: v2
    route:
    - destination:
        host: ml-model-service
        subset: v2
  - route:
    - destination:
        host: ml-model-service
        subset: v1
      weight: 90
    - destination:
        host: ml-model-service
        subset: v2
      weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: ml-model-dr
  namespace: mlops
spec:
  host: ml-model-service
  subsets:
  - name: v1
    labels:
      version: v1.0.0
  - name: v2
    labels:
      version: v1.0.1
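With this in place, regular traffic is split 90/10 between v1 and v2 (assuming a second Deployment labeled version: v1.0.1 is running behind the same Service), while requests carrying the routing header go straight to the canary. For example, from a pod inside the mesh, using the /predict route and payload shape from the serve.py sketch above:

# Explicitly target the v2 canary via the routing header
curl -sS http://ml-model-service.mlops.svc.cluster.local/predict \
  -H "x-model-version: v2" \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 3.0]]}'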
Monitoring and Observability
Monitoring ML models requires tracking both infrastructure and model-specific metrics. Here’s a Python example for exposing custom metrics:
from prometheus_client import Counter, Histogram, Gauge, start_http_server
import time

# Define metrics
prediction_counter = Counter('model_predictions_total',
                             'Total predictions made',
                             ['model_version', 'status'])
prediction_latency = Histogram('model_prediction_latency_seconds',
                               'Prediction latency in seconds',
                               ['model_version'])
model_accuracy = Gauge('model_accuracy',
                       'Current model accuracy',
                       ['model_version'])

def predict(input_data, model_version='v1.0.0'):
    # Assumes `model` has already been loaded at startup, as in the serving code above
    start_time = time.time()
    try:
        # Your prediction logic here
        result = model.predict(input_data)

        # Record metrics
        prediction_counter.labels(
            model_version=model_version,
            status='success'
        ).inc()
        latency = time.time() - start_time
        prediction_latency.labels(
            model_version=model_version
        ).observe(latency)

        return result
    except Exception:
        prediction_counter.labels(
            model_version=model_version,
            status='error'
        ).inc()
        raise

# Start the metrics server on a separate port from the prediction API
start_http_server(8000)
Deploy a Prometheus ServiceMonitor (from the Prometheus Operator) to scrape these metrics. It selects Services by label and scrapes a port named metrics, so the Service also needs an app: ml-model label and a named port pointing at the metrics server (port 8000 above); a revised Service sketch follows the ServiceMonitor:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ml-model-metrics
  namespace: mlops
spec:
  selector:
    matchLabels:
      app: ml-model
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
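A minimal revision of the earlier Service showing those additions (the label and port name only need to match the ServiceMonitor selector and endpoint):

apiVersion: v1
kind: Service
metadata:
  name: ml-model-service
  namespace: mlops
  labels:
    app: ml-model          # matched by the ServiceMonitor selector
spec:
  selector:
    app: ml-model
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080
  - name: metrics          # matched by the ServiceMonitor endpoint
    protocol: TCP
    port: 8000
    targetPort: 8000
  type: LoadBalancer

The Deployment's container spec should likewise list containerPort 8000 with name metrics alongside the existing 8080 port.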
Model Drift Detection
Implement automated model drift detection using a CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: model-drift-checker
  namespace: mlops
spec:
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: drift-checker
            image: your-registry.io/drift-checker:latest
            env:
            - name: MODEL_ENDPOINT
              value: "http://ml-model-service"
            - name: DRIFT_THRESHOLD
              value: "0.15"
            - name: ALERT_WEBHOOK
              valueFrom:
                secretKeyRef:
                  name: alert-config
                  key: webhook-url
          restartPolicy: OnFailure
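The drift-checker image itself is left to the reader. As a sketch of the kind of logic it might run, the script below computes a per-feature Population Stability Index (PSI) against a training-time reference sample and flags features whose score exceeds DRIFT_THRESHOLD; the stand-in data and loading details are assumptions:

# drift_check.py -- minimal PSI-based drift check sketch
import os
import numpy as np

DRIFT_THRESHOLD = float(os.environ.get("DRIFT_THRESHOLD", "0.15"))

def psi(reference, current, bins=10):
    """Population Stability Index between two 1-D samples."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid log(0) / division by zero for empty bins
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

def check_drift(reference_data, recent_data):
    """Return per-feature PSI scores and the subset above the threshold."""
    scores = {
        i: psi(reference_data[:, i], recent_data[:, i])
        for i in range(reference_data.shape[1])
    }
    drifted = {i: s for i, s in scores.items() if s > DRIFT_THRESHOLD}
    return scores, drifted

if __name__ == "__main__":
    # Stand-in data; a real job would pull these from the feature store or request logs
    rng = np.random.default_rng(42)
    reference = rng.normal(0, 1, size=(5000, 4))
    recent = rng.normal(0.3, 1.2, size=(1000, 4))  # deliberately shifted
    scores, drifted = check_drift(reference, recent)
    print("PSI per feature:", scores)
    if drifted:
        print("Drift detected on features:", sorted(drifted))
        # Here the job would POST an alert to ALERT_WEBHOOK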
Best Practices and Troubleshooting
Best Practices
- Immutable artifacts: Always version and tag your model artifacts and container images
- Resource limits: Set appropriate CPU/memory limits to prevent resource exhaustion
- Gradual rollouts: Use canary deployments for production model updates
- Model registry: Implement a centralized model registry like MLflow or DVC (an MLflow sketch follows this list)
- Data versioning: Track training data versions alongside model versions
- Automated testing: Include unit tests, integration tests, and model performance tests
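As one way to implement the model-registry practice, the sketch below logs and registers a trained model with MLflow; the tracking URI, experiment name, and toy training data are placeholders rather than part of this guide's pipeline:

# register_model.py -- minimal MLflow registration sketch
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical tracking server; replace with your own
mlflow.set_tracking_uri("http://mlflow.mlops.svc.cluster.local:5000")
mlflow.set_experiment("ml-model")

# Toy training data standing in for the real training step
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

with mlflow.start_run() as run:
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, artifact_path="model")
    # Register this run's model under a named entry in the registry
    mlflow.register_model(f"runs:/{run.info.run_id}/model", "ml-model")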
Common Issues and Solutions
Issue: Pod OOMKilled errors during inference
# Check memory usage
kubectl top pod -n mlops
# Increase memory limits
kubectl set resources deployment ml-model-deployment \
--limits=memory=8Gi -n mlops
Issue: Slow model loading times
Use init containers to pre-load models:
initContainers:
- name: model-loader
  image: your-registry.io/model-loader:latest
  volumeMounts:
  - name: model-cache
    mountPath: /models
  command:
  - sh
  - -c
  - |
    aws s3 cp s3://your-bucket/models/model.pkl /models/
Issue: Pipeline failures due to dependency conflicts
# Use virtual environments in pipeline tasks
python -m venv /tmp/venv
source /tmp/venv/bin/activate
pip install -r requirements.txt
GitOps for MLOps
Implement GitOps practices using ArgoCD for declarative ML deployments:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ml-model-app
  namespace: argocd
spec:
  project: mlops
  source:
    repoURL: https://github.com/your-org/ml-manifests.git
    targetRevision: main
    path: k8s/production
  destination:
    server: https://kubernetes.default.svc
    namespace: mlops
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
Conclusion
Building a robust MLOps pipeline on Kubernetes requires careful consideration of infrastructure, automation, and monitoring. By implementing the patterns and practices outlined in this guide, you can create a production-ready ML deployment system that scales with your organization’s needs.
The key to successful MLOps is treating models as first-class citizens in your CI/CD pipeline, with proper versioning, testing, and monitoring at every stage. Start small with a single model deployment, then gradually expand your pipeline to handle multiple models, A/B testing, and advanced deployment strategies.
Remember that MLOps is an evolving discipline—continuously iterate on your pipelines, monitor performance metrics, and adapt to new tools and best practices as they emerge in the Kubernetes ecosystem.