Comprehensive Kubeflow Tutorial for ML Pipelines
Kubeflow is no longer "nice-to-have": it has become one of the standard MLOps engines for running production ML on Kubernetes.
But most tutorials stop at “Hello World.”
We don’t.
In 45 minutes, you’ll:
- Deploy Kubeflow Pipelines v2 on Minikube (or your cloud cluster)
- Train an MNIST CNN with TensorFlow
- Export to KServe for real-time inference
- Automate everything with GitOps (ArgoCD)
Step 1: Set Up Local Kubernetes + Kubeflow
# 1. Start Minikube with enough resources
minikube start --cpus=4 --memory=8g --disk-size=20g
# 2. Optional: enable GPUs by adding --gpus=all to the same start command
#    (requires the docker driver and the NVIDIA container toolkit)
minikube start --cpus=4 --memory=8g --disk-size=20g --gpus=all
# 3. Install Kubeflow (lightweight manifest)
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=2.0.5"
kubectl wait --for=condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=2.0.5"
# Wait for all pods
kubectl wait --for=condition=available deployment/ml-pipeline -n kubeflow --timeout=300s
Verify: Access UI at http://localhost:8080 via port-forward:
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
Step 2: Write the ML Pipeline in Python (KFP SDK)
Create mnist_pipeline.py:
from kfp import compiler, dsl
from kfp.dsl import Dataset, Input, Model, Output

# 1. Data Download Component
@dsl.component(base_image="tensorflow/tensorflow:2.13.0")
def download_data(data: Output[Dataset]):
    import os
    import numpy as np
    import tensorflow as tf

    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    os.makedirs(data.path, exist_ok=True)
    np.save(f"{data.path}/x_train.npy", x_train)
    np.save(f"{data.path}/y_train.npy", y_train)

# 2. Train Model Component
@dsl.component(base_image="tensorflow/tensorflow:2.13.0")
def train_model(data: Input[Dataset], model: Output[Model]):
    import os
    import numpy as np
    import tensorflow as tf

    x = np.load(f"{data.path}/x_train.npy") / 255.0
    y = np.load(f"{data.path}/y_train.npy")
    x = x.reshape(-1, 28, 28, 1)
    net = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    net.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])
    net.fit(x, y, epochs=3, batch_size=128)
    os.makedirs(model.path, exist_ok=True)
    net.save(f"{model.path}/mnist_cnn")

# 3. Export to TF Serving (SavedModel in a numbered version directory)
@dsl.component(base_image="tensorflow/tensorflow:2.13.0")
def export_for_serving(model: Input[Model], serving_model: Output[Model]):
    import tensorflow as tf

    net = tf.keras.models.load_model(f"{model.path}/mnist_cnn")
    tf.saved_model.save(net, f"{serving_model.path}/1")

# 4. Pipeline Definition
@dsl.pipeline(name="mnist-training-pipeline", description="End-to-end MNIST with TF")
def mnist_pipeline():
    data_task = download_data()
    data_task.set_cpu_limit('1').set_memory_limit('2Gi')
    train_task = train_model(data=data_task.outputs['data'])
    train_task.set_cpu_limit('2').set_memory_limit('4Gi')
    export_for_serving(model=train_task.outputs['model'])

# Compile
if __name__ == "__main__":
    compiler.Compiler().compile(mnist_pipeline, 'mnist_pipeline.yaml')
Step 3: Compile & Upload Pipeline
# Install KFP SDK
pip install kfp==2.0.5
# Compile
python mnist_pipeline.py
# Upload with the KFP Python client (or use the UI, next line)
python -c "import kfp; kfp.Client(host='http://localhost:8080').upload_pipeline('mnist_pipeline.yaml', pipeline_name='MNIST Pipeline v1')"
UI: Go to Pipelines → Upload Pipeline → Select mnist_pipeline.yaml
Step 4: Run the Pipeline
- In Kubeflow UI → Pipelines → Click your pipeline
- Click Create Run
- Name: mnist-run-001 → Start
Watch live logs:
kubectl logs -n kubeflow -l workflows.argoproj.io/workflow=<workflow-name> -f
Expected Output:
Epoch 3/3
469/469 [==============================] - 12s 25ms/step - loss: 0.0452 - accuracy: 0.9861
Model saved to /tmp/mnist/serving/1
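Where does "469/469" come from? A quick sanity check of the batch arithmetic:

```python
import math

# MNIST has 60,000 training images; with batch_size=128,
# Keras runs ceil(60000 / 128) batches per epoch.
steps_per_epoch = math.ceil(60000 / 128)
print(steps_per_epoch)  # -> 469
```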
Step 5: Serve the Model with KServe (InferenceService)
Create kserve-inference.yaml:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: mnist-predictor
  namespace: kubeflow
spec:
  predictor:
    tensorflow:
      # The PVC must already contain the exported SavedModel (e.g. .../serving/1/)
      storageUri: pvc://mnist-models-pvc/mnist/serving/
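Why the pipeline exports to a numbered subdirectory: TF Serving treats each integer-named subdirectory under the model base path as a version and, by default, serves the highest one. A sketch of that selection rule (`latest_model_version` is a hypothetical helper for illustration, not part of TF Serving's API):

```python
def latest_model_version(subdirs):
    """Pick the version TF Serving would load: the largest
    integer-named subdirectory; non-numeric names are ignored."""
    versions = [int(d) for d in subdirs if d.isdigit()]
    if not versions:
        raise ValueError("no numeric version directories found")
    return max(versions)

print(latest_model_version(["1", "2", "assets"]))  # -> 2
```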
Apply:
kubectl apply -f kserve-inference.yaml
Wait:
kubectl get isvc mnist-predictor -n kubeflow -w
# Done when the READY column shows True
Step 6: Test Inference (cURL)
# Get ingress
INGRESS_HOST=$(minikube ip)
INGRESS_PORT=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
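The jsonpath filter above can be opaque; here is the same selection expressed in plain Python. The `service` dict is a trimmed, hypothetical version of the real istio-ingressgateway Service spec, with example port numbers:

```python
# From the Service's spec.ports list, select the nodePort of the entry
# named "http2" - exactly what the kubectl jsonpath expression does.
service = {
    "spec": {
        "ports": [
            {"name": "status-port", "nodePort": 30021},
            {"name": "http2", "nodePort": 31380},
            {"name": "https", "nodePort": 31390},
        ]
    }
}
http2_port = next(p["nodePort"] for p in service["spec"]["ports"]
                  if p["name"] == "http2")
print(http2_port)  # -> 31380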
# Sample input (placeholder - "instances" must hold a full 28x28x1 array per image)
cat > input.json <<EOF
{
  "instances": [[[[0.0]],[[0.0]],...]]
}
EOF
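Rather than hand-writing the truncated placeholder, you can generate a structurally valid request body. This sketch writes a blank image; a real digit needs actual pixel data, scaled to [0, 1] exactly as in training:

```python
import json

# TF Serving's REST predict API expects shape (batch, 28, 28, 1):
# a list of instances, each 28 rows x 28 cols x 1 channel.
image = [[[0.0] for _ in range(28)] for _ in range(28)]  # one blank 28x28x1 image
payload = {"instances": [image]}

with open("input.json", "w") as f:
    json.dump(payload, f)
```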
curl -v -H "Host: mnist-predictor.kubeflow.example.com" \
-H "Content-Type: application/json" \
"http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/mnist-predictor:predict" \
-d @input.json
Expected:
{"predictions": [[0.0001, 0.0000, ..., 0.9998]]} → Digit 7
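How to read that response: each inner list is the softmax output, i.e. probabilities for digits 0-9, and the predicted digit is the index of the largest value. The values below are illustrative, not actual model output:

```python
# Softmax probabilities for digits 0-9 (hypothetical example values).
probs = [0.0001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9998, 0.0001, 0.0]

# argmax without numpy: the index holding the maximum probability.
predicted_digit = max(range(len(probs)), key=lambda i: probs[i])
print(predicted_digit)  # -> 7
```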
Bonus: Automate with GitOps (ArgoCD)
# argocd-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kubeflow-mnist
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/kubetools/kubeflow-mnist.git
    targetRevision: HEAD
    path: pipeline/
  destination:
    server: https://kubernetes.default.svc
    namespace: kubeflow
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
Cleanup
minikube delete
# Or: kubectl delete ns kubeflow