Collabnix Team The Collabnix Team is a diverse collective of Docker, Kubernetes, and IoT experts united by a passion for cloud-native technologies. With backgrounds spanning DevOps, platform engineering, cloud architecture, and container orchestration, our contributors bring decades of combined experience from a wide range of industries and technical domains.

LLM Model Versioning: Best Practices and Tools for Production MLOps


As Large Language Models (LLMs) become increasingly critical to production systems, managing their versions, dependencies, and deployments has emerged as a significant challenge for DevOps and ML teams. Unlike traditional software versioning, LLM model versioning involves tracking massive binary files, complex training configurations, dataset versions, and evaluation metrics—all while maintaining reproducibility and compliance requirements.

In this comprehensive guide, we’ll explore battle-tested strategies for versioning LLMs in production environments, complete with practical implementations using industry-standard tools.

Why LLM Model Versioning Matters

Before diving into implementation details, let’s understand the unique challenges of LLM versioning:

  • Model Size: LLMs range from hundreds of megabytes to hundreds of gigabytes, making traditional Git-based versioning impractical
  • Training Reproducibility: Tracking hyperparameters, random seeds, and training data versions is essential for debugging and compliance
  • A/B Testing: Production systems often run multiple model versions simultaneously
  • Rollback Requirements: Quick rollback capabilities are critical when model performance degrades
  • Regulatory Compliance: Industries like healthcare and finance require complete model lineage tracking

Core Components of LLM Model Versioning

An effective LLM versioning strategy encompasses four key elements:

1. Model Artifacts

The actual model weights, tokenizer files, and configuration files that define your LLM.

2. Training Metadata

Hyperparameters, training duration, hardware specifications, and framework versions used during training.

3. Dataset Versions

Snapshots or references to the exact training and validation datasets used.

4. Evaluation Metrics

Performance benchmarks, test results, and comparative analyses against baseline models.

Best Practices for LLM Model Versioning

1. Implement Semantic Versioning for Models

Adopt a semantic versioning scheme that communicates model changes effectively:

  • Major version (v2.0.0): Architecture changes, different base models, or breaking API changes
  • Minor version (v1.3.0): Fine-tuning on new data, significant performance improvements
  • Patch version (v1.2.1): Bug fixes, minor optimizations, configuration updates
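
The scheme above can be made machine-checkable. The following is a minimal sketch (the `ModelVersion` class and its method names are illustrative, not from the article) that parses a version tag and flags a major-version bump as a breaking change:

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class ModelVersion:
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, tag: str) -> "ModelVersion":
        # Accept tags like "v1.2.1" or "1.2.1"
        major, minor, patch = tag.lstrip("v").split(".")
        return cls(int(major), int(minor), int(patch))

    def is_breaking_change_from(self, other: "ModelVersion") -> bool:
        # A major-version bump signals architecture or API changes
        return self.major != other.major
```

Because the dataclass is ordered, deployment tooling can also compare versions directly, e.g. `ModelVersion.parse("v1.3.0") > ModelVersion.parse("v1.2.1")`.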

2. Separate Model Storage from Code Repositories

Never store large model files directly in Git. Instead, use specialized model registries and reference them in your codebase:

# model_config.py
MODEL_REGISTRY = {
    "gpt-custom-v1.2.0": {
        "uri": "s3://models/gpt-custom/v1.2.0/model.safetensors",
        "tokenizer": "s3://models/gpt-custom/v1.2.0/tokenizer/",
        "config": "s3://models/gpt-custom/v1.2.0/config.json",
        "metadata": {
            "trained_date": "2024-01-15",
            "base_model": "llama-2-7b",
            "training_tokens": "50B",
            "perplexity": 12.3
        }
    }
}
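
Application code can then resolve models only through this mapping. A small helper like the hypothetical `resolve_model` below keeps lookup failures loud instead of silently falling back to some default model:

```python
def resolve_model(registry: dict, name: str) -> dict:
    """Return the registry entry for a pinned model version,
    failing loudly if the version was never registered."""
    if name not in registry:
        known = ", ".join(sorted(registry))
        raise KeyError(f"Unknown model '{name}'. Registered: {known}")
    return registry[name]
```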

3. Maintain Immutable Model Versions

Once a model version is published, treat it as immutable. Never overwrite existing versions—always create new versions for changes.
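
The publish-once rule is easy to express in code. This in-memory sketch is only illustrative; in practice the same guarantee is usually enforced at the storage layer (for example with object-store bucket policies that deny overwrites):

```python
class ImmutableModelRegistry:
    """Minimal sketch of publish-once semantics for model versions."""

    def __init__(self):
        self._versions = {}

    def publish(self, name: str, version: str, artifact_uri: str) -> None:
        key = (name, version)
        if key in self._versions:
            # Existing versions are immutable: changes require a new version
            raise ValueError(
                f"{name} {version} already published; bump the version instead"
            )
        self._versions[key] = artifact_uri

    def get(self, name: str, version: str) -> str:
        return self._versions[(name, version)]
```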

Essential Tools for LLM Model Versioning

DVC (Data Version Control)

DVC extends Git’s capabilities to handle large files and provides excellent integration with existing DevOps workflows.

Installation and Setup:

# Install DVC with S3 support
pip install 'dvc[s3]'

# Initialize DVC in your project
git init
dvc init

# Configure remote storage
dvc remote add -d models s3://my-company-models/llm-versions
dvc remote modify models region us-west-2

Tracking Your First LLM:

# Add model to DVC tracking
dvc add models/llama-fine-tuned-v1.0.0.safetensors

# Commit the .dvc file (small metadata file)
git add models/llama-fine-tuned-v1.0.0.safetensors.dvc .gitignore
git commit -m "Add LLM v1.0.0"

# Push model to remote storage
dvc push

# Push metadata to Git
git push

Creating a DVC Pipeline for Model Training:

# dvc.yaml
stages:
  prepare_data:
    cmd: python scripts/prepare_data.py
    deps:
      - scripts/prepare_data.py
      - data/raw/
    outs:
      - data/processed/train.jsonl
      - data/processed/val.jsonl
    params:
      - prepare.max_length
      - prepare.train_split

  train_model:
    cmd: python scripts/train.py
    deps:
      - scripts/train.py
      - data/processed/train.jsonl
      - data/processed/val.jsonl
    outs:
      - models/checkpoint/
    params:
      - train.learning_rate
      - train.batch_size
      - train.epochs
    metrics:
      - metrics/train_metrics.json:
          cache: false

  evaluate:
    cmd: python scripts/evaluate.py
    deps:
      - scripts/evaluate.py
      - models/checkpoint/
      - data/processed/val.jsonl
    metrics:
      - metrics/eval_metrics.json:
          cache: false

MLflow Model Registry

MLflow provides a centralized model registry with built-in versioning, stage transitions, and REST API access.

Setting Up MLflow Tracking:

import mlflow
import mlflow.transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Configure MLflow
mlflow.set_tracking_uri("http://mlflow-server:5000")
mlflow.set_experiment("llm-fine-tuning")

# Start training run
with mlflow.start_run(run_name="gpt-custom-v1.2.0") as run:
    # Log parameters
    mlflow.log_params({
        "base_model": "gpt2-medium",
        "learning_rate": 5e-5,
        "batch_size": 8,
        "epochs": 3,
        "max_length": 512
    })
    
    # Training code here...
    model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
    tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
    
    # Log metrics during training
    mlflow.log_metrics({
        "train_loss": 2.34,
        "val_loss": 2.56,
        "perplexity": 12.9
    })
    
    # Log model to registry
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
        registered_model_name="gpt-custom"
    )

Managing Model Stages:

from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote model to staging
client.transition_model_version_stage(
    name="gpt-custom",
    version=3,
    stage="Staging"
)

# After validation, promote to production
client.transition_model_version_stage(
    name="gpt-custom",
    version=3,
    stage="Production",
    archive_existing_versions=True
)
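
Once a version or stage is set, consumers load the model through MLflow's `models:/` URI scheme rather than hard-coded paths. A small helper makes that explicit (the loading call is commented out since it requires a running tracking server):

```python
def model_uri(name: str, stage_or_version) -> str:
    # MLflow accepts either a numeric version ("models:/gpt-custom/3")
    # or a stage name ("models:/gpt-custom/Production")
    return f"models:/{name}/{stage_or_version}"

# With a tracking server configured, the pinned model can then be loaded:
# import mlflow.pyfunc
# model = mlflow.pyfunc.load_model(model_uri("gpt-custom", "Production"))
```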

Hugging Face Model Hub

For teams using Hugging Face models, their Hub provides native versioning through Git-LFS with a user-friendly interface.

# Install Hugging Face CLI
pip install huggingface_hub

# Login to your account
huggingface-cli login

# Create a new model repository
huggingface-cli repo create my-fine-tuned-llm --type model

# Clone the repository
git clone https://huggingface.co/your-username/my-fine-tuned-llm
cd my-fine-tuned-llm

# Add your model files
cp /path/to/model/* .

# Commit and push (Git-LFS handles large files)
git add .
git commit -m "Add model v1.0.0"
git tag v1.0.0
git push --tags
git push

Kubernetes-Native Model Versioning

For production deployments on Kubernetes, implement model versioning through ConfigMaps and custom resources:

# model-version-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: llm-model-versions
  namespace: ml-production
data:
  current_version: "v1.2.0"
  canary_version: "v1.3.0"
  canary_traffic_percentage: "10"
  model_registry_url: "s3://models/gpt-custom"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference-stable
  namespace: ml-production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-inference
      version: stable
  template:
    metadata:
      labels:
        app: llm-inference
        version: stable
    spec:
      initContainers:
      - name: model-downloader
        image: amazon/aws-cli:latest
        command:
          - sh
          - -c
          - |
            aws s3 sync s3://models/gpt-custom/$(MODEL_VERSION)/ /models/
        env:
        - name: MODEL_VERSION
          valueFrom:
            configMapKeyRef:
              name: llm-model-versions
              key: current_version
        volumeMounts:
        - name: model-storage
          mountPath: /models
      containers:
      - name: inference-server
        image: your-registry/llm-inference:latest
        env:
        - name: MODEL_PATH
          value: "/models"
        - name: MODEL_VERSION
          valueFrom:
            configMapKeyRef:
              name: llm-model-versions
              key: current_version
        volumeMounts:
        - name: model-storage
          mountPath: /models
        resources:
          limits:
            nvidia.com/gpu: 1
            memory: "16Gi"
          requests:
            memory: "8Gi"
      volumes:
      - name: model-storage
        emptyDir:
          sizeLimit: 20Gi
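
The `canary_traffic_percentage` key above implies a traffic split between the stable and canary versions. In production that split usually lives in a service mesh or ingress controller, but the logic itself is simple; here is a minimal application-level sketch:

```python
import random

def pick_version(current: str, canary: str, canary_pct: int, rng=random) -> str:
    """Route a request to the canary version canary_pct percent of the time.
    Mirrors the ConfigMap keys current_version / canary_version /
    canary_traffic_percentage; a mesh or ingress normally does this split."""
    return canary if rng.random() * 100 < canary_pct else current
```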

Implementing Model Lineage Tracking

Create a comprehensive metadata file for each model version:

# model-lineage-v1.2.0.yaml
model_version: v1.2.0
model_name: gpt-custom
created_at: "2024-01-15T10:30:00Z"
created_by: ml-team@company.com

base_model:
  name: llama-2-7b
  version: v2.0
  source: meta-llama/Llama-2-7b-hf

training:
  dataset:
    name: custom-instructions-dataset
    version: v3.1.0
    size: 50000
    hash: sha256:a3f5b8c9d2e1f4a6b7c8d9e0f1a2b3c4
  
  hyperparameters:
    learning_rate: 5.0e-5
    batch_size: 8
    gradient_accumulation_steps: 4
    epochs: 3
    warmup_steps: 500
    max_seq_length: 2048
  
  infrastructure:
    gpu_type: A100-80GB
    num_gpus: 8
    training_framework: pytorch-2.1.0
    distributed_strategy: FSDP
  
  duration_hours: 72
  total_tokens_trained: 50000000000

evaluation:
  metrics:
    perplexity: 12.34
    bleu_score: 0.45
    rouge_l: 0.38
  
  benchmarks:
    mmlu_accuracy: 0.62
    hellaswag_accuracy: 0.71
    truthfulqa_accuracy: 0.48

artifacts:
  model_weights: s3://models/gpt-custom/v1.2.0/model.safetensors
  tokenizer: s3://models/gpt-custom/v1.2.0/tokenizer/
  config: s3://models/gpt-custom/v1.2.0/config.json
  
deployment:
  production_date: "2024-01-20T14:00:00Z"
  rollback_version: v1.1.0
  deployment_strategy: blue-green
  
approvals:
  - approver: lead-ml-engineer@company.com
    date: "2024-01-18T09:00:00Z"
  - approver: platform-lead@company.com
    date: "2024-01-19T11:30:00Z"
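
A lineage file is only useful if it is complete, so it is worth validating in CI before a version is registered. A minimal sketch (the required-key set is an assumption based on the example above, not a formal schema):

```python
REQUIRED_TOP_LEVEL = {"model_version", "model_name", "created_at",
                      "base_model", "training", "evaluation", "artifacts"}

def validate_lineage(doc: dict) -> list:
    """Return a list of missing or empty fields (empty list means valid)."""
    missing = sorted(REQUIRED_TOP_LEVEL - doc.keys())
    # Every artifact reference must be a non-empty URI
    for key, uri in doc.get("artifacts", {}).items():
        if not uri:
            missing.append(f"artifacts.{key}")
    return missing
```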

Troubleshooting Common Issues

Issue 1: DVC Remote Storage Sync Failures

Symptom: dvc push fails with authentication or network errors.

Solution:

# Verify AWS credentials
aws sts get-caller-identity

# Test S3 access
aws s3 ls s3://my-company-models/

# Check DVC remote configuration
dvc remote list
dvc remote modify models --local access_key_id YOUR_KEY
dvc remote modify models --local secret_access_key YOUR_SECRET

# Retry with verbose output
dvc push -v

Issue 2: Model Version Conflicts in Production

Symptom: Multiple pods loading different model versions unintentionally.

Solution: Implement pod labels and version checksums:

# model_loader.py
import hashlib
import os

def verify_model_version(model_path, expected_version):
    """Verify loaded model matches expected version"""
    version_file = os.path.join(model_path, "VERSION")
    
    if not os.path.exists(version_file):
        raise ValueError(f"Version file not found at {version_file}")
    
    with open(version_file, 'r') as f:
        loaded_version = f.read().strip()
    
    if loaded_version != expected_version:
        raise ValueError(
            f"Version mismatch: expected {expected_version}, "
            f"got {loaded_version}"
        )
    
    return True
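
The VERSION file check above can be complemented with the checksum part of the fix: hashing the actual weight files catches a corrupted or partially synced download that a version string alone would miss. A sketch that streams the file so multi-gigabyte weights never load into memory:

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 digest in 1 MiB chunks, suitable for large weights."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting digest can be compared against the `hash` recorded in the model's lineage metadata before the server starts accepting traffic.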

Issue 3: Large Model Download Timeouts

Solution: Implement retry logic and use persistent volumes:

#!/bin/bash
# download_model.sh -- the shebang must be the first line of the script
set -e

MAX_RETRIES=3
RETRY_DELAY=10
MODEL_URL=$1
DEST_PATH=$2

for i in $(seq 1 $MAX_RETRIES); do
    echo "Download attempt $i of $MAX_RETRIES"
    
    if aws s3 sync "$MODEL_URL" "$DEST_PATH" --no-progress; then
        echo "Download successful"
        exit 0
    else
        echo "Download failed, retrying in ${RETRY_DELAY}s..."
        sleep $RETRY_DELAY
    fi
done

echo "Download failed after $MAX_RETRIES attempts"
exit 1
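
For services that download models in-process rather than via an init script, the same pattern works in Python with exponential backoff. The `download` callable here is a placeholder for whatever transfer call you wrap (for example a boto3 sync helper):

```python
import time

def download_with_retry(download, max_retries=3, base_delay=10):
    """Call download() (a zero-arg callable), retrying on failure with
    exponential backoff: base_delay, 2*base_delay, 4*base_delay, ..."""
    for attempt in range(1, max_retries + 1):
        try:
            return download()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1))
```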

Advanced: Automated Model Version Promotion Pipeline

Implement a GitOps-style pipeline for automated model promotion:

# .github/workflows/model-promotion.yaml
name: Model Version Promotion

on:
  workflow_dispatch:
    inputs:
      model_version:
        description: 'Model version to promote'
        required: true
      target_stage:
        description: 'Target stage (staging/production)'
        required: true
        type: choice
        options:
          - staging
          - production

jobs:
  validate-and-promote:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      
      - name: Install dependencies
        run: |
          pip install mlflow boto3 pyyaml
      
      - name: Run validation tests
        run: |
          python scripts/validate_model.py \
            --version ${{ github.event.inputs.model_version }} \
            --min-accuracy 0.85
      
      - name: Promote model
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}
        run: |
          python scripts/promote_model.py \
            --version ${{ github.event.inputs.model_version }} \
            --stage ${{ github.event.inputs.target_stage }}
      
      - name: Update Kubernetes ConfigMap
        run: |
          kubectl patch configmap llm-model-versions \
            -n ml-production \
            -p '{"data":{"current_version":"${{ github.event.inputs.model_version }}"}}'
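
The workflow's validation step calls `scripts/validate_model.py`, which the article does not show. A hypothetical sketch of the promotion gate it might implement (metric names and the threshold rule are assumptions for illustration):

```python
def passes_gate(metrics: dict, min_accuracy: float = 0.85) -> bool:
    """Minimal promotion gate: every tracked accuracy metric must clear
    the threshold, and at least one accuracy metric must be present."""
    checked = [v for k, v in metrics.items() if k.endswith("_accuracy")]
    return bool(checked) and all(v >= min_accuracy for v in checked)
```

In the workflow, the script would exit non-zero when `passes_gate` returns False, causing the promotion job to fail before any stage transition happens.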

Key Takeaways

Effective LLM model versioning requires a multi-layered approach combining specialized tools, infrastructure automation, and rigorous processes:

  • Use DVC or MLflow for model artifact versioning and metadata tracking
  • Implement semantic versioning that communicates model changes clearly
  • Maintain comprehensive lineage documentation including training data, hyperparameters, and evaluation metrics
  • Leverage Kubernetes-native patterns for production deployment and rollback capabilities
  • Automate validation and promotion workflows to reduce human error
  • Never store large model files directly in Git repositories
  • Implement checksums and version verification in production systems

By following these practices and leveraging the right tools, you’ll build a robust MLOps pipeline that ensures reproducibility, enables rapid iteration, and maintains production stability for your LLM deployments.

Have Queries? Join https://launchpass.com/collabnix
