AI is rapidly transforming how we build software—but testing it? That’s still catching up.
If you’re building GenAI apps, you’ve probably asked:
“How do I test LLM responses in CI without relying on expensive APIs like OpenAI or SageMaker?”
In this post, I’ll show you how to run large language models locally in GitHub Actions using Ollama, powered entirely by open source tools like Docker, GitHub self-hosted runners, and NVIDIA GPU support. No cloud APIs, no billing surprises—just fast, reproducible testing with real models.
🧠 What is Ollama?
Ollama makes running open LLMs as easy as running containers. With a single command, you can pull and serve models like `llama2`, `mistral`, `codellama`, `gemma`, and more, with no need to configure Python environments or CUDA libraries yourself.
```bash
ollama pull llama2
ollama run llama2
```
Ollama includes a REST API and CLI for inference, and recently added support for structured outputs and embeddings.
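For example, assuming the Ollama server is running locally on its default port (11434), a JSON-mode generation and an embeddings request look roughly like this (the model names are just whatever you have pulled locally):

```bash
# Ask the generate endpoint for structured (JSON) output
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Return a JSON object with fields \"tool\" and \"purpose\" describing Ollama.",
  "format": "json",
  "stream": false
}'

# Request an embedding vector for a piece of text
curl -s http://localhost:11434/api/embeddings -d '{
  "model": "llama2",
  "prompt": "Self-hosted CI runners with GPU support"
}'
```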
🛠️ Running Ollama in CI with GPU Support
To run LLMs in your CI pipeline using open tools, you’ll need:
- 🖥️ A self-hosted GitHub Actions runner (Linux preferred)
- 🚀 NVIDIA GPU (consumer or datacenter-grade)
- 🔧 NVIDIA Container Toolkit for GPU passthrough
- 🐋 Docker (optional, if you want container isolation)
- 🧪 Ollama installed locally or inside a Docker container
✅ Setting Up a GitHub Self-Hosted Runner with GPU Support
- Install the NVIDIA drivers and verify the GPU is visible:

```bash
nvidia-smi
```

- Install the NVIDIA Container Toolkit and point Docker at the NVIDIA runtime:

```bash
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

- Register your GitHub self-hosted runner by following the official guide.
- Install Ollama locally (or run it in a Docker container with GPU access; see the sketch after this list):

```bash
curl -fsSL https://ollama.com/install.sh | sh
```
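If you'd rather keep things containerized, a common pattern is to run Ollama in Docker with GPU access instead of installing it on the host. A rough sketch (the CUDA image tag is an assumption; pick one that matches your installed driver):

```bash
# Optional sanity check: confirm GPU passthrough works inside containers
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi

# Run Ollama in a container, keeping pulled models in a named volume
docker run -d --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```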
⚙️ GitHub Actions Workflow Example
Here’s a sample `.github/workflows/ollama-e2e.yml`:
```yaml
name: Ollama CI Test

on:
  workflow_dispatch:

jobs:
  ollama-test:
    runs-on: [self-hosted, gpu]
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3

      - name: Install Ollama
        run: curl -fsSL https://ollama.com/install.sh | sudo sh

      - name: Start Ollama Server
        run: |
          ollama serve &
          sleep 5
          curl -s http://localhost:11434

      - name: Pull Llama2 model
        run: ollama pull llama2

      - name: Run inference via CLI
        run: ollama run llama2 "How do MicroVMs compare to Docker in CI/CD workflows?"

      - name: Run inference via REST API
        run: |
          curl -s http://localhost:11434/api/generate -d '{
            "model": "llama2",
            "stream": false,
            "prompt": "List risks of running privileged containers in CI pipelines"
          }' | jq
```
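As written, the inference steps only print output. To make the job behave like a real test, assert on the response and let a non-zero exit code fail the build. A minimal sketch using `jq` (the check and prompt are just examples):

```bash
# Fail the step if the model returns an empty response
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "stream": false,
  "prompt": "List risks of running privileged containers in CI pipelines"
}' | jq -e '.response | length > 0'
```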
🔄 Why This Matters
This approach gives you:
- ✅ Local control over the models you test
- 🔓 Vendor-free testing (no lock-in to cloud LLM providers)
- 💡 Customizable models with Ollama’s Modelfile (see the sketch after this list)
- 🧪 Reliable and repeatable GenAI test coverage
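If you haven’t used a Modelfile before, here is a rough sketch of what that customization can look like (the model name `ci-llama2`, the parameters, and the system prompt are all illustrative):

```bash
# Create a small Modelfile that pins behavior for CI runs
cat > Modelfile <<'EOF'
FROM llama2
PARAMETER temperature 0
PARAMETER seed 42
SYSTEM "You are a concise assistant used in automated tests. Answer in plain text."
EOF

# Build and run the customized model
ollama create ci-llama2 -f Modelfile
ollama run ci-llama2 "Summarize the benefits of self-hosted CI runners."
```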
🔚 Wrapping Up
Ollama brings a Docker-like UX to LLMs. Combined with open-source CI workflows and GPU-powered self-hosted runners, you can now run full LLM tests—locally or in CI—without relying on paid APIs.
This opens the door to reproducible, structured, and cost-effective GenAI development workflows, from dev to test to production.
Want to extend this setup with LangChain, RAG, or multi-agent orchestration tools? Let me know—I’d be happy to walk you through more advanced examples.