Don’t have a GPU? No problem! Docker Model Runner works perfectly fine on CPU, making it accessible for development, testing, and lightweight inference workloads.
Why CPU-Only?
- Development & Testing: Test AI models locally without expensive GPU hardware
- CI/CD Pipelines: Run model validation in standard build environments
- Edge Deployments: Deploy on CPU-only servers or edge devices
- Cost Efficiency: Utilize existing infrastructure without GPU investments
Prerequisites
- Linux system (Ubuntu/Debian or RPM-based)
- Docker Engine installed
- sudo access
Installation Steps
1. Install Docker Model Runner Plugin
For Ubuntu/Debian:
sudo apt-get update
sudo apt-get install docker-model-plugin
For RPM-based distributions (RHEL/Fedora/CentOS):
sudo dnf update
sudo dnf install docker-model-plugin
2. Verify Installation
docker model version
You should see output confirming the plugin version.
3. Run Your First Model
Let’s test with SmolLM2, a lightweight language model perfect for CPU inference:
docker model run ai/smollm2
The first run will download the model. Subsequent runs will be faster.
4. (Optional) Force CPU Backend
If you want to explicitly configure the CPU backend:
docker model install-runner --gpu none
Testing the Model
Once the model is running, you can interact with it:
docker model run ai/smollm2 "Explain Docker in simple terms"
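Beyond the CLI, Docker Model Runner also exposes an OpenAI-compatible REST API, so you can call the model from your own code. Here is a minimal sketch in Python using only the standard library; the base URL and port (12434) are assumptions about a default setup, so adjust them to match your configuration:

```python
# Minimal client for Docker Model Runner's OpenAI-compatible API.
# The base URL and port below are assumptions; adjust to your setup.
import json
import urllib.request

BASE_URL = "http://localhost:12434/engines/v1"  # assumed default endpoint


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str) -> str:
    """Send a chat completion request and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires the model runner to be up):
# reply = chat("ai/smollm2", "Explain Docker in simple terms")
```

Because the API follows the OpenAI chat-completions shape, existing OpenAI client libraries can usually be pointed at the same endpoint by overriding their base URL.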
Performance Considerations
- CPU inference is slower than GPU but sufficient for development and testing
- Smaller models like SmolLM2, Phi, or Qwen perform better on CPU
- Quantized models (4-bit, 8-bit) run faster with lower memory usage
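To take advantage of quantization, you can pull a quantized variant of a model by tag. The tag below is an example, not a guaranteed name; list the tags actually available (for instance with docker model ls --remote) before pulling:

```shell
# Pull and run a 4-bit quantized SmolLM2 variant.
# The tag name is illustrative -- check the available tags first.
docker model pull ai/smollm2:360M-Q4_K_M
docker model run ai/smollm2:360M-Q4_K_M "Summarize what quantization does"
```

Smaller quantized variants trade a little output quality for substantially lower memory use and faster CPU inference.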
What’s Next?
- Explore available models: docker model ls --remote
- Try different model sizes based on your CPU capabilities
- Integrate into your development workflow
- Scale to GPU when you need production performance
Docker Model Runner’s CPU support democratizes AI development—no expensive hardware required to get started!
Resources: