Docker Model Runner allows you to run AI models locally through Docker Desktop. Here’s a breakdown of the available models and their recommended use cases:

1. ai/smollm2
   - Parameters: 361.82M (very small)
   - Quantization: IQ2_XXS/Q4_K_M
   - Architecture: Llama
   - Size: ~256 MB
   - Best for:
     - Development and prototyping
     - Devices with limited resources
     - Quick-response applications
     - Customer support assistants
     - Local testing without a GPU
     - Low-latency applications
     - Entry-level AI experimentation
2. ai/llama3.2
   - Parameters: multiple variants (1B and 3B; the 11B and 90B Llama 3.2 releases are vision models)
   - Architecture: Llama
   - Size: ~2 GB (3B variant, quantized)
   - Best for:
     - General-purpose text generation
     - Efficient inference with fewer parameters
     - Applications requiring quick responses
     - Chatbots and conversational agents
     - Content summarization
     - When speed and efficiency are priorities
3. ai/llama3.3
   - Parameters: 70B
   - Architecture: Llama
   - Size: 42 GB
   - Best for:
     - Complex text generation tasks
     - Advanced conversational agents
     - Content creation and summarization
     - Applications requiring deeper understanding
     - When quality is more important than speed
     - Projects with sufficient hardware resources
4. ai/gemma3
   - Also available as ai/gemma3-qat (quantization-aware trained)
   - Architecture: Gemma (built on the same research as Gemini)
   - Size: 2.48 GB
   - Best for:
     - Academic and research applications
     - Cost-effective inference (low memory and compute footprint)
     - Applications needing strong reasoning capabilities
     - When privacy and data security are critical
     - Scenarios requiring offline processing
5. ai/phi4
   - Parameters: 14B
   - Size: 9.04 GB
   - Best for:
     - Applications requiring reasoning capabilities
     - Scientific and technical content generation
     - When a mid-range model size is preferred
     - Educational applications
     - Complex instruction following
6. ai/mistral and ai/mistral-nemo
   - Size: 4.37 GB (ai/mistral, 7B; ai/mistral-nemo, 12B, is larger)
   - Best for:
     - Multilingual applications
     - Content moderation
     - Enterprise applications
     - When code generation capabilities are needed
     - Tasks requiring strong language understanding
7. ai/qwen2.5
   - Size: 4.43 GB
   - Best for:
     - Cost-effective solutions (among the more resource-efficient mid-size models)
     - Faster inference thanks to grouped-query attention (GQA)
     - Applications that need to handle longer sequences
     - Multilingual applications
     - Code generation
8. ai/deepseek-r1-distill-llama
   - Size: 4.92 GB
   - Note: a Llama model distilled from DeepSeek-R1 (fine-tuned on samples generated by DeepSeek-R1)
   - Best for:
     - Specialized applications requiring distilled reasoning capabilities
     - When you need a balance between performance and model size
     - Research prototyping
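You can check these specifications on your own machine: once a model has been pulled, docker model list reports each local model's parameters, quantization, architecture, and size, the same fields listed above:

docker model list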
How to Choose a Model

- Consider your hardware constraints:
  - For devices with limited resources: ai/smollm2
  - For mid-range hardware: ai/llama3.2, ai/gemma3, ai/phi4
  - For powerful machines: ai/llama3.3, ai/deepseek-r1-distill-llama
- Consider your application needs:
  - For general text generation: ai/llama3.2 or ai/llama3.3
  - For code generation: ai/qwen2.5 or ai/mistral
  - For reasoning tasks: ai/gemma3 or ai/phi4
  - For prototyping and testing: ai/smollm2
- Consider privacy and security requirements:
  - All models run locally, providing enhanced privacy
  - Larger models may offer more robust outputs for sensitive applications
- Consider development workflow integration:
  - Use Docker Compose to integrate models with your applications (see the sketch after this list)
  - Leverage the OpenAI-compatible API for easy integration
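As a sketch of the Compose integration mentioned above: recent Docker Compose releases support a top-level models element that wires a Model Runner model into a service. The service name, image, and environment variable names below are placeholders, and the exact schema may vary with your Compose version:

services:
  app:
    image: my-app:latest          # placeholder: your application image
    models:
      smollm:
        endpoint_var: MODEL_URL   # Compose injects the model's OpenAI-compatible endpoint here
        model_var: MODEL_NAME     # Compose injects the model identifier here

models:
  smollm:
    model: ai/smollm2

Your application then reads MODEL_URL and MODEL_NAME from its environment and calls the endpoint with any OpenAI-compatible client.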
Getting Started Example

Pull a small model for testing:
docker model pull ai/smollm2
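If this or the following steps fail, confirm that Docker Model Runner is enabled in Docker Desktop and currently running:

docker model status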
Run the model with a prompt:
docker model run ai/smollm2 "What is Docker?"
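If you omit the prompt, docker model run starts an interactive chat session instead, which is handy for quick manual testing (enter /bye to exit):

docker model run ai/smollm2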
For integration with applications, use the OpenAI-compatible API (the model-runner.docker.internal hostname resolves from inside containers):
curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Docker?"}
    ]
  }'
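To call the API from the host instead of from a container, enable TCP host access (the documented default port is 12434) and target localhost; a sketch, assuming a default Docker Desktop setup:

docker desktop enable model-runner --tcp 12434

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "What is Docker?"}]
  }'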
This guide should help you select the appropriate Docker AI model for your specific use case and hardware constraints.