Docker Model Runner allows you to run AI models locally through Docker Desktop. Here’s a breakdown of the available models and their recommended use cases:

1. ai/smollm2
   - Parameters: 361.82M (very small)
   - Quantization: IQ2_XXS/Q4_K_M
   - Architecture: Llama
   - Size: ~256 MB
   - Best for:
     - Development and prototyping
     - Devices with limited resources
     - Quick-response applications
     - Customer support assistants
     - Local testing without a GPU
     - Low-latency applications
     - Entry-level AI experimentation
2. ai/llama3.2
   - Parameters: multiple variants (1B and 3B; the 11B and 90B Llama 3.2 releases are vision models)
   - Architecture: Llama
   - Size: ~2 GB (3B variant, quantized)
   - Best for:
     - General-purpose text generation
     - Efficient inference with fewer parameters
     - Applications requiring quick responses
     - Chatbots and conversational agents
     - Content summarization
     - When speed and efficiency are priorities
3. ai/llama3.3
   - Parameters: 70B
   - Architecture: Llama
   - Size: 42 GB
   - Best for:
     - Complex text generation tasks
     - Advanced conversational agents
     - Content creation and summarization
     - Applications requiring deeper understanding
     - When quality is more important than speed
     - Projects with sufficient hardware resources
4. ai/gemma3
   - Also available as ai/gemma3-qat (quantization-aware trained)
   - Architecture: Gemma (built on the same research as Gemini)
   - Size: 2.48 GB
   - Best for:
     - Academic and research applications
     - Cost-effective inference (low memory and compute footprint)
     - Applications needing strong reasoning capabilities
     - When privacy and data security are critical
     - Scenarios requiring offline processing
5. ai/phi4
   - Parameters: 14B
   - Size: 9.04 GB
   - Best for:
     - Applications requiring reasoning capabilities
     - Scientific and technical content generation
     - When a mid-range model size is preferred
     - Educational applications
     - Complex instruction following
6. ai/mistral and ai/mistral-nemo
   - Size: 4.37 GB (ai/mistral, 7B; ai/mistral-nemo, 12B, is larger)
   - Best for:
     - Multilingual applications
     - Content moderation
     - Enterprise applications
     - When code generation capabilities are needed
     - Tasks requiring strong language understanding
7. ai/qwen2.5
   - Size: 4.43 GB
   - Best for:
     - Cost-effective solutions (among the more resource-efficient mid-size models)
     - Faster inference thanks to grouped-query attention (GQA)
     - Applications that need to handle longer sequences
     - Multilingual applications
     - Code generation
8. ai/deepseek-r1-distill-llama
   - Size: 4.92 GB
   - Note: a Llama model distilled from DeepSeek-R1 (fine-tuned on samples generated by DeepSeek-R1)
   - Best for:
     - Specialized applications requiring distilled reasoning capabilities
     - When you need a balance between performance and model size
     - Research prototyping
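You can check these specifications on your own machine: once a model has been pulled, docker model list reports each local model's parameters, quantization, architecture, and size, the same fields listed above:

docker model list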
How to Choose a Model

- Consider your hardware constraints:
  - For devices with limited resources: ai/smollm2
  - For mid-range hardware: ai/llama3.2, ai/gemma3, ai/phi4
  - For powerful machines: ai/llama3.3, ai/deepseek-r1-distill-llama
- Consider your application needs:
  - For general text generation: ai/llama3.2 or ai/llama3.3
  - For code generation: ai/qwen2.5 or ai/mistral
  - For reasoning tasks: ai/gemma3 or ai/phi4
  - For prototyping and testing: ai/smollm2
- Consider privacy and security requirements:
  - All models run locally, providing enhanced privacy
  - Larger models may offer more robust outputs for sensitive applications
- Consider development workflow integration:
  - Use Docker Compose to integrate models with your applications (see the sketch after this list)
  - Leverage the OpenAI-compatible API for easy integration
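As a sketch of the Compose integration mentioned above: recent Docker Compose releases support a top-level models element that wires a Model Runner model into a service. The service name, image, and environment variable names below are placeholders, and the exact schema may vary with your Compose version:

services:
  app:
    image: my-app:latest          # placeholder: your application image
    models:
      smollm:
        endpoint_var: MODEL_URL   # Compose injects the model's OpenAI-compatible endpoint here
        model_var: MODEL_NAME     # Compose injects the model identifier here

models:
  smollm:
    model: ai/smollm2

Your application then reads MODEL_URL and MODEL_NAME from its environment and calls the endpoint with any OpenAI-compatible client.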
Getting Started Example

Pull a small model for testing:
docker model pull ai/smollm2
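If this or the following steps fail, confirm that Docker Model Runner is enabled in Docker Desktop and currently running:

docker model status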
Run the model with a prompt:
docker model run ai/smollm2 "What is Docker?"
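If you omit the prompt, docker model run starts an interactive chat session instead, which is handy for quick manual testing (enter /bye to exit):

docker model run ai/smollm2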
For integration with applications, use the OpenAI-compatible API (the model-runner.docker.internal hostname resolves from inside containers):
curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Docker?"}
    ]
  }'
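To call the API from the host instead of from a container, enable TCP host access (the documented default port is 12434) and target localhost; a sketch, assuming a default Docker Desktop setup:

docker desktop enable model-runner --tcp 12434

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "What is Docker?"}]
  }'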
This guide should help you select the appropriate Docker AI model for your specific use case and hardware constraints.