Which Model to Choose with Docker Model Runner?

Docker Model Runner allows you to run AI models locally through Docker Desktop. Here’s a breakdown of the available models and their recommended use cases:
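
If Model Runner is not already enabled, you can switch it on in Docker Desktop's settings (under the Model Runner / Features in development section) or, in recent Docker Desktop releases, from the Desktop CLI. A minimal sketch based on Docker's documented command (flags may vary by version):

docker desktop enable model-runner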

1. ai/smollm2

  • Parameters: 361.82M (very small)
  • Quantization: IQ2_XXS/Q4_K_M
  • Architecture: Llama
  • Size: ~256MB
  • Best for:
    • Development and prototyping
    • Devices with limited resources
    • Quick response applications
    • Customer support assistants
    • Local testing without GPU
    • Low-latency applications
    • Entry-level AI experimentation

2. ai/llama3.2

  • Parameters: multiple variants (1B, 3B, and 11B)
  • Architecture: Llama
  • Size: 2 GB
  • Best for:
    • General-purpose text generation
    • Efficient inference with fewer parameters
    • Applications requiring quick responses
    • Chatbots and conversational agents
    • Content summarization
    • When speed and efficiency are priorities

3. ai/llama3.3

  • Parameters: 70B
  • Architecture: Llama
  • Size: 42 GB
  • Best for:
    • Complex text generation tasks
    • Advanced conversational agents
    • Content creation and summarization
    • Applications requiring deeper understanding
    • When quality is more important than speed
    • Projects with sufficient hardware resources

4. ai/gemma3

  • Also available as ai/gemma3-qat (a quantization-aware-trained variant)
  • Architecture: Gemma (built on Google’s Gemini research)
  • Size: 2.48 GB
  • Best for:
    • Academic and research applications
    • Cost-effective inference (one of the cheaper models)
    • Applications needing strong reasoning capabilities
    • When privacy and data security are critical
    • Scenarios requiring offline processing

5. ai/phi4

  • Size: 9.04 GB
  • Best for:
    • Applications requiring reasoning capabilities
    • Scientific and technical content generation
    • When mid-range model size is preferred (14B parameters)
    • Educational applications
    • Complex instruction following

6. ai/mistral and ai/mistral-nemo

  • Size: 4.37 GB
  • Best for:
    • Multilingual applications
    • Content moderation
    • Enterprise applications
    • When code generation capabilities are needed
    • Tasks requiring strong language understanding

7. ai/qwen2.5

  • Size: 4.43 GB
  • Best for:
    • Cost-effective solutions (among cheaper models)
    • When faster inference with GQA (grouped-query attention) is needed
    • Applications needing to handle longer sequences
    • Multilingual applications
    • Code generation

8. ai/deepseek-r1-distill-llama

  • Size: 4.92 GB
  • Note: This is a Llama model distilled from DeepSeek-R1, fine-tuned on its reasoning outputs
  • Best for:
    • Specialized applications requiring distilled knowledge
    • When needing a balance between performance and model size
    • Research prototyping
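
Before pulling anything large, it can help to see what is already on disk and confirm the runner is active. A quick sketch using the Model Runner CLI:

# List models already pulled locally, with their sizes
docker model list

# Check that the Model Runner backend is running
docker model status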

How to Choose a Model

  1. Consider your hardware constraints:
    • For devices with limited resources: ai/smollm2
    • For mid-range hardware: ai/llama3.2, ai/gemma3, ai/phi4
    • For powerful machines: ai/llama3.3, ai/deepseek-r1-distill-llama
  2. Consider your application needs:
    • For general text generation: ai/llama3.2 or ai/llama3.3
    • For code generation: ai/qwen2.5 or ai/mistral
    • For reasoning tasks: ai/gemma3 or ai/phi4
    • For prototyping and testing: ai/smollm2
  3. Consider privacy and security requirements:
    • All models run locally, providing enhanced privacy
    • Larger models may offer more robust outputs for sensitive applications
  4. Development workflow integration:
    • Use Docker Compose to integrate models with your applications (see the sketch after this list)
    • Leverage the OpenAI-compatible API for easy integration
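
For the Compose integration mentioned in point 4, recent Compose versions support a top-level models element that attaches a Model Runner model to a service. A minimal sketch, assuming Docker Desktop 4.41+ with Model Runner enabled; the service name app and its image are placeholders:

services:
  app:
    image: my-app:latest   # placeholder for your application image
    models:
      - smollm2            # short syntax: endpoint/model details are injected as env vars

models:
  smollm2:
    model: ai/smollm2      # model reference from Docker Hub's ai/ namespace

With the short syntax, Compose injects the model’s endpoint URL and model name into the service as environment variables derived from the model key, so your application can reach the OpenAI-compatible endpoint without hard-coding an address.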

Getting Started Example

Pull a small model for testing:

docker model pull ai/smollm2

Run the model with a prompt:

docker model run ai/smollm2 "What is Docker?"
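
In current Docker Desktop releases you can also omit the prompt to start an interactive chat session with the model:

docker model run ai/smollm2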

For integration with applications, use the OpenAI-compatible API:

curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Docker?"}
    ]
  }'
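
Note that the model-runner.docker.internal hostname resolves from inside containers. To call the API from the host instead, first enable Model Runner’s TCP port (12434 is the default documented by Docker) and target localhost, for example:

docker desktop enable model-runner --tcp 12434

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "What is Docker?"}]
  }'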

This guide should help you select the appropriate Docker AI model for your specific use case and hardware constraints.
