🚀 100K+ Downloads | Latest Alibaba AI Model | Docker Hub: ai/qwen3
What is Qwen3?

Qwen3 is Alibaba's third-generation large language model family, built around a Mixture-of-Experts (MoE) architecture. Unlike dense models, which activate every parameter on each forward pass, Qwen3's MoE variants activate only 3-22 billion of up to 235 billion total parameters, delivering large-model quality at a fraction of the inference cost.
Key Technical Specifications
# Architecture Overview
Total Models: 8 variants (0.6B → 235B parameters)
Languages: 119 (Indo-European, Sino-Tibetan, Afro-Asiatic, etc.)
Context Window: 41K tokens (expandable to 262K)
License: Apache 2.0 (fully open source)
Provider: Alibaba Cloud
Architecture: qwen3 (custom transformer with MoE)
Training Cutoff: April 2025
What Makes Qwen3 Different
# Traditional AI Model
model_size = "70B parameters"
active_params = "70B parameters" # 100% activation
inference_cost = "$$"
memory_required = "140GB VRAM"
# Qwen3 MoE Model
model_size = "235B parameters"
active_params = "22B parameters" # Only 9.4% activation
inference_cost = "$" # 90% cost reduction
memory_required = "56GB VRAM" # 60% memory reduction
performance = "Superior to 70B models"
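The comparison above can be checked with quick arithmetic (a sketch using only the figures quoted above; the ratios are illustrative, not benchmarks):

```python
# Back-of-the-envelope comparison of dense vs. MoE activation,
# using the numbers from the comparison above.
dense_total = 70e9   # dense model: all 70B parameters active per token
moe_total = 235e9    # Qwen3 235B-A22B: total parameters
moe_active = 22e9    # ...but only 22B active per token

print(f"MoE activation ratio: {moe_active / moe_total:.1%}")      # ~9.4%
print(f"Active params vs. 70B dense: {moe_active / dense_total:.0%}")  # ~31%
```

So per token, the 235B MoE model does less arithmetic than a 70B dense model, which is where the cost and memory savings come from.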
What is Docker Model Runner?
- Docker Model Runner (DMR) is a tool that makes it easy to manage, run, and deploy AI models using Docker.
- It allows developers to pull, run, and serve large language models (LLMs) and other AI models directly from Docker Hub or any OCI-compliant registry.
- DMR integrates with Docker Desktop and Docker Engine, enabling you to serve models via OpenAI-compatible APIs, interact with models from the command line, and manage them through a graphical interface.
- Models are cached locally after the first pull and are loaded into memory only at runtime to optimize resource usage.
- DMR supports both command-line and API-based interactions, making it suitable for building generative AI applications, experimenting with ML workflows, or integrating AI into software development pipelines.
Quick Start
# Pull latest model (8B-Q4_K_M)
docker model pull ai/qwen3
All Available Models
# Lightweight models (< 2GB)
docker model pull ai/qwen3:0.6B-Q4_0 # 442MB
docker model pull ai/qwen3:0.6B-Q4_K_M # 456MB
docker model pull ai/qwen3:0.6B-F16 # 1.40GB
# Standard models
docker model pull ai/qwen3:8B-Q4_0 # 4.44GB
docker model pull ai/qwen3:8B-Q4_K_M # 4.68GB (recommended)
docker model pull ai/qwen3:8B-F16 # 15.26GB
docker model pull ai/qwen3:14B-Q6_K # 11.28GB
# MoE models
docker model pull ai/qwen3:30B-A3B-Q4_K_M # 17.28GB
docker model pull ai/qwen3:30B-A3B-F16 # 56.89GB
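To help choose a tag, the download sizes above can be turned into a small lookup (a hypothetical helper, `largest_fitting`, using only the sizes listed above; note that actual runtime memory needs exceed the download size):

```python
# Download sizes (GB) for the ai/qwen3 tags listed above.
MODELS_GB = {
    "0.6B-Q4_0": 0.44,
    "0.6B-Q4_K_M": 0.46,
    "0.6B-F16": 1.40,
    "8B-Q4_0": 4.44,
    "8B-Q4_K_M": 4.68,
    "14B-Q6_K": 11.28,
    "8B-F16": 15.26,
    "30B-A3B-Q4_K_M": 17.28,
    "30B-A3B-F16": 56.89,
}

def largest_fitting(budget_gb: float):
    """Return the largest variant whose download size fits the budget, or None."""
    fitting = [(size, tag) for tag, size in MODELS_GB.items() if size <= budget_gb]
    return max(fitting)[1] if fitting else None
```

For example, with roughly 8 GB to spare, `largest_fitting(8.0)` picks `8B-Q4_K_M`, the recommended default.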
Basic Usage Examples
1. Simple Chat Interface
docker model run ai/qwen3:8B-Q4_K_M
> Hello, can you help me write Python code?
Assistant: I'd be happy to help you with Python code! What specific task or problem would you like me to help you with?
> Create a function to calculate fibonacci numbers
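A reply to that last prompt might look something like the following (illustrative output, not a captured transcript):

```python
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number (0-indexed), computed iteratively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```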
2. Docker Model Runner REST API
Once Docker Model Runner is enabled, you can interact with your models programmatically using OpenAI-compatible REST API endpoints. Here’s an example of how to send a chat completion request from the host using curl:
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/qwen3:8B-Q4_K_M",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Please write 500 words about the fall of Rome."
      }
    ]
  }'
This request sends a prompt to the model and returns a response, just as the OpenAI API would. The base URL and port may vary depending on your setup (e.g., localhost:12434 for host access with TCP enabled).
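The same request can be sent from Python using only the standard library (a minimal sketch; `build_chat_request` and `chat` are hypothetical helpers, and the URL assumes host access with TCP enabled on the default port 12434):

```python
import json
import urllib.request

# OpenAI-compatible chat endpoint exposed by Docker Model Runner
# (host access with TCP enabled on the default port).
DMR_CHAT_URL = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"

def build_chat_request(model: str, system: str, user: str) -> dict:
    """Build the JSON body for a chat completion request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

def chat(model: str, system: str, user: str) -> str:
    """POST the request and return the assistant's reply text."""
    body = json.dumps(build_chat_request(model, system, user)).encode()
    req = urllib.request.Request(
        DMR_CHAT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage: `chat("ai/qwen3:8B-Q4_K_M", "You are a helpful assistant.", "Hello!")`. Because the API is OpenAI-compatible, any OpenAI client library pointed at this base URL should also work.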
Thinking Mode Examples
Enable Thinking Mode
docker model run ai/qwen3:8B-Q4_K_M
> /think Solve this step by step: What is 127 × 43?
To find the product of 127 and 43, I'll break 43 into 40 + 3 and apply the distributive property:
1. Multiply 127 by 40:
127 × 40 = 5080
2. Multiply 127 by 3:
127 × 3 = 381
3. Add the results of steps 1 and 2:
5080 + 381 = 5461
The product of 127 and 43 is 5,461.
> /no_think What is 2+2?
4
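When calling the model over the API rather than interactively, the same soft switches can be applied by prefixing the user message (a minimal sketch; `with_thinking` is a hypothetical helper):

```python
def with_thinking(prompt: str, think: bool = True) -> str:
    """Prefix a prompt with Qwen3's /think or /no_think soft switch.

    Qwen3 toggles its step-by-step reasoning trace based on these
    directives placed in the user message.
    """
    switch = "/think" if think else "/no_think"
    return f"{switch} {prompt}"
```

For example, `with_thinking("What is 2+2?", think=False)` produces `"/no_think What is 2+2?"`, which can then be sent as the user message in a chat completion request.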
Determine the base URL
The base URL to interact with the endpoints depends on how you run Docker:
- From containers: http://model-runner.docker.internal/
- From host processes: http://localhost:12434/ (assuming TCP host access is enabled on the default port, 12434).
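When the same code may run both inside a container and on the host, the base URL can be chosen at runtime (a heuristic sketch; checking for /.dockerenv is a common but unofficial way to detect a Docker container):

```python
import os

DMR_CONTAINER_URL = "http://model-runner.docker.internal/"
DMR_HOST_URL = "http://localhost:12434/"

def dmr_base_url() -> str:
    """Pick the DMR base URL depending on where this process runs.

    Docker creates /.dockerenv inside containers, so its presence
    suggests we should use the internal DNS name; otherwise fall
    back to the host URL (TCP host access must be enabled).
    """
    return DMR_CONTAINER_URL if os.path.exists("/.dockerenv") else DMR_HOST_URL
```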