Join our Discord Server

Qwen 3 AI Model

🚀 100K+ Downloads | Latest Alibaba AI Model | Docker Hub: ai/qwen3

What is Qwen3?

Qwen3 is Alibaba’s third-generation large language model family, built around a Mixture-of-Experts (MoE) architecture. Unlike dense models, which activate all of their parameters for every token, Qwen3’s MoE variants activate only 3–22 billion parameters per token out of up to 235 billion total parameters, delivering large-model quality at a fraction of the inference cost.
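The MoE idea behind those numbers can be illustrated with a toy router: a gating network scores every expert, but only the top-k experts actually run for a given token. The expert count, scores, and "experts" below are purely illustrative, not Qwen3's real configuration:

```python
import math

def top_k_gate(scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

def moe_layer(token, experts, scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = top_k_gate(scores, k)
    return sum(weight * experts[i](token) for i, weight in gates.items())

# 8 toy "experts" (here: simple scalar functions); only 2 run per token,
# so compute scales with k, not with the total number of experts.
experts = [lambda x, m=m: m * x for m in range(1, 9)]
out = moe_layer(token=1.0, experts=experts,
                scores=[0.1, 2.0, 0.3, 1.5, 0.0, 0.2, 0.4, 0.1], k=2)
```

This is why a 235B-parameter MoE model can serve tokens at roughly the cost of a 22B dense model: the unselected experts sit in memory but contribute no compute.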

Key Technical Specifications

# Architecture Overview
Total Models: 8 variants (0.6B → 235B parameters)
Languages: 119 (Indo-European, Sino-Tibetan, Afro-Asiatic, etc.)
Context Window: 41K tokens (expandable to 262K)
License: Apache 2.0 (fully open source)
Provider: Alibaba Cloud
Architecture: qwen3 (custom transformer with MoE)
Training Cutoff: April 2025

What Makes Qwen3 Different

# Traditional AI Model
model_size = "70B parameters"
active_params = "70B parameters"  # 100% activation
inference_cost = "$$"
memory_required = "140GB VRAM"

# Qwen3 MoE Model  
model_size = "235B parameters"
active_params = "22B parameters"  # Only 9.4% activation
inference_cost = "$"               # 90% cost reduction
memory_required = "56GB VRAM"      # 60% memory reduction
performance = "Superior to 70B models"
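The activation ratio quoted above is easy to verify for the 235B-parameter flagship with 22B active parameters per token:

```python
total_params = 235e9
active_params = 22e9

# Fraction of the model that actually runs for each token
activation_ratio = active_params / total_params
print(f"{activation_ratio:.1%}")  # → 9.4%
```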

What is Docker Model Runner?

  • Docker Model Runner (DMR) is a tool that makes it easy to manage, run, and deploy AI models using Docker.
  • It allows developers to pull, run, and serve large language models (LLMs) and other AI models directly from Docker Hub or any OCI-compliant registry.
  • DMR integrates with Docker Desktop and Docker Engine, enabling you to serve models via OpenAI-compatible APIs, interact with models from the command line, and manage them through a graphical interface.
  • Models are cached locally after the first pull and are loaded into memory only at runtime to optimize resource usage.
  • DMR supports both command-line and API-based interactions, making it suitable for building generative AI applications, experimenting with ML workflows, or integrating AI into software development pipelines.
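Because DMR exposes OpenAI-compatible endpoints, any OpenAI-style client can talk to it. A minimal sketch using only the Python standard library, assuming host access with TCP enabled on the default port 12434 (the helper name is illustrative):

```python
import json
import urllib.request

def build_chat_request(base_url, model, messages):
    """Build an OpenAI-style chat completion request for Docker Model Runner."""
    url = f"{base_url}/engines/llama.cpp/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

req = build_chat_request(
    "http://localhost:12434",  # host access; TCP must be enabled
    "ai/qwen3",
    [{"role": "user", "content": "Say hello in one sentence."}],
)

# Sending the request requires a running Model Runner with the model pulled:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```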

Quick Start

# Pull latest model (8B-Q4_K_M)
docker model pull ai/qwen3

All Available Models

# Lightweight models (< 2GB)
docker model pull ai/qwen3:0.6B-Q4_0        # 442MB
docker model pull ai/qwen3:0.6B-Q4_K_M      # 456MB
docker model pull ai/qwen3:0.6B-F16         # 1.40GB

# Standard models
docker model pull ai/qwen3:8B-Q4_0          # 4.44GB
docker model pull ai/qwen3:8B-Q4_K_M        # 4.68GB (recommended)
docker model pull ai/qwen3:8B-F16           # 15.26GB
docker model pull ai/qwen3:14B-Q6_K         # 11.28GB

# MoE models
docker model pull ai/qwen3:30B-A3B-Q4_K_M   # 17.28GB
docker model pull ai/qwen3:30B-A3B-F16      # 56.89GB

Basic Usage Examples

1. Simple Chat Interface

docker model run ai/qwen3:8B-Q4_K_M
> Hello, can you help me write Python code?
Assistant: I'd be happy to help you with Python code! What specific task or problem would you like me to help you with?

> Create a function to calculate fibonacci numbers

2. Docker Model Runner REST API

Once Docker Model Runner is enabled, you can interact with your models programmatically using OpenAI-compatible REST API endpoints. Here’s an example of how to send a chat completion request from the host using curl:

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'

This request sends a prompt to the model and receives a response, just as you would with the OpenAI API. The base URL and port may vary depending on your setup (e.g., localhost:12434 for host access with TCP enabled).
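The JSON that comes back follows the OpenAI chat completion schema. A minimal way to pull out the assistant's reply, shown here against a trimmed mock response since the field layout is what matters (real responses also carry `id`, `usage`, and other metadata):

```python
# Trimmed mock of a chat completion response body
response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Rome fell gradually..."},
            "finish_reason": "stop",
        }
    ],
}

# The assistant's text lives at choices[0].message.content
reply = response["choices"][0]["message"]["content"]
print(reply)
```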

Thinking Mode Examples

Enable Thinking Mode

docker model run ai/qwen3:8B-Q4_K_M


> /think Solve this step by step: What is 127 × 43?
To find the product of 127 and 43, I'll break 43 into 40 + 3 and use the distributive property:

1. Multiply 127 by 40:
   127 × 40 = 5080

2. Multiply 127 by 3:
   127 × 3 = 381

3. Add the results of steps 1 and 2:
   5080 + 381 = 5461

The product of 127 and 43 is 5,461.

> /no_think What is 2+2?
4
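The same /think and /no_think switches work over the REST API: they are plain prompt directives, so a client can simply prepend them to the user message. A hypothetical helper (the function name is ours, not part of any API):

```python
def with_thinking(content, thinking=True):
    """Prefix a Qwen3 prompt with its thinking-mode directive."""
    directive = "/think" if thinking else "/no_think"
    return f"{directive} {content}"

# Fast answer without the step-by-step reasoning trace
messages = [{"role": "user",
             "content": with_thinking("What is 2+2?", thinking=False)}]
```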

Determine the base URL

The base URL to interact with the endpoints depends on how you run Docker:

  • From containers: http://model-runner.docker.internal/
  • From host processes: http://localhost:12434/, assuming TCP host access is enabled on the default port (12434).
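A small helper can make the two cases explicit in client code. The function name is illustrative, and the `/.dockerenv` check is a common heuristic for detecting a container, not a Docker-documented guarantee:

```python
import os

def model_runner_base_url():
    """Pick the DMR base URL depending on where the code runs."""
    # Inside a container, Docker provides the special DNS name;
    # on the host, TCP access on port 12434 must be enabled.
    if os.path.exists("/.dockerenv"):
        return "http://model-runner.docker.internal/"
    return "http://localhost:12434/"
```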
