A cheat sheet is a concise summary of important information meant to be used as a quick reference. This Ollama cheat sheet covers the commonly used commands for server management, model management, interactive session controls, API usage, custom model creation, and key environment variables, so you can run local LLMs fast without digging through the docs.
📋 Table of Contents
- 🖥️ Server Management
- 🧰 Model Management
- 💬 Interactive Session Commands
- ⚙️ Run Parameters
- 🔧 System & Configuration
- 🌐 API Usage via CLI
- 🏗️ Modelfile & Custom Models
🖥️ Server Management
| Command | Description |
|---|---|
| ollama serve | Start the Ollama server |
| ollama start | Start the Ollama server (alias for serve) |
| ollama ps | List models currently loaded in memory (also confirms the server is up) |
| ollama stop qwen2.5:0.5b | Unload a running model from memory |
| rm -rf ~/.ollama | Fully reset Ollama (removes all models and data) |
💡 Tip: There is no ollama logs or ollama prune command. On Linux with Ollama installed as a systemd service, view logs with journalctl -u ollama.service; on macOS they are written to ~/.ollama/logs/server.log. Free disk space by removing unused models with ollama rm, and stop the server with Ctrl+C or by killing the process.
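A minimal startup check, assuming the default port 11434: launch the server in the background, then hit the root endpoint, which replies "Ollama is running" when the server is healthy.

```bash
# Start the server in the background and give it a moment to come up
ollama serve &
sleep 2

# The root endpoint returns "Ollama is running" when the server is healthy
curl -s http://localhost:11434
```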
🧰 Model Management
| Command | Description |
|---|---|
| ollama list | List all locally available (pulled) models |
| ollama pull qwen2.5:0.5b | Pull / download a model from the Ollama registry |
| ollama pull mistral:7b | Pull the Mistral 7B model |
| ollama pull gemma3:1b | Pull the Gemma 3 1B model |
| ollama rm mistral:7b | Remove a locally stored model |
| ollama show gemma3:1b | Show model information and metadata |
| ollama show gemma3:1b --verbose | Show detailed model information |
| ollama cp qwen2.5:0.5b qwen2.5-mydev | Copy a model under a new name |
| ollama run qwen2.5:0.5b | Run a model in interactive mode |
| ollama run qwen2.5:0.5b --verbose | Run a model and print timing/token statistics |
| ollama run qwen2.5:0.5b "prompt" | Single non-interactive prompt generation |
| ollama run qwen2.5:0.5b "prompt" --format json | Generate output in JSON format |
📦 Models: Browse all available models at ollama.com/library. Use the tag suffix (e.g. :1b, :7b) to pull a specific parameter size.
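A typical end-to-end workflow with the small qwen2.5:0.5b model used throughout this sheet (a sketch; swap in any model tag):

```bash
# Download, verify, test with a one-shot prompt, then clean up
ollama pull qwen2.5:0.5b
ollama list
ollama run qwen2.5:0.5b "Explain what a context window is in one sentence."
ollama rm qwen2.5:0.5b
```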
💬 Interactive Session Commands
Once a model is running interactively with ollama run <model>, use the following commands at the >>> prompt:
Help & Session Control
| Command | Description |
|---|---|
| >>> /? | Show help |
| >>> /help | Show help (alias) |
| >>> /clear | Clear the current session context |
| >>> /bye | Exit interactive mode |
| >>> """multi-line prompt""" | Send a multi-line prompt using triple quotes |
Session Parameters (/set)
| Command | Description |
|---|---|
| >>> /set parameter seed 13 | Set random number seed for reproducibility |
| >>> /set parameter num_predict 100 | Max number of tokens to predict |
| >>> /set parameter top_k 3 | Pick from top K tokens at each step |
| >>> /set parameter top_p 0.5 | Pick tokens based on cumulative probability |
| >>> /set parameter min_p 0.1 | Discard tokens below this probability threshold |
| >>> /set parameter num_ctx 1024 | Set the context window size |
| >>> /set parameter temperature 0.5 | Set creativity level (higher = more creative) |
| >>> /set parameter stop word1 word2 | Set stop words to end generation |
| >>> /set system "message" | Set the system prompt for the session |
| >>> /set history | Enable CLI prompt history (Up/Down arrow recall) |
| >>> /set nohistory | Stop recording prompt history |
| >>> /set format json | Set output format to JSON |
| >>> /set noformat | Remove output formatting |
| >>> /set verbose | Show LLM generation stats after each response |
| >>> /set quiet | Disable LLM stats display |
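For example, a short session that constrains the output and makes it reproducible before prompting (an illustrative transcript):

```
>>> /set parameter seed 42
>>> /set parameter temperature 0
>>> /set parameter num_predict 50
>>> Write a haiku about local LLMs.
```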
Model Info (/show)
| Command | Description |
|---|---|
| >>> /show | List the available /show subcommands |
| >>> /show info | Show detailed info about the current model |
| >>> /show license | Show the model’s license information |
| >>> /show modelfile | Show the Modelfile for the current model |
| >>> /show parameters | Show all parameters set for this model |
| >>> /show system | Show the system message in use |
| >>> /show template | Show the prompt template for this model |
⚙️ Run Parameters
| Command | Description |
|---|---|
| ollama run gemma3:1b --verbose | Print generation statistics (tokens/s, durations) after the response |
| ollama run gemma3:1b --keepalive 10m | Keep the model loaded in memory for 10 minutes after the run |
| ollama run qwen2.5:0.5b --format json | Constrain output to valid JSON |

💡 Note: Sampling options such as temperature and num_ctx are not ollama run flags. Set them with /set parameter inside a session, in a Modelfile, or via the API options field.
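These flags combine freely. A sketch of a one-shot run that forces JSON output and prints generation statistics (when requesting JSON, it helps to mention JSON in the prompt itself):

```bash
# One-shot prompt with JSON-constrained output and timing stats
ollama run qwen2.5:0.5b "List three primary colours as a JSON array." --format json --verbose
```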
🔧 System & Configuration
| Command / Variable | Description |
|---|---|
| ollama --version | Check the installed Ollama version |
| export OLLAMA_HOST=0.0.0.0:11434 | Expose the Ollama server on all network interfaces |
| export OLLAMA_MODELS=/path/to/models | Set a custom directory to store models |
| curl -fsSL https://ollama.com/install.sh \| sh | Update Ollama to the latest version (Linux) |
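These variables must be visible to the server process, not just your shell. A sketch of a one-off foreground run that exposes the server on the LAN and stores models under an illustrative custom path:

```bash
# Serve on all interfaces with a custom model directory (/srv/ollama is illustrative)
OLLAMA_HOST=0.0.0.0:11434 OLLAMA_MODELS=/srv/ollama ollama serve
```

For a systemd install, set the same variables with systemctl edit ollama.service (Environment= lines) instead of export.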
💡 Windows: Ollama automatically prompts you to update when a new version is available.
🌐 API Usage via CLI
Ollama exposes a REST API on http://localhost:11434 by default. Use curl to interact with it directly.
Generate (Text Completion)
```bash
# Single prompt generation via API
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2", "prompt": "Hello world"}'
```
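By default /api/generate streams the response as newline-delimited JSON objects; set "stream": false to receive a single JSON object instead:

```bash
# Non-streaming generation: one JSON object with the full response
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello world",
  "stream": false
}'
```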
Chat (Multi-turn)
```bash
# Chat with message history
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [{"role": "user", "content": "Hello"}]
}'
```
List Models
```bash
# List all available models via API
curl http://localhost:11434/api/tags
```
| Endpoint | Description |
|---|---|
| POST /api/generate | Single-turn text generation |
| POST /api/chat | Multi-turn chat with message history |
| GET /api/tags | List all locally available models |
| POST /api/pull | Pull a model via API |
| DELETE /api/delete | Delete a model via API |
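For example, pulling and then deleting a model entirely over the API (recent versions accept "model" in the request body; older releases used "name"):

```bash
# Download a model through the REST API (progress is streamed back)
curl http://localhost:11434/api/pull -d '{"model": "qwen2.5:0.5b"}'

# Remove the same model
curl -X DELETE http://localhost:11434/api/delete -d '{"model": "qwen2.5:0.5b"}'
```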
🏗️ Modelfile & Custom Models
A Modelfile lets you create a customised model with your own parameters and system prompt, based on any existing model.
Example Modelfile
```
# Basic Modelfile - save as ./Modelfile
FROM qwen2.5:0.5b
PARAMETER temperature 0.8
PARAMETER num_ctx 4096
SYSTEM "You are a smart and focused AI Agent."
```
Modelfile Instructions
| Instruction | Description |
|---|---|
| FROM <model> | Base model to build from (required) |
| PARAMETER temperature | Set the temperature (creativity level) |
| PARAMETER num_ctx | Set the context window size |
| PARAMETER top_k | Set top-K sampling |
| PARAMETER top_p | Set top-P (nucleus) sampling |
| PARAMETER seed | Set random seed for reproducibility |
| SYSTEM "message" | Set the system prompt for the model |
| TEMPLATE | Override the prompt template |
| LICENSE | Specify the model license |
Create & Use Custom Model
| Command | Description |
|---|---|
| ollama create myagentmodel -f ./Modelfile | Create a custom model from a Modelfile |
| ollama run myagentmodel | Run your custom model interactively |
| ollama show myagentmodel --modelfile | Inspect the Modelfile of a custom model |
| ollama push myagentmodel | Push a custom model to the Ollama registry (requires an ollama.com account; the model must be named username/model) |
| ollama rm myagentmodel | Remove the custom model |
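Putting it all together, an end-to-end sketch that writes the example Modelfile with a heredoc, builds the custom model, and gives it a quick test (myagentmodel as in the table above):

```bash
# Write the Modelfile, build a custom model from it, and test with a one-shot prompt
cat > Modelfile <<'EOF'
FROM qwen2.5:0.5b
PARAMETER temperature 0.8
PARAMETER num_ctx 4096
SYSTEM "You are a smart and focused AI Agent."
EOF

ollama create myagentmodel -f ./Modelfile
ollama run myagentmodel "Introduce yourself in one sentence."
```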