Ollama is an open-source framework that lets you run large language models (LLMs) locally on your own computer instead of using cloud-based AI services. It’s designed to make running these powerful AI models simple and accessible to individual users and developers.
Key features of Ollama include:
- Local execution – All processing happens on your own hardware, providing privacy and eliminating the need for internet connectivity after model download
- Model library – Supports various open-source LLMs like Llama, Mistral, Vicuna, and many others
- Simple interface – Provides an easy command-line interface and API for interacting with models
- Resource optimization – Includes tools to manage memory usage and optimize performance based on your hardware capabilities
- Customization – Allows you to create and modify models with custom system prompts using Modelfiles
- Cross-platform – Works on macOS, Linux, and Windows
Ollama Cheatsheet
Ollama is a lightweight, open-source framework for running large language models (LLMs) locally on your machine. This cheatsheet provides a quick reference for common Ollama commands and configurations to help you get started and make the most of your local AI models.
Installation
Platform | Command |
---|---|
macOS/Linux | curl -fsSL https://ollama.com/install.sh \| sh
Windows | Download from https://ollama.com/download/windows |
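After installing, the Ollama server normally listens on http://localhost:11434 (start it with `ollama serve` if it isn't already running). Here's a minimal Python sketch to confirm the server is reachable, assuming the default host and port:

```python
import urllib.request

# Default Ollama endpoint; adjust if you changed OLLAMA_HOST.
OLLAMA_URL = "http://localhost:11434"

# The server root responds with a short plain-text status message.
with urllib.request.urlopen(OLLAMA_URL) as resp:
    print(resp.read().decode("utf-8"))  # expected: "Ollama is running"
```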
Basic Commands
Action | Command |
---|---|
Run a model | ollama run llama3 |
List models | ollama list |
Pull a model | ollama pull mistral |
Remove a model | ollama rm codellama |
Show model info | ollama show llama3 |
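The same information that `ollama list` prints is also exposed over the local REST API. A small Python sketch (standard library only) that lists installed models via the `/api/tags` endpoint, assuming the default address:

```python
import json
import urllib.request

# GET /api/tags returns the locally installed models
# (the API equivalent of `ollama list`).
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    data = json.loads(resp.read().decode("utf-8"))

for model in data.get("models", []):
    size_gb = model["size"] / 1e9  # size is reported in bytes
    print(f"{model['name']}\t{size_gb:.1f} GB")
```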
Running Models
Action | Command |
---|---|
Start chat session | ollama run llama3 |
Run a specific tag | ollama run llama3:70b
Set sampling parameters (in session) | /set parameter temperature 0.7, then /set parameter top_p 0.9
One-shot generation | echo "Write a poem about coding" \| ollama run llama3
Save output to file | echo "Explain Docker" \| ollama run llama3 > output.txt
Multiline input (in session) | Wrap the prompt in """ ... """ to send a multi-line message
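The `/set parameter` commands above only affect the current interactive session. For scripted runs, the same sampling parameters (temperature, top_p, and friends) can be passed per request through the API's `options` field. A minimal Python sketch, assuming llama3 is already pulled and the server is on the default port:

```python
import json
import urllib.request

payload = {
    "model": "llama3",
    "prompt": "Write a haiku about coding.",
    "stream": False,  # return one JSON object instead of a token stream
    "options": {      # per-request sampling parameters
        "temperature": 0.7,
        "top_p": 0.9,
    },
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read().decode("utf-8"))

print(result["response"])
```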
API Usage
Action | Command |
---|---|
Generate response | curl -X POST http://localhost:11434/api/generate -d '{ "model": "llama3", "prompt": "What is Docker?" }'
Chat with history | curl -X POST http://localhost:11434/api/chat -d '{ "model": "llama3", "messages": [ { "role": "user", "content": "Hello, who are you?" }, { "role": "assistant", "content": "I am an AI assistant." }, { "role": "user", "content": "What can you do?" } ] }'
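The curl examples above translate directly to any HTTP client. Here's a short Python sketch of a multi-turn chat against `/api/chat`, using only the standard library and assuming the default address; setting `"stream": false` returns a single JSON object instead of newline-delimited chunks:

```python
import json
import urllib.request

OLLAMA_CHAT = "http://localhost:11434/api/chat"

def chat(messages, model="llama3"):
    """Send the full message history and return the assistant's reply."""
    payload = {"model": model, "messages": messages, "stream": False}
    req = urllib.request.Request(
        OLLAMA_CHAT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["message"]["content"]

history = [{"role": "user", "content": "Hello, who are you?"}]
reply = chat(history)
print(reply)

# Append the reply and ask a follow-up; the model sees the whole history.
history.append({"role": "assistant", "content": reply})
history.append({"role": "user", "content": "What can you do?"})
print(chat(history))
```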
Advanced Usage
Action | Command |
---|---|
Create custom model | ollama create custom-llama -f Modelfile |
Example Modelfile | FROM llama3 and SYSTEM "You are a helpful programming assistant focused on Python." (each instruction on its own line)
Run with GPU limits | OLLAMA_GPU_LAYERS=35 ollama run llama3 |
Multimodal with image | ollama run llava "Describe this image: ./image.jpg"
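Over the API, images are sent to multimodal models such as llava as base64-encoded strings in the `images` field of `/api/generate`. A sketch of that flow, assuming llava has been pulled and using photo.jpg as a placeholder path:

```python
import base64
import json
import urllib.request

# Read and base64-encode the image; "photo.jpg" is a placeholder path.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "model": "llava",
    "prompt": "Describe this image.",
    "images": [image_b64],  # multimodal models accept base64 images here
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read().decode("utf-8"))["response"])
```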
Performance Tips
Tip | Command |
---|---|
Unload idle models sooner | OLLAMA_KEEP_ALIVE=5m ollama serve
Run quantized models | ollama run llama3:8b-q4_0 |
Use mmap (Linux) | OLLAMA_MMAP=1 ollama run llama3 |
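Besides the OLLAMA_KEEP_ALIVE server setting, keep-alive can also be controlled per request: the `keep_alive` field of `/api/generate` accepts a duration such as "5m", or 0 to unload the model as soon as the response finishes. A small sketch of the latter, handy on memory-constrained machines:

```python
import json
import urllib.request

payload = {
    "model": "llama3",
    "prompt": "Summarize what Docker is in one sentence.",
    "stream": False,
    "keep_alive": 0,  # unload the model immediately after this response
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read().decode("utf-8"))["response"])
```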
Environment Variables
Variable | Description |
---|---|
OLLAMA_HOST | Default: 127.0.0.1 (use 0.0.0.0 for network access) |
OLLAMA_PORT | Default: 11434 |
OLLAMA_MODELS | Custom path to store models |
OLLAMA_KEEP_ALIVE | Duration to keep models loaded (e.g., 5m, 1h) |
OLLAMA_GPU_LAYERS | Number of layers to offload to GPU |
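These variables configure the server; a client only needs to point at wherever the server is listening. A small sketch that builds the base URL from OLLAMA_HOST (falling back to the default) and queries the `/api/version` endpoint, so the same script works against local and remote instances:

```python
import json
import os
import urllib.request

# OLLAMA_HOST may be "host", "host:port", or a full URL; default is local.
host = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
if not host.startswith("http"):
    host = f"http://{host}"

# Ask the server which version it is running.
with urllib.request.urlopen(f"{host}/api/version") as resp:
    print(json.loads(resp.read().decode("utf-8"))["version"])
```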
Ollama essentially bridges the gap between powerful AI capabilities and local computing, making it possible to have conversations with AI, generate text, answer questions, and create content without sending your data to third-party services. It’s particularly useful for developers who want to integrate AI into their applications while maintaining data privacy or for users who want to experiment with AI without recurring subscription costs.