Ollama is an open-source framework that lets you run large language models (LLMs) locally on your own computer instead of using cloud-based AI services. It’s designed to make running these powerful AI models simple and accessible to individual users and developers.
Key features of Ollama include:
- Local execution – All processing happens on your own hardware, providing privacy and eliminating the need for internet connectivity after model download
- Model library – Supports various open-source LLMs like Llama, Mistral, Vicuna, and many others
- Simple interface – Provides an easy command-line interface and API for interacting with models
- Resource optimization – Includes tools to manage memory usage and optimize performance based on your hardware capabilities
- Customization – Allows you to create and modify models with custom system prompts using Modelfiles
- Cross-platform – Works on macOS, Linux, and Windows
Ollama Cheatsheet
Ollama is a lightweight, open-source framework for running large language models (LLMs) locally on your machine. This cheatsheet provides a quick reference for common Ollama commands and configurations to help you get started and make the most of your local AI models.
Installation
Platform | Command |
---|---|
macOS/Linux | curl -fsSL https://ollama.com/install.sh \| sh
Windows | Download from https://ollama.com/download/windows |
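After installing, the Ollama server normally listens on http://localhost:11434 (start it with `ollama serve` if it isn't already running). Here's a minimal Python sketch to confirm the server is reachable, assuming the default host and port:

```python
import urllib.request

# Default Ollama endpoint; adjust if you changed OLLAMA_HOST.
OLLAMA_URL = "http://localhost:11434"

# The server root responds with a short plain-text status message.
with urllib.request.urlopen(OLLAMA_URL) as resp:
    print(resp.read().decode("utf-8"))  # expected: "Ollama is running"
```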
Basic Commands
Action | Command |
---|---|
Run a model | ollama run llama3 |
List models | ollama list |
Pull a model | ollama pull mistral |
Remove a model | ollama rm codellama |
Show model info | ollama show llama3 |
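The same information that `ollama list` prints is also exposed over the local REST API. A small Python sketch (standard library only) that lists installed models via the `/api/tags` endpoint, assuming the default address:

```python
import json
import urllib.request

# GET /api/tags returns the locally installed models
# (the API equivalent of `ollama list`).
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    data = json.loads(resp.read().decode("utf-8"))

for model in data.get("models", []):
    size_gb = model["size"] / 1e9  # size is reported in bytes
    print(f"{model['name']}\t{size_gb:.1f} GB")
```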
Running Models
Action | Command |
---|---|
Start chat session | ollama run llama3 |
Run a specific tag | ollama run llama3:70b
Set sampling parameters (in session) | /set parameter temperature 0.7, then /set parameter top_p 0.9
One-shot generation | echo "Write a poem about coding" \| ollama run llama3
Save output to file | echo "Explain Docker" \| ollama run llama3 > output.txt
Multiline input (in session) | Wrap the prompt in """ ... """ to send a multi-line message
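The `/set parameter` commands above only affect the current interactive session. For scripted runs, the same sampling parameters (temperature, top_p, and friends) can be passed per request through the API's `options` field. A minimal Python sketch, assuming llama3 is already pulled and the server is on the default port:

```python
import json
import urllib.request

payload = {
    "model": "llama3",
    "prompt": "Write a haiku about coding.",
    "stream": False,  # return one JSON object instead of a token stream
    "options": {      # per-request sampling parameters
        "temperature": 0.7,
        "top_p": 0.9,
    },
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read().decode("utf-8"))

print(result["response"])
```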
API Usage
Action | Command |
---|---|
Generate response | curl -X POST http://localhost:11434/api/generate -d '{ "model": "llama3", "prompt": "What is Docker?" }'
Chat with history | curl -X POST http://localhost:11434/api/chat -d '{ "model": "llama3", "messages": [ { "role": "user", "content": "Hello, who are you?" }, { "role": "assistant", "content": "I am an AI assistant." }, { "role": "user", "content": "What can you do?" } ] }'
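The curl examples above translate directly to any HTTP client. Here's a short Python sketch of a multi-turn chat against `/api/chat`, using only the standard library and assuming the default address; setting `"stream": false` returns a single JSON object instead of newline-delimited chunks:

```python
import json
import urllib.request

OLLAMA_CHAT = "http://localhost:11434/api/chat"

def chat(messages, model="llama3"):
    """Send the full message history and return the assistant's reply."""
    payload = {"model": model, "messages": messages, "stream": False}
    req = urllib.request.Request(
        OLLAMA_CHAT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["message"]["content"]

history = [{"role": "user", "content": "Hello, who are you?"}]
reply = chat(history)
print(reply)

# Append the reply and ask a follow-up; the model sees the whole history.
history.append({"role": "assistant", "content": reply})
history.append({"role": "user", "content": "What can you do?"})
print(chat(history))
```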
Advanced Usage
Action | Command |
---|---|
Create custom model | ollama create custom-llama -f Modelfile |
Example Modelfile | FROM llama3 and SYSTEM "You are a helpful programming assistant focused on Python." (each instruction on its own line)
Run with GPU limits | OLLAMA_GPU_LAYERS=35 ollama run llama3 |
Multimodal with image | ollama run llava "Describe this image: ./image.jpg"
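Over the API, images are sent to multimodal models such as llava as base64-encoded strings in the `images` field of `/api/generate`. A sketch of that flow, assuming llava has been pulled and using photo.jpg as a placeholder path:

```python
import base64
import json
import urllib.request

# Read and base64-encode the image; "photo.jpg" is a placeholder path.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "model": "llava",
    "prompt": "Describe this image.",
    "images": [image_b64],  # multimodal models accept base64 images here
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read().decode("utf-8"))["response"])
```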
Performance Tips
Tip | Command |
---|---|
Unload idle models sooner | OLLAMA_KEEP_ALIVE=5m ollama serve
Run quantized models | ollama run llama3:8b-q4_0 |
Use mmap (Linux) | OLLAMA_MMAP=1 ollama run llama3 |
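Besides the OLLAMA_KEEP_ALIVE server setting, keep-alive can also be controlled per request: the `keep_alive` field of `/api/generate` accepts a duration such as "5m", or 0 to unload the model as soon as the response finishes. A small sketch of the latter, handy on memory-constrained machines:

```python
import json
import urllib.request

payload = {
    "model": "llama3",
    "prompt": "Summarize what Docker is in one sentence.",
    "stream": False,
    "keep_alive": 0,  # unload the model immediately after this response
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read().decode("utf-8"))["response"])
```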
Environment Variables
Variable | Description |
---|---|
OLLAMA_HOST | Default: 127.0.0.1 (use 0.0.0.0 for network access) |
OLLAMA_PORT | Default: 11434 |
OLLAMA_MODELS | Custom path to store models |
OLLAMA_KEEP_ALIVE | Duration to keep models loaded (e.g., 5m, 1h) |
OLLAMA_GPU_LAYERS | Number of layers to offload to GPU |
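These variables configure the server; a client only needs to point at wherever the server is listening. A small sketch that builds the base URL from OLLAMA_HOST (falling back to the default) and queries the `/api/version` endpoint, so the same script works against local and remote instances:

```python
import json
import os
import urllib.request

# OLLAMA_HOST may be "host", "host:port", or a full URL; default is local.
host = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
if not host.startswith("http"):
    host = f"http://{host}"

# Ask the server which version it is running.
with urllib.request.urlopen(f"{host}/api/version") as resp:
    print(json.loads(resp.read().decode("utf-8"))["version"])
```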
Ollama essentially bridges the gap between powerful AI capabilities and local computing, making it possible to have conversations with AI, generate text, answer questions, and create content without sending your data to third-party services. It’s particularly useful for developers who want to integrate AI into their applications while maintaining data privacy or for users who want to experiment with AI without recurring subscription costs.