Join our Discord Server
Tanvir Kour
Tanvir Kour is a passionate technical blogger and open source enthusiast. She holds a degree in Computer Science and Engineering and has 4 years of experience providing IT solutions. She is well-versed in Linux, Docker, and cloud-native applications. You can connect with her on Twitter: https://x.com/tanvirkour

Complete Ollama Guide: Installation, Usage & Code Examples

4 min read

What is Ollama?

Ollama is a lightweight, extensible framework for building and running large language models locally. Run LLaMA, Mistral, CodeLlama, and other models on your machine without cloud dependencies.

Quick Installation

macOS

# Download the desktop app from https://ollama.com/download
# (the install.sh script targets Linux)

# Alternative: Homebrew
brew install ollama

Linux

curl -fsSL https://ollama.com/install.sh | sh

Windows

# Download installer from ollama.com
# Or use winget
winget install Ollama.Ollama

Docker Installation

# Run Ollama in Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull and run a model
docker exec -it ollama ollama run llama2

Starting Ollama Service

# Start Ollama service
ollama serve

# Run in background (Linux/macOS)
nohup ollama serve > ollama.log 2>&1 &

Basic Model Operations

Pull Models

# Pull LLaMA 2 (7B)
ollama pull llama2

# Pull specific model versions
ollama pull llama2:13b
ollama pull llama2:70b

# Pull other popular models
ollama pull mistral
ollama pull codellama
ollama pull vicuna
ollama pull neural-chat

List Available Models

# List installed models
ollama list

# Show model information
ollama show llama2

Remove Models

# Remove specific model
ollama rm llama2:13b

# Remove all versions of a model
ollama rm llama2

Running Models

Interactive Chat

# Start interactive session
ollama run llama2

# Adjust sampling inside the session (there is no --temperature CLI flag)
/set parameter temperature 0.7

# Exit interactive mode
/bye

Single Prompt

# One-time prompt
ollama run llama2 "Explain quantum computing"

# Sampling options such as temperature are not CLI flags;
# set them via the API "options" field or a custom Modelfile
ollama run llama2 "Write Python code for sorting"
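Sampling parameters such as temperature are set per request through the REST API's `options` field. A minimal Python sketch using only the standard library (the helper names are mine, not part of Ollama):

```python
import json
import urllib.request

def build_generate_payload(model, prompt, **options):
    """Build a /api/generate body; keyword args become Ollama's
    per-request "options" (temperature, top_p, num_ctx, ...)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": options,
    }

def generate(payload, base_url="http://localhost:11434"):
    """POST the payload and return the generated text."""
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = build_generate_payload(
    "llama2", "Write Python code for sorting", temperature=0.1)
print(payload)
# generate(payload)  # requires a running Ollama server
```

Any option you could put in a Modelfile `PARAMETER` line can be passed this way instead, which is handy for one-off low-temperature runs.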

API Usage

REST API Examples

Basic Chat Completion

# Simple chat request (stream disabled to get a single JSON reply;
# /api/chat streams by default)
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "messages": [
      {
        "role": "user",
        "content": "Why is the sky blue?"
      }
    ],
    "stream": false
  }'

Streaming Response

# Stream response
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function"
      }
    ],
    "stream": true
  }'

Generate Text

# Generate completion
curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "prompt": "The future of AI is",
    "stream": false
  }'

Python Integration

Basic Python Client

import requests
import json

def chat_with_ollama(model, message):
    url = "http://localhost:11434/api/chat"
    data = {
        "model": model,
        "messages": [{"role": "user", "content": message}],
        "stream": False
    }
    
    response = requests.post(url, json=data)
    return response.json()["message"]["content"]

# Usage
result = chat_with_ollama("llama2", "Explain machine learning")
print(result)

Streaming Python Client

import requests
import json

def stream_chat(model, message):
    url = "http://localhost:11434/api/chat"
    data = {
        "model": model,
        "messages": [{"role": "user", "content": message}],
        "stream": True
    }
    
    with requests.post(url, json=data, stream=True) as response:
        for line in response.iter_lines():
            if line:
                chunk = json.loads(line)
                if "message" in chunk:
                    yield chunk["message"]["content"]

# Usage
for token in stream_chat("llama2", "Write a story"):
    print(token, end="", flush=True)

Async Python Client

import aiohttp
import asyncio
import json

async def async_chat(model, message):
    url = "http://localhost:11434/api/chat"
    data = {
        "model": model,
        "messages": [{"role": "user", "content": message}],
        "stream": False
    }
    
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data) as response:
            result = await response.json()
            return result["message"]["content"]

# Usage
async def main():
    result = await async_chat("llama2", "Explain async programming")
    print(result)

asyncio.run(main())

JavaScript/Node.js Integration

Basic Node.js Client

const axios = require('axios');

async function chatWithOllama(model, message) {
    const response = await axios.post('http://localhost:11434/api/chat', {
        model: model,
        messages: [{ role: 'user', content: message }],
        stream: false
    });
    
    return response.data.message.content;
}

// Usage
chatWithOllama('llama2', 'Explain JavaScript promises')
    .then(result => console.log(result))
    .catch(err => console.error(err));

Browser Integration

<!DOCTYPE html>
<html>
<head>
    <title>Ollama Chat</title>
</head>
<body>
    <div id="chat"></div>
    <input type="text" id="input" placeholder="Ask something...">
    <button onclick="sendMessage()">Send</button>

    <script>
        async function sendMessage() {
            const input = document.getElementById('input');
            const message = input.value;
            
            try {
                const response = await fetch('http://localhost:11434/api/chat', {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                    },
                    body: JSON.stringify({
                        model: 'llama2',
                        messages: [{ role: 'user', content: message }],
                        stream: false
                    })
                });
                
                const data = await response.json();
                document.getElementById('chat').innerHTML += 
                    `<p><strong>You:</strong> ${message}</p>
                     <p><strong>AI:</strong> ${data.message.content}</p>`;
                
                input.value = '';
            } catch (error) {
                console.error('Error:', error);
            }
        }
    </script>
</body>
</html>

Custom Model Configuration

Create Modelfile

# Modelfile for custom configuration
FROM llama2

# Set custom parameters
PARAMETER temperature 0.8
PARAMETER top_p 0.9
PARAMETER top_k 40

# Set custom system prompt
SYSTEM """
You are a helpful coding assistant. Always provide code examples.
"""

# Set custom template
TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
"""

Build Custom Model

# Create custom model
ollama create my-assistant -f ./Modelfile

# Run custom model
ollama run my-assistant

Advanced Configuration

Environment Variables

# Set custom host and port
export OLLAMA_HOST=0.0.0.0:11434

# Set model storage location
export OLLAMA_MODELS=/custom/path/models

# Enable debug logging
export OLLAMA_DEBUG=1

# Set maximum number of models kept loaded in memory
export OLLAMA_MAX_LOADED_MODELS=3

# Set maximum parallel requests per model
export OLLAMA_NUM_PARALLEL=4
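On Linux installs where Ollama runs as a systemd service, exporting these variables in a shell profile does not reach the daemon; they belong in a drop-in override created with `sudo systemctl edit ollama`. A sketch of the override file:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_MODELS=/custom/path/models"
```

Apply it with `sudo systemctl daemon-reload && sudo systemctl restart ollama`.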

GPU Configuration

# Show per-response timing stats (GPU offload details appear in the server logs)
ollama run llama2 --verbose

# Force CPU usage by hiding all NVIDIA GPUs
CUDA_VISIBLE_DEVICES="" ollama serve

# Use specific GPU
CUDA_VISIBLE_DEVICES=0 ollama serve

Memory Management

# Set the context window inside an interactive session
ollama run llama2
/set parameter num_ctx 4096

# Or bake parameters into a custom Modelfile
# PARAMETER num_ctx 4096
# PARAMETER num_batch 512

Popular Models and Use Cases

Code Generation

# CodeLlama for programming
ollama pull codellama

# Use for code generation
ollama run codellama "Write a Python web scraper"

# Code completion
ollama run codellama:code "def fibonacci(n):"

Specialized Models

# For chat/conversation
ollama pull vicuna

# For reasoning tasks
ollama pull neural-chat

# For multilingual tasks
ollama pull orca-mini

# For SQL generation
ollama pull sqlcoder

Troubleshooting

Common Issues

Port Already in Use

# Kill existing Ollama process
pkill ollama

# Or find and kill specific PID
lsof -i :11434
kill -9 <PID>

# Start with different port
OLLAMA_HOST=0.0.0.0:11435 ollama serve

Out of Memory

# Check system resources
free -h  # Linux
vm_stat  # macOS

# Use smaller model
ollama pull llama2:7b  # Instead of 13b or 70b

# Reduce context length (inside an interactive session)
/set parameter num_ctx 2048

Model Download Issues

# Remove partially downloaded blobs (path may vary by install)
rm -f ~/.ollama/models/blobs/*-partial

# Re-run the pull; interrupted downloads resume automatically
ollama pull llama2

Performance Optimization

System Tuning

# Increase file descriptors (Linux)
ulimit -n 65536

# Keep models loaded after requests (a duration like 10m, or -1 for forever)
export OLLAMA_KEEP_ALIVE=10m

# Memory mapping is controlled per model via the use_mmap option
# (set in a Modelfile or the API "options" field), not an env var

Monitoring

# List locally installed models
curl http://localhost:11434/api/tags

# Health check / server version
curl http://localhost:11434/api/version

# List models currently loaded in memory
curl http://localhost:11434/api/ps
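These endpoints are easy to wrap in a script for automated monitoring. A sketch using only the standard library (the function names are mine); any connection error is treated as unhealthy:

```python
import json
import urllib.error
import urllib.request

def ollama_healthy(base_url="http://localhost:11434", timeout=2):
    """Return True if the Ollama server answers /api/version."""
    try:
        with urllib.request.urlopen(base_url + "/api/version",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def loaded_models(base_url="http://localhost:11434", timeout=2):
    """Return names of models currently loaded in memory (/api/ps)."""
    try:
        with urllib.request.urlopen(base_url + "/api/ps",
                                    timeout=timeout) as resp:
            data = json.loads(resp.read())
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError):
        return []

print("healthy:", ollama_healthy())
print("loaded:", loaded_models())
```

Run it from cron or a monitoring agent; a non-zero exit on `not ollama_healthy()` is a simple alerting hook.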

Docker Compose Setup

version: '3.8'
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  ollama_data:

Integration Examples

FastAPI Integration

from fastapi import FastAPI
import requests

app = FastAPI()

@app.post("/chat")
def chat_endpoint(message: str, model: str = "llama2"):
    # requests is blocking, so use a plain def
    # (FastAPI runs it in a threadpool instead of the event loop)
    response = requests.post("http://localhost:11434/api/chat", json={
        "model": model,
        "messages": [{"role": "user", "content": message}],
        "stream": False
    })
    return {"response": response.json()["message"]["content"]}

# Run with: uvicorn main:app --reload

Express.js Integration

const express = require('express');
const axios = require('axios');

const app = express();
app.use(express.json());

app.post('/chat', async (req, res) => {
    try {
        const { message, model = 'llama2' } = req.body;
        
        const response = await axios.post('http://localhost:11434/api/chat', {
            model,
            messages: [{ role: 'user', content: message }],
            stream: false
        });
        
        res.json({ response: response.data.message.content });
    } catch (error) {
        res.status(500).json({ error: error.message });
    }
});

app.listen(3000, () => console.log('Server running on port 3000'));

CLI Automation Scripts

Batch Processing

#!/bin/bash
# batch_process.sh

MODEL="llama2"
INPUT_FILE="prompts.txt"
OUTPUT_DIR="outputs"

mkdir -p "$OUTPUT_DIR"

while IFS= read -r prompt; do
    echo "Processing: $prompt"
    output_file="$OUTPUT_DIR/$(echo "$prompt" | sed 's/[^a-zA-Z0-9]/_/g').txt"
    ollama run "$MODEL" "$prompt" > "$output_file"
done < "$INPUT_FILE"
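The same loop can be driven from Python against the REST API instead of shelling out. A sketch using only the standard library (`slugify`, `run_prompt`, and `batch_process` are my names); it mirrors the sed-based filename sanitization above:

```python
import json
import re
import urllib.request
from pathlib import Path

def slugify(prompt, max_len=40):
    """Turn a prompt into a safe filename, like the sed call above."""
    return re.sub(r"[^a-zA-Z0-9]", "_", prompt)[:max_len]

def run_prompt(model, prompt, base_url="http://localhost:11434"):
    """Send one prompt to /api/generate and return the response text."""
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps({"model": model, "prompt": prompt,
                         "stream": False}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def batch_process(model, prompts_file, out_dir="outputs"):
    """Run every non-empty line of prompts_file, one output file each."""
    Path(out_dir).mkdir(exist_ok=True)
    for prompt in Path(prompts_file).read_text().splitlines():
        if prompt.strip():
            out = Path(out_dir, slugify(prompt) + ".txt")
            out.write_text(run_prompt(model, prompt))

print(slugify("Explain quantum computing"))
# batch_process("llama2", "prompts.txt")  # needs a running server
```

Because each prompt is an independent HTTP request, this version is also easy to parallelize later with a thread pool.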

Model Management Script

#!/bin/bash
# manage_models.sh

case "$1" in
    "install")
        ollama pull llama2
        ollama pull codellama
        ollama pull mistral
        ;;
    "cleanup")
        ollama list | grep -v "NAME" | awk '{print $1}' | xargs -I {} ollama rm {}
        ;;
    "status")
        ollama list
        ;;
    *)
        echo "Usage: $0 {install|cleanup|status}"
        exit 1
        ;;
esac

This guide covers installation, basic usage, API integration, troubleshooting, and advanced configuration for Ollama, with practical code examples you can put to use immediately.

Have Queries? Join https://launchpass.com/collabnix
