Tanvir Kour is a passionate technical blogger and open source enthusiast. She is a graduate in Computer Science and Engineering and has 4 years of experience in providing IT solutions. She is well-versed in Linux, Docker, and cloud-native applications. You can connect with her on Twitter: https://x.com/tanvirkour

The Complete Ollama Guide 2025: From Zero to AI Hero (With 50+ Code Examples!)


Run ChatGPT-level AI models on your laptop for FREE – No API bills, complete privacy, and unlimited usage!

Ollama has revolutionized local AI by making it ridiculously easy to run powerful language models on your own hardware. Think of it as “Docker for AI models” – one command and you’re running models that rival ChatGPT, completely offline.

Why Developers Are Switching to Ollama:

Zero API costs – Run unlimited queries
Complete privacy – Data never leaves your machine
Offline capable – Works without internet
OpenAI API compatible – Drop-in replacement for existing apps
100+ models available – From 1B to 70B+ parameters
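All of this sits behind a single local HTTP endpoint. A minimal sketch of Ollama's native `/api/generate` API in Python (assuming the server is running on its default port 11434 and `llama3.1:8b` has been pulled):

```python
import json
import urllib.request

def build_generate_request(prompt, model="llama3.1:8b",
                           host="http://localhost:11434"):
    """Build an HTTP request for Ollama's native /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Usage (requires a running Ollama server):
#   req = build_generate_request("Why is the sky blue?")
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```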


Installation Guide for Every Platform

macOS Installation

# Option 1: Download the macOS app from https://ollama.com/download
# (the install.sh script below the Linux section is Linux-only)

# Option 2: Install with Homebrew
brew install ollama

# Verify installation
ollama --version

# Start the service
ollama serve

Windows Installation

# Option 1: Download from official website
# Visit https://ollama.com and download .exe installer

# Option 2: Using winget
winget install Ollama.Ollama

# Verify installation
ollama --version

# Start Ollama
ollama serve

Linux Installation (Ubuntu/Debian)

# One-line install
curl -fsSL https://ollama.com/install.sh | sh

# Alternative: Manual installation
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama
sudo chmod +x /usr/bin/ollama

# Create a dedicated service user (the unit below runs as "ollama")
sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama

# Create systemd service
sudo tee /etc/systemd/system/ollama.service > /dev/null <<EOF
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
EOF

# Start and enable service
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama

# Check status
sudo systemctl status ollama

Docker Installation

# Run Ollama in Docker
docker run -d \
  --name ollama \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

# For GPU support (NVIDIA)
docker run -d \
  --gpus=all \
  --name ollama-gpu \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

# Pull a model
docker exec -it ollama ollama pull llama3.1:8b

Quick Start Test

# Pull your first model
ollama pull llama3.1:8b

# Start chatting
ollama run llama3.1:8b "Hello! Explain quantum computing in simple terms."

# List installed models
ollama list

# Show running models and their memory usage
ollama ps

Best Models in 2025: Performance Comparison

🏆 Top Recommended Models by Use Case

General Purpose & Chat

# Best overall balance (8GB+ RAM required)
ollama pull llama3.1:8b

# Lightweight option (4GB RAM)
ollama pull phi3:mini

# Multilingual powerhouse (16GB+ RAM)
ollama pull qwen2.5:14b

Coding & Development

# Code generation specialist
ollama pull deepseek-coder:6.7b

# Python expert
ollama pull codellama:13b-python

# General coding (34B version for best quality)
ollama pull codellama:34b-instruct

Vision & Multimodal

# Image understanding
ollama pull llava:13b

# Latest vision model
ollama pull llava:34b

# Document analysis
ollama pull qwen2-vl:7b

Model Performance Comparison Table

| Model | Size | RAM Required | Best For | Speed | Quality |
|---|---|---|---|---|---|
| phi3:mini | 3.8B | 4GB | Quick tasks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| mistral:7b | 7B | 8GB | General use | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| llama3.1:8b | 8B | 8GB | Best balance | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| codellama:13b | 13B | 16GB | Coding | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| llama3.1:70b | 70B | 64GB+ | Maximum quality | ⭐⭐ | ⭐⭐⭐⭐⭐ |
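A rough way to sanity-check the RAM column yourself: a model needs roughly its parameter count times the bytes per weight for its quantization, plus some fixed overhead for the KV cache and runtime. A back-of-the-envelope sketch (the per-weight sizes and the 1.5 GB overhead are approximations, not exact figures):

```python
# Approximate bytes per weight for common quantization schemes
BYTES_PER_PARAM = {"fp16": 2.0, "q8_0": 1.0, "q4_0": 0.5}

def estimate_ram_gb(params_billion, quant="q4_0", overhead_gb=1.5):
    """Rough RAM estimate: weights + fixed overhead for KV cache/runtime."""
    weights_gb = params_billion * BYTES_PER_PARAM[quant]
    return round(weights_gb + overhead_gb, 1)

print(estimate_ram_gb(8))    # 8B at 4-bit fits comfortably in 8 GB RAM
print(estimate_ram_gb(70))   # 70B at 4-bit needs a much bigger machine
```

This is why the default Ollama tags (4-bit quantized) let an 8B model run on an ordinary laptop, while fp16 weights of the same model would not.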

Model Selection Script

#!/usr/bin/env python3
"""
Ollama Model Recommender
Suggests the best model based on your hardware and use case
"""

import psutil
import subprocess
import json

def get_system_info():
    """Get system RAM and GPU info"""
    ram_gb = psutil.virtual_memory().total / (1024**3)
    
    try:
        # Check for NVIDIA GPU
        result = subprocess.run(
            ['nvidia-smi', '--query-gpu=memory.total', '--format=csv,noheader,nounits'],
            capture_output=True, text=True)
        if result.returncode == 0:
            # nvidia-smi prints one line per GPU; use the first if several are present
            gpu_memory = int(result.stdout.strip().splitlines()[0]) / 1024  # MB -> GB
        else:
            gpu_memory = 0
    except FileNotFoundError:
        gpu_memory = 0
    
    return ram_gb, gpu_memory

def recommend_model(ram_gb, gpu_memory, use_case):
    """Recommend best model based on hardware and use case"""
    recommendations = {
        "coding": {
            "low": "phi3:mini",
            "medium": "deepseek-coder:6.7b", 
            "high": "codellama:34b-instruct"
        },
        "general": {
            "low": "phi3:mini",
            "medium": "llama3.1:8b",
            "high": "llama3.1:70b"
        },
        "vision": {
            "low": "moondream:1.8b",
            "medium": "llava:7b",
            "high": "llava:34b"
        }
    }
    
    total_memory = max(ram_gb, gpu_memory)
    
    if total_memory < 8:
        tier = "low"
    elif total_memory < 32:
        tier = "medium"
    else:
        tier = "high"
    
    return recommendations.get(use_case, recommendations["general"])[tier]

def install_model(model_name):
    """Install recommended model"""
    print(f"Installing {model_name}...")
    result = subprocess.run(['ollama', 'pull', model_name], capture_output=True, text=True)
    
    if result.returncode == 0:
        print(f"✅ Successfully installed {model_name}")
        return True
    else:
        print(f"❌ Failed to install {model_name}: {result.stderr}")
        return False

if __name__ == "__main__":
    ram, gpu_mem = get_system_info()
    print(f"System Info: {ram:.1f}GB RAM, {gpu_mem:.1f}GB GPU")
    
    use_case = input("Use case (coding/general/vision): ").lower()
    recommended = recommend_model(ram, gpu_mem, use_case)
    
    print(f"Recommended model: {recommended}")
    
    if input("Install now? (y/N): ").lower() == 'y':
        install_model(recommended)

OpenAI API Compatibility: Drop-in Replacement

Ollama’s killer feature is its OpenAI-compatible API. Point your existing client at localhost, swap in a local model name, and most apps keep working without touching the rest of the code!

Basic Setup

# Start Ollama with OpenAI compatibility
ollama serve

# Test the API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain machine learning in 3 sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'

Python Implementation

#!/usr/bin/env python3
"""
Drop-in OpenAI replacement using Ollama
Just change the base_url and api_key!
"""

from openai import OpenAI
import os

# Initialize Ollama client (OpenAI compatible)
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Required but unused
)

class OllamaChat:
    def __init__(self, model="llama3.1:8b"):
        self.model = model
        self.conversation = []
    
    def chat(self, message, system_prompt=None):
        """Send a chat message and get response"""
        messages = []
        
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        
        # Add conversation history
        messages.extend(self.conversation)
        messages.append({"role": "user", "content": message})
        
        try:
            response = client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=0.7,
                max_tokens=1000
            )
            
            ai_response = response.choices[0].message.content
            
            # Update conversation history
            self.conversation.append({"role": "user", "content": message})
            self.conversation.append({"role": "assistant", "content": ai_response})
            
            return ai_response
            
        except Exception as e:
            return f"Error: {str(e)}"
    
    def stream_chat(self, message):
        """Stream chat response in real-time"""
        messages = self.conversation + [{"role": "user", "content": message}]
        
        stream = client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=0.7,
            stream=True
        )
        
        full_response = ""
        for chunk in stream:
            if chunk.choices[0].delta.content is not None:
                content = chunk.choices[0].delta.content
                print(content, end="", flush=True)
                full_response += content
        
        print()  # New line after streaming
        
        # Update conversation
        self.conversation.extend([
            {"role": "user", "content": message},
            {"role": "assistant", "content": full_response}
        ])
        
        return full_response
    
    def clear_conversation(self):
        """Reset conversation history"""
        self.conversation = []

# Example usage
if __name__ == "__main__":
    chat = OllamaChat("llama3.1:8b")
    
    # Regular chat
    response = chat.chat("What is Python?")
    print(f"AI: {response}")
    
    # Follow-up question (with context)
    response = chat.chat("Give me a simple example")
    print(f"AI: {response}")
    
    # Streaming response
    print("\nStreaming response:")
    chat.stream_chat("Write a Python function to sort a list")

Node.js Implementation

// npm install openai
const OpenAI = require('openai');

const client = new OpenAI({
    baseURL: 'http://localhost:11434/v1',
    apiKey: 'ollama' // Required but unused
});

class OllamaChat {
    constructor(model = 'llama3.1:8b') {
        this.model = model;
        this.conversation = [];
    }

    async chat(message, systemPrompt = null) {
        const messages = [];
        
        if (systemPrompt) {
            messages.push({ role: 'system', content: systemPrompt });
        }
        
        messages.push(...this.conversation);
        messages.push({ role: 'user', content: message });

        try {
            const response = await client.chat.completions.create({
                model: this.model,
                messages: messages,
                temperature: 0.7,
                max_tokens: 1000
            });

            const aiResponse = response.choices[0].message.content;
            
            // Update conversation history
            this.conversation.push({ role: 'user', content: message });
            this.conversation.push({ role: 'assistant', content: aiResponse });
            
            return aiResponse;
        } catch (error) {
            return `Error: ${error.message}`;
        }
    }

    async streamChat(message) {
        const messages = [...this.conversation, { role: 'user', content: message }];
        
        const stream = await client.chat.completions.create({
            model: this.model,
            messages: messages,
            temperature: 0.7,
            stream: true
        });

        let fullResponse = '';
        
        for await (const chunk of stream) {
            if (chunk.choices[0]?.delta?.content) {
                const content = chunk.choices[0].delta.content;
                process.stdout.write(content);
                fullResponse += content;
            }
        }
        
        console.log(); // New line
        
        // Update conversation
        this.conversation.push({ role: 'user', content: message });
        this.conversation.push({ role: 'assistant', content: fullResponse });
        
        return fullResponse;
    }

    clearConversation() {
        this.conversation = [];
    }
}

// Example usage
async function main() {
    const chat = new OllamaChat('llama3.1:8b');
    
    // Regular chat
    const response = await chat.chat('Explain async/await in JavaScript');
    console.log('AI:', response);
    
    // Streaming response
    console.log('\nStreaming response:');
    await chat.streamChat('Give me a practical example');
}

main().catch(console.error);

Migration Script for Existing OpenAI Apps

#!/usr/bin/env python3
"""
Migrate existing OpenAI applications to Ollama
Replace API calls automatically
"""

import re
import os
import shutil
from pathlib import Path

def migrate_python_file(file_path):
    """Migrate a Python file from OpenAI to Ollama"""
    with open(file_path, 'r') as f:
        content = f.read()
    
    # Backup original file
    shutil.copy2(file_path, f"{file_path}.backup")
    
    # Replace OpenAI initialization
    content = re.sub(
        r'OpenAI\(\s*api_key\s*=\s*[^)]+\)',
        'OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")',
        content
    )
    
    # Replace model names
    model_mapping = {
        'gpt-3.5-turbo': 'llama3.1:8b',
        'gpt-4': 'llama3.1:70b',
        'gpt-4-turbo': 'qwen2.5:32b',
        'text-embedding-ada-002': 'nomic-embed-text'
    }
    
    for openai_model, ollama_model in model_mapping.items():
        content = re.sub(
            rf'["\']{re.escape(openai_model)}["\']',
            f'"{ollama_model}"',
            content
        )
    
    # Write modified content
    with open(file_path, 'w') as f:
        f.write(content)
    
    print(f"✅ Migrated {file_path}")

def migrate_project(project_dir):
    """Migrate an entire project"""
    project_path = Path(project_dir)
    
    # Find Python files
    python_files = list(project_path.rglob("*.py"))
    
    print(f"Found {len(python_files)} Python files to migrate")
    
    for file_path in python_files:
        if 'venv' not in str(file_path) and '.git' not in str(file_path):
            migrate_python_file(file_path)
    
    # Create migration info file
    info_content = """
# Ollama Migration Complete

Your project has been migrated to use Ollama instead of OpenAI.

## Changes Made:
- OpenAI client initialization updated to use localhost
- Model names mapped to Ollama equivalents
- Original files backed up with .backup extension

## Setup Required:
1. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
2. Pull required models:
   - ollama pull llama3.1:8b
   - ollama pull llama3.1:70b
   - ollama pull qwen2.5:32b
   - ollama pull nomic-embed-text

3. Start Ollama: ollama serve

Your app should now work with local models!
"""
    
    with open(project_path / "OLLAMA_MIGRATION.md", 'w') as f:
        f.write(info_content)

if __name__ == "__main__":
    project_dir = input("Enter project directory path: ")
    if os.path.exists(project_dir):
        migrate_project(project_dir)
        print("\n🎉 Migration complete! Check OLLAMA_MIGRATION.md for next steps.")
    else:
        print("❌ Directory not found!")

Common Problems & Solutions

Problem 1: “ollama: command not found”

# Solution 1: Check if Ollama is installed
which ollama

# Solution 2: Add to PATH (Linux/Mac)
echo 'export PATH=$PATH:/usr/local/bin' >> ~/.bashrc
source ~/.bashrc

# Solution 3: Reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Solution 4: Manual installation
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/local/bin/ollama
sudo chmod +x /usr/local/bin/ollama

Problem 2: “Connection refused” or API not working

# Check if Ollama is running
ps aux | grep ollama

# Start Ollama service
ollama serve

# Check port availability
netstat -tulpn | grep 11434

# Test API endpoint
curl http://localhost:11434/api/version

# Fix CORS issues (if accessing from browser)
export OLLAMA_ORIGINS="*"
ollama serve

# Custom host binding
export OLLAMA_HOST="0.0.0.0:11434"
ollama serve

Problem 3: Model download failures

# Check internet connection
ping ollama.com

# Clear the model cache and retry (WARNING: deletes ALL downloaded models)
rm -rf ~/.ollama/models
ollama pull llama3.1:8b

# Download with specific tag
ollama pull llama3.1:8b-instruct-q4_0

# Check disk space
df -h

# Manual model verification
ollama list
ollama show llama3.1:8b

Problem 4: Out of memory errors

# Check available RAM
free -h

# List running models
ollama ps

# Stop unused models
ollama stop llama3.1:8b

# Configure memory limits
export OLLAMA_MAX_LOADED_MODELS=1
export OLLAMA_NUM_PARALLEL=1
ollama serve

# Use smaller quantized variants
ollama pull llama3.1:8b-instruct-q4_0  # 4-bit quantized

Problem 5: Slow performance

# Check GPU availability
nvidia-smi  # For NVIDIA
rocm-smi   # For AMD

# Enable GPU acceleration
export CUDA_VISIBLE_DEVICES=0
ollama serve

# Optimize CPU usage
export OLLAMA_NUM_PARALLEL=4
export OLLAMA_MAX_QUEUE=512

# Use faster models
ollama pull phi3:mini      # Fastest
ollama pull mistral:7b     # Good balance

Comprehensive Diagnostic Script

#!/bin/bash
# Ollama Health Check Script

echo "🔍 Ollama Diagnostic Report"
echo "=========================="

# Check installation
if command -v ollama &> /dev/null; then
    echo "✅ Ollama installed: $(ollama --version)"
else
    echo "❌ Ollama not found in PATH"
    exit 1
fi

# Check service status
if pgrep -x "ollama" > /dev/null; then
    echo "✅ Ollama service running"
else
    echo "⚠️  Ollama service not running"
    echo "   Start with: ollama serve"
fi

# Check API endpoint
if curl -s http://localhost:11434/api/version > /dev/null; then
    echo "✅ API endpoint responding"
else
    echo "❌ API endpoint not accessible"
fi

# Check system resources
echo ""
echo "💾 System Resources:"
echo "   RAM: $(free -h | awk '/^Mem:/ {print $2}')"
echo "   Available: $(free -h | awk '/^Mem:/ {print $7}')"

# Check GPU
if command -v nvidia-smi &> /dev/null; then
    echo "   GPU: $(nvidia-smi --query-gpu=name --format=csv,noheader,nounits | head -1)"
    echo "   VRAM: $(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -1) MB"
else
    echo "   GPU: Not detected"
fi

# Check models
echo ""
echo "🤖 Installed Models:"
ollama list

echo ""
echo "📊 Performance Test:"
echo "Testing model response time..."
start_time=$(date +%s.%3N)
ollama run llama3.1:8b "Say hello" --verbose 2>/dev/null | head -1
end_time=$(date +%s.%3N)
duration=$(echo "$end_time - $start_time" | bc)
echo "   Response time: ${duration}s"

echo ""
echo "✅ Diagnostic complete!"

Advanced Use Cases with Code Examples

1. Code Review Assistant

#!/usr/bin/env python3
"""
Automated Code Review using Ollama
Analyzes code quality, suggests improvements
"""

from openai import OpenAI
import os
import subprocess
import git

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

class CodeReviewer:
    def __init__(self, model="deepseek-coder:6.7b"):
        self.model = model
        
    def review_file(self, file_path, file_content):
        """Review a single file"""
        prompt = f"""
        Review this {file_path} code for:
        - Security vulnerabilities
        - Performance issues  
        - Code style and best practices
        - Potential bugs
        - Suggestions for improvement

        Code:
        ```
        {file_content}
        ```

        Format your response as:
        ## Security Issues:
        ## Performance Issues:  
        ## Style Issues:
        ## Potential Bugs:
        ## Suggestions:
        """
        
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": "You are an expert code reviewer with 10+ years experience."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.1,
            max_tokens=2000
        )
        
        return response.choices[0].message.content
    
    def review_diff(self, diff_content):
        """Review git diff changes"""
        prompt = f"""
        Review this git diff for potential issues:
        
        {diff_content}
        
        Focus on:
        - Breaking changes
        - Security implications
        - Performance impact
        - Code quality
        """
        
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": "You are reviewing code changes in a pull request."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.1
        )
        
        return response.choices[0].message.content
    
    def review_repository(self, repo_path):
        """Review entire repository"""
        repo = git.Repo(repo_path)
        
        # Get recent commits
        commits = list(repo.iter_commits(max_count=5))
        
        results = {}
        
        for commit in commits:
            print(f"Reviewing commit: {commit.hexsha[:8]}")
            
            # Get diff for this commit
            if commit.parents:
                diff = commit.parents[0].diff(commit)
                diff_text = ""
                
                for diff_item in diff:
                    # a_path is None for newly added files; fall back to b_path
                    path = diff_item.a_path or diff_item.b_path or ""
                    if path.endswith(('.py', '.js', '.java', '.cpp', '.c')):
                        diff_text += f"\n--- {path}\n"
                        if diff_item.diff:
                            diff_text += diff_item.diff.decode('utf-8', errors='ignore')
                
                if diff_text:
                    review = self.review_diff(diff_text)
                    results[commit.hexsha[:8]] = review
        
        return results

def main():
    reviewer = CodeReviewer()
    
    # Review single file
    file_path = input("Enter file path to review: ")
    if os.path.exists(file_path):
        with open(file_path, 'r') as f:
            content = f.read()
        
        print("\n" + "="*50)
        print(f"REVIEWING: {file_path}")
        print("="*50)
        
        review = reviewer.review_file(file_path, content)
        print(review)
    
    # Review git repository
    repo_path = input("\nEnter git repository path (or press Enter to skip): ")
    if repo_path and os.path.exists(repo_path):
        print("\n" + "="*50)
        print("REVIEWING RECENT COMMITS")
        print("="*50)
        
        reviews = reviewer.review_repository(repo_path)
        for commit_hash, review in reviews.items():
            print(f"\n--- Commit {commit_hash} ---")
            print(review)

if __name__ == "__main__":
    main()

2. Document Intelligence System

#!/usr/bin/env python3
"""
Document Intelligence with Ollama
Extract insights from documents, PDFs, images
"""

from openai import OpenAI
import base64
import PyPDF2
from PIL import Image
import io
import os

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

class DocumentIntelligence:
    def __init__(self, vision_model="llava:13b", text_model="llama3.1:8b"):
        self.vision_model = vision_model
        self.text_model = text_model
    
    def analyze_image(self, image_path, query="Describe this image"):
        """Analyze image content"""
        # Note: This example shows the concept. Actual image analysis with Ollama
        # requires using the ollama run command directly or the REST API
        
        with open(image_path, "rb") as img_file:
            img_data = base64.b64encode(img_file.read()).decode()
        
        # For actual implementation, use Ollama's multimodal API
        prompt = f"""
        Analyze this image and answer: {query}
        
        Provide detailed information about:
        - Main subjects/objects
        - Text content (if any)
        - Layout and structure
        - Key insights
        """
        
        # This is a conceptual example - actual multimodal implementation
        # would use Ollama's vision model API
        return self.query_text_model(prompt + f"\n[Image: {image_path}]")
    
    def extract_pdf_text(self, pdf_path):
        """Extract text from PDF"""
        text = ""
        with open(pdf_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)
            for page in pdf_reader.pages:
                text += page.extract_text() + "\n"
        return text
    
    def summarize_document(self, text, summary_type="executive"):
        """Summarize document content"""
        prompts = {
            "executive": "Create an executive summary highlighting key points, decisions, and action items:",
            "technical": "Create a technical summary focusing on methods, findings, and recommendations:",
            "bullet": "Create a bullet-point summary with main topics and subtopics:",
            "abstract": "Create an academic abstract summarizing purpose, methods, results, and conclusions:"
        }
        
        # Truncate the text so the prompt fits in the model's context window
        prompt = f"""
        {prompts.get(summary_type, prompts["executive"])}
        
        Document text:
        {text[:8000]}
        
        Summary:
        """
        
        return self.query_text_model(prompt)
    
    def extract_key_info(self, text, info_type="all"):
        """Extract specific information from text"""
        info_prompts = {
            "dates": "Extract all dates, deadlines, and time-sensitive information:",
            "people": "Extract all names, roles, and contact information:",
            "numbers": "Extract all important numbers, statistics, and financial data:",
            "actions": "Extract all action items, tasks, and next steps:",
            "all": "Extract key information including dates, people, numbers, and action items:"
        }
        
        prompt = f"""
        {info_prompts.get(info_type, info_prompts["all"])}
        
        Text: {text[:8000]}
        
        Extracted information (format as structured list):
        """
        
        return self.query_text_model(prompt)
    
    def query_text_model(self, prompt):
        """Query text model"""
        response = client.chat.completions.create(
            model=self.text_model,
            messages=[
                {"role": "system", "content": "You are a document analysis expert."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.1,
            max_tokens=1500
        )
        
        return response.choices[0].message.content
    
    def analyze_document_batch(self, document_folder):
        """Analyze multiple documents"""
        results = {}
        
        for filename in os.listdir(document_folder):
            file_path = os.path.join(document_folder, filename)
            
            if filename.endswith('.pdf'):
                print(f"Processing PDF: {filename}")
                text = self.extract_pdf_text(file_path)
                summary = self.summarize_document(text)
                key_info = self.extract_key_info(text)
                
                results[filename] = {
                    "type": "PDF",
                    "summary": summary,
                    "key_info": key_info,
                    "text_length": len(text)
                }
                
            elif filename.lower().endswith(('.png', '.jpg', '.jpeg', '.gif')):
                print(f"Processing image: {filename}")
                analysis = self.analyze_image(file_path)
                
                results[filename] = {
                    "type": "Image", 
                    "analysis": analysis
                }
        
        return results

def main():
    doc_ai = DocumentIntelligence()
    
    print("Document Intelligence System")
    print("1. Analyze single PDF")
    print("2. Analyze image")  
    print("3. Batch analyze folder")
    
    choice = input("Choose option (1-3): ")
    
    if choice == "1":
        pdf_path = input("Enter PDF path: ")
        if os.path.exists(pdf_path):
            text = doc_ai.extract_pdf_text(pdf_path)
            print("\n" + "="*50)
            print("DOCUMENT SUMMARY:")
            print("="*50)
            print(doc_ai.summarize_document(text))
            
            print("\n" + "="*50)
            print("KEY INFORMATION:")
            print("="*50)
            print(doc_ai.extract_key_info(text))
    
    elif choice == "2":
        img_path = input("Enter image path: ")
        query = input("What do you want to know about this image? ")
        if os.path.exists(img_path):
            result = doc_ai.analyze_image(img_path, query)
            print("\n" + "="*50)
            print("IMAGE ANALYSIS:")
            print("="*50)
            print(result)
    
    elif choice == "3":
        folder_path = input("Enter folder path: ")
        if os.path.exists(folder_path):
            results = doc_ai.analyze_document_batch(folder_path)
            
            for filename, analysis in results.items():
                print(f"\n{'='*50}")
                print(f"FILE: {filename}")
                print("="*50)
                for key, value in analysis.items():
                    print(f"{key.upper()}:")
                    print(value)
                    print("-" * 30)

if __name__ == "__main__":
    main()
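For real image analysis (which the conceptual `analyze_image` above only stubs out), Ollama's native `/api/generate` endpoint accepts an `images` field of base64-encoded strings for multimodal models such as llava. A minimal sketch, assuming a llava model has been pulled and the server is on its default port; the image filename is just an illustration:

```python
import base64
import json
import urllib.request

def build_vision_request(image_bytes, prompt, model="llava:13b",
                         host="http://localhost:11434"):
    """Build a request for Ollama's native API with a base64-encoded image."""
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode()],
        "stream": False,
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Usage (requires a running Ollama server and a local image file):
#   with open("invoice.png", "rb") as f:
#       req = build_vision_request(f.read(), "Extract the total amount due.")
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```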

3. Real-time Data Analysis Pipeline

#!/usr/bin/env python3
"""
Real-time Data Analysis with Ollama
Process streaming data and generate insights
"""

import json
import time
import threading
from datetime import datetime
from openai import OpenAI
from collections import deque
import pandas as pd
import numpy as np

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

class DataAnalyzer:
    def __init__(self, model="llama3.1:8b"):
        self.model = model
        self.data_buffer = deque(maxlen=1000)
        self.analysis_queue = deque()
        self.is_running = False
        
    def add_data_point(self, data_point):
        """Add new data point to buffer"""
        data_point['timestamp'] = datetime.now().isoformat()
        self.data_buffer.append(data_point)
        
        # Trigger analysis if buffer has enough data
        if len(self.data_buffer) >= 10:
            self.queue_analysis()
    
    def queue_analysis(self):
        """Queue data for analysis"""
        recent_data = list(self.data_buffer)[-10:]  # Last 10 points
        self.analysis_queue.append({
            'data': recent_data,
            'timestamp': datetime.now().isoformat()
        })
    
    def analyze_data_trends(self, data_points):
        """Analyze trends in data"""
        # Convert to DataFrame for analysis
        df = pd.DataFrame(data_points)
        
        # Basic statistics
        numeric_cols = df.select_dtypes(include=[np.number]).columns
        stats = {}
        
        for col in numeric_cols:
            stats[col] = {
                'mean': df[col].mean(),
                'std': df[col].std(),
                'trend': 'increasing' if df[col].iloc[-1] > df[col].iloc[0] else 'decreasing',
                'change_percent': (((df[col].iloc[-1] - df[col].iloc[0]) / df[col].iloc[0]) * 100
                                   if df[col].iloc[0] != 0 else float('nan'))
            }
        
        # Generate AI analysis
        prompt = f"""
        Analyze this data pattern and provide insights:
        
        Data Statistics:
        {json.dumps(stats, indent=2, default=str)}
        
        Recent Data Points:
        {json.dumps(data_points[-5:], indent=2, default=str)}
        
        Provide:
        1. Key trends and patterns
        2. Anomalies or outliers
        3. Predictions for next few data points
        4. Recommended actions
        """
        
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": "You are a data analyst expert specializing in trend analysis."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.3,
            max_tokens=800
        )
        
        return {
            'statistics': stats,
            'ai_insights': response.choices[0].message.content,
            'timestamp': datetime.now().isoformat()
        }
    
    def detect_anomalies(self, data_points, threshold=2):
        """Detect statistical anomalies"""
        if len(data_points) < 5:
            return []
        
        df = pd.DataFrame(data_points)
        numeric_cols = df.select_dtypes(include=[np.number]).columns
        anomalies = []
        
        for col in numeric_cols:
            values = df[col].values
            mean = np.mean(values)
            std = np.std(values)
            
            for i, value in enumerate(values):
                z_score = abs((value - mean) / std) if std > 0 else 0
                if z_score > threshold:
                    anomalies.append({
                        'column': col,
                        'value': value,
                        'z_score': z_score,
                        'timestamp': data_points[i].get('timestamp'),
                        'index': i
                    })
        
        return anomalies
    
    def generate_report(self, analysis_results):
        """Generate comprehensive report"""
        prompt = f"""
        Generate an executive report based on this data analysis:
        
        Analysis Results:
        {json.dumps(analysis_results, indent=2, default=str)}
        
        Format the report with:
        ## Executive Summary
        ## Key Findings  
        ## Trend Analysis
        ## Risk Assessment
        ## Recommendations
        ## Next Steps
        """
        
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": "You are creating executive reports for business stakeholders."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.2,
            max_tokens=1200
        )
        
        return response.choices[0].message.content
    
    def start_analysis_worker(self):
        """Start background analysis worker"""
        self.is_running = True
        
        def worker():
            while self.is_running:
                if self.analysis_queue:
                    analysis_job = self.analysis_queue.popleft()
                    
                    try:
                        # Perform analysis
                        results = self.analyze_data_trends(analysis_job['data'])
                        
                        # Detect anomalies
                        anomalies = self.detect_anomalies(analysis_job['data'])
                        results['anomalies'] = anomalies
                        
                        # Generate alerts for significant findings
                        if anomalies:
                            print(f"🚨 ANOMALIES DETECTED: {len(anomalies)} items")
                        
                        # Print insights
                        print(f"\n📊 ANALYSIS UPDATE ({results['timestamp']})")
                        print("="*60)
                        print(results['ai_insights'])
                        
                        if anomalies:
                            print(f"\n⚠️  ANOMALIES:")
                            for anomaly in anomalies[:3]:  # Show top 3
                                print(f"  - {anomaly['column']}: {anomaly['value']} (Z-score: {anomaly['z_score']:.2f})")
                        
                    except Exception as e:
                        print(f"Analysis error: {e}")
                
                time.sleep(5)  # Check every 5 seconds
        
        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
    
    def stop_analysis_worker(self):
        """Stop background analysis"""
        self.is_running = False

# Example usage and data simulation
def simulate_ecommerce_data():
    """Simulate e-commerce data stream"""
    import random
    
    base_sales = 1000
    base_visitors = 5000
    
    while True:
        # Simulate business metrics
        hour = datetime.now().hour
        
        # Business hours multiplier
        business_multiplier = 1.5 if 9 <= hour <= 17 else 0.8
        
        # Add random fluctuation
        sales = int(base_sales * business_multiplier * (1 + random.uniform(-0.3, 0.3)))
        visitors = int(base_visitors * business_multiplier * (1 + random.uniform(-0.2, 0.4)))
        conversion_rate = (sales / visitors) * 100 if visitors > 0 else 0
        avg_order_value = random.uniform(50, 200)
        
        # Occasionally inject anomalies
        if random.random() < 0.05:  # 5% chance
            sales = int(sales * random.choice([0.3, 3.0]))  # Dramatic drop or spike
        
        yield {
            'sales': sales,
            'visitors': visitors,
            'conversion_rate': conversion_rate,
            'avg_order_value': avg_order_value,
            'hour': hour
        }

def main():
    analyzer = DataAnalyzer("llama3.1:8b")
    analyzer.start_analysis_worker()
    
    print("🚀 Starting real-time data analysis...")
    print("Simulating e-commerce data stream...")
    
    try:
        data_stream = simulate_ecommerce_data()
        
        for i, data_point in enumerate(data_stream):
            analyzer.add_data_point(data_point)
            
            # Print current data point
            print(f"Data point {i+1}: Sales: {data_point['sales']}, "
                  f"Visitors: {data_point['visitors']}, "
                  f"Conversion: {data_point['conversion_rate']:.2f}%")
            
            time.sleep(2)  # New data every 2 seconds
            
    except KeyboardInterrupt:
        print("\n🛑 Stopping analysis...")
        analyzer.stop_analysis_worker()

if __name__ == "__main__":
    main()
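The `detect_anomalies` method above is a plain z-score filter: any value more than `threshold` standard deviations from the column mean gets flagged. The same check, stripped down to a standalone sketch (illustrative only, not part of the class):

```python
import numpy as np

def zscore_outliers(values, threshold=2.0):
    """Return (index, value, z_score) tuples for points beyond `threshold` std devs."""
    arr = np.asarray(values, dtype=float)
    mean, std = arr.mean(), arr.std()
    if std == 0:
        return []  # A constant series has no outliers
    z = np.abs((arr - mean) / std)
    return [(i, arr[i], z[i]) for i in np.flatnonzero(z > threshold)]

# A flat series with one spike: only the spike at index 4 is flagged
print(zscore_outliers([10, 11, 9, 10, 100, 10, 11, 9]))
```

Note that with fewer than roughly five points the mean and standard deviation are too noisy to trust, which is why the class version returns early on short buffers.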

Performance Optimization Tips {#optimization}

Hardware Optimization

# GPU memory optimization (variable names vary by Ollama release; run `ollama serve --help` to confirm)
export OLLAMA_MAX_LOADED_MODELS=1   # Keep a single model resident in memory

# CPU / concurrency optimization
export OLLAMA_NUM_PARALLEL=8        # Concurrent requests served per model

# Context window management (newer releases; older builds set num_ctx per request instead)
export OLLAMA_CONTEXT_LENGTH=4096

Model Quantization Guide

#!/usr/bin/env python3
"""
Model Quantization and Performance Testing
Find the best quantization for your hardware
"""

import time
import subprocess
import psutil
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# Available quantization levels
QUANTIZATIONS = {
    'fp16': 'Full precision (largest, best quality)',
    'q8_0': '8-bit quantization (good balance)',
    'q4_0': '4-bit quantization (smaller, faster)',
    'q2_K': '2-bit quantization (smallest, fastest)'
}
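A model's memory footprint scales roughly with bits per weight, so you can estimate whether a quantization will fit in RAM or VRAM before pulling it. The bits-per-weight figures below are rough approximations (quantization formats carry per-block overhead), and the helper name is ours, not part of Ollama:

```python
# Approximate bits per weight for common GGUF quantizations (rough figures, not exact)
BITS_PER_WEIGHT = {'fp16': 16.0, 'q8_0': 8.5, 'q4_0': 4.5, 'q2_K': 2.6}

def estimate_size_gb(params_billion, quant):
    """Rough model size in GB: parameters x bits-per-weight / 8 bits-per-byte."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"8B model @ {quant:5}: ~{estimate_size_gb(8, quant):.1f} GB")
```

By this estimate an 8B model needs about 16 GB at fp16 but only about 4.5 GB at q4_0, which is why 4-bit quantizations are the usual starting point on consumer hardware.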

def test_model_performance(model_name, test_prompt="Explain machine learning"):
    """Test model response time and quality"""
    start_time = time.time()
    
    try:
        response = client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": test_prompt}],
            max_tokens=100
        )
        
        end_time = time.time()
        response_time = end_time - start_time
        response_text = response.choices[0].message.content
        
        # Prefer the server-reported completion token count; fall back to max_tokens
        usage = getattr(response, 'usage', None)
        completion_tokens = usage.completion_tokens if usage else 100
        
        return {
            'response_time': response_time,
            'response_length': len(response_text),
            'tokens_per_second': completion_tokens / response_time,
            'memory_usage': psutil.virtual_memory().percent
        }
        
    except Exception as e:
        return {'error': str(e)}

def benchmark_quantizations(base_model):
    """Benchmark different quantizations of a model"""
    results = {}
    
    for quant, description in QUANTIZATIONS.items():
        # Ollama appends the quantization to the variant tag,
        # e.g. llama3.1:8b-instruct-q4_0 -- adjust if your model uses a different scheme
        model_name = f"{base_model}-{quant}"
        
        print(f"\nTesting {model_name}...")
        
        # Try to pull model
        try:
            subprocess.run(['ollama', 'pull', model_name], 
                         capture_output=True, check=True)
        except subprocess.CalledProcessError:
            print(f"❌ Failed to pull {model_name}")
            continue
        
        # Test performance
        perf = test_model_performance(model_name)
        results[quant] = perf
        
        if 'error' not in perf:
            print(f"✅ {description}")
            print(f"   Response time: {perf['response_time']:.2f}s")
            print(f"   Tokens/second: {perf['tokens_per_second']:.1f}")
            print(f"   Memory usage: {perf['memory_usage']:.1f}%")
        else:
            print(f"❌ Error: {perf['error']}")
    
    return results

if __name__ == "__main__":
    model = input("Enter model tag without quantization suffix (e.g., llama3.1:8b-instruct): ").strip()
    results = benchmark_quantizations(model)
    
    print("\n" + "="*60)
    print("BENCHMARK RESULTS")
    print("="*60)
    
    for quant, result in results.items():
        if 'error' not in result:
            print(f"{quant:10} | {result['tokens_per_second']:6.1f} tok/s | "
                  f"{result['response_time']:6.2f}s | {result['memory_usage']:5.1f}% RAM")
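Once the benchmark finishes, it helps to rank the surviving quantizations by throughput. A small helper assuming the `results` dict shape produced by `benchmark_quantizations` above (quant names mapped to per-model metric dicts, with failed pulls carrying an `'error'` key):

```python
def rank_by_throughput(results):
    """Sort successful benchmark runs by tokens/second, fastest first."""
    ok = {q: r for q, r in results.items() if 'error' not in r}
    return sorted(ok.items(), key=lambda kv: kv[1]['tokens_per_second'], reverse=True)

# Hypothetical results dict matching the benchmark's output shape
sample = {
    'q4_0': {'tokens_per_second': 42.0, 'response_time': 2.4},
    'fp16': {'tokens_per_second': 18.0, 'response_time': 5.5},
    'q2_K': {'error': 'pull failed'},
}
for quant, metrics in rank_by_throughput(sample):
    print(f"{quant}: {metrics['tokens_per_second']:.1f} tok/s")
```

Failed runs are filtered out rather than sorted, so a model that never pulled does not skew the ranking.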

Integration Examples {#integration-examples}

VS Code Extension Integration

// VS Code extension for Ollama integration
import * as vscode from 'vscode';
import axios from 'axios';

export function activate(context: vscode.ExtensionContext) {
    let disposable = vscode.commands.registerCommand('ollama.explainCode', async () => {
        const editor = vscode.window.activeTextEditor;
        if (!editor) {
            vscode.window.showErrorMessage('No active editor');
            return;
        }

        const selection = editor.selection;
        const selectedText = editor.document.getText(selection);

        if (!selectedText) {
            vscode.window.showErrorMessage('No code selected');
            return;
        }

        try {
            const explanation = await explainCode(selectedText);
            
            // Show explanation in new document
            const doc = await vscode.workspace.openTextDocument({
                content: explanation,
                language: 'markdown'
            });
            
            await vscode.window.showTextDocument(doc);
            
        } catch (error) {
            vscode.window.showErrorMessage(`Error: ${error}`);
        }
    });

    context.subscriptions.push(disposable);
}

async function explainCode(code: string): Promise<string> {
    const response = await axios.post('http://localhost:11434/v1/chat/completions', {
        model: 'deepseek-coder:6.7b',
        messages: [
            {
                role: 'system',
                content: 'You are a code explanation expert. Explain code clearly and concisely.'
            },
            {
                role: 'user', 
                content: `Explain this code:\n\n${code}`
            }
        ],
        temperature: 0.1,
        max_tokens: 1000
    });
    
    return response.data.choices[0].message.content;
}

export function deactivate() {}

Slack Bot Integration

#!/usr/bin/env python3
"""
Slack Bot powered by Ollama
Smart assistant for your team
"""

import os
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler
from openai import OpenAI

# Initialize Slack app
app = App(token=os.environ.get("SLACK_BOT_TOKEN"))

# Initialize Ollama client
ollama_client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

@app.message("hello")
def handle_hello(message, say):
    """Handle hello messages"""
    say(f"Hi <@{message['user']}>! I'm your AI assistant powered by Ollama. How can I help?")

@app.message("explain")
def handle_explain(message, say, client):
    """Explain code or concepts"""
    text = message['text'].replace('explain', '').strip()
    
    if not text:
        say("Please provide something to explain. Example: `explain recursion in Python`")
        return
    
    try:
        response = ollama_client.chat.completions.create(
            model="llama3.1:8b",
            messages=[
                {"role": "system", "content": "You are a helpful technical assistant. Provide clear, concise explanations."},
                {"role": "user", "content": f"Explain: {text}"}
            ],
            temperature=0.3,
            max_tokens=800
        )
        
        explanation = response.choices[0].message.content
        
        # Post as thread reply
        client.chat_postMessage(
            channel=message['channel'],
            thread_ts=message['ts'],
            text=f"```\n{explanation}\n```"
        )
        
    except Exception as e:
        say(f"Sorry, I encountered an error: {str(e)}")

@app.command("/analyze")
def handle_analyze_command(ack, respond, command):
    """Analyze code or data"""
    ack()
    
    text = command['text']
    
    try:
        response = ollama_client.chat.completions.create(
            model="deepseek-coder:6.7b",
            messages=[
                {"role": "system", "content": "You are a code analysis expert."},
                {"role": "user", "content": f"Analyze this: {text}"}
            ],
            temperature=0.2
        )
        
        analysis = response.choices[0].message.content
        respond(f"Analysis:\n```\n{analysis}\n```")
        
    except Exception as e:
        respond(f"Error: {str(e)}")

@app.event("file_shared")  
def handle_file_share(event, say):
    """Analyze shared files"""
    # This would integrate with document analysis
    say("I can analyze documents! Upload a file and mention me.")

if __name__ == "__main__":
    # Start the app
    handler = SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"])
    handler.start()

Web Dashboard

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Ollama Dashboard</title>
    <style>
        body { 
            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
            max-width: 1200px;
            margin: 0 auto;
            padding: 20px;
            background: #f5f5f5;
        }
        
        .card {
            background: white;
            border-radius: 10px;
            padding: 20px;
            margin-bottom: 20px;
            box-shadow: 0 2px 10px rgba(0,0,0,0.1);
        }
        
        .chat-container {
            height: 400px;
            overflow-y: auto;
            border: 1px solid #ddd;
            padding: 15px;
            background: #f9f9f9;
        }
        
        .message {
            margin-bottom: 15px;
            padding: 10px;
            border-radius: 8px;
        }
        
        .user { background: #007bff; color: white; margin-left: 20%; }
        .assistant { background: #e9ecef; margin-right: 20%; }
        
        .input-group {
            display: flex;
            gap: 10px;
            margin-top: 10px;
        }
        
        input, select, textarea {
            padding: 10px;
            border: 1px solid #ddd;
            border-radius: 5px;
        }
        
        button {
            background: #007bff;
            color: white;
            border: none;
            padding: 10px 20px;
            border-radius: 5px;
            cursor: pointer;
        }
        
        button:hover { background: #0056b3; }
        
        .model-info {
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
            gap: 15px;
        }
        
        .metric {
            text-align: center;
            padding: 15px;
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            border-radius: 8px;
        }
    </style>
</head>
<body>
    <h1>🤖 Ollama AI Dashboard</h1>
    
    <div class="card">
        <h2>Model Selection</h2>
        <div class="input-group">
            <select id="modelSelect">
                <option value="llama3.1:8b">Llama 3.1 8B</option>
                <option value="mistral:7b">Mistral 7B</option>
                <option value="deepseek-coder:6.7b">DeepSeek Coder</option>
            </select>
            <button onclick="loadModels()">Refresh Models</button>
            <button onclick="getModelInfo()">Model Info</button>
        </div>
    </div>
    
    <div class="card">
        <h2>System Status</h2>
        <div class="model-info">
            <div class="metric">
                <h3 id="responseTime">-- ms</h3>
                <p>Response Time</p>
            </div>
            <div class="metric">
                <h3 id="memoryUsage">-- %</h3>
                <p>Memory Usage</p>
            </div>
            <div class="metric">
                <h3 id="tokensPerSec">-- tok/s</h3>
                <p>Tokens/Second</p>
            </div>
        </div>
    </div>
    
    <div class="card">
        <h2>AI Chat</h2>
        <div id="chatContainer" class="chat-container"></div>
        <div class="input-group">
            <textarea id="messageInput" placeholder="Type your message..." rows="2" style="flex: 1;"></textarea>
            <button onclick="sendMessage()">Send</button>
            <button onclick="clearChat()">Clear</button>
        </div>
    </div>

    <script>
        class OllamaDashboard {
            constructor() {
                this.baseUrl = 'http://localhost:11434';
                this.currentModel = 'llama3.1:8b';
                this.conversation = [];
                
                this.loadModels();
                this.updateStatus();
                setInterval(() => this.updateStatus(), 5000);
            }

            async loadModels() {
                try {
                    const response = await fetch(`${this.baseUrl}/api/tags`);
                    const data = await response.json();
                    
                    const select = document.getElementById('modelSelect');
                    select.innerHTML = '';
                    
                    data.models?.forEach(model => {
                        const option = document.createElement('option');
                        option.value = model.name;
                        option.textContent = model.name;
                        select.appendChild(option);
                    });
                } catch (error) {
                    console.error('Failed to load models:', error);
                }
            }

            async getModelInfo() {
                const model = document.getElementById('modelSelect').value;
                try {
                    const response = await fetch(`${this.baseUrl}/api/show`, {
                        method: 'POST',
                        headers: { 'Content-Type': 'application/json' },
                        body: JSON.stringify({ model: model })
                    });
                    
                    const data = await response.json();
                    alert(`Model: ${model}\nParameters: ${data.details?.parameter_size || 'Unknown'}\nFamily: ${data.details?.family || 'Unknown'}`);
                } catch (error) {
                    console.error('Failed to get model info:', error);
                }
            }

            async updateStatus() {
                try {
                    // Test response time
                    const startTime = performance.now();
                    const response = await fetch(`${this.baseUrl}/api/version`);
                    const endTime = performance.now();
                    
                    if (response.ok) {
                        document.getElementById('responseTime').textContent = `${Math.round(endTime - startTime)} ms`;
                    }
                    
                    // Mock memory and token metrics (would need actual implementation)
                    document.getElementById('memoryUsage').textContent = `${Math.round(Math.random() * 30 + 50)}%`;
                    document.getElementById('tokensPerSec').textContent = `${Math.round(Math.random() * 20 + 30)} tok/s`;
                    
                } catch (error) {
                    document.getElementById('responseTime').textContent = 'Offline';
                }
            }

            async sendMessage() {
                const input = document.getElementById('messageInput');
                const message = input.value.trim();
                
                if (!message) return;
                
                // Add user message to chat
                this.addMessage('user', message);
                input.value = '';
                
                // Show typing indicator
                const typingDiv = this.addMessage('assistant', 'Thinking...');
                
                try {
                    const model = document.getElementById('modelSelect').value;
                    const response = await fetch(`${this.baseUrl}/v1/chat/completions`, {
                        method: 'POST',
                        headers: { 'Content-Type': 'application/json' },
                        body: JSON.stringify({
                            model: model,
                            messages: [
                                ...this.conversation,
                                { role: 'user', content: message }
                            ],
                            temperature: 0.7,
                            stream: false
                        })
                    });
                    
                    const data = await response.json();
                    const aiResponse = data.choices[0].message.content;
                    
                    // Update conversation history
                    this.conversation.push({ role: 'user', content: message });
                    this.conversation.push({ role: 'assistant', content: aiResponse });
                    
                    // Replace typing indicator with actual response
                    typingDiv.textContent = aiResponse;
                    
                } catch (error) {
                    typingDiv.textContent = `Error: ${error.message}`;
                    typingDiv.style.background = '#ffebee';
                }
            }

            addMessage(role, content) {
                const container = document.getElementById('chatContainer');
                const messageDiv = document.createElement('div');
                messageDiv.className = `message ${role}`;
                messageDiv.textContent = content;
                
                container.appendChild(messageDiv);
                container.scrollTop = container.scrollHeight;
                
                return messageDiv;
            }

            clearChat() {
                document.getElementById('chatContainer').innerHTML = '';
                this.conversation = [];
            }
        }

        // Global functions for HTML onclick handlers
        let dashboard;

        document.addEventListener('DOMContentLoaded', () => {
            dashboard = new OllamaDashboard();
        });

        function loadModels() {
            dashboard.loadModels();
        }

        function getModelInfo() {
            dashboard.getModelInfo();
        }

        function sendMessage() {
            dashboard.sendMessage();
        }

        function clearChat() {
            dashboard.clearChat();
        }

        // Enter key to send message
        document.addEventListener('keydown', (e) => {
            if (e.key === 'Enter' && e.target.id === 'messageInput' && !e.shiftKey) {
                e.preventDefault();
                sendMessage();
            }
        });
    </script>
</body>
</html>

Conclusion

Ollama has transformed the AI landscape by making powerful language models accessible to everyone. Whether you’re a developer building the next AI-powered app, a researcher exploring language models, or a business looking to leverage AI without breaking the bank, Ollama provides the perfect solution.

Key Takeaways:

🎯 Start Simple: Begin with llama3.1:8b for the best balance
🚀 Scale Gradually: Upgrade to larger models as you need more capability
💰 Save Money: Replace expensive API calls with free local models
🔒 Stay Private: Keep sensitive data on your own hardware
⚡ Go Fast: Optimize performance with the right quantization

What’s Next?

  • Ollama is rapidly evolving with new features like tool calling, multimodal support, and improved performance
  • The community is building amazing integrations and applications
  • More models are being added regularly, including specialized variants for coding, vision, and reasoning

Ready to get started? Pick the installation method for your platform, pull your first model, and join the local AI revolution!


Found this guide helpful? Share it with your team and help spread the word about local AI. The future is decentralized, private, and powerful – and it starts with Ollama.

Have Queries? Join https://launchpass.com/collabnix
