Run ChatGPT-style AI models on your laptop for free: no API bills, complete privacy, and unlimited usage!
Ollama has transformed local AI by making it remarkably easy to run powerful open-weight language models on your own hardware. Think of it as “Docker for AI models”: one command and you’re running capable chat models, completely offline.
Why Developers Are Switching to Ollama:
✅ Zero API costs – Run unlimited queries
✅ Complete privacy – Data never leaves your machine
✅ Offline capable – Works without internet
✅ OpenAI API compatible – Drop-in replacement for existing apps
✅ 100+ models available – From 1B to 70B+ parameters
Installation Guide for Every Platform
macOS Installation
# Option 1: Download the desktop app from https://ollama.com/download
# Option 2: Using Homebrew
brew install ollama
# Verify installation
ollama --version
# Start the service
ollama serve
Windows Installation
# Option 1: Download from official website
# Visit https://ollama.com and download .exe installer
# Option 2: Using winget
winget install Ollama.Ollama
# Verify installation
ollama --version
# Start Ollama
ollama serve
Linux Installation (Ubuntu/Debian)
# One-line install
curl -fsSL https://ollama.com/install.sh | sh
# Alternative: Manual installation
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama
sudo chmod +x /usr/bin/ollama
# Create a dedicated system user for the service
sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
# Create systemd service
sudo tee /etc/systemd/system/ollama.service > /dev/null <<EOF
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
EOF
# Start and enable service
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
# Check status
sudo systemctl status ollama
Docker Installation
# Run Ollama in Docker
docker run -d \
--name ollama \
-p 11434:11434 \
-v ollama:/root/.ollama \
ollama/ollama
# For GPU support (NVIDIA; requires the NVIDIA Container Toolkit)
docker run -d \
--gpus=all \
--name ollama-gpu \
-p 11434:11434 \
-v ollama:/root/.ollama \
ollama/ollama
# Pull a model
docker exec -it ollama ollama pull llama3.1:8b
Quick Start Test
# Pull your first model
ollama pull llama3.1:8b
# Start chatting
ollama run llama3.1:8b "Hello! Explain quantum computing in simple terms."
# List installed models
ollama list
# Show currently loaded models
ollama ps
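The CLI is the fastest way in, but everything above is also available over Ollama's native REST API at `http://localhost:11434/api/generate`, whose responses include timing fields. As a sketch, here is how you might turn those fields into a tokens-per-second figure (the sample payload values below are illustrative, not captured output):

```python
# Parse the timing fields Ollama's native /api/generate endpoint returns:
# eval_count = tokens generated, eval_duration = generation time in nanoseconds.

def tokens_per_second(response: dict) -> float:
    """Compute generation speed from an Ollama /api/generate response."""
    eval_count = response.get("eval_count", 0)
    eval_duration_ns = response.get("eval_duration", 0)
    if eval_duration_ns == 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)

# Example with an illustrative response payload
sample = {"model": "llama3.1:8b", "response": "Hello!",
          "eval_count": 120, "eval_duration": 2_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")  # 60.0 tokens/s
```

The same fields appear in `ollama run --verbose` output, so this is a handy way to compare models on your own hardware.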
Best Models in 2025: Performance Comparison
🏆 Top Recommended Models by Use Case
General Purpose & Chat
# Best overall balance (8GB+ RAM required)
ollama pull llama3.1:8b
# Lightweight option (4GB RAM)
ollama pull phi3:mini
# Multilingual powerhouse (16GB+ RAM)
ollama pull qwen2.5:14b
Coding & Development
# Code generation specialist
ollama pull deepseek-coder:6.7b
# Python expert
ollama pull codellama:13b-python
# General coding (34B version for best quality)
ollama pull codellama:34b-instruct
Vision & Multimodal
# Image understanding
ollama pull llava:13b
# Latest vision model
ollama pull llava:34b
# Document analysis
ollama pull qwen2-vl:7b
Model Performance Comparison Table
| Model | Size | RAM Required | Best For | Speed | Quality |
|---|---|---|---|---|---|
| phi3:mini | 3.8B | 4GB | Quick tasks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| mistral:7b | 7B | 8GB | General use | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| llama3.1:8b | 8B | 8GB | Best balance | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| codellama:13b | 13B | 16GB | Coding | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| llama3.1:70b | 70B | 64GB+ | Maximum quality | ⭐⭐ | ⭐⭐⭐⭐⭐ |
Model Selection Script
#!/usr/bin/env python3
"""
Ollama Model Recommender
Suggests the best model based on your hardware and use case
"""
import psutil
import subprocess
import json
def get_system_info():
"""Get system RAM and GPU info"""
ram_gb = psutil.virtual_memory().total / (1024**3)
try:
# Check for NVIDIA GPU
result = subprocess.run(['nvidia-smi', '--query-gpu=memory.total', '--format=csv,noheader,nounits'],
capture_output=True, text=True)
if result.returncode == 0:
gpu_memory = int(result.stdout.strip()) / 1024 # Convert MB to GB
else:
gpu_memory = 0
except FileNotFoundError:
gpu_memory = 0
return ram_gb, gpu_memory
def recommend_model(ram_gb, gpu_memory, use_case):
"""Recommend best model based on hardware and use case"""
recommendations = {
"coding": {
"low": "phi3:mini",
"medium": "deepseek-coder:6.7b",
"high": "codellama:34b-instruct"
},
"general": {
"low": "phi3:mini",
"medium": "llama3.1:8b",
"high": "llama3.1:70b"
},
"vision": {
"low": "moondream:1.8b",
"medium": "llava:7b",
"high": "llava:34b"
}
}
total_memory = max(ram_gb, gpu_memory)
if total_memory < 8:
tier = "low"
elif total_memory < 32:
tier = "medium"
else:
tier = "high"
return recommendations.get(use_case, recommendations["general"])[tier]
def install_model(model_name):
"""Install recommended model"""
print(f"Installing {model_name}...")
result = subprocess.run(['ollama', 'pull', model_name], capture_output=True, text=True)
if result.returncode == 0:
print(f"✅ Successfully installed {model_name}")
return True
else:
print(f"❌ Failed to install {model_name}: {result.stderr}")
return False
if __name__ == "__main__":
ram, gpu_mem = get_system_info()
print(f"System Info: {ram:.1f}GB RAM, {gpu_mem:.1f}GB GPU")
use_case = input("Use case (coding/general/vision): ").lower()
recommended = recommend_model(ram, gpu_mem, use_case)
print(f"Recommended model: {recommended}")
if input("Install now? (y/N): ").lower() == 'y':
install_model(recommended)
OpenAI API Compatibility: Drop-in Replacement
Ollama’s killer feature is an OpenAI-compatible API covering the endpoints most apps rely on, including chat completions and embeddings. In most cases you can swap expensive API calls for local models by changing just two values: the base URL and the API key!
Basic Setup
# Start Ollama with OpenAI compatibility
ollama serve
# Test the API
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1:8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain machine learning in 3 sentences."}
],
"temperature": 0.7,
"max_tokens": 150
}'
Python Implementation
#!/usr/bin/env python3
"""
Drop-in OpenAI replacement using Ollama
Just change the base_url and api_key!
"""
from openai import OpenAI
import os
# Initialize Ollama client (OpenAI compatible)
client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama" # Required but unused
)
class OllamaChat:
def __init__(self, model="llama3.1:8b"):
self.model = model
self.conversation = []
def chat(self, message, system_prompt=None):
"""Send a chat message and get response"""
messages = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
# Add conversation history
messages.extend(self.conversation)
messages.append({"role": "user", "content": message})
try:
response = client.chat.completions.create(
model=self.model,
messages=messages,
temperature=0.7,
max_tokens=1000
)
ai_response = response.choices[0].message.content
# Update conversation history
self.conversation.append({"role": "user", "content": message})
self.conversation.append({"role": "assistant", "content": ai_response})
return ai_response
except Exception as e:
return f"Error: {str(e)}"
def stream_chat(self, message):
"""Stream chat response in real-time"""
messages = self.conversation + [{"role": "user", "content": message}]
stream = client.chat.completions.create(
model=self.model,
messages=messages,
temperature=0.7,
stream=True
)
full_response = ""
for chunk in stream:
if chunk.choices[0].delta.content is not None:
content = chunk.choices[0].delta.content
print(content, end="", flush=True)
full_response += content
print() # New line after streaming
# Update conversation
self.conversation.extend([
{"role": "user", "content": message},
{"role": "assistant", "content": full_response}
])
return full_response
def clear_conversation(self):
"""Reset conversation history"""
self.conversation = []
# Example usage
if __name__ == "__main__":
chat = OllamaChat("llama3.1:8b")
# Regular chat
response = chat.chat("What is Python?")
print(f"AI: {response}")
# Follow-up question (with context)
response = chat.chat("Give me a simple example")
print(f"AI: {response}")
# Streaming response
print("\nStreaming response:")
chat.stream_chat("Write a Python function to sort a list")
Node.js Implementation
// npm install openai
const OpenAI = require('openai');
const client = new OpenAI({
baseURL: 'http://localhost:11434/v1',
apiKey: 'ollama' // Required but unused
});
class OllamaChat {
constructor(model = 'llama3.1:8b') {
this.model = model;
this.conversation = [];
}
async chat(message, systemPrompt = null) {
const messages = [];
if (systemPrompt) {
messages.push({ role: 'system', content: systemPrompt });
}
messages.push(...this.conversation);
messages.push({ role: 'user', content: message });
try {
const response = await client.chat.completions.create({
model: this.model,
messages: messages,
temperature: 0.7,
max_tokens: 1000
});
const aiResponse = response.choices[0].message.content;
// Update conversation history
this.conversation.push({ role: 'user', content: message });
this.conversation.push({ role: 'assistant', content: aiResponse });
return aiResponse;
} catch (error) {
return `Error: ${error.message}`;
}
}
async streamChat(message) {
const messages = [...this.conversation, { role: 'user', content: message }];
const stream = await client.chat.completions.create({
model: this.model,
messages: messages,
temperature: 0.7,
stream: true
});
let fullResponse = '';
for await (const chunk of stream) {
if (chunk.choices[0]?.delta?.content) {
const content = chunk.choices[0].delta.content;
process.stdout.write(content);
fullResponse += content;
}
}
console.log(); // New line
// Update conversation
this.conversation.push({ role: 'user', content: message });
this.conversation.push({ role: 'assistant', content: fullResponse });
return fullResponse;
}
clearConversation() {
this.conversation = [];
}
}
// Example usage
async function main() {
const chat = new OllamaChat('llama3.1:8b');
// Regular chat
const response = await chat.chat('Explain async/await in JavaScript');
console.log('AI:', response);
// Streaming response
console.log('\nStreaming response:');
await chat.streamChat('Give me a practical example');
}
main().catch(console.error);
Migration Script for Existing OpenAI Apps
#!/usr/bin/env python3
"""
Migrate existing OpenAI applications to Ollama
Replace API calls automatically
"""
import re
import os
import shutil
from pathlib import Path
def migrate_python_file(file_path):
"""Migrate a Python file from OpenAI to Ollama"""
with open(file_path, 'r') as f:
content = f.read()
# Backup original file
shutil.copy2(file_path, f"{file_path}.backup")
# Replace OpenAI initialization
content = re.sub(
r'OpenAI\(\s*api_key\s*=\s*[^)]+\)',
'OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")',
content
)
# Replace model names
model_mapping = {
'gpt-3.5-turbo': 'llama3.1:8b',
'gpt-4': 'llama3.1:70b',
'gpt-4-turbo': 'qwen2.5:32b',
'text-embedding-ada-002': 'nomic-embed-text'
}
for openai_model, ollama_model in model_mapping.items():
content = re.sub(
rf'["\']{re.escape(openai_model)}["\']',
f'"{ollama_model}"',
content
)
# Write modified content
with open(file_path, 'w') as f:
f.write(content)
print(f"✅ Migrated {file_path}")
def migrate_project(project_dir):
"""Migrate an entire project"""
project_path = Path(project_dir)
# Find Python files
python_files = list(project_path.rglob("*.py"))
print(f"Found {len(python_files)} Python files to migrate")
for file_path in python_files:
if 'venv' not in str(file_path) and '.git' not in str(file_path):
migrate_python_file(file_path)
# Create migration info file
info_content = """
# Ollama Migration Complete
Your project has been migrated to use Ollama instead of OpenAI.
## Changes Made:
- OpenAI client initialization updated to use localhost
- Model names mapped to Ollama equivalents
- Original files backed up with .backup extension
## Setup Required:
1. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
2. Pull required models:
- ollama pull llama3.1:8b
- ollama pull llama3.1:70b
- ollama pull qwen2.5:32b
- ollama pull nomic-embed-text
3. Start Ollama: ollama serve
Your app should now work with local models!
"""
with open(project_path / "OLLAMA_MIGRATION.md", 'w') as f:
f.write(info_content)
if __name__ == "__main__":
project_dir = input("Enter project directory path: ")
if os.path.exists(project_dir):
migrate_project(project_dir)
print("\n🎉 Migration complete! Check OLLAMA_MIGRATION.md for next steps.")
else:
print("❌ Directory not found!")
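The mapping above also swaps OpenAI's `text-embedding-ada-002` for `nomic-embed-text`. After migration, embeddings come back as plain float vectors and similarity is computed client-side; a minimal sketch (the call in the comment assumes a running Ollama server with the model pulled):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# With Ollama running, vectors come from the OpenAI-compatible endpoint:
#   emb = client.embeddings.create(model="nomic-embed-text",
#                                  input="hello world").data[0].embedding
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 1.0, 1.0]
print(round(cosine_similarity(v1, v2), 4))  # 0.8165
```

Note that local embedding vectors have different dimensions than OpenAI's, so any stored embeddings must be regenerated after migrating.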
Common Problems & Solutions
Problem 1: “ollama: command not found”
# Solution 1: Check if Ollama is installed
which ollama
# Solution 2: Add to PATH (Linux/Mac)
echo 'export PATH=$PATH:/usr/local/bin' >> ~/.bashrc
source ~/.bashrc
# Solution 3: Reinstall Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Solution 4: Manual installation
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/local/bin/ollama
sudo chmod +x /usr/local/bin/ollama
Problem 2: “Connection refused” or API not working
# Check if Ollama is running
ps aux | grep ollama
# Start Ollama service
ollama serve
# Check port availability
netstat -tulpn | grep 11434
# Test API endpoint
curl http://localhost:11434/api/version
# Fix CORS issues (if accessing from browser)
export OLLAMA_ORIGINS="*"
ollama serve
# Custom host binding
export OLLAMA_HOST="0.0.0.0:11434"
ollama serve
Problem 3: Model download failures
# Check internet connection
ping ollama.com
# Clear the model cache and retry (WARNING: deletes ALL downloaded models)
rm -rf ~/.ollama/models
ollama pull llama3.1:8b
# Download with specific tag
ollama pull llama3.1:8b-instruct-q4_0
# Check disk space
df -h
# Manual model verification
ollama list
ollama show llama3.1:8b
Problem 4: Out of memory errors
# Check available RAM
free -h
# List running models
ollama ps
# Stop unused models
ollama stop llama3.1:8b
# Configure memory limits
export OLLAMA_MAX_LOADED_MODELS=1
export OLLAMA_NUM_PARALLEL=1
ollama serve
# Use smaller model variants
ollama pull llama3.1:8b-instruct-q4_0  # 4-bit quantized
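When sizing models against your RAM, a rough heuristic is that the weights take about parameters × bits ÷ 8 bytes, plus runtime overhead for the KV cache and buffers. A sketch of that back-of-envelope math (the 20% overhead factor is an assumption, not a measured value):

```python
def estimate_model_ram_gb(params_billions: float, quant_bits: int,
                          overhead: float = 1.2) -> float:
    """Rough RAM estimate: weights (params * bits/8 bytes) plus ~20%
    overhead for the KV cache and runtime. A heuristic, not a guarantee."""
    weight_bytes = params_billions * 1e9 * quant_bits / 8
    return weight_bytes * overhead / 1e9

# An 8B model at 4-bit needs roughly 4.8 GB; at fp16, roughly 19 GB.
print(f"{estimate_model_ram_gb(8, 4):.1f} GB")   # 4.8 GB
print(f"{estimate_model_ram_gb(8, 16):.1f} GB")  # 19.2 GB
```

This is why the 4-bit quantized variants above fit on machines where the full-precision model would immediately hit out-of-memory errors.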
Problem 5: Slow performance
# Check GPU availability
nvidia-smi # For NVIDIA
rocm-smi # For AMD
# Select a specific GPU (acceleration is automatic when a GPU is detected)
export CUDA_VISIBLE_DEVICES=0
ollama serve
# Optimize CPU usage
export OLLAMA_NUM_PARALLEL=4
export OLLAMA_MAX_QUEUE=512
# Use faster models
ollama pull phi3:mini # Fastest
ollama pull mistral:7b # Good balance
Comprehensive Diagnostic Script
#!/bin/bash
# Ollama Health Check Script
echo "🔍 Ollama Diagnostic Report"
echo "=========================="
# Check installation
if command -v ollama &> /dev/null; then
echo "✅ Ollama installed: $(ollama --version)"
else
echo "❌ Ollama not found in PATH"
exit 1
fi
# Check service status
if pgrep -x "ollama" > /dev/null; then
echo "✅ Ollama service running"
else
echo "⚠️ Ollama service not running"
echo " Start with: ollama serve"
fi
# Check API endpoint
if curl -s http://localhost:11434/api/version > /dev/null; then
echo "✅ API endpoint responding"
else
echo "❌ API endpoint not accessible"
fi
# Check system resources
echo ""
echo "💾 System Resources:"
echo " RAM: $(free -h | awk '/^Mem:/ {print $2}')"
echo " Available: $(free -h | awk '/^Mem:/ {print $7}')"
# Check GPU
if command -v nvidia-smi &> /dev/null; then
echo " GPU: $(nvidia-smi --query-gpu=name --format=csv,noheader,nounits | head -1)"
echo " VRAM: $(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -1) MB"
else
echo " GPU: Not detected"
fi
# Check models
echo ""
echo "🤖 Installed Models:"
ollama list
echo ""
echo "📊 Performance Test:"
echo "Testing model response time..."
start_time=$(date +%s.%3N)
ollama run llama3.1:8b "Say hello" --verbose 2>/dev/null | head -1
end_time=$(date +%s.%3N)
duration=$(echo "$end_time - $start_time" | bc)
echo " Response time: ${duration}s"
echo ""
echo "✅ Diagnostic complete!"
Advanced Use Cases with Code Examples
1. Code Review Assistant
#!/usr/bin/env python3
"""
Automated Code Review using Ollama
Analyzes code quality, suggests improvements
"""
from openai import OpenAI
import os
import subprocess
import git
client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama"
)
class CodeReviewer:
def __init__(self, model="deepseek-coder:33b"):
self.model = model
def review_file(self, file_path, file_content):
"""Review a single file"""
prompt = f"""
Review this {file_path} code for:
- Security vulnerabilities
- Performance issues
- Code style and best practices
- Potential bugs
- Suggestions for improvement
Code:
```
{file_content}
```
Format your response as:
## Security Issues:
## Performance Issues:
## Style Issues:
## Potential Bugs:
## Suggestions:
"""
response = client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": "You are an expert code reviewer with 10+ years experience."},
{"role": "user", "content": prompt}
],
temperature=0.1,
max_tokens=2000
)
return response.choices[0].message.content
def review_diff(self, diff_content):
"""Review git diff changes"""
prompt = f"""
Review this git diff for potential issues:
{diff_content}
Focus on:
- Breaking changes
- Security implications
- Performance impact
- Code quality
"""
response = client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": "You are reviewing code changes in a pull request."},
{"role": "user", "content": prompt}
],
temperature=0.1
)
return response.choices[0].message.content
def review_repository(self, repo_path):
"""Review entire repository"""
repo = git.Repo(repo_path)
# Get recent commits
commits = list(repo.iter_commits(max_count=5))
results = {}
for commit in commits:
print(f"Reviewing commit: {commit.hexsha[:8]}")
# Get diff for this commit
if commit.parents:
diff = commit.parents[0].diff(commit)
diff_text = ""
for diff_item in diff:
if diff_item.a_path and diff_item.a_path.endswith(('.py', '.js', '.java', '.cpp', '.c')):
diff_text += f"\n--- {diff_item.a_path}\n"
if diff_item.diff:
diff_text += diff_item.diff.decode('utf-8', errors='ignore')
if diff_text:
review = self.review_diff(diff_text)
results[commit.hexsha[:8]] = review
return results
def main():
reviewer = CodeReviewer()
# Review single file
file_path = input("Enter file path to review: ")
if os.path.exists(file_path):
with open(file_path, 'r') as f:
content = f.read()
print("\n" + "="*50)
print(f"REVIEWING: {file_path}")
print("="*50)
review = reviewer.review_file(file_path, content)
print(review)
# Review git repository
repo_path = input("\nEnter git repository path (or press Enter to skip): ")
if repo_path and os.path.exists(repo_path):
print("\n" + "="*50)
print("REVIEWING RECENT COMMITS")
print("="*50)
reviews = reviewer.review_repository(repo_path)
for commit_hash, review in reviews.items():
print(f"\n--- Commit {commit_hash} ---")
print(review)
if __name__ == "__main__":
main()
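One caveat with `review_diff` above: it sends the entire diff in a single prompt, and large pull requests can overflow the model's context window. A sketch of splitting a unified diff into per-file chunks before review (the 6,000-character budget is an illustrative assumption; tune it to your model's context size):

```python
def split_diff_by_file(diff_text: str, max_chars: int = 6000) -> list[str]:
    """Split a unified diff into per-file chunks no larger than max_chars,
    so each chunk fits comfortably in the model's context window."""
    chunks, current = [], ""
    for line in diff_text.splitlines(keepends=True):
        starts_new_file = line.startswith("diff --git")
        if current and (starts_new_file or len(current) + len(line) > max_chars):
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

diff = "diff --git a/a.py b/a.py\n+print('hi')\ndiff --git a/b.py b/b.py\n-x = 1\n"
print(len(split_diff_by_file(diff)))  # 2
```

Each chunk can then be passed to `review_diff` separately and the results concatenated into one report.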
2. Document Intelligence System
#!/usr/bin/env python3
"""
Document Intelligence with Ollama
Extract insights from documents, PDFs, images
"""
from openai import OpenAI
import base64
import PyPDF2
from PIL import Image
import io
import os
client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama"
)
class DocumentIntelligence:
def __init__(self, vision_model="llava:13b", text_model="llama3.1:8b"):
self.vision_model = vision_model
self.text_model = text_model
def analyze_image(self, image_path, query="Describe this image"):
"""Analyze image content via Ollama's native multimodal API"""
import requests
with open(image_path, "rb") as img_file:
img_data = base64.b64encode(img_file.read()).decode()
# Ollama's /api/generate accepts base64-encoded images for vision
# models such as llava
response = requests.post(
"http://localhost:11434/api/generate",
json={
"model": self.vision_model,
"prompt": query,
"images": [img_data],
"stream": False
},
timeout=300
)
response.raise_for_status()
return response.json()["response"]
def extract_pdf_text(self, pdf_path):
"""Extract text from PDF"""
text = ""
with open(pdf_path, 'rb') as file:
pdf_reader = PyPDF2.PdfReader(file)
for page in pdf_reader.pages:
text += page.extract_text() + "\n"
return text
def summarize_document(self, text, summary_type="executive"):
"""Summarize document content"""
prompts = {
"executive": "Create an executive summary highlighting key points, decisions, and action items:",
"technical": "Create a technical summary focusing on methods, findings, and recommendations:",
"bullet": "Create a bullet-point summary with main topics and subtopics:",
"abstract": "Create an academic abstract summarizing purpose, methods, results, and conclusions:"
}
# Truncate the document so the prompt fits the context window
prompt = f"""
{prompts.get(summary_type, prompts["executive"])}
Document text:
{text[:8000]}
Summary:
"""
return self.query_text_model(prompt)
def extract_key_info(self, text, info_type="all"):
"""Extract specific information from text"""
info_prompts = {
"dates": "Extract all dates, deadlines, and time-sensitive information:",
"people": "Extract all names, roles, and contact information:",
"numbers": "Extract all important numbers, statistics, and financial data:",
"actions": "Extract all action items, tasks, and next steps:",
"all": "Extract key information including dates, people, numbers, and action items:"
}
prompt = f"""
{info_prompts.get(info_type, info_prompts["all"])}
Text: {text[:8000]}
Extracted information (format as structured list):
"""
return self.query_text_model(prompt)
def query_text_model(self, prompt):
"""Query text model"""
response = client.chat.completions.create(
model=self.text_model,
messages=[
{"role": "system", "content": "You are a document analysis expert."},
{"role": "user", "content": prompt}
],
temperature=0.1,
max_tokens=1500
)
return response.choices[0].message.content
def analyze_document_batch(self, document_folder):
"""Analyze multiple documents"""
results = {}
for filename in os.listdir(document_folder):
file_path = os.path.join(document_folder, filename)
if filename.endswith('.pdf'):
print(f"Processing PDF: {filename}")
text = self.extract_pdf_text(file_path)
summary = self.summarize_document(text)
key_info = self.extract_key_info(text)
results[filename] = {
"type": "PDF",
"summary": summary,
"key_info": key_info,
"text_length": len(text)
}
elif filename.lower().endswith(('.png', '.jpg', '.jpeg', '.gif')):
print(f"Processing image: {filename}")
analysis = self.analyze_image(file_path)
results[filename] = {
"type": "Image",
"analysis": analysis
}
return results
def main():
doc_ai = DocumentIntelligence()
print("Document Intelligence System")
print("1. Analyze single PDF")
print("2. Analyze image")
print("3. Batch analyze folder")
choice = input("Choose option (1-3): ")
if choice == "1":
pdf_path = input("Enter PDF path: ")
if os.path.exists(pdf_path):
text = doc_ai.extract_pdf_text(pdf_path)
print("\n" + "="*50)
print("DOCUMENT SUMMARY:")
print("="*50)
print(doc_ai.summarize_document(text))
print("\n" + "="*50)
print("KEY INFORMATION:")
print("="*50)
print(doc_ai.extract_key_info(text))
elif choice == "2":
img_path = input("Enter image path: ")
query = input("What do you want to know about this image? ")
if os.path.exists(img_path):
result = doc_ai.analyze_image(img_path, query)
print("\n" + "="*50)
print("IMAGE ANALYSIS:")
print("="*50)
print(result)
elif choice == "3":
folder_path = input("Enter folder path: ")
if os.path.exists(folder_path):
results = doc_ai.analyze_document_batch(folder_path)
for filename, analysis in results.items():
print(f"\n{'='*50}")
print(f"FILE: {filename}")
print("="*50)
for key, value in analysis.items():
print(f"{key.upper()}:")
print(value)
print("-" * 30)
if __name__ == "__main__":
main()
3. Real-time Data Analysis Pipeline
#!/usr/bin/env python3
"""
Real-time Data Analysis with Ollama
Process streaming data and generate insights
"""
import json
import time
import threading
from datetime import datetime
from openai import OpenAI
from collections import deque
import pandas as pd
import numpy as np
client = OpenAI(
base_url="http://localhost:11434/v1",
api_key="ollama"
)
class DataAnalyzer:
def __init__(self, model="llama3.1:8b"):
self.model = model
self.data_buffer = deque(maxlen=1000)
self.analysis_queue = deque()
self.is_running = False
def add_data_point(self, data_point):
"""Add new data point to buffer"""
data_point['timestamp'] = datetime.now().isoformat()
self.data_buffer.append(data_point)
# Trigger analysis if buffer has enough data
if len(self.data_buffer) >= 10:
self.queue_analysis()
def queue_analysis(self):
"""Queue data for analysis"""
recent_data = list(self.data_buffer)[-10:] # Last 10 points
self.analysis_queue.append({
'data': recent_data,
'timestamp': datetime.now().isoformat()
})
def analyze_data_trends(self, data_points):
"""Analyze trends in data"""
# Convert to DataFrame for analysis
df = pd.DataFrame(data_points)
# Basic statistics
numeric_cols = df.select_dtypes(include=[np.number]).columns
stats = {}
for col in numeric_cols:
stats[col] = {
'mean': df[col].mean(),
'std': df[col].std(),
'trend': 'increasing' if df[col].iloc[-1] > df[col].iloc[0] else 'decreasing',
'change_percent': ((df[col].iloc[-1] - df[col].iloc[0]) / df[col].iloc[0]) * 100
}
# Generate AI analysis
prompt = f"""
Analyze this data pattern and provide insights:
Data Statistics:
{json.dumps(stats, indent=2, default=str)}
Recent Data Points:
{json.dumps(data_points[-5:], indent=2, default=str)}
Provide:
1. Key trends and patterns
2. Anomalies or outliers
3. Predictions for next few data points
4. Recommended actions
"""
response = client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": "You are a data analyst expert specializing in trend analysis."},
{"role": "user", "content": prompt}
],
temperature=0.3,
max_tokens=800
)
return {
'statistics': stats,
'ai_insights': response.choices[0].message.content,
'timestamp': datetime.now().isoformat()
}
def detect_anomalies(self, data_points, threshold=2):
"""Detect statistical anomalies"""
if len(data_points) < 5:
return []
df = pd.DataFrame(data_points)
numeric_cols = df.select_dtypes(include=[np.number]).columns
anomalies = []
for col in numeric_cols:
values = df[col].values
mean = np.mean(values)
std = np.std(values)
for i, value in enumerate(values):
z_score = abs((value - mean) / std) if std > 0 else 0
if z_score > threshold:
anomalies.append({
'column': col,
'value': value,
'z_score': z_score,
'timestamp': data_points[i].get('timestamp'),
'index': i
})
return anomalies
def generate_report(self, analysis_results):
"""Generate comprehensive report"""
prompt = f"""
Generate an executive report based on this data analysis:
Analysis Results:
{json.dumps(analysis_results, indent=2, default=str)}
Format the report with:
## Executive Summary
## Key Findings
## Trend Analysis
## Risk Assessment
## Recommendations
## Next Steps
"""
response = client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": "You are creating executive reports for business stakeholders."},
{"role": "user", "content": prompt}
],
temperature=0.2,
max_tokens=1200
)
return response.choices[0].message.content
def start_analysis_worker(self):
"""Start background analysis worker"""
self.is_running = True
def worker():
while self.is_running:
if self.analysis_queue:
analysis_job = self.analysis_queue.popleft()
try:
# Perform analysis
results = self.analyze_data_trends(analysis_job['data'])
# Detect anomalies
anomalies = self.detect_anomalies(analysis_job['data'])
results['anomalies'] = anomalies
# Generate alerts for significant findings
if anomalies:
print(f"🚨 ANOMALIES DETECTED: {len(anomalies)} items")
# Print insights
print(f"\n📊 ANALYSIS UPDATE ({results['timestamp']})")
print("="*60)
print(results['ai_insights'])
if anomalies:
print(f"\n⚠️ ANOMALIES:")
for anomaly in anomalies[:3]: # Show top 3
print(f" - {anomaly['column']}: {anomaly['value']} (Z-score: {anomaly['z_score']:.2f})")
except Exception as e:
print(f"Analysis error: {e}")
time.sleep(5) # Check every 5 seconds
thread = threading.Thread(target=worker, daemon=True)
thread.start()
def stop_analysis_worker(self):
"""Stop background analysis"""
self.is_running = False
# Example usage and data simulation
def simulate_ecommerce_data():
"""Simulate e-commerce data stream"""
import random
base_sales = 1000
base_visitors = 5000
while True:
# Simulate business metrics
hour = datetime.now().hour
# Business hours multiplier
business_multiplier = 1.5 if 9 <= hour <= 17 else 0.8
# Add random fluctuation
sales = int(base_sales * business_multiplier * (1 + random.uniform(-0.3, 0.3)))
visitors = int(base_visitors * business_multiplier * (1 + random.uniform(-0.2, 0.4)))
conversion_rate = (sales / visitors) * 100 if visitors > 0 else 0
avg_order_value = random.uniform(50, 200)
# Occasionally add anomalies
if random.random() < 0.05: # 5% chance
sales *= random.choice([0.3, 3.0]) # Dramatic drop or spike
yield {
'sales': sales,
'visitors': visitors,
'conversion_rate': conversion_rate,
'avg_order_value': avg_order_value,
'hour': hour
}
def main():
analyzer = DataAnalyzer("llama3.1:8b")
analyzer.start_analysis_worker()
print("🚀 Starting real-time data analysis...")
print("Simulating e-commerce data stream...")
try:
data_stream = simulate_ecommerce_data()
for i, data_point in enumerate(data_stream):
analyzer.add_data_point(data_point)
# Print current data point
print(f"Data point {i+1}: Sales: {data_point['sales']}, "
f"Visitors: {data_point['visitors']}, "
f"Conversion: {data_point['conversion_rate']:.2f}%")
time.sleep(2) # New data every 2 seconds
except KeyboardInterrupt:
print("\n🛑 Stopping analysis...")
analyzer.stop_analysis_worker()
if __name__ == "__main__":
main()
Performance Optimization Tips
Hardware Optimization
# GPU Memory Optimization: limit concurrently loaded models
export OLLAMA_MAX_LOADED_MODELS=1
# CPU Optimization: parallel request handling
export OLLAMA_NUM_PARALLEL=8
# Note: thread count and context size are per-request model options
# (num_thread, num_ctx) rather than environment variables, and the
# supported variables vary by Ollama version; see `ollama serve --help`.
Model Quantization Guide
#!/usr/bin/env python3
"""
Model Quantization and Performance Testing
Find the best quantization for your hardware
"""
import time
import subprocess
import psutil
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
# Available quantization levels
QUANTIZATIONS = {
'fp16': 'Full precision (largest, best quality)',
'q8_0': '8-bit quantization (good balance)',
'q4_0': '4-bit quantization (smaller, faster)',
'q2_K': '2-bit quantization (smallest, fastest)'
}
def test_model_performance(model_name, test_prompt="Explain machine learning"):
"""Test model response time and quality"""
start_time = time.time()
try:
response = client.chat.completions.create(
model=model_name,
messages=[{"role": "user", "content": test_prompt}],
max_tokens=100
)
end_time = time.time()
response_time = end_time - start_time
response_text = response.choices[0].message.content
# Prefer the server-reported completion token count over the max_tokens cap
tokens = response.usage.completion_tokens if response.usage else 100
return {
'response_time': response_time,
'response_length': len(response_text),
'tokens_per_second': tokens / response_time,
'memory_usage': psutil.virtual_memory().percent
}
except Exception as e:
return {'error': str(e)}
def benchmark_quantizations(base_model):
"""Benchmark different quantizations of a model"""
results = {}
for quant, description in QUANTIZATIONS.items():
model_name = f"{base_model}-{quant}"  # e.g. llama3.1:8b-instruct-q4_0
print(f"\nTesting {model_name}...")
# Try to pull model
try:
subprocess.run(['ollama', 'pull', model_name],
capture_output=True, check=True)
except subprocess.CalledProcessError:
print(f"❌ Failed to pull {model_name}")
continue
# Test performance
perf = test_model_performance(model_name)
results[quant] = perf
if 'error' not in perf:
print(f"✅ {description}")
print(f" Response time: {perf['response_time']:.2f}s")
print(f" Tokens/second: {perf['tokens_per_second']:.1f}")
print(f" Memory usage: {perf['memory_usage']:.1f}%")
else:
print(f"❌ Error: {perf['error']}")
return results
if __name__ == "__main__":
model = input("Enter base model with size tag (e.g., llama3.1:8b-instruct): ").strip()
results = benchmark_quantizations(model)
print("\n" + "="*60)
print("BENCHMARK RESULTS")
print("="*60)
for quant, result in results.items():
if 'error' not in result:
print(f"{quant:10} | {result['tokens_per_second']:6.1f} tok/s | "
f"{result['response_time']:6.2f}s | {result['memory_usage']:5.1f}% RAM")
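Once the benchmark finishes, picking a winner from the results dict can be automated. A minimal sketch (the function name and the 80% memory ceiling are illustrative choices, not part of the script above): keep only runs that succeeded and stayed under the ceiling, then return the fastest.

```python
def pick_quantization(results, max_memory_percent=80.0):
    """Return the quantization with the highest tokens/second among
    runs that succeeded and stayed under the memory ceiling."""
    candidates = {
        quant: perf for quant, perf in results.items()
        if 'error' not in perf and perf['memory_usage'] <= max_memory_percent
    }
    if not candidates:
        return None
    return max(candidates, key=lambda q: candidates[q]['tokens_per_second'])

# Example with synthetic benchmark output:
sample = {
    'fp16': {'tokens_per_second': 12.0, 'memory_usage': 92.0, 'response_time': 8.3},
    'q8_0': {'tokens_per_second': 25.0, 'memory_usage': 71.0, 'response_time': 4.0},
    'q4_0': {'tokens_per_second': 41.0, 'memory_usage': 55.0, 'response_time': 2.4},
}
print(pick_quantization(sample))  # -> q4_0 (fp16 is excluded at 92% RAM)
```

Feed it the dict returned by `benchmark_quantizations` and adjust the ceiling to taste.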
Integration Examples {#integration-examples}
VS Code Extension Integration
// VS Code extension for Ollama integration
import * as vscode from 'vscode';
import axios from 'axios';

export function activate(context: vscode.ExtensionContext) {
    let disposable = vscode.commands.registerCommand('ollama.explainCode', async () => {
        const editor = vscode.window.activeTextEditor;
        if (!editor) {
            vscode.window.showErrorMessage('No active editor');
            return;
        }
        const selection = editor.selection;
        const selectedText = editor.document.getText(selection);
        if (!selectedText) {
            vscode.window.showErrorMessage('No code selected');
            return;
        }
        try {
            const explanation = await explainCode(selectedText);
            // Show the explanation in a new document
            const doc = await vscode.workspace.openTextDocument({
                content: explanation,
                language: 'markdown'
            });
            await vscode.window.showTextDocument(doc);
        } catch (error) {
            vscode.window.showErrorMessage(`Error: ${error}`);
        }
    });
    context.subscriptions.push(disposable);
}

async function explainCode(code: string): Promise<string> {
    const response = await axios.post('http://localhost:11434/v1/chat/completions', {
        model: 'deepseek-coder:6.7b',
        messages: [
            {
                role: 'system',
                content: 'You are a code explanation expert. Explain code clearly and concisely.'
            },
            {
                role: 'user',
                content: `Explain this code:\n\n${code}`
            }
        ],
        temperature: 0.1,
        max_tokens: 1000
    });
    return response.data.choices[0].message.content;
}

export function deactivate() {}
Slack Bot Integration
#!/usr/bin/env python3
"""
Slack Bot powered by Ollama
Smart assistant for your team
"""
import os

from openai import OpenAI
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

# Initialize Slack app
app = App(token=os.environ.get("SLACK_BOT_TOKEN"))

# Initialize Ollama client
ollama_client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"
)

@app.message("hello")
def handle_hello(message, say):
    """Handle hello messages"""
    say(f"Hi <@{message['user']}>! I'm your AI assistant powered by Ollama. How can I help?")

@app.message("explain")
def handle_explain(message, say, client):
    """Explain code or concepts"""
    # Strip only the leading trigger word, not every occurrence
    text = message['text'].replace('explain', '', 1).strip()
    if not text:
        say("Please provide something to explain. Example: `explain recursion in Python`")
        return
    try:
        response = ollama_client.chat.completions.create(
            model="llama3.1:8b",
            messages=[
                {"role": "system", "content": "You are a helpful technical assistant. Provide clear, concise explanations."},
                {"role": "user", "content": f"Explain: {text}"}
            ],
            temperature=0.3,
            max_tokens=800
        )
        explanation = response.choices[0].message.content
        # Post as a thread reply
        client.chat_postMessage(
            channel=message['channel'],
            thread_ts=message['ts'],
            text=f"```\n{explanation}\n```"
        )
    except Exception as e:
        say(f"Sorry, I encountered an error: {str(e)}")

@app.command("/analyze")
def handle_analyze_command(ack, respond, command):
    """Analyze code or data"""
    ack()
    text = command['text']
    try:
        response = ollama_client.chat.completions.create(
            model="deepseek-coder:6.7b",
            messages=[
                {"role": "system", "content": "You are a code analysis expert."},
                {"role": "user", "content": f"Analyze this: {text}"}
            ],
            temperature=0.2
        )
        analysis = response.choices[0].message.content
        respond(f"Analysis:\n```\n{analysis}\n```")
    except Exception as e:
        respond(f"Error: {str(e)}")

@app.event("file_shared")
def handle_file_share(event, say):
    """Acknowledge shared files (hook for future document analysis)"""
    say("I can analyze documents! Upload a file and mention me.")

if __name__ == "__main__":
    # Start the app over Socket Mode
    handler = SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"])
    handler.start()
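One practical wrinkle for the bot above: Slack limits how much text a single message can carry, and model answers can easily exceed a comfortable size for one post. A small helper can split a long reply into several `chat_postMessage` calls — a sketch, with the 3,500-character limit chosen conservatively (check Slack's API documentation for the actual hard limits):

```python
def chunk_text(text, limit=3500):
    """Split text into chunks of at most `limit` characters,
    preferring to break at a newline inside the window."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind('\n', 0, limit)
        if cut <= 0:
            cut = limit  # no newline in the window: hard split
        chunks.append(text[:cut])
        text = text[cut:]
        if text.startswith('\n'):
            text = text[1:]  # drop the newline we broke on
    if text:
        chunks.append(text)
    return chunks

print([len(c) for c in chunk_text('a' * 8000)])  # -> [3500, 3500, 1000]
```

In `handle_explain`, you would loop over `chunk_text(explanation)` and post each chunk to the same `thread_ts`.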
Web Dashboard
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Ollama Dashboard</title>
<style>
body {
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
max-width: 1200px;
margin: 0 auto;
padding: 20px;
background: #f5f5f5;
}
.card {
background: white;
border-radius: 10px;
padding: 20px;
margin-bottom: 20px;
box-shadow: 0 2px 10px rgba(0,0,0,0.1);
}
.chat-container {
height: 400px;
overflow-y: auto;
border: 1px solid #ddd;
padding: 15px;
background: #f9f9f9;
}
.message {
margin-bottom: 15px;
padding: 10px;
border-radius: 8px;
}
.user { background: #007bff; color: white; margin-left: 20%; }
.assistant { background: #e9ecef; margin-right: 20%; }
.input-group {
display: flex;
gap: 10px;
margin-top: 10px;
}
input, select, textarea {
padding: 10px;
border: 1px solid #ddd;
border-radius: 5px;
}
button {
background: #007bff;
color: white;
border: none;
padding: 10px 20px;
border-radius: 5px;
cursor: pointer;
}
button:hover { background: #0056b3; }
.model-info {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
gap: 15px;
}
.metric {
text-align: center;
padding: 15px;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
border-radius: 8px;
}
</style>
</head>
<body>
<h1>🤖 Ollama AI Dashboard</h1>
<div class="card">
<h2>Model Selection</h2>
<div class="input-group">
<select id="modelSelect">
<option value="llama3.1:8b">Llama 3.1 8B</option>
<option value="mistral:7b">Mistral 7B</option>
<option value="deepseek-coder:6.7b">DeepSeek Coder</option>
</select>
<button onclick="loadModels()">Refresh Models</button>
<button onclick="getModelInfo()">Model Info</button>
</div>
</div>
<div class="card">
<h2>System Status</h2>
<div class="model-info">
<div class="metric">
<h3 id="responseTime">-- ms</h3>
<p>Response Time</p>
</div>
<div class="metric">
<h3 id="memoryUsage">-- %</h3>
<p>Memory Usage</p>
</div>
<div class="metric">
<h3 id="tokensPerSec">-- tok/s</h3>
<p>Tokens/Second</p>
</div>
</div>
</div>
<div class="card">
<h2>AI Chat</h2>
<div id="chatContainer" class="chat-container"></div>
<div class="input-group">
<textarea id="messageInput" placeholder="Type your message..." rows="2" style="flex: 1;"></textarea>
<button onclick="sendMessage()">Send</button>
<button onclick="clearChat()">Clear</button>
</div>
</div>
<script>
class OllamaDashboard {
constructor() {
this.baseUrl = 'http://localhost:11434';
this.currentModel = 'llama3.1:8b';
this.conversation = [];
this.loadModels();
this.updateStatus();
setInterval(() => this.updateStatus(), 5000);
}
async loadModels() {
try {
const response = await fetch(`${this.baseUrl}/api/tags`);
const data = await response.json();
const select = document.getElementById('modelSelect');
select.innerHTML = '';
data.models?.forEach(model => {
const option = document.createElement('option');
option.value = model.name;
option.textContent = model.name;
select.appendChild(option);
});
} catch (error) {
console.error('Failed to load models:', error);
}
}
async getModelInfo() {
const model = document.getElementById('modelSelect').value;
try {
const response = await fetch(`${this.baseUrl}/api/show`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ name: model })
});
const data = await response.json();
alert(`Model: ${model}\nParameters: ${data.details?.parameter_size || 'Unknown'}\nFamily: ${data.details?.family || 'Unknown'}`);
} catch (error) {
console.error('Failed to get model info:', error);
}
}
async updateStatus() {
try {
// Test response time
const startTime = performance.now();
const response = await fetch(`${this.baseUrl}/api/version`);
const endTime = performance.now();
if (response.ok) {
document.getElementById('responseTime').textContent = `${Math.round(endTime - startTime)} ms`;
}
// Mock memory and token metrics (would need actual implementation)
document.getElementById('memoryUsage').textContent = `${Math.round(Math.random() * 30 + 50)}%`;
document.getElementById('tokensPerSec').textContent = `${Math.round(Math.random() * 20 + 30)} tok/s`;
} catch (error) {
document.getElementById('responseTime').textContent = 'Offline';
}
}
async sendMessage() {
const input = document.getElementById('messageInput');
const message = input.value.trim();
if (!message) return;
// Add user message to chat
this.addMessage('user', message);
input.value = '';
// Show typing indicator
const typingDiv = this.addMessage('assistant', 'Thinking...');
try {
const model = document.getElementById('modelSelect').value;
const response = await fetch(`${this.baseUrl}/v1/chat/completions`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: model,
messages: [
...this.conversation,
{ role: 'user', content: message }
],
temperature: 0.7,
stream: false
})
});
const data = await response.json();
const aiResponse = data.choices[0].message.content;
// Update conversation history
this.conversation.push({ role: 'user', content: message });
this.conversation.push({ role: 'assistant', content: aiResponse });
// Replace typing indicator with actual response
typingDiv.textContent = aiResponse;
} catch (error) {
typingDiv.textContent = `Error: ${error.message}`;
typingDiv.style.background = '#ffebee';
}
}
addMessage(role, content) {
const container = document.getElementById('chatContainer');
const messageDiv = document.createElement('div');
messageDiv.className = `message ${role}`;
messageDiv.textContent = content;
container.appendChild(messageDiv);
container.scrollTop = container.scrollHeight;
return messageDiv;
}
clearChat() {
document.getElementById('chatContainer').innerHTML = '';
this.conversation = [];
}
}
// Global functions for HTML onclick handlers
let dashboard;
document.addEventListener('DOMContentLoaded', () => {
dashboard = new OllamaDashboard();
});
function loadModels() {
dashboard.loadModels();
}
function getModelInfo() {
dashboard.getModelInfo();
}
function sendMessage() {
dashboard.sendMessage();
}
function clearChat() {
dashboard.clearChat();
}
// Enter key to send message
document.addEventListener('keydown', (e) => {
if (e.key === 'Enter' && e.target.id === 'messageInput' && !e.shiftKey) {
e.preventDefault();
sendMessage();
}
});
</script>
</body>
</html>
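A gotcha when opening this page in a browser: if the dashboard isn't served from an origin Ollama trusts, the `fetch` calls to port 11434 may be blocked by CORS. Ollama reads its allowed origins from the `OLLAMA_ORIGINS` environment variable, so something like the following (adjust the origin to wherever you host the page) usually resolves it:

```shell
# Allow cross-origin browser requests from a local dev server,
# then restart the Ollama server so the setting takes effect
export OLLAMA_ORIGINS="http://localhost:3000"
ollama serve
```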
Conclusion
Ollama has transformed the AI landscape by making powerful language models accessible to everyone. Whether you’re a developer building an AI-powered app, a researcher exploring language models, or a business looking to adopt AI without runaway API costs, Ollama is a practical solution.
Key Takeaways:
🎯 Start Simple: Begin with llama3.1:8b for a good balance of quality, speed, and memory use
🚀 Scale Gradually: Upgrade to larger models as you need more capability
💰 Save Money: Replace expensive API calls with free local models
🔒 Stay Private: Keep sensitive data on your own hardware
⚡ Go Fast: Optimize performance with the right quantization
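To pick the right quantization for your hardware, a back-of-envelope estimate goes a long way: model weights take roughly parameters × bits ÷ 8 bytes, plus headroom for the KV cache and runtime. A sketch (the 20% overhead factor is a rough assumption, not a measured constant):

```python
def estimate_memory_gb(params_billion, bits_per_weight, overhead=1.2):
    """Back-of-envelope model memory: parameters x bits / 8, plus
    ~20% headroom for KV cache and runtime (a rough assumption)."""
    return params_billion * bits_per_weight / 8 * overhead

for tag, bits in [('fp16', 16), ('q8_0', 8), ('q4_0', 4), ('q2_K', 2)]:
    print(f"8B model @ {tag}: ~{estimate_memory_gb(8, bits):.1f} GB")
```

By this estimate, an 8B model needs roughly 19 GB at fp16 but under 5 GB at q4_0 — which is why 4-bit variants are the usual laptop default.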
What’s Next?
- Ollama is rapidly evolving with new features like tool calling, multimodal support, and improved performance
- The community is building amazing integrations and applications
- More models are being added regularly, including specialized variants for coding, vision, and reasoning
Ready to get started? Pick the installation method for your platform, pull your first model, and join the local AI revolution!
Found this guide helpful? Share it with your team and help spread the word about local AI. The future is decentralized, private, and powerful – and it starts with Ollama.