Abraham Dahunsi: Web Developer 🌐 | Technical Writer ✍️ | DevOps Enthusiast 👨‍💻 | Python 🐍

How to Build a RAG Application Using Ollama and Docker


A Retrieval-Augmented Generation (RAG) app combines document retrieval with a large language model to produce accurate, context-aware answers. This guide explains how to build a RAG app using Ollama and Docker: Ollama runs large language models locally, and Docker simplifies deploying and managing the app's services in containers.

In this tutorial, you'll learn how to put a RAG app into a container and run it as a virtual sommelier, offering the best wine and food pairings. The app uses Ollama for language modeling, Qdrant as a vector database, and Streamlit for the user interface.

By the end, you'll have a working RAG app running locally with Docker, ready for development and testing in a container setup.

Prerequisites

Before starting:

  1. Install Docker on your system.
  2. Install Python 3.8 or later and pip.
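
A quick way to confirm both tools are available from a terminal:

docker --version
docker compose version
python3 --version
pip --version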

Set Up the Project Environment

1. Clone the Sample Project Repository

git clone https://github.com/mfranzon/winy.git
cd winy

2. Review the Project Structure

The project should include the following structure:

β”œβ”€β”€ winy/
β”‚   β”œβ”€β”€ .gitignore
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ main.py
β”‚   β”‚   β”œβ”€β”€ Dockerfile
β”‚   β”‚   └── requirements.txt
β”‚   β”œβ”€β”€ tools/
β”‚   β”‚   β”œβ”€β”€ create_db.py
β”‚   β”‚   β”œβ”€β”€ create_embeddings.py
β”‚   β”‚   β”œβ”€β”€ requirements.txt
β”‚   β”‚   β”œβ”€β”€ test.py
β”‚   β”‚   └── download_model.sh
β”‚   β”œβ”€β”€ docker-compose.yaml
β”‚   β”œβ”€β”€ wine_database.db
β”‚   β”œβ”€β”€ LICENSE
β”‚   └── README.md

3. Build and Run the Application

Use Docker Compose to build and start the services:

docker compose up --build

This command builds the application image, pulls the Qdrant and Ollama images, and starts a container for each service of the RAG application.
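
To check that everything came up, you can list the running Compose services and, if needed, follow the Ollama logs:

docker compose ps
docker compose logs -f ollama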

4. Access the Application

Once running, open the Streamlit application in your browser at http://localhost:8501.

Configure Ollama API Access

Ollama provides APIs for model management, embedding generation, and completions.

1. Ensure Ollama Service is Running

Confirm that the docker-compose.yaml file includes the Ollama service:

ollama:
  image: ollama/ollama
  container_name: ollama
  ports:
    - "11434:11434"

2. Create a Python Module to Interact with Ollama’s API

Create a new file, ollama_api.py, in the app directory (the Flask server built in the next section imports it) and add the following:

import os
import requests

# Read the Ollama endpoint from the environment (set in docker-compose.yaml);
# fall back to localhost when running directly on the host machine.
BASE_URL = os.environ.get("OLLAMA", "http://localhost:11434")

def generate_embedding(text, model="all-minilm"):
    url = f"{BASE_URL}/api/embeddings"
    payload = {"model": model, "prompt": text}

    response = requests.post(url, json=payload)
    if response.status_code == 200:
        return response.json()['embedding']
    raise Exception(f"Error generating embedding: {response.text}")

def generate_completion(prompt, model="llama2"):
    url = f"{BASE_URL}/api/generate"
    # stream=False makes Ollama return one JSON object instead of a token stream
    payload = {"model": model, "prompt": prompt, "stream": False}

    response = requests.post(url, json=payload)
    if response.status_code == 200:
        return response.json()['response']
    raise Exception(f"Error generating completion: {response.text}")

Build the Flask Application

Create a backend server with Flask for your RAG application.

1. Create main.py in the app Directory

from flask import Flask, request, jsonify
from ollama_api import generate_embedding, generate_completion
import os

app = Flask(__name__)

@app.route("/embed", methods=["POST"])
def embed():
    data = request.get_json()
    text = data.get("text")

    if not text:
        return jsonify({"error": "Text is required."}), 400

    try:
        embedding = generate_embedding(text)
        return jsonify({"embedding": embedding})
    except Exception as e:
        return jsonify({"error": str(e)}), 500

@app.route("/generate", methods=["POST"])
def generate():
    data = request.get_json()
    prompt = data.get("prompt")

    if not prompt:
        return jsonify({"error": "Prompt is required."}), 400

    try:
        completion = generate_completion(prompt)
        return jsonify({"completion": completion})
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8000)))

2. Create a Dockerfile for the Flask Application

FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["flask", "run", "--host", "0.0.0.0", "--port", "8000"]

3. Add Dependencies in requirements.txt

flask==3.0.*
requests

4. Update docker-compose.yaml

Add the server service alongside the existing ollama service, and keep any other services the project already defines (such as Qdrant):

services:
  server:
    build:
      context: ./app
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - OLLAMA=http://ollama:11434
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"

5. Build and Run the Containers

docker compose up --build

6. Test the Flask Application

Send POST requests to:

  • http://localhost:8000/embed for embedding generation
  • http://localhost:8000/generate for text completion

Example payloads:

Embedding:

{
    "text": "What is retrieval-augmented generation?"
}

Completion:

{
    "prompt": "Explain retrieval-augmented generation in simple terms."
}
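
For example, you can exercise both endpoints with curl:

curl -X POST http://localhost:8000/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "What is retrieval-augmented generation?"}'

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain retrieval-augmented generation in simple terms."}'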

Conclusion

You’ve successfully created a RAG application using Ollama and Docker. This setup uses Ollama’s API and Docker’s container features for scalable and efficient app development. You can improve the RAG pipeline further with more advanced retrieval methods on top of the Qdrant vector database, such as hybrid search or re-ranking.

