A Retrieval-Augmented Generation (RAG) app combines search tools and AI to provide accurate, context-aware results. This guide explains how to build a RAG app using Ollama and Docker. Ollama helps run large language models on your computer, and Docker simplifies deploying and managing apps in containers.
In this tutorial, you'll learn how to put a RAG app into a container and run it as a virtual sommelier, offering the best wine and food pairings. The app uses Ollama for language modeling, Qdrant as a vector database, and Streamlit for the user interface.
By the end, you'll have a working RAG app running locally with Docker, ready for development and testing in a container setup.
Prerequisites
Before starting, make sure you have:
Docker and Docker Compose installed (Docker Desktop includes both)
Git, to clone the sample project repository
Set Up the Project Environment
1. Clone the Sample Project Repository
git clone https://github.com/mfranzon/winy.git
cd winy
2. Review the Project Structure
The project should include the following structure:
winy/
├── .gitignore
├── app/
│   ├── main.py
│   ├── Dockerfile
│   └── requirements.txt
├── tools/
│   ├── create_db.py
│   ├── create_embeddings.py
│   ├── requirements.txt
│   ├── test.py
│   └── download_model.sh
├── docker-compose.yaml
├── wine_database.db
├── LICENSE
└── README.md
3. Build and Run the Application
Use Docker Compose to build and start the services:
docker compose up --build
This command builds the Docker image and starts containers for the RAG application, including services for Qdrant and Ollama.
4. Access the Application
Once running, open the Streamlit application in your browser at http://localhost:8501.
Configure Ollama API Access
Ollama provides APIs for model management, embedding generation, and completions.
1. Ensure Ollama Service is Running
Confirm that the docker-compose.yaml file includes the Ollama service:
ollama:
  image: ollama/ollama
  container_name: ollama
  ports:
    - "11434:11434"
2. Create a Python Module to Interact with Ollama's API
Add the following to a new file, ollama_api.py:
import os
import requests

# Defaults to the in-network container URL; override with the OLLAMA
# environment variable (e.g. http://localhost:11434 when running on the host).
BASE_URL = os.environ.get("OLLAMA", "http://ollama:11434")


def generate_embedding(text, model="all-minilm"):
    url = f"{BASE_URL}/api/embeddings"
    payload = {"model": model, "prompt": text}
    response = requests.post(url, json=payload)
    if response.status_code == 200:
        return response.json()["embedding"]
    raise Exception(f"Error generating embedding: {response.text}")


def generate_completion(prompt, model="llama2"):
    url = f"{BASE_URL}/api/generate"
    # stream=False makes Ollama return one JSON object instead of a stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    response = requests.post(url, json=payload)
    if response.status_code == 200:
        return response.json()["response"]
    raise Exception(f"Error generating completion: {response.text}")
Build the Flask Application
Create a backend server with Flask for your RAG application.
1. Create main.py in the app Directory
from flask import Flask, request, jsonify
from ollama_api import generate_embedding, generate_completion
import os

app = Flask(__name__)


@app.route("/embed", methods=["POST"])
def embed():
    data = request.get_json()
    text = data.get("text")
    if not text:
        return jsonify({"error": "Text is required."}), 400
    try:
        embedding = generate_embedding(text)
        return jsonify({"embedding": embedding})
    except Exception as e:
        return jsonify({"error": str(e)}), 500


@app.route("/generate", methods=["POST"])
def generate():
    data = request.get_json()
    prompt = data.get("prompt")
    if not prompt:
        return jsonify({"error": "Prompt is required."}), 400
    try:
        completion = generate_completion(prompt)
        return jsonify({"completion": completion})
    except Exception as e:
        return jsonify({"error": str(e)}), 500


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8000)))
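If you want to exercise the routes before containerizing, Flask's built-in test client can do so without starting a server. This is just a sketch: the second call only returns 200 if Ollama is reachable from where you run it; otherwise you will see the 500 error payload from the exception handler.
from main import app

client = app.test_client()

# Missing field -> 400 from the validation branch.
print(client.post("/embed", json={}).status_code)

# Valid request -> 200 with an embedding if Ollama is reachable, else 500.
resp = client.post("/embed", json={"text": "hello"})
print(resp.status_code, resp.get_json())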
2. Create a Dockerfile for the Flask Application
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["flask", "run", "--host", "0.0.0.0", "--port", "8000"]
3. Add Dependencies in requirements.txt
flask==3.0.*
requests
4. Update docker-compose.yaml
services:
  server:
    build:
      context: ./app
      dockerfile: Dockerfile
    ports:
      - "8000:8000"
    environment:
      - OLLAMA=http://ollama:11434
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
Keep the Qdrant and Streamlit services already defined in the sample project alongside these entries.
5. Build and Run the Containers
docker compose up --build
6. Test the Flask Application
Send POST requests to:
http://localhost:8000/embed for embedding generation
http://localhost:8000/generate for text completion
Example payloads:
Embedding:
{
"text": "What is retrieval-augmented generation?"
}
Completion:
{
"prompt": "Explain retrieval-augmented generation in simple terms."
}
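For example, using the requests library from the host (a minimal sketch; adjust the host and port if you changed the mapping):
import requests

BASE = "http://localhost:8000"

# Embedding: returns {"embedding": [...]} on success.
embed = requests.post(f"{BASE}/embed",
                      json={"text": "What is retrieval-augmented generation?"})
print(embed.json()["embedding"][:5])

# Completion: returns {"completion": "..."} on success.
gen = requests.post(f"{BASE}/generate",
                    json={"prompt": "Explain retrieval-augmented generation in simple terms."})
print(gen.json()["completion"])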
Conclusion
You've successfully created a RAG application using Ollama and Docker. This setup uses Ollama's API and Docker's container features for scalable and efficient app development. From here, you can improve the RAG pipeline by refining how context is stored in and retrieved from the Qdrant vector database.