Discover how to create a private AI-powered document analysis system using cutting-edge open-source tools.
System Requirements
- 16GB RAM minimum
- 10th Gen Intel Core i5 or equivalent
- 10GB free storage space
- Windows 10+/macOS 12+/Linux Ubuntu 20.04+
🛠️ Step 1: Installing Ollama
Download Ollama for macOS, Linux, or Windows:
- Download Ollama
- Follow the installation instructions for your operating system.
```bash
# For Linux
curl -fsSL https://ollama.ai/install.sh | sh
```
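Once the installer finishes, you can sanity-check the setup from a terminal (the exact output will vary by version):

```bash
# Confirm the CLI is installed and the local server responds
ollama --version
ollama list   # lists pulled models; empty on a fresh install
```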
🤖 What is Ollama?

Ollama is a framework designed for running large language models (LLMs) directly on your local machine. It allows users to download, execute, and interact with AI models without relying on cloud-based APIs.
- Example: running `ollama run deepseek-r1:1.5b` executes DeepSeek R1 locally.
- Why use it? It offers a free, private, and offline AI experience with low latency.
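Ollama also exposes a local HTTP API (on port 11434 by default), so you can call a running model from Python. Here is a minimal sketch using the requests library, assuming the deepseek-r1:1.5b model has already been pulled:

```python
import requests

# Ollama's local server listens on http://localhost:11434 by default.
# /api/generate returns a completion; stream=False requests a single JSON reply.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:1.5b",
        "prompt": "Explain retrieval-augmented generation in one sentence.",
        "stream": False,
    },
)
print(response.json()["response"])
```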
🔗 What is LangChain?
LangChain is a Python/JavaScript framework that enables the seamless integration of LLMs with various data sources, APIs, and memory systems.
- Why use it? It helps connect LLMs to applications like chatbots, document processing, and Retrieval-Augmented Generation (RAG) systems.
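As a rough illustration (assuming langchain and langchain-community are installed, as in Step 3 below, and the model has been pulled via Ollama), the sketch below chains a prompt template with a locally served model:

```python
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate

# Wrap the locally served DeepSeek R1 model in LangChain's LLM interface
llm = Ollama(model="deepseek-r1:1.5b")

# A reusable prompt template; LangChain substitutes the {topic} variable
prompt = PromptTemplate.from_template("Summarize {topic} in two sentences.")

# Compose prompt and model into a runnable chain
chain = prompt | llm
print(chain.invoke({"topic": "retrieval-augmented generation"}))
```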
📄 What is Retrieval-Augmented Generation (RAG)?
RAG is an AI technique that improves the accuracy of LLM responses by incorporating information retrieved from external sources like PDFs and databases.
- Why use it? It enhances factual correctness and reduces hallucinations by referencing actual documents.
- Example: An AI-powered Q&A system that fetches relevant document excerpts before generating responses.
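Conceptually, the retrieve-then-generate loop is simple. The sketch below is only pseudocode-style Python; `retriever` and `llm` stand in for the components built later in this guide:

```python
def answer_with_rag(question, retriever, llm):
    # 1. Retrieve the document chunks most relevant to the question
    relevant_chunks = retriever.get_relevant_documents(question)
    context = "\n\n".join(chunk.page_content for chunk in relevant_chunks)

    # 2. Generate an answer grounded in the retrieved context
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
    return llm.invoke(prompt)
```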
⚡ DeepSeek R1: A Powerful Open-Source AI Model

DeepSeek R1 is an AI model optimized for logical reasoning, problem-solving, and factual retrieval.
- Why use it? It excels in RAG applications and can run efficiently on local machines with Ollama.
🚀 How Do These Technologies Work Together?
- Ollama runs DeepSeek R1 locally.
- LangChain connects the AI model to external data.
- RAG retrieves relevant information for accurate responses.
- DeepSeek R1 generates high-quality, context-aware answers.
📈 Use Case Example: AI-Powered PDF Q&A System
This system allows users to upload a PDF and ask questions about its content. The AI, powered by DeepSeek R1, retrieves relevant sections and generates precise answers.
🎯 Why Run DeepSeek R1 Locally?
| Feature | Cloud-Based Models | Local DeepSeek R1 |
|---|---|---|
| Privacy | Data sent to external servers | 100% local & secure |
| Speed | API latency & network delays | Instant inference |
| Cost | Pay per API request | Free after setup |
| Customization | Limited fine-tuning | Full model control |
| Deployment | Cloud-dependent | Works offline & on-premises |
🛠️ Step 2: Running DeepSeek R1
```bash
ollama pull deepseek-r1:1.5b
ollama run deepseek-r1:1.5b
```
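If you prefer to talk to the model from Python rather than the terminal, the `ollama` package (installed in the next step) provides a simple chat interface; a quick sanity check might look like this:

```python
import ollama

# Send one chat message to the locally running DeepSeek R1 model
reply = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "What is retrieval-augmented generation?"}],
)
print(reply["message"]["content"])
```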

🛠️ Step 3: Setting Up a RAG System with Streamlit in a Virtual Environment
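Before installing anything, create and activate a virtual environment so the project's dependencies stay isolated (the commands below assume macOS/Linux; on Windows, activate with `venv\Scripts\activate`):

```bash
python -m venv venv
source venv/bin/activate
```

With the environment active, install the required packages: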

```bash
pip install -U langchain langchain-community streamlit pdfplumber semantic-chunkers open-text-embeddings faiss ollama prompt-template langchain_experimental sentence-transformers faiss-cpu
```
🛠️ Step 4: Creating and Running the App
mkdir rag-system && cd rag-system
Create a Python script named `app.py` and insert the following code:
```python
import streamlit as st
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import RetrievalQA

st.title("📄 RAG System with DeepSeek R1 & Ollama")

uploaded_file = st.file_uploader("Upload your PDF", type="pdf")

if uploaded_file:
    # Save the uploaded PDF to disk so the loader can read it
    with open("temp.pdf", "wb") as f:
        f.write(uploaded_file.getvalue())

    # Load the PDF and split it into semantically coherent chunks
    loader = PDFPlumberLoader("temp.pdf")
    docs = loader.load()
    text_splitter = SemanticChunker(HuggingFaceEmbeddings())
    documents = text_splitter.split_documents(docs)

    # Embed the chunks and index them in a FAISS vector store
    embedder = HuggingFaceEmbeddings()
    vector = FAISS.from_documents(documents, embedder)
    retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})

    # DeepSeek R1 served locally by Ollama
    llm = Ollama(model="deepseek-r1:1.5b")

    # Prompt that places the retrieved context ahead of the user's question
    QA_PROMPT = PromptTemplate.from_template(
        "Context: {context}\nQuestion: {question}\nAnswer:"
    )
    llm_chain = LLMChain(llm=llm, prompt=QA_PROMPT)
    combine_docs_chain = StuffDocumentsChain(
        llm_chain=llm_chain, document_variable_name="context"
    )
    qa = RetrievalQA(combine_documents_chain=combine_docs_chain, retriever=retriever)

    user_input = st.text_input("Ask a question:")
    if user_input:
        response = qa(user_input)["result"]
        st.write("**Response:**", response)
```
Then run the app from the project directory:

```bash
streamlit run app.py
```

Streamlit will launch the app and print its address in the terminal (Local URL: http://localhost:8501); open it in your browser to start asking questions about your PDFs.


👌 Final Thoughts
Congratulations! You have successfully set up a local RAG system with DeepSeek R1 and Ollama. Enjoy building AI-powered applications with privacy, speed, and full control!
The full code of this blog can be found here.