Discover how to create a private AI-powered document analysis system using cutting-edge open-source tools.
System Requirements
- 16GB RAM minimum
- 10th Gen Intel Core i5 or equivalent
- 10GB free storage space
- Windows 10+/macOS 12+/Linux Ubuntu 20.04+
🛠️ Step 1: Installing Ollama
Download Ollama for macOS, Linux, or Windows:
- Download Ollama
- Follow the installation instructions for your operating system.
```bash
# For Linux
curl -fsSL https://ollama.ai/install.sh | sh
```
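Once the installer finishes, you can sanity-check the setup from a terminal (the exact output will vary by version):

```bash
# Confirm the CLI is installed and the local server responds
ollama --version
ollama list   # lists pulled models; empty on a fresh install
```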
🤖 What is Ollama?

Ollama is a framework designed for running large language models (LLMs) directly on your local machine. It allows users to download, execute, and interact with AI models without relying on cloud-based APIs.
- Example: running `ollama run deepseek-r1:1.5b` executes DeepSeek R1 locally.
- Why use it? It offers a free, private, and offline AI experience with low latency.
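Ollama also exposes a local HTTP API (on port 11434 by default), so you can call a running model from Python. Here is a minimal sketch using the requests library, assuming the deepseek-r1:1.5b model has already been pulled:

```python
import requests

# Ollama's local server listens on http://localhost:11434 by default.
# /api/generate returns a completion; stream=False requests a single JSON reply.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:1.5b",
        "prompt": "Explain retrieval-augmented generation in one sentence.",
        "stream": False,
    },
)
print(response.json()["response"])
```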
🔗 What is LangChain?
LangChain is a Python/JavaScript framework that enables the seamless integration of LLMs with various data sources, APIs, and memory systems.
- Why use it? It helps connect LLMs to applications like chatbots, document processing, and Retrieval-Augmented Generation (RAG) systems.
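As a rough illustration (assuming langchain and langchain-community are installed, as in Step 3 below, and the model has been pulled via Ollama), the sketch below chains a prompt template with a locally served model:

```python
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate

# Wrap the locally served DeepSeek R1 model in LangChain's LLM interface
llm = Ollama(model="deepseek-r1:1.5b")

# A reusable prompt template; LangChain substitutes the {topic} variable
prompt = PromptTemplate.from_template("Summarize {topic} in two sentences.")

# Compose prompt and model into a runnable chain
chain = prompt | llm
print(chain.invoke({"topic": "retrieval-augmented generation"}))
```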
📄 What is Retrieval-Augmented Generation (RAG)?
RAG is an AI technique that improves the accuracy of LLM responses by incorporating information retrieved from external sources like PDFs and databases.
- Why use it? It enhances factual correctness and reduces hallucinations by referencing actual documents.
- Example: An AI-powered Q&A system that fetches relevant document excerpts before generating responses.
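Conceptually, the retrieve-then-generate loop is simple. The sketch below is only pseudocode-style Python; `retriever` and `llm` stand in for the components built later in this guide:

```python
def answer_with_rag(question, retriever, llm):
    # 1. Retrieve the document chunks most relevant to the question
    relevant_chunks = retriever.get_relevant_documents(question)
    context = "\n\n".join(chunk.page_content for chunk in relevant_chunks)

    # 2. Generate an answer grounded in the retrieved context
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
    return llm.invoke(prompt)
```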
⚡ DeepSeek R1: A Powerful Open-Source AI Model

DeepSeek R1 is an AI model optimized for logical reasoning, problem-solving, and factual retrieval.
- Why use it? It excels in RAG applications and can run efficiently on local machines with Ollama.
🚀 How Do These Technologies Work Together?
- Ollama runs DeepSeek R1 locally.
- LangChain connects the AI model to external data.
- RAG retrieves relevant information for accurate responses.
- DeepSeek R1 generates high-quality, context-aware answers.
📈 Use Case Example: AI-Powered PDF Q&A System
This system allows users to upload a PDF and ask questions about its content. The AI, powered by DeepSeek R1, retrieves relevant sections and generates precise answers.
🎯 Why Run DeepSeek R1 Locally?
| Feature | Cloud-Based Models | Local DeepSeek R1 |
|---|---|---|
| Privacy | Data sent to external servers | 100% local & secure |
| Speed | API latency & network delays | Instant inference |
| Cost | Pay per API request | Free after setup |
| Customization | Limited fine-tuning | Full model control |
| Deployment | Cloud-dependent | Works offline & on-premises |
🛠️ Step 2: Running DeepSeek R1
```bash
ollama pull deepseek-r1:1.5b
ollama run deepseek-r1:1.5b
```
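If you prefer to talk to the model from Python rather than the terminal, the `ollama` package (installed in the next step) provides a simple chat interface; a quick sanity check might look like this:

```python
import ollama

# Send one chat message to the locally running DeepSeek R1 model
reply = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "What is retrieval-augmented generation?"}],
)
print(reply["message"]["content"])
```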

🛠️ Step 3: Setting Up a RAG System with Streamlit in a Virtual Environment
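Before installing anything, create and activate a virtual environment so the project's dependencies stay isolated (the commands below assume macOS/Linux; on Windows, activate with `venv\Scripts\activate`):

```bash
python -m venv venv
source venv/bin/activate
```

With the environment active, install the required packages: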

```bash
pip install -U langchain langchain-community streamlit pdfplumber semantic-chunkers open-text-embeddings faiss ollama prompt-template langchain_experimental sentence-transformers faiss-cpu
```
🛠️ Step 4: Creating and Running the App
mkdir rag-system && cd rag-system
Create a Python script named `app.py` and insert the following code:
```python
import streamlit as st
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import RetrievalQA

st.title("📄 RAG System with DeepSeek R1 & Ollama")

uploaded_file = st.file_uploader("Upload your PDF", type="pdf")

if uploaded_file:
    # Save the uploaded PDF to disk so the loader can read it
    with open("temp.pdf", "wb") as f:
        f.write(uploaded_file.getvalue())

    # Load the PDF and split it into semantically coherent chunks
    loader = PDFPlumberLoader("temp.pdf")
    docs = loader.load()
    text_splitter = SemanticChunker(HuggingFaceEmbeddings())
    documents = text_splitter.split_documents(docs)

    # Embed the chunks and index them in a FAISS vector store
    embedder = HuggingFaceEmbeddings()
    vector = FAISS.from_documents(documents, embedder)
    retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})

    # DeepSeek R1 served locally by Ollama
    llm = Ollama(model="deepseek-r1:1.5b")

    # Prompt that places the retrieved context ahead of the user's question
    QA_PROMPT = PromptTemplate.from_template(
        "Context: {context}\nQuestion: {question}\nAnswer:"
    )
    llm_chain = LLMChain(llm=llm, prompt=QA_PROMPT)
    combine_docs_chain = StuffDocumentsChain(
        llm_chain=llm_chain, document_variable_name="context"
    )
    qa = RetrievalQA(combine_documents_chain=combine_docs_chain, retriever=retriever)

    user_input = st.text_input("Ask a question:")
    if user_input:
        response = qa(user_input)["result"]
        st.write("**Response:**", response)
```
Then run the app from the project directory:

```bash
streamlit run app.py
```

Streamlit will launch the app and print its address in the terminal (Local URL: http://localhost:8501); open it in your browser to start asking questions about your PDFs.


👌 Final Thoughts
Congratulations! You have successfully set up a local RAG system with DeepSeek R1 and Ollama. Enjoy building AI-powered applications with privacy, speed, and full control!
The full code of this blog can be found here.