At DockerCon 2023, Docker announced a new GenAI Stack – a great way to quickly get started building GenAI-backed applications with only a few commands.
The GenAI Stack came about through a collaboration between Docker, Neo4j, LangChain, and Ollama. The goal of the collaboration was to create a pre-built GenAI stack of best-in-class technologies that are well integrated, come with sample applications, and make it easy for developers to get up and running.
What is the GenAI Stack?
The GenAI Stack is a one-stop shop for getting started with GenAI app development. It is a set of Docker containers orchestrated by Docker Compose, providing all the tools and resources you need to build and run GenAI apps without having to set up and configure everything yourself. It makes it easy to build and run AI apps that can generate text, code, and other creative content.
What is the GenAI Stack composed of?
The stack is a set of Docker containers that make it easy to experiment with building and running Generative AI (GenAI) apps. The containers provide a development environment with a pre-built support-agent app covering data-import and response-generation use cases. It includes:
- Ollama – A tool for running and managing LLMs locally
- Neo4j – A graph database for grounding LLMs
- GenAI apps based on LangChain
- Pre-configured LLMs – Preconfigured Large Language Models such as Llama 2, GPT-3.5, and GPT-4 to jumpstart your AI projects
Why Ollama, Neo4j and LangChain?
LangChain and Ollama were involved in the collaboration because of their expertise in LLMs. LangChain is a programming and orchestration framework for LLMs, and Ollama is a tool for running and managing LLMs locally.
Neo4j was involved in the collaboration because of its expertise in graph databases and knowledge graphs. Neo4j recognized that the combination of graphs and LLMs is powerful and that it could be used to build GenAI applications that are more accurate and reliable.
What is Ollama all about?
Ollama is a lightweight, extensible framework for building and running large language models (LLMs) on a local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
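Ollama exposes its API over HTTP on port 11434 by default. As a minimal sketch (no running server is needed to prepare the request; the endpoint path and fields follow Ollama's documented `/api/generate` call), a generation request can be built like this:

```python
import json

# Ollama's default local endpoint for text generation
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> str:
    """Build the JSON body for a non-streaming /api/generate call."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False})

body = build_generate_request("llama2", "Why is the sky blue?")
print(body)
```

With a local Ollama running, POSTing this body to `OLLAMA_URL` (e.g. via `curl` or `urllib.request`) returns a JSON object whose `response` field holds the generated text.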
Benefits of Ollama
If you are interested in building and running LLMs on your local machine, I encourage you to check out Ollama. Here are some of the benefits of using Ollama:
- It is easy to use and install.
- It supports a wide range of LLMs.
- It is extensible and customizable.
- It is actively maintained and updated.
Customizing Models with Ollama
Ollama also supports customizing and creating your own models. This makes it a powerful tool for researchers and developers who are working on new advances in LLM technology.
Ollama is now available as an official Docker image
With more than 10,000 Docker Hub pulls to date, Ollama is now available as a Docker-sponsored open source image, making it simpler to get up and running with large language models using Docker containers.
It is a great tool for getting started with GenAI app development.
On macOS, Ollama runs natively and can take advantage of Apple silicon GPU acceleration. Because Docker Desktop on macOS cannot pass the GPU through to containers, the stack's containers talk to this native Ollama instance (via host.docker.internal) rather than running it inside Docker.
What is Neo4j?
Neo4j is a native graph database that is used in the GenAI Stack to provide grounding for large language models (LLMs). Grounding is the process of anchoring LLMs to real-world knowledge and context. This is important because it helps LLMs to generate more accurate and relevant responses.
Neo4j is a good choice for grounding LLMs because it is fast and scalable. It can also store and query complex graph data, which is ideal for representing the relationships between different entities in the real world.
In the GenAI Stack, Neo4j is used to store a knowledge graph that contains information about a variety of topics, such as people, places, and events. The LLM can then access this knowledge graph to generate more accurate and relevant responses to user queries.
For example, if a user asks the LLM “What is the capital of France?”, the LLM can query the knowledge graph to find out that the capital of France is Paris. The LLM can then generate a response such as “The capital of France is Paris.”
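The grounding pattern described above boils down to placing graph-retrieved facts into the prompt so the LLM answers from them. A minimal sketch (the helper name and prompt template are illustrative, not part of the stack):

```python
def build_grounded_prompt(question: str, facts: list[str]) -> str:
    """Combine facts retrieved from the knowledge graph with the user's question."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}\n"
    )

# Facts as they might be returned from a Neo4j query
facts = ["Paris is the capital of France."]
prompt = build_grounded_prompt("What is the capital of France?", facts)
print(prompt)
```

The resulting prompt is what gets sent to the LLM, which is why the answer stays anchored to the knowledge graph instead of the model's training data alone.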
Neo4j is also used in the GenAI Stack to provide context for LLMs. Context is the information that surrounds a piece of text. It is important because it helps LLMs to better understand the meaning of the text.
Here are some of the benefits of using Neo4j in the GenAI Stack:
- It is fast and scalable.
- It can store and query complex graph data.
- It can provide grounding and context for LLMs.
- It can help LLMs to generate more accurate and relevant responses to user queries.
If you are working on developing GenAI applications, I encourage you to consider using Neo4j. It is a powerful tool that can help you to build more accurate and reliable AI systems.
What is LangChain?
LangChain is a programming and orchestration framework for large language models (LLMs). It provides a simple and intuitive way to interact with LLMs, and it makes it easy to build GenAI applications.
What does LangChain provide?
LangChain is a Python framework (also available for JavaScript/TypeScript) that provides a Pythonic API for interacting with LLMs. LangChain also provides a number of features that make it easy to build and deploy GenAI applications, including:
- Chains and agents for orchestrating multi-step LLM workflows.
- Integrations with a wide range of LLM providers and vector stores, so applications can be built quickly and easily.
- Tooling for debugging and monitoring LLM applications.
LangChain plays an important role in the GenAI Stack. It provides the programming and orchestration framework that is needed to build GenAI applications. LangChain also provides a number of features that make it easy to build and deploy GenAI applications to production.
Benefits of LangChain
Here are some of the benefits of using LangChain in the GenAI Stack:
- It provides a simple and intuitive way to interact with LLMs.
- It makes it easy to build GenAI applications.
- It provides chains and agents for orchestrating multi-step workflows.
- It integrates with a wide range of LLM providers and vector stores.
- It provides a set of tools for debugging and monitoring LLM applications.
If you are interested in building GenAI applications, I encourage you to check out LangChain. It is a powerful tool that can help you to build and deploy reliable AI systems.
Components of the GenAI Stack
The GenAI Stack comes bundled with the core components you need to get started, already integrated and set up for you in Docker containers. It makes it really easy to experiment with new models, hosted locally on your machine (such as Llama 2) or via APIs (like OpenAI’s GPT). It is already set up for the Retrieval Augmented Generation (RAG) architecture for LLM apps, which is the easiest way to integrate an LLM into an application and give it access to your own data.
What is RAG and what problem does it solve?
Retrieval Augmented Generation (RAG) is a method for improving the performance of large language models (LLMs) by providing them with access to external knowledge sources. This is done by first retrieving a set of relevant documents from the knowledge source, and then using those documents to generate a response.
Benefits of RAG
RAG has several advantages over other methods for integrating LLMs into applications. First, it is relatively easy to implement. Second, it can be used to integrate LLMs with a wide variety of knowledge sources, including databases, text corpora, and even other LLMs. Third, RAG can lead to significant improvements in the accuracy and performance of LLMs.
A simple example of how RAG can be used in a GenAI app
- The user asks the app a question, such as “What is the capital of France?”
- The app uses a retrieval model to retrieve a set of relevant documents from a knowledge source, such as Wikipedia.
- The app uses an LLM to generate a response to the user’s question, using the retrieved documents as context.
- The app returns the response to the user.
In this example, the RAG architecture allows the app to generate a more accurate and informative response to the user’s question, because it is able to access and use information from the knowledge source.
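The retrieve-then-generate flow above can be sketched in a few lines. This is only an illustration: real RAG apps (including this stack) use embedding-based similarity search, while the toy `retrieve` below ranks documents by naive word overlap as a stand-in:

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Assemble the retrieved documents and the question into one prompt."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Paris is the capital and most populous city of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is in Paris.",
]
prompt = build_rag_prompt("What is the capital of France?", docs)
print(prompt)
```

The prompt, not the bare question, is what the LLM receives, so its answer can draw on the retrieved context.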
Benefits of using RAG for GenAI apps
- It is relatively easy to implement.
- It can be used to integrate LLMs with a wide variety of knowledge sources.
- It can lead to significant improvements in the accuracy and performance of LLMs.
If you are interested in building GenAI apps, I encourage you to consider using RAG. It is a powerful tool that can help you to build more accurate and reliable AI systems.
Step 1. Install Docker Desktop for Mac 4.23.0
Note: There is a performance issue that impacts python applications in the latest release of Docker Desktop v4.24.0. Until a fix is available, please use version 4.23.0 or earlier.
Step 2. Install Ollama on Mac OS
Visit this link to download and install Ollama on your Mac.
Please note that Ollama does not currently support Windows, so Windows users need to generate an OpenAI API key and configure the stack to use gpt-3.5 or gpt-4 in the .env file.
Choose your preferred operating system.
Step 3. Create OpenAI Secret API Keys
Visit this link to create your new OpenAI Secret API Keys.
Step 4. Sign Up for LangChain Beta for API Keys
Visit this link in order to create your LangChain endpoint and API keys. You will need the following information:
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_TRACING_V2=true # false
LANGCHAIN_PROJECT=default
LANGCHAIN_API_KEY=ls__cbabccXXXXXX
Step 5. Clone the repository
git clone https://github.com/docker/genai-stack
cd genai-stack
Step 6. Create .env file
cat .env
OPENAI_API_KEY=sk-EsNJzI5uMBCXXXXXXXX
OLLAMA_BASE_URL=http://host.docker.internal:11434
NEO4J_URI=neo4j://database:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
LLM=llama2 #or any Ollama model tag, or gpt-4 or gpt-3.5
EMBEDDING_MODEL=sentence_transformer #or openai or ollama
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_TRACING_V2=true # false
LANGCHAIN_PROJECT=default
LANGCHAIN_API_KEY=ls__cbabccXXXXXX
Don’t forget to change “localhost” to “database” in the NEO4J_URI entry, since Neo4j runs as the database service in Docker Compose.
Step 7. Bring up Compose services
docker compose up -d --build
pulling 8daa9615cce3... 93% |██████████████ | (3.5/3.8 GB, 29 MB/s)
genai-stack-pull-model-1 | ... pulling model (250s) - will take several minutes
genai-stack-pull-model-1 | ... pulling model (260s) - will take several minutes
pulling 8daa9615cce3... 100% |███████████████| (3.8/3.8 GB, 29 MB/s)
genai-stack-pull-model-1 | ... pulling model (270s) - will take several minutes
pulling 8c17c2ebb0ea... 100% |█████████████████| (7.0/7.0 kB, 3.9 MB/s)
pulling 7c23fb36d801... 100% |█████████████████| (4.8/4.8 kB, 989 kB/s)
genai-stack-pull-model-1 | ... pulling model (280s) - will take several minutes
pulling bec56154823a... 100% |████████████████████| (59/59 B, 103 kB/s)
pulling e35ab70a78c7... 100% |█████████████████████| (90/90 B, 15 kB/s)
genai-stack-pull-model-1 | ... pulling model (290s) - will take several minutes
pulling 09fe89200c09... 100% |██████████████████| (529/529 B, 4.2 MB/s)
genai-stack-pull-model-1 | verifying sha256 digest
genai-stack-pull-model-1 | writing manifest
genai-stack-pull-model-1 | removing any unused layers
genai-stack-pull-model-1 | success
genai-stack-pull-model-1 exited with code 0
genai-stack-loader-1 | Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.
genai-stack-pdf_bot-1 | Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.
genai-stack-bot-1 | Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.
genai-stack-bot-1 | You can now view your Streamlit app in your browser.
genai-stack-bot-1 | URL: http://0.0.0.0:8501
genai-stack-pdf_bot-1 | You can now view your Streamlit app in your browser.
genai-stack-pdf_bot-1 | URL: http://0.0.0.0:8503
genai-stack-loader-1 | You can now view your Streamlit app in your browser.
genai-stack-loader-1 | URL: http://0.0.0.0:8502
Step 8. Viewing the Services on Docker Dashboard
Step 9. Accessing the app
Visit http://localhost:8502 to access the Stack Overflow data loader:
Click “Import”. The import takes a minute or two to run; most of that time is spent generating the embeddings. During or after the import, you can open http://localhost:7474 and log in with username “neo4j” and password “password”, as configured in Docker Compose. There, the left sidebar shows an overview, and clicking the “pill” with the counts displays some of the connected data.
The data loader will import the graph using the following schema.
The graph schema for Stack Overflow consists of nodes representing Questions, Answers, Users, and Tags. Users are linked to Questions they’ve asked via the “ASKED” relationship and to Answers they’ve provided with the “ANSWERS” relationship. Each Answer is also inherently associated with a specific Question. Furthermore, Questions are categorized by their relevant topics or technologies using the “TAGGED” relationship connecting them to Tags.
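As an illustration only (the Cypher here is mine, not taken from the app's source), a query over this schema that fetches questions for a tag, together with their answers and askers, might look like the following. The relationship names follow the text above (ASKED, ANSWERS, TAGGED); property names such as `title`, `body`, and `name` are assumptions:

```python
# Illustrative Cypher over the Stack Overflow schema described above.
QUESTIONS_FOR_TAG = """
MATCH (u:User)-[:ASKED]->(q:Question)-[:TAGGED]->(t:Tag {name: $tag})
OPTIONAL MATCH (a:Answer)-[:ANSWERS]->(q)
RETURN q.title AS question, u.name AS asked_by, collect(a.body) AS answers
LIMIT 5
"""
print(QUESTIONS_FOR_TAG)
```

In the running stack, a query like this would be executed against Neo4j (for example via the official `neo4j` Python driver, or directly in the Neo4j Browser at http://localhost:7474).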
Step 10. Accessing Neo4j
As instructed, open http://localhost:7474 and log in with username “neo4j” and password “password”, as configured in Docker Compose.
Query the Imported Data via a Chat Interface Using Vector + Graph Search
This application, served on http://localhost:8501, has the classic LLM chat UI and lets the user ask questions and get answers.
There’s a switch for RAG mode: the user can either rely entirely on the LLM’s trained knowledge (RAG: Disabled), or use the more capable RAG: Enabled mode, where the application combines similarity search over text embeddings with graph queries to find the most relevant questions and answers in the database.
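The similarity-search half of RAG mode can be sketched with plain cosine similarity over embedding vectors. The vectors below are toy numbers, not real model output; in the stack they would come from the configured EMBEDDING_MODEL (sentence_transformer, openai, or ollama):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional embeddings standing in for real ones
question_vec = [0.9, 0.1, 0.0]
stored = {
    "How do I merge two dicts in Python?": [0.88, 0.15, 0.05],
    "What is a Docker volume?": [0.05, 0.20, 0.95],
}

# Pick the stored question whose embedding is closest to the user's question
best = max(stored, key=lambda text: cosine(question_vec, stored[text]))
print(best)
```

The real application performs this search inside Neo4j via its vector index, then follows graph relationships from the matched questions to pull in their answers as context.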
Click “Highly ranked questions”
Accessing the GenAI Stack PDF Bot
Open http://localhost:8503 in your browser to access the PDF Bot, which lets you chat with your own PDF files.
To test drive it, I uploaded my latest resume and asked a quick question. It responded with the right answer. Amazing!
References
- GenAI Stack Source Code
- Introducing a New GenAI Stack for Developers
- Introducing a New GenAI Stack: Streamlined AI/ML Integration Made Easy