
Running Ollama on NVIDIA Jetson Nano with GPU using Docker

15 min read

NVIDIA Jetson devices are powerful platforms designed for edge AI applications, offering excellent GPU acceleration capabilities to run compute-intensive tasks like language model inference.

With official support for NVIDIA Jetson devices, Ollama brings the ability to manage and serve Large Language Models (LLMs) locally, ensuring privacy, performance, and offline operation. By integrating Open WebUI, you can enhance your workflow with an intuitive web interface for managing these models.

It is important to note that the NVIDIA Jetson Nano, with only 4GB of memory, can run smaller LLaMA-family models; 7B models sit at the upper limit and require optimizations such as quantization to fit.

For instance, 4-bit quantization can make it feasible to run these models on the Jetson Nano, although performance will still be constrained compared to more powerful hardware. Additionally, some users have reported problems getting GPU acceleration to work with pre-built binaries on the Jetson Nano, so building Ollama from source may be necessary to achieve optimal performance.
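
To make this concrete: once Ollama is installed later in this guide, pulling a 4-bit quantized build of a 7B model looks like the command below. The tag shown is illustrative; check the Ollama model library for the exact quantization tags that are available.

# Pull and run a 4-bit (q4_0) quantized Llama 2 7B chat model (tag is illustrative)
ollama run llama2:7b-chat-q4_0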

This guide will walk you through setting up Ollama on your Jetson device, integrating it with Open WebUI, and configuring the system for optimal GPU utilization. Whether you’re a developer or an AI enthusiast, this setup allows you to harness the full potential of LLMs right on your Jetson device.

Prerequisites

  1. Jetson Nano
  2. A 5V 4A power supply
  3. 64GB SD card
  4. WiFi Adapter
  5. Wireless Keyboard
  6. Wireless mouse

Software

  • Download the Jetson Nano SD card image from the NVIDIA Developer site
  • Raspberry Pi Imager installed on your local system

Preparing Your Jetson Nano

  1. Unzip the SD card image.
  2. Insert the SD card into your system.
  3. Launch the Raspberry Pi Imager tool and flash the image onto the SD card.

Step 0. Verify the Jetson device


Begin by verifying the L4T (Linux for Tegra) version on your Jetson device. Each Jetson platform runs a specific JetPack version tied to an L4T release. To check your configuration:

cat /etc/nv_tegra_release
# R32 (release), REVISION: 7.1, GCID: 29818004, BOARD: t210ref, EABI: aarch64, DATE: Sat Feb 19 17:05:08 UTC 2022

This output confirms the device is using L4T R32.7.1, compatible with JetPack 4.6.1. Ensure your Jetson device is updated to the latest supported L4T version to avoid compatibility issues.
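
If JetPack components were installed through apt, you can also cross-check the metapackage version reported by the package manager. This assumes the nvidia-jetpack metapackage is present on your system:

# Show the installed JetPack metapackage version (if available via apt)
apt-cache show nvidia-jetpack | grep -i version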

Using NVIDIA JetPack 6.x

If you're using JetPack 6.x, the whole process of installing Ollama and Open WebUI with GPU support is straightforward.

Prerequisite

  • Ensure that JetPack 6.0 is installed on your Jetson device. You can download the SDK Manager on a remote Windows or Linux machine and follow the tutorial on the official NVIDIA Developer site.

Step 1. Verify L4T Version

To check the L4T (Linux for Tegra) version on your NVIDIA Jetson device (e.g., Jetson Nano, Jetson Xavier), follow these steps:

Run the following command to retrieve your current L4T version.

head -n 1 /etc/nv_tegra_release

Here is the list of supported L4T versions:

  • 35.3.1
  • 35.4.1
  • 35.5.0
  • 36.3.0

If your L4T version does not match one of the supported versions listed above, you may need to re-flash your NVIDIA Jetson device using SDK Manager on another computer. You can download SDK Manager and follow the tutorial from the official NVIDIA Developer site.

Step 2. Keep apt up to date:

   sudo apt update && sudo apt upgrade

Step 3. Install the JetPack components:

   sudo apt install nvidia-jetpack

Step 4. Add users

Add your user to the docker group and restart the Docker service to apply the change:

   sudo usermod -aG docker $USER && \
   newgrp docker && \
   sudo systemctl daemon-reload && \
   sudo systemctl restart docker
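
To confirm that the docker group change took effect, try running a container without sudo:

docker run --rm hello-world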

Step 5. Install jetson-examples:

   pip3 install jetson-examples

Step 6. Reboot system

   sudo reboot

Step 7. Install Ollama

   reComputer run ollama

Optional: If you run the above command over SSH and encounter a "command not found: reComputer" error, you can resolve it by executing the following command:

   source ~/.profile

Step 8. Run a model

One of the smallest Llama-family models available for download is TinyLlama, a compact 1.1 billion parameter model. Despite its reduced size, TinyLlama demonstrates remarkable performance across various tasks, making it suitable for applications with limited computational resources. You can access TinyLlama through its GitHub repository or via Hugging Face.

Let’s run the tinyllama model and perform tasks like generating Python code:

ollama run tinyllama
>>> Can you write a Python script to calculate the factorial of a number?
Sure! Here’s the code:

def factorial(n):
    if n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n - 1)

num = int(input("Enter a number: "))
print(f"The factorial of {num} is {factorial(num)}")

Step 9. Install and run Open WebUI through Docker

   docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

Once the installation is finished, you can access the GUI by visiting YOUR_SERVER_IP:3000 in your browser.

Access the API endpoints by navigating to YOUR_SERVER_IP/ollama/docs#/. For comprehensive documentation, please refer to the official resources: the Ollama API Documentation (recommended) and Open WebUI API Endpoints.
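
If the web interface does not come up, the container logs are usually the quickest place to look:

docker logs -f open-webui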

Using JetPack 4.x

If you're using the older JetPack 4.x release, things are more involved.

QUICK UPDATE: As per the GitHub issue, you might need to run the following bash script to compile Ollama from source if you want to utilise the GPU built into the Jetson device. Before running the script, make sure the CUDA driver and toolkit are configured.

Setup CUDA drivers

Copy the script below into a file called "setup_cuda.sh".

#!/bin/bash

set -e  # Exit immediately if a command exits with a non-zero status.

echo "=== Step 1: Verify CUDA Installation ==="

# Check if /usr/local/cuda exists
if [ -L "/usr/local/cuda" ]; then
    echo "/usr/local/cuda is a symbolic link."
    CUDA_TARGET=$(readlink -f /usr/local/cuda)
    echo "It points to: $CUDA_TARGET"
else
    echo "/usr/local/cuda is not properly set up. Checking available CUDA versions..."
fi

# Check installed CUDA versions in /usr/local
echo "=== Installed CUDA Versions in /usr/local ==="
ls -l /usr/local | grep cuda || echo "No CUDA versions found in /usr/local"

# Check if nvcc exists
if [ -x "/usr/local/cuda/bin/nvcc" ]; then
    echo "nvcc found in /usr/local/cuda/bin. CUDA Toolkit is installed."
else
    echo "nvcc not found. CUDA Toolkit installation is incomplete or missing."
    echo "Please reinstall CUDA using NVIDIA SDK Manager."
    exit 1
fi

echo "=== Step 2: Fix Symbolic Link to CUDA ==="
if [ -x "/usr/local/cuda/bin/nvcc" ]; then
    echo "Fixing symbolic link..."
    sudo ln -sf /usr/local/cuda-10.2 /usr/local/cuda
else
    echo "No valid CUDA installation detected. Please reinstall CUDA."
    exit 1
fi

echo "=== Step 3: Set Up CUDA Environment Variables ==="

# Add CUDA environment variables to ~/.bashrc if not already present
if ! grep -q 'export PATH=/usr/local/cuda/bin' ~/.bashrc; then
    echo "Adding CUDA environment variables to ~/.bashrc"
    echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
else
    echo "CUDA environment variables already set in ~/.bashrc"
fi

# Manually set the environment variables for the current session
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

echo "Reloading environment variables..."
source ~/.bashrc

echo "=== Step 4: Verify nvcc Command ==="
if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA environment is configured correctly."
    nvcc --version
else
    echo "nvcc is still not found. Check your ~/.bashrc and environment variables."
    exit 1
fi

echo "=== CUDA Setup Completed Successfully ==="

Execute the script

Make the script executable, then run it:

chmod +x setup_cuda.sh
./setup_cuda.sh
=== Step 1: Verify CUDA Installation ===
/usr/local/cuda is a symbolic link.
It points to: /usr/local/cuda-10.2
=== Installed CUDA Versions in /usr/local ===
lrwxrwxrwx  1 root root   22 Feb 23  2022 cuda -> /etc/alternatives/cuda
lrwxrwxrwx  1 root root   25 Feb 23  2022 cuda-10 -> /etc/alternatives/cuda-10
drwxr-xr-x 12 root root 4096 Feb 23  2022 cuda-10.2
nvcc found in /usr/local/cuda/bin. CUDA Toolkit is installed.
=== Step 2: Fix Symbolic Link to CUDA ===
Fixing symbolic link...
=== Step 3: Set Up CUDA Environment Variables ===
CUDA environment variables already set in ~/.bashrc
Reloading environment variables...
=== Step 4: Verify nvcc Command ===
CUDA environment is configured correctly.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0
=== CUDA Setup Completed Successfully ===

Confirm that CUDA is properly installed:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0
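
As an additional sanity check, you can build and run the deviceQuery sample if the CUDA samples are installed on your device (the path below is the usual JetPack 4.x location and may vary):

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery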

If nvcc is still not found even after running the script, use the following script to check and fix the environment variables and then build Ollama from source:

#!/bin/bash

set -e  # Exit immediately if a command exits with a non-zero status.

echo "=== Step 1: Set CUDA Environment Variables ==="
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_PATH=/usr/local/cuda
export CUDA_11=/usr/local/cuda

echo "CUDA environment variables set."
nvcc --version || { echo "CUDA Toolkit is not installed or not configured correctly. Exiting."; exit 1; }

echo "=== Step 2: Verify and Navigate to Project Directory ==="
PROJECT_DIR="/home/ajeetraina/ollama"

if [ ! -d "$PROJECT_DIR" ]; then
    echo "Project directory $PROJECT_DIR not found. Cloning the project..."
    git clone https://github.com/ollama/ollama.git "$PROJECT_DIR"
else
    echo "Project directory $PROJECT_DIR found. Pulling the latest changes..."
    cd "$PROJECT_DIR"
    git pull
fi

cd "$PROJECT_DIR/llama"

echo "=== Step 3: Print Runner Targets ==="
RUNNER_TARGETS=$(make print-RUNNER_TARGETS | grep RUNNER_TARGETS | awk -F'=' '{print $2}')
echo "Detected Runner Targets: $RUNNER_TARGETS"

if [[ "$RUNNER_TARGETS" == "default" ]]; then
    echo "Only 'default' runner detected. This is likely the CPU-based runner."
fi

echo "=== Step 4: Clean Previous Builds ==="
make clean || echo "No previous builds to clean."

echo "=== Step 5: Attempt to Build CUDA-Based Runner ==="
if [[ "$RUNNER_TARGETS" == *"cuda_v11"* ]]; then
    echo "Building CUDA-based runner (cuda_v11)..."
    make -f make/Makefile.cuda_v11 || { echo "Failed to build CUDA-based runner. Skipping CUDA."; }
else
    echo "CUDA-based runner not detected or incompatible. Skipping CUDA."
    export OLLAMA_SKIP_CUDA_GENERATE=1
fi

echo "=== Step 6: Build Default Runner ==="
echo "Building default runner (CPU-based)..."
make default || { echo "Failed to build default runner. Exiting."; exit 1; }

echo "=== Step 7: Check Build Output ==="
BINARY_PATH="$PROJECT_DIR/llama/build/linux-arm64/runners/cpu/ollama_llama_server"

if [ -f "$BINARY_PATH" ]; then
    echo "Binary built successfully: $BINARY_PATH"
else
    echo "Build completed, but the binary was not found. Exiting."
    exit 1
fi

echo "=== Step 8: Install the Binary ==="
sudo cp "$BINARY_PATH" /usr/local/bin/ollama
echo "Binary installed to /usr/local/bin/ollama."

echo "=== Step 9: Verify Installation ==="
if command -v ollama >/dev/null 2>&1; then
    echo "Ollama successfully installed!"
    ollama --help
else
    echo "Failed to install Ollama or binary not in PATH. Exiting."
    exit 1
fi

echo "=== Script Completed Successfully ==="


Save it as setup_ollama_jetson.sh and make it executable with chmod +x setup_ollama_jetson.sh.

Compiling Ollama from Scratch

$ ./setup_ollama_jetson.sh
=== Step 1: Set CUDA Environment Variables ===
CUDA environment variables set.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0
=== Step 2: Verify and Navigate to Project Directory ===
Project directory /home/ajeetraina/ollama found. Pulling the latest changes...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Total 4 (delta 3), reused 4 (delta 3), pack-reused 0 (from 0)
Unpacking objects: 100% (4/4), done.
From https://github.com/ollama/ollama
 * [new branch]        parth/fix-referencing-so -> origin/parth/fix-referencing-so
Already up to date.
=== Step 3: Print Runner Targets ===
Detected Runner Targets: default
Only 'default' runner detected. This is likely the CPU-based runner.
=== Step 4: Clean Previous Builds ===
rm -rf /home/ajeetraina/ollama/llama/build/linux-arm64
go clean -cache
=== Step 5: Attempt to Build CUDA-Based Runner ===
CUDA-based runner not detected or incompatible. Skipping CUDA.
=== Step 6: Build Default Runner ===
Building default runner (CPU-based)...
make -f make/Makefile.default
make[1]: Entering directory '/home/ajeetraina/ollama/llama'
GOARCH=arm64 go build -buildmode=pie "-ldflags=-w -s \"-X=github.com/ollama/ollama/version.Version=0.5.1-0-gde52b6c\" \"-X=github.com/ollama/ollama/llama.CpuFeatures=\" " -trimpath   -o /home/ajeetraina/ollama/llama/build/linux-arm64/runners/cpu/ollama_llama_server ./runner

cp /home/ajeetraina/ollama/llama/build/linux-arm64/runners/cpu/ollama_llama_server /home/ajeetraina/ollama/dist/linux-arm64/lib/ollama/runners/cpu/ollama_llama_server
/usr/bin/pigz --best -c /home/ajeetraina/ollama/llama/build/linux-arm64/runners/cpu/ollama_llama_server > /home/ajeetraina/ollama/build/linux/arm64/cpu/ollama_llama_server.gz
make[1]: Leaving directory '/home/ajeetraina/ollama/llama'
=== Step 7: Check Build Output ===
Binary built successfully: /home/ajeetraina/ollama/llama/build/linux-arm64/runners/cpu/ollama_llama_server
=== Step 8: Install the Binary ===
[sudo] password for ajeetraina:
Sorry, try again.
[sudo] password for ajeetraina:
Binary installed to /usr/local/bin/ollama.
=== Step 9: Verify Installation ===
Ollama successfully installed!
Usage of ollama:
  -batch-size int
        Batch size (default 512)
  -ctx-size int
        Context (or KV cache) size (default 2048)
  -flash-attn
        Enable flash attention
  -kv-cache-type string
        quantization type for KV cache (default: f16)
  -lora value
        Path to lora layer file (can be specified multiple times)
  -main-gpu int
        Main GPU
  -mlock
        force system to keep model in RAM rather than swapping or compressing
  -mmproj string
        Path to projector binary file
  -model string
        Path to model binary file
  -multiuser-cache
        optimize input cache algorithm for multiple users
  -n-gpu-layers int
        Number of layers to offload to GPU
  -no-mmap
        do not memory-map model (slower load but may reduce pageouts if not using mlock)
  -parallel int
        Number of sequences to handle simultaneously (default 1)
  -port int
        Port to expose the server on (default 8080)
  -requirements
        print json requirement information
  -tensor-split string
        fraction of the model to offload to each GPU, comma-separated list of proportions
  -threads int
        Number of threads to use during generation (default 4)
  -verbose
        verbose output (default: disabled)
=== Script Completed Successfully ===

Download the model

The ollama binary requires a valid Llama model file to operate. Download a pre-trained Llama model from a source like Hugging Face or another trusted model repository. Let's download a lightweight one: TinyLlama, a project that aims to pretrain a 1.1B-parameter Llama model on 3 trillion tokens.

wget https://huggingface.co/TheBloke/Tinyllama-2-1b-miniguanaco-GGUF/resolve/main/tinyllama-2-1b-miniguanaco.Q2_K.gguf

Downloading this model may take some time depending on your internet speed, so be patient.

Resolving cdn-lfs.hf.co (cdn-lfs.hf.co)... 108.159.15.45, 108.159.15.7, 108.159.15.40, ...
Connecting to cdn-lfs.hf.co (cdn-lfs.hf.co)|108.159.15.45|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 482149856 (460M) [binary/octet-stream]
Saving to: ‘tinyllama-2-1b-miniguanaco.Q2_K.gguf’

tinyllama-2-1b-miniguanaco.Q2_K 100%[======================================================>] 459.81M  2.27MB/s    in 6m 34s

2024-12-08 10:59:38 (1.17 MB/s) - ‘tinyllama-2-1b-miniguanaco.Q2_K.gguf’ saved [482149856/482149856]

Copy the model to the right directory

mkdir -p /home/ajeetraina/ollama/models/
cp tinyllama-2-1b-miniguanaco.Q2_K.gguf /home/ajeetraina/ollama/models/

Run ollama with the Model Path and GPU

Specify the path to the model when running the server:

ollama -model /home/ajeetraina/ollama/models/tinyllama-2-1b-miniguanaco.Q2_K.gguf -port 8080 -n-gpu-layers 6 -batch-size 256 -ctx-size 1024
time=2024-12-08T11:03:58.433+05:30 level=INFO source=runner.go:941 msg="starting go runner"
time=2024-12-08T11:03:58.433+05:30 level=INFO source=runner.go:942 msg=system info="AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = -1 | SVE = -1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = -1 | LLAMAFILE = 1 | cgo(gcc)" threads=4
time=2024-12-08T11:03:58.433+05:30 level=INFO source=.:0 msg="Server listening on 127.0.0.1:8080"
llama_model_loader: loaded meta data with 20 key-value pairs and 201 tensors from /home/ajeetraina/ollama/models/tinyllama-2-1b-miniguanaco.Q2_K.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = abdgrt_tinyllama-2-1b-miniguanaco
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 10
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32003]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32003]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32003]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  19:               general.quantization_version u32              = 2
..

Key Insights From the Metadata

Model Name and Size:

  • Model Name: abdgrt_tinyllama-2-1b-miniguanaco
  • Model Size: 459.11 MiB
  • Parameters: 1.10 Billion
  • Quantization: Q2_K – Medium (uses quantized weights for efficient inference).

Model Tokens:

Special Tokens:

  • BOS (Beginning of Sequence): <s> token.
  • EOS (End of Sequence): </s> token.
  • LF (Line Feed): <0x0A> token.
  • EOT (End of Text): <|im_end|> token.
  • Max Token Length: 48 (This is the maximum token length for the current model configuration).

Context Size:

  • n_ctx: 2048 (Supports a context window of 2048 tokens).
  • n_batch: 512 (Batch size used during inference).

Hardware Information:

  • CPU Buffers: 459.11 MiB (Model weights) + 44.00 MiB (KV Cache for attention mechanisms).
  • Compute Buffers: 148.01 MiB used for calculations.
  • Graph Nodes: 710 (Indicates the computational graph structure of the model).

The model is successfully initialized and is ready to generate predictions or process input.

Collect the Full Response

Since the response is being streamed in chunks, you can collect the full response by processing the JSON stream.

Using jq to Format the Output: Pipe the response through jq to extract and format the content field:

curl -X POST http://127.0.0.1:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Hello, TinyLlama!"}' | jq -r '.content'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4082    0  4051    0    31    125      0 --:--:--  0:00:32 --:--:--   151


Please
 write
 an
 ess
ay
 on
 "
The
 Ro
le
 of
 the

To collapse the streamed chunks into a single block of text, filter out null content fields and strip the newlines:

curl -X POST http://127.0.0.1:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Please write an essay on the role of technology in modern education."}' | jq -r 'select(.content != null) | .content' | tr -d '\n'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 54210    0 54128    0    82    308      0 --:--:--  0:02:55 --:--:--   295

 1. What is the impact of the spread of Christianity in Europe and how has it influenced the development of society, culture, and government in Europe during the Middle Ages?2. Describe the Middle Ages as a time of great uncertainty and change within Europe and what can be learned from that time period about societal dynamics and political systems. 3. What were the main challenges and opportunities facing European nobility and how was their relationship with the government and monarchs different during the Middle Ages compared to today? 4. Describe the complex system of taxation in Europe during the Middle Ages, as well as its impact on society and governmental processes. 5. What were the key institutions and powers of government during the Middle Ages, and how has that changed over time in Europe and what are the current state of affairs? 6. Describe the impact of the Reformation and the Renaissance on European culture and society, and what influence it had on the development of politics and governmental structures during the Middle Ages. 7. What were some of the most significant changes to society, politics, and economics during the Middle Ages, and how have these shaped the modern economy today? 8. Examine the impact of the Crusades and the Reconnaissance Order in Europe during the Middle Ages, and what role religious ideology played in shaping political, economic, and social structures in Europe during this period. 9. What were some of the earliest written records of politics, governmental structures, and monarchs in Europe during the Middle Ages? How have their influence evolved over time, and what can be learned from that time period about societal dynamics and political systems in Europe during this period? 10. Describe the impact of the Reformation and the Wars of the Roses on European society, politics, and governmental structures during the Middle Ages, and what role religious ideology played in shaping political, economic, and social structures in Europe during this period.<|im_end|><|im_start|>assistantThe impact of the spread of Christianity in Europe and how has it influenced the development of society, culture, and government in Europe during the Middle Ages is significant. The Middle Ages marked a time when Europe was going through many social changes and transformations such as the decline of feudalism, rise of absolutism...

Use jtop to check whether the GPU is being used

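If jtop is not installed yet, it is provided by the jetson-stats Python package:

sudo pip3 install jetson-stats
# log out/in or reboot if jtop complains that its service is not running
sudo jtop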

Using Ollama and Docker

Step 1: Prepare the Directory for Docker

Ensure that all required files are in the /home/ajeetraina/ollama directory:

  • The compiled Ollama binary (e.g., ollama)
  • Model files (e.g., /home/ajeetraina/ollama/models/model.gguf)
  • Any dependencies or supporting scripts

Create a new folder to set up your Docker environment:

mkdir /home/ajeetraina/ollama-docker
cd /home/ajeetraina/ollama-docker

Copy the Ollama binary and models to this new folder:

cp /home/ajeetraina/ollama/ollama .
cp -r /home/ajeetraina/ollama/models .

Step 2: Write the Dockerfile

Create a file named Dockerfile in the /home/ajeetraina/ollama-docker directory:

# Use a lightweight Ubuntu base image
FROM ubuntu:20.04

# Install necessary dependencies
RUN apt-get update && apt-get install -y \
    libssl-dev \
    libcurl4-openssl-dev \
    zlib1g-dev \
    wget \
    curl && \
    apt-get clean

# Set the working directory
WORKDIR /app

# Copy the Ollama binary and model files into the container
COPY ollama /app/ollama
COPY models /app/models

# Make the Ollama binary executable
RUN chmod +x /app/ollama

# Expose port 8080 for the Ollama server
EXPOSE 8080

# Run Ollama as the container's entry point
CMD ["./ollama", "-model", "/app/models/model.gguf", "-port", "8080"]

Step 3: Build the Docker Image

Navigate to the Docker build directory:

cd /home/ajeetraina/ollama-docker

Build the Docker image:

docker build -t ollama-image .

Step 4: Run the Docker Container

Start a container from the built image:

docker run -d --name ollama-container -p 8080:8080 ollama-image

Verify that the container is running:

docker ps

You should see ollama-container in the list of running containers.

Test the server: Use curl to send a test prompt to the Ollama server:

curl -X POST http://127.0.0.1:8080/completion \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello, TinyLlama!"}'

Step 5: Use GPU Acceleration (Optional)

If you want to enable GPU acceleration in your Docker container:

  • Install the NVIDIA Container Toolkit: follow the official NVIDIA instructions to install the container toolkit.

Example for Jetson Nano:

sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
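
On Jetson boards, GPU access for containers goes through the NVIDIA container runtime. If --gpus all does not work on your JetPack release, a common workaround is to make the NVIDIA runtime the Docker default. This is a sketch that assumes nvidia-container-runtime is installed at its usual location:

sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
EOF
sudo systemctl restart docker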

Modify the Dockerfile: Use an NVIDIA CUDA base image to include GPU support:

FROM nvidia/cuda:11.8.0-runtime-ubuntu20.04

RUN apt-get update && apt-get install -y \
    libssl-dev \
    libcurl4-openssl-dev \
    zlib1g-dev \
    wget \
    curl && \
    apt-get clean

WORKDIR /app
COPY ollama /app/ollama
COPY models /app/models
RUN chmod +x /app/ollama

EXPOSE 8080
CMD ["./ollama", "-model", "/app/models/model.gguf", "-port", "8080", "-n-gpu-layers", "4"]

Run the container with GPU support:

docker run --gpus all -d --name ollama-container -p 8080:8080 ollama-image

Step 6: Test and Debug

Check Container Logs: If something doesn’t work, view the container logs:

docker logs ollama-container

Access the container: Enter the container for debugging:

docker exec -it ollama-container bash

Monitor GPU Usage: While the container is running, use tegrastats or jtop to confirm GPU utilization (nvidia-smi is not available on Jetson devices).
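
For example, tegrastats prints a GR3D_FREQ field that reflects GPU load; a non-zero value while the model is generating indicates the GPU is actually being used:

sudo tegrastats --interval 1000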

Run Open WebUI

Run Open WebUI with GPU acceleration in a separate Docker container:

sudo docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

Verify Running Containers

Ensure both Ollama and Open WebUI containers are running correctly:

sudo docker ps
CONTAINER ID   IMAGE                                COMMAND               CREATED          STATUS                            PORTS                                           NAMES
dee2d1fbe4cf   ghcr.io/open-webui/open-webui:cuda   "bash start.sh"       10 seconds ago   Up 6 seconds (health: starting)   0.0.0.0:3000->8080/tcp, :::3000->8080/tcp       open-webui
9fd89a4fa908   ollama/ollama                        "/bin/ollama serve"   52 seconds ago   Up 48 seconds                     0.0.0.0:11434->11434/tcp, :::11434->11434/tcp   ollama

Sample output from pulling and starting the Open WebUI image:

cuda: Pulling from open-webui/open-webui
6d29a096dd42: Pull complete
6fab32a80202: Pull complete
610eb561c31b: Pull complete
50c0fb1f456e: Pull complete
ae5672aeb8ae: Pull complete
4f4fb700ef54: Pull complete
639718444375: Pull complete
5dcf97af08b1: Pull complete
ea9079f84622: Pull complete
e3fc97a4f07a: Pull complete
a538afa31f12: Pull complete
86ede3d9066a: Pull complete
a5aa461a25d1: Pull complete
6acc9cdc9b03: Pull complete
1920af2d5f9d: Pull complete
Digest: sha256:781acd8f2b45bdf45ac9a89fa80d52a6a966d9e1e7b55fbb5f0f1397ce5d9515
Status: Downloaded newer image for ghcr.io/open-webui/open-webui:cuda
843100c8d64d0ab9ea78fd64f4ffced0a62ce8783c850ce66d7ebb890f102e5a
ajeetraina@ajeetraina-desktop:~$ sudo docker ps
[sudo] password for ajeetraina:
CONTAINER ID   IMAGE                                COMMAND           CREATED         STATUS                     PORTS                                       NAMES
843100c8d64d   ghcr.io/open-webui/open-webui:cuda   "bash start.sh"   4 minutes ago   Up 4 minutes (unhealthy)   0.0.0.0:3000->8080/tcp, :::3000->8080/tcp   open-webui

Bundled Installation of Open WebUI with Ollama

For a simplified setup, you can use a bundled Docker image that integrates both Open WebUI and Ollama.

Using GPU

This installation method uses a single container image that bundles Open WebUI with Ollama, allowing for a streamlined setup via a single command. Choose the appropriate command based on your hardware setup:

sudo docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

Using CPU only

For CPU Only: If you’re not using a GPU, use this command instead:

sudo docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

Both commands facilitate a built-in, hassle-free installation of both Open WebUI and Ollama, ensuring that you can get everything up and running swiftly.
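
Once the container is up, a quick reachability check for the bundled UI (assuming the default port mapping above) is:

curl -I http://localhost:3000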

Conclusion

Once configured, Open WebUI can be accessed at http://localhost:3000, while Ollama operates at http://localhost:11434. This setup provides a seamless and GPU-accelerated environment for running and managing LLMs locally on NVIDIA Jetson devices.

This guide showcases the power and versatility of NVIDIA Jetson devices when paired with Ollama and Open WebUI, enabling advanced AI workloads at the edge with ease and efficiency.

Have Queries? Join https://launchpass.com/collabnix
