Running Llama 2 with Ollama on NVIDIA Jetson Nano with GPU using Docker

Ollama is a rapidly growing development tool, with 10,000 Docker Hub pulls in a short period of time. It is an open-source tool for running large language models (LLMs) locally, such as Llama 2, the LLM from Meta that this guide uses. Llama 2 can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

To run Ollama on a Jetson Nano, you will need the following:

Docker Engine
Ollama Docker image (ollama/ollama)
Jetson Nano 4GB

Hardware

Jetson Nano
A 5V 4A power supply
64GB SD card

Software

Jetson SD card image from https://developer.nvidia.com/embedded/downloads
Etcher software installed on your system

Preparing Your Jetson Nano

1. Flashing the Jetson SD Card Image

Unzip the SD card image.
Insert the SD card into your system.
Open Etcher, select the unzipped image file, select the target SD card, and click Flash.

2. Verifying That Docker Is Preinstalled

ajeetraina@ajeetraina-desktop:~$ sudo docker version
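If you script these steps, it helps to fail early when the Docker CLI is missing. The sketch below uses only the shell builtin command -v; require_cmds is a hypothetical helper name, not part of Docker or JetPack.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if every named command is on the PATH,
# and reports each missing command on stderr.
require_cmds() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_cmds docker && sudo docker version --format '{{.Server.Version}}' prints only the Docker daemon version once the CLI is confirmed present.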

3. Checking the Docker Runtime

Starting with JetPack 4.2, NVIDIA provides a container runtime with Docker integration. This custom runtime lets Docker containers access the GPU on Jetson devices. The nvidia-docker version command prints the NVIDIA Docker version along with the Docker client and server details:

pico@pico1:/tmp/docker-build$ sudo nvidia-docker version
NVIDIA Docker: 2.0.3
Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Fri Feb 28 23:47:53 2020
 OS/Arch:           linux/arm64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Wed Feb 19 01:06:16 2020
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.2
  GitCommit:        
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:        
 docker-init:
  Version:          0.18.0
  GitCommit:
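To confirm from a script that Docker actually has the NVIDIA runtime registered, you can inspect the Runtimes line of sudo docker info, which lists nvidia next to runc when the runtime is configured. This is a minimal sketch that reads docker info output on stdin; has_nvidia_runtime is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if "docker info" output read from stdin
# has a "Runtimes:" line that contains the word "nvidia".
has_nvidia_runtime() {
  grep -E '^[[:space:]]*Runtimes:' | grep -qw nvidia
}
```

Usage: sudo docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime available"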

Setting up Docker

The Jetson Nano SD card image comes with Docker preinstalled. If you want the latest version of Docker instead, follow these steps:

Update the package list:

sudo apt update

Install Docker:

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

Add your user to the Docker group:

sudo groupadd docker
sudo usermod -aG docker $USER

Log out and back in for the changes to take effect.
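After logging back in, you can check that the group change reached your session. This sketch compares against the output of id -nG; user_in_group is a hypothetical helper name. Without a user argument, id -nG reports the current session's groups, which is what matters here, because a freshly added group only appears there after you log in again.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if GROUP is among the current session's
# groups, or, when a USER is given, among that user's configured groups.
user_in_group() {
  local group=$1
  if [ $# -ge 2 ]; then
    id -nG "$2"   # groups from the user database
  else
    id -nG        # groups of the running session
  fi | tr ' ' '\n' | grep -qxF -- "$group"
}
```

Usage: user_in_group docker && docker ps, which should now work without sudo.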

Installing the NVIDIA Container Toolkit with Apt

Configure the repository

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
    | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
    | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update

Install the NVIDIA Container Toolkit packages

sudo apt-get install -y nvidia-container-toolkit

Configure Docker to use the NVIDIA container runtime

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
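nvidia-ctk runtime configure --runtime=docker records the runtime in /etc/docker/daemon.json. If you want to verify that step, the sketch below checks the file for a runtime named nvidia. It is a plain text match rather than a JSON parse, and daemon_has_nvidia_runtime is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a Docker daemon config file declares a
# runtime named "nvidia" (a "nvidia": key). Defaults to the standard path.
# Text match only; it does not validate the JSON.
daemon_has_nvidia_runtime() {
  local file=${1:-/etc/docker/daemon.json}
  [ -r "$file" ] && grep -q '"nvidia"[[:space:]]*:' "$file"
}
```

Usage: daemon_has_nvidia_runtime && echo "daemon.json has the nvidia runtime"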

Start the container

sudo docker run -d --gpus=all --runtime=nvidia -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
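The container starts in the background, so the Ollama server may need a few seconds before it accepts requests. The sketch below is a small, generic retry helper (the name retry is ours, not part of Docker or Ollama). The usage line assumes curl is installed on the Jetson and that port 11434 is published as in the docker run command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to ATTEMPTS times, one second
# apart, and succeed as soon as one attempt succeeds.
retry() {
  local attempts=$1 i
  shift
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Only wait if another attempt follows.
    if ((i < attempts)); then
      sleep 1
    fi
  done
  return 1
}
```

For example, retry 30 curl -fsS http://localhost:11434/ waits up to about 30 seconds; a running server answers with the text "Ollama is running".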

Run model locally

Now you can run a model:

sudo docker exec -it ollama ollama run llama2
pulling manifest
pulling 8daa9615cce3...   7% |█                     | (280 MB/3.8 GB, 4.4 MB/s) [1m15s:13m13s]

The command sudo docker exec -it ollama ollama run llama2 downloads the Llama 2 model on first use (as the output above shows) and then starts it inside the ollama container, so you can interact with the model directly from the command line.

To use the Llama 2 model, send it a text prompt and it generates text in response. For example, to generate a poem about a cat, run the following command:

docker exec -it ollama ollama run llama2 "Write a poem about a cat."

This generates a poem about a cat and prints it to the console. You can also use Llama 2 to translate languages, write different kinds of creative content, and answer your questions in an informative way.
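Because the docker run command publishes port 11434, you can also send prompts to Ollama's REST API from the host instead of going through docker exec. This is a minimal sketch that posts to the /api/generate endpoint with streaming turned off, so the reply is a single JSON object whose response field holds the generated text. The helper names and the OLLAMA_URL variable are hypothetical, and it assumes curl is installed.

```bash
#!/usr/bin/env bash
# Escape a string as a JSON string literal (with surrounding quotes).
# Handles backslash, double quote, newline, carriage return and tab;
# other control characters are not handled in this sketch.
json_string() {
  local s=$1 bs='\' q='"' nl=$'\n' cr=$'\r' tab=$'\t'
  s=${s//"$bs"/"$bs$bs"}   # \  -> \\  (must run first)
  s=${s//"$q"/"$bs$q"}     # "  -> \"
  s=${s//"$nl"/"${bs}n"}   # newline -> \n
  s=${s//"$cr"/"${bs}r"}   # carriage return -> \r
  s=${s//"$tab"/"${bs}t"}  # tab -> \t
  printf '"%s"' "$s"
}

# Build a non-streaming request body for /api/generate.
generate_payload() {
  printf '{"model":%s,"prompt":%s,"stream":false}' \
    "$(json_string "$1")" "$(json_string "$2")"
}

# Send one prompt and print the raw JSON reply.
# OLLAMA_URL is a hypothetical override; the default matches the
# -p 11434:11434 mapping used earlier.
ollama_generate() {
  local model=$1 prompt=$2 url=${OLLAMA_URL:-http://localhost:11434}
  curl -fsS "$url/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(generate_payload "$model" "$prompt")"
}
```

Usage: ollama_generate llama2 'Write a poem about a cat.'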

Experiment with different prompts to test the capabilities of the Llama 2 model.

Here are some examples of prompts you can use with the Llama 2 model:

Translate the sentence "Hello, world!" into Spanish.
Write a short story about a robot who falls in love with a human.
Generate a list of ideas for new products.
Answer the question "What is the meaning of life?"
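To try several of these prompts in one go, you can pass each one to ollama run as an argument; with a prompt argument, ollama run prints the answer and exits instead of opening the interactive prompt. This sketch assumes the ollama container from the docker run step is running and that your user can run docker without sudo; ask and ask_all are hypothetical helper names. It omits -it so it also works when no terminal is attached.

```bash
#!/usr/bin/env bash
# Run one prompt through MODEL inside the "ollama" container.
ask() {
  local model=$1 prompt=$2
  docker exec ollama ollama run "$model" "$prompt"
}

# Run each remaining argument as a separate prompt against one model,
# printing the prompt as a header before its answer. Stops at the
# first failure and returns its exit status.
ask_all() {
  local model=$1 prompt
  shift
  for prompt in "$@"; do
    printf '=== %s\n' "$prompt"
    ask "$model" "$prompt" || return
    printf '\n'
  done
}
```

Usage: ask_all llama2 'Translate the sentence "Hello, world!" into Spanish.' 'Generate a list of ideas for new products.'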

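Models from the Ollama library can be customized with a prompt. The commands in this example call the ollama CLI directly; with the Docker setup above, prefix each one with sudo docker exec -it ollama and make sure the Modelfile is available inside the container. First, pull the base model: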

ollama pull llama2
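In a script, you may want to pull a model only when it is not already present. This sketch parses the table printed by ollama list (a header row, then one model per line with the name and tag in the first column) and runs the commands inside the ollama container. has_model and ensure_model are hypothetical helper names, and it assumes docker works without sudo.

```bash
#!/usr/bin/env bash
# Succeeds if MODEL appears in "ollama list" output read from stdin.
# A bare name such as "llama2" also matches "llama2:latest".
has_model() {
  awk -v m="$1" '
    NR > 1 && ($1 == m || $1 == (m ":latest")) { found = 1 }
    END { exit !found }'
}

# Pull MODEL inside the "ollama" container unless it is already there.
ensure_model() {
  local model=$1
  if docker exec ollama ollama list | has_model "$model"; then
    echo "$model already present"
  else
    docker exec ollama ollama pull "$model"
  fi
}
```

Usage: ensure_model llama2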

Create a Modelfile

FROM llama2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system prompt
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

Create and run the model

ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
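When Ollama runs in the ollama container, ollama create resolves the -f path inside the container, so the Modelfile has to be copied in first. This sketch writes the Modelfile shown above with a heredoc and copies it with docker cp; write_modelfile, create_model, and the /tmp/Modelfile destination are arbitrary names chosen for illustration, and it assumes docker works without sudo.

```bash
#!/usr/bin/env bash
# Write the Mario Modelfile from this section to PATH.
write_modelfile() {
  local path=$1
  cat > "$path" <<'EOF'
FROM llama2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system prompt
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
EOF
}

# Copy MODELFILE into the "ollama" container and build model NAME from it.
create_model() {
  local name=$1 modelfile=$2
  docker cp "$modelfile" ollama:/tmp/Modelfile &&
    docker exec ollama ollama create "$name" -f /tmp/Modelfile
}
```

Usage: write_modelfile ./Modelfile && create_model mario ./Modelfile, then chat with it using sudo docker exec -it ollama ollama run mario.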

Conclusion

Ollama and Llama 2 are still under active development, but together they have the potential to be a powerful tool for a variety of tasks.