You’ve probably heard about some of the latest open-source Large Language Models (LLMs) like Llama 3.1, Gemma 2, and Mistral. These models have gained attention in the AI community for their powerful capabilities, and you can now easily run and test them on your local machine.
Maybe you’re intrigued and want to try one or more of these models on your own machine but are unsure where to start. This is where Ollama steps in!
Why run your LLM locally?
Running open-source models locally instead of relying on cloud-based APIs like OpenAI, Claude, or Gemini offers several key advantages:
- Customization: Running models locally gives you complete control over the environment. You can fine-tune models to suit your specific needs, adjust parameters, and even experiment with different configurations that would be impossible or costly with cloud-based solutions.
- Reduced Costs: If you already have a capable machine, especially one equipped with a GPU, running LLMs locally can be a cost-effective option. There’s no need to pay for expensive cloud computing resources, and you can experiment freely without worrying about API call limits or escalating costs.
- Privacy: When you run models locally, your data stays on your machine. This ensures that sensitive information never leaves your secure environment, providing a level of privacy that cloud-based services simply can’t match. For businesses dealing with confidential data, this can be a crucial advantage.
Why use Ollama to run these models?
Ollama is to LLMs what Docker is to container images. It simplifies the process of running LLMs locally and exposing them through a consistent API, whatever model you choose. Whether you have a GPU or not, Ollama streamlines everything, so you can focus on interacting with the models instead of wrestling with configurations.
Key benefits of using Ollama include:
- Free and Open-Source: Ollama is completely free and open-source, which means you can inspect, modify, and distribute it according to your needs. This openness fosters a community-driven development process, ensuring that the tool is continuously improved and adapted to new use cases. The project is hosted on GitHub at https://github.com/ollama/ollama.
- Flexible and customizable: Ollama is designed with flexibility in mind. It supports a wide range of LLMs and allows you to customize how these models are run, including setting resource limits, modifying runtime parameters, and integrating with other tools or services.
- Offline access: With Ollama, you can run LLMs completely offline. This is particularly useful in environments where internet access is restricted or where you need to ensure that your data does not leave the local network.
Prerequisites
Before we dive into installing and running Ollama, make sure you have the following prerequisites:
- Docker: Ensure Docker is installed on your system. In this tutorial we run Ollama inside a Docker container, which keeps the models in an isolated, containerized environment.
- Python: While not strictly necessary for running Ollama, Python is recommended if you plan to interact with the models programmatically.
Installing Ollama with Docker
Ollama can be installed in several ways, but we’ll focus on using Docker because it’s simple, flexible, and easy to manage.
Why Install Ollama with Docker?
- Ease of Use: Docker allows you to install and run Ollama with a single command. There’s no need to worry about dependencies or conflicting software versions — Docker handles everything within a contained environment.
- Flexibility: Docker makes it easy to switch between different versions of Ollama. If a new version is released with features that interest you, updating is as simple as pulling the latest Docker image. This flexibility also makes it easy to test different configurations without affecting your main environment.
- Portability: With Docker, you can run Ollama on any machine that supports Docker, regardless of the underlying operating system. This portability means you can easily move your setup between different machines or share it with others, ensuring a consistent environment everywhere.
Installation Steps
Pull the version of Ollama you need from the official Docker image:
docker pull ollama/ollama:0.3.6
Run the Ollama server in detached mode with Docker (without GPU):
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:0.3.6
Run the Ollama server in detached mode with Docker (with GPU):
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:0.3.6
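Once the container is up, you can do a quick sanity check from Python to confirm that the API is reachable on the port mapped above (a minimal sketch, assuming the default port mapping from the commands above):

import requests

# The Ollama server listens on the port mapped above (11434) and
# answers its root URL with a short status message.
resp = requests.get("http://localhost:11434")
print(resp.status_code, resp.text)

If everything is running, you should see a 200 status code and a short “Ollama is running”-style message.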
On Windows
Windows users can create a shortcut command using doskey:
doskey ollama=docker exec -it ollama ollama $*
This sets up a command that you can use directly in your command prompt to interact with Ollama.
On Linux/MacOS
Linux and macOS users can achieve a similar setup by using an alias:
alias ollama="docker exec -it ollama ollama"
Add this alias to your shell configuration file (e.g., .bashrc or .zshrc) to make it permanent.
Linux
echo 'alias ollama="docker exec -it ollama ollama"' >> $HOME/.bashrc
MacOS
echo 'alias ollama="docker exec -it ollama ollama"' >> $HOME/.zshrc
Ollama usage
Once you have the command ollama available, you can check the usage with ollama help.
- List locally available models
Let’s use the command ollama list to check which models are available locally. The first time you run it, you shouldn’t see anything listed:
As we can see, there is nothing for now, so let’s pull a model to run.
Running Models Locally with Ollama
Step 1: Pull the Llama2 Model
Let’s walk through a practical example: pulling Llama2, a popular open-source LLM, and interacting with it. You can find all the registered models on the official model registry.
ollama pull llama2
After the model is pulled, you can check that it’s available with ollama list.
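The same list is also exposed by the server’s REST API, which is handy if you want to check for models from code. Here is a small sketch, assuming the Docker setup above with the API on localhost:11434:

import requests

# /api/tags returns the models currently stored by the local Ollama server.
tags = requests.get("http://localhost:11434/api/tags").json()
for model in tags.get("models", []):
    print(model["name"])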
Step 2: Run the Llama2 Model
Now that you have Llama2 installed, you can start interacting with it:
ollama run llama2
This command starts the model, allowing you to chat with it directly in your terminal.
Step 3: Interact with Models
Try asking it some questions, such as:
- “What is the capital of France?”
- “Explain quantum computing in simple terms.”
Feel free to experiment with different prompts and see how Llama2 responds.
Once the model is served, it is accessible via a local URL (http://localhost:11434 with the Docker setup above). You can check that it’s running by visiting that URL in your browser or by making requests to it from Python or another programming language.
For example, you can interact with the model programmatically using Python’s requests library:
import json
import requests

# Send a prompt to the local Ollama API. By default, /api/generate
# streams the answer back as newline-delimited JSON chunks.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "What is the capital of France?"
    },
)

# Reassemble the streamed chunks into a single string.
response_text = ""
for chunk in response.text.splitlines():
    response_text += json.loads(chunk).get("response", "")

print(response_text)
This script sends a prompt to the model and prints the response. You can integrate this into your projects to leverage the Llama2 model in real time.
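Because /api/generate streams its output as newline-delimited JSON by default, you can also print the answer as it is generated instead of collecting it first. Here is one way to do that with the same endpoint and model (a sketch, using the requests streaming interface):

import json
import requests

# Stream the generation and print each chunk as soon as it arrives.
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Explain quantum computing in simple terms."},
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            print()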
Conclusion
Ollama makes it incredibly easy to run and manage open-source LLMs on your local machine, much like Docker does for containerized applications. Whether you’re experimenting with different models, building applications, or just curious about the capabilities of LLMs like Llama2, Ollama provides a streamlined and flexible solution.
In this tutorial, we’ve covered the basics of installing Ollama using Docker and running a model like Llama2. I encourage you to explore other models and see how they can fit into your workflows.
Stay tuned for more tutorials on maximizing the potential of LLMs in your projects!