Ollama, the rapidly growing tool for running large language models locally, has taken the developer world by storm. The models it serves can generate text, translate languages, and produce many kinds of creative content. But how do you leverage Ollama effectively within your development workflow? Enter Docker Desktop and Kubernetes – a powerful combination that lets you run Ollama seamlessly in a containerized environment.
Why Kubernetes and Docker Desktop?
Docker Desktop provides a user-friendly platform for building and running containerized applications. Ollama, packaged as a Docker image, fits perfectly into this ecosystem. Kubernetes, on the other hand, orchestrates container deployment and management, ensuring efficient resource allocation and scalability for your Ollama instance.
Setting the Stage
- Install Docker Desktop: Download and install Docker Desktop on your machine. This provides the foundation for building and running containerized applications.
- Pull the Ollama Image: Use the docker pull command to fetch the official Ollama image from Docker Hub (see the example after this list). This image contains all the libraries and dependencies Ollama needs.
- Create a Kubernetes Pod: Define a Kubernetes pod YAML file specifying the Ollama image, resource requirements, and any desired configurations. This file instructs Kubernetes on how to deploy and manage the Ollama container.
- Deploy the Pod: Use the kubectl apply command to deploy the pod based on your YAML definition. Kubernetes will then create and manage the Ollama container, ensuring it has the necessary resources to function effectively.
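For reference, pulling the image ahead of time is a single command (Kubernetes will also pull it automatically on first deployment); ollama/ollama is the official image on Docker Hub:
docker pull ollama/ollama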
Benefits of this approach:
- Isolation and Scalability: Running Ollama in a container isolates it from your system’s environment, preventing conflicts and ensuring a clean execution. Additionally, Kubernetes allows you to easily scale your Ollama deployment by adding more pods, catering to increased workload demands (see the sketch after this list).
- Resource Management: Kubernetes effectively manages resource allocation for the Ollama container, preventing it from hogging system resources and impacting other applications.
- Portability and Collaboration: Docker containers and Kubernetes deployments are inherently portable. Share your Ollama deployment configuration with your team, allowing them to easily run it on their own Docker Desktop and Kubernetes environment, fostering seamless collaboration.
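A quick sketch of the scalability point: replica counts are managed by a Deployment rather than a bare Pod like the one used in the walkthrough below, so if you wrap Ollama in a Deployment (named ollama here purely for illustration), scaling out is one command:
kubectl scale deployment/ollama --replicas=3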
Getting Started
- Install Docker Desktop
- Enable Kubernetes: in Docker Desktop, open Settings > Kubernetes, check Enable Kubernetes, and click Apply & Restart.
Ensure that the single-node Kubernetes cluster is up and running by running the following command:
kubectl get nodes
NAME             STATUS   ROLES           AGE   VERSION
docker-desktop   Ready    control-plane   57s   v1.29.1
Open a terminal, copy the content below into a file called ollama.yaml, and save it anywhere on your system.
apiVersion: v1
kind: Pod
metadata:
  name: ollama-pod
spec:
  containers:
    - name: ollama
      image: ollama/ollama:latest # Replace with desired Ollama image tag
      resources:
        requests:
          memory: "2Gi"
          cpu: "1"
        limits:
          memory: "4Gi"
          cpu: "2"
      ports:
        - containerPort: 11434
  restartPolicy: Always
If you’re new to Kubernetes YAML, this section might be useful:
- apiVersion: Specifies the Kubernetes API version used (v1 in this case).
- kind: Indicates the type of object being defined (Pod in this case).
- metadata: Defines metadata about the Pod, including its name (ollama-pod).
- spec: Defines the Pod’s configuration, including:
- containers: An array of container definitions.
- name: Name of the container (ollama).
- image: The Docker image to use (ollama/ollama:latest). Replace with the desired Ollama image tag.
- resources: Defines resource requests and limits for the container:
- requests: Minimum guaranteed resources for the container.
- limits: Maximum resources the container can use.
- memory: Memory request and limit (e.g., 2Gi and 4Gi).
- cpu: CPU request and limit (e.g., 1 and 2).
- ports: Exposes the container’s port (11434, Ollama’s default).
- restartPolicy: Defines how the Pod should be restarted in case of failure (Always in this case).
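Before creating anything, you can optionally sanity-check the manifest with a client-side dry run, which validates the YAML without touching the cluster:
kubectl apply -f ollama.yaml --dry-run=client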
Bringing up the Pod
kubectl apply -f ollama.yaml
kubectl describe pod ollama-pod
Name:             ollama-pod
Namespace:        default
Priority:         0
Service Account:  default
Node:             docker-desktop/192.168.65.3
Start Time:       Wed, 21 Feb 2024 10:01:34 +0530
Labels:           <none>
Annotations:      <none>
Status:           Running
IP:               10.1.0.6
IPs:
  IP:  10.1.0.6
Containers:
  ollama:
    Container ID:   docker://e04e664eea3123151f6f90806951d101826a3689000f27fabeab2c53de36e977
    Image:          ollama/ollama:latest
    Image ID:       docker-pullable://ollama/ollama@sha256:2bb3fa14517aff428033cce369a2cac3baf9215fed5b401f87e30b52e39ae124
    Port:           11434/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 21 Feb 2024 10:01:37 +0530
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  4Gi
    Requests:
      cpu:     1
      memory:  2Gi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6l4gz (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  kube-api-access-6l4gz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  11s   default-scheduler  Successfully assigned default/ollama-pod to docker-desktop
  Normal  Pulling    10s   kubelet            Pulling image "ollama/ollama:latest"
  Normal  Pulled     8s    kubelet            Successfully pulled image "ollama/ollama:latest" in 2.082s (2.082s including waiting)
  Normal  Created    8s    kubelet            Created container ollama
  Normal  Started    8s    kubelet            Started container ollama
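With the Pod running, a quick smoke test confirms that Ollama itself works. You can exec into the container and run a model directly (llama2 is just an example; the model is downloaded on first run), or port-forward the API port and query Ollama’s /api/generate endpoint with curl:
kubectl exec -it ollama-pod -- ollama run llama2
kubectl port-forward pod/ollama-pod 11434:11434
curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?"}'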
Running Ollama WebUI in a Kubernetes Pod
To chat with Ollama from the browser, you can deploy Open WebUI alongside it. The manifest below defines a Service in front of Ollama, the Ollama Pod itself (now labeled app: ollama so the Service’s selector can find it), an Open WebUI Deployment, a PersistentVolumeClaim for its data, and a Secret for its session key.
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
    - protocol: TCP
      port: 11434
      targetPort: 11434
---
apiVersion: v1
kind: Pod
metadata:
  name: ollama-pod
  labels:
    app: ollama # Required so ollama-service's selector matches this Pod
spec:
  containers:
    - name: ollama
      image: ollama/ollama:latest # Replace with desired Ollama image tag
      resources:
        requests:
          memory: "2Gi"
          cpu: "1"
        limits:
          memory: "4Gi"
          cpu: "2"
      ports:
        - containerPort: 11434
  restartPolicy: Always
---
apiVersion: apps/v1 # Deployments live in the apps/v1 API group, not v1
kind: Deployment
metadata:
  name: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
        - name: open-webui
          image: ghcr.io/open-webui/open-webui:main
          env:
            - name: OLLAMA_API_BASE_URL
              value: http://ollama-service:11434/api # Replace with Ollama service name or URL
            - name: WEBUI_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: open-webui-secret
                  key: web-secret
          volumeMounts:
            - name: open-webui-data
              mountPath: /app/backend/data
      volumes:
        - name: open-webui-data
          persistentVolumeClaim:
            claimName: open-webui-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: open-webui-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Secret
metadata:
  name: open-webui-secret
stringData:
  web-secret: "" # Replace with your actual secret value
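Save the manifest to a file (for example, ollama-webui.yaml), apply it, and forward a local port to the UI. The port-forward below assumes Open WebUI’s default container port of 8080; adjust it if your image version differs:
kubectl apply -f ollama-webui.yaml
kubectl port-forward deployment/open-webui 8080:8080
Then open http://localhost:8080 in your browser; the WebUI reaches Ollama through the ollama-service Service.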
Conclusion
This blog post has explored how to leverage Docker Desktop and Kubernetes to effectively run Ollama within a containerized environment. By combining these powerful tools, you gain several advantages:
- Isolation and Scalability: Ollama runs in a dedicated container, preventing conflicts with your system and enabling easy scaling to meet increased demands.
- Resource Management: Kubernetes efficiently allocates resources to the Ollama container, ensuring optimal performance without impacting other applications.
- Portability and Collaboration: Docker containers and Kubernetes deployments are inherently portable, allowing seamless sharing of your Ollama setup with your team.