Large Language Models (LLMs) are revolutionizing the way we interact with machines. Their ever-growing complexity demands ever-increasing processing power. This is where accelerators like GPUs come into play, offering a significant boost for training and inference tasks.
The good news? Ollama, a popular self-hosted large language model server, now joins the party with official support for AMD GPUs through ROCm! This blog dives into how to leverage this exciting new development, even if your Ollama server resides within a Kubernetes cluster.
Ollama Meets AMD GPUs
A match made in compute heaven: Ollama’s integration with ROCm lets you put the raw power of your AMD graphics card to work running LLMs. This translates to dramatically faster token generation and smoother inference than running on the CPU alone. But wait, there’s more!
Benefits of AMD + ROCm for Ollama:
- Cost-effective performance: AMD GPUs offer exceptional value for money, making them a great choice for budget-conscious LLM enthusiasts.
- Open-source advantage: ROCm, the open-source platform powering AMD’s GPU ecosystem, fosters a collaborative environment and continuous development.
Setting Up Ollama with AMD and ROCm on Kubernetes
Here’s how to deploy Ollama with ROCm support on your Kubernetes cluster:
1. Install the ROCm Kubernetes Device Plugin:
The device plugin runs on each GPU node and advertises AMD GPUs to the Kubernetes scheduler as the amd.com/gpu resource, so pods like Ollama can request them. Follow the official guide at https://github.com/ROCm/k8s-device-plugin/blob/master/README.md for installation instructions.
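As a minimal sketch, the plugin is typically installed as a DaemonSet with a single kubectl command. The manifest path below is taken from the README at the time of writing; treat it as an assumption and defer to the linked guide if it has moved:

# Install the AMD GPU device plugin DaemonSet (verify the manifest path against the README)
kubectl create -f https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml

# Confirm that your GPU nodes now advertise the amd.com/gpu resource
kubectl describe node <your-gpu-node> | grep amd.com/gpu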
2. Deploy Ollama with ROCm Support (using Kubernetes YAML):
The following manifest offers a solid template:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-rocm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama-rocm
  template:
    metadata:
      labels:
        app: ollama-rocm
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:rocm
          ports:
            - containerPort: 11434
              name: ollama
          volumeMounts:
            - name: ollama-data
              mountPath: /root/.ollama
          resources:
            requests:
              memory: "32Gi"
              cpu: "64"
            limits:
              memory: "100Gi"
              cpu: "64"
              amd.com/gpu: 1
      volumes:
        - name: ollama-data
          hostPath:
            path: /var/lib/ollama/.ollama
            type: DirectoryOrCreate
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service-rocm
spec:
  selector:
    app: ollama-rocm
  ports:
    - protocol: TCP
      port: 11434
      targetPort: 11434
      name: ollama
Key points to note:
- The ollama/ollama:rocm image ensures you’re using the ROCm-compatible version of Ollama.
- The amd.com/gpu: 1 entry under limits reserves one AMD GPU for the Ollama pod (for extended resources like GPUs, the request defaults to the limit).
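Assuming you save both manifests above to a single file such as ollama-rocm.yaml (the filename is just an example), a minimal deploy-and-verify flow looks like this:

# Apply the Deployment and Service
kubectl apply -f ollama-rocm.yaml

# Watch the pod come up; it should be scheduled onto a node with a free AMD GPU
kubectl get pods -l app=ollama-rocm -o wide

# Check the Ollama logs for ROCm/GPU detection messages
kubectl logs deployment/ollama-rocm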
3. Expose the Ollama Service:
The Service definition above exposes Ollama’s port (11434) to other workloads inside the cluster. Since no type is set, it defaults to ClusterIP; switch to NodePort or LoadBalancer, or put an Ingress in front of it, if you need access from outside the cluster.
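For a quick smoke test from your workstation, you can port-forward the Service and call Ollama’s HTTP API directly. The sketch below uses llama3 purely as an example model name; any model from the Ollama library works:

# Forward the Service to your local machine
kubectl port-forward svc/ollama-service-rocm 11434:11434

# In another terminal: pull an example model, then send a prompt to the generate endpoint
curl http://localhost:11434/api/pull -d '{"name": "llama3"}'
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?"}'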
Unleash the Power of Your AMD GPU with Ollama!
With Ollama and ROCm working in tandem on your AMD-powered Kubernetes cluster, you’re well-equipped to tackle demanding LLM tasks. Remember to consult Ollama’s official documentation for detailed instructions and troubleshooting. Happy experimenting!