
Ollama Meets AMD GPUs: Leveraging ROCm for Faster LLM Training


Large Language Models (LLMs) are revolutionizing the way we interact with machines. Their ever-growing complexity demands ever-increasing processing power. This is where accelerators like GPUs come into play, offering a significant boost for training and inference tasks.

The good news? Ollama, a popular self-hosted large language model server, now joins the party with official support for AMD GPUs through ROCm! This blog dives into how to leverage this exciting new development, even if your Ollama server resides within a Kubernetes cluster.

Ollama Meets AMD GPUs: A Match Made in Compute Heaven

Ollama’s integration with ROCm allows you to utilize the raw power of your AMD graphics card for running LLMs. This translates to faster responses and a smoother inference experience. But wait, there’s more!

Benefits of AMD + ROCm for Ollama:

  • Cost-effective performance: AMD GPUs offer exceptional value for money, making them a great choice for budget-conscious LLM enthusiasts.
  • Open-source advantage: ROCm, the open-source platform powering AMD’s GPU ecosystem, fosters a collaborative environment and continuous development.

Setting Up Ollama with AMD and ROCm on Kubernetes

Here’s how to deploy Ollama with ROCm support on your Kubernetes cluster:

  1. Install the ROCm Kubernetes Device Plugin:

     This plugin advertises your AMD GPU to the Kubernetes scheduler so that Ollama pods can request it. Follow the official guide at https://github.com/ROCm/k8s-device-plugin/blob/master/README.md for installation instructions; a minimal install sketch follows.
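
     The plugin is typically deployed as a DaemonSet. A minimal sketch, assuming the manifest path in the repository still matches its README at the time you run it:

       # Deploy the AMD GPU device plugin DaemonSet (verify the path against the README)
       kubectl create -f https://raw.githubusercontent.com/ROCm/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml

       # Confirm the GPU node now advertises the amd.com/gpu resource
       kubectl describe node <your-gpu-node> | grep -i amd.com/gpu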

  2. Deploy Ollama with ROCm Support (using Kubernetes YAML):

     The following YAML configuration offers a solid template:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: ollama-rocm
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: ollama-rocm
        template:
          metadata:
            labels:
              app: ollama-rocm
          spec:
            containers:
            - name: ollama
              image: ollama/ollama:rocm
              ports:
              - containerPort: 11434
                name: ollama
              volumeMounts:
              - name: ollama-data
                mountPath: /root/.ollama
              resources:
                requests:
                  memory: "32Gi"
                  cpu: "64"
                limits:
                  memory: "100Gi"
                  cpu: "64"
                  amd.com/gpu: 1
            volumes:
            - name: ollama-data
              hostPath:
                path: /var/lib/ollama/.ollama
                type: DirectoryOrCreate
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: ollama-service-rocm
      spec:
        selector:
          app: ollama-rocm
        ports:
        - protocol: TCP
          port: 11434
          targetPort: 11434
          name: ollama
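
     Assuming you save both manifests to a file (the name ollama-rocm.yaml below is arbitrary), applying them and watching the rollout looks like this:

       # Apply the Deployment and Service
       kubectl apply -f ollama-rocm.yaml

       # Wait for the pod to schedule onto a GPU node and become ready
       kubectl rollout status deployment/ollama-rocm

       # Optional: check the logs to confirm Ollama detected the ROCm device
       kubectl logs deployment/ollama-rocm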
      

     Key points to note:

       • The ollama/ollama:rocm image ensures you’re using the ROCm-compatible build of Ollama.
       • The amd.com/gpu: 1 entry under limits asks the scheduler for one AMD GPU (for extended resources such as GPUs, the limit also serves as the request).

  3. Expose the Ollama Service:

     The Service definition above exposes Ollama’s port (11434) as a ClusterIP Service, making it reachable from other workloads inside the cluster, or from your workstation via a port-forward, as shown below.
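
     One quick way to try it from your workstation is a port-forward. A minimal sketch, where the model name is only an example:

       # Forward the Service port to localhost
       kubectl port-forward svc/ollama-service-rocm 11434:11434

       # In another terminal: pull a model and send a prompt through the Ollama API
       curl http://localhost:11434/api/pull -d '{"name": "llama2"}'
       curl http://localhost:11434/api/generate -d '{"model": "llama2", "prompt": "Why is the sky blue?"}'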

Unleash the Power of Your AMD GPU with Ollama!

With Ollama and ROCm working in tandem on your AMD-powered Kubernetes cluster, you’re well-equipped to tackle demanding LLM tasks. Remember to consult Ollama’s official documentation for detailed instructions and troubleshooting. Happy experimenting!
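
For a quick first experiment, the ollama CLI ships inside the ollama/ollama:rocm image, so you can run a model directly in the pod (the model name is again just an example):

    # Run a one-off prompt inside the Ollama pod
    kubectl exec -it deployment/ollama-rocm -- ollama run llama2 "Say hello from an AMD GPU"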

Have Queries? Join https://launchpass.com/collabnix

Ajeet Singh Raina is a former Docker Captain, Community Leader and Arm Ambassador. He is the founder of the Collabnix blogging site and has authored more than 570 blog posts on Docker, Kubernetes and cloud-native technology. He runs a community Slack of 8,900+ members and a Discord server of close to 2,200 members. You can follow him on Twitter (@ajeetsraina).