Running NVIDIA Docker in the GPU-Accelerated Data Center


Docker is the leading container platform. It provides both hardware and software encapsulation by allowing multiple containers to run on the same system at the same time, each with its own set of resources (CPU, memory, etc.) and its own dedicated set of dependencies (library versions, environment variables, etc.). Docker can now also be used to containerize GPU-accelerated applications. If you are new to GPU-accelerated computing, it is the use of a graphics processing unit (GPU) to accelerate high-performance computing workloads and applications. This means you can containerize and isolate an accelerated application without modifying it and deploy it on any supported GPU-enabled infrastructure.

Docker does not natively support NVIDIA GPUs within containers. There are workarounds, such as fully installing the NVIDIA driver inside the container and mapping in the character devices corresponding to the NVIDIA GPUs (e.g. /dev/nvidia0) at launch, but they are not recommended: the driver inside the image must exactly match the kernel driver on the host, so the image stops being portable.
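For comparison, the manual workaround boils down to handing each NVIDIA device node to docker run with its --device flag. The sketch below only assembles those flags; the image name in the usage comment (my-cuda-app) is hypothetical and is assumed to contain a user-mode driver that exactly matches the host's kernel driver, which is precisely what makes this approach fragile.

```shell
# Assemble one "--device=<node>" flag per NVIDIA device node given as an argument.
# This only builds the flags; it does not install or check any driver.
nvidia_device_flags() (
    flags=""
    for dev in "$@"; do
        flags="$flags --device=$dev"
    done
    # Drop the leading space added before the first flag.
    printf '%s\n' "${flags# }"
)

# Example (hypothetical image "my-cuda-app" with a host-matching driver inside):
#   docker run --rm $(nvidia_device_flags /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia0) my-cuda-app
```

The flags are left unquoted in the usage line on purpose, so the shell splits them into separate arguments for docker run.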

This is where the nvidia-docker plugin comes to the rescue.

nvidia-docker is an open-source project hosted on GitHub. It provides driver-agnostic CUDA images and a docker command-line wrapper that mounts the user-mode components of the driver and the GPUs (character devices) into the container at launch. With this, nvidia-docker enables deployment of GPU-accelerated applications on any Linux GPU server with NVIDIA Docker support. What does this mean? Using Docker, we can develop and prototype GPU applications on a workstation, and then ship and run those applications anywhere that supports GPU containers. Earlier this year, the nvidia-docker 1.0.1 release announced support for both Docker 17.03 Community Edition and Enterprise Edition.

Some of the notable benefits include:

Legacy accelerated compute apps can be containerized and deployed on newer systems, on premises, or in the cloud.
Ease of Deployment
Resource Isolation
Bare Metal Performance
Facilitate Collaboration
Run across heterogeneous CUDA toolkit environments (sharing the host driver)
Specific GPU resources can be allocated to a container for better isolation and performance.
You can easily share, collaborate, and test applications across different environments.
Portable and reproducible builds

~source: Nvidia

Let’s talk about libnvidia-container a bit.

libnvidia-container is NVIDIA's container runtime library. The repository provides a library and a simple CLI utility to automatically configure GNU/Linux containers leveraging NVIDIA hardware. The implementation relies on kernel primitives and is designed to be agnostic of the container runtime. Basic features include:

Integrates with the container internals
Agnostic of the container runtime
Drop-in GPU support for runtime developers
Better stability, follows driver releases
Brings features seamlessly (Graphics, Display, Exclusive mode, VM, etc.)

~ source: NVIDIA
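The CLI utility that ships with libnvidia-container is nvidia-container-cli: its info subcommand reports the driver and the GPUs it can see, and list prints the driver files and device nodes a container would need. The wrapper below is a small sketch that fails cleanly when the tool is missing; the NVIDIA_CONTAINER_CLI override is a hypothetical convenience added only so the function can be dry-run.

```shell
# Run nvidia-container-cli with the given arguments, failing with a clear
# message if it is not installed. NVIDIA_CONTAINER_CLI is a hypothetical
# override (defaults to the real binary name) used only for dry runs.
nvcli() (
    cli="${NVIDIA_CONTAINER_CLI:-nvidia-container-cli}"
    if ! command -v "$cli" >/dev/null 2>&1; then
        echo "error: $cli not found in PATH" >&2
        exit 1
    fi
    "$cli" "$@"
)

# Typical calls on a host with libnvidia-container installed:
#   nvcli info   # driver version and per-GPU details
#   nvcli list   # driver files and device nodes a container would need
```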

In this blog post, I will show you how to get started with nvidia-docker to interact with an NVIDIA GPU system, and then look at a few interesting applications that can be built for the GPU-accelerated data center. Let us get started.

Infrastructure Setup:

Docker Version: 17.06

OS: Ubuntu 16.04 LTS

Environment: Manager Server Instance with GPU

GPU: GeForce GTX 1080 Graphics card

First, verify that the GPU card is installed in your hardware, for example with lspci | grep -i nvidia.
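If you would rather script this check, the helper below reads lspci output on stdin and succeeds only when an NVIDIA VGA or 3D controller is listed. It is a sketch that assumes the pciutils lspci tool is installed on the host.

```shell
# Succeed if lspci output on stdin lists an NVIDIA display device
# (GeForce cards usually appear as "VGA compatible controller",
# headless compute cards often as "3D controller").
has_nvidia_gpu() {
    grep -Eiq '(VGA compatible controller|3D controller).*NVIDIA'
}

# Usage on the host:
#   lspci | has_nvidia_gpu && echo "NVIDIA GPU found"
```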

Install nvidia-docker and nvidia-docker-plugin on Ubuntu 16.04 using wget and dpkg, as shown below:

[simterm]

$ wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
$ sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb

[/simterm]
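If you later need a different 1.x release, the download URL can be parameterized. This sketch assumes later releases keep the v1.0.1 file-naming pattern, which you should confirm on the project's releases page before relying on it.

```shell
# Build the .deb download URL for a given nvidia-docker 1.x version.
# Assumption: releases follow the v1.0.1 pattern "nvidia-docker_<ver>-1_amd64.deb".
nvidia_docker_deb_url() {
    printf 'https://github.com/NVIDIA/nvidia-docker/releases/download/v%s/nvidia-docker_%s-1_amd64.deb\n' "$1" "$1"
}

# Usage on the Ubuntu host:
#   url=$(nvidia_docker_deb_url 1.0.1)
#   wget -P /tmp "$url" && sudo dpkg -i "/tmp/${url##*/}" && rm "/tmp/${url##*/}"
```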

Verify that the nvidia-docker service is running:

[simterm]

ajit@Ubuntu-1604-xenial-64-minimal:~$ systemctl status nvidia-docker

nvidia-docker.service - NVIDIA Docker plugin
   Loaded: loaded (/lib/systemd/system/nvidia-docker.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2017-08-20 10:52:43 CEST; 6 days ago
     Docs: https://github.com/NVIDIA/nvidia-docker/wiki
 Main PID: 19921 (nvidia-docker-p)
    Tasks: 13
   Memory: 12.3M
      CPU: 5.046s
   CGroup: /system.slice/nvidia-docker.service
           └─19921 /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker

[/simterm]

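When a container is launched through nvidia-docker, nvidia-docker-plugin creates (if it does not already exist) a Docker volume holding the user-mode driver files, and that volume is mounted into the container together with the GPU device nodes automatically.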

Did you know?

It is possible to avoid relying on the nvidia-docker wrapper and launch GPU containers using only docker, by querying the nvidia-docker-plugin REST API directly:

[simterm]

docker run -ti --rm `curl -s http://localhost:3476/docker/cli` nvidia/cuda nvidia-smi

[/simterm]
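The plugin can also be asked for flags covering only some GPUs. The helper below is a sketch that builds the endpoint URL; it assumes the /docker/cli endpoint accepts a dev query parameter with "+"-separated GPU indices, which you should verify against the nvidia-docker 1.0 wiki for your version. The optional host argument is likewise an illustration, defaulting to the localhost:3476 address used above.

```shell
# Build the nvidia-docker-plugin URL that returns ready-made "docker run" flags.
#   $1: optional comma-separated GPU indices, e.g. "0" or "0,1"
#   $2: optional plugin host (defaults to localhost; port 3476 as above)
# Assumption: the endpoint accepts "dev" with "+"-separated indices.
plugin_cli_url() (
    url="http://${2:-localhost}:3476/docker/cli"
    if [ -n "$1" ]; then
        url="$url?dev=$(printf '%s' "$1" | tr ',' '+')"
    fi
    printf '%s\n' "$url"
)

# Usage ($(...) is equivalent to the backticks used above):
#   docker run -ti --rm $(curl -s "$(plugin_cli_url 0)") nvidia/cuda nvidia-smi
```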

NVIDIA’s System Management Interface

If you want to know the status of your NVIDIA GPU, nvidia-smi is a handy command, and it can be run inside the nvidia/cuda container. It is especially useful when you’re having trouble getting your NVIDIA GPUs to run GPGPU code.

[simterm]

$ nvidia-docker run --rm nvidia/cuda nvidia-smi

[/simterm]

Listing all NVIDIA Devices:

[simterm]

$ nvidia-docker run --rm nvidia/cuda nvidia-smi -L

GPU 0: GeForce GTX 1080 (UUID: GPU-70ecf884-c4fb-159b-a67e-26b4ce96681d)

[/simterm]

Listing all available data for a particular GPU (here, GPU 0):

[simterm]

$ nvidia-docker run --rm nvidia/cuda nvidia-smi -i 0 -q

[/simterm]

Listing details for each GPU:

[simterm]

$ nvidia-docker run --rm nvidia/cuda nvidia-smi --query-gpu=index,name,uuid,serial --format=csv

index, name, uuid, serial
0, GeForce GTX 1080, GPU-70ecf884-c4fb-159b-a67e-26b4ce96681d, [Not Supported]

[/simterm]
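When scripting against this output, for example to pin work to a GPU by UUID, a small awk filter is enough. This sketch assumes the exact query and comma-plus-space CSV layout shown above, including the header line, which it skips; if you add noheader to --format, drop the NR > 1 condition.

```shell
# Print the UUID of the GPU whose index is $1, reading the CSV produced by
# nvidia-smi --query-gpu=index,name,uuid,serial --format=csv on stdin.
gpu_uuid_by_index() {
    awk -F', ' -v idx="$1" 'NR > 1 && $1 == idx { print $3 }'
}

# Usage:
#   nvidia-docker run --rm nvidia/cuda nvidia-smi \
#       --query-gpu=index,name,uuid,serial --format=csv | gpu_uuid_by_index 0
```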

Listing the available clock speeds:

[simterm]

$ nvidia-docker run --rm nvidia/cuda nvidia-smi -q -d SUPPORTED_CLOCKS

[/simterm]

Building & Testing NVIDIA-Docker Images

If you look at the samples/ folder in the nvidia-docker repository, there are a couple of images that can be used to quickly test nvidia-docker on your machine. Unfortunately, the samples are not available on Docker Hub, so you will need to build the images locally. I have built a few of them, which I am going to showcase:

[simterm]

$ cd /nvidia-docker/samples/ubuntu-16.04/deviceQuery/

$ docker build -t ajeetraina/nvidia-devicequery .

[/simterm]
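To build every sample in one go rather than one directory at a time, a loop like the one below should work from the root of the nvidia-docker checkout. It is a sketch: the ajeetraina/nvidia- prefix mirrors my tag above and should be replaced with your own Docker Hub account, and DRY_RUN=1 makes it print the commands instead of running them.

```shell
# Turn a sample directory into an image tag, e.g.
# samples/ubuntu-16.04/deviceQuery/ -> ajeetraina/nvidia-devicequery
sample_tag() {
    printf 'ajeetraina/nvidia-%s\n' "$(basename "$1" | tr '[:upper:]' '[:lower:]')"
}

# Build (or, with DRY_RUN=1, just print) one image per Ubuntu 16.04 sample.
# Run from the root of the nvidia-docker checkout.
build_samples() (
    for dir in samples/ubuntu-16.04/*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        tag=$(sample_tag "$dir")
        if [ -n "${DRY_RUN:-}" ]; then
            echo docker build -t "$tag" "$dir"
        else
            docker build -t "$tag" "$dir" || exit 1
        fi
    done
)

# Usage:
#   cd /nvidia-docker && DRY_RUN=1 build_samples
```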

Running the DeviceQuery container

You can run the ajeetraina/nvidia-devicequery container directly with nvidia-docker run --rm ajeetraina/nvidia-devicequery.
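To check the outcome from a script instead of by eye, you can look for the verdict line. This sketch assumes the image runs the stock CUDA deviceQuery sample, which ends its report with Result = PASS when a usable device is found.

```shell
# Succeed only if deviceQuery output on stdin reports a passing result.
devicequery_passed() {
    grep -q '^Result = PASS'
}

# Usage on the GPU host:
#   nvidia-docker run --rm ajeetraina/nvidia-devicequery | devicequery_passed \
#       && echo "GPU is usable inside the container"
```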

Listing the current GPU clock speed, default clock speed & maximum possible clock speed:

[simterm]

$ nvidia-docker run --rm nvidia/cuda nvidia-smi -q -d CLOCK

[/simterm]

Retrieving the System Topology:

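The topology refers to how the PCI-Express devices (GPUs, InfiniBand HCAs, storage controllers, etc.) connect to each other and to the system’s CPUs. It can be retrieved with the topology matrix option of nvidia-smi, run in the same container: nvidia-docker run --rm nvidia/cuda nvidia-smi topo -m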

A Quick Look at NVIDIA Deep Learning

The NVIDIA Deep Learning GPU Training System, a.k.a. DIGITS, is a web app for training deep learning models. It puts the power of deep learning into the hands of engineers and data scientists. It can be used to rapidly train highly accurate deep neural networks (DNNs) for image classification, segmentation and object detection tasks. The currently supported frameworks are Caffe, Torch, and TensorFlow.

DIGITS simplifies common deep learning tasks such as managing data, designing and training neural networks on multi-GPU systems, monitoring performance in real time with advanced visualizations, and selecting the best performing model from the results browser for deployment. DIGITS is completely interactive so that data scientists can focus on designing and training networks rather than programming and debugging.

To test-drive DIGITS, you can get it up and running in a single Docker container:

[simterm]

ajit@Ubuntu-1604-xenial-64-minimal:~$ NV_GPU=0 nvidia-docker run --name digits -d -p 5000:5000 nvidia/digits

f0e5d1f78b810037a039b34420ee4848e5809effc1c73752eb5d0ced89b1835f

[/simterm]

In the above command, NV_GPU assigns GPU resources to the container, which is critical when running Docker on a multi-GPU system. Here it passes GPU ID 0 from the host system to the container. Note that if you passed GPU IDs 2,3, for example, the container would still see the GPUs as IDs 0,1, backed by the PCI IDs of host GPUs 2,3. As I have a single GPU card, I just passed NV_GPU=0.
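The renumbering is easy to picture with a toy helper. It is purely illustrative and does not talk to any GPU: it prints the host-to-container mapping described above for the comma-separated IDs in an NV_GPU value, assuming the container enumerates the GPUs in the order they are listed.

```shell
# Illustration only: show how host GPU IDs from an NV_GPU value are
# renumbered inside the container, starting again from 0.
nv_gpu_mapping() (
    i=0
    for id in $(printf '%s' "$1" | tr ',' ' '); do
        echo "host GPU $id -> container GPU $i"
        i=$((i + 1))
    done
)

# Real usage with nvidia-docker 1.0 on a multi-GPU host:
#   NV_GPU=2,3 nvidia-docker run --rm nvidia/cuda nvidia-smi -L
```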

You can open a web browser and verify that DIGITS is running at the address below:

[simterm]

$ w3m http://<dockerhostip>:5000

[/simterm]

Below is a snippet from my w3m text browser:

How about Docker Compose? Is it supported?

Yes, of course.

Let us see how Docker Compose works with nvidia-docker.

First, find out the NVIDIA driver version; nvidia-smi reports it in the header of its output.

On this host, the driver version is 375.66.
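To avoid hard-coding the version, nvidia-smi can print just the driver version with --query-gpu=driver_version --format=csv,noheader, and the volume name follows the nvidia_driver_<version> pattern used below. The helper is a sketch that reads that query output on stdin and keeps only the first line, since nvidia-smi prints one line per GPU.

```shell
# Turn nvidia-smi's driver_version query output (stdin) into the
# nvidia_driver_<version> volume name; only the first GPU's line is used.
driver_volume_name() {
    head -n 1 | tr -d ' \r' | sed 's/^/nvidia_driver_/'
}

# Usage on the GPU host:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader | driver_volume_name
```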

Create a docker volume that uses the nvidia-docker plugin.

[simterm]

$ docker volume create --name=nvidia_driver_375.66 -d nvidia-docker
nvidia_driver_375.66

[/simterm]

Verify it with the below command:

[simterm]

$ sudo docker volume ls

DRIVER              VOLUME NAME
local               15dd59ba1017ca5b822473fb3ed8a341b4e29e59f3725565c00810ddd5619878
local               …
local               nvidia_driver_375.66

[/simterm]

Now let us look at the docker-compose YAML file.
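A compose file for nvidia-docker 1.0 typically looks like the sketch below, written out with a heredoc. Assumptions: compose file format 2, a single GPU at /dev/nvidia0, the external volume nvidia_driver_375.66 created above, and the plugin's convention of mounting the driver files read-only at /usr/local/nvidia, which the nvidia/cuda images already put on their PATH and library path.

```shell
# Write a minimal docker-compose.yml for nvidia-docker 1.0 into the current
# directory. $1 is the driver volume name (defaults to nvidia_driver_375.66).
write_compose_file() {
    vol="${1:-nvidia_driver_375.66}"
    cat > docker-compose.yml <<EOF
version: '2'
services:
  cuda:
    image: nvidia/cuda
    command: nvidia-smi
    devices:
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia0
    volumes:
      - ${vol}:/usr/local/nvidia:ro
volumes:
  ${vol}:
    external: true
EOF
}
```

Marking the volume as external tells Compose to use the existing volume rather than trying to create one with the default local driver.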

If you have ever worked with docker-compose, you can easily understand what each line specifies. I specified /dev/nvidia0 because I have a single GPU card, and made sure the volume name matches the one created in the previous step.

Then bring it up with docker-compose:

[simterm]

$ docker-compose up

[/simterm]