New Docker CLI API Support for NVIDIA GPUs under Docker Engine 19.03.0 Pre-Release

Estimated Reading Time: 7 minutes

Let’s talk about Docker in a GPU-Accelerated Data Center…

Docker is the leading container platform which provides both hardware and software encapsulation by allowing multiple containers to run on the same system at the same time each with their own set of resources (CPU, memory, etc) and their own dedicated set of dependencies (library version, environment variables, etc.). Docker  can now be used to containerize GPU-accelerated applications. In case you’re new to GPU-accelerated computing, it is basically the use of graphics processing unit  to accelerates high performance computing workloads and applications. This means you can easily containerize and isolate accelerated application without any modifications and deploy it on any supported GPU-enabled infrastructure.

Yes, you heard it right. Today Docker does natively support NVIDIA GPUs within containers. This is possible with the latest Docker 19.03.0 Beta 3 Release which is the latest pre-release and is available for download here. With this release, Docker  can now be flawlessly be used to containerize GPU-accelerated applications.

Let’s go back to 2017…

2 year back, I wrote a blog post titled “Running NVIDIA Docker in a GPU Accelerated Data center”. The nvidia-docker is an open source project hosted on GITHUB and it provides driver-agnostic CUDA images  & docker command line wrapper that mounts the user mode components of the driver and the GPUs (character devices) into the container at launch. With this enablement, the NVIDIA Docker plugin enabled deployment of GPU-accelerated applications across any Linux GPU server with NVIDIA Docker support. Under the same blog post, I showcased how to get started with nvidia-docker to interact with NVIDIA GPU system and then look at few of interesting applications which can be build for GPU-accelerated data center.

With the recent 19.03.0 Beta Release, now you don’t need to spend time in downloading the NVIDIA-DOCKER plugin and rely on nvidia-wrapper to launch GPU containers. All you can now use –gpus option with docker run CLI to allow containers to use GPU devices seamlessly.

Under this blog post, I will showcase how to get started with this new CLI API for NVIDIA GPU.

Prerequisite:

  • Ubuntu 18.04 instance running on Google Cloud Platform
  • Verify that NVIDIA card is detected
$ lspci -vv | grep -i nvidia
00:04.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
        Subsystem: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB]
        Kernel modules: nvidiafb

Installing NVIDIA drivers first

$ apt-get install ubuntu-drivers-common \
	&& sudo ubuntu-drivers autoinstall
- Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.18.0-1009-gcp/updates/dkms/

nvidia-uvm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.18.0-1009-gcp/updates/dkms/

depmod...

DKMS: install completed.
Setting up xserver-xorg-video-nvidia-390 (390.116-0ubuntu0.18.10.1) ...
Processing triggers for libc-bin (2.28-0ubuntu1) ...
Processing triggers for systemd (239-7ubuntu10.13) ...
Setting up nvidia-driver-390 (390.116-0ubuntu0.18.10.1) ...
Setting up adwaita-icon-theme (3.30.0-0ubuntu1) ...
update-alternatives: using /usr/share/icons/Adwaita/cursor.theme to provide /usr/share/icons/default/index.theme (x-cursor-theme) in auto mode
Setting up humanity-icon-theme (0.6.15) ...
Setting up libgtk-3-0:amd64 (3.24.4-0ubuntu1.1) ...
Setting up libgtk-3-bin (3.24.4-0ubuntu1.1) ...
Setting up policykit-1-gnome (0.105-6ubuntu2) ...
Setting up screen-resolution-extra (0.17.3build1) ...
Setting up ubuntu-mono (16.10+18.10.20181005-0ubuntu1) ...
Setting up nvidia-settings (390.77-0ubuntu1) ...
Processing triggers for libgdk-pixbuf2.0-0:amd64 (2.38.0+dfsg-6) ...
Processing triggers for initramfs-tools (0.131ubuntu15.1) ...
update-initramfs: Generating /boot/initrd.img-4.18.0-1009-gcp
cryptsetup: WARNING: The initramfs image may not contain cryptsetup binaries 
    nor crypto modules. If that's on purpose, you may want to uninstall the 
    'cryptsetup-initramfs' package in order to disable the cryptsetup initramfs 
    integration and avoid this warning.
Processing triggers for libc-bin (2.28-0ubuntu1) ...
Processing triggers for dbus (1.12.10-1ubuntu2) ...

Go ahead and reboot the system

$ reboot

Follow the below steps once the Ubuntu instance comes back.

Installing NVIDIA Container Runtime

Create a file named nvidia-container-runtime-script.sh and save it

$ cat nvidia-container-runtime-script.sh
 
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update

Execute the script

sh nvidia-container-runtime-script.sh

OK
deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
Hit:1 http://archive.canonical.com/ubuntu bionic InRelease
Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  InRelease [1139 B]                
Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  InRelease [1136 B]           
Hit:4 http://security.ubuntu.com/ubuntu bionic-security InRelease                                       
Get:5 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  Packages [4076 B]                 
Get:6 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  Packages [3084 B]            
Hit:7 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic InRelease
Hit:8 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:9 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-backports InRelease
Fetched 9435 B in 1s (17.8 kB/s)                   
Reading package lists... Done
$ apt-get install nvidia-container-runtime
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  grub-pc-bin libnuma1
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
Get:1 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  libnvidia-container1 1.0.2-1 [59.1 kB]
Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64  libnvidia-container-tools 1.0.2-1 [15.4 kB]
Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64  nvidia-container-runtime-hook 1.4.0-1 [575 kB]

...
Unpacking nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
Setting up libnvidia-container1:amd64 (1.0.2-1) ...
Setting up libnvidia-container-tools (1.0.2-1) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Setting up nvidia-container-runtime-hook (1.4.0-1) ...
Setting up nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
which nvidia-container-runtime-hook
/usr/bin/nvidia-container-runtime-hook

Installing Docker 19.03 Beta 3 Test Build

curl -fsSL https://test.docker.com -o test-docker.sh 

Execute the script

sh test-docker.sh

Verifying Docker Installation

$ docker version
Client:
 Version:           19.03.0-beta3
 API version:       1.40
 Go version:        go1.12.4
 Git commit:        c55e026
 Built:             Thu Apr 25 02:58:59 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Server:
 Engine:
  Version:          19.03.0-beta3
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.4
  Git commit:       c55e026
  Built:            Thu Apr 25 02:57:32 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.5
  GitCommit:        bb71b10fd8f58240ca47fbb579b9d1028eea7c84
 runc:
  Version:          1.0.0-rc6+dev
  GitCommit:        2b18fe1d885ee5083ef9f0838fee39b62d653e30
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Verifying –gpus option under docker run

$ docker run --help | grep -i gpus
      --gpus gpu-request               GPU devices to add to the container ('all' to pass all GPUs)

Running a Ubuntu container which leverages GPUs

 $ docker run -it --rm --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
f476d66f5408: Pull complete 
8882c27f669e: Pull complete 
d9af21273955: Pull complete 
f5029279ec12: Pull complete 
Digest: sha256:d26d529daa4d8567167181d9d569f2a85da3c5ecaf539cace2c6223355d69981
Status: Downloaded newer image for ubuntu:latest
Tue May  7 15:52:15 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    22W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
:~$ 

Troubleshooting:

Did you encounter the below error message:

$ docker run -it --rm --gpus all debian
docker: Error response from daemon: linux runtime spec devices: could not select device driver "" with capabilities: [[gpu]].

The above error means that Nvidia could not properly register with Docker. What it actually mean is the drivers are not properly installed on the host. This could also mean the nvidia container tools were installed without restarting the docker daemon: you need to restart the docker daemon.

I suggest you to go back and verify if nvidia-container-runtime is installed or not OR restart the Docker daemon.

Listing out GPU devices

$ docker run -it --rm --gpus all ubuntu nvidia-smi -L
GPU 0: Tesla P4 (UUID: GPU-fa974b1d-3c17-ed92-28d0-805c6d089601)
$ docker run -it --rm --gpus all ubuntu nvidia-smi  --query-gpu=index,name,uui
d,serial --format=csv
index, name, uuid, serial
0, Tesla P4, GPU-fa974b1d-3c17-ed92-28d0-805c6d089601, 0325017070224

A Quick Look at NVIDIA Deep Learning..

The NVIDIA Deep Learning GPU Training System, a.k.a DIGITS is a webapp for training deep learning models. It puts  the power of deep learning into the hands of engineers & data scientists. It can be used to rapidly train the highly accurate deep neural network (DNNs) for image classification, segmentation and object detection tasks.The currently supported frameworks are: Caffe, Torch, and Tensorflow.

DIGITS simplifies common deep learning tasks such as managing data, designing and training neural networks on multi-GPU systems, monitoring performance in real time with advanced visualizations, and selecting the best performing model from the results browser for deployment. DIGITS is completely interactive so that data scientists can focus on designing and training networks rather than programming and debugging.

To test-drive DIGITS, you can get it up and running in a single Docker container:

$docker run -itd --gpus all -p 5000:5000 nvidia/digits

You can open up web browser and verify if its running on the below address:

w3m http://<dockerhostip>:5000

Verifying again with nvidia-smi

$ docker run -it --rm --gpus all ubuntu nvidia-smi
Tue May  7 16:27:37 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   51C    P0    24W /  75W |    129MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

If you want to see it in action, here’s a quick video

Can I use Docker compose for NVIDIA GPU?

Not Yet. I have raised a feature request under this link few minutes back.

Special Thanks to Tibor Vass, Docker Staff Engineer for reviewing this blog.

Clap

(4)