Let’s talk about Docker in a GPU-Accelerated Data Center…
Docker is the leading container platform which provides both hardware and software encapsulation by allowing multiple containers to run on the same system at the same time each with their own set of resources (CPU, memory, etc) and their own dedicated set of dependencies (library version, environment variables, etc.). Docker can now be used to containerize GPU-accelerated applications. In case you’re new to GPU-accelerated computing, it is basically the use of graphics processing unit to accelerates high performance computing workloads and applications. This means you can easily containerize and isolate accelerated application without any modifications and deploy it on any supported GPU-enabled infrastructure.
Yes, you heard it right. Today Docker does natively support NVIDIA GPUs within containers. This is possible with the latest Docker 19.03.0 Beta 3 Release which is the latest pre-release and is available for download here. With this release, Docker can now be flawlessly be used to containerize GPU-accelerated applications.
Let’s go back to 2017…
2 year back, I wrote a blog post titled “Running NVIDIA Docker in a GPU Accelerated Data center”. The nvidia-docker is an open source project hosted on GITHUB and it provides driver-agnostic CUDA images & docker command line wrapper that mounts the user mode components of the driver and the GPUs (character devices) into the container at launch. With this enablement, the NVIDIA Docker plugin enabled deployment of GPU-accelerated applications across any Linux GPU server with NVIDIA Docker support. Under the same blog post, I showcased how to get started with nvidia-docker to interact with NVIDIA GPU system and then look at few of interesting applications which can be build for GPU-accelerated data center.
With the recent 19.03.0 Beta Release, now you don’t need to spend time in downloading the NVIDIA-DOCKER plugin and rely on nvidia-wrapper to launch GPU containers. All you can now use –gpus option with
CLI to allow containers to use GPU devices seamlessly.docker run
Under this blog post, I will showcase how to get started with this new CLI API for NVIDIA GPU.
Prerequisite:
- Ubuntu 18.04 instance running on Google Cloud Platform
- Verify that NVIDIA card is detected
$ lspci -vv | grep -i nvidia
00:04.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
Subsystem: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB]
Kernel modules: nvidiafb
Installing NVIDIA drivers first
$ apt-get install ubuntu-drivers-common \
&& sudo ubuntu-drivers autoinstall
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.18.0-1009-gcp/updates/dkms/
nvidia-uvm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/4.18.0-1009-gcp/updates/dkms/
depmod...
DKMS: install completed.
Setting up xserver-xorg-video-nvidia-390 (390.116-0ubuntu0.18.10.1) ...
Processing triggers for libc-bin (2.28-0ubuntu1) ...
Processing triggers for systemd (239-7ubuntu10.13) ...
Setting up nvidia-driver-390 (390.116-0ubuntu0.18.10.1) ...
Setting up adwaita-icon-theme (3.30.0-0ubuntu1) ...
update-alternatives: using /usr/share/icons/Adwaita/cursor.theme to provide /usr/share/icons/default/index.theme (x-cursor-theme) in auto mode
Setting up humanity-icon-theme (0.6.15) ...
Setting up libgtk-3-0:amd64 (3.24.4-0ubuntu1.1) ...
Setting up libgtk-3-bin (3.24.4-0ubuntu1.1) ...
Setting up policykit-1-gnome (0.105-6ubuntu2) ...
Setting up screen-resolution-extra (0.17.3build1) ...
Setting up ubuntu-mono (16.10+18.10.20181005-0ubuntu1) ...
Setting up nvidia-settings (390.77-0ubuntu1) ...
Processing triggers for libgdk-pixbuf2.0-0:amd64 (2.38.0+dfsg-6) ...
Processing triggers for initramfs-tools (0.131ubuntu15.1) ...
update-initramfs: Generating /boot/initrd.img-4.18.0-1009-gcp
cryptsetup: WARNING: The initramfs image may not contain cryptsetup binaries
nor crypto modules. If that's on purpose, you may want to uninstall the
'cryptsetup-initramfs' package in order to disable the cryptsetup initramfs
integration and avoid this warning.
Processing triggers for libc-bin (2.28-0ubuntu1) ...
Processing triggers for dbus (1.12.10-1ubuntu2) ...
Go ahead and reboot the system
$ reboot
Follow the below steps once the Ubuntu instance comes back.
Installing NVIDIA Container Runtime
Create a file named nvidia-container-runtime-script.sh and save it
$ cat nvidia-container-runtime-script.sh
curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
Execute the script
sh nvidia-container-runtime-script.sh
OK
deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
Hit:1 http://archive.canonical.com/ubuntu bionic InRelease
Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64 InRelease [1139 B]
Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64 InRelease [1136 B]
Hit:4 http://security.ubuntu.com/ubuntu bionic-security InRelease
Get:5 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64 Packages [4076 B]
Get:6 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64 Packages [3084 B]
Hit:7 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic InRelease
Hit:8 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:9 http://us-east4-c.gce.clouds.archive.ubuntu.com/ubuntu bionic-backports InRelease
Fetched 9435 B in 1s (17.8 kB/s)
Reading package lists... Done
$ apt-get install nvidia-container-runtime
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
grub-pc-bin libnuma1
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
Get:1 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64 libnvidia-container1 1.0.2-1 [59.1 kB]
Get:2 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64 libnvidia-container-tools 1.0.2-1 [15.4 kB]
Get:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64 nvidia-container-runtime-hook 1.4.0-1 [575 kB]
...
Unpacking nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
Setting up libnvidia-container1:amd64 (1.0.2-1) ...
Setting up libnvidia-container-tools (1.0.2-1) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Setting up nvidia-container-runtime-hook (1.4.0-1) ...
Setting up nvidia-container-runtime (2.0.0+docker18.09.6-3) ...
which nvidia-container-runtime-hook
/usr/bin/nvidia-container-runtime-hook
Installing Docker 19.03 Beta 3 Test Build
curl -fsSL https://test.docker.com -o test-docker.sh
Execute the script
sh test-docker.sh
Verifying Docker Installation
$ docker version
Client:
Version: 19.03.0-beta3
API version: 1.40
Go version: go1.12.4
Git commit: c55e026
Built: Thu Apr 25 02:58:59 2019
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.0-beta3
API version: 1.40 (minimum version 1.12)
Go version: go1.12.4
Git commit: c55e026
Built: Thu Apr 25 02:57:32 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.5
GitCommit: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc:
Version: 1.0.0-rc6+dev
GitCommit: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
docker-init:
Version: 0.18.0
GitCommit: fec3683
Verifying –gpus option under docker run
$ docker run --help | grep -i gpus
--gpus gpu-request GPU devices to add to the container ('all' to pass all GPUs)
Running a Ubuntu container which leverages GPUs
$ docker run -it --rm --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
f476d66f5408: Pull complete
8882c27f669e: Pull complete
d9af21273955: Pull complete
f5029279ec12: Pull complete
Digest: sha256:d26d529daa4d8567167181d9d569f2a85da3c5ecaf539cace2c6223355d69981
Status: Downloaded newer image for ubuntu:latest
Tue May 7 15:52:15 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116 Driver Version: 390.116 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P4 Off | 00000000:00:04.0 Off | 0 |
| N/A 39C P0 22W / 75W | 0MiB / 7611MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
:~$
Troubleshooting:
Did you encounter the below error message:
$ docker run -it --rm --gpus all debian
docker: Error response from daemon: linux runtime spec devices: could not select device driver "" with capabilities: [[gpu]].
The above error means that Nvidia could not properly register with Docker. What it actually mean is the drivers are not properly installed on the host. This could also mean the nvidia container tools were installed without restarting the docker daemon: you need to restart the docker daemon.
I suggest you to go back and verify if nvidia-container-runtime is installed or not OR restart the Docker daemon.
Listing out GPU devices
$ docker run -it --rm --gpus all ubuntu nvidia-smi -L
GPU 0: Tesla P4 (UUID: GPU-fa974b1d-3c17-ed92-28d0-805c6d089601)
$ docker run -it --rm --gpus all ubuntu nvidia-smi --query-gpu=index,name,uui
d,serial --format=csv
index, name, uuid, serial
0, Tesla P4, GPU-fa974b1d-3c17-ed92-28d0-805c6d089601, 0325017070224
A Quick Look at NVIDIA Deep Learning..
The NVIDIA Deep Learning GPU Training System, a.k.a DIGITS is a webapp for training deep learning models. It puts the power of deep learning into the hands of engineers & data scientists. It can be used to rapidly train the highly accurate deep neural network (DNNs) for image classification, segmentation and object detection tasks.The currently supported frameworks are: Caffe, Torch, and Tensorflow.
DIGITS simplifies common deep learning tasks such as managing data, designing and training neural networks on multi-GPU systems, monitoring performance in real time with advanced visualizations, and selecting the best performing model from the results browser for deployment. DIGITS is completely interactive so that data scientists can focus on designing and training networks rather than programming and debugging.
To test-drive DIGITS, you can get it up and running in a single Docker container:
$docker run -itd --gpus all -p 5000:5000 nvidia/digits
You can open up web browser and verify if its running on the below address:
w3m http://<dockerhostip>:5000
Verifying again with nvidia-smi
$ docker run -it --rm --gpus all ubuntu nvidia-smi
Tue May 7 16:27:37 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116 Driver Version: 390.116 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P4 Off | 00000000:00:04.0 Off | 0 |
| N/A 51C P0 24W / 75W | 129MiB / 7611MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
If you want to see it in action, here’s a quick video
Can I use Docker compose for NVIDIA GPU?
Not Yet. I have raised a feature request under this link few minutes back.
Special Thanks to Tibor Vass, Docker Staff Engineer for reviewing this blog.
Comments are closed.