The NVIDIA Jetson Nano 2GB Developer Kit is a low-cost platform for teaching, learning, and developing AI and robotics applications. It runs the same NVIDIA JetPack Software Development Kit (SDK) used in production AI-based products and supports the full NVIDIA CUDA-X™ accelerated computing software stack, including TensorRT for fast and efficient AI inference, in a small form factor and at a significantly lower price. The Jetson Nano 2GB Developer Kit launched at $59 at the end of October 2020.
In this blog post, I will cover the following:
- Verifying Docker and installing nvidia-docker
- Installing Docker Compose
- Identifying the Jetson board
- Running jtop in a Docker container
- Setting up the CUDA compiler and libraries
- Running deviceQuery on Docker with GPU support
- Running deviceQuery on containerd with GPU support
- Running deviceQuery on the K3s cluster
Hardware
- Jetson Nano
- A Camera Module
- A 5V 4A power supply
- A 64GB microSD card
Software
- Jetson SD card image from https://developer.nvidia.com/embedded/downloads
- Etcher installed on the computer you will use to flash the SD card
Preparing Your Jetson Nano
1. Flashing the Jetson SD Card Image
- Unzip the downloaded SD card image.
- Insert the SD card into your computer.
- Open Etcher, select the unzipped image and the target SD card, and click Flash.
- Insert the flashed card into the Jetson Nano and power it on to complete the initial Ubuntu setup.
2. Verifying the Preinstalled Docker Binaries
The Jetson Nano SD card image ships with Docker preinstalled. The board used here reports version 20.10.2:
ajeetraina@ajeetraina-desktop:~$ sudo docker version
Client:
 Version:           20.10.2
 API version:       1.41
 Go version:        go1.13.8
 Git commit:        20.10.2-0ubuntu1~18.04.2
 Built:             Tue Mar 30 21:35:54 2021
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       20.10.2-0ubuntu1~18.04.2
  Built:            Mon Mar 29 19:27:41 2021
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.4.4-0ubuntu1~18.04.2
  GitCommit:
 runc:
  Version:          spec: 1.0.2-dev
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:
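The --gpus flag used in step 5 needs Docker 19.03 or later. The 20.10.2 engine above meets that requirement. If you want a script to check this on other boards, here is a minimal sketch. version_ge is a hypothetical helper name, and the sketch assumes GNU sort with version-sort (-V) support, as shipped with Ubuntu 18.04:

```shell
#!/bin/sh
# version_ge A B: succeeds (exit status 0) when version A is >= version B.
# Relies on GNU sort: -V orders version strings, and -C only checks whether
# its input is already sorted, reporting the result through the exit status.
version_ge() {
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}
```

For example:
version_ge "$(sudo docker version --format '{{.Server.Version}}')" 19.03 && echo "Docker supports --gpus"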
Installing nvidia-docker
Install the nvidia-docker2 package:
sudo apt install nvidia-docker2
Install the nvidia-container-runtime package:
sudo apt install nvidia-container-runtime
Update the Docker daemon configuration
sudo vim /etc/docker/daemon.json
Make sure /etc/docker/daemon.json registers the nvidia runtime with the path to nvidia-container-runtime:
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
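If you prefer to script this step, the sketch below writes the same configuration. It also adds one setting that is not shown above: "default-runtime": "nvidia". On JetPack 4.x the CUDA toolkit is mounted into containers by the NVIDIA runtime, so making that runtime the default lets docker build steps see CUDA. The deviceQuery make in step 8 is one such step. Drop that line if you only want the NVIDIA runtime when you pass --runtime nvidia explicitly.

The function name write_nvidia_daemon_json is hypothetical. The function replaces the whole file after saving a .bak copy, so if your daemon.json already holds other settings, merge them by hand.

```shell
#!/bin/sh
# write_nvidia_daemon_json [FILE]: writes a Docker daemon config that registers
# the NVIDIA runtime and (an assumption, see above) makes it the default runtime.
# An existing FILE is copied to FILE.bak first, because it is replaced wholesale.
write_nvidia_daemon_json() {
  target=${1:-/etc/docker/daemon.json}
  if [ -f "$target" ]; then
    cp "$target" "$target.bak" || return 1   # keep the old config for rollback
  fi
  cat > "$target" <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
EOF
}
```

Call it from a root shell, then reload dockerd as in the next step.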
Tell dockerd to reload its configuration:
sudo pkill -SIGHUP dockerd
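To confirm that dockerd picked up the runtime, docker info can print the registered runtimes as JSON. Here is a minimal sketch of a check. nvidia_runtime_registered is a hypothetical helper that only searches that JSON for a key named "nvidia":

```shell
#!/bin/sh
# nvidia_runtime_registered: reads the JSON printed by
#   docker info --format '{{json .Runtimes}}'
# on stdin and succeeds if it contains a runtime keyed "nvidia".
nvidia_runtime_registered() {
  grep -q '"nvidia"[[:space:]]*:'
}
```

For example:
sudo docker info --format '{{json .Runtimes}}' | nvidia_runtime_registered && echo "nvidia runtime registered"
sudo docker info --format '{{.DefaultRuntime}}'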
3. Installing Docker Compose on NVIDIA Jetson Nano
Jetson Nano doesn't come with Docker Compose installed by default. Install the prerequisites, then install Docker Compose itself with pip3:
sudo apt-get install -y python3 python3-pip libhdf5-dev libssl-dev
export DOCKER_COMPOSE_VERSION=1.27.4
sudo pip3 install docker-compose=="${DOCKER_COMPOSE_VERSION}"
Verify the installation. The output below was captured with docker-compose 1.26.2; yours will report the version you installed:
docker-compose version
docker-compose version 1.26.2, build unknown
docker-py version: 4.3.1
CPython version: 3.6.9
OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
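Compose can also start GPU containers. Here is a minimal sketch, assuming Compose file format 2.3, whose runtime key selects the NVIDIA runtime. It reuses the ajeetraina/jetson-stats-nano image and the jtop socket from step 5, with runtime: nvidia standing in for the --gpus all flag used there. write_jtop_compose is a hypothetical helper:

```shell
#!/bin/sh
# write_jtop_compose [DIR]: writes DIR/docker-compose.yml (default: current dir)
# describing the jtop container from step 5. Compose file format 2.3 is used
# because its "runtime" key selects the NVIDIA runtime for the service.
write_jtop_compose() {
  dir=${1:-.}
  cat > "$dir/docker-compose.yml" <<'EOF'
version: "2.3"
services:
  jtop:
    image: ajeetraina/jetson-stats-nano
    runtime: nvidia
    command: jtop
    volumes:
      - /run/jtop.sock:/run/jtop.sock
EOF
}
```

For example (docker-compose run allocates a terminal, which jtop needs):
write_jtop_compose . && sudo docker-compose run --rm jtop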
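4. Identifying the Jetson Board
The jetsonInfo.py script from the JetsonHacks jetsonUtilities repository reports the board model, the L4T and JetPack releases, and the CUDA, cuDNN, and TensorRT versions: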
pico@pico1:~$ git clone https://github.com/jetsonhacks/jetsonUtilities
Cloning into 'jetsonUtilities'...
remote: Enumerating objects: 123, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 123 (delta 15), reused 23 (delta 8), pack-reused 84
Receiving objects: 100% (123/123), 32.87 KiB | 5.48 MiB/s, done.
Resolving deltas: 100% (49/49), done.
pico@pico1:~$ cd jetsonUtilities/
pico@pico1:~/jetsonUtilities$ ls
LICENSE README.md jetsonInfo.py scripts
pico@pico1:~/jetsonUtilities$ python3 jetsonInfo.py
NVIDIA Jetson Nano (Developer Kit Version)
L4T 32.4.4 [ JetPack 4.4.1 ]
Ubuntu 18.04.5 LTS
Kernel Version: 4.9.140-tegra
CUDA 10.2.89
CUDA Architecture: 5.3
OpenCV version: 4.1.1
OpenCV Cuda: NO
CUDNN: 8.0.0.180
TensorRT: 7.1.3.0
Vision Works: 1.6.0.501
VPI: 4.4.1-b50
Vulcan: 1.2.70
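jetsonInfo reports L4T 32.4.4 on this board, while the Dockerfile in step 8 starts from nvcr.io/nvidia/l4t-base:r32.5.0. The l4t-base images are generally expected to match the host's L4T release, so it can help to derive the tag from the board itself.

Here is a minimal sketch, assuming the usual /etc/nv_tegra_release format on JetPack 4.x, where the first line looks like "# R32 (release), REVISION: 4.4, ...". l4t_tag is a hypothetical helper. Check that the tag it prints is actually published on NGC before using it.

```shell
#!/bin/sh
# l4t_tag: reads /etc/nv_tegra_release-style text on stdin and prints the
# matching l4t-base tag, e.g. "# R32 (release), REVISION: 4.4, ..." -> "r32.4.4".
# Lines that don't match the expected format produce no output.
l4t_tag() {
  sed -n 's/^# R\([0-9]*\) (release), REVISION: \([0-9.]*\),.*/r\1.\2/p'
}
```

For example:
l4t_tag < /etc/nv_tegra_release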
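5. Running jtop in a Docker Container
nvidia-smi is not available on Jetson boards, so GPU and CPU usage is usually monitored with jtop from the jetson-stats package. The container below connects to the jtop service on the host through the /run/jtop.sock socket: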
sudo docker run --rm -it --gpus all \
-v /run/jtop.sock:/run/jtop.sock ajeetraina/jetson-stats-nano jtop
Use the Tab key to switch between jtop's pages, such as the GPU and CPU views.
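The container reads its statistics through /run/jtop.sock. That socket is created by the jtop service, which the jetson-stats package installs on the host (for example with sudo -H pip3 install -U jetson-stats). Here is a minimal sketch that fails early with a clear message when the socket is missing. run_jtop is a hypothetical wrapper around the docker run command above:

```shell
#!/bin/sh
# run_jtop [SOCKET]: starts the jtop container only if the host's jtop service
# socket exists (default /run/jtop.sock); the container expects it at
# /run/jtop.sock, so any custom path is mounted there.
run_jtop() {
  sock=${1:-/run/jtop.sock}
  if [ ! -S "$sock" ]; then
    echo "run_jtop: $sock not found; is the jtop service running on the host?" >&2
    return 1
  fi
  sudo docker run --rm -it --gpus all \
    -v "$sock:/run/jtop.sock" ajeetraina/jetson-stats-nano jtop
}
```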
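6. Setting Up the CUDA Compiler and Libraries
nvcc is installed with JetPack under /usr/local/cuda but is not on the PATH by default. Add the CUDA binary and library directories to the environment. These exports only affect the current shell; add the same two export lines to ~/.bashrc to keep them across sessions: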
ajeetraina@ajeetraina-desktop:~/meetup$ nvcc --version
-bash: nvcc: command not found
ajeetraina@ajeetraina-desktop:~/meetup$ export PATH=${PATH}:/usr/local/cuda/bin
ajeetraina@ajeetraina-desktop:~/meetup$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
ajeetraina@ajeetraina-desktop:~/meetup$ source ~/.bashrc
ajeetraina@ajeetraina-desktop:~/meetup$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
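To make the CUDA paths permanent without appending duplicate lines every time you rerun the setup, here is a minimal sketch of an idempotent helper. add_cuda_env is a hypothetical name, and the sketch assumes your interactive bash shells read ~/.bashrc:

```shell
#!/bin/sh
# add_cuda_env [RCFILE]: appends the CUDA PATH and LD_LIBRARY_PATH exports to a
# shell startup file (default ~/.bashrc) unless an identical line is already
# there, so running it twice leaves a single copy of each line.
add_cuda_env() {
  rc=${1:-$HOME/.bashrc}
  touch "$rc"
  # Single quotes keep ${PATH} literal so it is expanded when the rc file runs.
  grep -qxF 'export PATH=${PATH}:/usr/local/cuda/bin' "$rc" ||
    echo 'export PATH=${PATH}:/usr/local/cuda/bin' >> "$rc"
  grep -qxF 'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64' "$rc" ||
    echo 'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64' >> "$rc"
}
```

For example:
add_cuda_env && . ~/.bashrc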
7. Testing GPU Support
We'll use NVIDIA's deviceQuery sample, part of the CUDA samples installed with JetPack under /usr/local/cuda/samples, to check that containers can access the GPU. First we'll build a Docker image containing deviceQuery and run it directly with Docker. Then we'll run the same image with containerd's ctr tool, and finally on the K3s cluster itself.
8. Running deviceQuery on Docker with GPU support
Create a working directory:
mkdir test
cd test
Copy the sample files
Copy the CUDA samples, which include deviceQuery, into the working directory. It will serve as the Docker build context:
cp -R /usr/local/cuda/samples .
Create a Dockerfile in the same directory with the following contents:
FROM nvcr.io/nvidia/l4t-base:r32.5.0
RUN apt-get update && apt-get install -y --no-install-recommends make g++
COPY ./samples /tmp/samples
WORKDIR /tmp/samples/1_Utilities/deviceQuery
RUN make clean && make
CMD ["./deviceQuery"]
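Build the image. On JetPack 4.x the CUDA toolkit is not part of the l4t-base image; the NVIDIA runtime mounts it into containers instead. As a result, the make step generally finds CUDA during docker build only if "default-runtime": "nvidia" is set in /etc/docker/daemon.json:
sudo docker build -t ajeetraina/jetson_devicequery . -f Dockerfile
Push the image to Docker Hub so that containerd and the K3s nodes can pull it. Use your own Docker Hub account in place of ajeetraina:
sudo docker push ajeetraina/jetson_devicequery
Run it with the NVIDIA runtime: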
pico@pico2:~/test$ sudo docker run --rm --runtime nvidia ajeetraina/jetson_devicequery:latest
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3963 MBytes (4155383808 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
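deviceQuery ends with either a "Result = PASS" or a "Result = FAIL" line. Here is a minimal sketch of a check that works on the captured text. It is handy later, when only logs are available (for example from ctr or kubectl). devicequery_passed is a hypothetical helper:

```shell
#!/bin/sh
# devicequery_passed: reads deviceQuery output on stdin, prints it unchanged,
# and succeeds only if it contains the "Result = PASS" verdict line.
devicequery_passed() {
  output=$(cat)
  printf '%s\n' "$output"
  printf '%s\n' "$output" | grep -q '^Result = PASS'
}
```

For example:
sudo docker run --rm --runtime nvidia ajeetraina/jetson_devicequery:latest | devicequery_passed && echo "GPU visible in the container"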
9. Running deviceQuery on containerd with GPU support
Since K3s uses containerd as its container runtime by default, we can use the ctr command-line tool to pull the deviceQuery image we pushed to Docker Hub and run it with GPU access. Save the following script as usectr.sh:
#!/bin/bash
IMAGE=ajeetraina/jetson_devicequery:latest
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
ctr i pull docker.io/${IMAGE}
ctr run --rm --gpus 0 --tty docker.io/${IMAGE} deviceQuery
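A standalone ctr binary talks to the system containerd socket (/run/containerd/containerd.sock by default). That may be the containerd used by Docker rather than the one embedded in K3s. K3s bundles its own client, k3s ctr, which targets the K3s containerd socket.

Here is a minimal sketch of the same test aimed at K3s's containerd. It assumes the bundled ctr accepts the same --gpus flag as upstream ctr. ctr_devicequery and the CTR and RUN overrides are hypothetical conveniences; setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# ctr_devicequery [IMAGE]: pulls IMAGE into containerd and runs it with GPU 0
# attached, using "devicequery" as the container ID. CTR defaults to "k3s ctr"
# (the K3s-bundled client); set CTR=ctr to target the system containerd.
# Set RUN=echo for a dry run that only prints the commands.
ctr_devicequery() {
  image=${1:-docker.io/ajeetraina/jetson_devicequery:latest}
  ctr_cmd=${CTR:-k3s ctr}
  # $ctr_cmd is intentionally unquoted so "k3s ctr" splits into two words.
  ${RUN:-} $ctr_cmd images pull "$image" &&
    ${RUN:-} $ctr_cmd run --rm --gpus 0 --tty "$image" devicequery
}
```

For example, from a root shell, preview the commands with RUN=echo ctr_devicequery, then run them with ctr_devicequery.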
10. Execute the script
sudo sh usectr.sh
docker.io/ajeetraina/jetson_devicequery:latest: resolved |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:dfeaad4046f78871d3852e5d5fb8fa848038c57c34c6554c6c97a00ba120d550: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4438ebff930fb27930d802553e13457783ca8a597e917c030aea07f8ff6645c0: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:b1cdeb9e69c95684d703cf96688ed2b333a235d5b33f0843663ff15f62576bd4: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:bf60857fb4964a3e3ce57a900bbe47cd1683587d6c89ecbce4af63f98df600aa: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:0aac5305d11a81f47ed76d9663a8d80d2963b61c643acfce0515f0be56f5e301: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:37987db6d6570035e25e713f41e665a6d471d25056bb56b4310ed1cb1d79a100: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:f0f57d03cad8f8d69b1addf90907b031ccb253b5a9fc5a11db83c51aa311cbfb: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:08c23323368d4fde5347276d543c500e1ff9b712024ca3f85172018e9440d8b0: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:04da93b342eb651d6b94c74a934a3290697573a907fa0a06067b538095601745: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:f84ceb6e8887e9b3b454813459ee97c2b9730869dbd37d4cca4051958b7a5a36: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 81.4s total: 305.5 (3.8 MiB/s)
unpacking linux/arm64/v8 sha256:dfeaad4046f78871d3852e5d5fb8fa848038c57c34c6554c6c97a00ba120d550...
done
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Tegra X1"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 5.3
Total amount of global memory: 3963 MBytes (4155383808 bytes)
( 1) Multiprocessors, (128) CUDA Cores/MP: 128 CUDA Cores
GPU Max Clock rate: 922 MHz (0.92 GHz)
Memory Clock rate: 13 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
11. Running deviceQuery on the K3s cluster
Create a Pod manifest that runs deviceQuery:
pico@pico2:~/test$ cat pod_deviceQuery.yaml
apiVersion: v1
kind: Pod
metadata:
  name: devicequery
spec:
  containers:
  - name: nvidia
    image: ajeetraina/jetson_devicequery:latest
    command: [ "./deviceQuery" ]
Apply the manifest and inspect the pod:
pico@pico2:~/test$ sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply -f ./pod_deviceQuery.yaml
pod/devicequery created
pico@pico2:~/test$ sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl describe pod devicequery
Name:         devicequery
Namespace:    default
Priority:     0
Node:         pico4/192.168.1.163
Start Time:   Sun, 13 Jun 2021 09:16:44 -0700
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  nvidia:
    Container ID:
    Image:          ajeetraina/jetson_devicequery:latest
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Command:
      ./deviceQuery
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mcrmv (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-mcrmv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  78s   default-scheduler  Successfully assigned default/devicequery to pico4
  Normal  Pulling    77s   kubelet            Pulling image "ajeetraina/jetson_devicequery:latest"
To pin the pod to a specific node (pico4 here), add nodeName to the pod spec:
pico@pico2:~/test$ cat pod_deviceQuery_jetson4.yaml
apiVersion: v1
kind: Pod
metadata:
  name: devicequery
spec:
  nodeName: pico4
  containers:
  - name: nvidia
    image: ajeetraina/jetson_devicequery:latest
    command: [ "./deviceQuery" ]
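Once the image has been pulled, the pod starts, but deviceQuery exits with code 1 and the pod goes into CrashLoopBackOff:
pico@pico2:~/test$ sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl describe pod devicequery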
Name:         devicequery
Namespace:    default
Priority:     0
Node:         pico4/192.168.1.163
Start Time:   Sun, 13 Jun 2021 09:16:44 -0700
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           10.42.1.3
IPs:
  IP:  10.42.1.3
Containers:
  nvidia:
    Container ID:  containerd://fd502d6bfa55e2f80b2d50bc262e6d6543fd8d09e9708bb78ecec0b2e09621c3
    Image:         ajeetraina/jetson_devicequery:latest
    Image ID:      docker.io/ajeetraina/jetson_devicequery@sha256:dfeaad4046f78871d3852e5d5fb8fa848038c57c34c6554c6c97a00ba120d550
    Port:          <none>
    Host Port:     <none>
    Command:
      ./deviceQuery
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 13 Jun 2021 09:21:50 -0700
      Finished:     Sun, 13 Jun 2021 09:21:50 -0700
    Ready:          False
    Restart Count:  5
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mcrmv (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-mcrmv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  7m51s                  default-scheduler  Successfully assigned default/devicequery to pico4
  Normal   Pulled     5m45s                  kubelet            Successfully pulled image "ajeetraina/jetson_devicequery:latest" in 2m5.699757621s
  Normal   Pulled     5m43s                  kubelet            Successfully pulled image "ajeetraina/jetson_devicequery:latest" in 1.000839703s
  Normal   Pulled     5m29s                  kubelet            Successfully pulled image "ajeetraina/jetson_devicequery:latest" in 967.072951ms
  Normal   Pulled     4m59s                  kubelet            Successfully pulled image "ajeetraina/jetson_devicequery:latest" in 1.025604394s
  Normal   Created    4m59s (x4 over 5m45s)  kubelet            Created container nvidia
  Normal   Started    4m59s (x4 over 5m45s)  kubelet            Started container nvidia
  Warning  BackOff    4m20s (x8 over 5m42s)  kubelet            Back-off restarting failed container
  Normal   Pulling    2m47s (x6 over 7m51s)  kubelet            Pulling image "ajeetraina/jetson_devicequery:latest"
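kubectl describe only shows that the container exited with code 1. deviceQuery prints a CUDA error before its "Result = FAIL" line, and that output is available from kubectl logs --previous, which reads the last terminated container. Here is a minimal sketch. devicequery_pod_verdict and the KUBECTL override are hypothetical; KUBECTL defaults to the K3s invocation used in this post:

```shell
#!/bin/sh
# devicequery_pod_verdict [POD]: prints the output of the pod's last terminated
# container and reports whether deviceQuery passed. Returns 0 on PASS,
# 1 on any other result, and 2 if the logs could not be fetched.
devicequery_pod_verdict() {
  pod=${1:-devicequery}
  kubectl_cmd=${KUBECTL:-sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl}
  # $kubectl_cmd is intentionally unquoted so it splits into separate words.
  logs=$($kubectl_cmd logs --previous "$pod") || return 2
  printf '%s\n' "$logs"
  if printf '%s\n' "$logs" | grep -q '^Result = PASS'; then
    echo "deviceQuery passed in pod $pod"
  else
    echo "deviceQuery did not pass in pod $pod" >&2
    return 1
  fi
}
```

For example:
devicequery_pod_verdict devicequery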
pico@pico2:~/test$ sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply -f ./pod_deviceQuery_jetson4.yaml
pod/devicequery configured
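One plausible cause of the exit code 1 above is that K3s runs the pod under its default runc runtime, so deviceQuery gets no GPU access. A second factor is that a Pod's default restartPolicy of Always restarts even a one-shot program, which turns any exit into CrashLoopBackOff.

Here is a minimal sketch of manifests that address both. It adds a RuntimeClass and sets runtimeClassName and restartPolicy: Never. It assumes K3s's containerd already has a runtime handler named nvidia configured; this post does not show that configuration, so see the K3s documentation for how to register it. devicequery_manifests is a hypothetical helper.

```shell
#!/bin/sh
# devicequery_manifests [NODE]: prints a RuntimeClass named "nvidia" and a
# one-shot deviceQuery Pod that uses it, pinned to NODE (default: pico4).
# Assumes K3s's containerd already has a runtime handler called "nvidia".
devicequery_manifests() {
  node=${1:-pico4}
  cat <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
---
apiVersion: v1
kind: Pod
metadata:
  name: devicequery
spec:
  nodeName: ${node}
  runtimeClassName: nvidia   # run under the NVIDIA runtime instead of runc
  restartPolicy: Never       # deviceQuery exits when done; don't restart it
  containers:
  - name: nvidia
    image: ajeetraina/jetson_devicequery:latest
    command: [ "./deviceQuery" ]
EOF
}
```

Most Pod spec fields, including runtimeClassName, cannot be changed on an existing pod, so delete the old one first:
sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl delete pod devicequery
devicequery_manifests pico4 | sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply -f -
sudo KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl logs devicequery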
In my next blog post, we will see how to deploy the Jetson software stack for DeepStream.