Building a Real-Time Crowd Face Mask Detection System using Docker on NVIDIA Jetson Nano

Table of Contents

Did you know? Around 94% of AI Adopters are using or plan to use containers within 1 year time. Containers are revolutionizing a variety of workloads across the enterprise IT space, and AI adopters seem keen on using this technology to improve their AI workloads. AI adopters are intrigued by the benefits of scalability and speed that containers can bring to their AI deployments. In one of survey, 28% cite increased scalability as a benefit, and 27% say containers can decrease time to deployment. Another 26% even indicate containers will help lower costs. It seems that AI adopters are still figuring out some of the other ways containers can benefits their AI deployments. For example, only 19% cite the benefit of increased portability, which will likely only grow in importance as AI infrastructure becomes more hybrid in nature.

The power of modern AI is now available for makers, learners, and embedded developers everywhere. Thanks to NVIDIA for introducing a $99 NVIDIA^® Jetson Nano^™ Developer Kit. It is a small, powerful computer that lets you run multiple neural networks in parallel for applications like image classification, object detection, segmentation, and speech processing. All in an easy-to-use platform that runs in as little as 5 watts. It’s simpler than ever to get started! Just insert a microSD card with the system image, boot the developer kit, and begin using the same NVIDIA JetPack SDK used across the entire NVIDIA Jetson™ family of products.

MaskCam is an open source project hosted over GITHUB. It is a prototype reference design for a Jetson Nano-based smart camera system . MaskCam can be run on a Jetson Nano Developer Kit, or on a Jetson Nano module (SOM). It measures crowd face mask usage in real-time, with all AI computation performed at the edge. It detects and tracks people in its field of view and determines whether they are wearing a mask via an object detection, tracking, and voting algorithm. It uploads statistics (not videos) to the cloud, where a web GUI can be used to monitor face mask compliance in the field of view. It saves interesting video snippets to local disk (e.g., a sudden influx of lots of people not wearing masks) and can optionally stream video via RTSP.

Please Note: Maskcam was designed to use the Raspberry Pi High Quality Camera but will also work with pretty much any USB webcam that is supported on Linux.

Technology

Written in Python
Runs perfectly well with JetPack 4.4.1 or 4.5
Edge AI processing is handled by NVIDIA’s DeepStream video analytics framework, YOLOv4-tiny, and Tryolabs’ Norfair tracker.
Reports statistics to and receives commands from the cloud using MQTT and a web-based GUI.
The software is containerized and for evaluation can be easily installed on a Jetson Nano DevKit using docker with just a couple of commands
For production, MaskCam can run under balenaOS, which makes it easy to manage and deploy multiple devices.

In this tutorial, you will learn how to implement a COVID-19 crowd face mask detector with Jetson Nano & Docker in 5 minutes.

Hardware

A Jetson Nano Dev Kit running JetPack 4.4.1 or 4.5
An external DC 5 volt, 4 amp power supply connected through the Dev Kit’s barrel jack connector (J25). (See these instructions on how to enable barrel jack power.) This software makes full use of the GPU, so it will not run with USB power.
A USB webcam attached to your Nano

A 5V 4Ampere Charger
64GB SD card
Another computer with a program that can display RTSP streams — we suggest VLC or QuickTime.

Software

Jetson SD card image from https://developer.nvidia.com/embedded/downloads
Etcher software installed on your system

Preparing Your Jetson Nano

1. Preparing Your Raspberry Pi Flashing Jetson SD Card Image

Unzip the SD card image
Insert SD card into your system.
Bring up Etcher tool and select the target SD card to which you want to flash the image.

sudo lshw -C system
pico2                       
    description: Computer
    product: NVIDIA Jetson Nano Developer Kit
    serial: 1422919082257
    width: 64 bits
    capabilities: smp cp15_barrier setend swp

CUDA Compiler and Libraries

ajeetraina@ajeetraina-desktop:~/meetup$ nvcc --version
-bash: nvcc: command not found
ajeetraina@ajeetraina-desktop:~/meetup$ export PATH=${PATH}:/usr/local/cuda/bin
ajeetraina@ajeetraina-desktop:~/meetup$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
ajeetraina@ajeetraina-desktop:~/meetup$ source ~/.bashrc
ajeetraina@ajeetraina-desktop:~/meetup$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:14:42_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

DeviceQuery

$ pwd

/usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make

ajeetraina@ajeetraina-desktop:/usr/local/cuda/samples/1_Utilities/deviceQuery$ sudo make
/usr/local/cuda-10.2/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_32,code=sm_32 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery.o -c deviceQuery.cpp
/usr/local/cuda-10.2/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_32,code=sm_32 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/aarch64/linux/release
cp deviceQuery ../../bin/aarch64/linux/release
ajeetraina@ajeetraina-desktop:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ls
Makefile  NsightEclipse.xml  deviceQuery  deviceQuery.cpp  deviceQuery.o  readme.txt
ajeetraina@ajeetraina-desktop:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3956 MBytes (4148387840 bytes)
  ( 1) Multiprocessors, (128) CUDA Cores/MP:     128 CUDA Cores
  GPU Max Clock rate:                            922 MHz (0.92 GHz)
  Memory Clock rate:                             13 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

2. Verifying if it is shipped with Docker Binaries

ajeetraina@ajeetraina-desktop:~$ sudo docker version
[sudo] password for ajeetraina: 
Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Fri Feb 28 23:47:53 2020
 OS/Arch:           linux/arm64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Wed Feb 19 01:06:16 2020
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.2
  GitCommit:        
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:        
 docker-init:
  Version:          0.18.0
  GitCommit:

3. Checking Docker runtime

Starting with JetPack 4.2, NVIDIA has introduced a container runtime with Docker integration. This custom runtime enables Docker containers to access the underlying GPUs available in the Jetson family.

pico@pico1:/tmp/docker-build$ sudo nvidia-docker version
NVIDIA Docker: 2.0.3
Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Fri Feb 28 23:47:53 2020
 OS/Arch:           linux/arm64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Wed Feb 19 01:06:16 2020
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.2
  GitCommit:        
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:        
 docker-init:
  Version:          0.18.0
  GitCommit:

4. Configuring Docker Daemon

Open the daemon.json ( /etc/docker/daemon.json)


{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

5. Restarting the Docker daemon

systemctl restart docker

6. Install nvidia-container-runtime

sudo apt install nvidia-container-runtime

7. Run the Face Mask detection

Run the below docker command to implement the face mask detection system. The MaskCam container should start running the maskcam_run.py script, using the USB camera as the default input device (/dev/video0). It will produce various status output messages (and error messages, if it encounters problems). If there are errors, the process will automatically end after several seconds. Check the Troubleshooting section for tips on resolving errors.

Otherwise, after 30 seconds or so, it should continually generate status messages (such as Processed 100 frames...). Leave it running (don’t press Ctrl+C, but be aware that the device will start heating up) and continue to the next section to visualize the video!

sudo docker run --runtime nvidia --privileged --rm -it --env MASKCAM_DEVICE_ADDRESS=<your-jetson-ip> -p 1883:1883 -p 8080:8080 -p 8554:8554 maskcam/maskcam-beta

Viewing the Live Video Stream

If you scroll through the logs and don’t see any errors, you should find a message like:

Streaming at rtsp://IP_Address:8554/maskcam

where IP_Addressis the address that you provided in MASKCAM_DEVICE_ADDRESS previously. If you didn’t provide an address, you’ll see some unknown address label there, but the streaming will still work.

Next, you just need to put the URL into your RSTP streaming viewer (VLC) on another computer. If all goes well, you should be rewarded with streaming video of your Nano, with green boxes around faces wearing masks and red boxes around faces not wearing masks.