Docker, Prometheus & Pushgateway for NVIDIA GPU Metrics & Monitoring

In my last blog post, I talked about how to get started with NVIDIA Docker and how it interacts with NVIDIA GPU systems. I demonstrated the NVIDIA Deep Learning GPU Training System, a.k.a. DIGITS, by running it inside a Docker container. ICYMI – DIGITS is essentially a webapp for training deep learning models and is used to rapidly train highly accurate deep neural networks (DNNs) for image classification, segmentation and object detection tasks. The currently supported frameworks are Caffe, Torch, and TensorFlow. It simplifies common deep learning tasks such as managing data, designing and training neural networks on multi-GPU systems, monitoring performance in real time with advanced visualizations, and selecting the best performing model from the results browser for deployment.


In a typical HPC environment where you run hundreds of NVIDIA GPU-equipped cluster nodes, it becomes important to monitor those systems to gain insight into performance metrics, memory usage, temperature and utilization. Tools like Ganglia & Nagios are very popular due to their scalable & distributed monitoring architecture for high-performance computing systems such as clusters and grids. Ganglia, for example, leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. But with the advent of container technology, there is a need for modern monitoring tools and solutions which work well with Docker & microservices.

It’s all about the modern Prometheus Stack…

Prometheus is a 100% open-source service monitoring system and time series database written in Go. It is a full monitoring and trending system that includes built-in and active scraping, storing, querying, graphing, and alerting based on time series data. It has knowledge about what the world should look like (which endpoints should exist, what time series patterns mean trouble, etc.), and actively tries to find faults.

How is it different from Nagios?

Though both serve the purpose of monitoring, Prometheus wins this debate on the below major points – 

  • Nagios is host-based. Each host can have one or more services, and each service has one check. There is no notion of labels or a query language. Prometheus, by contrast, comes with its robust query language called “PromQL”. Prometheus provides a functional expression language that lets the user select and aggregate time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus’s expression browser, or consumed by external systems via the HTTP API (see the sample query right after this list).
  • Nagios is suitable for basic monitoring of small and/or static systems where blackbox probing is sufficient. But if you want to do whitebox monitoring, or have a dynamic or cloud-based environment, then Prometheus is a good choice.
  • Nagios is primarily just about alerting based on the exit codes of scripts. These are called “checks”. There is silencing of individual alerts, however no grouping, routing or deduplication.
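
To get a feel for PromQL, here is a sample query run against the Prometheus HTTP API. This is only an illustration: 9090 is the default Prometheus port and the expression simply counts healthy scrape targets per job.

# 'up' is a built-in metric that is 1 for every healthy scrape target
curl -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=sum by (job) (up)'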

Let’s talk about the Prometheus Pushgateway…

Occasionally you will need to monitor components which cannot be scraped. They might live behind a firewall, or they might be too short-lived to expose data reliably via the pull model. The Prometheus Pushgateway allows you to push time series from these components to an intermediary job which Prometheus can scrape. Combined with Prometheus’s simple text-based exposition format, this makes it easy to instrument even shell scripts without a client library.

The Prometheus Pushgateway allows ephemeral and batch jobs to expose their metrics to Prometheus. Since these kinds of jobs may not exist long enough to be scraped, they can instead push their metrics to a Pushgateway. The Pushgateway then exposes these metrics to Prometheus. It is important to understand that the Pushgateway is explicitly not an aggregator or distributed counter but rather a metrics cache. It does not have statsd-like semantics. The metrics pushed are exactly the same as you would present for scraping in a permanently running program. For machine-level metrics, the textfile collector of the Node exporter is usually more appropriate. The Pushgateway is intended for service-level metrics; it is not an event store.
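
For example, a metric can be pushed from a plain shell script with curl alone. A minimal sketch, assuming the Pushgateway is reachable on localhost at its default port 9091:

# Push a single sample for the job "some_job" to the Pushgateway
echo "some_metric 3.14" | curl --data-binary @- http://localhost:9091/metrics/job/some_job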

Under this blog post, I will showcase how NVIDIA Docker, Prometheus & Pushgateway come together to push NVIDIA GPU metrics to the Prometheus Stack.

Infrastructure Setup:

  • Docker Version: 17.06
  • OS: Ubuntu 16.04 LTS
  • Environment : Managed Server Instance with GPU
  • GPU: GeForce GTX 1080 Graphics card

Cloning the GitHub Repository

Run the below command to clone the repository to your Ubuntu 16.04 system equipped with a GPU card:

git clone

Script to bring up the Prometheus Stack (includes Grafana)

Change to the nvidia-prometheus-stats directory, set execute permission on the script & then execute it as shown below:

cd nvidia-prometheus-stats
sudo chmod +x
sudo sh

This script will bring up 3 containers in sequence – Pushgateway, Prometheus & Grafana
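
For reference, here is a minimal hand-rolled sketch of what such a script typically does, assuming the stock images and default ports; the prometheus.yml is expected to list the Pushgateway as a scrape target:

# Hedged sketch: bring up Pushgateway, Prometheus & Grafana with stock images and default ports
docker run -d --name pushgateway -p 9091:9091 prom/pushgateway
docker run -d --name prometheus -p 9090:9090 \
  -v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
docker run -d --name grafana -p 3000:3000 grafana/grafana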

Executing GPU Metrics Script:

NVIDIA provides a python module for monitoring NVIDIA GPUs using the newly released Python bindings for NVML (NVIDIA Management Library). These bindings are under BSD license and allow simplified access to GPU metrics like temperature, memory usage, and utilization.
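
These are the values the metrics script (described next) reads via NVML and pushes to the Pushgateway. As a rough shell-only illustration (not the author's script; the metric names and Pushgateway address below are assumptions), the same data could be gathered with nvidia-smi and pushed using the pattern shown earlier:

# Single-GPU sketch: query temperature, utilization and memory, then push to the Pushgateway
PUSHGATEWAY=http://localhost:9091
nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,memory.used \
  --format=csv,noheader,nounits | \
while IFS=', ' read -r temp util mem; do
  cat <<EOF | curl --data-binary @- "$PUSHGATEWAY/metrics/job/gpu_metrics"
gpu_temperature_celsius $temp
gpu_utilization_percent $util
gpu_memory_used_mib $mem
EOF
done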

Next, under the same directory, you will find a python script called “”.

Execute the script (after updating the IP under line number 124 as per your host machine) as shown below:

sudo python

That’s it. It is time to open up the Prometheus UI at http://<IP-address>:9090 (Grafana is covered below).

Just type gpu under the Expression section and you will see the list of GPU metrics automatically show up as shown below:

Accessing the targets

Go to Status > Targets to see which targets are accessible. The State should show as UP.

Click on the Pushgateway endpoint to access the GPU metrics in detail as shown:

Accessing Grafana

You can access Grafana through the below link (Grafana listens on port 3000 by default):



Did you find this blog helpful?  Feel free to share your experience. Get in touch @ajeetsraina

If you are looking out for contribution/discussion, join me at Docker Community Slack Channel.



Test Drive 5 Cool Docker Application Stacks on play-with-docker (PWD) platform

Do you want to learn Docker for free? Yes, you read that right. Thanks to a playground called “play-with-docker” – PWD in short. PWD is a website which allows you to create 5 instances to play around with Docker & Docker Swarm Mode for 4 hours – all at $0 cost. It is a perfect tool for demos, meetups, beginners & advanced-level training. I tested it on a number of web browsers like Chrome, Safari, Firefox and IE and it just works flawlessly. You don’t need to install Docker as it comes by default. It’s a ready-to-use platform.




Currently, PWD is hosted on an AWS instance of type 2x r3.4xlarge having 16 cores and 120GB of RAM. It comes with the latest Docker 17.05 release, docker-compose 1.11.1 & docker-machine 0.9.0-rc1. You can set up your own PWD environment in your lab using this repository. Credits to Docker Captains Marcos Nils & Jonathan Leibiusky for building this amazing tool for the Docker community. One of the most interesting facts about PWD is that it is based on the DinD (Docker-in-Docker) concept. When you are playing around with PWD instances & building application stacks, you are actually inside a Docker container yourself. Interesting, isn’t it? PWD gives you the amazing experience of having a free Alpine Linux 3.5 virtual machine in the cloud where you can build and run Docker containers and even create a multi-node Swarm Mode cluster.

That said, PWD is NOT just a platform for beginners. Today, it has matured enough to run sophisticated application stacks on top of it. Within seconds, you can set up a Swarm Mode cluster running an application stack. Please remember that PWD is just for trying out new stuff with Docker and its applications and NOT to be used as a production environment. The instances vanish automatically after 4 hours.





Under this blog post, I am going to showcase 3 out of 5 cool Dockerized application stacks which you can demonstrate to an advanced-level audience, all running on the PWD playground (listed below):

Docker UI Management & Local Registry Service

  • Bringing Portainer & Portus together for PWD under Swarm Mode

Building Monitoring Stack with Prometheus & Grafana:

  • Prometheus, Grafana & cAdvisor Stack on Swarm Mode

Demonstrating Voting Application under Swarm Mode

  • Voting App on Swarm Mode

Highly Available Web Application

  • LAMP Stack under Swarm Mode

Demonstrating RSVP

To make it quite simple, I have collected the list of sample applications under

You can use git clone to pull the repository on the manager node:

$git clone

First, we need to set up a Swarm Mode cluster on PWD. As Docker 17.05 already comes installed by default, we are all set to initialize the swarm mode cluster. Click on the “New Instance” button on the left-hand side of PWD and this will open up an instance on the right side as shown. Run the below command on the 1st instance as shown below:

$docker swarm init --advertise-addr <manager-ip>:2377 --listen-addr <manager-ip>:2377

This command will initialize the Swarm and suggest a command to join the worker node as shown:

$docker swarm join --token <token-id> <manager-ip>:2377

Once you run the command on all 4 worker nodes, you can go back to the manager node and check the list of Swarm nodes:
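
On the manager node, the node listing should report all 5 nodes with a Ready status:

# Run on the manager: lists every node in the swarm
docker node ls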




1. Docker UI Management & Local Registry Service:

Setting up Portainer for Swarm Mode Cluster

It’s time to get started with our first application – Portainer. Portainer is an open-source lightweight management UI which allows you to easily manage your Docker host or Swarm cluster. Run the below command to set up Portainer for the Swarm Mode cluster:
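
A hedged sketch of a typical Portainer deployment on Swarm Mode (not necessarily the exact command used here): publish port 9000 (the port referenced below), pin the service to a manager, and bind-mount the Docker socket so Portainer can talk to the engine.

docker service create \
  --name portainer \
  --publish 9000:9000 \
  --constraint 'node.role == manager' \
  --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
  portainer/portainer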



Ensure that Portainer is up and running:
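
For instance (service name as per the sketch above):

# REPLICAS should read 1/1 once the service has converged
docker service ls
docker service ps portainer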



The moment the Portainer service gets started, PWD displays port 9000, which Portainer runs on, as shown. You can click on this port number to directly open the management UI.




This opens up a fancy UI which displays lots of management features related to images, containers, swarm, networks, volumes, etc.




Portainer clearly displays the swarm cluster as shown:



Setting up Portus for Local Docker Registry:

Portus is an authorization service for your Docker registry. It provides a useful and powerful UI on top of your registry. To test-drive Portus, one needs to clone the repository:

$git clone && cd Portus

Run the below script with your manager IP to get Portus up and running:










Once this looks good, you should be able to browse its UI as shown:









I faced an issue related to Webpack::Rails::Manifest::ManifestLoadError which I got fixed within a few minutes. You now have control over the Docker registry using this fancy UI. You can create users and organizations for your development team and push/pull Docker images privately.

Hence, you can now open up Portainer and configure Portus as a local Docker registry instead of the standard Docker registry. This makes Portainer & Portus work together flawlessly.

2. Building Monitoring Stack with Prometheus & Grafana

Prometheus is an open-source systems monitoring and alerting toolkit. Most Prometheus components are written in Go, while some are written in Java, Python, and Ruby. It is designed for capturing high-dimensional data and is built specifically for monitoring. On the other hand, ELK is a general-purpose NoSQL stack that is also very popular for monitoring and logging.

To test-drive the Prometheus stack, change to the play-with-docker/docker-prometheus-swarm directory and run the stack deploy command as shown below:

$git clone
$cd docker101/play-with-docker/docker-prometheus-swarm/
$docker stack deploy -c docker-compose.yml myprom1

[Special credit to Basi for building the repository & the tremendous effort enabling this solution]




That’s it. Your Prometheus, Grafana & cAdvisor stack is ready to be used.





Demonstrating Voting Application under Swarm Mode

The voting app is a perfect example that showcases dependencies among services, and a potential division of services between the manager and worker nodes in a swarm. You can learn more about the voting app and how it actually works under this link. To quickly demonstrate it, let us pick the right directory under the pulled repository:

$cd play-with-docker/example-voting-app

Run the below command:

$docker stack deploy -c docker-stack.yml myvotingapp









In the next part of this blog series, I am going to cover the other 2 application stacks – CloudYuga RSVP & the WordPress web application.

Did you find this blog helpful?  Feel free to share your experience. Get in touch @ajeetsraina

If you are looking out for contribution, join me at Docker Community Slack Channel.


Assessing the current state of Docker Engine & Tools on Raspberry Pi

Are you planning to speak at or conduct your next Docker workshop on Raspberry Pis? Still curious to know whether tools like Docker Machine, Docker Compose, Dockerfile, 1.12 Swarm Mode, high availability & load balancing are compatible and good to demonstrate to a workshop audience? In case you want to know the current state of Docker containers on Raspberry Pi, then you are at the right place. For the next half an hour, I will talk about what Docker images, tools, networking and security features are supported on the Raspberry Pi box.

Let’s have a quick glimpse of which popular tools work today with Docker Engine on the Pi box. Below is a list of tools and applications which currently work great on Raspberry Pi. Please remember that this is NOT an official support matrix from Docker Inc. Also, the versions specified are the latest ones I tested and verified personally. I just verified the tools’ functionality so as to keep it ready before the workshop.


The State of Docker Engine

In my previous blog post, I talked about “Docker 1.12.1 on Raspberry Pi 3 in 5 minutes” where I demonstrated how to get started with the Docker 1.12.1 installation on the Pi box for the first time. With 1.12.1, the FIRST ARM Debian package was officially made available and there was a tremendous amount of interest among Docker users. This time I tried my hands on the latest experimental build, Docker Engine version 1.12.3, on top of the latest Raspbian Jessie Lite version.




Docker Engine looked quite stable: I was able to try out basic Docker commands like exec, run, attach, etc., and creating a Dockerfile and building the image was smooth on the Pi box. Running and stopping containers works as smoothly as you experience on VMs in VirtualBox or on a cloud instance. This release should be a good baseline for demonstrating the latest Docker CLI to a workshop audience (for both beginners & advanced-level users).
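
As a quick illustration, a minimal build-and-run smoke test on the Pi might look like the sketch below. The base image arm32v7/alpine is just one ARM-compatible choice and is an assumption here, not necessarily the image I used:

# Verify the engine runs ARM containers
docker run --rm arm32v7/alpine uname -m    # prints armv7l on a Pi 3

# Build and run a tiny image from a Dockerfile
cat > Dockerfile <<'EOF'
FROM arm32v7/alpine
RUN apk add --no-cache curl
CMD ["curl", "--version"]
EOF
docker build -t pi-smoke-test .
docker run --rm pi-smoke-test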

The State of Docker Compose

Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a Compose file to configure your application’s services. Major features like multiple isolated environments on a single host and the ability to build up a microservice architecture make Compose a powerful tool. I picked up the latest stable Docker Compose version 1.8 so as to check and verify the latest Compose functionality.


I followed this link to get Docker Compose version 1.8 installed on my Pi box. It went smoothly and required no tweaking at all. I tried running the WordPress application (MariaDB, Apache and PHP) inside Docker containers using docker-compose and it just went fine; see the compose sketch below.
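
For reference, a compose sketch in the same spirit is shown below. The image names are placeholders (they must be replaced with ARM-compatible builds) and the environment variables follow the convention of the official WordPress image, so treat this purely as an illustration of the file structure:

# Write a two-service compose file; substitute real ARM-compatible images
cat > docker-compose.yml <<'EOF'
wordpress:
  image: <arm-wordpress-image>
  ports:
    - "80:80"
  links:
    - db
  environment:
    - WORDPRESS_DB_HOST=db
    - WORDPRESS_DB_PASSWORD=example
db:
  image: <arm-mariadb-image>
  environment:
    - MYSQL_ROOT_PASSWORD=example
EOF
docker-compose up -d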






The overall experience with Docker Compose was positive. I tried dozens of docker-compose commands like exec and run and they worked great. In case you are planning to demonstrate Compose 1.8, go ahead; it is a great tool for demonstrating the concept of microservices to workshop users.

The State of Docker 1.12 Swarm Mode

I verified the functionality of Swarm Mode on Docker 1.12.3 and it just works flawlessly. I tested the basic functionality of setting up the master and worker nodes and it worked great:
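
The commands are identical to any other platform; for reference (addresses and token are placeholders):

# On the manager Pi
docker swarm init --advertise-addr <manager-ip>
# On each worker Pi, using the join command printed by the init output
docker swarm join --token <worker-token> <manager-ip>:2377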


One just needs to add the worker nodes as suggested by the above command and it just works as expected. I tried a few features like workers leaving the cluster using the $docker swarm leave option and it worked as expected. Creating networks and attaching services to an overlay network is one feature which I haven’t yet tried out (a rough sketch of those commands follows below). I will update this space once I try my hands on overlay networking.
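
For completeness, the overlay workflow I still need to verify would look roughly like this (network, service and image names are placeholders):

# Create an overlay network and attach a replicated service to it
docker network create -d overlay mynet
docker service create --name web --network mynet --replicas 3 <arm-image>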

The State of Docker 1.12 High Availability:

Demonstrating HA using Swarm Mode is always an interesting thing to do with container clustering. I had a 5-node cluster setup with 5 Raspberry Pis – one master and 4 worker nodes. I took the same WordPress application and demonstrated it on the 5-node cluster. I scaled the WordPress service out from 10 to 30 replicas and then tried stopping a few containers on the 3rd Pi. New containers automatically came up with new IDs, rebalancing across the swarm cluster. I haven’t tested master node failure on the Pi box, but I believe it should work with a minimum of 3 manager nodes on Raspberry Pi boxes.
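
Scaling itself is a one-liner (the service name wordpress is hypothetical):

# Scale the service to 30 replicas and watch tasks spread across the 5 Pis
docker service scale wordpress=30
docker service ps wordpress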

The State of Operating Systems Docker Image:

Last September, I built the first CentOS 7.2 ARM Docker image on Raspberry Pi 3, which I described in detail in my blog. It was a well-appreciated effort and well received by the Docker community. I was trying to build a Dell legacy application for the ARM architecture and the legacy application was tightly coupled to the CentOS 7.x distribution. Hence, I just thought to pick up the necessary packages from the Fedora repository and successfully built the required CentOS 7.2 Docker image. Other than CentOS, Ubuntu, Alpine Linux and Arch Linux are a few of the most popular Docker OS images which can be demonstrated to a workshop audience.


The State of Tools(Monitoring, Management):

Two months back, I published a blog post on “Turn Your Raspberry Pi into an Out-of-band Monitoring Device using Docker“. I pushed a Nagios Docker image for the ARM architecture for the first time, which you can use freely from Docker Hub. Demonstrating a monitoring tool like Nagios running inside Docker on Raspberry Pi can be a great example of how Docker reduces the complexity of packaging a huge application.

Talking about UI & management of Docker containers, I came across Portainer – a simple management UI for Docker. Portainer is a very young project but is gaining huge popularity due to its easy-to-use, lightweight and responsive user interface.

[Updated: 11/21/2016] – Good news for Pi users – Portainer now has official support for the ARM architecture using the Docker image portainer/portainer:arm! Portainer recently added more features to support Swarm Mode services in their latest release.

Setting up Portainer is a matter of a one-liner command (as shown below):
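
A hedged sketch using the ARM image tag mentioned above: port 9000 is Portainer’s default, and the socket bind-mount lets it manage the local engine.

docker run -d -p 9000:9000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  portainer/portainer:arm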


That’s it. All you need to do is open your browser and point it to http://<IP>:9000. Portainer provides you a lightweight management UI which allows you to easily manage your Docker host or Swarm cluster.


One of the most attractive offerings from the Portainer team is “Use Your Own Templates”. It allows you to rapidly deploy containers using App Templates, a glimpse of which can be found at


In short, Portainer allows you to manage your Docker containers, images, volumes, networks and more ! It is compatible with the standalone Docker engine and with Docker Swarm.

In my future blog post, I am going to touch upon the upcoming Docker releases verified for the Raspberry Pi box and its compatibility matrix. Feel free to share your thoughts on Twitter (@ajeetsraina).





Running Prometheus Docker container for monitoring Microservices on Raspberry Pi

In the previous post, we talked about running a Nagios container on Raspberry Pi for the first time. Nagios is a free and open-source monitoring and alerting service which basically collects the statistics of your server using agents like NRPE, check_mk or SNMP and sends an alert if a metric value goes above a predefined threshold. It is a great tool for monitoring nodes or a monolithic application, but with the advent of the microservices architecture and Docker, monitoring the small pieces of services running across the cluster becomes very important. With that also grows the need to monitor these services around the clock, to maintain the healthy functioning of the application.


Two of the most talked-about monitoring tools are Prometheus & Grafana. Prometheus is an open-source systems monitoring and alerting toolkit written in Go. It is a next-generation monitoring system and well appreciated by the Docker core engineering team. It is a service especially well designed for containers, and it gives perspective on the data intensiveness of this new age and how even Internet-scale companies have had to adapt.

On the other hand, Grafana can rightly be called the face of Prometheus. It is also an open-source, general-purpose dashboard and graph composer, which runs as a web application. It supports Graphite, InfluxDB or OpenTSDB as backends. While Prometheus exposes some of its internals, like settings and the stats it gathers, via basic web front-ends, it delegates the heavy lifting of proper graphical displays and dashboards to Grafana. For anyone who wants to collect metrics from a development environment to be graphed and measured, Prometheus as the time series store and Grafana as the visualization tool is a perfect solution.

What’s unique about Prometheus?

Prometheus is a self-hosted set of tools which collectively provide metrics storage, aggregation, visualization and alerting. Most of the available monitoring tools are push-based, i.e. agents on the monitored servers talk to a central server (or set of servers) and send out their metrics. Prometheus is a little different in this respect: it is a pull-based server which expects monitored servers to provide a web interface from which it can scrape data. This is a unique characteristic of Prometheus.

The main features of Prometheus can be summed below:

  • Decentralized architecture
  • Scalable data collection
  • Powerful query language: PromQL leverages the data model for meaningful alerting (including easy silencing) and graphing (for dashboards and for ad-hoc exploration).
  • A multi-dimensional data model (data can be sliced and diced at will, along dimensions like instance, service, endpoint, and method).
  • No reliance on distributed storage; single server nodes are autonomous.
  • Time series collection happens via a pull model over HTTP.
  • Pushing time series is supported via an intermediary gateway.
  • Targets are discovered via service discovery or static configuration.
  • Multiple modes of graphing and dashboard support.

You can deep dive more into its architecture at

I was in need of a lightweight monitoring tool like Prometheus and a fancy UI like Grafana for my Raspberry Pi, which I could spin up as Docker containers in a few seconds and then grow/spread across the cluster, primarily to test Docker 1.12 Swarm Mode. Unfortunately, I didn’t find any Prometheus Docker image for ARM. I decided to build one myself and explore it on the Raspberry Pi 3 box.

After spending a considerable amount of time on the Pi box, I was able to build the first Prometheus Docker image for ARM and pushed it to Docker Hub. In case you are interested, check out my GitHub page: to build and run your own Prometheus container.

[Thanks to Alex Ellis, Docker Captain, for reviewing this container and cleaning up the Dockerfile.]



Running Prometheus & Grafana Docker Image on Raspberry Pi

Follow the below command to spin up the Prometheus container on the fly:
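
A hedged sketch based on the image and flag referenced in the compose file further below:

# Mount your prometheus.yml and point the binary at it; 9090 is the default Prometheus port
docker run -d --name prometheus -p 9090:9090 \
  -v ~/prometheus.yml:/etc/prometheus/prometheus.yml \
  ajeetraina/prometheus-armv7 \
  -config.file=/etc/prometheus/prometheus.yml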


Luckily, there was already a Grafana ARM image available on Docker Hub, which took no time to spin up, as shown below:
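
A hedged one-liner using the ARM Grafana image that also appears in the compose file below (3000 is Grafana’s default port):

docker run -d --name grafana -p 3000:3000 fg2it/grafana-armhf:v3.1.1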



You can also use docker-compose to bring up both the containers in a single shot as shown:

pi@raspberrypi:~/testingPi $ cat docker-compose.yml

prometheus:
  image: ajeetraina/prometheus-armv7
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
  command:
    - '-config.file=/etc/prometheus/prometheus.yml'
  ports:
    - '9090:9090'

grafana:
  image: fg2it/grafana-armhf:v3.1.1
  ports:
    - "3000:3000"

Ensure that you have either the sample or your own customized prometheus.yml file in the same directory (it is bind-mounted into the Prometheus container) before bringing up the containers.
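
If you don’t have one handy, a minimal prometheus.yml sketch (scraping only Prometheus itself; adjust targets to taste) can be generated like this:

cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
EOF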

Just one command and you are ready to spin up both the containers as shown below:
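
That one command is simply docker-compose up, run from the directory containing the two files above:

docker-compose up -d
docker-compose ps    # both containers should show State "Up"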


That’s it. Browse to http://<ipaddress>:9090 to check out the Prometheus UI.


Also, browse to http://<ipaddress>:3000 to open up the fancy Grafana UI as shown below. Supply admin/collabnix as credentials to enter the Grafana UI.


Importing Prometheus Data into Grafana

Once you log into the Grafana UI, click on Data Sources followed by the “Add Data Source” button. You will see the below screen:


Next, let’s add Prometheus as a data source for the Grafana dashboard:



I firmly believe that the newly built Prometheus container is sure to help developers monitor their microservice applications on top of the Raspberry Pi box. In the next post, I will talk about the other available data sources and how they integrate with Prometheus and Grafana as a monitoring stack in detail.