The 20-minutes Docker 1.12 Swarm Mode demonstration on Azure Platform

Estimated Reading Time: 7 minutes

2016 has been a great year for Docker Inc. With the announcement of Docker 1.12 release in last Dockercon, a new generation Docker clustering & distributed system was born. With an optional “Swarm Mode” feature rightly integrated into core Docker Engine, a native management of a cluster of Docker Engines, orchestration, decentralized design, service and application deployment, scaling, rolling updates, desired state reconciliation, multi-host networking, service discovery and routing mesh implementation – all of these features works flawlessly. With the recent Engine 1.12.5 release, all of these features have matured enough to make it production ready.

Under this blog post, I will be spending another 20-minutes to go quickly through the complete A-Z tutorial around Swarm Mode covering the important features like Orchestration, Scaling, Routing Mesh, Overlay Networking, Rolling Updates etc.

  • Preparing Your Azure Environment
  • Setting up Cluster Nodes
  • Setting up Master Node
  • Setting up Worker Nodes
  • Creating Your First Service
  • Inspecting the Service
  • Scaling service for the first time
  • Creating the Nginx Service
  • Verifying the Nginx Page
  • Stopping all the services in a single command
  • Building WordPress Application using CLI
  • Building WordPress Application using Docker-Compose
  • Demonstrating CloudYuga RSVP Application
  • Scaling the CloudYuga RSVP Application
  • Demonstrating Rolling Updates
  • Docker 1.12 Scheduling | Restricting Service to specific nodes


Preparing Your Azure Environment:

1. Login to Azure Portal.
2. Create minimal of 5 swarm nodes(1 master & 4 worker nodes) – [We can definitely automate this]
3. While creating Virtual Machine, select “Docker on Ubuntu Server” (It contains 1.12.3 and Ubuntu 16.04)
4. Select Password rather than Public Key for quick access using PuTTY.[for demonstration]
5. Select the default Resource Group for all the nodes to communicate each other


Setting Up Cluster Nodes:



Setting Up Master Node:

ajeetraina@Master1:~$ sudo docker swarm init –advertise-addr
Swarm initialized: current node (dj89eg83mymr0asfru2n9wpu7) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join \
–token SWMTKN-1-511d99ju7ae74xs0kxs9x4sco8t7awfoh99i0vwrhhwgjt11wi-d8y0tplji3z449ojrfgrrtgyc \

To add a manager to this swarm, run ‘docker swarm join-token manager’ and follow the instructions.


Setting up Worker Node:

Adding worker nodes is quite easy. Just run the above command to connect worker nodes to manager one by one. This can also be automated(will touch later if needed).

Tips: In case you loose your current session on Manager and want to know what token will allow you to connect to the cluster, run the below command on the manager node:

ajeetraina@Master1:~$ sudo docker swarm join-token worker
To add a worker to this swarm, run the following command:

docker swarm join \
–token SWMTKN-1-511d99ju7ae74xs0kxs9x4sco8t7awfoh99i0vwrhhwgjt11wi-d8y0tplji3z449ojrfgrrtgyc \

Run the above command on all the nodes one by one.


Listing the Swarm Cluster:

ajeetraina@Master1:~$ sudo docker node ls
ID                                                 HOSTNAME        STATUS    AVAILABILITY  MANAGER STATUS
49fk9jibezh2yvtjuh5wlx3td            Node2              Ready            Active
aos67yarmj5cj8k5i4g9l3k6g          Node1               Ready            Active
dj89eg83mymr0asfru2n9wpu7 *  Master1           Ready            Active                        Leader
euo8no80mr7ocu5uulk4a6fto       Node4              Ready            Active


Verifying if the node is worker node or not?

Run the below command on the worker node:

$sudo docker info

Swarm: active
NodeID: euo8no80mr7ocu5uulk4a6fto
Is Manager: false
Node Address:
Runtimes: runc

The “Is Manager: false” entry specifies that this node is a worker node.

Creating our First Service:

ajeetraina@Master1:~$ sudo docker service create alpine ping
ajeetraina@Master1:~$ sudo docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                  PORTS               NAMES
cc80ce569a94        alpine:latest       “ping”      2 seconds ago       Up Less than a second                       nauseous_stonebraker.1.6gcqr8d9brbf9lowgqpb4q6uo


Querying the service:

Syntax: sudo docker service ps <service-id>


ajeetraina@Master1:~$ sudo docker service ps 2ncb
ID                         NAME                    IMAGE   NODE     DESIRED STATE  CURRENT STATE           ERROR
6gcqr8d9brbf9lowgqpb4q6uo  nauseous_stonebraker.1  alpine  Master1  Running        Running 54 seconds ago

Alternative Method:

ajeetraina@Master1:~$ sudo docker service ps nauseous_stonebraker
ID                         NAME                    IMAGE   NODE     DESIRED STATE  CURRENT STATE               ERROR
6gcqr8d9brbf9lowgqpb4q6uo  nauseous_stonebraker.1  alpine  Master1  Running        Running about a minute ago


Inspecting the Service:

ajeetraina@Master1:~$ sudo docker service inspect 2ncb
“ID”: “2ncblsn85ft2sgeh5frsj0n8g”,
“Version”: {
“Index”: 27
“CreatedAt”: “2016-11-16T10:59:10.462901856Z”,
“UpdatedAt”: “2016-11-16T10:59:10.462901856Z”,
“Spec”: {
“Name”: “nauseous_stonebraker”,
“TaskTemplate”: {
“ContainerSpec”: {
“Image”: “alpine”,
“Args”: [
“Resources”: {
“Limits”: {},
“Reservations”: {}
“RestartPolicy”: {
“Condition”: “any”,
“MaxAttempts”: 0
“Placement”: {}
“Mode”: {
“Replicated”: {
“Replicas”: 1
“UpdateConfig”: {
“Parallelism”: 1,
“FailureAction”: “pause”
“EndpointSpec”: {
“Mode”: “vip”
“Endpoint”: {
“Spec”: {}
“UpdateStatus”: {
“StartedAt”: “0001-01-01T00:00:00Z”,
“CompletedAt”: “0001-01-01T00:00:00Z”


Scaling the Service:

ajeetraina@Master1:~$ sudo docker service ls
ID            NAME                  REPLICAS  IMAGE   COMMAND
2ncblsn85ft2  nauseous_stonebraker  1/1       alpine  ping
ajeetraina@Master1:~$ sudo docker service scale nauseous_stonebraker=4
nauseous_stonebraker scaled to 4
ajeetraina@Master1:~$ sudo docker service ls
ID            NAME                  REPLICAS  IMAGE   COMMAND
2ncblsn85ft2  nauseous_stonebraker  4/4       alpine  ping


Creating a Nginx Service:

ajeetraina@Master1:~$ sudo docker service create –name web –publish 80:80 –replicas 4 nginx
ajeetraina@Master1:~$ sudo docker service ls
ID            NAME                  REPLICAS  IMAGE   COMMAND
2ncblsn85ft2  nauseous_stonebraker  4/4       alpine  ping
9xm0tdkt83z3  web                   0/4       nginx
ajeetraina@Master1:~$ sudo docker service ls
ID            NAME                  REPLICAS  IMAGE   COMMAND
2ncblsn85ft2  nauseous_stonebraker  4/4       alpine  ping
9xm0tdkt83z3  web                   0/4       nginx
ajeetraina@Master1:~$ sudo docker service ls
ID            NAME                  REPLICAS  IMAGE   COMMAND
2ncblsn85ft2  nauseous_stonebraker  4/4       alpine  ping
9xm0tdkt83z3  web                   4/4       nginx


Verifying the Nginx Web Page:

ajeetraina@Master1:~$ sudo curl http://localhost
<!DOCTYPE html>
<title>Welcome to nginx!</title>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href=””></a>.<br/>
Commercial support is available at
<a href=””></a>.</p>

<p><em>Thank you for using nginx.</em></p>


Stopping all the swarm mode service in a single shot

$sudo docker service rm $(docker service ls | awk ‘{print $1}’)

Building WordPress Application using CLI

Create an overlay network:

$sudo docker network create –driver overlay collabnet

Run the backend(wordpressdb) service:

$sudo docker service create –env MYSQL_ROOT_PASSWORD=collab123 –env MYSQL_DATABASE=wordpress –network collabnet –replicas 1 –name wordpressdb mysql:latest

Run the frontend(wordpressapp) service:

$sudo docker service create –env WORDPRESS_DB_HOST=wordpressdb –env WORDPRESS_DB_PASSWORD=collab123 –network collabnet –replicas 4 –name wordpressapp –publish 80:80/tcp wordpress:latest

Inspecting the Virtual IP Address:

$docker service inspect –format {{.Endpoint.VirtualIPs}} wordpressdb
$docker service inspect –format {{.Endpoint.VirtualIPs}} wordpressapp

Ensuring that the WordPress Application is working or not:

$curl http://localhost

Building WordPress Application using Docker-Compose:

Create a file called docker-compose.yml under some directory on your Linux system with the below entry:

version: '2'

     image: mysql:5.7
       - db_data:/var/lib/mysql
     restart: always
       MYSQL_ROOT_PASSWORD: wordpress
       MYSQL_DATABASE: wordpress
       MYSQL_USER: wordpress
       MYSQL_PASSWORD: wordpress

       - db
     image: wordpress:latest
       - "8000:80"
     restart: always
       WORDPRESS_DB_HOST: db:3306
       WORDPRESS_DB_PASSWORD: wordpress

Execute the below command to bring up the application:

$sudo docker-compose up -d

Verifying the running containers:

$sudo docker-compose ps

Running the Interactive Mode for docker-compose:

$sudo docker-compose config –services


Testing CloudYuga RSVP Application:


$docker network create –driver overlay rsvpnet
$docker service create –name mongodb  -e MONGODB_DATABASE=rsvpdata –network rsvpnet  mongo:3.3
$docker service create –name rsvp -e MONGODB_HOST=mongodb –publish 5000 –network rsvpnet teamcloudyuga/rsvpapp

Verifying it opens up on Web browser:

$ curl http://localhost:30000
<!doctype html>
<title>RSVP App!</title>
<meta name=”viewport” content=”width=device-width, initial-scale=1″>
<link rel=”stylesheet” href=”/static/bootstrap.min.css”>
<link rel=”icon” href=”” type=”image/png” sizes=”
<script type=”text/javascript” src= “/static/jquery.min.js”></script>
<script type=”text/javascript” src= “/static/bootstrap.min.js”></script>
<div class=”jumbotron”>
<div class=”container”>
<div align=”center”>
<h2><a href=””>CloudYuga<img src=””/>Garage RSVP!<
<h3><font color=”maroon”> Serving from Host: 75658e4fd141 </font>
<font size=”8″ >

Delete the CloudYuga RSVP service

$ docker service rm rsvp


Create the CloudYuga RSVP service with custom names:

$ docker service create –name rsvp  -e MONGODB_HOST=mongodb -e TEXT1=”Docker Meetup” -e TEXT2=”Bangalore” –publish 5000  –network rsvpnet teamcloudyuga/rsvpapp


Scale the CloudYuga RSVP service:

$ docker service scale rsvp=5


Demonstrating the rolling update

$docker service update –image teamcloudyuga/rsvpapp:v1 –update-delay 10s rsvp

keep refreshing the RSVP frontend watch for changes,  “Name” should be converted into “Your Name”.

Demonstrating DAB and Docker Compose

$ mkdir cloudyuga
$ cd cloudyuga/
:~/cloudyuga$ docker-compose bundle -o cloudyuga.dab
WARNING: Unsupported top level key ‘networks’ – ignoring
Wrote bundle to cloudyuga.dab
:~/cloudyuga$ vi docker-compose.yml
:~/cloudyuga$ ls
cloudyuga.dab  docker-compose.yml

cat cloudyuga.dab
“Services”: {
“mongodb”: {
“Env”: [
“Image”: “mongo@sha256:08a90c3d7c40aca81f234f0b2aaeed0254054b1c6705087b10da1c1901d07b5d”,
“Networks”: [
“Ports”: [
“Port”: 27017,
“Protocol”: “tcp”
“web”: {
“Env”: [
“Image”: “teamcloudyuga/rsvpapp@sha256:df59278f544affcf12cb1798d59bd42a185a220ccc9040c323ceb7f48d030a75”,
“Networks”: [
“Ports”: [
“Port”: 5000,
“Protocol”: “tcp”
“Version”: “0.1”

Scaling the Services:

$docker service ls
ID            NAME               REPLICAS  IMAGE                                                                                          COMMAND
66kcl850fkkh  cloudyuga_web      1/1       teamcloudyuga/rsvpapp@sha256:df59278f544affcf12cb1798d59bd42a185a220ccc9040c323ceb7f48d030a75
aesw4vcj1s11  cloudyuga_mongodb  1/1

$docker service scale rsvp=5

aztab8c3r22c  rsvp               5/5       teamcloudyuga/rsvpapp:v1
f4olzfoomu76  mongodb            1/1       mongo:3.3

Restricting service to node-1

$sudo docker node update –label-addtype=ubuntu node-1

master==>sudo docker service create –name mycloud –replicas 5 –network collabnet –constraint ‘node.labels.type
== ubuntu’ dockercloud/hello-world
master==>sudo docker service ls
ID            NAME               REPLICAS  IMAGE
0elchvwja6y0  mycloud            0/5       dockercloud/hello-world

66kcl850fkkh  cloudyuga_web      3/3       teamcloudyuga/rsvpapp@sha256:df59278f544affcf12cb1798d59bd42a185a220ccc9
aesw4vcj1s11  cloudyuga_mongodb  1/1       mongo@sha256:08a90c3d7c40aca81f234f0b2aaeed0254054b1c6705087b10da1c1901d
aztab8c3r22c  rsvp               5/5       teamcloudyuga/rsvpapp:v1

f4olzfoomu76  mongodb            1/1       mongo:3.3

master==>sudo docker service ps mycloud
ID                         NAME       IMAGE                    NODE    DESIRED STATE  CURRENT STATE          ERROR
a5t3rkhsuegf6mab24keahg1y  mycloud.1  dockercloud/hello-world  node-1  Running        Running 5 seconds ago
54dfeuy2ohncan1sje2db9jty  mycloud.2  dockercloud/hello-world  node-1  Running        Running 3 seconds ago
072u1dxodv29j6tikck8pll91  mycloud.3  dockercloud/hello-world  node-1  Running        Running 4 seconds ago
enmv8xo3flzsra5numhiln7d3  mycloud.4  dockercloud/hello-world  node-1  Running        Running 4 seconds ago
14af770jbwipbgfb5pgwr08bo  mycloud.5  dockercloud/hello-world  node-1  Running        Running 4 seconds ago


What’s new upcoming in Docker Compose v1.9.0?

Estimated Reading Time: 5 minutes

Docker Compose has gained lots of attention in the recent past due to its easy one-liner installation(on Linux, Windows & Mac OS X), easy-to-use JSON & YAML format support , available sample docker-compose files on GITHUB  and a one-liner command to create and start all the services from your configuration. If you are looking out for Microservices implementation, Docker Compose is a great tool to get started with. With Compose, you can define and run complex application with Docker. Also, you define a multi-container application in a single file, then spin up your application in a single command which takes care of linking services together through Service Discovery.


Docker Compose 1.9 is currently under RC4 phase and nearing the Final Release. Several new features and improvements in terms of Networking, Logging & Compose CLI has been introduced. With this release, Docker Compose version 2.1 has been introduced for the first time.This release will support the setting up of  volume labels and network labels in YAML specification. BUT there is a good news for Microsoft Windows enthusiasts. Interactive mode for docker-compose run and docker-compose exec is now supported on Windows platforms and this is surely going to help Microsoft enthusiasts to play around with the services flawlessly.

The below picture shows what major features has been introduced since last year in Docker Compose release:


In case you are very new to Docker Compose, I suggest you to read this official documentation. If you are an experienced Compose user and curious to know how Docker Compose fits into Swarm Mode, don’t miss out my recent blog post. Under this blog post, we will look at the new features which are being introduced under Docker Compose 1.9 release.

Installation of Docker Compose v1.9

On Windows Server 2016 system, you can run the below command to get started with Docker Compose 1.9-rc4 release.


If you are on Linux host, the installation just goes flawless as shown below:

# curl -L`uname -s`-`uname -m` > /usr/local/bin/docker-compose

# chmod +x /usr/local/bin/docker-compose$ sudo docker-compose -vdocker-compose version 1.9.0-rc3, build fcd38d3


Introduction of Version 2.1 YAML specification format for the first time

Docker 1.9 introduces the newer version of Docker Compose YAML specification format rightly called “Version: 2.1” for the first time. To test drive, I created a docker-compose file for my wordpress application and it just worked well.


The docker-compose up -d just went good as shown below:


We can have a look at the list of services running using Docker compose as shown below:


Interactive Mode for docker exec & docker run

Though this feature has been there for Linux users quite for sometime, it has been newly introduced and supported on Windows Platform too. In case you are new to docker-compose run command, here is the simplified way to demonstrate it.

On Linux Host:


Note: In case you are new to docker-compose config command, it is a CLI tool which validates your Docker compose file.

Cool. One can use docker-compose run command to target one service out of several services mentioned under docker-compose.yml file and interact with that particular service without any issue.

On Windows Host:

To quickly test this feature, I spun up Windows Server 2016 on Azure, installed Docker and Docker Compose and forked repository which has collection of Windows Docker images. Though it was quite slow in the beginning, but once pulled bringing up services using Docker Compose was pretty quick.

NOTE: When running docker-compose, you will either need to explicitly reference the host port by adding the option “-H tcp://localhost:2375” to the end of this command (e.g. docker-compose -H “tcp://localhost:2375” or by setting your DOCKER_HOST environment variable to always use this port (e.g. $env:DOCKER_HOST=”tcp://localhost:2375”



cmp33As shown below, the services finally were up and running and one can easily check through docker-compose ps command as shown below:




Let us test docker-compose run feature now. I tried targeting the db service and running cmd command to see if it works well.



That’s really cool. Believe me, it was quite quick in bringing up command prompt.

Support for setting volume labels and network labels in docker-compose.yml

This is an important addition to Docker compose release. There has been several ask from Docker community user to bring up this feature and Docker team has done a great job in introducing it under this release.


If you look at the last few lines, the volume labels has been specified in the following format:




            –   “alpha=beta”

To verify if it rightly build up the container with the volume labels, one can issue the below command:


In the upcoming posts, I will be covering more features and bug fixes introduced under Docker Compose 1.9. Keep watching this space for further updates.

Docker 1.12 Swarm Mode & Persistent Storage using NFS

Estimated Reading Time: 5 minutes

Containers are stateless by nature and likely to be short-lived. They are quite ephemeral than VMs. What it actually means? Say, you have any data or logs generated inside the container and you don’t really care about loosing it no matter how many times you spin it up and down, like HTTP requests, then the ideal stateless feature should be good enough. BUT in case you are looking out for a solution which should record “stateful” applications like storing databases, storing logs etc. you definitely need persistent storage to be implemented. This is achieved by leveraging Docker’s volume mounts to create either a data volume or a data volume container that can be used and shared by other containers.


In case you’re new to Docker Storage, Docker Volumes are logical building blocks for shared storage when combined with plugins. It helps us to store state from the application to locations outside the docker image layer. Whenever you run a Docker command with -v, and provide it with a name or path, this gets managed within /var/lib/docker or in case you’re using a host mount, it’s something that exists on the host. The problem with this implementation is that the data is pretty inflexible, which means anything you write to that specific location, yes, it’ll stay there after the container’s life cycle, but only on that host. If you lose that host, the data will get erased. This clearly means that the situation is very prone to data loss. Within Docker, it looks very similar to what shown in the above picture, /var/lib/docker directory structure. Let’s talk about how to implement the management of data with an external storage. This could be anything from NFS to distributed file systems to block storage.

In my previous blog, we discussed about Persistent Storage implementation with DellEMC RexRay for Docker 1.12 Swarm Mode. Under this blog, we will look at how NFS works with Swarm Mode.I assume that you have an existing NFS server running in your infrastructure. If not, you can quickly set it up in just few minutes. I have Docker 1.12 Swarm Mode initialized with 1 master node and 4 worker nodes. Just for an example, I am going to leverage a Ubuntu 16.04 node(outside the swarm mode cluster) as NFS server and rest of the nodes( 1 master & 4 workers) as NFS client systems.

Setting up NFS environment:

There are two ways to setup NFS server – either using available Docker image or manually setting up NFS server on the Docker host machine. As I already have NFS server working on one of Docker host running Ubuntu 16.04 system, I will just verify if the configuration looks good.

Let us ensure that NFS server packages are properly installed as shown below:

raina_ajeet@master1:~$ sudo dpkg -l | grep nfs-kernel-server

1:1.2.8-9ubuntu12               amd64        support for NFS kernel server


I created the following NFS directory which I want to share across the containers running the swarm cluster.

$sudo mkdir /var/nfs

$sudo chown nobody:nogroup /var/nfs

It’s time to configure NFS shares. For this, let’s edit the export file to look like as show below:

raina_ajeet@master1:~$ cat /etc/exports

# /etc/exports: the access control list for filesystems which may be exported

#               to NFS clients.  See exports(5).#

# Example for NFSv2 and NFSv3:

# /srv/homes       hostname1(rw,sync,no_subtree_check) hostname2


# Example for NFSv4:

# /srv/nfs4        gss/krb5i(rw,sync,fsid=0,crossmnt,no_subtree_check)

# /srv/nfs4/homes  gss/krb5i(rw,sync,no_subtree_check)

/var/nfs    *(rw,sync,no_subtree_check)



As shown above, we will be sharing /var/nfs directory among all the worker nodes in the Swarm cluster.

Let’s not forget to run the below commands to provide the proper permission:

$sudo chown nobody:nogroup /var/nfs

$sudo exportfs -a

$sudo service nfs-kernel-server start

Great ! Let us cross-check if the configuration holds good.

raina_ajeet@master:~$ sudo df -h

Filesystem           Size  Used Avail Use% Mounted on

udev                 1.9G     0  1.9G   0% /dev

tmpfs                370M   43M  328M  12% /run

/dev/sda1             20G  6.6G   13G  35% /

tmpfs                1.9G     0  1.9G   0% /dev/shm

tmpfs                5.0M     0  5.0M   0% /run/lock

tmpfs                1.9G     0  1.9G   0% /sys/fs/cgroup

tmpfs                100K     0  100K   0% /run/lxcfs/controllers

tmpfs                370M     0  370M   0% /run/user/1001   20G   19G  1.3G  94% /mnt/nfs/var/nfs

As shown above, we have NFS server with IP:  and ready to share volume to all the worker nodes.

Running NFS service on Swarm Mode


In case you are new to –mount option introduced under Docker 1.12 Swarm Mode, here is an easy explanation:-

$sudo docker service create  –mount type=volume,volume-opt=o=addr=<source-host which is master node>, volume-opt=device=:<NFS directory>,volume-opt=type=nfs,source=<volume name>, target=/<insideContainer> –replicas 3 –name <service-name> dockerimage <command>

Let’s verify if NFS service is up and running across the swarm cluster –


We can verify the NFS volume driver through ‘docker volume’ utility as shown below –


Inspecting the NFS Volume on the nodes:


The docker volume inspect <vol_name> rightly displays the Mountpoint, Labels and Scope of the NFS volume driver.


Verifying if worker nodes are able to see the Volumes

Let’s verify by logging into one of worker node and trying to check if volume is shared across the swarm cluster:


Let us create a file under /var/nfs directory so as to see if storage persistent is actually working and data is shared across the swarm cluster:


Let’s verify if one of worker node can see the created file.


Hence, we implemented persistent storage for Docker 1.12 Swarm Mode using NFS. In the future post, we will talk further on the available storage plugin and its implementations.

[clickandtweet handle=”@ajeetsraina” hashtag=”#Swarm, #docker, #NFS” related=”@docker” layout=”” position=””]Docker 1.12 Swarm Mode & Persistent Storage using NFS[/clickandtweet]

What’s new in Docker 1.12 Scheduling? – Part-I

Estimated Reading Time: 7 minutes

In our previous posts, we spent considerable amount of time deep-diving into Swarm Mode which is in-built orchestration engine in Docker 1.12 release. The Swarm Mode orchestration engine comprises of desired state reconciliation, replicated and global services, configuration updates in the form of parallelism/delay and restart policies to name a few. Docker Engine 1.12 is not just about the multi-host and multi-container orchestration but there are numerous improvements in terms of Scheduling, Cluster management and Security. Under this post, I am going to talk about scheduling(primarily Engine & Swarm Labels) aspect in terms of new service API introduced under 1.12 engine.


Looking at Engine 1.12, scheduling can be referred to a subset of Orchestration.Orchestration is a broader term that refers to container scheduling, cluster management, and possibly the provisioning of master and worker nodes.When applications are scaled out across multiple swarm nodes, the ability to manage each nodes and abstract away the complexity of the underlying platform becomes more important.Under Docker 1.12 swarm mode cluster, we talk more of docker service rather than docker run.In terms of new service API, the “scheduling” refers to the ability for an operation team to build application services onto a swarm node cluster that establishes how to run a specific group of tasks/containers. While scheduling refers to the specific act of loading the application service , in a more general sense, schedulers are responsible for hooking into a node’s init system(dockerd ~ docker daemon) to manage services.

Under Docker 1.12, scheduling refers to resource awareness, constraints and strategies. Resource awareness is about being aware of resources available on nodes and will place tasks/containers accordingly. Swarm Mode handles that quite effectively. As of today, the newer 1.12 ships with a spread strategy which will attempt to schedule tasks on the least loaded nodes, provided they meet the constraints and resource requirements.Under constraints, the operation team can limit the set of nodes where a task/containers can be scheduled by defining constraint expressions. Multiple constraints find nodes that satisfy every expression, i.e., an AND match. Constraints can match node attributes in the following table.

Few Important Tips:

  • The engine.labels are collected from Docker Engine with information like operating system, drivers, etc.
  • The node.labels are added by the operations team for operational purpose. For example, some nodes have security compliant labels to run tasks with compliant requirements

Below is a snippet of the constraints used under 1.12 Swarm:

node attribute matches example node’s ID == 5ivku8v2gvtg4
node.hostname node’s hostname node.hostname != node-1
node.role node’s manager or worker role node.role == manager
node.labels node’s labels added by cluster admins == high
engine.labels Docker Engine’s labels engine.labels.operatingsystem == ubuntu 16.04

Let’s take a look at Docker 1.12 labels and constraints in detail. A Label is a key-value pair which serves a wide range of uses, such as identifying the right set of node/s etc. The label is a metadata which can be attached to dockerd(docker daemon). Labels, along with the semantics of constraints can help services run on a target worker nodes. For example, payment related application services can be targeted at the nodes which are more secured, some of the database R/W operations can be limited to specific number of SSD equipped worker nodes etc.

Under 1.12, there are two types of labels – Swarm labels and Engine labels. Swarm Labels adds a security scheduling decisions on top of Engine labels. It is important to note that Engine labels can’t be trusted for security sensitive scheduling decisions,since any worker can report any label up to a manager. However, they can be useful for certain scenarios like scheduling containers on SSD specific nodes, running application services based on resources and so on.

On the other hand, Swarm labels adds an additional layer of trust as they can be explicitly defined by the operations folks. One can easily label worker nodes as “production” or “secure” to ensure that the payment related application can get scheduled on those nodes primarily and this ensures that malicious workers can be kept away.


To get started, let us setup 5 node Swarm Cluster running Docker 1.12 on Google Cloud Engine. I will be running Ubuntu 16.04 so as to show what new changes has to be made under Ubuntu/Debian specific OS to make it work. Before I start setting up Swarm cluster, let us pick up 2 nodes( node-2 and node-3)  for which we will adding labels and constraints.

Login to node3 instance and add the following lines under [Service] section:



ExecStart=/usr/bin/dockerd -H fd:// $DOCKER_OPTS

Your file should look like as shown below:


Next, open up /etc/default/docker and add the highlighted line as shown below:


As shown above, I have added a label named “com.example.environment” with a value “production” so as to differentiate this node from the other nodes.

PLEASE NOTE : These are systemd specific changes which works great for Debian/Ubuntu specific distributions.

Once you have made those changes, follow the sequence shown below:

$sudo systemctl stop docker.service

Ensure that docked service doesn’t list up with sudo ps -aef | grep dockerd

$sudo systemctl daemon-reload

$sudo systemctl restart docker.service

To ensure that the $DOCKER_OPTS variable is rightly integrated into the docker daemon, run the below command:


As shown in the screenshot, the Labels is right attached to the dockerd daemon.

Follow the same step for node-2 before we start building the swarm cluster.

Once done, let’s setup a swarm cluster as shown below:


Setup a worker nodes, by joining all the nodes one by one. Hence we have 5-node Swarm Cluster setup ready:


It’s time to create a service which schedules the tasks or containers only on node3 and node2 based on the labels which we defined earlier. This is how the docker service command should look like:


If you notice the ‘docker service’ command(shown above), a new prefix ‘engine.labels’ has been added which is very specific to service API introduced under this new release. Once you pass this constraint with the service name specification, the scheduler will ensure that these tasks will only be run on specific set of nodes( node2 and node3).


Even though we had 5-node cluster, the master node just chose node2 and node3 based on the constraints which supplied at the Engine and Swarm labels.

Demonstrating Node Label constraints:

Let us pick up node1 and node4 to demonstrate node labels constraints. We will be using docker node update command to add labels to the nodes directly.( Please remember it doesn’t require engine level label changes).


As shown above, we added ostype=ubuntu to both the nodes individually. Now create a service with name –collabtest1 passing labels through –constraint option. You can easily verify the labels for each individual nodes using docker node inspect format as shown below:


Now if you try scaling the service to 4, it will restrict to the node1 and node4 as per the node label constraints we supplied earlier.


This brings an important point of consideration where if two containers should always run on the same host because they operate as a unit, that affinity can often be declared during the scheduling. On the other hand, if two containers should not be placed on the same host, for example to ensure high availability of two instances of the same service, this can be possible through scheduling. In my next post, I will be covering more on affinity and additional filters in terms of Swarm Mode.

What’s new in Docker 1.12.0 Load-Balancing feature?

Estimated Reading Time: 8 minutes

In the previous blog post, we deep-dived into Service Discovery aspects of Docker. A service is now a first class citizen in Docker 1.12.0 which allows replication, update of images and dynamic load-balancing. With Docker 1.12, services can be exposed on ports on all Swarm nodes and load balanced internally by Docker using either a virtual IP(VIP) based or DNS round robin(RR) based Load-Balancing method or both.


In case you are very new to Load-balancing concept, the load balancer assigns workload to a set of networked computer servers or components in such a manner that the computing resources are used in an optimal manner. A load balancer provides high availability by detecting server or component failure and re-configuring the system appropriately. Under this post, I will try to answer the following queries:


Let’s get started –

Is Load-Balancing new to Docker?

Load-balancing(LB) feature is not at all new for Docker. It was firstly introduced under Docker 1.10 release where Docker Engine implements an embedded DNS server for containers in user-defined networks.In particular, containers that are run with a network alias ( — net-alias) were resolved by this embedded DNS with the IP address of the container when the alias is used.

No doubt, DNS Round robin is extremely simple to implement and is an excellent mechanism to increment capacity in certain scenarios, provided that you take into account the default address selection bias but it possess certain limitations and issues like some applications cache the DNS host name to IP address mapping and  this causes applications to timeout when the mapping gets changed.Also, having non-zero DNS TTL value causes delay in DNS entries reflecting the latest detail. DNS based load balancing does not do proper load balancing based on the client implementation. To learn more about DNS RR which is sometimes called as poor man’s protocol, you can refer here.

What’s new in Load-balancing feature under Docker 1.12.0?

  • Docker 1.12.0 comes with built-in Load Balancing feature now.LB is designed as an integral part of Container Network Model (rightly called as CNM) and works on top of CNM constructs like network, endpoints and sandbox. Docker 1.12 comes with VIP-based Load-balancing.VIP based services use Linux IPVS load balancing to route to the backend containers
  • No more centralized Load-Balancer, it’s distributed and hence scalable. LB is plumbed into individual container. Whenever container wants to talk to another service, LB is actually embedded into the container where it happens. LB is more powerful  now and just works out of the box.


  • New Docker 1.12.0 swarm mode uses IPVS(kernel module called “ip_vs”) for load balancing. It’s a load balancing module integrated into the Linux kernel
  • Docker 1.12 introduces Routing Mesh for the first time.With IPVS routing packets inside the kernel, swarm’s routing mesh delivers high performance container-aware load-balancing.Docker Swarm Mode includes a Routing Mesh that enables multi-host networking. It allows containers on two different hosts to communicate as if they are on the same host. It does this by creating a Virtual Extensible LAN (VXLAN), designed for cloud-based networking. we will talk more on Routing Mesh at the end of this post.

Whenever you create a new service in Swarm cluster, the service gets Virtual IP(VIP) address. Whenever you try to make a request to the particular VIP, the swarm Load-balancer will distribute that request to one of the container of that specified service. Actually the built-in service discovery resolves service name to Virtual-IP. Lastly, the service VIP to container IP load-balancing is achieved using IPVS. It is important to note here that VIP is only useful within the cluster. It has no meaning outside the cluster because it is a private non-routable IP.

I have 6 node cluster running Docker 1.12.0 in Google Cloud Engine. Let’s examine the  VIP address through the below steps:

  1. Create a new overlay network:

    $docker network create –driver overlay \

     –subnet \

     –opt encrypted \




2. Let’s create a new service called collabweb which is a simple Nginx server as shown:

        $ docker service create \

       —replicas 3 \

       —name collabweb \

       —network collabnet \


3. As shown below, there are 3 nodes where 3 replicas of containers are running the service under the swarm overlay network called “collabnet”.


4. Use docker inspect command to look into the service internally as shown below:



It shows “VIP” address added to each service. There is a single command which can help us in getting the Virtual IP address as shown in the diagram below:


5. You can use nsenter utility to enter into its sandbox to check the iptables configuration:


In any iptables, usually a packets enters the Mangle Table chains first and then the NAT Table chains.Mangling refers to modifying the IP Packet whereas NAT refers to only address translation. As shown above in the mangle table, service IP gets marking of 0x10c using iptables OUTPUT chain. IPVS uses this marking and load balances it to containers, and as shown:


As shown above, you can use  ipvsadm  to set up, maintain or inspect the IP virtual server table in the Linux kernel.This tool can be installed on any of Linux machine through apt or yum based on the Linux distribution.

A typical DNS RR and IPVS LB can be differentiated as shown in the below diagram where DNS RR shows subsequent list of IP addresses when we try to access the service each time(either through curl or dig) while VIP  load balances it to containers(i.e., and



6. Let’s create  a new service called collab-box under the same network. As shown in the diagram, a new Virtual-IP ( will be automatically attached to this service as shown below:


Also, the service discovery works as expected,



IPVS (IP Virtual Server) implements transport-layer load balancing inside the Linux kernel, so called Layer-4 switching. It’s a load balancing module integrated into the linux kernel. It is based on Netfilter.It supports TCP, SCTP & UDP, v4 and v7. IPVS running on a host acts as a load balancer before a cluster of real servers, it can direct requests for TCP/UDP based services to the real servers, and makes services of the real servers to appear as a virtual service on a single IP address.

It is important to note that IPVS is not a proxy — it’s a forwarder that runs on Layer 4. IPVS forwards traffic from clients to back-ends, meaning you can load balance anything, even DNS! Modes it can use include:

  • UDP support
  • Dynamically configurable
  • 8+ balancing methods
  • Health checking

IPVS holds lots of interesting features and has been in kernel for more than 15 years. Below chart differentiate IPVS from other LB tools:


Is Routing Mesh a Load-balancer?

Routing Mesh is not Load-Balancer. It makes use of LB concepts.It provides global publish port for a given service. The routing mesh uses port based service discovery and load balancing. So to reach any service from outside the cluster you need to expose ports and reach them via the Published Port.

In simple words, if you had 3 swarm nodes, A, B and C, and a service which is running on nodes A and C and assigned node port 30000, this would be accessible via any of the 3 swarm nodes on port 30000 regardless of whether the service is running on that machine and automatically load balanced between the 2 running containers. I will talk about Routing Mesh in separate blog if time permits.

It is important to note that Docker 1.12 Engine creates “ingress” overlay network to achieve the routing mesh. Usually the frontend web service and sandbox are part of “ingress” network and take care in routing mesh.All nodes become part of “ingress” overlay network by default using the sandbox network namespace created inside each node. You can refer this link to learn more about the internals of Routing Mesh.

Is it possible to integrate an external LB to the services in the cluster.Can I use HA-proxy in Docker Swarm Mode?

You can expose the ports for services to an external load balancer. Internally, the swarm lets you specify how to distribute service containers between nodes.If you would like to use an L7 LB you either need to point them to any (or all or some) node IPs and PublishedPort. This is only if your L7 LB cannot be made part of the cluster. If the L7 LB can be made of the cluster by running the L7 LB itself as a service then they can just point to the service name itself (which will resolve to a VIP). A typical architecture would look like this:


In my next blog, I am going to elaborate on External Load balancer primarily. Keep Reading !

Demystifying Service Discovery under Docker Engine 1.12.0

Estimated Reading Time: 7 minutes

Prior to Docker 1.12 release, setting up Swarm cluster needed some sort of service discovery backend. There are multiple discovery backends available like hosted discovery service, using a static file describing the cluster, etcd, consul, zookeeper or using static list of IP address.


Thanks to Docker 1.12 Swarm Mode, we don’t have to depend upon these external tools and complex configurations. Docker Engine 1.12 runs it’s own internal DNS service to route services by name.Swarm manager nodes assign each service in the swarm a unique DNS name and load balances running containers. You can query every container running in the swarm through a DNS server embedded in the swarm.

How does it help?

When you create a service and provide a name for it, you can use just that name as a target hostname, and it’s going to be automatically resolved to the proper container IP of the service. In short, within the swarm, containers can simply reference other services via their names and the built-in DNS will be used to find the appropriate IP and port automatically. It is important to note that if the service has multiple replicas, the requests would be round-robin load-balanced. This would still work if you didn’t forward any ports when you created your docker services.


Embedded DNS is not a new concept. It was first included under Docker 1.10 release. Please note that DNS lookup for containers connected to user-defined networks works differently compared to the containers connected to default bridge network. As of Docker 1.10, the docker daemon implements an embedded DNS server which provides built-in service discovery for any container created with a valid name or net-alias or aliased by link. Moreover,container name configured using --name is used to discover a container within an user-defined docker network. The embedded DNS server maintains the mapping between the container name and its IP address (on the network the container is connected to).

How does Embedded DNS resolve unqualified names?



With Docker 1.12 release, a new API called “service” is being included which clearly talks about the functionality of service discovery.  It is important to note that Service discovery is scoped within the network. What it really means is –  If you have redis application and web client as two separate services , you combine into single application and put them into same network.If you try build your application in such a way that you are trying to reach to redis through name “redis”,it will always resolve to name “redis”. Reason – both of these services are part of the same network. You don’t need to be inside the application trying to resolve this service using FQDN. Reason – FQDN name is not going to be portable which in turn, makes your application non-portable.

Internally, there is a listener opened inside the container itself. If we try to enter into the container which is providing a service discovery and look at /etc/resolv.conf, we will find that the nameserver entry holds something really different like is nothing but a loopback address. So, whenever resolver tried to resolve, it will resolve to and this request is rightly trapped.


Once this request is trapped, it is sent to particular random UDP / TCP port currently being listened under the docker daemon. Consequently, the socket is to be created inside the namespace. When DNS server and daemon gets the request, it knows that this is coming from which specific network, hence gets aware of  the context of from where it is coming from.Once it knows the context, it can generate the appropriate DNS response.

To demonstrate Service Discovery  under Docker 1.12, I have upgraded Docker 1.12.rc5 to 1.12.0 GA version. The swarm cluster look like:


I have created a network called “collabnet” for the new services as shown below:


Let’s create a service called “wordpressdb” under collabnet network :


You can list the running tasks(containers) and the node on which these containers are running on:


Let’s create another service called “wordpressapp” under the same network:


Now, we can list out the number of services running on our swarm cluster as shown below.


I have scaled out the number of wordpressapp and wordpressdb just for demonstration purpose.

Let’s consider my master node where I have two of the containers running as shown below:


I can reach out one service(wordpressapp) from another service(wordpressapp) through just service-name as shown below:


Also, I can reach out to particular container by its name from other container running different service but on the same network. As shown below, I can reach out to wordpressapp.3.6f8bthp container via wordpressdb.7.e62jl57qqu running wordpressdb.


The below picture depicts the Service Discovery in a nutshell:


Every service has Virtual IP(VIP) associated which can be derived as shown below:


As shown above, each service has an IP address and this IP address maps to multiple container IP address associated with that service. It is important to note that service IP associated with a service does not change even though containers associated with the service dies/ restarts.

Few important points to remember:

  • VIP based services use Linux IPVS load balancing to route to the backend containers. This works only for TCP/UDP protocols. When you use DNS-RR mode services don’t have a VIP allocated. Instead service names resolves to one of the backend container IPs randomly.
  • Ping not working for VIP is as designed. Technically, IPVS is a TCP/UDP load-balancer, while ping uses ICMP and hence IPVS is not going to load-balance the ping request.
  • For VIP based services the reason ping works on the local node is because the VIP is added a 2nd IP address on the overlay network interface
  • You can any of the tools like  dig, nslookup or wget -O- <service name> to demonstrate the service discovery functionality

Below picture depicts that the network is the scope of service discoverability which means that when you have a service running on one network , it is scoped to that network and won’t be able to reach out to different service running on different network(unless it is part of that network).


Let’s dig little further introducing Load-balancing aspect too. To see what is basically enabling the load-balancing functionality, we can go into sandbox of each containers and see how it has been resolved.

Let’s pick up the two containers running on the master node. We can see the sandbox running through the following command:


Under /var/run/docker/netns, you will find various namespaces. The namespaces marked with x-{id} represents network namespace managed by the overlay network driver for its operation (such as creating a bridge, terminating vxlan tunnel, etc…). They don’t represent the container network namespace. Since it is managed by the driver, it is not recommended to manipulate anything within this namespace. But if you are curious on the deep dive, then you can use the “nsenter” tool to understand more about this internal namespace.

We can enter into sandbox through the nsenter utility:


In case you faced an error stating “nsenter: reassociate to namespace ‘ns/net’ failed: Invalid argument”, I suggest to look at this workaround. service IP is marked 0x108 using iptables OUTPUT chain. ipvs uses this marking and load balances it to containers and as shown below:


Here are key takeaways from this entire post:


In my next blog post, I am going to deep dive into Load-Balancing aspect of Swarm Mode. Thanks for reading.

Understanding Node Failure Handling under Docker 1.12 Swarm Mode

Estimated Reading Time: 5 minutes

In the last Meetup (#Docker Bangalore), there has been lots of curiosity around “Desired State Reconciliation” & “Node Management” feature in case of Docker Engine 1.12 Swarm Mode.  I found lots of queries post the presentation session on how Node Failure Handling is taken care in case of new Docker Swarm Mode , particularly when master node participating in the raft consensus goes down. Under this blog post, I will demonstrate how Master Node Failure is achieved which is very specific to RAFT consensus algorithm. We will look at how Swarmkit (the technical foundation of Swarm Mode implementation) uses the raft consensus algorithm and enables NO single point of failure feature to perform effective decision in the distributed system.

In the previous post we did a deep-dive into Swarm Mode implementation where we talked about the communication in between manager and worker nodes. Machines running SwarmKit can be grouped together in order to form a Swarm, coordinating tasks with each other. Once a machine joins, it becomes a Swarm Node. Nodes can either be worker nodes or manager nodes. Worker nodes are responsible for running Tasks while Manager nodes accept specifications from the user and are responsible for reconciling the desired state with the actual cluster state.


Manager nodes maintain a strongly consistent, replicated (Raft based) and extremely fast (in-memory reads) view of the cluster which allows them to make quick scheduling decisions while tolerating failures.Node roles (Worker or Manager) can be dynamically changed through API/CLI calls.  Say, if any of master or worker node fails, SwarmKit reschedules its tasks(which are nothing but containers) onto a different node.

A Quick Brief on Raft Consensus Algorithm

Let’s understand what raft consensus is all about. A Raft cluster contains several servers; five is a typical number, which allows the system to tolerate two failures. At any given time each server is in one of three states: leader, follower, or candidate. In normal operation there is exactly one leader and all of the other servers are followers. Followers are passive: they issue no requests on their own but simply respond to requests from leaders and candidates. The leader handles all client requests (if a client contacts a follower, the follower redirects it to the leader). The third state, candidate, is used to elect a new leader. Raft uses a heartbeat mechanism to trigger leader election. When servers start up, they begin as followers. A server remains in follower state as long as it receives valid RPCs from a leader or candidate. Leaders send periodic heartbeats to all followers in order to maintain their authority. If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader. To understand the raft implementation, I recommend reading


PLEASE NOTE that there should always be an odd number of managers (1,3,5 or 7) to reach the consensus.  If you have just two managers, with one manager down results in a situation where you can’t achieve the consensus.Reason –  greater than 50% of the managers need to “agree” to actually makes the raft consensus work.

Demonstrating Manager Node Failure

Let me demonstrate the master node failure scenario with the existing Swarm Mode cluster running on Google Cloud Engine. As shown below, I have 5 nodes forming Swarm Mode cluster installed running the experimental Docker 1.12.0-rc4 release.



The Swarm Mode cluster is already running a service which is replicated across 3 nodes – test-master1, test-node2 and test-node1 out of total 5 nodes. Let us use docker-machine(my all-time favorite) command to ssh to test-master1 and promote workers (test-node1 and test-node2) to the manager node as shown above.


Hence, the worker nodes are rightly promoted to manager node which is shown as “Reachable”.

The “$docker ps” command shows that there is a task (container) already running on the master node. Please remember that “$docker ps” has to manually run on the dedicated node to know what local containers are running on the particular node.


The below picture depicts the detailed list of the containers(or tasks) which are distributed across the swarm cluster.


Let’s bring down the manager node “test-master1” either by shutting it down uncleanly or stopping the instance through the available GCE feature.(as show below). The manager node(test-master1) is no longer reachable. If you try to ssh to test-node2 and check if the cluster is up and running, you will find that node failure has been taken care and desired state reconciliation comes into the picture. Now the 3-replicas of tasks or containers are running on test-node1, test-node2 and test-node3.



To implement raft consensus, there is a minimal recommendation of an odd number of managers (1,3,5 or 7). The maximum recommendation of manager node is 5 for better performance while increasing the manager nodes to 7 might incur performance bottleneck as there will be additional overhead in terms of communication to keep the mutual agreement in place between the managers.

Docker 1.12 Swarm Mode – Under the hood

Estimated Reading Time: 6 minutes

Today Docker Inc. released Engine 1.12 Release Candidate 4 with numerous improvements and added security features. With an optional “Swarm Mode” feature rightly integrated into core Docker Engine, a native management of a cluster of Docker Engines, orchestration, decentralized design, service and application deployment, scaling, desired state reconciliation, multi-host networking, service discovery and routing mesh implementation is just a matter of few liner commands.

In the previous posts, we introduced Swarm Mode, implemented a simple service applications and went through 1.12 networking model. Under this post, we will deep dive into Swarm Mode and study what kind of communication gets generated between master and worker nodes in the Swarm cluster.

Setting up Swarm Master Node

Let’s start setting up Swarm Mode cluster and see how underlying communication takes place. I will be using docker-machine to setup master and worker nodes on my Google Cloud Engine.

$docker-machine create -d google –google-project <project-id> –engine-url test-master1

If you have less time setting up Swarm Cluster, do refer I have forked it from here.

As you see below, Docker Hosts machines gets created through docker-machine with all the nodes running Docker Engine 1.12-rc4.

Let’s initialize the swarm mode on the first master node as shown below:

I have used one liner docker-machine command to keep it clean and simple. The docker-machine command will SSH to the master node and initialize the swarm mode.

The newly released RC4 version holds improvement in terms of security which is enabled by default. In earlier release, one has to pass –secret parameter to secure and control which worker node can join and which can’t. But going forward, the swarm mode automatically generates random secret key. This is just awesome !!!

[Under the hood] – Whenever we do “docker swarm init”, a TLS root CA (Certificate Authority) gets created as shown below.

Then a key-pair is issued for the first node and signed by root CA.

Let’s add the first worker node as shown below:

Looking at inotify output:

When further nodes joins the swarm, they are issues their own keypair, signed by the root CA, and they also receive the root CA public key and certificate. All the communication is encrypted over TLS.

The node keys and certificates are automatically renewed on regular intervals (by default 90 days) but one can tune it with docker swarm update command.

Let us spend some time understanding the master and worker architecture in detail.


Every node in Swarm Mode has a role which can be categorized as  Manager and Worker. Manager node has responsibility to actually orchestrate the cluster, perform the health-check, running containers serving the API and so on. The worker node just execute the tasks which are actually containers. It can-not decide to schedule the containers on the different machine. It can-not change the desired state. The workers only takes work and report back the status. You can enable node promotion or demotion easily through one-liner command.

Managers and Workers uses two different communication models. Managers have built-in RAFT system that allows them to share information for new leader election. At one time, only manager is actually performing the scaling and they use a leader follower model to figure out which one is supposed to be what. No External K-V store is required as built-in internal distributed state store is available.

Workers, on the other side, uses GOSSIP network protocol which is quite fast and consistent. Whenever any new container/tasks gets generated in the cluster, the gossip is going to broadcast it to all the other containers in a specific overlay network that this new container has started. Please remember that ONLY the containers which are running in the specific overlay network will be communicated and NOT globally. Gossip is optimized for heavy traffic.

Let us go one level more deeper to understand how the underlying service is created and dispatched to the worker nodes. Before creating the service, let us first create a new overlay network called mynetwork.

The inotify triggers the relevant output accordingly:

Let’s create our first service:

$sudo docker-machine ssh test-master1 ‘sudo docker service create –name collabnix –replicas 3 \
   –network mynetwork dockercloud/hello-world

Once you run the above command, 3 replicas of services gets generated and distributed across the cluster nodes.

[Under the hood] – Let’s understand what happens whenever a new service is created.


Whenever we create overlay network through “docker network create -d overlay” command, it basically goes to manager. Manager is built up of multiple pipeline stages. One of them is Allocator. Allocator takes the network creation request and choose particular pre-defined sub network that is available. Allocation purely happen in the memory and hence it goes quick. Once network is created, it’s time to connect service to that network. Say, you start with service creation, orchestrator is involved and try to generate the requisite number of tasks which is nothing but containers in real world. But the tasks needs IP address, VXLAN ids as the overlay network needs that too. The allocation happens in the manager nodes. Once allocation gets completed, tasks are created and the state is preserved in the raft store. Once allocation is done, only then the scheduler will be able to move that particular task into the assigned state which is then dispatched to one of the worker node. Manager can also be worker. Every task goes through multiple stages – New, Allocated, Assigned etc. if the task has not been moved to allocator stage, it will not be assigned to worker nodes. With the help of network control plane(gossip protocol), multiple tasks distributed across the multiple worker node is taken care and managed effectively.

I hope you liked reading this deep-dive article. In future blog post, I will try to cover deep dive session into Docker network and volume aspects. Till then, Happy Swarming !!!




Docker 1.12 Networking Model Overview

Estimated Reading Time: 8 minutes

“The Best way to orchestrate Docker is Docker”

In our previous post, we talked about Swarm Mode’s built-in orchestration and distribution engine.Docker’s new deployment API objects like Service and Node , built-in multi-host and multi-container cluster management integrated with Docker Engine, decentralized design,declarative service model, scaling and resiliency services , desired state conciliation, service discovery,  load balancing,  out-of-box security by default and rolling updates etc. just makes Docker 1.12  all-in-all automated deployment and management tool for Dockerized distributed applications and microservices at scale in production.



Under this post, we are going to deep-dive into Docker 1.12 Networking model. I have 5 node swarm cluster test environment as shown below:


If you SSH to test-master1 and check the default network layout:


Every container has an IP address on three overlay networks:

  1. Ingress
  2. docker_gwbridge
  3. user-defined overlay

Ingress Networking:

The swarm manager uses ingress load balancing to expose the services you want to make available externally to the swarm. The swarm manager can automatically assign the service a PublishedPort or you can configure a PublishedPort for the service in the 30000-32767 range.  What it actually means is Network ingress into the cluster is based on a node port model in which each service is randomly assigned a cluster-wide reserved port between 30000 and 32000 (default range). This means that every node in the cluster listens on this port and routes traffic for that service to it. This is true irrespective of whether a particular worker node is actually running the specified service.

It is important to note that only those services that has a port published (using the -p option) require the ingress network. But for those backend services which doesn’t publish ports, the corresponding containers are NOT attached to the ingress network.

External components, such as cloud load balancers, can access the service on the PublishedPort of any node in the cluster whether or not the node is currently running the task for the service. All nodes in the swarm cluster route ingress connections to a running task instance. Hence, ingress  follow a node port model in which each service has the same port on every node in the cluster.


The `default_gwbridge` network is added only for non-internal networks. Internal networks can be created with “–internal” option.. Containers connected to the multi-host network are automatically connected to the docker_gwbridge network. This network allows the containers to have external connectivity outside of their cluster, and is created on each worker node.

Docker Engine provides you flexibility to create this default_gwbridge by hand instead of letting the daemon create it automatically. In case you want docker to create a docker_gwbridge network in desired subnet, you can tweak it as shown below:

$docker network create –subnet={Your prefered subnet } -o -o docker_gwbridge.

User-defined Overlay:

This is the overlay network that the user has specified that the container should be on. In our upcoming example , we will call it mynet. A container can be on multiple user-defined overlays.


Enough with the theoretical aspects !! Let’s try hands on the networking model practically.

As shown below, I have 3 nodes swarm cluster with 1 master and 2 worker nodes.


I created a user-defined overlay through the below command:

$ sudo docker network create -d overlay mynet

I can see that the new overlay network gets listed under “swarm” scope(as we are already into swarm mode)  as shown below:


I have a service “frontier” running tasks on node1, node2 and master1 as shown below:



We can check the container running under node-1 and node-2 respectively:



Meanwhile, I added a new Node-3 and then scaled it to 10.


Now I can see that the containers are scaled across the swarm cluster.

To look into how overlay networking works, let us target the 4th Node and add it to the Swarm Cluster.


Now, the node list gets updated as shown below:


Whenever you add any node to the swarm cluster, it doesn’t automatically reflect the mynet overlay network as shown below:


The overlay network only gets reflected whenever the new task is assigned and this happens on-demand.

Let us try to scale our old service and see if node-4 network layout gets reflected with mynet network.

Earlier, we had 10 replicas running which is scaled across master1, node1, node2 and node4. Once we scaled it to 20, the swarm cluster engine will scale it across all the nodes as shown below:


Let us now check the network layout at node-4:

Hence, the overlay network gets created on-demand whenever the new task is assigned to this node.


Swarm nodes are “self-organizing and self-healing. What it means? Whenever any node or container goes crash or sudden unplanned shutdown, the scaling swarm engine attempts to correct to make things right again. Let us look into this aspect in detail:

As we see above, here is an example of nodes and running tasks:

Master-1 running 4 tasks

Node-1 running 4 tasks

Node-2 running 4 tasks

Node-3 running 4 tasks

Node-4 running  4 instances

Now let’s bring down node-4.


As soon as all the containers running on node-4 is stopped, it tries to start another 4 containers with different IDs on the same node.


So this shows self-healing aspect of Docker Swarm Engine.


Let’s try bringing down node-4 completely. As soon as you bring down the node-4, the containers running on node-4 gets started on other node automatically.



Master-1 running 5 tasks

Node-1 running 5 tasks

Node-2 running 5 tasks

Node-3 running 5 tasks

Hence, this is how it organizes 20 tasks which is now scaled across master1, node1, node2 and node3.

Global Services:

This option enables service tasks to sit on all the nodes. You can create a service with –mode-global option to enable this functionality as shown below:



There are scenerios when you sometime want to segregate workloads on your cluster where you specifically want workloads to go to only a certain set of nodes .One example I have pulled from DockerCon slide which shows SSD based constraints which can be applied as shown below:



Routing Mesh 

We have reached the last topic of this blog post and that’s not complete without talking about Routing mesh. I have pulled out the presentation demonstrated at DockerCon 2016 which clearly shows us how routing mesh works.

                                                                                                                                                                                                        ~ Source: DockerCon 2016

To understand how Routing mesh works, suppose that there is one manager node and 3 worker nodes which is serving myapp:80. Whenever an operator try to access myapp:80 on the exposed port, he probably because of an external load-balancer might happen to hit worker-2 and it sounds good because worker-2 is having 2 copies of the frontend containers and it is ready to serve it without any issue. Imagine a scenario when user access myapp:80 and gets redirected to worker-3 which currently has no copies of the containers. This is where Routing Mesh technology comes into the picture. Even though the worker-3 has no copies of this container, the docker swarm engine is going to re-route the traffic to worker-2 which has necessary copies to serve it.Your external Load-balancer don’t need to understand where is the container running. Routing mesh takes care of that automatically. In short, Container-aware routing mesh is capable of transparent rerouting the traffic from node-3 to a node that is running container(which is node-2) shown above. Internally, Docker Engine  allocates a cluster wide port and it maps that port to the containers of a service and the routing mesh will take care of routing traffic to the containers of that service by exposing a port on every node in the swarm.

In our next post, we will talk about routing mesh in detail and cover volume aspect of Swarm Mode. Till then, Happy Swarming !!!

Docker Engine 1.12 comes with built-in Distribution & Orchestration System

Estimated Reading Time: 7 minutes

Docker Engine 1.12 can be rightly called ” A Next Generation Docker Clustering & Distributed System”. Though Docker Engine 1.12 Final Release is around corner but the recent RC2 brings lots of improvements and exciting features. One of the major highlight of this release is Docker Swarm Mode which provides powerful yet optional ability to create coordinated groups of decentralized Docker Engines. Swarm Mode combines your engine in swarms of any scale. It’s self-organizing and self-healing. It enables infrastructure-agnostic topology.The newer version democratizes orchestration with out-of-box capabilities for multi-container on multi-host app deployments as shown below:

Built on Engine as a uniform building block for self organizing and healing group of Engines, Docker ensures that orchestration is accessible for every developer and operation user. The new Swarm Mode adopts  the de-centralized architecture rather than centralized one (key-value store) as seen in the earlier  Swarm releases. Swarm Mode uses the Raft consensus algorithm  to perform leader selection, and maintain the cluster’s states.

In Swarm Mode, all Docker Engine will unite into a cluster with management tier. It is basically master – slave system but all Docker Engine will be united and they will maintain a cluster state. Instead of running a single container, you will declare a desired state for your application which means multiple container and then engine themselves will maintain that state. Additionally, a new “docker service” feature has been added under the new release. The “docker service create” is expected to be an evolution of “docker run”. Docker run is imperative command and all it helps you  to get container up and running.The new “docker service create” command declare that you have to setup a server which can run one or more containers and those container will run , provided the state you declare for the service will be maintained in Engine / inside the distributed store based on raft consensus protocol.That brings the notion of desired state reconciliation. Whenever any node in the cluster goes down, the swarm itself will recognize that there has been deviation between the desired state and it will bring up  new instance to reconstruct the reconciliation. I highly recommend visualizing to understand what does it mean.

Docker Swarm Node is used  for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and much more. Let’s see what features does Docker Swarm Mode adds to Docker Cluster functionality:Pic-6

Looking at the above features, Docker Swarm mode brings the following benefits :

  • Distributed: Swarm Mode uses the Raft Consensus Algorithm in order to coordinate and does not rely on a single point of failure to perform decisions.
  • Secure: Node communication and membership within a Swarm are secure out of the box. Swarm Mode uses mutual TLS for node authentication, role authorization and transport encryption, automating both certificate issuance and rotation.
  • Simple: Swarm Mode is operationally simple and minimizes infrastructure dependencies. It does not need an external database to operate. It uses internal distributed State store.

Below picture depicts Swarm Mode cluster architecture. Fundamentally its a master and slave architecture. Every node in a swarm is Docker Host running Docker Engine. Some of the node has privilege role called Manager.The manager node participate in “raft consensus” group. As shown below, components in blue color are sharing Internal Distributed State
store of the cluster while the green colored components/boxes are worker Nodes. The worker node receive work instructions from the manager group and this is clearly shown in dash lines.Pic-3

Below picture shows how Docker Engine Swarm Mode nodes works together:


For operation team, it might be relief-tablet as there is no need of any external key-value store like etcd and consul.Docker Engine 1.12 has internal distributed state store to coordinate and hence no longer single point of failure. Additionally, Docker security is no longer an additional implementation,the secure mode is enabled by default.

Getting started with Docker Engine 1.12

Under this blog post, I will cover the following aspects:

  1. Initializing the Swarm Mode
  2. Creating the services and Tasks
  3. Scaling the Service
  4. Rolling Updates
  5. Promoting the node to Manager group

To test drive Docker Mode, I used 4 node cluster in Google Cloud Engine all running the latest stable Ubuntu 16.04 system as shown below:


Setting up docker 1.12-rc2 on all the nodes should be simple enough with the below command:

                                                      #curl -fsSL | sh

Run the below command to initialize Swarm Mode under the master node:


Let’s look at docker info command:


Listing the Docker Swarm Master node:


Let us add the first Swarm agent node(worker node) as shown below:


Let’s go back to Swarm Master Node to see the latest Swarm Mode status:


Similarly, we can add the 2nd Swarm agent node to Swarm Mode list:


Finally, we see all the nodes listed:


Let’s add 3rd Swarm Agent node in the similar fashion as shown above:


Finally, the list of worker and master nodes gets displayed as shown below:

Let’s try creating a single service:


As of now, we dont have any service created. Let’s start creating a service called collab which uses busybox image from Dockerhub and all it does is ping website.


Verifying and inspecting the services is done through the below command:




Quick Look at Scaling !!!

Task is an atomic unit of service.We actually create a task whenever we add a new service. For example, as shown below we created a task called collab.


Let’s scale this service to 5:


Now you can see that the service has been scaled to 5.


Rolling Updates Made Easy

Updating a service is pretty simple. The “docker service update” is feature rich and provides loads of options to play around with the service.


Let’s try updating redis container from 3.0.6 to 3.0.7 with 10s delay and parallelism count of 2.


Wow !!! Rolling updates just went flawless.

Time to promote the Agent Node to Manager Node

Let’s try to promote Swarm Agent Node-1 to Manager group as shown below:


In short, Swarm Mode is definitely a neat and powerful feature which provides an easy way to orchestrate Docker containers and replication of services. In our next post, we will look at how overlay networking works under Swarm Mode.