Join our Discord Server
Ajeet Raina Ajeet Singh Raina is a former Docker Captain, Community Leader and Arm Ambassador. He is a founder of Collabnix blogging site and has authored more than 570+ blogs on Docker, Kubernetes and Cloud-Native Technology. He runs a community Slack of 8900+ members and discord server close to 2200+ members. You can follow him on Twitter(@ajeetsraina).

Docker 1.12 Swarm Mode – Under the hood


Today Docker Inc. released Engine 1.12 Release Candidate 4 with numerous improvements and added security features. With the optional “Swarm Mode” feature integrated directly into the core Docker Engine, native management of a cluster of Docker Engines, orchestration, a decentralized design, service and application deployment, scaling, desired state reconciliation, multi-host networking, service discovery and a routing mesh are all just a few one-line commands away.

In the previous posts, we introduced Swarm Mode, implemented a simple service application and went through the 1.12 networking model. In this post, we will take a deep dive into Swarm Mode and study the communication that takes place between the master and worker nodes in the Swarm cluster.

Setting up Swarm Master Node

Let’s start by setting up a Swarm Mode cluster and see how the underlying communication takes place. I will be using docker-machine to set up the master and worker nodes on Google Compute Engine.

$ docker-machine create -d google --google-project <project-id> --engine-install-url https://test.docker.com test-master1

If you are short on time for setting up the Swarm cluster, refer to https://github.com/ajeetraina/google-cloud-swarm. I have forked it from here.

As you see below, the Docker host machines get created through docker-machine, with all the nodes running Docker Engine 1.12-rc4.

Let’s initialize the swarm mode on the first master node as shown below:

I have used a one-line docker-machine command to keep it clean and simple. The docker-machine command will SSH into the master node and initialize swarm mode.

The newly released RC4 version improves security, which is now enabled by default. In earlier releases, one had to pass the --secret parameter to control which worker nodes could join the cluster and which could not. Going forward, swarm mode automatically generates a random secret key. This is just awesome !!!

[Under the hood] – Whenever we run “docker swarm init”, a TLS root CA (Certificate Authority) gets created as shown below.

Then a key pair is issued for the first node and signed by the root CA.

Let’s add the first worker node as shown below:

Looking at the inotify output:

When further nodes join the swarm, they are issued their own key pair, signed by the root CA, and they also receive the root CA's public key and certificate. All communication is encrypted over TLS.

The node keys and certificates are automatically renewed at regular intervals (by default every 90 days), but one can tune this with the docker swarm update command.
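To make the renewal interval concrete, here is a minimal Python sketch of the arithmetic. It assumes the interval is passed to docker swarm update through the --cert-expiry flag as an hour-based duration string; the issue date and the helper names (cert_expiry_flag, expires_at) are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Default validity period mentioned in the post.
DEFAULT_VALIDITY = timedelta(days=90)


def cert_expiry_flag(validity: timedelta) -> str:
    """Format a validity period as an hour-based duration string, e.g. '2160h'."""
    hours = int(validity.total_seconds() // 3600)
    return f"{hours}h"


def expires_at(issued: datetime, validity: timedelta = DEFAULT_VALIDITY) -> datetime:
    """Return the moment a certificate issued at `issued` stops being valid."""
    return issued + validity


# Hypothetical issue date, used only for illustration.
issued = datetime(2016, 7, 14, tzinfo=timezone.utc)
print("default expiry:", expires_at(issued).date())

# Build (but do not run) a command that would shorten the interval to 30 days.
print("docker swarm update --cert-expiry", cert_expiry_flag(timedelta(days=30)))
```

The sketch only builds the command string; running it against a real swarm is left to the reader.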

Let us spend some time understanding the master and worker architecture in detail.

 

Every node in Swarm Mode has a role, either Manager or Worker. A manager node is responsible for actually orchestrating the cluster: performing health checks, running containers, serving the API and so on. A worker node just executes tasks, which are actually containers. It cannot decide to schedule containers on a different machine, and it cannot change the desired state. Workers only take work and report back the status. You can promote or demote a node easily through a one-line command.

Managers and workers use two different communication models. Managers have a built-in Raft implementation that allows them to share state and elect a new leader. At any one time, only one manager (the leader) is actually performing orchestration work such as scaling, and the managers use a leader-follower model to figure out which of them plays which role. No external K-V store is required, as a built-in distributed state store is available.
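Leader election in Raft depends on a majority (quorum) of managers being reachable. The following Python sketch shows only that arithmetic, not Raft itself; the function names are made up for this example.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority still exists."""
    return managers - quorum(managers)


for n in (1, 3, 4, 5, 7):
    print(f"{n} managers: quorum={quorum(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Note that going from 3 to 4 managers does not increase the number of failures the cluster can survive, which is why odd manager counts are the usual choice.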

Workers, on the other hand, use a gossip network protocol, which is quite fast and eventually consistent. Whenever a new container/task is created in the cluster, gossip broadcasts the fact that this new container has started to all the other containers in the same overlay network. Please remember that ONLY the containers running in that specific overlay network are notified, NOT every container globally. Gossip is optimized for heavy traffic.

Let us go one level deeper to understand how the underlying service is created and dispatched to the worker nodes. Before creating the service, let us first create a new overlay network called mynetwork.
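The scoping rule can be illustrated with a toy simulation. This is a simplified push-style gossip written for this post, not Docker's actual implementation; the network name othernet, the container names, the fanout and the random seed are all assumptions chosen for the example.

```python
import random


def gossip(members, origin, fanout=2, seed=0, max_rounds=100):
    """Push-style gossip: every informed member tells `fanout` random peers per round.

    Returns (rounds_used, informed_members). Only `members` can ever be
    informed, which models gossip being scoped to a single overlay network.
    """
    rng = random.Random(seed)
    informed = {origin}
    rounds = 0
    while len(informed) < len(members) and rounds < max_rounds:
        for node in sorted(informed):  # sorted() keeps the run deterministic
            peers = [m for m in members if m != node]
            informed.update(rng.sample(peers, min(fanout, len(peers))))
        rounds += 1
    return rounds, informed


# Hypothetical containers attached to two separate overlay networks.
networks = {
    "mynetwork": ["c1", "c2", "c3", "c4", "c5", "c6"],
    "othernet": ["x1", "x2", "x3"],
}

# A new container "c1" starts on mynetwork; only that network hears about it.
rounds, informed = gossip(networks["mynetwork"], origin="c1")
print(f"informed after {rounds} rounds:", sorted(informed))
print("othernet informed:", sorted(set(networks["othernet"]) & informed))
```

Because each informed member keeps spreading the news, the number of informed containers grows quickly from round to round, while containers on other networks never see the event.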

The inotify triggers the relevant output accordingly:

Let’s create our first service:

$ sudo docker-machine ssh test-master1 'sudo docker service create --name collabnix --replicas 3 \
   --network mynetwork dockercloud/hello-world'

Once you run the above command, 3 replicas of the service get created and distributed across the cluster nodes.

[Under the hood] – Let’s understand what happens whenever a new service is created.

 

Whenever we create an overlay network through the “docker network create -d overlay” command, the request goes to the manager. The manager is built up of multiple pipeline stages, and one of them is the Allocator. The Allocator takes the network creation request and chooses a particular pre-defined subnet that is available. Allocation happens purely in memory and hence is quick.

Once the network is created, it’s time to connect a service to that network. When you create a service, the orchestrator gets involved and generates the requisite number of tasks, which are nothing but containers in the real world. But the tasks need IP addresses, and the overlay network needs VXLAN IDs too. This allocation happens on the manager nodes. Once allocation is complete, the tasks are created and the state is preserved in the Raft store. Only then can the scheduler move a particular task into the Assigned state, after which it is dispatched to one of the worker nodes. A manager can also be a worker.

Every task goes through multiple stages: New, Allocated, Assigned and so on. If a task has not reached the Allocated stage, it will not be assigned to a worker node. With the help of the network control plane (gossip protocol), the multiple tasks distributed across the worker nodes are taken care of and managed effectively.

I hope you liked reading this deep-dive article. In a future blog post, I will try to cover a deep dive into Docker networking and volumes. Till then, Happy Swarming !!!
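The pipeline described above can be sketched as a small state machine in Python. This is a simplified model covering only the three stages named in this post. The subnet, the VXLAN ID, the worker node names and the least-loaded placement rule are assumptions made for illustration, not Docker's actual values or code.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import ip_network
from typing import Dict, Optional


class TaskState(Enum):
    NEW = 1
    ALLOCATED = 2
    ASSIGNED = 3


@dataclass
class Task:
    name: str
    state: TaskState = TaskState.NEW
    ip: Optional[str] = None
    node: Optional[str] = None


class Allocator:
    """Hands out network resources purely in memory."""

    def __init__(self, subnet: str, vxlan_id: int):
        self.vxlan_id = vxlan_id  # one VXLAN ID per overlay network
        self._free_ips = iter(ip_network(subnet).hosts())

    def allocate(self, task: Task) -> None:
        task.ip = str(next(self._free_ips))
        task.state = TaskState.ALLOCATED


def schedule(task: Task, load: Dict[str, int]) -> None:
    """Assign an allocated task to the node currently running the fewest tasks."""
    if task.state is not TaskState.ALLOCATED:
        raise ValueError(f"{task.name} is {task.state.name}, not ALLOCATED")
    node = min(sorted(load), key=load.get)
    load[node] += 1
    task.node = node
    task.state = TaskState.ASSIGNED


# Hypothetical cluster and network values.
load = {"test-master1": 0, "test-worker1": 0, "test-worker2": 0}
allocator = Allocator("10.0.0.0/24", vxlan_id=4097)

for i in range(1, 4):  # --replicas 3
    task = Task(f"collabnix.{i}")
    allocator.allocate(task)  # New -> Allocated
    schedule(task, load)      # Allocated -> Assigned
    print(task.name, task.state.name, task.ip, "on", task.node)
```

Calling schedule on a task that is still in the New state raises an error, which mirrors the rule that a task cannot be assigned to a worker before allocation has finished.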

 

 

 

Have Queries? Join https://launchpass.com/collabnix
