What’s new in Docker 1.12.0 Load-Balancing feature?

Estimated Reading Time: 8 minutes

In the previous blog post, we took a deep dive into the Service Discovery aspects of Docker. A service is now a first-class citizen in Docker 1.12.0, which allows replication, image updates and dynamic load balancing. With Docker 1.12, services can be exposed on ports on all Swarm nodes and load balanced internally by Docker using either a virtual IP (VIP) based or a DNS round robin (RR) based load-balancing method, or both.
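
For instance, the load-balancing mode can be picked per service at creation time; a minimal sketch, assuming an existing overlay network named mynet (all names here are illustrative):

    # Default: VIP-based load balancing; the service name resolves to one virtual IP
    $ docker service create --name web-vip --network mynet --replicas 3 nginx

    # Alternative: DNS round robin; the name resolves to each task's IP in turn
    $ docker service create --name web-rr --endpoint-mode dnsrr --network mynet --replicas 3 nginx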


In case you are new to the load-balancing concept, a load balancer distributes workload across a set of networked servers or components so that computing resources are used optimally. A load balancer also provides high availability by detecting server or component failure and reconfiguring the system appropriately. In this post, I will try to answer the following queries:

  • Is Load-Balancing new to Docker?
  • What’s new in the Load-Balancing feature under Docker 1.12.0?
  • Why IPVS?
  • Is Routing Mesh a Load-Balancer?
  • Is it possible to integrate an external LB with the services in the cluster? Can I use HAProxy in Docker Swarm Mode?

Let’s get started –

Is Load-Balancing new to Docker?

The load-balancing (LB) feature is not at all new to Docker. It was first introduced in the Docker 1.10 release, where the Docker Engine implemented an embedded DNS server for containers in user-defined networks. In particular, containers run with a network alias (--net-alias) were resolved by this embedded DNS to the IP addresses of the containers carrying that alias.

No doubt, DNS round robin is extremely simple to implement and is an excellent mechanism to increase capacity in certain scenarios, provided that you take the default address selection bias into account, but it has certain limitations and issues. Some applications cache the DNS hostname-to-IP mapping, which causes them to time out when the mapping changes. Also, a non-zero DNS TTL value delays how quickly DNS entries reflect the latest details. Finally, DNS-based load balancing does not distribute load evenly, since the distribution depends on the client implementation. To learn more about DNS RR, which is sometimes called the poor man’s load balancing, you can refer here.
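
For context, this is roughly what the pre-swarm-mode, embedded-DNS round robin looked like; a minimal sketch in which the network name and alias are illustrative:

    # Two backends sharing the alias "web" on a user-defined network
    $ docker network create --driver bridge demo-net
    $ docker run -d --net demo-net --net-alias web nginx
    $ docker run -d --net demo-net --net-alias web nginx

    # The embedded DNS resolves the alias to both container IPs
    $ docker run --rm --net demo-net busybox nslookup web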

What’s new in Load-balancing feature under Docker 1.12.0?

  • Docker 1.12.0 now comes with a built-in load-balancing feature. LB is designed as an integral part of the Container Network Model (rightly called CNM) and works on top of CNM constructs like networks, endpoints and sandboxes. Docker 1.12 comes with VIP-based load balancing: VIP-based services use Linux IPVS to route requests to the backend containers.
  • There is no more centralized load balancer; it is distributed and hence scalable. LB is plumbed into the individual container: whenever a container wants to talk to another service, the load balancing actually happens inside the container itself. LB is more powerful now and just works out of the box.


  • The new Docker 1.12.0 swarm mode uses IPVS (a kernel module called “ip_vs”) for load balancing. It is a load-balancing module integrated into the Linux kernel; a quick check for it is sketched right after this list.
  • Docker 1.12 introduces the Routing Mesh for the first time. With IPVS routing packets inside the kernel, swarm’s routing mesh delivers high-performance, container-aware load balancing. Docker Swarm Mode includes a Routing Mesh that enables multi-host networking: it allows containers on two different hosts to communicate as if they were on the same host. It does this by creating a Virtual Extensible LAN (VXLAN), designed for cloud-based networking. We will talk more about the Routing Mesh at the end of this post.
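
A quick way to verify the module on any swarm node (output varies by distribution):

    # Is the IPVS kernel module loaded?
    $ lsmod | grep ip_vs

    # Load it manually if needed; Docker normally handles this on its own
    $ sudo modprobe ip_vs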

Whenever you create a new service in a Swarm cluster, the service gets a Virtual IP (VIP) address. Whenever you make a request to that VIP, the swarm load balancer distributes the request to one of the containers of that service. The built-in service discovery actually resolves the service name to the Virtual IP, and the VIP-to-container-IP load balancing is then achieved using IPVS. It is important to note that the VIP is only useful within the cluster; it has no meaning outside the cluster because it is a private, non-routable IP.

I have a 6-node cluster running Docker 1.12.0 on Google Cloud. Let’s examine the VIP address through the steps below:

  1. Create a new overlay network:

    $ docker network create --driver overlay \
        --subnet 10.0.3.0/24 \
        --opt encrypted \
        collabnet

[Screenshot: output of the docker network create command]

 

2. Let’s create a new service called collabweb which is a simple Nginx server as shown:

        $ docker service create \

       —replicas 3 \

       —name collabweb \

       —network collabnet \

       nginx

3. As shown below, the 3 replica containers for the service are spread across 3 nodes, all attached to the swarm overlay network called “collabnet”.

[Screenshot: the three collabweb replicas running across the Swarm nodes]
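
To cross-check this from the manager node, something along these lines should work (on some early 1.12 builds the second command was docker service tasks):

    $ docker service ls                # services with their mode and replica count
    $ docker service ps collabweb      # which node each replica (task) is running on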

4. Use the docker service inspect command to look into the service internals, as shown below:

[Screenshot: docker service inspect output for the collabweb service]

 

It shows the “VIP” address assigned to the service. There is also a single command which can fetch just the Virtual IP address, as shown in the diagram below:

[Screenshot: retrieving the Virtual IP address of the service]
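
A sketch of such a command using docker service inspect with a Go template; the exact JSON layout may differ slightly between 1.12 builds:

    # Full JSON, including the Endpoint section that carries the VirtualIPs
    $ docker service inspect collabweb

    # Just the VIP(s) assigned to the service
    $ docker service inspect --format '{{json .Endpoint.VirtualIPs}}' collabweb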

5. You can use the nsenter utility to enter the service’s sandbox and check the iptables configuration:

[Screenshot: iptables mangle table inside the service sandbox]
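
A minimal sketch of the procedure (the sandbox ID below is a placeholder; the reply to Alex in the comments section walks through the same steps in more detail):

    # List the network namespaces (sandboxes) Docker created on this node
    $ ls /var/run/docker/netns

    # Run iptables inside the chosen sandbox to inspect its mangle table
    $ nsenter --net=/var/run/docker/netns/<sandbox-id> iptables -nvL -t mangle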

In iptables, a packet usually enters the mangle table chains first and then the NAT table chains. Mangling refers to modifying the IP packet, whereas NAT refers only to address translation. As shown above in the mangle table, the service IP 10.0.3.2 gets a marking of 0x10c via the iptables OUTPUT chain. IPVS uses this marking and load balances the traffic across containers 10.0.3.3, 10.0.3.5 and 10.0.3.6, as shown:

[Screenshot: ipvsadm output showing the firewall mark and the backend container IPs]

As shown above, you can use ipvsadm to set up, maintain or inspect the IP virtual server table in the Linux kernel. This tool can be installed on any Linux machine through apt or yum, depending on the distribution.
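
A hedged sketch of installing it and querying the table that the service sandbox uses (the sandbox ID is a placeholder):

    # Debian/Ubuntu
    $ sudo apt-get install -y ipvsadm
    # RHEL/CentOS
    $ sudo yum install -y ipvsadm

    # List the virtual server table from within the service sandbox; the firewall
    # mark (FWM) entry maps to the backend container IPs
    $ nsenter --net=/var/run/docker/netns/<sandbox-id> ipvsadm -ln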

Typical DNS RR and IPVS load balancing can be differentiated as shown in the diagram below: with DNS RR, each attempt to access the service (through curl or dig) returns the list of container IP addresses in rotating order, while with a VIP the request is load balanced across the containers (i.e. 10.0.0.1, 10.0.0.2 and 10.0.0.3) behind a single virtual IP.

[Diagram: DNS round robin vs. VIP-based load balancing]
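
Assuming a second, hypothetical service created with --endpoint-mode dnsrr on the same network, the difference can be observed from any container attached to it (addresses will vary):

    # VIP mode (the default used for collabweb): a single A record, the service VIP
    $ nslookup collabweb

    # DNS RR mode: one A record per running task, rotating between queries
    $ nslookup collab-dnsrr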

 

6. Let’s create  a new service called collab-box under the same network. As shown in the diagram, a new Virtual-IP (10.0.3.4) will be automatically attached to this service as shown below:

[Screenshot: the collab-box service with Virtual IP 10.0.3.4]

Also, service discovery works as expected:

[Screenshot: the service name resolving to its Virtual IP]
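
The commands behind this step would look roughly like the following; the image is illustrative, since the original screenshot does not show which one was used:

    # Create the second service on the same overlay network
    $ docker service create --name collab-box --network collabnet busybox sleep 3600

    # From inside any container attached to collabnet, the new name resolves to its VIP
    $ nslookup collab-box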

Why IPVS?

IPVS (IP Virtual Server) implements transport-layer load balancing inside the Linux kernel, so-called Layer-4 switching. It is based on Netfilter and supports TCP, SCTP and UDP, over both IPv4 and IPv6. IPVS running on a host acts as a load balancer in front of a cluster of real servers: it can direct requests for TCP/UDP-based services to the real servers and make the services of the real servers appear as a virtual service on a single IP address.

It is important to note that IPVS is not a proxy; it is a forwarder that runs at Layer 4. IPVS forwards traffic from clients to back-ends, meaning you can load balance anything, even DNS! Among its capabilities are the following (a small hands-on sketch follows the list):

  • UDP support
  • Dynamically configurable
  • 8+ balancing methods
  • Health checking
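
Outside of Docker, the same building blocks can be driven by hand with ipvsadm; a small, illustrative sketch with made-up addresses:

    # Create a virtual service on 10.0.0.100:80 with round-robin scheduling
    $ sudo ipvsadm -A -t 10.0.0.100:80 -s rr

    # Register two real servers behind it, forwarded via NAT (masquerading)
    $ sudo ipvsadm -a -t 10.0.0.100:80 -r 10.0.0.11:80 -m
    $ sudo ipvsadm -a -t 10.0.0.100:80 -r 10.0.0.12:80 -m

    # Inspect the resulting table
    $ sudo ipvsadm -ln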

IPVS holds lots of interesting features and has been in the kernel for more than 15 years. The chart below differentiates IPVS from other LB tools:

[Chart: IPVS compared with other load-balancing tools]

 
Is Routing Mesh a Load-balancer?

The Routing Mesh is not a load balancer; it makes use of LB concepts. It provides a globally published port for a given service. The routing mesh uses port-based service discovery and load balancing. So, to reach any service from outside the cluster, you need to expose its port and reach it via that published port.

In simple words, if you had 3 swarm nodes A, B and C, and a service running on nodes A and C that is assigned node port 30000, it would be accessible on port 30000 via any of the 3 swarm nodes, regardless of whether the service is running on that machine, and automatically load balanced between the 2 running containers; a quick sketch of this follows below. I will talk about the Routing Mesh in a separate blog if time permits.
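
A quick sketch of the above; the published port and node IPs are illustrative:

    # Publish port 30000 on every swarm node, backed by 2 replicas
    $ docker service create --name web --publish 30000:80 --replicas 2 nginx

    # Any node answers on the published port, even one not running a replica
    $ curl http://<ip-of-node-A>:30000
    $ curl http://<ip-of-node-B>:30000
    $ curl http://<ip-of-node-C>:30000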

It is important to note that the Docker 1.12 Engine creates an “ingress” overlay network to achieve the routing mesh. The published frontend service and a special sandbox on each host are part of this “ingress” network and take part in the routing mesh. All nodes become part of the “ingress” overlay network by default, using a sandbox network namespace created inside each node. You can refer to this link to learn more about the internals of the Routing Mesh.
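
You can peek at this on your own cluster; the names below are the defaults Docker creates:

    $ docker network ls                 # the "ingress" overlay network exists on every node
    $ docker network inspect ingress    # shows its subnet, VXLAN ID and the ingress-sbox endpoint
    $ ls /var/run/docker/netns          # the ingress sandbox network namespace lives here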

Is it possible to integrate an external LB with the services in the cluster? Can I use HAProxy in Docker Swarm Mode?

You can expose the ports for services to an external load balancer, while internally the swarm lets you specify how to distribute service containers between nodes. If you would like to use an L7 LB that cannot be made part of the cluster, you need to point it at some (or all) of the node IPs and the PublishedPort. If the L7 LB can be made part of the cluster, by running the L7 LB itself as a service, then it can just point to the service name itself (which will resolve to a VIP). A typical architecture would look like this:

[Diagram: an external L7 load balancer in front of the Swarm cluster]
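
As a sketch of the first option, an external HAProxy that is not part of the cluster could be configured roughly as follows; the node IPs and the published port 30000 are illustrative:

    # haproxy.cfg (sketch): spread incoming traffic across the swarm nodes' published port
    frontend http_in
        bind *:80
        mode http
        default_backend swarm_nodes

    backend swarm_nodes
        mode http
        balance roundrobin
        server node1 192.168.1.11:30000 check
        server node2 192.168.1.12:30000 check
        server node3 192.168.1.13:30000 check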

In my next blog, I am going to elaborate primarily on external load balancers. Keep reading!


  • Devin

    Really enjoying these in depth posts on Swarm Mode

    • Thanks Devin. Great to hear your feedback.

  • Pingback: What’s new in Docker 1.12.0 Load-Balancing feature? – HGW XX/7()

  • Atharva

    Thanks for such an insightful article !
    If we compare kubernetes service with Swarm service, can we say that PublishedPort corresponds to NodePort in k8 and Routing mesh is similar to kubeproxy ?

      • Yes, correct. If you don’t specify a PublishedPort under Swarm Mode, it randomly selects a port in the 30,000+ range, which I think is similar to how the K8s NodePort functions today. Yes, kube-proxy looks very similar to the Routing Mesh.



  • Vijay

    I have been reading the Docker 1.12 articles over the last 2 months and have also heard you speak at the meetup a few times, and it has always been a pleasure. The depth you have gone into in writing the minute details is really commendable.
    Thanks a ton.

    • Thanks for reading the post. Will keep on bringing interesting stuff for the Docker community.

  • Alex

    Very interesting article!

    Even though I have a 2 VM test swarm (1.12.1-rc1 on ubuntu 16.04), and could launch services that would be split on those 2, I couldn’t replicate the nsenter part of the article nor the ipvsadm one…

    How am I to find the “f1d…” ID used? 🙂
    Any idea why ipvsadm would show nothing?

    Interestingly, within the containers, in VIP or DNSRR mode, it will only get to the containers on the same node and I couldn’t ping containers across the overlay network. I’m probably missing something obvious…

    Cheers!

    • Thanks for your comments and feedback.

      To answer your first question, you need to first find out SandBoxID through the below command:

      $docker inspect service-name | grep -i sandbox

      This provides you with:

      SandBoxID:
      SandBoxKey:

      Now you need to run the below command:

      ls /var/run/docker/netns

      to find the sandboxes.

      You can now use nsenter --net=SandBoxID to enter into the sandbox.

      Assume that you don’t have nsenter running, then follow the below steps:

      $mkdir /var/run/netns
      $touch /var/run/netns/n
      $mount -o bind /var/run/docker/netns/SandboxID /var/run/netns/n
      $ip netns exec n bash

      Then you enter into the sandbox.

      To answer the second question, can you paste how you created this swarm cluster? By going through the steps, I can assist you further.

  • Deepak

    I am just a newbie to the Docker world, not even a sysadmin guy, I am a dev, but I really enjoy reading these detailed articles of yours.

    • Thanks Deepak. Will keep bringing new contents around Docker. Thanks for your time.

  • I’ll immediately take hold of your rss as I can not to find your email subscription hyperlink
    or e-newsletter service. Do you have any? Kindly permit me
    recognize in order that I may just subscribe.

    Thanks. http://bing.net

  • I created simple 2 container service :

    [root@cent04 ~]# docker service create --name w6 --publish 9006:80 --replicas 2 test_apache
    [root@cent04 ~]# docker service inspect w6
    […]
    "VirtualIPs": [
    {
    "NetworkID": "c6rg8tdamg6q9qw4imu2aiv00",
    "Addr": "10.255.0.2/16"

    seems that swarm put 10.255.0.2 IP as my VIP (it is strange a little bit)
    This IP is part of my ingress network:

    [root@cent04 ~]# docker network inspect ingress
    [
    {
    "Name": "ingress",
    "Id": "c6rg8tdamg6q9qw4imu2aiv00",
    "Scope": "swarm",
    "Driver": "overlay",
    "EnableIPv6": false,
    "IPAM": {
    "Driver": "default",
    "Options": null,
    "Config": [
    {
    "Subnet": "10.255.0.0/16",
    "Gateway": "10.255.0.1"
    }
    ]
    },
    "Internal": false,
    "Containers": {
    "ingress-sbox": {
    "Name": "ingress-endpoint",
    "EndpointID": "2ff377de27b0a588ab2559d54f67c37a2bef3f681fecb6fed6bbf1fde2542448",
    "MacAddress": "02:42:0a:ff:00:05",
    "IPv4Address": "10.255.0.5/16",
    "IPv6Address": ""
    }
    },
    "Options": {
    "com.docker.network.driver.overlay.vxlanid_list": "256"
    },
    "Labels": {}
    }
    ]

    my iptables (on all 4 nodes) show that requests to port 9006 goes to IP – 172.18.0.2:

    Chain DOCKER-INGRESS (2 references)
    target prot opt source destination
    DNAT tcp -- anywhere anywhere tcp dpt:9006 to:172.18.0.2:9006

    this address belongs to docker_gwbridge network (as “ingress-sbox”):

    [root@cent05 ~]# docker network inspect docker_gwbridge
    [
    {
    "Name": "docker_gwbridge",
    "Id": "e845ff8a82a279c0d49f4a3a56651ca1f5bc9095aeb83947eaf719b74321329a",
    "Scope": "local",
    "Driver": "bridge",
    "EnableIPv6": false,
    "IPAM": {
    "Driver": "default",
    "Options": null,
    "Config": [
    {
    "Subnet": "172.18.0.0/16",
    "Gateway": "172.18.0.1"
    }
    ]
    },
    "Internal": false,
    "Containers": {
    "ingress-sbox": {
    "Name": "gateway_ingress-sbox",
    "EndpointID": "8e3689d5306c06ea7ab227e6febad281723993e29b496f6f862a6ed7c3a84662",
    "MacAddress": "02:42:ac:12:00:02",
    "IPv4Address": "172.18.0.2/16",
    "IPv6Address": ""
    }
    },
    "Options": {
    "com.docker.network.bridge.enable_icc": "false",
    "com.docker.network.bridge.enable_ip_masquerade": "true",
    "com.docker.network.bridge.name": "docker_gwbridge"
    },
    "Labels": {}
    }
    ]

    When I create my own network and 2nd service with this network:

    [root@cent04 ~]# docker network create --driver overlay --subnet 192.168.100.0/24 my_net
    [root@cent04 ~]# docker service create --name w7 --publish 9007:80 --network my_net --replicas 2 test_apache

    I get 2 VIPs for this service – one from my network and one from ingress network:

    [root@cent04 ~]# docker service inspect w7
    […]
    "VirtualIPs": [
    {
    "NetworkID": "c6rg8tdamg6q9qw4imu2aiv00",
    "Addr": "10.255.0.9/16"
    },
    {
    "NetworkID": "90bfajtxg5s045o6r3am2e0yf",
    "Addr": "192.168.100.2/24"
    }

    iptables -t nat -L shows that port 9007 maps again to IP=172.18.0.2

    curl to this IP works and LoadBalancing works:

    [root@cent04 ~]# curl 172.18.0.2:9007
    904b9e31cdee
    [root@cent04 ~]# curl 172.18.0.2:9007
    c78922494779

    But the ingress VIP doesn’t work:
    [root@cent04 ~]# curl 10.255.0.9:9007
    curl: (7) Failed to connect to 10.255.0.9: Network is unreachable

    and the my_net VIP doesn’t work either:
    [root@cent04 ~]# curl 192.168.100.2:9007
    curl: (7) Failed to connect to 192.168.100.2: Network is unreachable

    seems that my LAB works completely different than Yours 🙂

    4 x HOST=Centos 7.2
    docker = 1.12.1-rc2

  • Thanks,
    great doc. I had a problem understanding LB under Docker; now I have a good idea of how to deploy it in my infra.

    • Good to see that you are finding this blog helpful. I will maintain the quality and come up with more interesting stuff.


  • I have set up swarm in a multi-host environment. The swarm internal DNS just registers the entries for service containers, but without health checks.

    Is there any way to make swarm DNS remove entries for service containers which are stopped / unhealthy?

    I am using nginx in front of a python app.



  • Thanks for this great article. You state in the article that this LB service is only for internal cluster consumption.
    How do you deploy a LB mechanism to access the nginx service from a public point of view?


  • stallapp

    Thanks for the post. Here I have a question.

    All requests to a service deployed in swarm go to the load balancer of that service, i.e. the Virtual IP (VIP), and you said the VIP is only useful inside the cluster, with no use outside.

    What does this mean? To which IP should a client send requests for the service? Could you please clarify?

    Thanks