Ajeet Raina Docker Captain, ARM Innovator & Docker Bangalore Community Leader.

Understanding Node Failure Handling under Docker 1.12 Swarm Mode


At the last Docker Bangalore Meetup, there was a lot of curiosity around the “Desired State Reconciliation” and “Node Management” features of Docker Engine 1.12 Swarm Mode. After the presentation I received many questions about how node failure is handled in the new Swarm Mode, particularly when a manager (master) node participating in the Raft consensus goes down. In this blog post, I will demonstrate how manager node failure is handled, which depends directly on the Raft consensus algorithm. We will look at how SwarmKit (the technical foundation of the Swarm Mode implementation) uses Raft to avoid a single point of failure and keep making decisions in a distributed system.

In the previous post we did a deep dive into the Swarm Mode implementation, where we talked about the communication between manager and worker nodes. Machines running SwarmKit can be grouped together to form a Swarm, coordinating tasks with each other. Once a machine joins, it becomes a Swarm node. Nodes can be either worker nodes or manager nodes. Worker nodes are responsible for running tasks, while manager nodes accept specifications from the user and are responsible for reconciling the desired state with the actual cluster state.


Manager nodes maintain a strongly consistent, replicated (Raft-based) and extremely fast (in-memory reads) view of the cluster, which allows them to make quick scheduling decisions while tolerating failures. Node roles (worker or manager) can be changed dynamically through API/CLI calls. If a manager or worker node fails, SwarmKit reschedules its tasks (which are nothing but containers) onto a different node.

A Quick Brief on Raft Consensus Algorithm

Let’s understand what Raft consensus is all about. A Raft cluster contains several servers; five is a typical number, which allows the system to tolerate two failures. At any given time each server is in one of three states: leader, follower, or candidate. In normal operation there is exactly one leader and all of the other servers are followers. Followers are passive: they issue no requests on their own but simply respond to requests from leaders and candidates. The leader handles all client requests (if a client contacts a follower, the follower redirects it to the leader). The third state, candidate, is used to elect a new leader.

Raft uses a heartbeat mechanism to trigger leader election. When servers start up, they begin as followers. A server remains in the follower state as long as it receives valid RPCs from a leader or candidate. Leaders send periodic heartbeats to all followers in order to maintain their authority. If a follower receives no communication over a period of time called the election timeout, it assumes there is no viable leader and begins an election to choose a new leader. To see what a Raft implementation looks like in practice, I recommend reading https://github.com/hashicorp/raft
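The state transitions described above can be sketched in a few lines of Python. This is only an illustrative toy, not SwarmKit's or any library's actual code: the `Server` class, its method names and the tick-based clock are all assumptions made for the example, and real Raft details such as randomized election timeouts, log replication and vote requests are left out.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class Server:
    """Hypothetical, heavily simplified Raft server (illustration only)."""
    name: str
    election_timeout: int        # ticks of silence tolerated before an election
    role: Role = Role.FOLLOWER   # servers start up as followers
    term: int = 0
    silent_ticks: int = 0

    def on_heartbeat(self, leader_term: int) -> None:
        """A heartbeat from a leader with a current term resets the timer."""
        if leader_term >= self.term:
            self.term = leader_term
            self.role = Role.FOLLOWER
            self.silent_ticks = 0

    def tick(self) -> None:
        """Advance the clock by one tick in which no message arrived."""
        if self.role is Role.LEADER:
            return
        self.silent_ticks += 1
        if self.silent_ticks >= self.election_timeout:
            # No viable leader was heard from: start an election in a new term.
            self.role = Role.CANDIDATE
            self.term += 1
            self.silent_ticks = 0

    def on_votes(self, votes: int, cluster_size: int) -> None:
        """A candidate holding a strict majority of votes becomes leader."""
        if self.role is Role.CANDIDATE and votes > cluster_size // 2:
            self.role = Role.LEADER


# Demo: a follower stops hearing from the leader and takes over.
node = Server("test-node2", election_timeout=3)
node.on_heartbeat(leader_term=1)
for _ in range(3):
    node.tick()
print(node.role.value, node.term)
node.on_votes(votes=2, cluster_size=3)
print(node.role.value, node.term)
```

The vote count passed to `on_votes` is assumed to include the candidate's own vote, which is why 2 votes are enough in a 3-server cluster.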


PLEASE NOTE that an odd number of managers (1, 3, 5 or 7) is recommended. Raft only makes progress when a majority (more than 50%) of the managers agree. With just two managers, losing one leaves a single manager, which is not a majority of two, so consensus can no longer be reached. A second manager therefore adds no fault tolerance over a single one.
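The majority rule is easy to check with a little arithmetic. The sketch below uses two helper functions, `quorum` and `fault_tolerance`, whose names are made up for this example; it simply applies the "more than 50%" rule to different manager counts.

```python
def quorum(managers: int) -> int:
    """Smallest number of managers that forms a strict majority."""
    return managers // 2 + 1


def fault_tolerance(managers: int) -> int:
    """How many managers can fail while a majority is still reachable."""
    return managers - quorum(managers)


for n in range(1, 8):
    print(f"{n} managers: quorum={quorum(n)}, "
          f"tolerated failures={fault_tolerance(n)}")
```

Running it shows that an even count never tolerates more failures than the odd count just below it: 2 managers tolerate no failures (same as 1), and 4 tolerate one failure (same as 3).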

Demonstrating Manager Node Failure

Let me demonstrate the manager node failure scenario with an existing Swarm Mode cluster running on Google Compute Engine (GCE). As shown below, I have 5 nodes forming a Swarm Mode cluster, all running the experimental Docker 1.12.0-rc4 release.

 

snap-1

The Swarm Mode cluster is already running a service that is replicated across 3 of the 5 nodes: test-master1, test-node2 and test-node1. Let us use the docker-machine command (my all-time favorite) to ssh into test-master1 and promote the workers test-node1 and test-node2 to managers, as shown below.

snap-2

The worker nodes have now been promoted to managers, and their manager status is shown as “Reachable”.

The “docker ps” command shows that there is a task (container) already running on the master node. Please remember that “docker ps” has to be run manually on each node to see which containers are running locally on that node.

snap-3

The picture below shows the detailed list of containers (or tasks) distributed across the swarm cluster.

snap-5

Let’s bring down the manager node test-master1, either by shutting it down uncleanly or by stopping the instance through GCE (as shown below). The manager node test-master1 is no longer reachable. If you ssh into test-node2 and check the cluster, you will find that the node failure has been handled and desired state reconciliation has kicked in: the 3 replicas of the service are now running on test-node1, test-node2 and test-node3.
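To make the idea of desired state reconciliation concrete, here is a minimal Python sketch of what happened in this demo. It is not SwarmKit's scheduler: the `reconcile` function and its "least loaded node, then alphabetical" placement rule are assumptions made for illustration, and test-node4 is a hypothetical name for the fifth node, which is not named in this post.

```python
from collections import Counter


def reconcile(desired_replicas, placements, healthy_nodes):
    """Return the nodes that should run the service's tasks.

    placements lists the node of each currently running task. Tasks on
    unhealthy nodes are dropped and replaced on healthy nodes until the
    desired replica count is met again.
    """
    if not healthy_nodes:
        raise ValueError("no healthy nodes available")

    # Keep only the tasks whose node is still healthy.
    surviving = [node for node in placements if node in healthy_nodes]

    # Count how many tasks each healthy node already runs.
    load = Counter({node: 0 for node in healthy_nodes})
    load.update(surviving)

    # Add replacement tasks on the least loaded node (ties broken by name).
    while len(surviving) < desired_replicas:
        target = min(healthy_nodes, key=lambda n: (load[n], n))
        surviving.append(target)
        load[target] += 1
    return surviving


before = ["test-master1", "test-node1", "test-node2"]
# test-master1 has gone down; test-node4 is a hypothetical fifth node.
healthy = {"test-node1", "test-node2", "test-node3", "test-node4"}
print(reconcile(3, before, healthy))
```

With these inputs the task that was running on test-master1 is placed on test-node3, matching what the cluster shows after the failure.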

snap-6

 

For Raft consensus, the recommendation is an odd number of managers (1, 3, 5 or 7). Up to 5 managers is the recommended maximum for good performance. Increasing the number of managers to 7 may introduce a performance bottleneck, because every additional manager adds communication overhead to keep the managers in agreement.




48 Replies to “Understanding Node Failure Handling under Docker 1.12 Swarm Mode”

  2. Great post! I have a question in regards to joining new nodes to the cluster. When Swarm Mode is used to initialize a Swarm, tokens are generated. The tokens can be used with the `swarm join` command to add a node or manager to the cluster. Does this continue to work if the initial manager is no longer available? In other words, are the tokens still valid on the other managers?

    1. @Mitchell, thanks for reading the blog. A manager is required to initialize Swarm Mode. Once two other nodes have been promoted to manager, the swarm survives even if the initial manager node goes down, so the cluster keeps working when the initial manager is no longer available. At least three managers (one Leader and two Reachable managers) should be running to handle the situation where one of the managers goes down.
