How to Deploy Apache Kafka on AWS Platform using Docker Swarm Mode?


I am excited to start a new open source project called “Pico”. Pico is a beta project targeted at object detection and analytics using Apache Kafka, Docker, Raspberry Pi and the AWS Rekognition service. The idea behind Pico is to simplify object detection and analytics using a handful of Docker containers. A cluster of Raspberry Pi nodes installed at various locations is coupled with camera modules and sensors with motion detection activated on them. Docker containers running on these Raspberry Pis turn the nodes into CCTV cameras. The video streams captured by the cameras are published to Apache Kafka and, because of Kafka’s replication factor, the real-time data can then be consumed on any of the five containers. The data is consumed inside a separate container that runs on each of these nodes. AWS Rekognition analyses the real-time video data and searches for on-screen objects against a collection of objects.

In my previous blog post, I demonstrated how to convert a Raspberry Pi into a CCTV camera using a Docker container. The next target was to understand what Apache Kafka is and how to implement it on the AWS cloud so that the real-time data can be sent to the AWS Rekognition service. I spent a considerable amount of time understanding the basics of Apache Kafka before jumping into Docker Compose to containerize the various services that make up this software stack.

Apache Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation. It is written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

Apache Kafka is a distributed, partitioned, and replicated publish-subscribe messaging system used to send high volumes of data, in the form of messages, from one point to another. It replicates these messages across a cluster of servers to prevent data loss and allows both online and offline message consumption. This replication is what makes Kafka fault-tolerant in the presence of machine failures while still supporting low-latency message delivery. In a broader sense, Kafka is regarded as a unified platform for handling real-time data feeds that is designed to avoid data loss.
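To make the idea of online and offline consumption concrete, here is a minimal sketch of an append-only log in Python. It is only an in-memory illustration, not how Kafka is implemented; the class name `TopicLog` and the sample messages are hypothetical.

```python
from typing import List, Tuple


class TopicLog:
    """In-memory stand-in for an append-only message log (illustrative only)."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def append(self, message: str) -> int:
        """Store a message and return the offset it was written at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, str]]:
        """Return up to max_messages (offset, message) pairs starting at offset."""
        chunk = self._messages[offset:offset + max_messages]
        return list(enumerate(chunk, start=offset))


if __name__ == "__main__":
    log = TopicLog()
    for frame in ("frame-1", "frame-2", "frame-3"):
        log.append(frame)

    # A consumer that has kept up ("online") reads only the newest message.
    print(log.read(offset=2))
    # A consumer that was offline resumes from the last offset it remembers.
    print(log.read(offset=0))
```

Because messages stay in the log after they are read, each consumer only has to remember its own offset to pick up where it left off.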

Architecture of Apache Kafka

The overall architecture of Kafka is shown below.

The cluster shown is composed of three server machines that together act as a cluster computing platform. In a typical Kafka cluster, each server is configured as a single broker that persists and replicates message data, so a typical cluster contains more than one broker. The broker is the key component of a Kafka cluster and is responsible for maintaining published data. Each broker instance can handle thousands of reads and writes per topic, since the brokers themselves are largely stateless and rely on ZooKeeper to maintain cluster state.

At a basic level, a Kafka broker uses topics to handle message data. A topic is first created and then divided into multiple partitions to balance load. The diagram above illustrates a topic divided into three partitions. Within each partition, every message is stored at its own sequential offset. As an example, if the topic has a replication factor of 3, Kafka creates three identical replicas of each partition of the topic and distributes them across the cluster. To balance load and maintain data replication, each broker stores one or more partition replicas. If there are N brokers and N partitions then, ignoring replicas, each broker stores one partition.
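The placement of partition replicas across brokers can be sketched as follows. This is a simplified round-robin model written for illustration, not Kafka's actual assignment algorithm; the function name `assign_replicas` and the broker ids are hypothetical.

```python
from typing import Dict, List


def assign_replicas(num_partitions: int, brokers: List[int],
                    replication_factor: int) -> Dict[int, List[int]]:
    """Place each partition's replicas on consecutive brokers, round-robin.

    Simplified illustration only; real Kafka uses a more elaborate scheme.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    assignment: Dict[int, List[int]] = {}
    for partition in range(num_partitions):
        # Start one broker further along for each partition to spread load.
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


if __name__ == "__main__":
    # Three partitions, three brokers, replication factor 3 (as in the text).
    for partition, replicas in assign_replicas(3, [1, 2, 3], 3).items():
        print(f"partition {partition}: replicas on brokers {replicas}")
```

With a replication factor of 3 on three brokers, every broker ends up holding a replica of every partition; with a replication factor of 1, each broker holds exactly one partition.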

What’s the role of ZooKeeper?

Kafka uses ZooKeeper to maintain cluster state. ZooKeeper is a synchronization and coordination service for managing Kafka brokers, and its main role is to support leader election across multiple broker instances. For each partition, one broker acts as the leader and the brokers holding the other replicas act as followers (two of them in the three-broker example). The leader handles all reads and writes for the partition, while the followers replicate the leader’s data. If the leader fails, one of the followers is automatically elected as the new leader.
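The failover behaviour can be illustrated with a small Python sketch. It assumes a deliberately naive rule (the first replica on a live broker becomes leader); the real election is coordinated through ZooKeeper and considers which replicas are in sync. The function name `elect_leader` and the broker ids are hypothetical.

```python
from typing import List, Optional, Set


def elect_leader(replicas: List[int], live_brokers: Set[int]) -> Optional[int]:
    """Pick the first replica hosted on a live broker, or None if all are down."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


if __name__ == "__main__":
    # One partition with replicas on brokers 1, 2 and 3; broker 1 is listed first.
    replicas = [1, 2, 3]
    live = {1, 2, 3}
    print("leader:", elect_leader(replicas, live))

    # Broker 1 fails, so the next live replica takes over as leader.
    live.discard(1)
    print("leader after failure:", elect_leader(replicas, live))
```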

Benefits of Apache Kafka

Kafka has the following benefits.

Durability: Kafka persists messages on disk to prevent data loss. It uses a distributed commit log to replicate messages across the cluster, which makes it a durable system.
Scalability: It can be expanded without downtime. Since a single Kafka cluster can act as the central data backbone for a large organization, it can be elastically grown or spread across multiple clusters.
Reliability: It is reliable over time because it is a distributed, replicated, and fault-tolerant messaging system.
Efficiency: Kafka publishes and delivers messages efficiently, giving high system throughput. It can store terabytes of messages without a significant performance impact.

In this blog post, I will showcase how to implement Apache Kafka on a 2-node Docker Swarm cluster running on AWS via Docker Desktop.

Pre-requisites:

Docker Desktop for Mac
AWS Account (you will require t2.medium instances for this)
AWS CLI installed

Adding Your Credentials:

[Captains-Bay]? >  cat ~/.aws/credentials
[default]
aws_access_key_id = XXXA 
aws_secret_access_key = XX

Verifying AWS CLI Version

[Captains-Bay]? >  aws --version
aws-cli/1.11.107 Python/2.7.10 Darwin/17.7.0 botocore/1.5.70

Setting up Environment Variables

[Captains-Bay]? >  export VPC=vpc-ae59f0d6
[Captains-Bay]? >  export SUBNET=subnet-827651c9
[Captains-Bay]? >  export ZONE=a
[Captains-Bay]? >  export REGION=us-west-2

Building up First Node using Docker Machine

[Captains-Bay]? >  docker-machine create \
    --driver amazonec2 \
    --amazonec2-access-key=${ACCESS_KEY_ID} \
    --amazonec2-secret-key=${SECRET_ACCESS_KEY} \
    --amazonec2-region=us-west-2 \
    --amazonec2-vpc-id=vpc-ae59f0d6 \
    --amazonec2-ami=ami-78a22900 \
    --amazonec2-open-port 2377 \
    --amazonec2-open-port 7946 \
    --amazonec2-open-port 4789 \
    --amazonec2-open-port 7946/udp \
    --amazonec2-open-port 4789/udp \
    --amazonec2-open-port 8080 \
    --amazonec2-open-port 443 \
    --amazonec2-open-port 80 \
    --amazonec2-subnet-id=subnet-72dbdb1a \
    --amazonec2-instance-type=t2.micro \
    kafka-swarm-node1

The same command, run again with kafka-swarm-node2 as the machine name, creates the second node. Both invocations assume that ACCESS_KEY_ID and SECRET_ACCESS_KEY are already exported in the shell.
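Since the same long command is needed for each node, it can help to generate it rather than retype it. The following Python sketch only builds and prints the command lines; it does not call docker-machine itself. The flags are the ones used in the command above, while the helper name `docker_machine_args` and the optional `AMI` and `INSTANCE_TYPE` keys are assumptions made for this example, as are the placeholder credentials.

```python
import shlex
from typing import Dict, List

# Ports opened by the command above (Swarm management, gossip, overlay, web).
OPEN_PORTS = ["2377", "7946", "4789", "7946/udp", "4789/udp", "8080", "443", "80"]


def docker_machine_args(name: str, env: Dict[str, str]) -> List[str]:
    """Build the argument list for creating one EC2-backed Docker host."""
    args = [
        "docker-machine", "create",
        "--driver", "amazonec2",
        f"--amazonec2-access-key={env['ACCESS_KEY_ID']}",
        f"--amazonec2-secret-key={env['SECRET_ACCESS_KEY']}",
        f"--amazonec2-region={env['REGION']}",
        f"--amazonec2-vpc-id={env['VPC']}",
        # AMI and INSTANCE_TYPE are hypothetical keys; defaults mirror the post.
        f"--amazonec2-ami={env.get('AMI', 'ami-78a22900')}",
        f"--amazonec2-subnet-id={env['SUBNET']}",
        f"--amazonec2-instance-type={env.get('INSTANCE_TYPE', 't2.micro')}",
    ]
    for port in OPEN_PORTS:
        args += ["--amazonec2-open-port", port]
    args.append(name)
    return args


if __name__ == "__main__":
    # Placeholder values; in practice these would come from os.environ.
    env = {
        "ACCESS_KEY_ID": "XXXA",
        "SECRET_ACCESS_KEY": "XX",
        "REGION": "us-west-2",
        "VPC": "vpc-ae59f0d6",
        "SUBNET": "subnet-72dbdb1a",
    }
    for node in ("kafka-swarm-node1", "kafka-swarm-node2"):
        print(" ".join(shlex.quote(arg) for arg in docker_machine_args(node, env)))
```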

Listing out the Nodes

[Captains-Bay]? >  docker-machine ls
NAME                ACTIVE   DRIVER      STATE     URL                         SWARM   DOCKER     ERRORS
kafka-swarm-node1   -        amazonec2   Running   tcp://35.161.106.158:2376           v18.09.6   
kafka-swarm-node2   -        amazonec2   Running   tcp://54.201.99.75:2376             v18.09.6

Initializing Docker Swarm Manager Node

ubuntu@kafka-swarm-node1:~$ sudo docker swarm init --advertise-addr 172.31.53.71 --listen-addr 172.31.53.71:2377
Swarm initialized: current node (yui9wqfu7b12hwt4ig4ribpyq) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-xxxxxmr075to2v3k-decb975h5g5da7xxxx 172.31.53.71:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

Adding Worker Node

ubuntu@kafka-swarm-node2:~$ sudo docker swarm join --token SWMTKN-1-2xjkynhin0n2zl7xxxk-decb975h5g5daxxxxxxxxn 172.31.53.71:2377
This node joined a swarm as a worker.

Verifying 2-Node Docker Swarm Mode Cluster

ubuntu@kafka-swarm-node1:~$ sudo docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
yui9wqfu7b12hwt4ig4ribpyq *   kafka-swarm-node1   Ready               Active              Leader              18.09.6
vb235xtkejim1hjdnji5luuxh     kafka-swarm-node2   Ready               Active                                  18.09.6

Installing Docker Compose

curl -L https://github.com/docker/compose/releases/download/1.25.0-rc1/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   617    0   617    0     0   2212      0 --:--:-- --:--:-- --:--:--  2211
100 15.5M  100 15.5M    0     0  8693k      0  0:00:01  0:00:01 --:--:-- 20.1M
root@kafka-swarm-node1:/home/ubuntu/dockerlabs/solution/kafka-swarm# chmod +x /usr/local/bin/docker-compose

ubuntu@kafka-swarm-node1:~/dockerlabs/solution/kafka-swarm$ sudo docker-compose version
docker-compose version 1.25.0-rc1, build 8552e8e2
docker-py version: 4.0.1
CPython version: 3.7.3
OpenSSL version: OpenSSL 1.1.0j  20 Nov 2018

Building up Kafka Application

git clone https://github.com/collabnix/dockerlabs
cd dockerlabs/solution/kafka-swarm

ubuntu@kafka-swarm-node1:~/dockerlabs/solution/kafka-swarm$ sudo docker stack deploy -c docker-compose.yml mykafka
Creating network mykafka_default
Creating service mykafka_zkui
Creating service mykafka_broker
Creating service mykafka_manager
Creating service mykafka_producer
Creating service mykafka_zookeeper
ubuntu@kafka-swarm-node1:~/dockerlabs/solution/kafka-swarm$

Verifying Apache Kafka Stack

ubuntu@kafka-swarm-node1:~/dockerlabs/solution/kafka-swarm$ sudo docker stack ls
NAME                SERVICES            ORCHESTRATOR
mykafka             5                   Swarm

Verifying Apache Kafka Services

ubuntu@kafka-swarm-node1:~/dockerlabs/solution/kafka-swarm$ sudo docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                                                                                     PORTS
t04p6i8zky4z        mykafka_broker      replicated          0/3                 qnib/plain-kafka:2018-04-25_1.1.0                                                         *:9092->9092/tcp
r5f0x9clnwix        mykafka_manager     replicated          1/1                 qnib/plain-kafka-manager:2018-04-25                                                       *:9000->9000/tcp
jzwvrt4df66b        mykafka_producer    replicated          3/3                 qnib/golang-kafka-producer:2018-05-01.12                                                  
09lkbevsktt9        mykafka_zkui        replicated          1/1                 qnib/plain-zkui@sha256:30c4aa1236ee90e4274a9059a5fa87de2ee778d9bfa3cb48c4c9aafe7cfa1a13   *:9090->9090/tcp
b1hqfk1vc4lu        mykafka_zookeeper   replicated          1/1                 qnib/plain-zookeeper:2018-04-25                                                           *:2181->2181/tcp
ubuntu@kafka-swarm-node1:~/dockerlabs/solution/kafka-swarm$
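In the listing above, mykafka_broker reports 0/3 replicas, meaning none of its tasks were running yet when the output was captured. A small Python sketch like the one below can flag such services. It assumes the input is one "name replicas" pair per line, for example as produced by `docker service ls --format '{{.Name}} {{.Replicas}}'`; the function name `unconverged_services` is hypothetical and the sample data is copied from the listing.

```python
from typing import Iterable, List


def unconverged_services(lines: Iterable[str]) -> List[str]:
    """Return names of services running fewer replicas than desired."""
    pending = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, replicas = line.split()[:2]
        running, desired = (int(n) for n in replicas.split("/"))
        if running < desired:
            pending.append(name)
    return pending


if __name__ == "__main__":
    # Name and replica columns taken from the service listing above.
    sample = """\
mykafka_broker 0/3
mykafka_manager 1/1
mykafka_producer 3/3
mykafka_zkui 1/1
mykafka_zookeeper 1/1
"""
    print(unconverged_services(sample.splitlines()))
```

Re-running the check until it returns an empty list is one simple way to confirm that the whole stack has come up.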

In my next blog post, I will talk about the Docker Compose file that automates the overall Pico project, from the video streams captured on the Raspberry Pi through to object detection and analytics via the AWS Rekognition service. Stay tuned!
