Abraham Dahunsi, Web Developer 🌐 | Technical Writer ✍️ | DevOps Enthusiast 👨‍💻 | Python 🐍

How to Develop Event-Driven Applications with Kafka and Docker


Event-driven architectures have become increasingly popular with the rise of microservices. They are built around reacting to events in real time, which makes them highly scalable and well suited to modern distributed systems. Apache Kafka, a distributed event streaming platform, often sits at the heart of these architectures and is an excellent foundation for event-driven applications.

While Kafka is powerful, setting up and running your own Kafka instance for development can be challenging. This is where Docker comes in: by containerizing Kafka and its dependencies, Docker makes the setup portable and consistent, so the broker runs the same way regardless of where it's deployed.

This tutorial will guide you through setting up Kafka with Docker to create an event-driven application. By the end, you will have a functional system where events are produced, consumed, and processed in real-time.

Prerequisites

Before you begin:

  • Docker and Docker Compose Installed: Install Docker and Docker Compose on your system. Refer to the official Docker installation guide for instructions.
  • Minimum System Requirements: Ensure you have at least 4 GB RAM and 2 CPUs for smooth operation.

Step 1: Setting Up the Dockerized Kafka Environment

Since Kafka 3.3, deploying Kafka has become much simpler: KRaft (Kafka Raft) mode removes the dependency on ZooKeeper. We'll use this simplified, ZooKeeper-free setup with Docker.

  1. Create a working directory for your project:

    mkdir kafka-docker-tutorial
    cd kafka-docker-tutorial
    
  2. Create a compose.yaml file in this directory with the following content:

    services:
      kafka:
        image: apache/kafka
        ports:
          - "9092:9092"
        environment:
          # Configure listeners for both docker and host communication
          KAFKA_LISTENERS: CONTROLLER://localhost:9091,HOST://0.0.0.0:9092,DOCKER://0.0.0.0:9093
          KAFKA_ADVERTISED_LISTENERS: HOST://localhost:9092,DOCKER://kafka:9093
          KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,DOCKER:PLAINTEXT,HOST:PLAINTEXT
    
          # Settings required for KRaft mode
          KAFKA_NODE_ID: 1
          KAFKA_PROCESS_ROLES: broker,controller
          KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
          KAFKA_CONTROLLER_QUORUM_VOTERS: 1@localhost:9091
    
          # Listener to use for broker-to-broker communication
          KAFKA_INTER_BROKER_LISTENER_NAME: DOCKER
    
          # Required for a single node cluster
          KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    
  3. Start the Kafka environment:

    docker compose up -d
    
  4. Verify that Kafka is running:

    docker ps
    

    You should see the apache/kafka service listed.

  5. To verify the cluster is up and running and get its cluster ID, run:

    docker exec -ti kafka /opt/kafka/bin/kafka-cluster.sh cluster-id --bootstrap-server :9092
    

    This should produce output similar to:

    Cluster ID: 5L6g3nShT-eMCtK--X86sw
    

This setup uses the latest apache/kafka image, which includes helpful scripts to manage and work with Kafka. It sets up a single-node Kafka cluster using KRaft mode, eliminating the need for Zookeeper.

The configuration allows for connections from both the host machine and other Docker containers, making it versatile for different development scenarios. The HOST listener (port 9092) is for host connections, while the DOCKER listener (port 9093) is for connections from other containers.
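For example (purely as a sanity check, and assuming the Compose service name kafka resolves on the Docker network, which it normally does), you can point one of the bundled scripts at the DOCKER listener, while clients on your host keep using localhost:9092:

    # kafka-topics.sh runs inside the container here, so it can use the DOCKER listener (kafka:9093)
    docker exec -ti kafka /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server kafka:9093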

Docker Compose simplifies the process of defining and running this multi-container Docker application, making it easy to start, stop, and manage your Kafka environment.

Step 2: Create a Kafka Topic

Topics in Kafka act as channels where events are sent and received. Let's create a topic for our application.

  1. The apache/kafka image includes helpful scripts in the /opt/kafka/bin directory that you can use to create and manage topics. To publish messages to a topic named demo (Kafka will create the topic automatically the first time a message is written to it), start a console producer with the following command:

    docker exec -ti kafka /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server :9092 --topic demo
    

    This command will start a console producer. You can enter messages, one per line. For example:

    First message
    Second message
    

    Press Enter to send the last message and then press Ctrl+C when you're done. These messages will be published to Kafka, creating the topic if it doesn't already exist.

  2. To verify the topic creation and consume the messages we just published, use the following command:

    docker exec -ti kafka /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server :9092 --topic demo --from-beginning
    

    You should see the messages you published:

    First message
    Second message
    

    Press Ctrl+C to stop consuming messages.

  3. If you want to list all topics, you can use the following command:

    docker exec -ti kafka /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server :9092
    

    The demo topic should be listed among any other topics that exist.

This approach leverages the scripts provided by the apache/kafka image, making it easier to interact with Kafka directly from your host machine. It also demonstrates how to both produce and consume messages, which is crucial for understanding how Kafka topics work.

Note: Because we let Kafka auto-create the topic, we didn't specify partitions or a replication factor explicitly; the broker applies its defaults, which are appropriate for this single-node KRaft cluster.
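If you do want to control those settings yourself, the same image's kafka-topics.sh script supports explicit topic creation. As an illustration only (the rest of this tutorial relies on auto-creation, and the topic name orders here is hypothetical), this would create a three-partition topic:

    # Hypothetical example: explicitly create a topic named "orders" with 3 partitions
    docker exec -ti kafka /opt/kafka/bin/kafka-topics.sh --create --topic orders --partitions 3 --replication-factor 1 --bootstrap-server :9092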

Step 3: Write a Kafka Producer

The producer sends events to the Kafka topic. We'll write a small Node.js script with the KafkaJS library to simulate event production.

  1. First, make sure you have Node.js installed on your host machine.

  2. Create a new directory for your application and navigate into it:

    mkdir kafka-app
    cd kafka-app
    
  3. Initialize a new Node.js project and install the required dependencies:

    npm init -y
    npm install kafkajs
    
  4. Create a file named producer.js and add the following code:

    const { Kafka } = require('kafkajs')
    
    const kafka = new Kafka({
      clientId: 'my-app',
      brokers: ['localhost:9092']
    })
    
    const producer = kafka.producer()
    
    const run = async () => {
      await producer.connect()
    
      const events = [
        { event_id: 1, description: "Event One" },
        { event_id: 2, description: "Event Two" },
        { event_id: 3, description: "Event Three" }
      ]
    
      for (const event of events) {
        await producer.send({
          topic: 'demo',
          messages: [
            { value: JSON.stringify(event) },
          ],
        })
        console.log(`Produced: ${JSON.stringify(event)}`)
        await new Promise(resolve => setTimeout(resolve, 1000))
      }
    
      await producer.disconnect()
    }
    
    run().catch(console.error)
    

    This script creates a Kafka producer, connects to the Kafka broker we set up in Docker, and sends three events to the 'demo' topic we created earlier.

  5. Run the producer script:

    node producer.js
    

    You should see logs indicating that events are being sent to the demo topic.

  6. To verify that the messages were published into the cluster, you can use the Kafka console consumer we used earlier:

    docker exec -ti kafka /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server :9092 --topic demo --from-beginning
    

    You should see the messages you just produced.

This approach uses Node.js and the KafkaJS library, a popular Kafka client for JavaScript. The producer connects to the Kafka broker we set up in Docker through the host machine's localhost, since we exposed port 9092 in our Docker setup.

Note: Stop the consumer (Ctrl+C) when you're done viewing the messages.

Step 4: Write a Kafka Consumer

Consumers read events from a Kafka topic. We'll create a Node.js script for this, consistent with our producer setup.

  1. In the same kafka-app directory where you created the producer, create a new file named consumer.js with the following content:

    const { Kafka } = require('kafkajs')
    
    const kafka = new Kafka({
      clientId: 'my-app',
      brokers: ['localhost:9092']
    })
    
    const consumer = kafka.consumer({ groupId: 'demo' })
    
    const run = async () => {
      await consumer.connect()
      await consumer.subscribe({ topic: 'demo', fromBeginning: true })
    
      await consumer.run({
        eachMessage: async ({ topic, partition, message }) => {
          console.log({
            value: JSON.parse(message.value.toString()),
          })
        },
      })
    }
    
    run().catch(console.error)
    

    This script creates a Kafka consumer that connects to our Docker-based Kafka broker, subscribes to the 'demo' topic, and logs each message it receives.

  2. Run the consumer script:

    node consumer.js
    

    The consumer will output the events it reads from the demo topic. Keep this running in a separate terminal window.

  3. To test the consumer, you can run the producer script again in another terminal:

    node producer.js
    

    You should see the consumer logging the messages produced by the producer.

The consumer connects to the Kafka broker we set up in Docker through the host machine's localhost, since we exposed port 9092 in our Docker setup. It subscribes to the 'demo' topic and reads messages from the beginning of the topic (fromBeginning: true), the KafkaJS equivalent of setting auto.offset.reset to earliest.

Note: Kafka message processing is asynchronous. If you don't see messages immediately, it doesn't necessarily mean there's an error. You might need to wait a short while for the consumer to process messages, especially if there are many messages in the topic.
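Because the consumer runs until you stop it, you may also want it to disconnect from the broker cleanly when you press Ctrl+C. Here is a minimal, optional sketch you could append to the bottom of consumer.js (KafkaJS exposes consumer.disconnect() for this):

    // Optional: disconnect cleanly when the process receives Ctrl+C (SIGINT)
    process.on('SIGINT', async () => {
      try {
        await consumer.disconnect()
      } finally {
        process.exit(0)
      }
    })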

Step 5: Integrate Event Processing

Now that we have a producer and consumer, let's add basic processing logic to simulate handling these events in our Node.js application.

  1. Update the consumer.js script:

    const { Kafka } = require('kafkajs')
    
    const kafka = new Kafka({
      clientId: 'my-app',
      brokers: ['localhost:9092']
    })
    
    const consumer = kafka.consumer({ groupId: 'demo' })
    
    function processEvent(event) {
      console.log(`Processing Event ID ${event.event_id}: ${event.description}`)
      // Add your event processing logic here
    }
    
    const run = async () => {
      await consumer.connect()
      await consumer.subscribe({ topic: 'demo', fromBeginning: true })
    
      await consumer.run({
        eachMessage: async ({ topic, partition, message }) => {
          const event = JSON.parse(message.value.toString())
          processEvent(event)
        },
      })
    }
    
    run().catch(console.error)
    

    This script now includes a processEvent function that simulates handling each event. You can add more complex logic to this function as needed for your application; a small illustrative sketch follows at the end of this step.

  2. Run the updated consumer:

    node consumer.js
    

    Processed events will now display their descriptions in real-time.

  3. To test the event processing, run the producer in another terminal:

    node producer.js
    

    You should see the consumer logging the processed events.
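If you want processEvent to do more than log, here is one illustrative direction: a hypothetical in-memory tally that stands in for real logic such as validation, enrichment, or database writes:

    // Hypothetical example: count how many times each event ID has been processed
    const seen = {}

    function processEvent(event) {
      seen[event.event_id] = (seen[event.event_id] || 0) + 1
      console.log(`Processing Event ID ${event.event_id}: ${event.description} (seen ${seen[event.event_id]} time(s))`)
    }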

Step 6: Scale Consumers

Kafka allows you to scale consumers for better throughput. Within a single consumer group, Kafka assigns each partition to at most one consumer, so additional consumers only add parallelism when the topic has enough partitions; keep in mind that our auto-created demo topic has a single partition by default. With that caveat, here's how you can scale consumers in our Node.js application:

  1. First, modify your consumer.js to include a unique identifier for each consumer instance:

    const { Kafka } = require('kafkajs')
    const crypto = require('crypto')
    
    const kafka = new Kafka({
      clientId: 'my-app',
      brokers: ['localhost:9092']
    })
    
    const consumerId = crypto.randomBytes(4).toString('hex')
    const consumer = kafka.consumer({ groupId: 'demo' })
    
    function processEvent(event) {
      console.log(`Consumer ${consumerId} processing Event ID ${event.event_id}: ${event.description}`)
    }
    
    const run = async () => {
      await consumer.connect()
      await consumer.subscribe({ topic: 'demo', fromBeginning: true })
    
      await consumer.run({
        eachMessage: async ({ topic, partition, message }) => {
          const event = JSON.parse(message.value.toString())
          processEvent(event)
        },
      })
    }
    
    run().catch(console.error)
    
  2. To scale consumers, simply run multiple instances of this script. You can do this by opening multiple terminal windows and running the consumer script in each:

    node consumer.js
    

    Each instance will have a unique consumerId, allowing you to track which consumer is processing each event.

  3. If you're using Docker Compose, you can scale services with the --scale option. This requires adding a consumer service to your Compose file (and a Dockerfile for the Node.js app; see the sketch after this list). Add the following to your compose.yaml:

    services:
      # ... existing kafka service ...
    
      consumer:
        build: .
        command: node consumer.js
        depends_on:
          - kafka
    

    Then you can use the --scale option to create multiple consumer instances:

    docker compose up --scale consumer=3 -d
    

    This will create three instances of your consumer service.
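The snippet above makes two assumptions that the rest of this tutorial doesn't cover, so treat what follows as a sketch rather than a drop-in setup. First, build: . expects a Dockerfile in the project directory that packages the Node.js app; a minimal, hypothetical one might look like this:

    # Hypothetical Dockerfile for the kafka-app consumer
    FROM node:20-alpine
    WORKDIR /app
    COPY package*.json ./
    RUN npm install
    COPY . .
    CMD ["node", "consumer.js"]

Second, a consumer running inside a container cannot reach the broker at localhost:9092; it must use the DOCKER listener instead. One way to keep a single script working both on the host and in a container is to read the broker list from an environment variable in consumer.js, for example brokers: [process.env.KAFKA_BROKERS || 'localhost:9092'], and set KAFKA_BROKERS=kafka:9093 in the consumer service's environment. Finally, remember that extra consumers in the same group only add parallelism if the demo topic has more than one partition; you can add partitions with kafka-topics.sh --alter --topic demo --partitions 3 --bootstrap-server :9092.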

Step 7: Clean Up Resources

After testing your event-driven application, clean up the Docker environment to free resources.

  1. Stop the containers:

    docker compose down
    
  2. Remove unused containers, networks, and images (add the --volumes flag if you also want to remove unused volumes):

    docker system prune -a
    

Conclusion

You have successfully developed an event-driven application using Kafka and Docker. This setup allows you to produce, consume, and process events efficiently in a distributed system. Expand the application further by adding features like error handling, logging, or deploying a scalable Kafka cluster in a production environment. For more advanced use cases, refer to the official Kafka documentation.
