Top Related Projects
Kafka (and Zookeeper) in Docker
Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors
Quick Overview
The wurstmeister/kafka-docker repository provides a Docker setup for Apache Kafka. It lets users deploy a Kafka environment in Docker containers, which makes it easier to set up and manage Kafka clusters for development and testing. Production use generally needs additional configuration.
Pros
- Easy and quick setup of Kafka environments using Docker
- Configurable and customizable through environment variables
- Supports single-node and multi-node Kafka clusters
- All image tags are rebuilt and pushed together whenever the image is updated
Cons
- May not be suitable for large-scale production deployments without additional configuration
- Limited documentation for advanced use cases
- Potential performance overhead due to containerization
- Requires familiarity with Docker for effective use and troubleshooting
Getting Started
To get started with wurstmeister/kafka-docker, follow these steps:
- Clone the repository:
git clone https://github.com/wurstmeister/kafka-docker.git
- Navigate to the project directory:
cd kafka-docker
- Set KAFKA_ADVERTISED_HOST_NAME in docker-compose.yml to match your Docker host IP (see Pre-Requisites in the README below).
- Start a single-node Kafka cluster:
docker-compose up -d
- To create a topic, run:
docker-compose exec kafka kafka-topics.sh --create --topic test-topic --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092
- To produce messages, use:
docker-compose exec kafka kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092
- To consume messages, use:
docker-compose exec kafka kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092
For more advanced configurations and multi-node setups, refer to the project's documentation on GitHub.
Competitor Comparisons
Kafka (and Zookeeper) in Docker
Pros of docker-kafka
- Simpler setup with fewer configuration options, making it easier for beginners
- Maintained by Spotify, potentially benefiting from their expertise in large-scale Kafka deployments
- Includes a pre-configured Zookeeper instance, simplifying the overall setup
Cons of docker-kafka
- Less actively maintained, with fewer recent updates compared to kafka-docker
- Limited configuration options, which may not suit more complex or customized deployments
- Lacks some advanced features and optimizations present in kafka-docker
Code Comparison
kafka-docker:
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    build: .
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
docker-kafka:
version: '2'
services:
  kafka:
    image: spotify/kafka
    ports:
      - "9092:9092"
    environment:
      ADVERTISED_HOST: localhost
      ADVERTISED_PORT: 9092
The kafka-docker example shows a more detailed configuration with separate Zookeeper and Kafka services, while docker-kafka provides a more streamlined setup with Zookeeper included in the Kafka image.
Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors
Pros of fast-data-dev
- Includes a comprehensive set of Kafka ecosystem tools (e.g., Schema Registry, Kafka Connect, REST Proxy)
- Provides a web UI for easier management and monitoring
- Designed for rapid development and testing of Kafka-based applications
Cons of fast-data-dev
- Larger image size due to the inclusion of multiple components
- May be overkill for simple Kafka setups or production environments
- Less flexibility in configuring individual components compared to kafka-docker
Code Comparison
fast-data-dev:
version: '2'
services:
  fast-data-dev:
    image: lensesio/fast-data-dev
    ports:
      - "2181:2181"
      - "9092:9092"
      - "8081:8081"
kafka-docker:
version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_HOST_NAME: localhost
The fast-data-dev example shows a single container with multiple ports exposed, while kafka-docker separates Zookeeper and Kafka into distinct services, offering more granular control over the setup.
README
kafka-docker
Dockerfile for Apache Kafka
The image is available directly from Docker Hub
Tags and releases
All versions of the image are built from the same set of scripts with only minor variations (i.e. certain features are not supported on older versions). The version format mirrors the Kafka format, <scala version>-<kafka version>. Initially, all images are built with the recommended version of Scala documented on http://kafka.apache.org/downloads. To list all available tags:
curl -s https://registry.hub.docker.com/v2/repositories/wurstmeister/kafka/tags\?page_size\=1024 | jq -r '.results[].name' | sort -u | grep -E '[0-9]\.[0-9]{2}-.*'
Every time the image is updated, all tags will be pushed with the latest updates. This should allow for greater consistency across tags, as well as any security updates that have been made to the base image.
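The tag format described above can be split into its two parts programmatically. The following is a minimal sketch, assuming tags look like 2.12-2.3.0 (an illustrative value, not taken from the tag list); the helper name parse_tag is hypothetical and not part of the image.

```python
import re

# Assumed tag shape, following the format described above:
# <scala version>-<kafka version>, e.g. "2.12-2.3.0" (illustrative value).
TAG_PATTERN = re.compile(r"^(?P<scala>\d+\.\d+)-(?P<kafka>\d+(?:\.\d+)+)$")


def parse_tag(tag):
    """Return (scala_version, kafka_version), or None for tags such as 'latest'."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    return match.group("scala"), match.group("kafka")


print(parse_tag("2.12-2.3.0"))  # ('2.12', '2.3.0')
print(parse_tag("latest"))      # None
```

Returning None for non-matching tags keeps the helper usable as a filter over the full tag list.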
Announcements
- 04-Jun-2019 - Update base image to openjdk 212 (Release notes). Please force pull to get these latest updates, including security patches etc.
Pre-Requisites
- install docker-compose https://docs.docker.com/compose/install/
- modify the KAFKA_ADVERTISED_HOST_NAME in docker-compose.yml to match your docker host IP (Note: Do not use localhost or 127.0.0.1 as the host ip if you want to run multiple brokers.)
- if you want to customize any Kafka parameters, simply add them as environment variables in docker-compose.yml, e.g. in order to increase the message.max.bytes parameter set the environment to KAFKA_MESSAGE_MAX_BYTES: 2000000. To turn off automatic topic creation set KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'false'
- Kafka's log4j usage can be customized by adding environment variables prefixed with LOG4J_. These will be mapped to log4j.properties. For example: LOG4J_LOGGER_KAFKA_AUTHORIZER_LOGGER=DEBUG, authorizerAppender
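The naming convention implied by the examples above (KAFKA_MESSAGE_MAX_BYTES for message.max.bytes) can be sketched as follows. This is only an illustration of the convention as inferred from those examples; the function name env_to_broker_property is hypothetical, and the image's own start script may treat some variables differently.

```python
def env_to_broker_property(name, prefix="KAFKA_"):
    """Map an environment variable name to a dotted, lower-case property name.

    Assumption: the prefix is dropped, the rest is lower-cased and
    underscores become dots, as the examples in the text suggest.
    """
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    return name[len(prefix):].lower().replace("_", ".")


print(env_to_broker_property("KAFKA_MESSAGE_MAX_BYTES"))
# message.max.bytes
print(env_to_broker_property("KAFKA_AUTO_CREATE_TOPICS_ENABLE"))
# auto.create.topics.enable
```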
NOTE: There are several 'gotchas' with configuring networking. If you are not sure about what the requirements are, please check out the Connectivity Guide in the Wiki
Usage
Start a cluster:
docker-compose up -d
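Add more brokers:
docker-compose scale kafka=3
(On Compose releases where the scale command is deprecated, use docker-compose up -d --scale kafka=3 instead.)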
Stop a cluster:
docker-compose stop
Note
The default docker-compose.yml should be seen as a starting point. By default each broker will get a new port number and broker id on restart. Depending on your use case this might not be desirable. If you need to use specific ports and broker ids, modify the docker-compose configuration accordingly, e.g. docker-compose-single-broker.yml:
docker-compose -f docker-compose-single-broker.yml up
Broker IDs
You can configure the broker id in different ways
- explicitly, using KAFKA_BROKER_ID
- via a command, using BROKER_ID_COMMAND, e.g. BROKER_ID_COMMAND: "hostname | awk -F'-' '{print $$2}'"
If you don't specify a broker id in your docker-compose file, it will automatically be generated (see https://issues.apache.org/jira/browse/KAFKA-1070). This allows scaling up and down. In this case it is recommended to use the --no-recreate option of docker-compose to ensure that containers are not re-created and thus keep their names and ids.
Automatically create topics
If you want to have kafka-docker automatically create topics in Kafka during creation, a KAFKA_CREATE_TOPICS environment variable can be added in docker-compose.yml.
Here is an example snippet from docker-compose.yml:
environment:
  KAFKA_CREATE_TOPICS: "Topic1:1:3,Topic2:1:1:compact"
Topic1 will have 1 partition and 3 replicas; Topic2 will have 1 partition, 1 replica and a cleanup.policy set to compact. Also, see FAQ: Topic compaction does not work
If you wish to use multi-line YAML or some other delimiter between your topic definitions, override the default , separator by specifying the KAFKA_CREATE_TOPICS_SEPARATOR environment variable.
For example, KAFKA_CREATE_TOPICS_SEPARATOR: "$$'\n'" would use a newline to split the topic definitions. Syntax has to follow docker-compose escaping rules, and ANSI-C quoting.
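To make the topic definition format concrete, here is a minimal sketch of how a KAFKA_CREATE_TOPICS value could be split into its fields. It assumes each entry is name:partitions:replicas with an optional fourth cleanup.policy field, as in the example above; TopicSpec and parse_create_topics are hypothetical names, and this is not the script the image actually runs.

```python
from typing import NamedTuple, Optional


class TopicSpec(NamedTuple):
    name: str
    partitions: int
    replicas: int
    cleanup_policy: Optional[str]


def parse_create_topics(value, separator=","):
    """Split a KAFKA_CREATE_TOPICS-style string into TopicSpec tuples."""
    specs = []
    for entry in value.split(separator):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing separators and blank lines
        fields = entry.split(":")
        if len(fields) not in (3, 4):
            raise ValueError(
                f"expected name:partitions:replicas[:cleanup.policy], got {entry!r}"
            )
        name, partitions, replicas = fields[:3]
        policy = fields[3] if len(fields) == 4 else None
        specs.append(TopicSpec(name, int(partitions), int(replicas), policy))
    return specs


for spec in parse_create_topics("Topic1:1:3,Topic2:1:1:compact"):
    print(spec)
```

Passing separator="\n" handles the multi-line form that KAFKA_CREATE_TOPICS_SEPARATOR enables.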
Advertised hostname
You can configure the advertised hostname in different ways
- explicitly, using KAFKA_ADVERTISED_HOST_NAME
- via a command, using HOSTNAME_COMMAND, e.g. HOSTNAME_COMMAND: "route -n | awk '/UG[ \t]/{print $$2}'"
When using commands, make sure you review the "Variable Substitution" section in https://docs.docker.com/compose/compose-file/.
If KAFKA_ADVERTISED_HOST_NAME is specified, it takes precedence over HOSTNAME_COMMAND.
For AWS deployment, you can use the Metadata service to get the container host's IP:
HOSTNAME_COMMAND=wget -t3 -T2 -qO- http://169.254.169.254/latest/meta-data/local-ipv4
Reference: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
Injecting HOSTNAME_COMMAND into configuration
If you require the value of HOSTNAME_COMMAND in any of your other KAFKA_XXX variables, use the _{HOSTNAME_COMMAND} string in your variable value, e.g.
KAFKA_ADVERTISED_LISTENERS=SSL://_{HOSTNAME_COMMAND}:9093,PLAINTEXT://:9092
Advertised port
If the required advertised port is not static, it may be necessary to determine this programmatically. This can be done with the PORT_COMMAND environment variable.
PORT_COMMAND: "docker port $$(hostname) 9092/tcp | cut -d: -f2"
This can then be interpolated in any other KAFKA_XXX config using the _{PORT_COMMAND} string, e.g.
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://1.2.3.4:_{PORT_COMMAND}
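The placeholder mechanism for _{HOSTNAME_COMMAND} and _{PORT_COMMAND} amounts to a string substitution. The sketch below illustrates the idea under stated assumptions: the command outputs are passed in as plain strings rather than executed, the values 10.0.0.5 and 32768 are made up, and the function name interpolate is hypothetical.

```python
def interpolate(value, command_outputs):
    """Replace each _{NAME} placeholder with the corresponding command output.

    command_outputs maps a variable name such as "HOSTNAME_COMMAND" to the
    text that command printed; trailing newlines are stripped.
    """
    for name, output in command_outputs.items():
        value = value.replace("_{" + name + "}", output.strip())
    return value


listeners = "SSL://_{HOSTNAME_COMMAND}:9093,PLAINTEXT://1.2.3.4:_{PORT_COMMAND}"
outputs = {"HOSTNAME_COMMAND": "10.0.0.5\n", "PORT_COMMAND": "32768\n"}  # made-up values
print(interpolate(listeners, outputs))
# SSL://10.0.0.5:9093,PLAINTEXT://1.2.3.4:32768
```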
Listener Configuration
It may be useful to have the Kafka Documentation open, to understand the various broker listener configuration options.
Since 0.9.0, Kafka has supported multiple listener configurations for brokers to help support different protocols and discriminate between internal and external traffic. Later versions of Kafka have deprecated advertised.host.name and advertised.port.
NOTE: advertised.host.name and advertised.port still work as expected, but should not be used if configuring the listeners.
Example
The example environment below:
HOSTNAME_COMMAND: curl http://169.254.169.254/latest/meta-data/public-hostname
KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094
KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://:9094
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
will result in the following broker config:
advertised.listeners = OUTSIDE://ec2-xx-xx-xxx-xx.us-west-2.compute.amazonaws.com:9094,INSIDE://:9092
listeners = OUTSIDE://:9094,INSIDE://:9092
inter.broker.listener.name = INSIDE
Rules
- No listeners may share a port number.
- An advertised.listener must be present by protocol name and port number in the list of listeners.
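The two rules above can be checked before starting a broker. The following is a rough sketch that covers only these two rules (Kafka's own validation is more extensive); parse_listeners and check_listener_rules are hypothetical helpers, and the host name ec2.example is a placeholder.

```python
def parse_listeners(value):
    """Parse 'NAME://host:port,...' into a {name: port} mapping."""
    result = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        _host, _, port = address.rpartition(":")
        result[name] = int(port)
    return result


def check_listener_rules(listeners, advertised_listeners):
    """Return a list of rule violations; an empty list means both rules hold."""
    problems = []
    listening = parse_listeners(listeners)
    advertised = parse_listeners(advertised_listeners)

    # Rule 1: no two listeners may share a port number.
    ports = list(listening.values())
    for port in sorted(set(ports)):
        if ports.count(port) > 1:
            problems.append(f"port {port} is used by more than one listener")

    # Rule 2: each advertised listener must match a listener by name and port.
    for name, port in advertised.items():
        if listening.get(name) != port:
            problems.append(f"advertised listener {name}:{port} has no matching listener")
    return problems


print(check_listener_rules(
    "INSIDE://:9092,OUTSIDE://:9094",
    "INSIDE://:9092,OUTSIDE://ec2.example:9094",
))  # []
```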
Broker Rack
You can configure the broker rack affinity in different ways
- explicitly, using KAFKA_BROKER_RACK
- via a command, using RACK_COMMAND, e.g. RACK_COMMAND: "curl http://169.254.169.254/latest/meta-data/placement/availability-zone"
In the above example the AWS metadata service is used to put the instance's availability zone in the broker.rack property.
JMX
For monitoring purposes you may wish to configure JMX. In addition to the standard JMX parameters, problems could arise from the underlying RMI protocol used to connect:
- java.rmi.server.hostname - interface to bind listening port
- com.sun.management.jmxremote.rmi.port - the port to service RMI requests
For example, to connect to a Kafka broker running locally (assuming port 1099 is exposed):
KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote.rmi.port=1099"
JMX_PORT: 1099
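JConsole can now connect with jconsole 192.168.99.100:1099 (substitute the address of your Docker host; java.rmi.server.hostname should be an address the JConsole client can reach).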
Docker Swarm Mode
The listener configuration above is necessary when deploying Kafka in a Docker Swarm using an overlay network. By separating OUTSIDE and INSIDE listeners, a host can communicate with clients outside the overlay network while still benefiting from it from within the swarm.
In addition to the multiple-listener configuration, additional best practices for operating Kafka in a Docker Swarm include:
- Use "mode: global" under "deploy" in a compose file to launch one and only one Kafka broker per swarm node.
- Use compose file version '3.2' (minimum Docker version 17.04) and the "long" port definition with the port in "host" mode instead of the default "ingress" load-balanced port binding. This ensures that outside requests are always routed to the correct broker. For example:
ports:
  - target: 9094
    published: 9094
    protocol: tcp
    mode: host
Older compose files using the short-version of port mapping may encounter Kafka client issues if their connection to individual brokers cannot be guaranteed.
See the included sample compose file docker-compose-swarm.yml
Release process
See the wiki for information on adding or updating versions to release to Dockerhub.