Paper: https://pages.cs.wisc.edu/~akella/CS744/F17/838-CloudPapers/Kafka.pdf

Kafka is a distributed publish-subscribe messaging system that supports high-throughput, fault-tolerant data processing. It consists of the following components:

  • Producer: The producer is responsible for sending messages to Kafka brokers. It can be a client application or a server that generates log data.

  • Broker: A Kafka broker is a server in the Kafka cluster that stores the partitions of one or more topics and serves them to consumers. A cluster scales horizontally by adding brokers.

  • Topic: A topic is a category or feed name to which messages are published by producers. Topics can be partitioned to enable parallel processing and improved scalability.

  • Partition: A partition is an ordered, append-only sequence of messages (a logical log), and each message in it is identified by its offset. A topic is divided into one or more partitions, which are spread across brokers for load balancing and, in current Kafka versions, replicated across brokers for fault tolerance.

  • Consumer: A consumer is an application or service that subscribes to one or more topics and pulls messages from the brokers, tracking its position in each partition by offset.

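  • Consumer Group: A consumer group is a set of consumers that jointly consume a subscribed topic. Each partition is assigned to exactly one consumer in the group, so each message is delivered to only one consumer within that group. Different groups consume the same topic independently. Kafka's default delivery guarantee is at-least-once, so applications that need each message processed exactly once must handle duplicates themselves.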

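  • ZooKeeper: Kafka uses ZooKeeper for cluster coordination (for example, detecting brokers and consumers that join or leave and triggering consumer rebalancing), configuration management, and leader election. Newer Kafka versions can replace ZooKeeper with the built-in KRaft consensus protocol.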
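To make the relationship between topics, partitions, offsets and consumer groups concrete, here is a toy single-process model. It is only an illustration under stated assumptions, not Kafka's implementation or API: the class names, the CRC32-based key partitioner and the round-robin assignment are all hypothetical choices (Kafka's own partitioner and group assignment strategies differ in detail).

```python
from collections import defaultdict
from zlib import crc32


class Partition:
    """An ordered, append-only log; each message is addressed by its offset."""

    def __init__(self):
        self.messages = []

    def append(self, message: bytes) -> int:
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new message

    def read(self, offset: int, max_messages: int = 100) -> list:
        # Reading deletes nothing; the caller keeps track of its own offset
        return self.messages[offset:offset + max_messages]


class Topic:
    """A named topic split into a fixed number of partitions."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def publish(self, key: bytes, message: bytes) -> tuple:
        # Hypothetical partitioner: hashing the key sends equal keys to the same partition
        p = crc32(key) % len(self.partitions)
        return p, self.partitions[p].append(message)


def assign_partitions(num_partitions: int, consumers: list) -> dict:
    """Hypothetical round-robin assignment: each partition gets exactly one owner."""
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)


topic = Topic("my_topic", num_partitions=3)
for i in range(6):
    topic.publish(key=f"user-{i % 2}".encode(), message=f"message {i}".encode())

group = assign_partitions(len(topic.partitions), ["consumer-a", "consumer-b"])
print(group)  # {'consumer-a': [0, 2], 'consumer-b': [1]}

# The group keeps one offset per partition; each consumer reads only the partitions it owns
offsets = {p: 0 for p in range(len(topic.partitions))}
received = []
for consumer, owned in group.items():
    for p in owned:
        batch = topic.partitions[p].read(offsets[p])
        offsets[p] += len(batch)
        received.extend(batch)
print(len(received))  # 6
```

Because every partition has exactly one owner in the group, each of the six messages is read once by the group as a whole; a second group would keep its own `offsets` dictionary and could read all six again.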

Usage

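A minimal example using the kafka-python client, assuming a broker is running at localhost:9092:

    from kafka import KafkaProducer, KafkaConsumer

    # Producer connected to the local broker
    producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

    # Consumer subscribed to 'my_topic'; start from the earliest offset and
    # stop iterating after 10 seconds without new messages
    consumer = KafkaConsumer('my_topic',
                             bootstrap_servers=['localhost:9092'],
                             auto_offset_reset='earliest',
                             consumer_timeout_ms=10000)

    # send() is asynchronous, so flush() waits until the messages are delivered
    for i in range(10):
        message = 'message {}'.format(i)
        producer.send('my_topic', message.encode())
    producer.flush()

    # Consumer reads messages from the topic
    for message in consumer:
        print(message.value)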

Alternatives

Kafka's closest competitor today is arguably RabbitMQ. Both are distributed messaging systems that provide reliable messaging between applications, and although their functionality overlaps, their architectures and design philosophies differ considerably.

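Kafka is built around a durable, partitioned log: messages are retained for a configurable period after they are read, consumers pull them at their own pace, and old data can be replayed, which suits high-throughput streaming. RabbitMQ is a traditional message broker that routes messages through exchanges to queues, typically pushes them to consumers, and removes each message once it is acknowledged, which gives it flexible routing. As a result, Kafka is commonly used for real-time analytics, event-driven architectures, and stream processing, while RabbitMQ is commonly used for messaging, task queues, and workflow orchestration.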
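The core difference between a retained log and a classic queue can be sketched in a few lines. This is a deliberately simplified model, not the API of either system: `LogTopic` and `WorkQueue` are hypothetical names, receipt and acknowledgment are collapsed into one step, and features such as RabbitMQ exchanges fanning a message out to several queues are ignored.

```python
from collections import deque


class LogTopic:
    """Log-style (Kafka-like): messages are retained; each group tracks its own offset."""

    def __init__(self):
        self.log = []
        self.offsets = {}  # consumer group -> next offset to read

    def publish(self, msg):
        self.log.append(msg)

    def poll(self, group):
        start = self.offsets.get(group, 0)
        batch = self.log[start:]
        self.offsets[group] = len(self.log)
        return batch

    def seek(self, group, offset):
        self.offsets[group] = offset  # rewind to replay retained messages


class WorkQueue:
    """Queue-style (RabbitMQ-like): a message is gone once a consumer takes and acks it."""

    def __init__(self):
        self.pending = deque()

    def publish(self, msg):
        self.pending.append(msg)

    def get(self):
        # Simplification: receiving a message and acknowledging it happen together
        return self.pending.popleft() if self.pending else None


log, queue = LogTopic(), WorkQueue()
for msg in ["a", "b", "c"]:
    log.publish(msg)
    queue.publish(msg)

print(log.poll("analytics"))  # ['a', 'b', 'c']
print(log.poll("billing"))    # ['a', 'b', 'c'] (independent group, same data)
log.seek("analytics", 0)
print(log.poll("analytics"))  # ['a', 'b', 'c'] (replayed)
print([queue.get() for _ in range(4)])  # ['a', 'b', 'c', None]
```

In the log model, reading is cheap bookkeeping on an offset, so many groups and replays cost nothing extra; in the queue model, consuming a message removes it, which is what makes per-message work distribution straightforward.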

Both systems serve similar purposes, but their different strengths make each better suited to particular use cases. The choice between Kafka and RabbitMQ (or any other messaging system) should depend on the requirements of the project at hand.