RabbitMQ vs Kafka: Which Platform Should You Choose in 2023?

Have you ever found yourself standing at a crossroads, trying to decide between RabbitMQ and Kafka for your microservices-based system? Have you ever wondered which of these messaging platforms is most suitable for your use case?

RabbitMQ and Apache Kafka are well-known solutions in the asynchronous messaging domain, but despite popular belief, they aren’t one-size-fits-all solutions. As a software architect, I have seen firsthand how the wrong decision can break what would otherwise be a solid software architecture.

Dive deeper with me as I demystify the basic patterns of asynchronous messaging. I’ll unveil the unique structures of Apache Kafka and RabbitMQ, guiding you through their intricate designs and mechanisms.

Keep reading to discover the key differences, the pros and cons, and get expert advice on how to make the right choice for your unique scenario. Your perfect match might be just a few paragraphs away!

Key Takeaways

RabbitMQ is a message broker, while Kafka is a distributed streaming platform.
Kafka excels in real-time data streaming and guarantees in-order message processing within topic partitions.
RabbitMQ provides advanced message routing based on subscriber rules and supports features like Message TTL.
Kafka lacks built-in filtering capabilities and TTL mechanisms but scales well through partitioning.
The choice between RabbitMQ and Kafka depends on specific requirements and the desired use case.

What Are Asynchronous Messaging Patterns?

Asynchronous messaging is a messaging scheme where message production by a producer is decoupled from its processing by a consumer. When dealing with messaging systems, we typically identify two main messaging patterns – Message Queuing and Publish/Subscribe.

Message Queuing

In the Message Queuing communication pattern, queues temporally decouple producers from consumers. Queues also allow us to scale producers and consumers independently and provide a degree of fault tolerance against processing errors.

Multiple producers can send messages to the same queue; however, when a consumer processes a message, it is locked or removed from the queue and is no longer available. Only a single consumer consumes a specific message.

Message Queuing

If the consumer fails to process a specific message, the messaging platform typically returns the message to the queue, making it available to other consumers.
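To make the lock-and-return behavior concrete, here is a minimal in-memory sketch in Python. It is not the API of any real broker; the `SimpleQueue` class and its `receive`, `ack`, and `nack` methods are names I made up for illustration.

```python
from collections import deque


class SimpleQueue:
    """In-memory stand-in for a broker queue (hypothetical, not a real broker API)."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer
        self._in_flight = {}    # delivery tag -> message locked by a consumer
        self._next_tag = 0

    def publish(self, message):
        self._ready.append(message)

    def receive(self):
        """Lock the next message for one consumer; returns (tag, message) or None."""
        if not self._ready:
            return None
        self._next_tag += 1
        message = self._ready.popleft()
        self._in_flight[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        """Processing succeeded: the message is removed for good."""
        del self._in_flight[tag]

    def nack(self, tag):
        """Processing failed: put the message back so another consumer can take it."""
        self._ready.appendleft(self._in_flight.pop(tag))


queue = SimpleQueue()
queue.publish("order-1")
queue.publish("order-2")

tag_a, msg_a = queue.receive()   # consumer A locks "order-1"
tag_b, msg_b = queue.receive()   # consumer B locks "order-2"
queue.nack(tag_a)                # A fails; "order-1" becomes available again
tag_c, msg_c = queue.receive()   # consumer B now picks up "order-1"
queue.ack(tag_b)
queue.ack(tag_c)
```

While a message is in flight, no other consumer can see it, which is what keeps each message with a single consumer at a time.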

Publish/Subscribe

In the Publish/Subscribe (or Pub/Sub) communication pattern, multiple subscribers can receive and process a single message concurrently.

Publish/Subscribe

For example, this pattern allows a publisher to notify all subscribers that something has happened in the system.

Many messaging platforms associate pub/sub with the term topics. In RabbitMQ, topics are a specific type of pub/sub implementation (a type of exchange, to be exact), but throughout this post, I refer to topics as a representation of pub/sub as a whole.

Generically speaking, there are two types of subscriptions:

An ephemeral subscription, where the subscription is only active as long as the consumer is up and running. Once the consumer shuts down, its subscription and yet-to-be-processed messages are lost.
A durable subscription, where the subscription is maintained as long as it is not explicitly deleted. When the consumer shuts down, the messaging platform maintains the subscription, and message processing can be resumed later.
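The difference between the two subscription types can be sketched with a toy topic in Python. The `Topic` class and its method names are hypothetical and exist only for this illustration; real platforms expose this choice through their own APIs.

```python
class Topic:
    """Toy pub/sub topic; class and method names are illustrative only."""

    def __init__(self):
        self._backlogs = {}     # subscriber name -> messages not yet consumed
        self._durable = set()   # names of durable subscriptions

    def subscribe(self, name, durable):
        self._backlogs.setdefault(name, [])
        if durable:
            self._durable.add(name)

    def publish(self, message):
        # Every existing subscription receives its own copy of the message.
        for backlog in self._backlogs.values():
            backlog.append(message)

    def disconnect(self, name):
        # An ephemeral subscription disappears together with its backlog.
        if name not in self._durable:
            del self._backlogs[name]

    def drain(self, name):
        """Return and clear the pending messages of a subscription, if it exists."""
        backlog = self._backlogs.get(name)
        if backlog is None:
            return []
        pending = list(backlog)
        backlog.clear()
        return pending


topic = Topic()
topic.subscribe("audit", durable=True)
topic.subscribe("dashboard", durable=False)
topic.publish("user-created")

topic.disconnect("audit")
topic.disconnect("dashboard")
topic.publish("user-deleted")   # published while both consumers are down

audit_messages = topic.drain("audit")           # ['user-created', 'user-deleted']
dashboard_messages = topic.drain("dashboard")   # []
```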

What is RabbitMQ?

RabbitMQ is an open-source Message Broker and is also often referred to as a Service Bus. It natively supports both messaging patterns described above.

Other popular messaging solutions include ActiveMQ, Azure Service Bus, and Amazon Simple Queue Service (SQS), as well as ZeroMQ, which is a brokerless messaging library rather than a broker. These implementations have a lot in common; many concepts described in this post apply to most of them.

Queues

RabbitMQ supports classic Message Queuing out of the box. A developer defines named queues, and then publishers can send messages to that named queue. Consumers, in turn, use the same queue to retrieve messages to process them.

It’s worth mentioning that under the hood, this simple setup goes through RabbitMQ’s default exchange, a pre-declared “Direct Exchange” that routes each message to the queue whose name matches the message’s routing key. You can read more on Message Exchanges in the following section.

Message Exchanges

RabbitMQ implements Pub/Sub via the use of message exchanges. A publisher publishes its messages to a message exchange without knowing who the subscribers of these messages are.

Each consumer wishing to subscribe to an exchange creates its own queue; the message exchange then queues produced messages for consumers to consume. It can also filter messages for some subscribers based on various routing rules.

RabbitMQ Message Exchange

It is important to note that RabbitMQ supports both ephemeral and durable subscriptions. A consumer can decide the type of subscription they’d like to employ via RabbitMQ’s API.

Due to RabbitMQ’s architecture, we can also create a hybrid approach where some subscribers form consumer groups who work together processing messages in the form of competing consumers over a specific queue. In this manner, we implement the pub/sub pattern while allowing some subscribers to scale up to handle received messages.

Pub/Sub & Queuing Combined
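Here is a rough Python sketch of that hybrid setup, under the assumption of a fanout-style exchange that copies every message to every bound queue. `FanoutExchange` and the worker names are hypothetical; the round-robin hand-off merely stands in for however the broker distributes messages among competing consumers.

```python
from collections import deque
from itertools import cycle


class FanoutExchange:
    """Toy exchange that copies each message to every bound queue."""

    def __init__(self):
        self.queues = {}

    def bind(self, queue_name):
        self.queues[queue_name] = deque()

    def publish(self, message):
        for queue in self.queues.values():
            queue.append(message)


exchange = FanoutExchange()
exchange.bind("email")       # one subscriber with a single consumer
exchange.bind("analytics")   # one subscriber served by two competing consumers

for i in range(4):
    exchange.publish(f"event-{i}")

# The two analytics workers share one queue, so each message goes to only one of them.
workers = cycle(["analytics-worker-1", "analytics-worker-2"])
handled_by = {}
while exchange.queues["analytics"]:
    message = exchange.queues["analytics"].popleft()
    handled_by[message] = next(workers)
```

Both subscribers still see all four events; only the work inside the analytics subscription is split.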

What is Apache Kafka?

Apache Kafka is not an implementation of a message broker. Instead, it is an open-source distributed event streaming platform. Unlike RabbitMQ, which is based on queues and exchanges, Kafka’s storage layer uses a partitioned transaction log (read more below under Topics).

Kafka also provides a Streams API to process streams in real time and a Connect API for easy integration with various data sources. However, these are out of the scope of this post.

Cloud vendors provide alternative solutions for Kafka’s storage layer. These solutions include Azure Event Hubs and, to some extent, Amazon Kinesis Data Streams. There are also cloud-specific and open-source alternatives to Kafka’s stream processing capabilities, but again, these are out of the scope of this post.

Topics

Kafka does not implement the notion of a queue. Instead, Kafka stores collections of records in categories called topics.

For each topic, Kafka maintains a partitioned log of messages. Each partition is an ordered, immutable sequence of records where messages are continually appended. Kafka appends messages to these partitions as they arrive. By default, messages without a key are spread uniformly across partitions (older clients do this round-robin per message, while newer ones fill a batch for one partition before moving to the next).

Producers can modify this behavior to create logical streams of messages. For example, in a multitenant application, we might want to create logical message streams according to each message’s tenant ID. In an IoT scenario, we might want to consistently map each producer’s identity to a specific partition.

Ensuring all messages from the same logical stream map to the same partition guarantees their in-order delivery to consumers.

Kafka Producers
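A small Python sketch shows why a stable key-to-partition mapping preserves per-stream order. Note the assumptions: `choose_partition` is a hypothetical helper, and `zlib.crc32` is only a stand-in hash; Kafka clients use their own hash function.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition deterministically (stand-in hash, not Kafka's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 4
partitions = {p: [] for p in range(NUM_PARTITIONS)}

events = [
    ("tenant-a", "created"),
    ("tenant-b", "created"),
    ("tenant-a", "updated"),
    ("tenant-a", "deleted"),
]
for key, payload in events:
    # Appending keeps arrival order inside each partition.
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, payload))

tenant_a_partition = choose_partition("tenant-a", NUM_PARTITIONS)
tenant_a_events = [
    payload for key, payload in partitions[tenant_a_partition] if key == "tenant-a"
]
```

Because every `tenant-a` record lands in the same partition, a consumer reading that partition sees them in the order they were produced, even if records from other tenants are interleaved.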

Consumers consume messages by maintaining an offset (or index) to these partitions and reading them sequentially. A single consumer can consume multiple topics, and consumers can scale up to the number of partitions available. As a result, when creating a topic, one should carefully consider the expected throughput of messaging on that topic.

A group of consumers working together to consume a topic is called a consumer group. Kafka’s API typically handles the balancing of partition processing between consumers in a consumer group and storing consumers’ current partition offsets.

Kafka Consumers
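The balancing of partitions within a consumer group can be approximated with a simple round-robin assignment. This is a simplified sketch, not Kafka's actual assignment logic; `assign_partitions` is a hypothetical function, and real clients offer several assignment strategies.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin style."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Two consumers share four partitions.
balanced = assign_partitions(range(4), ["c1", "c2"])
# {'c1': [0, 2], 'c2': [1, 3]}

# Three consumers but only two partitions: the third consumer stays idle.
over_provisioned = assign_partitions(range(2), ["c1", "c2", "c3"])
# {'c1': [0], 'c2': [1], 'c3': []}
```

The second call illustrates why the partition count caps how far a consumer group can scale.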

Implementing Messaging Patterns with Kafka

Kafka’s implementation maps quite well to the pub/sub pattern. A producer can send messages to a specific topic, and multiple consumer groups can consume the same message. Each consumer group can scale individually to handle the load (up to the number of available partitions).

Since consumers maintain their partition offset, they can choose to have a durable subscription that maintains its offset across restarts or an ephemeral subscription, which throws the offset away and restarts from the latest record in each partition every time it starts up.

However, Kafka is a less-than-perfect fit for the message queuing pattern. Of course, we could have a topic with just a single consumer group to emulate classic Message Queuing. Nevertheless, this has multiple drawbacks that are discussed later in this post.

It is important to note that Kafka retains messages in partitions up to a preconfigured period, regardless of whether consumers consumed these messages. This retention means that consumers are free to reread past messages. Furthermore, developers can also use Kafka’s storage layer for implementing mechanisms such as Event Sourcing and Audit Logs.
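The following Python sketch models a single partition as an append-only list and a consumer as nothing more than an offset into it. `PartitionLog` and `Consumer` are hypothetical classes for illustration, not part of any Kafka client.

```python
class PartitionLog:
    """Append-only log; reading never removes records."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1   # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """A consumer is essentially a position (offset) in the log."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        records = self.log.read(self.offset)
        self.offset += len(records)
        return records

    def seek(self, offset):
        self.offset = offset


log = PartitionLog()
for record in ["deposit", "withdraw", "close-account"]:
    log.append(record)

billing = Consumer(log)
first_pass = billing.poll()     # all three records
second_pass = billing.poll()    # nothing new yet
billing.seek(1)                 # rewind to reread past records
replayed = billing.poll()       # ['withdraw', 'close-account']

auditor = Consumer(log)         # an independent reader still sees everything
audit_view = auditor.poll()
```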

What Are the Differences between RabbitMQ and Apache Kafka?

While RabbitMQ and Kafka are sometimes interchangeable, their implementations are very different. As a result, we cannot view them as members of the same category of tools; one is a message broker, and the other is a distributed event streaming platform. As solution architects, we should acknowledge these differences and actively consider which solutions we should use for a given scenario.

For example, Kafka is best used for processing data streams, while RabbitMQ has minimal guarantees regarding ordering messages within a stream. On the other hand, RabbitMQ has built-in support for retry logic and dead-letter exchanges, while Kafka leaves such implementations in the hands of its users.

This section highlights these and other notable differences between Apache Kafka and RabbitMQ.

Message Ordering

RabbitMQ provides few guarantees regarding ordering messages sent to a queue or exchange. While it may seem evident that consumers process messages in the order producers send them, this is very misleading. The RabbitMQ documentation states the following regarding its ordering guarantees:

“Messages published in one channel, passing through one exchange and one queue and one outgoing channel will be received in the same order that they were sent.” (RabbitMQ Broker Semantics)

Differently put, as long as we have a single message consumer, it receives messages in order. However, once multiple consumers read messages from the same queue, we have no guarantee regarding the processing order of messages.

This lack of ordering guarantee happens because consumers might return (or redeliver) messages to the queue after reading them (e.g., processing failure). Once a message is returned, another consumer can pick it up for processing even if it has already consumed a later message. Thus, as the following diagram shows, competing consumers can end up processing messages out of order.

An example of lost message ordering when using RabbitMQ
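The same scenario can be replayed step by step in a few lines of Python. This is a deterministic toy simulation with a plain `deque` standing in for the queue; the consumer names in the comments are hypothetical.

```python
from collections import deque

queue = deque(["m1", "m2", "m3"])
processed = []

first = queue.popleft()             # consumer A takes m1
second = queue.popleft()            # consumer B takes m2
queue.appendleft(first)             # A fails, so m1 returns to the queue
processed.append(second)            # B finishes m2
processed.append(queue.popleft())   # B then picks up the redelivered m1
processed.append(queue.popleft())   # m3 is processed last

# processed is now ['m2', 'm1', 'm3']
```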

Of course, we could regain message ordering in RabbitMQ by limiting consumer concurrency to one. More precisely, the thread count within the single consumer should be limited to one since any parallel message processing can cause the same out-of-order issue. However, limiting ourselves to one single-threaded consumer severely impacts our ability to scale message processing as our system grows. As a result, we should not make this tradeoff lightly.

On the other hand, Kafka provides a reliable ordering guarantee for message processing. Kafka guarantees that all messages sent to the same topic partition are processed in order. By default, Kafka spreads messages that carry no key across the topic’s partitions.

However, a producer can set a partition key on each message to create logical data streams (such as messages from the same device or messages belonging to the same tenant). All messages from the same stream are then placed within the same partition, causing them to be processed in order by consumer groups.

We should note, however, that within a consumer group, each partition is processed by a single thread of a single consumer. As a result, we cannot scale the processing of a single partition. However, in Kafka, we can scale the number of partitions within a topic, causing each partition to receive fewer messages and adding additional consumers for the additional partitions.

Winner – Kafka is the clear winner as it allows processing messages in order. RabbitMQ only has weak guarantees in this regard.

Message Routing

RabbitMQ can route messages to subscribers of a message exchange based on subscriber-defined routing rules. A Topic Exchange can route messages based on a dedicated message attribute named routing_key. Alternatively, a Headers Exchange can route messages based on arbitrary message headers. Both exchanges effectively allow consumers to specify the type of messages they want to receive, thus providing solution architects with great flexibility.
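To give a feel for topic-style routing, here is a small Python matcher for AMQP-style binding patterns, where words are separated by dots, `*` matches exactly one word, and `#` matches zero or more words. The `topic_matches` and `route` functions, as well as the binding and queue names, are my own illustrative choices; the broker performs this matching for us.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Check a routing key against a binding pattern using '*' and '#' wildcards."""

    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may swallow zero or more words of the routing key.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


# Hypothetical bindings: (pattern, queue name)
bindings = [
    ("orders.*.created", "billing"),
    ("orders.#", "audit"),
    ("payments.#", "ledger"),
]


def route(routing_key):
    """Return the queues that would receive a message with this routing key."""
    return [queue for pattern, queue in bindings if topic_matches(pattern, routing_key)]


created = route("orders.eu.created")           # ['billing', 'audit']
cancelled = route("orders.eu.west.cancelled")  # ['audit']
```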

On the other hand, Kafka does not allow consumers to filter messages on a topic before polling them. A subscribed consumer receives all messages in a partition without exception. As a developer, you could use a Kafka Streams job, which reads messages from the topic, filters them, and pushes them to another topic to which a consumer can subscribe. Nonetheless, this requires more effort and maintenance and has more moving parts.

Winner – RabbitMQ provides superior support for routing and filtering messages for consumers.

Message Timing

RabbitMQ provides various capabilities regarding the timing of a message sent to a queue:

Message Time-To-Live (TTL) – a TTL attribute can be associated with each message sent to RabbitMQ. Setting the TTL is done either directly by the publisher or as a policy on the queue itself. Specifying a TTL allows the system to limit the validity period of the message. If a consumer does not process it in due time, it is automatically removed from the queue (and transferred to a dead-letter exchange if one is configured; read more on that later). TTL is beneficial for time-sensitive commands that become irrelevant after some time has passed without processing.
Delayed/Scheduled Messages – RabbitMQ supports delayed/scheduled messages via a plugin. When this plugin is enabled on a message exchange, a producer can send a message to RabbitMQ and specify how long RabbitMQ should wait before routing it to a consumer’s queue. This feature allows a developer to schedule future commands that are not meant to be handled before a given time. For example, when a producer hits a throttling rule, we might want to delay the execution of specific commands to a later time.
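As a rough illustration of the TTL behavior, the Python sketch below drops expired messages into a dead-letter list instead of delivering them. `TtlQueue` is a hypothetical class, and the clock is passed in explicitly as `now` to keep the example deterministic; a real broker tracks time and expiry itself.

```python
class TtlQueue:
    """Toy queue where each message carries its own expiry time."""

    def __init__(self):
        self._messages = []      # list of (expires_at, body)
        self.dead_letters = []   # expired messages end up here

    def publish(self, body, now, ttl_seconds):
        self._messages.append((now + ttl_seconds, body))

    def receive(self, now):
        """Return the next message that is still valid, or None."""
        while self._messages:
            expires_at, body = self._messages.pop(0)
            if expires_at > now:
                return body
            self.dead_letters.append(body)   # too late: dead-letter it
        return None


queue = TtlQueue()
queue.publish("unlock-door", now=0, ttl_seconds=5)    # only useful for 5 seconds
queue.publish("send-report", now=0, ttl_seconds=60)

delivered = queue.receive(now=10)   # the consumer shows up 10 seconds later
# delivered == 'send-report'; 'unlock-door' was dead-lettered
```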

Kafka provides no support for such features. It writes messages to partitions as they arrive, where they are immediately available for consumers to consume. Also, Kafka provides no TTL mechanism for messages, although we could implement one at the application level. We must also remember that a Kafka partition is an append-only transaction log. As a result, it cannot manipulate the message time (or location within the partition).

Winner – RabbitMQ wins this hands-down, as the nature of its implementation limits Kafka.

Message Retention

RabbitMQ evicts messages from storage as soon as consumers successfully consume them. This behavior cannot be modified. It is part of almost all message brokers’ designs.

In contrast, Kafka persists all messages by design for a retention period configured per topic. Regarding message retention, Kafka does not care about the consumption status of its consumers as it acts as a message log. Consumers can consume every message as many times as they want and travel back and forth “in time” by manipulating their partition offset. Periodically, Kafka reviews the age of messages in topics and evicts those that are old enough.
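Age-based retention can be sketched in Python as follows. This is a simplification under my own assumptions: `RetainedLog` is a hypothetical class that evicts record by record, whereas Kafka manages retention in larger chunks of the log. The point is that eviction looks only at record age, never at who has consumed what.

```python
class RetainedLog:
    """Toy log that evicts records purely by age."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.start_offset = 0    # offset of the oldest record still retained
        self._records = []       # list of (timestamp, value)

    def append(self, timestamp, value):
        self._records.append((timestamp, value))

    def enforce_retention(self, now):
        # Consumption status is never consulted, only the record's age.
        while self._records and now - self._records[0][0] > self.retention_seconds:
            self._records.pop(0)
            self.start_offset += 1

    def read_from(self, offset):
        offset = max(offset, self.start_offset)
        return [value for _, value in self._records[offset - self.start_offset:]]


log = RetainedLog(retention_seconds=100)
log.append(0, "a")
log.append(50, "b")
log.append(120, "c")

log.enforce_retention(now=130)   # "a" is 130 seconds old and gets evicted
remaining = log.read_from(0)     # ['b', 'c']
latest = log.read_from(2)        # ['c'], offsets of surviving records are unchanged
```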

Kafka’s performance is not dependent on storage size. So, in theory, one can store messages almost indefinitely without impacting performance (as long as your nodes are large enough to hold these partitions).

Winner – Kafka is designed to retain messages, while RabbitMQ is not. There is no competition here, and Kafka is declared the winner.

Fault Handling

When dealing with messages, queues, and events, developers are often under the impression that message processing always succeeds. After all, since producers place each message in a queue or topic, even if a consumer fails to process a message, it can simply retry until it succeeds.

While this is true, we should put additional thought into this process. We should acknowledge that message processing can fail in some scenarios. We should gracefully handle these situations, even if the solution is partly composed of human intervention.

There are two types of possible errors when processing messages:

Transient failures – failures that occur due to a temporary issue such as network connectivity problems, high CPU load, or a service crash. We can usually mitigate this kind of failure by retrying.
Persistent failures – failures that occur due to a permanent issue that cannot be resolved via additional retries. Common causes of these failures are software bugs or an invalid message schema (i.e., a poison message).

As architects and developers, we should ask ourselves: How often do we retry upon a message processing failure? How long should we wait between retries? How do we distinguish between transient and persistent failures? And most importantly – what do we do when all retries fail or we encounter a persistent failure? While the answers to these questions are domain-specific, messaging platforms typically provide us with the tools to implement our solution.

RabbitMQ provides tools such as delivery retries and dead-letter exchanges (DLX) to handle message processing failures. The main idea of a DLX is that RabbitMQ can automatically route failed messages to a DLX based on an appropriate configuration and apply further processing rules on the message at this exchange, including delayed retries, retry counts, and delivery to a “human intervention” queue. This article provides additional insights on possible patterns for handling retries in RabbitMQ.
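The decision flow behind retries and dead-lettering can be sketched in Python like this. It only models the logic; in RabbitMQ the routing to a DLX is configured on the broker rather than coded in the consumer. `consume_with_retries`, `MAX_ATTEMPTS`, and the sample handler are hypothetical names for this example.

```python
MAX_ATTEMPTS = 3


def consume_with_retries(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Process messages, requeueing failures until attempts run out."""
    pending = [(body, 1) for body in messages]   # (body, attempt number)
    done, dead_letters = [], []
    while pending:
        body, attempt = pending.pop(0)
        try:
            handler(body)
            done.append(body)
        except Exception:
            if attempt < max_attempts:
                pending.append((body, attempt + 1))   # retry later, behind other messages
            else:
                dead_letters.append(body)             # give up: needs human attention
    return done, dead_letters


seen = {}


def handler(body):
    """Sample handler: 'flaky' fails once (transient), 'poison' always fails (persistent)."""
    seen[body] = seen.get(body, 0) + 1
    if body == "poison":
        raise ValueError("invalid schema")
    if body == "flaky" and seen[body] == 1:
        raise ConnectionError("temporary outage")


done, dead_letters = consume_with_retries(["flaky", "poison", "ok"], handler)
# done == ['ok', 'flaky'], dead_letters == ['poison']
```

The transient failure recovers on its second attempt, while the poison message exhausts its three attempts and is set aside.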

The most important thing to remember here is that in RabbitMQ, when a consumer is busy processing and retrying a specific message (even before returning it to the queue), other consumers can concurrently process the messages that follow it. Message processing is not stuck while a specific consumer retries a particular message. As a result, a message consumer can synchronously retry a message for as long as it wants without affecting the entire system.

Consumer 1 can continue retries on message 1 while other consumers continue processing messages.

Contrary to RabbitMQ, Kafka does not provide any such mechanisms out of the box. With Kafka, we must provide and implement message retry mechanisms at the application level. Also, we should note that when a consumer is busy synchronously retrying a specific message, other messages from the same partition cannot be processed.

We cannot reject and retry a specific message and commit a message that came after it since the consumer cannot change the message order. As you may remember, the partition is merely an append-only log.

An application-level solution can commit failed messages to a “retry topic” and handle retries from there; however, we lose message ordering in this solution. An example of such an implementation by Uber Engineering can be found here. If message processing latency is not an issue, the vanilla Kafka solution with adequate error monitoring might suffice.

Messages in the bottom partition are not handled if the consumer is stuck retrying a message.
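The retry-topic idea, and the ordering it gives up, can be sketched in Python as follows. This is a toy model with plain lists standing in for a partition and a retry topic; `process_partition` and the sample handler are hypothetical, and no Kafka client is involved.

```python
def process_partition(records, handler):
    """Read records in order; park failures in a retry topic instead of blocking."""
    committed_offset = 0
    processed, retry_topic = [], []
    for offset, record in enumerate(records):
        try:
            handler(record)
            processed.append(record)
        except Exception:
            retry_topic.append(record)   # hand the record over for a later retry
        committed_offset = offset + 1    # either way, move past this record
    return processed, retry_topic, committed_offset


attempts = {}


def handler(record):
    """Sample handler: 'a2' fails on its first attempt only."""
    attempts[record] = attempts.get(record, 0) + 1
    if record == "a2" and attempts[record] == 1:
        raise RuntimeError("downstream unavailable")


processed, retry_topic, committed = process_partition(["a1", "a2", "a3"], handler)
retried, still_failing, _ = process_partition(retry_topic, handler)

final_order = processed + retried
# final_order == ['a1', 'a3', 'a2']
```

The partition keeps moving, but `a2` is now handled after `a3`, which is exactly the ordering loss mentioned above.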

Winner – RabbitMQ is a winner on points since it provides some mechanism to solve this problem out of the box.

Scale

There are multiple benchmarks out there, checking the performance of RabbitMQ and Kafka. While generic benchmarks have limited applicability toward specific cases, Kafka generally performs better than RabbitMQ. Kafka uses sequential disk I/O to boost performance. Its architecture using partitions means it scales horizontally (scale-out) better than RabbitMQ, which scales better vertically (scale-up).

Large Kafka deployments can commonly handle hundreds of thousands of messages per second and even millions per second. Pivotal previously recorded a RabbitMQ cluster running one million messages per second; however, it did this on a 30-node cluster with load optimally spread across multiple queues and exchanges.

Typical RabbitMQ deployments include 3-7 node clusters that do not necessarily optimally divide the load between queues. These typical clusters can usually expect to handle a load of several tens of thousands of messages per second.

Winner – while both platforms can handle massive loads, Kafka typically scales better and achieves higher throughput than RabbitMQ, thus winning this round. However, it is essential to note that most systems never reach these limitations! So unless you’re building the next millions-of-users smash-hit software system, you don’t need to care about scale so much, as both platforms can serve you well.

Consumer Complexity

RabbitMQ uses a smart-broker & dumb-consumer approach. Consumers register to consume queues, and RabbitMQ pushes messages to them as they come in. RabbitMQ also has a pull API, but it is much less used.

RabbitMQ manages the distribution of messages to consumers and removes messages from queues (possibly to DLXs). The consumer does not need to worry about any of this. RabbitMQ’s structure also means that a queue’s consumer group can efficiently scale from just one consumer to multiple consumers when the load increases without any changes to the system.

RabbitMQ consumers efficiently scale up and scale down.

Kafka, on the other hand, uses a dumb-broker & smart-consumer approach. Consumers in a consumer group need to coordinate leases on topic partitions between them (so that only one consumer in a consumer group listens to a specific partition). Consumers also need to manage and store their partitions’ offset index.

Fortunately, the Kafka SDK takes care of these for us, so we don’t need to manage it ourselves. However, when we have a low load, a single consumer needs to process and keep track of multiple partitions in parallel, which requires more resources on the consumer side.

Also, as the load increases, we can only scale the consumer group to the point where the number of consumers equals the number of partitions in the topic. Above that, we need to configure Kafka to add additional partitions. However, as the load decreases again, we cannot remove the partitions we already added, which leaves the remaining consumers with more work to do. That said, as mentioned above, the SDK handles this extra work.

Kafka partitions cannot be removed, leaving consumers with more work after scaling down.

Winner – RabbitMQ, by design, is built with dumb consumers in mind. As a result, it is the winner of this round.

Protocol Compatibility

While Apache Kafka uses a custom binary protocol over TCP, RabbitMQ supports standard protocols natively or via plugins. Most notably, RabbitMQ supports AMQP 0-9-1 out of the box. It adds support for AMQP 1.0, MQTT, and STOMP via plugins.

Winner – RabbitMQ, as it supports various standard protocols.

Which Platform Should You Use? RabbitMQ vs Kafka

Now we’re at the million-dollar question – when should we use RabbitMQ, and when should we use Kafka? If we summarize the above differences between the two, we arrive at the following conclusion:

RabbitMQ is preferable when we need:
1. Advanced and flexible routing rules.
2. Message timing control (controlling either message expiry or message delay).
3. Advanced fault handling capabilities in cases when consumers are more likely to fail to process messages (either temporarily or permanently).
4. Simpler consumer implementations.

Kafka is preferable when we require:
1. Strict message ordering.
2. Message retention for extended periods, including the possibility of replaying past messages.
3. The ability to reach a large scale when traditional solutions do not suffice.

We can implement most use cases using both platforms. However, as solution architects, we must choose the most suitable tool for the job. When making this choice, we should consider both functional differences, as highlighted above, and non-functional constraints. These constraints include things such as:

Existing developer knowledge of these platforms.
Availability of a managed cloud solution if applicable.
The operational cost of each solution.
Availability of SDKs for our target stack.

When developing complex software systems, we might be tempted to implement all required messaging use cases using the same platform. Nevertheless, from my experience, more often than not, using both platforms can have many benefits.

For example, in an event-driven architecture-based system, we could use RabbitMQ to send commands between microservices and use Kafka to implement business event notifications. This is because event notifications are often used for event sourcing, batch operations (ETL style), or audit purposes, thus making Kafka very valuable for its message retention capabilities.

Commands, on the other hand, typically require additional processing on the consumer side, processing that could fail and need advanced fault-handling capabilities. Here, RabbitMQ shines for its capabilities. I might write a detailed post on it in the future, but you must remember – your mileage may vary, as suitability depends on your specific requirements.

Closing Thoughts

I started this post by observing that many developers view RabbitMQ and Kafka as interchangeable. I hope that reviewing their differences assisted in gaining insight into these platforms’ implementation and technical uniqueness.

In the rapidly evolving landscape of messaging platforms, RabbitMQ and Kafka stand out as formidable choices, each with its own strengths and specializations.

RabbitMQ’s advanced routing capabilities and message timing features make it particularly versatile in various enterprise settings. On the other hand, Kafka’s proficiency in handling vast real-time data streams and its robust scaling mechanisms underscore its value in web-scale applications.

As with many technological decisions, the optimal choice boils down to the specific needs and goals of the project at hand. Both platforms are powerful in their own right, and understanding their nuances can significantly inform the direction of your software design and development endeavors.
