Are you really sure you want Apache Kafka?

Paul Kelly
6 min read · May 8, 2021

Apache Kafka, the event streaming bus that started life at LinkedIn, has become very popular over the last few years. Kafka has lots of strengths — but is it always the right choice?

Kafka can scale to handle very high throughputs — much higher than most message queueing systems. It’s robust, and can be configured to store events persistently for any length of time.

The persistent storage of events within a topic is a powerful feature of Kafka, but it also adds some complexity: you can configure consumers to read only events published after they joined, to read from the beginning of the topic, or to read from a given offset. Persistence can also make testing a little harder; for example, you don't want your system tests to run against the events left over from your last system test.
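The offset behaviour just described can be sketched as a small simulation. This is illustrative code, not the real Kafka client API; in an actual consumer these choices correspond to the `auto.offset.reset` setting (`earliest`/`latest`) and an explicit `seek()` to an offset.

```python
# A topic partition modelled as an append-only log (illustrative only).
log = ["evt-%d" % i for i in range(5)]

def read_from(log, offset):
    """Return every event at or after the given offset."""
    return log[offset:]

# "earliest": replay the whole retained history.
assert read_from(log, 0) == ["evt-0", "evt-1", "evt-2", "evt-3", "evt-4"]

# "latest": only events appended after the consumer joined.
joined_at = len(log)
log.append("evt-5")
assert read_from(log, joined_at) == ["evt-5"]

# Explicit seek: start from a chosen offset.
assert read_from(log, 3) == ["evt-3", "evt-4", "evt-5"]
```

This is also why system tests need care: a consumer configured to read from the beginning will happily replay events left behind by a previous test run.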

Kafka is a good choice if you are building a brand new microservice architecture using events to drive choreography. It is less suitable when you are modernising an older application which was built around message queues, or want to use queues as a way of scaling based on the amount of work at a given time.

Despite this, there is a trend in some enterprises to mandate Kafka as the strategic messaging platform, and to discourage usage of other message queues. These decisions seem to be based on a combination of spec comparison and popular sentiment, rather than a consideration of the practical differences between message queues and Apache Kafka for applications and application developers. The things that make Kafka so effective in some architectures make it less suitable for applications that really just need a traditional message queue.

Kafka is not a direct replacement for the types of message buses enterprises have built their systems around in the past (for example IBM MQ, RabbitMQ or Apache ActiveMQ). The first clue here is that Kafka terminology is based around topics rather than queues. In messaging system terminology, a topic has publish-subscribe semantics: it is designed for multiple subscribers who will all receive the same messages.

But many applications are designed around queues, where each message is consumed only once. Some applications handle higher loads by attaching multiple listeners of the same type to a queue; once a listener consumes a message, it is removed from the queue, so no two listeners receive the same message.

Kafka topics can also provide queue-like functionality, but it is implemented differently. The diagram below shows a topic with three producers sending events. There are two separate consumer groups that use the events for different purposes, and each group has two individual consumers. Each consumer in a group receives events from a different partition, so although every consumer group receives all the same events, the consumers within a group each receive a different subset. The semantics of consumers within a single group are similar to those of multiple listeners attached to the same message queue.

[Diagram: a Kafka topic with two partitions and two consumers in each of two separate consumer groups]
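The fan-out described above can be captured in a few lines of simulation. The names and data structures here are hypothetical, not the Kafka client API: every consumer group sees every event, but within a group each partition is owned by exactly one consumer.

```python
# Simulated delivery: each partition is owned by one consumer per group.
def deliver(events, groups):
    """events: list of (partition, payload) pairs.
    groups: {group_name: {partition: owning_consumer}}."""
    received = {g: {c: [] for c in set(owners.values())}
                for g, owners in groups.items()}
    for partition, payload in events:
        for group, owners in groups.items():
            # Only the consumer that owns this partition (in this group) gets it.
            received[group][owners[partition]].append(payload)
    return received

events = [(0, "a"), (1, "b"), (0, "c"), (1, "d")]
groups = {
    "billing":   {0: "billing-1",   1: "billing-2"},
    "analytics": {0: "analytics-1", 1: "analytics-2"},
}
out = deliver(events, groups)

# Both groups see all four events, split across their two consumers:
assert out["billing"] == {"billing-1": ["a", "c"], "billing-2": ["b", "d"]}
assert out["analytics"] == {"analytics-1": ["a", "c"], "analytics-2": ["b", "d"]}
```

Within one group this behaves like competing listeners on a queue; across groups it behaves like publish-subscribe.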

It is up to the developer writing the producer applications to decide how messages are balanced across the partitions in a topic. You can have a round-robin where messages are distributed more or less evenly, or you can specify which partition each individual message goes to (there are several different ways of doing this and the system is very flexible).
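The two most common strategies can be sketched as follows. This is a simplification: Kafka's default partitioner hashes record keys with murmur2, and Python's built-in `hash()` merely stands in for it here.

```python
from itertools import count

NUM_PARTITIONS = 3
_rr = count()  # shared counter for the round-robin strategy

def round_robin_partition():
    """Spread keyless messages more or less evenly across partitions."""
    return next(_rr) % NUM_PARTITIONS

def key_partition(key):
    """Send all messages with the same key to the same partition,
    which preserves ordering per key."""
    return hash(key) % NUM_PARTITIONS

# Keyless messages rotate through partitions 0, 1, 2, 0, ...
assert [round_robin_partition() for _ in range(4)] == [0, 1, 2, 0]

# The same key always lands on the same partition:
assert key_partition("order-42") == key_partition("order-42")
```

Key-based partitioning is what makes per-key ordering guarantees possible, since a single partition is only ever read by one consumer in a group.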

Dynamic Scaling

Kafka’s topic-and-partition design helps enable its tremendous throughput. But it brings some complexity and drawbacks as well. In particular:

  1. You need to determine the number of partitions in advance (you can add partitions to a topic later, but you can never remove them, and adding partitions changes which partition a given key maps to).
  2. You can’t have more consumers in a group than there are partitions (you can have fewer).

When you attach a consumer to a topic, Kafka will attach it to one or more partitions depending on the number of consumers already in the group. It will do its best to balance the load across the consumers. For example, if you have three partitions and two consumers, one consumer will be attached to two partitions and the other to one. But no partition ever has more than one consumer (within a group) attached to it.
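A simplified version of that balancing logic looks like this. Real Kafka uses pluggable assignors (range, round-robin, sticky, and so on); this sketch only illustrates the constraint that each partition goes to exactly one consumer in the group, as evenly as possible.

```python
# Simplified partition assignment for one consumer group.
def assign(partitions, consumers):
    """Map each partition to exactly one consumer, as evenly as possible."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Three partitions, two consumers: one consumer gets two partitions, the other one.
assert assign([0, 1, 2], ["c1", "c2"]) == {"c1": [0, 2], "c2": [1]}

# More consumers than partitions: the extra consumers sit idle.
a = assign([0, 1, 2], ["c1", "c2", "c3", "c4"])
assert a["c4"] == []
```

The idle-consumer case is the second drawback from the list above: adding a fifth worker to a four-partition topic gains you nothing.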

The rebalancing that happens each time you attach a new consumer also comes with a comparatively high run-time cost as event delivery stops while the rebalance is being done. For example, if we have three partitions and two consumers in a group, then add a third consumer, Kafka has to stop delivering events from one partition while it removes it from one consumer and attaches the new one.

This means Kafka is not as well suited to dynamic scaling of worker nodes as traditional message queues. A pattern that message buses like RabbitMQ are sometimes used for is distributing work across a number of worker nodes. You put your jobs onto a queue, and worker nodes consume the jobs. You can scale the number of worker nodes up and down dynamically based on the depth of the queue so that you have more workers when the system is busy.
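A toy version of that autoscaling decision, with entirely hypothetical thresholds, makes the contrast concrete: with a plain message queue a new worker can simply attach and start consuming, whereas with Kafka every scaling step triggers a consumer-group rebalance and the worker count is capped by the partition count.

```python
# Hypothetical queue-depth autoscaling policy (illustrative thresholds).
def desired_workers(queue_depth, jobs_per_worker=10, min_workers=1, max_workers=20):
    """Scale the worker count to the backlog, within fixed bounds."""
    needed = -(-queue_depth // jobs_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))

assert desired_workers(0) == 1      # idle: keep a single worker
assert desired_workers(35) == 4     # 35 jobs / 10 per worker -> 4 workers
assert desired_workers(500) == 20   # capped at max_workers
```

With a queue, `max_workers` is a tuning choice; with a Kafka topic, it is effectively fixed by the number of partitions you chose up front.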

The run-time costs of rebalancing make Kafka less effective for this type of scenario — and you can never have more workers than there are partitions in your topic.

Refactoring Older Applications

In many enterprises the business-critical applications are often the older ones, and there is lots of activity around modernising these applications to run on cloud platforms. I’m using “cloud platforms” in its broadest possible sense here, including on-premise Kubernetes and PaaS platforms as well as hyperscalers like AWS, Azure, and GCP.

Many successful modernisations are done by refactoring the application a slice at a time rather than attempting a complete rewrite. As new microservices are written, they gradually replace the functionality of the original application, but until the original application has been completely replaced, the new and the old have to work together.

If the original application uses message queues, mandating Kafka as the “strategic” platform for the new services adds an extra layer of complexity to the modernisation effort. For example, will you have to add message gateways or adapters between the existing message queues and the new Kafka bus? Depending on the final target architecture this might be worthwhile, but in the early stages of a modernisation you are still learning about the system you are replacing and the one you are creating. Limiting the amount of change reduces both risk and the time it takes to get something into production.

JMS

Older Java applications are likely to be using JMS. Apache ActiveMQ implements the JMS API, and IBM MQ and RabbitMQ are both capable of handling JMS messages through their client libraries. Kafka has no direct support for JMS. This is another factor to consider.

Conclusion

Apache Kafka is an excellent technology with a wide range of use cases, but it is not always the best choice, and choosing between messaging systems based on specifications and features can be misleading. I’ve looked at Apache Kafka vs the rest mainly through the lens of application modernisation, but below there are links to a couple of excellent articles that compare Apache Kafka and RabbitMQ in greater technical depth. You might also enjoy Shaun Anderson’s polemic on the 3Ks of the Apocalypse.

And if you do decide to use Apache Kafka, I’ve also written a couple of articles with some practical guidance for developers.

Further Reading

Kafka vs RabbitMQ:

For developers:


Paul Kelly

Senior Solutions Architect at VMware Tanzu Labs.