Do you want to use Kafka? Or do you need a Queue?

Do you want to use Kafka? Or do you need a message broker and queues? While they can seem similar, they have different purposes. I’m going to explain the differences, so you don’t try to force patterns and concepts onto Kafka that are better suited to a message broker.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Partitioned Log

Kafka is a log. Specifically a partitioned log. I’ll discuss the partition part later in this post and how that affects ordered processing and concurrency.

Kafka Log

When a producer publishes new messages, generally events, to a log (a topic in Kafka), they are appended to the end of the log.

Kafka Log

Events aren’t removed from a topic unless defined by the retention period. You could keep all events forever or purge them after a period of time. This is an important aspect to note in comparison to a queue.
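To make this contrast concrete, here’s a minimal in-memory sketch (my own toy illustration, not Kafka’s actual implementation): events are only ever purged by the retention policy, never by being consumed.

```python
import time

class Log:
    """A minimal append-only log: consuming never removes events."""
    def __init__(self, retention_seconds=None):
        self.events = []            # (timestamp, event) pairs
        self.retention = retention_seconds

    def append(self, event):
        self.events.append((time.time(), event))

    def read(self, offset):
        """Reading returns the event but leaves the log untouched."""
        return self.events[offset][1]

    def purge_expired(self, now=None):
        """Only the retention policy removes events, never a consumer."""
        if self.retention is None:
            return  # keep everything forever
        now = now or time.time()
        self.events = [(t, e) for (t, e) in self.events
                       if now - t < self.retention]

log = Log(retention_seconds=3600)
log.append("OrderPlaced")
first = log.read(0)   # consuming...
again = log.read(0)   # ...does not remove the event
```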

With an event-driven architecture, you can have one service publish events and have many different services consuming those events. It’s about decoupling. The publishing service doesn’t know who is consuming, or if anyone is consuming, the events it’s publishing.

Consumers

In this example, we have a topic with three events. Each consumer works independently, processing messages from the topic.

Consumers

Because events are not removed from the topic, a new consumer could start consuming from the first event on the topic. Kafka maintains an offset per consumer group, per partition. I’ll get to consumer groups and partitions shortly. This allows consumers to process new events as they are appended to the topic, but it also allows existing consumers to re-process existing events by changing their offset.
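A toy sketch of that offset behavior (a hypothetical single-partition topic of my own, not Kafka’s real protocol): each consumer group tracks its own offset, and rewinding it replays events.

```python
class Topic:
    """Toy single-partition topic with per-consumer-group offsets."""
    def __init__(self):
        self.events = []
        self.offsets = {}  # consumer group -> next offset to read

    def append(self, event):
        self.events.append(event)

    def consume(self, group):
        """Return the next event for this group, or None if caught up."""
        offset = self.offsets.get(group, 0)
        if offset >= len(self.events):
            return None
        self.offsets[group] = offset + 1
        return self.events[offset]

    def seek(self, group, offset):
        """Rewind (or fast-forward) a group's offset to re-process events."""
        self.offsets[group] = offset

topic = Topic()
topic.append("EventA")
topic.append("EventB")

billing_first = topic.consume("billing")    # each group has its own offset,
shipping_first = topic.consume("shipping")  # so both start at the beginning
topic.seek("billing", 0)                    # rewind to re-process
billing_replay = topic.consume("billing")   # "EventA" again
```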

Just because a consumer processes an event from a topic does not mean that they cannot process it again or that another consumer can’t consume it. The event is not removed from the topic when it’s consumed.

Commands & Events

A lot of the trouble I see with Kafka revolves around taking patterns or semantics that are typical of queues or a message broker and trying to force them onto Kafka. An example of this is commands.

There are two kinds of messages. Commands and Events. Some will say Queries are also messages, but I disagree in the context of asynchronous messaging.

Commands are about invoking behavior. There can be many producers of a command. There is a required single consumer of a command. The consumer will be within the logical boundary that owns the definition/schema of the command.

Sending Commands

Events are about notifying other parts of your system that something has occurred. There is only a single publisher of an event. The logical boundary that publishes an event owns the schema/definition. There may be many consumers of an event or none.

Publishing Events

Commands and events have different semantics and very different purposes, which also affects how each relates to coupling.

Commands vs Events

By this definition, how can you publish a command to a Kafka topic and guarantee that only a single consumer will process it? You can’t.

Queue

This is where a queue and a message broker differ.

When you send a command to a queue, there’s going to be a single consumer that will process that message.

Queue Consumer

When the consumer finishes processing the message, it will acknowledge back to the broker.

Queue Consumer Ack

At this point, the broker will remove the message from the queue.

Queue Consumer Ack Remove Message

The message is gone. The consumer cannot consume it again, nor can any other consumer.
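The same ack-and-remove behavior in a minimal sketch (an illustrative toy, not any specific broker’s API):

```python
class Queue:
    """Toy queue: a message is delivered to one consumer and removed on ack."""
    def __init__(self):
        self.messages = []
        self.in_flight = {}
        self.next_id = 0

    def send(self, message):
        self.messages.append((self.next_id, message))
        self.next_id += 1

    def receive(self):
        """Hand the oldest message to a consumer; it stays in-flight until acked."""
        if not self.messages:
            return None
        msg_id, message = self.messages.pop(0)
        self.in_flight[msg_id] = message
        return msg_id, message

    def ack(self, msg_id):
        """Acknowledge: the broker removes the message for good."""
        del self.in_flight[msg_id]

q = Queue()
q.send("PlaceOrder")
msg_id, body = q.receive()
q.ack(msg_id)
# The message is gone: no consumer can receive it again.
gone = q.receive()
```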

Consumer Groups & Partitions

Earlier I mentioned consumer groups and partitions. A consumer group is a way to have multiple consumers consume from the same topic. It’s how you scale out and process more messages from a topic concurrently, known as the competing consumers pattern.

A topic is divided into partitions. Events are appended to a partition within a topic. Within a consumer group, only one consumer processes the messages from a given partition.

This means you will process messages from a partition sequentially, which allows for ordered processing.

As an example of the competing consumers pattern, let’s say we have two partitions in a topic, each currently containing a single event. We have two consumers in a single consumer group. We’ve defined that the top consumer will consume from the top partition, and the bottom consumer will consume from the bottom partition.

Kafka Partition

This means that each consumer within our consumer group can process each message concurrently.

Kafka Partition

If we publish another message to the top partition, this means the top consumer again is the one responsible for consuming it. If it was busy processing another message, the bottom consumer, even if it’s available, will not consume it. Only the top consumer is associated with the top partition.
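A toy sketch of that assignment rule (not Kafka’s actual rebalancing protocol; I’m assuming a simple round-robin assignment of partitions to consumers in a group):

```python
def assign(partitions, consumers):
    """Round-robin partition assignment within a consumer group:
    each partition is owned by exactly one consumer."""
    assignment = {}
    for i, partition in enumerate(partitions):
        assignment[partition] = consumers[i % len(consumers)]
    return assignment

assignment = assign(["top", "bottom"], ["top-consumer", "bottom-consumer"])

# Only the owning consumer ever reads a partition, so a busy consumer's
# partition backs up rather than being picked up by an idle one.
owner_top = assignment["top"]
owner_bottom = assignment["bottom"]
```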

This allows you to consume messages in order, as long as you associate them with the same partition.
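Associating messages with a partition is typically done by hashing a message key. A sketch of the idea (using CRC32 here purely for illustration; Kafka’s default partitioner likewise hashes the message key):

```python
from zlib import crc32

def partition_for(key, num_partitions):
    """The same key always hashes to the same partition, so messages
    for that key are appended, and processed, in order."""
    return crc32(key.encode()) % num_partitions

NUM_PARTITIONS = 2
partitions = [[] for _ in range(NUM_PARTITIONS)]

# All events for the same order share a key, so they land on one partition.
for event in ["OrderPlaced", "OrderPaid", "OrderShipped"]:
    partitions[partition_for("order-42", NUM_PARTITIONS)].append(event)

ordered = partitions[partition_for("order-42", NUM_PARTITIONS)]
```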

In contrast, the competing consumers pattern with a queue works slightly differently, as we don’t have partitions.

Say we have two messages in a queue and two consumers within a single consumer group.

Competing Consumers

Messages are consumed by any free/available consumer. Because there are two free consumers, both messages will be consumed concurrently.

Competing Consumers

Even though messages are distributed FIFO (First-in-First-Out), that doesn’t mean we will process them in order.

Why does this matter? With Kafka partitions, you can process messages in order. Because there is only a single consumer within a consumer group associated with a partition, you’ll process them one by one. This isn’t possible with queues. The downside is that if you publish messages to a partition faster than you can consume them, you can end up in a backlog disaster.

Kafka or Message Broker Queues & Topics?

Hopefully, this post (and video) illustrated some of the differences. The primary issue I’ve come across is people using Kafka but trying to apply patterns and concepts (commands, competing consumers, dead letter queues) that are typical with a message broker using queues and topics, but it just doesn’t fit.

Typically when you’re creating an asynchronous workflow, you’re consuming events and sending commands. While you technically can create a topic for commands, you can’t guarantee there won’t be more than a single consumer. Is this a big deal? To me, semantics matter. If you’re already using Kafka and don’t want to introduce another piece of infrastructure like a queue/message broker, then I understand the reasoning for doing so.

It’s also important to understand the differences in how the competing consumers pattern works. If you’re not configured correctly and are publishing to a single partition, you can’t increase throughput by adding another consumer to the consumer group.

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out my Patreon or YouTube Membership for more info.

You also might like

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Videos and Blog Posts on Software Architecture & Design

Event Choreography for Loosely Coupled Workflow

What’s Event Choreography Workflow? Let’s back up a bit to answer that. Event Driven Architecture is a way to make your system more extensible and loosely coupled, using events to communicate between service boundaries. But how do you handle long-running business processes and workflows that involve multiple services? Using RPC is going back to tight coupling, which we’re trying to avoid with Event Driven Architecture. So what’s a solution? Event Choreography.


RPC

So why not just use RPC via an HTTP API or gRPC? Well, the point of using asynchronous messaging and an event driven architecture is to be loosely coupled and remove temporal coupling. Temporal coupling matters because a long-running business process or workflow that involves many different services means that all services need to be available, and there can be no failures. You can add retries for transient failures, but if a significant issue with a service or data causes the workflow to fail, you have no recourse to resolve the issue. You can be stuck in an inconsistent state.

For example, if we had many service-to-service RPC calls, and far down the call stack, there is a failure, how do you handle that?

RPC Service to Service

If Service A and Service C made state changes, they would need to have some type of compensating action to reverse the state change. There is no distributed transaction; you can’t just roll back the state of each service database. You need to deal with failures in each service that makes a service-to-service RPC call.

When all services are working correctly, there are no issues. But the moment you have a service that becomes unavailable, any workflow that involves that service will be immediately affected, likely causing the entire workflow to fail and leaving your system in an inconsistent state.

There are a whole other set of issues with RPC between services that I’m not even mentioning here, one of them being latency. Check out my post REST APIs for Microservices? Beware!

RPC Orchestration

Now you might think that instead of making service-to-service RPC calls, you could introduce some type of orchestrator that makes the RPC calls.

RPC Orchestration: Call Service A

After it makes the first RPC call to the first service, it would make the subsequent calls to the other services in order.

RPC Orchestration: Call Service B

This would address failures, as the orchestrator could handle retries. If there is a more extended failure/outage, it can request a compensating action to void/undo a call to a previous service.

RPC Orchestration: Failure

While this orchestrator sounds better than service-to-service RPC, we haven’t solved much. We still have a high degree of coupling, and our workflow will fail if a service is down or unavailable. If we have a failure and we need to make a call back to a previous service to perform some type of compensating action, what happens if that call fails? Again, we’re back to being in an inconsistent state.
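The orchestration-with-compensation idea can be sketched as follows (a toy saga runner of my own, which also shows why the compensating call is itself a point of failure):

```python
def run_workflow(steps):
    """Toy orchestrator: run each (action, compensate) step in order;
    on failure, run compensations for completed steps in reverse."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo()  # if *this* call fails, we're inconsistent again
            return "rolled back"
    return "completed"

log = []

def service_a():
    log.append("A done")

def service_b():
    raise RuntimeError("Service B unavailable")

steps = [
    (service_a, lambda: log.append("A undone")),
    (service_b, lambda: log.append("B undone")),
]
result = run_workflow(steps)
```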

Event Choreography

We can loosely couple between services with an event driven architecture and the publish-subscribe pattern and remove temporal coupling by publishing and consuming events.

For example, Service A receives a request from the client that kicks off the workflow.

Event Choreography

After making its state change to its database, Service A would publish an event to our message broker.

Event Choreography

At this point, the request from the Client to Service A is complete. Service B will consume the message/event published by Service A. Service B consumes this message to do its part of the workflow.

Event Choreography

Once Service B has successfully consumed the message, it will publish an event to the broker, which will kick off the next service that is a part of the workflow.

Event Choreography

Service C is the next service involved in the workflow, and it consumes the event published by Service B.

Event Choreography

Because we aren’t temporally coupled, each service consumes a message and processes it independently.

This means if Service C is unavailable or has a backlog of messages to process, that doesn’t break the workflow. Once the service becomes available again, and the message is processed, the workflow continues.
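Here’s a minimal in-memory sketch of the choreography (a toy broker where publishing is synchronous purely for illustration; a real broker delivers messages asynchronously):

```python
class Broker:
    """Toy pub/sub broker for sketching event choreography."""
    def __init__(self):
        self.subscribers = {}  # event name -> list of handlers

    def subscribe(self, event_name, handler):
        self.subscribers.setdefault(event_name, []).append(handler)

    def publish(self, event_name, payload):
        for handler in self.subscribers.get(event_name, []):
            handler(payload)

broker = Broker()
trail = []

# Each service only knows the events it consumes and publishes,
# not which other services exist.
def service_b(payload):
    trail.append("B handled " + payload)
    broker.publish("OrderBilled", payload)

def service_c(payload):
    trail.append("C handled " + payload)

broker.subscribe("OrderPlaced", service_b)
broker.subscribe("OrderBilled", service_c)

# Service A kicks off the workflow by publishing an event.
broker.publish("OrderPlaced", "order-42")
```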

Service Unavailable

Is Event Choreography a silver bullet for workflows involving multiple services? No. Of course not. The challenge with event choreography is longer or more complex workflows that involve more than just a few services. It’s hard to visualize the workflow because there is no central orchestrator that understands the entire workflow.

If you have complex workflows that involve many different services, check out my post on Workflow Orchestration for Resilient Systems


The world is full of Asynchronous Workflow

The world is asynchronous. Many workflows and business processes you encounter out in the world are long-running and driven by asynchronous systems. Yet as developers, we’re still often writing procedural and synchronous code to model these business processes. I’ll use one of my favorite examples, going out to eat at a restaurant, to illustrate this and show how it can be applied to software.


Asynchronous Chicken Wings?

One of my favorite ways to explain asynchronous workflows is to use a restaurant as an example. There are asynchronous workflows everywhere when you eat at a restaurant.

However, many developers can get caught in the trap of only working with synchronous blocking RPC calls between services or boundaries when building systems.

To illustrate, let’s use the restaurant example. We can think of two different service boundaries. The waiter/waitress is one; the kitchen is another. If you were making blocking RPC calls between them, that would mean that when the customer places their food order with the waiter/waitress, they would immediately go to the kitchen and tell the cooking staff. The waiter/waitress would stand there waiting for the food to be finished so they could bring it to the customers’ table. The customer is doing nothing else this entire time; they are simply waiting.

Blocking

That’s not how a restaurant works. Interactions between the customer, waiter/waitress, and the kitchen aren’t blocking.

If you’re developing a microservices or service-oriented system and making service-to-service calls with HTTP or gRPC, you’re making blocking calls.

We know those interactions at a restaurant aren’t blocking. They are asynchronous. When a waiter places an order in the kitchen, they don’t stand there waiting for the food. So why would we write systems that communicate using blocking calls between them?

Asynchronous Workflow

To illustrate how the asynchronous workflow really works: first, a waiter/waitress goes to a customer’s table and takes their order.

Asynchronous Workflow

Right afterward, they may go to another customer’s table and take their order as well.

Asynchronous Workflow

Now that they have two orders from different customers, they proceed to place the order with the kitchen.

Asynchronous Workflow

At this time, our customers are free to do whatever they want. The waiter/waitress might be doing some other work unrelated to attending tables. The kitchen, at this point, is working on the two orders that were placed. Once they are finished, the waiter/waitress then gets the food from the kitchen.

Asynchronous Workflow

They would then immediately go and bring the food to the customers’ table.

Asynchronous Workflow

The same process would occur for the other customer order. Once it’s ready, the waiter/waitress would get the food from the kitchen and bring it to the customers’ table.

Nothing is blocking. The waiter/waitress is free to do other tasks while food is being cooked.

Messaging

You can build asynchronous workflows using messaging: sending commands and publishing events to a message broker. This makes our interactions asynchronous and isolated. We aren’t temporally coupled between services. Each service can work independently without requiring the other.

When the client tells the wait staff (waiter/waitress) their order, the wait staff persists it to their database.

Client calls Wait Staff

They then could create a command for the kitchen boundary to prepare the food.

Wait Staff creates Message/Command

Once the message is sent to the broker, the client and wait staff interaction is complete. At this point, the wait staff isn’t concerned about the kitchen; it knows the kitchen will consume the message, which tells it to prepare the food. If the kitchen is flooded with messages, preparing the food might take longer than expected. But because this is asynchronous, that doesn’t mean it will fail, just that it will take longer.

Asynchronous Workflow

The kitchen will then consume the message and prepare the food.

Kitchen Consumes Message/Command

Once the food is ready, the kitchen can publish an event back to the broker or possibly provide a reply to the wait staff to notify them that the food is ready.

All these interactions are asynchronous and non-blocking.
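The restaurant flow can be sketched with a command queue and an event list (a toy in-process model, not a real broker):

```python
from collections import deque

commands = deque()  # queue: exactly one consumer handles each command
events = []         # topic: anyone interested can read events

# Wait staff: persist the order, then send a command for the kitchen.
def take_order(order):
    commands.append(("PrepareFood", order))
    return "order taken"  # the client interaction is already complete

# Kitchen: consume commands at its own pace, publish an event when done.
def kitchen_work():
    while commands:
        _, order = commands.popleft()
        events.append(("FoodReady", order))

ack = take_order("chicken wings")
# The wait staff is free; the kitchen processes whenever it can.
kitchen_work()
```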

Asynchronous Workflow

Long-running business processes and workflows are everywhere. Just look to the real world sometimes for examples of how they work. You can often use them to illustrate how a message- or event-driven system would work. Being loosely coupled between services can make your system much more resilient to failures and spikes in load.
