I advocate a lot for asynchronous messaging. It can add reliability, temporal decoupling, and much more. But what are some of the challenges? One of them is backpressure and flow control. This occurs when you’re producing more messages than you can consume: messages pile up in your queue faster than you can process them, you can never catch up, and the queue just keeps growing.
Check out my YouTube channel, where I post all kinds of content that accompanies my posts, including this video showing everything in this post.
Producers and Consumers
Producers send messages to a broker/queue, and consumers process those messages. For a simple view, we have a single producer and a single consumer.
The producer creates a message and sends it to the broker/queue.
The message can sit on the broker until the consumer is ready to process it. This enables the producer and the consumer to be temporally decoupled.
The consumer then processes the message and it is removed from the broker/queue.
As long as you can, on average, consume messages faster than they are produced, you won’t build up a queue backlog.
But because there can be many producers, or because of load, you may start producing messages at a faster rate than they can be consumed.
For example, if you’re producing a message every second, yet it takes 1.5 seconds to process each message, you’re going to start filling up the queue. You’ll never be able to catch up and empty the queue.
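The arithmetic of that example can be sketched as a quick back-of-the-envelope calculation (the rates below are the illustrative numbers from above, not measurements from any real system):

```python
# Sketch: backlog growth when production outpaces consumption.
# Illustrative rates from the example above: one message produced
# per second, 1.5 seconds to process each message.

produce_rate = 1.0               # messages produced per second
process_time = 1.5               # seconds to process one message
consume_rate = 1 / process_time  # ~0.67 messages consumed per second

def backlog_after(seconds: float) -> float:
    """Approximate messages left in the queue after steady load."""
    return max(0.0, (produce_rate - consume_rate) * seconds)

for t in (60, 600, 3600):
    print(f"after {t:>4}s: ~{backlog_after(t):.0f} messages queued")
```

After an hour of steady load at these rates, roughly 1,200 messages are sitting in the queue, and the number only grows.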
Most systems will have peaks and valleys in terms of how many messages are produced. During the valleys, the consumer can catch up. But if, on average, you’re producing more messages than can be processed, you’re going to build a queue backlog.
One solution is to add more consumers so that you can process more messages concurrently. Basically, you’re increasing your throughput. You need to match or exceed the rate of production with consumption.
The competing consumer pattern is having multiple instances of the consumer that are competing for messages on the queue.
Since we have multiple consumers, we can now process two messages concurrently. While one consumer is busy processing a message, another consumer is available if a new message arrives.
The consumer that is available will compete for the next message and process it.
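A minimal sketch of competing consumers, using Python’s standard-library `queue` as a stand-in for a broker (a real broker would distribute messages across processes or machines, but the competition for the next message works the same way):

```python
# Sketch: the competing consumer pattern. Two consumer threads drain
# the same queue; whichever one is free takes the next message.

import queue
import threading

work = queue.Queue()
processed = []                    # (consumer name, message) pairs
lock = threading.Lock()

def consumer(name: str) -> None:
    while True:
        msg = work.get()
        if msg is None:           # sentinel: no more work
            work.task_done()
            return
        with lock:
            processed.append((name, msg))
        work.task_done()

# Two competing consumers on the same queue.
workers = [threading.Thread(target=consumer, args=(f"consumer-{i}",))
           for i in (1, 2)]
for w in workers:
    w.start()

for msg in range(10):             # the producer enqueues 10 messages
    work.put(msg)
for _ in workers:                 # one sentinel per consumer to stop them
    work.put(None)
for w in workers:
    w.join()

print(f"{len(processed)} messages processed by 2 competing consumers")
```

Every message is processed exactly once; which consumer handles which message depends entirely on who is free when the message arrives.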
There are a couple of issues with the competing consumers pattern.
The first is if you’re expecting to be processing messages in order. Just because you’re using a FIFO (first-in, first-out) queue, that does not mean you’ll process messages in order as they were produced. Because you’re processing messages concurrently, you could finish processing messages out of order.
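The ordering issue is easy to demonstrate. In this sketch, two messages are picked up concurrently; message 1 simply takes longer to process (the sleep durations are contrived to force the outcome), so message 2 completes first even though it was produced first-in:

```python
# Sketch: concurrent consumption breaks completion order, even with
# FIFO delivery. Message 1 costs more to process than message 2.

import threading
import time

completed = []
lock = threading.Lock()

def process(msg_id: int, duration: float) -> None:
    time.sleep(duration)          # simulate work of varying cost
    with lock:
        completed.append(msg_id)

# Two consumers pick up messages 1 and 2 at (nearly) the same time.
t1 = threading.Thread(target=process, args=(1, 0.5))
t2 = threading.Thread(target=process, args=(2, 0.05))
t1.start(); t2.start()
t1.join(); t2.join()

print(completed)                  # message 2 completes before message 1
```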
The second issue is you’ve moved the bottleneck. Any downstream services that are used when consuming a message will now experience additional load. For example, if you’re interacting with a database, you’re now going to add additional calls to that database because you’re now processing more messages at a given time.
A queue is like a buffer. My analogy is to think of a queue as a pond of water. There is a stream of water as an inflow on one end, and a stream of water as an outflow on the other.
If the stream of water coming in is larger than the stream of water going out, the water level on the pond will increase. In order to lower the water level, you need to widen the outgoing stream to allow more water to escape. This will lower the water level.
But another way to maintain the water level is to limit the amount of water entering the pond.
In other words, limit the producer to only be able to add so many messages to the queue.
Setting a limit on the broker/queue itself means when the producer tries to send a message to the queue, if it’s reached its limit, it won’t accept the message.
Because the producer might be a client/UI, you might want built-in retries and other ways of handling the failure if you cannot enqueue a message. Generally, I think this way of handling backpressure is best used as a safeguard to avoid overwhelming the entire system.
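A sketch of that safeguard, again using a bounded standard-library queue in place of a broker with a configured limit (the retry count and backoff values here are arbitrary illustrations):

```python
# Sketch: a bounded queue rejecting messages at its limit, with
# producer-side retry and exponential backoff on rejection.

import queue
import time

broker = queue.Queue(maxsize=5)   # broker-side limit: 5 messages

def send_with_retry(msg, retries: int = 3, backoff: float = 0.01) -> bool:
    """Try to enqueue; back off and retry while the queue is full."""
    for attempt in range(retries):
        try:
            broker.put_nowait(msg)
            return True
        except queue.Full:
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
    return False                                  # caller must handle failure

# With no consumer draining the queue, only the first 5 sends succeed.
accepted = sum(send_with_retry(i) for i in range(8))
print(f"{accepted} of 8 messages accepted; {8 - accepted} rejected")
```

The important part is the `False` return: once retries are exhausted, the producer has to do something deliberate, such as surfacing an error to the user or shedding the work, rather than silently piling onto a backlog it can never recover from.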
Ultimately, when dealing with queues (and topics!), you need to understand some metrics: the rate at which you’re producing messages, the rate at which you can consume messages, how long messages sit in the queue, the lead time (from when a message was produced to when it finished being processed), and the processing time (how long it takes to consume a specific message).
Knowing these metrics will allow you to understand how to handle backpressure and flow control. Look at both sides, producing and consuming.
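One way to capture the per-message timing metrics is to timestamp three points in a message’s life: when it was produced, when a consumer started processing it, and when processing finished. This is a minimal sketch (the class and property names are my own, not from any messaging library):

```python
# Sketch: deriving lead time and processing time from three timestamps.

import time

class MessageMetrics:
    def __init__(self):
        self.produced_at = time.monotonic()   # stamped by the producer

    def start_processing(self):
        self.started_at = time.monotonic()    # consumer picks it up

    def finish_processing(self):
        self.finished_at = time.monotonic()   # consumer is done

    @property
    def lead_time(self) -> float:
        """Produced -> finished processing (includes time in the queue)."""
        return self.finished_at - self.produced_at

    @property
    def processing_time(self) -> float:
        """Consume start -> finished processing."""
        return self.finished_at - self.started_at

m = MessageMetrics()
time.sleep(0.05)        # message sits in the queue
m.start_processing()
time.sleep(0.02)        # consumer does its work
m.finish_processing()
print(f"lead={m.lead_time:.3f}s processing={m.processing_time:.3f}s")
```

The gap between lead time and processing time is the time spent waiting in the queue, which is the number that tells you whether a backlog is forming.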
Look at the competing consumers pattern to increase throughput. Also look at whether there are optimizations to be made in how a consumer processes a message. Be aware of downstream services that will also be affected by increased throughput.
Add a safeguard on the producing side to not get into a queue backlog situation you can’t recover from.
Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design as well as access to source code for any working demo application that I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.
You also might like
- Message Ordering in Pub/Sub or Queues
- Real-World Event Driven Architecture! 4 Practical Examples
- Handling Failures in Message Driven Architecture
- When distributed systems get frustrated