Event Carried State Transfer: Keep a local cache!

What’s Event Carried State Transfer, and what problem does it solve? Do you have a service that requires data from another service? Are you trying to avoid making blocking synchronous calls between services because they introduce temporal coupling and availability concerns? One solution is Event Carried State Transfer.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Temporal Coupling

If you have a service that needs to get data from another service, you might just think to make an RPC call. There can be many reasons for needing data from another service. Most often, it’s for query purposes to generate a ViewModel/UI/Reporting. If you need data to perform a command because you need data for business logic, then check out my post on Data Consistency Between Services.

The issue with making an RPC call and the temporal coupling that comes with it is availability. If we need to make an RPC call from ServiceA to ServiceB, and ServiceB is unavailable, how do we handle that failure, and what do we return to the client?

Service to Service

We want ServiceA to be available even when ServiceB is unavailable. To do this, we need to remove the temporal coupling so we don’t need to make this RPC call.

This means that ServiceA needs all the data to fulfill the request from the client.

Service has all the data within its own boundary

Services should be independent. If a client makes a request to any service, that service should not need to make a call to any other service. It has to have all the data required.

Local Cache

One way to accomplish this is to be notified asynchronously, via an event, when data changes within another service boundary. This allows you to call back that service to get the latest data/state and then update your own database, which acts as a local cache.

For example, if a Customer were changed in ServiceB, it would publish a CustomerChanged event containing the Customer ID that was changed.

Publish Event

When ServiceA consumes that event, it would then do a callback to ServiceB to get the latest state of the Customer.

Consume and Callback Publisher

This allows us to keep a local cache of data from other services. We’re leveraging events to notify other service boundaries that the state has changed within a service boundary. Other services can then call the service to update their local cache.
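As a minimal sketch in C#, here’s what that consume-and-callback might look like. Everything here, the CustomerChanged event shape, the /customers/{id} endpoint on ServiceB, and the ICustomerCache abstraction, is hypothetical:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

// Hypothetical shapes for illustration.
public record Customer(Guid Id, string Name, string Email);
public record CustomerChanged(Guid CustomerId);

public interface ICustomerCache
{
    Task UpsertAsync(Customer customer);
}

public class CustomerChangedHandler
{
    private readonly HttpClient _serviceB;   // configured with ServiceB's base address
    private readonly ICustomerCache _cache;  // our local database acting as a cache

    public CustomerChangedHandler(HttpClient serviceB, ICustomerCache cache)
        => (_serviceB, _cache) = (serviceB, cache);

    public async Task Handle(CustomerChanged message)
    {
        // The event only tells us *which* customer changed, so we call
        // back to ServiceB for the latest state...
        var customer = await _serviceB.GetFromJsonAsync<Customer>(
            $"/customers/{message.CustomerId}");

        // ...and upsert it into our local cache.
        if (customer is not null)
            await _cache.UpsertAsync(customer);
    }
}
```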

The downside to this approach is that you could be receiving/accepting a lot of requests for data from other services if you’re publishing many events. From the example, ServiceB could have an increased load handling the requests for data.

You’re still dealing with availability, however. If you consume an event and then make an RPC call to get the latest data, what happens when the service isn’t available or responding? And as with any cache, it’s going to be stale.

Callback Failure/Availability

Event Carried State Transfer

Instead of having these callbacks to the event’s producer, the event itself contains the state. This is called Event Carried State Transfer.

If all the relevant data related to the state change is in the event, then ServiceA can simply use the data in the event to update its local cache.

Event Carried State Transfer
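Here’s the same hypothetical handler reworked for Event Carried State Transfer. The event now carries the customer’s state (plus a version, covered below), so there’s no callback to ServiceB:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical event that carries the full state of the customer,
// plus a version (more on that below).
public record CustomerChanged(Guid CustomerId, string Name, string Email, int Version);

public record Customer(Guid Id, string Name, string Email, int Version);

public interface ICustomerCache
{
    Task UpsertAsync(Customer customer);
}

public class CustomerChangedHandler
{
    private readonly ICustomerCache _cache;

    public CustomerChangedHandler(ICustomerCache cache) => _cache = cache;

    public Task Handle(CustomerChanged message) =>
        // No callback needed; the event carries the state.
        _cache.UpsertAsync(new Customer(
            message.CustomerId, message.Name, message.Email, message.Version));
}
```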

There are three key aspects to Event Carried State Transfer: Immutable, Stale, and Versioned.

Events are immutable. When published, they represent the state at that moment in time. You can think of them as integration events. They are immutable because you don’t own any of the data; data ownership belongs to the service publishing the event. You just have a local cache. And as mentioned earlier, you need to expect it to be stale, because it’s a cache.

Versioning

The event must contain a version that increments, representing the point in time the state changed. For example, if a CustomerChanged event was published for CustomerID 123 multiple times, even if you’re using FIFO (first-in, first-out) queues, that does not mean you’ll process them in order if you’re using the Competing Consumers pattern.

Competing Consumers

When you consume an event, you need to know that you haven’t processed a more recent version already. You don’t want to overwrite with older data.
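Continuing the hypothetical handler, the version guard might look like this. Note that in production the read and write should be atomic (e.g., a conditional UPDATE ... WHERE Version < @version) so competing consumers can’t race between the check and the write:

```csharp
public interface ICustomerCache
{
    Task<Customer?> GetAsync(Guid customerId);
    Task UpsertAsync(Customer customer);
}

public class CustomerChangedHandler
{
    private readonly ICustomerCache _cache;

    public CustomerChangedHandler(ICustomerCache cache) => _cache = cache;

    public async Task Handle(CustomerChanged message)
    {
        var existing = await _cache.GetAsync(message.CustomerId);

        // A more recent (or identical) version has already been applied:
        // drop this event rather than overwrite newer data with older data.
        if (existing is not null && existing.Version >= message.Version)
            return;

        await _cache.UpsertAsync(new Customer(
            message.CustomerId, message.Name, message.Email, message.Version));
    }
}
```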

Check out my post Message Ordering in Pub/Sub or Queues and Competing Consumers Pattern for Scalability.

Data Ownership

So what type of data would you want to keep as a local cache updated via Event Carried State Transfer? Generally, reference data from supporting boundaries. Not transactional data.

Because reference data is non-volatile, it fits well for a local cache. This type of data isn’t changing often, so you’re not as likely to be concerned with staleness.

Transactional data, however, is not a good fit. Generally, transactional data should be owned and contained within a single service boundary.

As an example, consider an online checkout process. When the client starts the checkout process, it makes requests to the Ordering service.

Start Checkout Process

The client then needs to enter their billing and credit card information. This information isn’t sent directly to the Ordering service but to the Payment service. The Payment service stores the billing and credit card information, associated with the ShoppingCartID.

Payment Information

Finally, the order is reviewed, and to complete the order, the client requests that the Ordering service place the order. At this point, the Ordering service publishes an OrderPlaced event containing only the OrderID and ShoppingCartID.

Place Order

The Payment service would consume the OrderPlaced event and use the ShoppingCartID in the event to look up the credit card information in its own database, so it can then call the credit card gateway to make the credit card transaction.

Consume OrderPlaced and Process Payment
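As a sketch, the Payment service’s handler could look something like this; all of the types and methods here are hypothetical:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical message and abstractions for illustration.
public record OrderPlaced(Guid OrderId, Guid ShoppingCartId);
public record CardDetails(string Number, string Expiry);

public interface IPaymentStore
{
    Task<CardDetails> GetCardByCartAsync(Guid shoppingCartId);
}

public interface ICreditCardGateway
{
    Task ChargeAsync(CardDetails card, Guid orderId);
}

public class OrderPlacedHandler
{
    private readonly IPaymentStore _store;
    private readonly ICreditCardGateway _gateway;

    public OrderPlacedHandler(IPaymentStore store, ICreditCardGateway gateway)
        => (_store, _gateway) = (store, gateway);

    public async Task Handle(OrderPlaced message)
    {
        // The card details were captured earlier by the Payment service
        // itself, keyed by ShoppingCartId, so the event never carries them.
        var card = await _store.GetCardByCartAsync(message.ShoppingCartId);
        await _gateway.ChargeAsync(card, message.OrderId);
    }
}
```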

Event Carried State Transfer

Event Carried State Transfer is a way to keep a local cache of data from other service boundaries. It works well for reference data from supporting boundaries that isn’t changing often. However, be careful about using it with transactional data, and don’t force event carried state transfer where you should instead be sending data directly to the appropriate boundary.

Avoiding a QUEUE Backlog Disaster with Backpressure & Flow Control

I advocate a lot for asynchronous messaging. It can add reliability, temporal decoupling, and much more. But what are some of the challenges? One of them is backpressure and flow control. This occurs when you’re producing more messages than you can consume: messages pile up in your queue, you can never catch up, and the queue just keeps growing.

Producers and Consumers

Producers send messages to a broker/queue, and a consumer processes those messages. For a simple view, we have a single producer and a single consumer.

The producer creates a message and sends it to the broker/queue.

Single Producer sending a message to a queue

The message can sit on the broker until the consumer is ready to process it. This enables the producer and the consumer to be temporally decoupled.

Temporal Decoupling provided as the queue holds the message

The consumer then processes the message and it is removed from the broker/queue.

Consumer processes the message from the queue

As long as you can consume messages on average faster than messages are produced, you won’t build up a queue backlog.

But because there can be many producers, or simply because of load, you may start producing messages at a faster rate than they can be consumed.

For example, if you’re producing a single message every second, yet it takes you 1.5 seconds to process the message, you’re going to start filling up the queue. You’ll never be able to catch up and have an empty queue.

Queue backlog
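To put numbers on that example, a quick back-of-the-envelope calculation:

```csharp
using System;

// Back-of-the-envelope math for the example above.
double produceRate = 1.0;                 // messages produced per second
double processingTime = 1.5;              // seconds to process one message
double consumeRate = 1 / processingTime;  // ≈ 0.67 messages consumed per second

double backlogGrowthPerSecond = produceRate - consumeRate;   // ≈ 0.33 msg/s
double backlogAfterOneHour = backlogGrowthPerSecond * 3600;  // ≈ 1,200 messages

Console.WriteLine($"Backlog after one hour: ~{backlogAfterOneHour:N0} messages");
```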

Most systems will have peaks and valleys in terms of how many messages are produced. During the valleys, the consumer can catch up. But again, if on average you’re producing more messages than can be processed, you’re going to build a queue backlog.

Competing Consumers

One solution is to add more consumers so that you can process more messages concurrently. Basically, you’re increasing your throughput. You need to match or exceed the rate of production with consumption.

The Competing Consumers pattern is having multiple instances of the consumer competing for messages on the queue.

Competing Consumers of multiple consumer instances

Since we have multiple consumers, we can now process two messages concurrently. If one consumer is busy processing a message and another message arrives on the queue, we have another consumer available to pick it up.

Each consumer available competes for the next message

The consumer that is available will compete for the next message and process it.

Competing consumers adds more processing which increases throughput
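Here’s a minimal in-process sketch of competing consumers using System.Threading.Channels as a stand-in for a real broker. With two consumers each taking 1.5 seconds per message, combined throughput (about 1.33 messages/second) now exceeds the 1 message/second produce rate:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// In-process stand-in for a broker queue; a real system would use
// RabbitMQ, SQS, Azure Service Bus, etc.
var queue = Channel.CreateUnbounded<string>();

// Two consumer instances competing for messages from the same queue.
var consumers = new Task[2];
for (int i = 0; i < consumers.Length; i++)
{
    int consumerId = i + 1;
    consumers[i] = Task.Run(async () =>
    {
        await foreach (var message in queue.Reader.ReadAllAsync())
        {
            Console.WriteLine($"Consumer {consumerId} processing {message}");
            await Task.Delay(1500); // simulate 1.5s of work per message
        }
    });
}

// Producer sending one message per second.
for (int n = 1; n <= 10; n++)
{
    await queue.Writer.WriteAsync($"message-{n}");
    await Task.Delay(1000);
}

queue.Writer.Complete();
await Task.WhenAll(consumers); // the two consumers keep up with the producer
```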

There are a couple of issues with the Competing Consumers pattern.

The first is if you’re expecting to process messages in order. Just because you’re using a FIFO (first-in, first-out) queue does not mean you’ll process messages in the order they were produced. Because you’re processing messages concurrently, you could finish processing them out of order.

The second issue is that you’ve moved the bottleneck. Any downstream services that are used when consuming a message will now experience additional load. For example, if you’re interacting with a database, you’re going to make additional calls to that database because you’re processing more messages at a given time.

Competing consumers adding additional load on downstream services

Incoming

A queue is like a buffer. My analogy is to think of a queue as a pond of water. There is a stream of water as an inflow on one end, and a stream of water as an outflow on the other.

If the stream of water coming in is larger than the stream going out, the water level of the pond will rise. To lower the water level, you need to widen the outgoing stream to allow more water to escape.

But another way to maintain the water level is to limit the amount of water entering the pond.

In other words, limit the producer to only be able to add so many messages to the queue.

Setting a limit on the broker/queue itself means that when the producer tries to send a message to a queue that has reached its limit, the message won’t be accepted.

Queue Backlog handled by limiting producer
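As a sketch, a bounded channel makes this limit explicit; a real broker would enforce it with its own max-length or quota settings:

```csharp
using System;
using System.Threading.Channels;

// A bounded queue: once 100 messages are waiting, the producer is limited.
var queue = Channel.CreateBounded<string>(new BoundedChannelOptions(100)
{
    // Wait makes WriteAsync pause the producer until space frees up;
    // other modes drop messages instead.
    FullMode = BoundedChannelFullMode.Wait
});

// Alternatively, fail fast: TryWrite returns false when the queue is full,
// letting the producer retry with backoff or surface an error to the client.
if (!queue.Writer.TryWrite("message"))
{
    Console.WriteLine("Queue full: retry later, shed load, or return an error");
}
```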

Because the producer might be a client/UI, you might want built-in retries and other ways of handling the failure when you cannot enqueue a message. Generally, I think this way of handling backpressure is a safeguard to avoid overwhelming the entire system.

Queue Backlog

Ultimately, when dealing with queues (and topics!), you need to understand some metrics. The rate at which you’re producing messages. The rate at which you can consume messages. How long are messages sitting in the queue? What’s the lead time, from when a message was produced to when it was consumed for processing? What’s the processing time, i.e., how long does it take to process a specific message?
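One low-tech way to capture the last two metrics is to stamp each message with the time it was produced; the envelope type here is illustrative:

```csharp
using System;
using System.Diagnostics;

// Hypothetical envelope: stamp each message with when it was produced.
public record Envelope<T>(T Body, DateTimeOffset ProducedAt);

public static class ConsumerMetrics
{
    public static TimeSpan Process<T>(Envelope<T> envelope, Action<T> handler)
    {
        // Lead time: how long the message waited from produced to consumed.
        var leadTime = DateTimeOffset.UtcNow - envelope.ProducedAt;

        // Processing time: how long the handler itself takes.
        var stopwatch = Stopwatch.StartNew();
        handler(envelope.Body);
        stopwatch.Stop();

        Console.WriteLine($"Lead time: {leadTime}, processing time: {stopwatch.Elapsed}");
        return stopwatch.Elapsed;
    }
}
```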

Knowing these metrics will allow you to understand how to handle backpressure and flow control. Look at both sides, producing and consuming.

Look at the Competing Consumers pattern to increase throughput. Also look at whether there are optimizations to be made in how a consumer processes a message. Be aware of downstream services that will also be affected by increasing throughput.

Add a safeguard on the producing side to not get into a queue backlog situation you can’t recover from.

Message Ordering in Pub/Sub or Queues

Do you need message ordering? Processing messages in the order they were sent/published to a queue or topic sounds simple but has some implications. What happens when processing a message fails? What happens to all subsequent messages? How is throughput handled when using the Competing Consumers pattern, which loses the guarantee of processing in order? Lastly, is ordering even important?

Message Ordering

What are some reasons most people think they need to process messages in a particular order? The most common reason is workflow. As an example, events are expected to have occurred in a particular order, such as OrderPlaced, OrderBilled, OrderShipped.

The second most common is CRUD and property-based events. Generally, this refers to Event Carried State Transfer, where you’re trying to propagate the state of entities to other services. As an example, take ProductCreated and ProductNameUpdated. You must process these events in order: if you process ProductNameUpdated before ProductCreated, you have no data/record on which to update the product name.

So how do you achieve message ordering? If you have a message broker that supports FIFO (first in, first out) and a single consumer processing messages single-threaded, one at a time, you will process messages in the same order they were delivered to the broker.

As an example, we have a producer (or many) creating messages and sending them to the broker.

As in the prior example, let’s say the first event was ProductCreated, and the user immediately changed the name, generating another event, ProductNameUpdated. Both events are now on the topic waiting to be consumed.

With a single consumer, it will process the first event (ProductCreated).

Once it’s done and updated its local cache of the product, the consumer will then process the next event ProductNameUpdated.

Competing Consumers

When you want to scale and process more messages, you’ll start using the Competing Consumers pattern. This is basically having multiple instances of the consumer running so you can process more messages concurrently.

This poses a problem because if we have two events ProductCreated and ProductNameUpdated sitting in our Topic waiting to be consumed, we now have two consumers able to process those events.

Since they are independent and process messages concurrently, this means that one consumer could be processing ProductCreated at the same time that the other consumer is processing ProductNameUpdated.

Now there is a race condition: we need ProductCreated to finish processing first; otherwise, ProductNameUpdated will fail since the record it updates won’t exist yet.

Competing consumers applies to topics (often called Consumer Groups) as well as queues.

One solution to processing messages in order while using the Competing Consumers pattern is to process related messages one at a time. To achieve this, different messaging platforms call these Partitions, Message Groups, or Ordering Keys.

To illustrate this, let’s say we have a Partition/Message Group/Ordering Key on ProductID. This means that only a single consumer will process the messages for a given ProductID.

We have two messages sent to a topic: ProductCreated for ProductID=1 and ProductCreated for ProductID=2.

The first (top in the diagram) consumer is responsible for handling all events where ProductID=1.

And the bottom consumer is responsible for ProductID=2.

Now if we get another event, ProductNameChanged for ProductID=1, it will go to the first partition.

And the same consumer will handle it since it is handling all the messages for that partition.

This strategy allows you to process related messages in order, one at a time, while still processing unrelated messages concurrently. Consumers can be responsible for many different Partitions/Message Groups/Ordering Keys.
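Here’s a minimal in-process sketch of that routing, again using System.Threading.Channels as a stand-in for a broker’s partitions/message groups/ordering keys:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Each partition is a queue drained by exactly one consumer, so messages
// that share a key are processed one at a time, in order.
const int partitionCount = 2;
var partitions = new Channel<(int ProductId, string Event)>[partitionCount];
var consumers = new Task[partitionCount];

for (int p = 0; p < partitionCount; p++)
{
    partitions[p] = Channel.CreateUnbounded<(int ProductId, string Event)>();
    int partition = p;
    consumers[p] = Task.Run(async () =>
    {
        await foreach (var msg in partitions[partition].Reader.ReadAllAsync())
            Console.WriteLine($"Partition {partition}: {msg.Event} for ProductID={msg.ProductId}");
    });
}

// Route by key: the same ProductID always maps to the same partition.
Task Publish(int productId, string @event) =>
    partitions[productId % partitionCount].Writer.WriteAsync((productId, @event)).AsTask();

await Publish(1, "ProductCreated");
await Publish(2, "ProductCreated");     // handled concurrently on another partition
await Publish(1, "ProductNameChanged"); // guaranteed to follow ProductCreated for ProductID=1

foreach (var partition in partitions) partition.Writer.Complete();
await Task.WhenAll(consumers);
```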

Failures

Failures are another thing to consider. If you’re processing messages in order, how do you want to handle not being able to process a message? What happens to all the messages behind it waiting to be processed that relate to the failed message?

Say the first event to be processed is ProductCreated, and the following event is ProductNameChanged. They are waiting to be consumed by a single consumer, one at a time.

If the consumer attempts to process ProductCreated but it fails to do so, because of a bug, serialization issue, or whatever the reason, how do you now process ProductNameChanged?

This is very situational. Maybe you can discard the failed message and continue on. Maybe you need to treat it as a poison pill and stop processing any future messages. Again, this needs to be considered if you want to process messages in order.
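As a sketch, that situational policy could be expressed as a wrapper around the handler; all the delegates here are hypothetical, and the right policy depends on your domain:

```csharp
using System;
using System.Threading.Tasks;

// Sketch of a situational failure policy for an ordered stream.
public class OrderedConsumer<TMessage>
{
    private readonly Func<TMessage, Task> _handle;      // the real message handler
    private readonly Func<TMessage, Task> _deadLetter;  // park a message for inspection
    private readonly Func<TMessage, bool> _canSkip;     // the situational judgment call

    public OrderedConsumer(
        Func<TMessage, Task> handle,
        Func<TMessage, Task> deadLetter,
        Func<TMessage, bool> canSkip)
        => (_handle, _deadLetter, _canSkip) = (handle, deadLetter, canSkip);

    public async Task Consume(TMessage message)
    {
        try
        {
            await _handle(message);
        }
        catch
        {
            if (_canSkip(message))
                await _deadLetter(message); // discard/park it and keep going
            else
                throw; // poison pill: stop, so later related messages aren't processed without it
        }
    }
}
```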

Workflow

Do you really need to process messages in order? Oftentimes you can create a policy that understands when all the relevant events have occurred so that you can then perform a specific action.

When an order is placed in our system, we need to charge the customer in billing, and then create a shipping label in the warehouse.

Billing will consume the OrderPlaced event and charge the customer.

Once it charges the customer, it will publish an OrderBilled event.

The Warehouse will have a Policy (illustrated by an NServiceBus saga below) that keeps track of whether both OrderPlaced and OrderBilled have occurred.

Now, depending on message ordering, when these events are published, and how we consume them, they can be processed out of order. We could have processed OrderBilled first, and then OrderPlaced later, even though that’s not the order they were published in.

Once both events have been consumed by the Policy/Saga, it can send a CreateShippingLabel command to the Warehouse.

Here’s a minimal sketch of what this could look like with NServiceBus. Both events start the saga, since either may arrive first; the message types and property names are illustrative:
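```csharp
using System.Threading.Tasks;
using NServiceBus;

public class OrderPlaced : IEvent { public string OrderId { get; set; } }
public class OrderBilled : IEvent { public string OrderId { get; set; } }
public class CreateShippingLabel : ICommand { public string OrderId { get; set; } }

public class ShippingPolicyData : ContainSagaData
{
    public string OrderId { get; set; }
    public bool OrderPlaced { get; set; }
    public bool OrderBilled { get; set; }
}

public class ShippingPolicy : Saga<ShippingPolicyData>,
    IAmStartedByMessages<OrderPlaced>,
    IAmStartedByMessages<OrderBilled>
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper<ShippingPolicyData> mapper)
    {
        // Correlate both events to the same saga instance by OrderId.
        mapper.ConfigureMapping<OrderPlaced>(message => message.OrderId).ToSaga(saga => saga.OrderId);
        mapper.ConfigureMapping<OrderBilled>(message => message.OrderId).ToSaga(saga => saga.OrderId);
    }

    public Task Handle(OrderPlaced message, IMessageHandlerContext context)
    {
        Data.OrderPlaced = true;
        return CreateShippingLabelWhenReady(context);
    }

    public Task Handle(OrderBilled message, IMessageHandlerContext context)
    {
        Data.OrderBilled = true;
        return CreateShippingLabelWhenReady(context);
    }

    private async Task CreateShippingLabelWhenReady(IMessageHandlerContext context)
    {
        // Either event may arrive first; act only once both have been consumed.
        if (Data.OrderPlaced && Data.OrderBilled)
        {
            await context.Send(new CreateShippingLabel { OrderId = Data.OrderId });
            MarkAsComplete();
        }
    }
}
```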

Message Ordering

So do you need to process messages in order? Maybe, maybe not. I’d take a look at the workflow you’re trying to achieve to be sure you actually need to process messages in order. If you do, you’ll have to leverage FIFO (first-in, first-out) queues or topics, as well as a broker that supports partitions, message groups, or ordering keys, each processed by a single consumer.
