Blocking or Non-Blocking API calls?

When should you use blocking or non-blocking API calls? Blocking synchronous communication vs. non-blocking asynchronous communication? Should commands be fire-and-forget? What’s the big deal about making blocking synchronous calls between services?

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything in this post.

Communication

There are three different forms of communication that I’m generally thinking of: Commands, Queries, and Events. I’m going to cover all three and the pros/cons of using blocking synchronous or non-blocking asynchronous calls with each of them.

Commands

With commands, it’s all about users’/clients’ expectations. When the command is performed, do they expect to get a result immediately and for the execution to be fully complete? Because of this expectation, oftentimes commands are blocking synchronous calls.

Blocking Command

If the client requests a state change from the service, the expectation is that if they then perform a query afterward, they can read their own write. There would be no eventual consistency involved. Blocking synchronous calls can make sense here.

Where asynchronous commands can be useful is when the execution time is going to be longer than the user might want to wait. This is an issue with blocking calls, as the client could be left waiting for a response just to know the command has completed. As an example, the command might have to fetch and mutate a lot of data in a database, which could take a long time. Or perhaps it has to send thousands of emails, which could take a lot of time to execute.

Another situation where async commands are helpful is when you want to capture the command and just tell the user it has been accepted. You can then process the command asynchronously and let the user know asynchronously once it has been completed. An example of this is placing an Order on an e-commerce site. When the user places an order, it doesn’t immediately create the order and charge their credit card; it simply creates a command that is placed on a queue.

Non-Blocking Command

Once the message has been sent to the broker, the client is then told the order has been accepted (not created). Then asynchronously the process of creating and charging the customer’s credit card can occur.
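
As a rough sketch of this flow (in TypeScript for illustration; the handler shape, queue, and names are all hypothetical, not from the original post):

```typescript
// Non-blocking command sketch: the handler only enqueues a PlaceOrder
// command and immediately reports acceptance; nothing is created yet.

interface PlaceOrder {
  orderId: string;
  customerId: string;
  lines: { sku: string; quantity: number }[];
}

const queue: PlaceOrder[] = []; // in-memory stand-in for a message broker

function placeOrderHandler(command: PlaceOrder): { status: number; body: string } {
  queue.push(command); // send the command to the broker and return
  // HTTP 202 Accepted: "we got it", not "the order exists".
  return { status: 202, body: `Order ${command.orderId} accepted` };
}

// A separate worker processes commands asynchronously: create the
// order, charge the credit card, then notify the user afterward.
async function orderWorker() {
  const command = queue.shift();
  if (!command) return;
  // ...create the order, charge the card, publish an OrderCreated event...
}
```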

Consume Asynchronously

Once the asynchronous work has been completed, typically you’d receive an email letting you know the order has been created. Depending on your context and system, something like WebSockets can also be helpful for pushing notifications to the client to let them know when asynchronous work has been completed. Check out my post on Real-Time Web by leveraging Event Driven Architecture.

Queries

Almost all queries are naturally request-response. They will be blocking calls by default because the purpose of a query is to get a result.

Blocking Query

This is pretty straightforward when a service is interacting with a database or any of its own infrastructure within a logical boundary, such as a cache.

Where this totally falls apart is when you have services that are physically deployed independently and they are calling each other using blocking synchronous communication (e.g., HTTP, gRPC).

Blocking Query Service to Service

There are many different issues with service-to-service direct RPC style communication. I cover a bunch of them in my post REST APIs for Microservices? Beware!

One issue is latency. Because you don’t really have any visibility into the entire call stack without something like distributed tracing, you can end up with very slow queries simply because of how many RPC calls are being made between services to build up the result of a query.

The second issue, and the worst offender, is tight coupling. If a service is failing or unavailable, your entire system could be down if you aren’t handling all these failures appropriately. If Service D has availability issues, this will affect Service A indirectly because it’s calling Service C, which requires Service D.

Blocking Query Service to Service Failures

Congratulations, you’ve built a distributed monolith! Or as I like to call it, a distributed turd pile.

So how do you handle queries since they are naturally blocking and synchronous? Well, what you’re trying to do is view composition.

There are different ways to tackle this problem, one of which is to have a gateway or BFF (Backend-for-Frontend) that does the composition. Meaning it is the one responsible for making the calls to all the services and then composing the results back out to the client.

Query Composition

This isn’t the only way to do UI/ViewModel composition. There are other approaches such as moving the logic to the client or having each logical boundary own various UI components. Regardless, each service is independent and does not make calls to other services.

As for availability, if Service C is unavailable, that portion of the ViewModel/UI can be handled explicitly.
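
A sketch of what that composition could look like inside a BFF (TypeScript, with hypothetical service URLs and response shapes):

```typescript
// ViewModel composition sketch: the BFF calls each service
// independently; the services never call each other.

async function getProductViewModel(productId: string) {
  const [catalog, pricing, inventory] = await Promise.allSettled([
    fetch(`https://catalog.internal/products/${productId}`).then(r => r.json()),
    fetch(`https://pricing.internal/prices/${productId}`).then(r => r.json()),
    fetch(`https://inventory.internal/stock/${productId}`).then(r => r.json()),
  ]);

  return {
    name: catalog.status === "fulfilled" ? catalog.value.name : null,
    price: pricing.status === "fulfilled" ? pricing.value.price : null,
    // If one service is unavailable, degrade just its portion of the
    // ViewModel explicitly rather than failing the whole request.
    inStock: inventory.status === "fulfilled" ? inventory.value.quantity > 0 : "unknown",
  };
}
```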

There is one option that I won’t go into in this post, which is asynchronous request-reply. This allows you to leverage a message broker (queues) but still have the client/caller block until it gets a reply message. Check out my post Asynchronous Request-Response Pattern for Non-Blocking Workflows.

Events

It may seem obvious (or not), but events generally should be processed asynchronously. The biggest reason is isolation.

If you publish an event in a blocking synchronous way, the publishing service calls the consumers directly when the event occurs and has to deal with each consumer’s failures.

Blocking Events

If there were two consumers, and the first consumer handled the event correctly but the second consumer failed, the service/caller would need to handle that failure so it doesn’t bubble back up to the client. Also, without any isolation, if a consumer takes a long time to process the event, this affects the entire blocking call from the client. The more consumers, the longer the entire call from the client will take. You want each consumer to execute/handle the event in isolation.

The publisher should be unaware of the consumers.

Publish Event

It doesn’t care if there are any consumers or if there are 100. It simply publishes events.

Subscribe and Consume Events
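
To make the decoupling concrete, here’s a minimal in-memory sketch (a real system would use a broker like RabbitMQ or a cloud queue; the types here are invented):

```typescript
// Publish/subscribe sketch: the publisher hands the event off and
// returns. Each consumer handles it in isolation, so one slow or
// failing consumer doesn't affect the publisher or other consumers.

type Handler = (event: unknown) => Promise<void>;
const subscriptions = new Map<string, Handler[]>();

function subscribe(eventType: string, handler: Handler) {
  const handlers = subscriptions.get(eventType) ?? [];
  handlers.push(handler);
  subscriptions.set(eventType, handlers);
}

function publish(eventType: string, event: unknown) {
  // The publisher doesn't know or care if there are 0 or 100 consumers.
  for (const handler of subscriptions.get(eventType) ?? []) {
    // Failures are the consumer's problem (retries, dead-lettering),
    // never the publisher's.
    handler(event).catch(err => console.error("consumer failed:", err));
  }
}
```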

Blocking or Non-Blocking API calls?

Hopefully, this illustrated the various ways that blocking synchronous and non-blocking asynchronous communication can be applied to Commands, Queries, and Events.



Gotchas! in Event Driven Architecture

Event Driven Architecture has a few gotchas: things you always need to be aware of or think about. Regardless of whether you’re new to Event Driven Architecture or a seasoned veteran, these just don’t go away. Let me cover four aspects of Event Driven Architecture that you need to think about.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything that is in this post.

At Least Once Delivery

You’ll likely use a broker that supports “at least once delivery”. This means a message can be delivered to a consumer one or more times. Because of this, consumers need to be able to handle receiving a duplicate message: a message they’ve already consumed.

There are different reasons why this can occur, and the first is because of the message broker.

When a message broker delivers a message to the consumer, the consumer needs to process the message and then acknowledge back to the broker that it consumed it successfully.


If the consumer fails to acknowledge to the broker, say because an internal exception occurred when processing the message, then after a given period of time (the invisibility timeout) the message broker will re-deliver that message to the consumer. This can also happen if the length of time it takes to consume a message is greater than the invisibility timeout. If the consumer does not acknowledge before the timeout, regardless of why, the message will get re-delivered.
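
In rough client-code terms (a TypeScript sketch, not any specific broker’s API), the consume/acknowledge contract looks something like this:

```typescript
// Consume-then-acknowledge sketch: if ack() is never called (an
// exception, a crash, or processing outliving the invisibility
// timeout), the broker re-delivers the message.

interface BrokerMessage {
  id: string;
  body: string;
  ack(): void; // tell the broker the message was consumed successfully
}

async function consume(message: BrokerMessage) {
  try {
    await handle(message.body);
    message.ack(); // only acknowledge after successful processing
  } catch {
    // No ack: the broker will re-deliver after the timeout, which is
    // exactly why a consumer can see the same message more than once.
  }
}

async function handle(body: string): Promise<void> {
  /* process the message */
}
```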

The second reason this can happen is because of the publisher of the message. If the publisher is using the Outbox Pattern, it may send the same message more than once. The publisher might also produce duplicates in other ways, such as sending a different message that contains the same values. In either case, if you’re going to be negatively impacted by processing a duplicate message, then you need to make your consumers idempotent.
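
One common way to get idempotency is to record the IDs of messages you’ve already processed and skip duplicates. A sketch (an in-memory set stands in for durable storage, which ideally shares a transaction with your state change):

```typescript
// Idempotent consumer sketch: a re-delivered or duplicate message
// has no additional effect because its ID was already recorded.

const processed = new Set<string>(); // use a durable store in production

async function consumeIdempotently(message: { id: string; body: string }) {
  if (processed.has(message.id)) return; // duplicate: ignore it
  await handleMessage(message.body);
  processed.add(message.id); // record alongside the state change
}

async function handleMessage(body: string): Promise<void> {
  /* charge the card, update state, etc. */
}
```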

Consistent Publishing

Events become first-class within Event Driven Architecture, so you need to be aware of when and where you’re publishing them. Oftentimes events are published to indicate something has occurred, such as a state change. This means your code needs to be consistent: if a certain action has occurred, you publish the appropriate event. If you make a state change and do not publish the event, this can have a negative impact on other parts of your system that are expecting it.

As an example, here’s a transaction script from a food delivery app. When the food delivery driver arrives at the restaurant to pick up the food, this is the first action they perform: the “Arrive”.

One of the things we are doing along with making some state changes is publishing an Arrived event.

The second action the delivery driver performs, after they have arrived at the restaurant, is the “Pickup”. This is the action performed once they pick up the food.

Some of the logic we have, however, is that if they haven’t yet performed the Arrive action first, we’ll just let them do the Pickup anyway. The problem is that we’re making the state change for “Arrive”, yet not publishing the Arrived event.
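
The original post shows this as a C# transaction script; re-sketched in TypeScript with illustrative names, the problem looks like this:

```typescript
// Problematic "Pickup" transaction script: it makes the Arrive state
// change but never publishes the Arrived event.

interface DeliveryState {
  arrived: boolean;
  pickedUp: boolean;
}

function publishEvent(eventType: string, event: unknown) {
  /* send to the message broker */
}

function pickup(delivery: DeliveryState) {
  if (!delivery.arrived) {
    delivery.arrived = true; // state change for "Arrive"...
    // ...but no Arrived event is published here. Other boundaries
    // expecting an Arrived event never receive one.
  }
  delivery.pickedUp = true;
  publishEvent("PickedUp", { /* delivery details */ });
}
```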

The gotcha here is that Events are now first class. When certain state changes occur, you might need to publish an event for other service boundaries. Making this simple state change without publishing the Arrived event could have a negative impact on other service boundaries. You want to narrow down and encapsulate state changes and publishing events into something like an Aggregate.
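
A sketch of that encapsulation (again illustrative, not the post’s actual code): the aggregate pairs every state change with its event, so Pickup can’t silently skip Arrived:

```typescript
// Aggregate sketch: state changes and their events are recorded
// together, then published after the aggregate is saved.

class Delivery {
  private arrived = false;
  private pickedUp = false;
  private events: { type: string }[] = [];

  arrive() {
    if (this.arrived) return;
    this.arrived = true;
    this.events.push({ type: "Arrived" }); // change + event, always paired
  }

  pickup() {
    this.arrive(); // skipping Arrive is impossible: it happens here, with its event
    this.pickedUp = true;
    this.events.push({ type: "PickedUp" });
  }

  pendingEvents() {
    return [...this.events];
  }
}
```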

Bypassing API

You cannot bypass your API. Your API is what is responsible for publishing events when certain state changes occur or behaviors are invoked. Event Driven Architecture makes Events first class, and they need to be reliably and consistently published.

As an example, the dotted outlined square is a logical service boundary. When a client makes a request to the App Service to perform some action, it makes some state change to its database.

It will also then publish an event to the message broker so that other service boundaries can be aware of and consume that event.

If you were to bypass the API, no event would be published. You CANNOT have another logical boundary, or even yourself as a developer, go in and manually change state/data in the database without publishing an event.


In the example above, the event published might be used to invalidate a cache. If data is changed manually or by some other boundary then the cache will not be invalidated and will be inconsistent.
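
As a tiny sketch of that consumer (hypothetical event and cache shapes):

```typescript
// Cache-invalidation consumer sketch: the cache only stays correct if
// every state change actually publishes its event. A manual database
// edit publishes nothing, so this never runs and the cache goes stale.

const cache = new Map<string, unknown>();

function onProductChanged(event: { productId: string }) {
  cache.delete(event.productId); // evict the now-stale entry
}
```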

You cannot bypass your API, since it controls making state changes and publishing the appropriate events. State changes and Events go hand in hand.

Failures

Failures in Event Driven Architecture can have serious implications for throughput and performance, and can result in cascading failures.

As an example, say a consumer is processing a message and, in doing so, needs to make an HTTP call to an external service.

The issue arises when the external service becomes unavailable or has abnormal latency when processing the HTTP request.


If the normal processing time of a message is 100ms, but a timeout with the external service turns your processing time into 1 minute, this can cause backpressure if you’re producing more messages than you can consume.

There are many different strategies for handling failures such as immediate retries, backoffs, circuit breakers, poison messages, and dead letter queues.
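
For instance, a retry-with-exponential-backoff wrapper around the external call might look like this sketch (circuit breakers and dead letter queues would layer on top):

```typescript
// Retry with exponential backoff: re-attempt the external call a few
// times; if it keeps failing, rethrow so the message can be retried
// later or dead-lettered as a poison message.

async function withRetries<T>(work: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await work();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up; let the broker handle it
      const delayMs = 100 * 2 ** attempt; // 200ms, 400ms, 800ms, ...
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}
```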


Handling failures when processing messages is critical and constantly needs to be addressed. Not all situations are the same, but it forces you to evaluate how you handle failures and to understand that different consumers have different processing and latency requirements.



Semantic Versioning is overrated

Package Managers (npm, NuGet, pip, Composer, Maven, etc.) are an incredible tool for leveraging other libraries and frameworks in your products or projects. What seems to have been largely lost is the decision-making process around whether you should take on a dependency and what the implications are. Once you do take a dependency, how much do you trust the vendor/author and semantic versioning? Before you do a quick npm install or update, let me explain some things you should be considering.

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything that is in this post.

Ownership

The theme of this post is really to emphasize that dependencies are your problem and you need to take ownership over the decision-making process of taking on a dependency.

Dependencies are nothing more than an extension of your own code. They require serious consideration of which dependencies you’re taking on, why you’re taking them, and whether you should even be taking them to begin with.

Trust

Trust plays a role in taking on a dependency. If you’re working on a product that is going to be long-lived, you want to align yourself with dependencies that you’ll grow with. There’s a certain level of stability that you’ll want dependencies to have.

Meaning, if you develop a product that lasts 10+ years, which dependencies will you take that will also live that long? They can either drag you down over time like an anchor or propel you forward with their own innovations.

As an example, I started a product in 2015 with .NET Framework 4.7 and ASP.NET, just as .NET Core (now .NET 5+) was emerging but not yet released. While the effort to move to ASP.NET Core took time and knowledge, it was possible and was completed. We’ve kept up to date with .NET as it stands today and have been propelled forward by all of its advancements rather than being dragged down and stuck on the .NET Framework.

Semantic Versioning

So where does Semantic Versioning fit into any of this that I’ve written about so far? Well, Semantic Versioning is nothing more than a way to communicate that something changed in a package that you depend on. That’s it.

While the intent is to identify the degree of change with major, minor, and patch versions, that’s only as good as the author’s intent in following that definition. For example, if the patch version changes because of a bug fix done in a backward-compatible manner, what’s your level of trust that it actually is still backward-compatible? Just because the API surface didn’t change doesn’t mean it’s backward-compatible. What happens if you have a workaround for a bug that then gets fixed? That fix isn’t backward-compatible for you if it breaks your workaround.

Semantic Versioning tells you something changed. That’s it.

Test

Even if the authors of a package have good intent and understanding of SemVer, that doesn’t mean what should be a non-breaking change won’t be breaking to you. The only way to satisfy this is to have the proper level of testing that validates your usage of a dependency.

You only need to concern yourself with the functionality and API surface that you actually use from that dependency.

Pin Versions & Upgrades

If you don’t have good test coverage or the ability to understand upgrades of a package, pin your versions. Don’t rely on version ranges; rather, have a cadence for manually updating packages.
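
For example, with npm (package names hypothetical), a caret range allows any compatible minor/patch update, while an exact version pins it:

```json
{
  "dependencies": {
    "some-ui-lib": "^1.3.0",
    "some-critical-lib": "2.4.1"
  }
}
```

Here "^1.3.0" will happily resolve to a newer 1.x on the next install, while "2.4.1" will never move until you change it deliberately.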

The longer you wait to upgrade to a new version, the more you’ll have to change in the long run. If you don’t want to get on the upgrade train, don’t pick or bet on a dependency that’s constantly releasing, whether because of feature enhancements you don’t need or bug fixes due to instability.

It goes back to ownership and making a decision based on your context.

Context is King

If you’re working on a project or product that is going to be short-lived, almost nothing I’m talking about matters. You likely won’t even bother updating any dependencies unless there’s a security issue you need to address.

However, if the project/product is going to be long-lived over many years, you’ll likely be constantly evolving the system. In doing so you’re going to likely need to update dependencies. The more dependencies you have the more work it is to manage them.

