Have you replaced your DB because of the Repository Pattern?

Have you been able to replace your database implementation transparently because of the use of the repository pattern? While this is a controversial question and topic, I will explain why it doesn’t need to be. Sure, you’re creating an abstraction, but an abstraction around what?

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Repository Pattern

First, we need to agree on the definition of the Repository Pattern. I will use the definition in the Patterns of Enterprise Application Architecture.

Conceptually, a Repository encapsulates the set of objects persisted in a data store and the operations performed over them, providing a more object-oriented view of the persistence layer.

I think everyone can agree on that definition, but I think some of the confusion is around what a “set of objects” means. What type of objects are we talking about?

I’ve found two ways people think about objects in a repository. The first is thinking of them as a data model. It represents how you persist data in your data store and the mapping that goes along.

The second group of people comes at it from a Domain Driven Design perspective and thinks about a domain model of aggregates composed of entities and value objects.

There are generally, these two sides view a repository differently, hence why the use of a repository can become a hot topic.

Differences

On the surface, you might think of an aggregate or a data model with some hierarchy as being the same. But there’s a difference.

In the example above, I have a repository that fetches a basket and its collection of basket items. This object model hierarchy sure looks like an aggregate.

However, the difference between an aggregate and a data model is behaviors.

Behavior

With an aggregate, you’re exposing behaviors to make state changes. An aggregate represents a consistency boundary. The root entity is the entry point to invoke behaviors that will make changes to the state of the aggregate as a whole. Because of this, it’s also the place where business rules are enforced.

Here’s an example.

While this is a trivial example, AddItem ensures that only one item in our _items collection exists for a catalogItemId. Because items themselves don’t know about other items, this is why our Basket is the aggregate root.

If we worked with a data model, we could have this logic in a transaction script or somewhere higher up the call stack, but the point of our aggregate is to hold all the logic to where we make state changes.

Data

So, if you’re making state changes and need to encapsulate those behaviors within an aggregate, what situation would you want a data model?

Typically, you want to query data for UI purposes. In that case, we’ve established a difference between needing to make state changes and querying data. Guess where we’ve landed? You guessed it, CQRS.

So, when you start thinking about the difference between the two, why would I want to use a repository that’s returning an aggregate when I don’t need to make state changes?

Sure, you can use an aggregate for queries if you expose the data within them, but do you need all that data? Your aggregate is backed by all the data used for state changes to ensure consistency. Queries are often specialized for their use case. They don’t always need all the data. Ultimately, you’d often be over-fetching data if you returned an aggregate for query purposes.

You’ll likely conclude that then you only need a repository for commands. If a repository returns an aggregate, and an aggregate is used for state changes, that only happens when executing a command.

Queries are a use-case. As mentioned, it is often very specialized for a specific need to fetch specific data. Using a repository to over-fetch data and then having to transform it into you use case can add a lot of indirection and complexity.

Let your queries define how they retrieve data. Does this mean not abstracting your data access? It can. Or it might not. You make that decision. Now, you’re talking about coupling between queries and your database. Do you need an abstraction if you have ten queries coupled to your database? Do you need an abstraction if you have 1000 queries coupled to your database? Your answers may differ.

Repository Pattern

Thanks to David Fowler for tweeting this, as it inspired this video/blog. The answers to this question are going to be all over the place. Some don’t abstract or use the repository pattern because they think or have never replaced an underlying database. Another group of people always abstract and use the repository pattern because they have replaced their underlying database.

I think the key thing to be thinking about, however, is your use case. If you’re using a repository for commands and returning aggregates, that repository/abstraction will look very different than an abstraction that’s for querying data in a very specific way and/or doing transformations while querying the database and not in memory once data is returned.

A repository for an aggregate might have only a few methods. GetById, Save. There may be a few more, but you know exactly the aggregate you need to call. You’re not querying and filtering by other data elements, it’s generally by the key.

Join!

Developer-level members of my Patreon or YouTube channel get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out my Patreon or YouTube Membership for more info.

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design

What’s enough Complexity for a Domain Model?

Should you be applying domain-driven design? Do you need a domain model? You might want to if you have a lot of complexity within your domain. I often get asked, “what is a lot of complexity?”. I’m going to provide an example by uncovering more and more complexity within a domain. And give a few different insights that should help you make the decision.

YouTube

Check out my YouTube channel, where I post all kinds of content accompanying my posts, including this video showing everything in this post.

Mounting Complexity

Through this post, I’m going to illustrate an example using a shipment, moving freight/goods from one location to another. Everyone should be able to relate to this. You order something online and have it delivered to you.

Let’s start with the simplest form. First, a vehicle will arrive at the first stop, often called the shipper, and pick up the freight.

No Complexity: Pickup

Then the vehicle will drive to the destination, often called the consignee, and deliver the freight.

No Complexity: Delivery

When considering an order you place online, this would mean a vehicle goes and picks up your order/package at a warehouse or store, then delivers it to your house/apartment.

No Complexity: Model

On the surface, the complexity within this domain model might seem minimal. A shipment will be associated with a vehicle, pickup, and delivery location. In terms of behaviors, we would be able to arrive at the pickup location, pick up the freight, depart the pickup location, arrive at the delivery location, and deliver.

This is pretty simple at this point in terms of our conceptual model. In terms of domain logic, it would be around making sure we arrive at our pickup location before we arrive at the delivery. This means we are doing shipment steps in the correct order.

More Complexity

As we dig into the reality of this domain, the complexity within this domain model is about to go up drastically. A vehicle isn’t always going to pick up a single package and bring it directly to the delivery location. What may also occur is bringing the shipment to a warehouse before the final delivery.

More Complexity: pickup

This means that Vehicle A might pick up the freight from our pickup location and then deliver it to a warehouse or some intermediate location.

More Complexity: Warehouse Delivery

The freight might sit at this warehouse location for a period of time.

More Complexity: Warehouse

Finally, when the product needs to move to the delivery location, we could have Vehicle B arrive at the warehouse to pick up the freight.

More Complexity: warehouse pickup

Vehicle B then drives to our delivery location to deliver the freight to the final destination.

More Complexity: delivery

We’ve added a bit more complexity. Now we need to keep track of which vehicles are involved in our shipment.

More Complexity Model

Now with this model, we could have many intermediate pickups and deliveries as needed. For example, we may have two different warehouses involved before our final destination.

More Complexity

Now let’s say that we have a shipment that the freight is split into multiple packages that need to be delivered to multiple locations.

First, the vehicle picks up multiple packages at the pickup location.

Even More Complexity: pickup

It then brings the freight to the first delivery location and delivers one of the packages.

Even More Complexity: Delivery #1

Then it drives to the other delivery location to deliver the final package.

Even More Complexity: Delivery #2

Now, we need to keep track of the freight itself to determine which delivery location it belongs to.

Even More Complexity: Model

Complexity. It never ends.

Let’s combine those last two examples: a warehouse or intermediate stop and multiple delivery locations.

Endless Complexity

Now depending on the size of freight and the types of vehicles, it’s not very efficient to have one shipment per vehicle. So this means that we have multiple shipments on a vehicle. And those individual shipments may have multiple pickup and delivery locations.

Is this starting to sound complex yet? Modeling the data might not be the most challenging part. However, all the business logic of making sure you’re routing these stops in the most efficient order, making sure you’re going to be on on-time for pickup and delivery times required, can get incredibly complex.

The reality of this domain has a lot of complexity that I’m just scratching the surface of. The point is that what looks pretty simplistic on the surface of a single package with a single pickup location and delivery location, in reality, goes much deeper and has a lot of complexity you need to manage.

What’s enough Complexity?

Do you warrant a domain model to manage the complexity? Here are a few questions to ask yourself.

Is it CRUD (create, read, update, delete)? Honestly, is it CRUD? When you’re working with a system that’s primarily CRUD, all of the workflow and business processes are in the end-users heads. They are the ones executing the workflow and entering the various pieces of data. They decide if X happens, then they need to input data into Y. The end user understands the workflow, not the software.

If you’re building a system that captures the workflow and will contain all that business logic, then yes, creating a domain model can be very beneficial as a way of organizing all that logic and complexity.

How many capabilities does your system provide? If you’re trying to decide if you should capture the workflow in your software, how many capabilities does it need to provide? If it’s minimal, it might not be needed.

If it is very limited, will it grow? Will the number of capabilities grow over time? It may start very small, but as its value is realized, more and more features are added.

The example I gave at the beginning may be the 80% use case. In that case, maybe you don’t need a domain model. But you do for the other 20%, which has a lot more complexity.

Not all models need to solve all the problems.

If you don’t know if you have a lot of complexity, then you don’t know enough about the domain and the problem the system is trying to solve. If this is the case, explore more about the domain, the problem space, and what it is you’re trying to solve.

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design and access to source code for any working demo application I post on my blog or YouTube. Check out my Patreon or YouTube Membership for more info.

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design

Blocking or Non-Blocking API calls?

When should you use blocking or non-blocking API calls? Blocking synchronous communication vs non-blocking asynchronous communication? Should commands fire and forget? What’s the big deal of making blocking synchronous calls between services?

YouTube

Check out my YouTube channel where I post all kinds of content that accompanies my posts including this video showing everything in this post.

Communication

There are 3 different forms of communication that I’m generally thinking of. Commands, Queries, and Events. I’m going to cover all three and the pros/cons of using blocking synchronous or non-blocking asynchronous with each of them.

Commands

With commands, it’s all about users’/clients expectations. When the command is performed, do they expect to get a result immediately and the execution be fully complete? Because of this expectation, often times commands are blocking synchronous calls.

Blocking Command

If the client was requesting a state change to the service, the expectation is that if they want to then perform a query afterward, they would expect to read their write. There would be no eventual consistency involved. Blocking synchronous calls can make sense here.

Where asynchronous commands can be useful is if the execution time is going to be longer than the user might want to wait. This is an issue with blocking calls as the client could be waiting to get a response back to know if the command is completed. As an example, if the command had to fetch and mutate a lot of data in a database that could take a long time. Or perhaps it has to send 1000s of emails which could take a lot of time to execute.

Another situation where async commands are helpful is when you want to capture the command and just tell the user it has been accepted. You can then process the command asynchronously and then let the user know asynchronously that it has been completed. An example of this is placing an Order on an e-commerce site. When the user places an order, it doesn’t immediately create the order and charge their credit card, it simply creates a command that is placed on a queue.

Non-Blocking Command

Once the message has been sent to the broker, the client is then told the order has been accepted (not created). Then asynchronously the process of creating and charging the customer’s credit card can occur.

Consume Asyncronously

Once the asynchronous work has been completed, typically you’d receive an email letting you know the order has been created. Depending on your context and system, something like WebSockets can also be helpful for pushing notifications to the client to let them know when asynchronous work has been completed. Check out my post on Real-Time Web by leveraging Event Driven Architecture

Queries

Almost all queries are naturally request-response. They will be blocking calls by default because the purpose of a query is to get a result.

Blocking Query

This is pretty straightforward when a service is interacting with a database or any of its own infrastructure within a logical boundary, such as a cache.

Where this totally falls apart is when you have services that are physically deployed independently and they are calling each other using blocking synchronous communication (eg, HTTP, gRPC).

Blocking Query Service to Service

There are many different issues with service-to-service direct RPC style communication. I cover a bunch of them in my post REST APIs for Microservices? Beware!

One issue is latency. Because you don’t have really any visibility into the entire call stack without something like distributed tracing, you can end up with very slow queries because there are so many RPC calls being made between services to build up the result of a query.

The second issue and the worst offender is tight coupling. If a service is failing or unavailable, your entire system could be down if you aren’t handling all these failures appropriately. If Service D has availability issues, this will affect service A indirectly because it’s calling service C which requires service D.

Blocking Query Service to Service Failures

Congratulations, you’ve built a distributed monolith! Or as I like to call it, a distributed turd pile.

So how do you handle queries since they are naturally blocking and synchronous? Well, what you’re trying to do is view composition.

There are different ways to tackle this problem, one of which is to have a gateway or BFF (Backend-for-Frontend) that does the composition. Meaning it is the one responsible for making the calls to all the services and then composing the results back out to the client.

Query Composition

This isn’t the only way to do UI/ViewModel composition. There are other approaches such as moving the logic to the client or having each logical boundary own various UI components. Regardless, each service is independent and does not make calls to other services.

As for availability, if Service C is unavailable, that portion of the ViewModel/UI can be handled explicitly.

There is one option that I won’t go into in this post, which is asynchronous request-reply. This allows you to leverage a message broker (queues) but still have the client/caller block until it gets a reply message. Check out my post Asynchronous Request-Response Pattern for Non-Blocking Workflows.

Events

It may seem obvious (or not) but events generally should be processed asynchronously. The biggest reason why is because of isolation.

If you publish an event in a blocking synchronous way, this means that the service calling the consumers when the event occurs has to deal with the failures of the consumer.

Blocking Events

If there were two consumers, and the first consumer handled the event correctly, but the second consumer failed, the service/caller would need to handle that failure so it doesn’t bubble up back to the client. Also without any isolation, if the consumer takes a long time to process the event, this affects the entire blocking call from the client. The more consumers, the longer the entire call will take form the client. You want each consumer to execute/handle the event in isolation.

The publisher should be unaware of the consumers.

Publish Event

It doesn’t care if there are any consumers or if there are 100. It simply publishes events.

Subscribe and Consume Events

Blocking or Non-Blocking API calls?

Hopefully, this illustrated the various ways that blocking synchronous and non-blocking asynchronous can be applied to Commands, Queries, and Events.

Join!

Developer-level members of my YouTube channel or Patreon get access to a private Discord server to chat with other developers about Software Architecture and Design as well as access to source code for any working demo application that I post on my blog or YouTube. Check out the YouTube Membership or Patreon for more info.

Follow @CodeOpinion on Twitter

Software Architecture & Design

Get all my latest YouTube Vidoes and Blog Posts on Software Architecture & Design