Over the years I have worked on a number of systems that would be classed as event-driven, and some of these have also implemented various parts of the CQRS pattern. More recently I’ve been working on systems and services that utilise the full gamut of CQRS and Event Sourcing. These have primarily been built using Axon Framework to provide the technical infrastructure.
While these approaches generally lead to really well structured systems, as with all software development, there are always some anti-patterns that tend to emerge. These make the solutions more difficult to understand, maintain and enhance.
Over time I’ve collected up a set of these anti-patterns, and I’ve now grouped them all into this blog post. Some are very specific to CQRS and Event Sourcing, while others are related more generally to software structure or event-driven systems.
Where necessary to discuss in terms of implementation detail, I’ve focussed on how they apply specifically within Axon Framework. However, many of the details can be applied to a much wider software engineering context.
The remainder of this post looks in detail at eight specific CQRS and event-driven anti-patterns.
Background
The CQRS pattern, when combined with Event Sourcing, allows the creation of very powerful and flexible event-driven software solutions. The excellent Axon Framework provides a great technical platform on which to base these implementations.
Anti-Patterns
Anti-Pattern 1: Querying the Query Model to make Command decisions
Most implementations of event-driven CQRS systems tend to adopt an approach of eventual consistency: the Command Model updates its model state in response to a Command and then raises Events that are dispatched and processed asynchronously. These Events are received by the Query Model, which uses them to update its own view state. In the period between the Command Model raising the Event and the Query Model processing it, the two models can be, temporarily, inconsistent in their state.
This anti-pattern occurs when logic within the Command Model accesses the Query Model, during the period of inconsistency, in order to make business rule decisions. This can lead to a race condition where the logic reads stale data from the Query Model and therefore carries out an incorrect modification to the Command Model.
How to avoid: In any CQRS system there should be a clear separation between the Command and Query Models. Any actions or decisions within the Command Model should only ever work on the state held within that model and should never reference the Query Model in any way.
Axon Framework: this would usually be achieved by encapsulating any decision logic within the Aggregate object that holds the state that needs to be used. Controlling or Saga code would then send a Command to the Aggregate that would either raise an Event to signal continuation of processing, or throw a business exception to indicate that processing has failed.
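As a rough illustration of keeping the decision inside the Aggregate, here is a minimal plain-Java sketch. It deliberately avoids any Axon dependency, and every name in it (AccountAggregate, WithdrawMoneyCommand, MoneyWithdrawnEvent, InsufficientFundsException) is a hypothetical one invented for this example. The point is only that the business rule reads state owned by the Command Model and signals the outcome through an Event or a business exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message types for this sketch.
record WithdrawMoneyCommand(String accountId, long amount) {}
record MoneyWithdrawnEvent(String accountId, long amount) {}

// Business exception that tells the sender the Command was rejected.
class InsufficientFundsException extends RuntimeException {
    InsufficientFundsException(String message) {
        super(message);
    }
}

// Plain-Java stand-in for an Aggregate. In Axon Framework, handle(...) would
// typically be annotated with @CommandHandler and the state change would live
// in an @EventSourcingHandler method.
class AccountAggregate {
    private final String accountId;
    private long balance;
    private final List<Object> publishedEvents = new ArrayList<>();

    AccountAggregate(String accountId, long openingBalance) {
        this.accountId = accountId;
        this.balance = openingBalance;
    }

    // The decision uses only state held by the Command Model, never a Query Model lookup.
    void handle(WithdrawMoneyCommand command) {
        if (command.amount() > balance) {
            throw new InsufficientFundsException(
                    "Balance " + balance + " is below requested amount " + command.amount());
        }
        apply(new MoneyWithdrawnEvent(accountId, command.amount()));
    }

    // Applies the state change and records the Event for publication.
    private void apply(MoneyWithdrawnEvent event) {
        balance -= event.amount();
        publishedEvents.add(event);
    }

    long balance() {
        return balance;
    }

    List<Object> publishedEvents() {
        return List.copyOf(publishedEvents);
    }
}
```

A Saga or controller would send the Command and react either to the resulting Event or to the exception, without ever consulting a projection to pre-check the balance.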
Anti-Pattern 2: Wrapping Events inside other Commands or Events
One of the key benefits of the CQRS approach is the ability to define a domain in terms of Commands and Events that are meaningful to the business. A very common approach to identifying the business flows is the technique of Event Storming. This leads to output such as this, which is taken from the Axon Framework Developer Portal:
As can be seen from the diagram, we have a clear flow of Commands and the Events that get generated when those Commands are processed. We can also see how the Events get reflected into the view state of the Query Model. When implementing this in something like Axon Framework we directly translate these into code that contains Command and Event classes, and handlers for each of these. It should be easy to trace through the code to see which Commands are handled where, the Events they publish and where those in turn are handled.
This anti-pattern generally occurs when engineers are trying to create some form of generic domain model: one that tries to over-generalise the business flows, or that aims to be highly flexible for some future (as yet undefined) use-cases. The result is that you see things in the domain model like a PublishEventCommand or a ProcessingFinishedEvent that don’t really have any meaning to the business, but which contain some event type that does have business meaning.
The key problem with this anti-pattern is that it obscures the flow of the business process through the code, making it difficult to relate back to the original event stormed workflows. It also makes the code much more difficult to understand and maintain - increasing costs and slowing down future deliveries.
How to avoid: Don’t introduce unnecessary generic or re-use complexity into the domain model implementation. The Commands and Events should always be only those identified during the business event storming sessions. These should all be directly published and handled in the code and should never be placed inside generic Command or Event wrapper classes, even if this looks like a convenient way to support re-use or apply DRY principles.
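To make the contrast concrete, the following plain-Java sketch keeps one explicit type per business Command and Event. The order-related names (PlaceOrderCommand, ShipOrderCommand, OrderPlacedEvent, OrderShippedEvent, OrderCommandHandlers) are hypothetical and stand in for whatever your own event storming produced. The generic wrapper is shown only as a comment to mark what is being avoided.

```java
// One explicit type per business Command and Event (hypothetical order domain).
record PlaceOrderCommand(String orderId, String product) {}
record ShipOrderCommand(String orderId) {}
record OrderPlacedEvent(String orderId, String product) {}
record OrderShippedEvent(String orderId) {}

// What this section advises against: a wrapper whose type says nothing about
// the business step, forcing readers to inspect the payload at runtime.
//   record PublishEventCommand(Object wrappedEvent) {}

// Each handler method names the Command it accepts and the Event it produces,
// so the business flow can be traced directly from the method signatures.
class OrderCommandHandlers {
    OrderPlacedEvent handle(PlaceOrderCommand command) {
        return new OrderPlacedEvent(command.orderId(), command.product());
    }

    OrderShippedEvent handle(ShipOrderCommand command) {
        return new OrderShippedEvent(command.orderId());
    }
}
```

The duplication here is small, and searching the codebase for a Command or Event type leads straight to the place where it is handled.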
Anti-Pattern 3: Sending Events direct to Aggregates rather than sending them Commands
Generally in a CQRS implementation, and specifically in Axon Framework, Commands are created and are sent to Aggregate objects for processing. These Aggregates represent an entity in the domain model such as an Order or an Account. The Aggregate receives the Command, and using the information inside the Command combined with its current state it decides what business processing to carry out. If this business processing results in a change of state then the Aggregate publishes an Event representing this change.
This anti-pattern happens when, rather than making Aggregates react to Commands they instead are constructed to react directly to other Events that are happening in the system. This is typically achieved by employing the previous anti-pattern and wrapping the Event inside a generic Command, sending that to the Aggregate and then unpacking the Event and handling it inside the Aggregate logic. The most common reason for this approach is to try to minimise the amount of translations between Events and Commands and to reduce the number of Command handlers required.
This is typically a symptom of trying to create more generic solutions. It causes the same problems as the previous anti-pattern in that it obscures the flow of the business process, makes it difficult to relate back to the original event stormed workflows, muddies understanding and thus increases maintenance costs and slows future delivery.
How to avoid: Make sure that Commands are always the message that initiates work on an Aggregate. When there is an Event that occurs that should trigger more processing on the Command Model then it should always be handled outside the Aggregate (by a Saga, or generic Event Handler if processing doesn’t require workflow state). This should convert the Event into the next Command. Don’t wrap events inside a generic Command just to avoid writing separate Commands and Command Handlers.
Axon Framework: It may be tempting to adopt this pattern to minimise the number of Command types and reduce the number of Command Handlers in an Aggregate. Bear in mind that an Aggregate is just the top-level container and that they can be composed of multiple Entities that can be the direct recipients of Commands as well. An Aggregate that is handling large numbers of different Commands is probably an indication that it should be composed from a number of smaller Entity classes.
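The translation step described above, where an Event is handled outside the Aggregate and converted into the next Command, might look something like this plain-Java sketch. All of the names (PaymentReceivedEvent, ShipOrderCommand, OrderShippedEvent, OrderAggregate, PaymentEventHandler) are hypothetical, and the Consumer stands in for whatever command dispatch mechanism is in use; in Axon Framework the handler would typically be a Saga or an Event Handler component.

```java
import java.util.function.Consumer;

// Hypothetical messages for this sketch.
record PaymentReceivedEvent(String orderId) {}
record ShipOrderCommand(String orderId) {}
record OrderShippedEvent(String orderId) {}

// The Aggregate only ever reacts to Commands.
class OrderAggregate {
    private final String orderId;
    private boolean shipped;

    OrderAggregate(String orderId) {
        this.orderId = orderId;
    }

    OrderShippedEvent handle(ShipOrderCommand command) {
        if (shipped) {
            throw new IllegalStateException("Order " + orderId + " has already shipped");
        }
        shipped = true;
        return new OrderShippedEvent(orderId);
    }
}

// Lives outside the Aggregate and converts the Event into the next Command.
class PaymentEventHandler {
    private final Consumer<ShipOrderCommand> commandSender;

    PaymentEventHandler(Consumer<ShipOrderCommand> commandSender) {
        this.commandSender = commandSender;
    }

    void on(PaymentReceivedEvent event) {
        commandSender.accept(new ShipOrderCommand(event.orderId()));
    }
}
```

The extra Command type costs a few lines, but the flow from payment to shipping stays visible as Event, then Command, then Event.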
Anti-Pattern 4: Using fire and forget Commands
This anti-pattern is more specific to Axon Framework, but could also apply to any CQRS system that allows Commands to be sent into the Command Model without waiting to check if they were applied successfully. Axon Framework supports the ability to just dispatch Commands and not wait for a confirmation. It also supports dispatching Commands and then waiting to see if the processing of that Command succeeded or not - this can be achieved both synchronously and asynchronously.
The preferred approach is to always await a result when a Command is sent into the Command Model. In this way the sender can react properly to failure cases, such as by retrying, reporting an error, or instigating some form of recovery flow. By just sending off a Command and not dealing with the possibility of failure it is much more likely that eventually there will be a case where the Command Model is left in an inconsistent state.
Sometimes a manifestation of this anti-pattern is the creation of a dead-letter dump for Commands that were dispatched but never had errors checked or handled. This then creates a manual process in the system to review and replay these Commands - and manual processes rarely scale as system volume increases. Another anti-pattern possibility is that the Command Handler raises specific failure Events to indicate that a problem occurred, thus polluting the Event space of the domain model.
How to avoid: Whenever Commands are dispatched, the caller should handle the case where the Command Handler fails and apply retry or recovery logic accordingly. The Command Handlers should raise business exceptions to indicate that Commands were not handled rather than dumping failures on a dead-letter queue or raising some type of failure Event. Senders of Commands should approach Command dispatch like a function call where the result (or an exception) should always be handled.
Axon Framework: The Axon Framework CommandGateway supports various mechanisms for processing the results from Command Handlers, including:
- A blocking wait for a result or exception
- The ability to supply a callback function that is called with the result or exception
- A CompletableFuture that can be used to handle the result or exception asynchronously
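The sketch below shows the general shape of treating Command dispatch like a function call whose outcome is always consumed. It uses only the JDK: InMemoryCommandGateway is a hypothetical stand-in written for this example, not the real Axon CommandGateway, and ReserveStockCommand, OutOfStockException and StockClient are likewise invented. The idea it illustrates is simply that both the success path and the failure path of the returned CompletableFuture are handled by the sender.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical Command and business exception for this sketch.
record ReserveStockCommand(String productId, int quantity) {}

class OutOfStockException extends RuntimeException {
    OutOfStockException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for a gateway whose send(...) returns a CompletableFuture.
class InMemoryCommandGateway {
    private int available;

    InMemoryCommandGateway(int available) {
        this.available = available;
    }

    CompletableFuture<Integer> send(ReserveStockCommand command) {
        return CompletableFuture.supplyAsync(() -> handle(command));
    }

    // Command Handler: returns the remaining stock or throws a business exception.
    private synchronized int handle(ReserveStockCommand command) {
        if (command.quantity() > available) {
            throw new OutOfStockException("only " + available + " left");
        }
        available -= command.quantity();
        return available;
    }
}

class StockClient {
    // Success and failure both produce a reply; nothing is fired and forgotten.
    static String reserve(InMemoryCommandGateway gateway, ReserveStockCommand command) {
        return gateway.send(command)
                .thenApply(remaining -> "reserved, " + remaining + " left")
                .exceptionally(error -> "rejected: " + unwrap(error).getMessage())
                .join();
    }

    // Exceptions thrown in async stages arrive wrapped in a CompletionException.
    private static Throwable unwrap(Throwable error) {
        return (error instanceof CompletionException && error.getCause() != null)
                ? error.getCause()
                : error;
    }
}
```

Here the caller blocks with join() for simplicity. A non-blocking caller would return the composed future instead, and the exceptionally branch is where retry or recovery logic would go.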
Anti-Pattern 5: Updating the Query Model with an Event that isn’t captured in the Event Source
In the CQRS pattern the Query Model is a snapshot of system state used for answering Queries on a particular view of the system. Although the Query Model is usually persistent, the information contained within it should be considered transient in nature. It should always be possible to clear the Query Model and then repopulate it by replaying, in order, all of the Events from the Command Model. In many CQRS implementations these Events are obtained from the Event Sourcing store. After doing this, the Query Model should return to the same state that it was in before.
There are also other things that can be done through replaying of the Command Model Events, such as populating specialist Query Models (an Elasticsearch index, for example) or providing input to train a machine learning system.
This anti-pattern occurs when the Query Model is sent an Event that isn’t captured as part of the Event Sourcing store in the Command Model. When this occurs, the Query Model is no longer a consistent view of the Command Model and thus couldn’t be recreated through the replaying of previous Events. The same issue occurs if any application code directly updates the Query Model persistent store.
How to avoid: The only update that should ever be applied to the Query Model state must occur within an Event Handler. It should also only ever use data that is contained within the Event object that is sent to the handler. Additionally, only Events that have been persisted within the Command Model Event Sourcing store should be accepted by the Query Model Event Handlers.
Axon Framework: Aim to ensure that the only Events ever handled by the Query Model are those that have been published by an Aggregate. Avoid using Events created by Sagas or standalone Command (or Event) Handlers for this purpose. A better approach is always to dispatch a Command to an Aggregate and have that generate a proper Event that gets persisted into the Event Sourcing store and then propagated to the Query Model.
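The replay property can be expressed as a small check: clear the projection, feed it the stored Events in order, and expect the same view state. The sketch below is plain Java with hypothetical names (AccountEvent, AccountOpenedEvent, MoneyDepositedEvent, BalanceProjection), and a simple List stands in for the Event Sourcing store.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Events, as they would be read back from the Event Sourcing store.
interface AccountEvent {
    String accountId();
}

record AccountOpenedEvent(String accountId) implements AccountEvent {}
record MoneyDepositedEvent(String accountId, long amount) implements AccountEvent {}

// Query Model whose state changes only inside its Event Handler,
// using only the data carried by each Event.
class BalanceProjection {
    private final Map<String, Long> balances = new HashMap<>();

    void on(AccountEvent event) {
        if (event instanceof AccountOpenedEvent opened) {
            balances.put(opened.accountId(), 0L);
        } else if (event instanceof MoneyDepositedEvent deposited) {
            balances.merge(deposited.accountId(), deposited.amount(), Long::sum);
        }
    }

    // Clears the view and repopulates it by replaying stored Events in order.
    void rebuildFrom(List<AccountEvent> storedEvents) {
        balances.clear();
        storedEvents.forEach(this::on);
    }

    Map<String, Long> balances() {
        return Map.copyOf(balances);
    }
}
```

If anything other than a stored Event had modified the balances map, the state after rebuildFrom would differ from the state before it, which is exactly the inconsistency this anti-pattern introduces.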
Anti-Pattern 6: Creating complex inheritance hierarchies of Commands, Events, Aggregates and Sagas
This is probably the most generic of the anti-patterns, in that it applies to every system that you create, not just those built using CQRS and Event Sourcing. Inheritance used poorly is bad and always, in my experience, creates code that is deeply tangled and impossible to maintain.
That’s not to say that inheritance should be completely avoided, but its use should be restricted only to places where an inheritance structure exists within the business domain itself. For example, a DepositAccount and a SavingsAccount are both subtypes of Account that could appear in the domain model and be meaningful to a business expert.
Where this becomes a real anti-pattern is when inheritance is applied to try to create a more generic and re-usable underlying structure that doesn’t exist within the business domain. There’s plenty of good reading material on this subject, and I will definitely have a dedicated blog post on this in the future.
For the moment, suffice it to say that not only does inappropriate use of inheritance create poor quality, unmaintainable code but within a CQRS system it makes it really hard to follow the path of Commands and Events through the business workflow. Having a hierarchy of Commands and Events that might be handled as instances of sub-classes in one place and as their super-class in another is really confusing. Couple this with a hierarchy of Aggregates and Sagas where Command and Event Handlers are spread across different levels and you start entering nightmare territory. Expect maintainability to plummet and delivery to be slowed massively if the code reaches this state!
How to avoid: Only use inheritance when there is a specific hierarchy that is clearly visible in the domain model itself. Never use inheritance as an approach to creating generic and re-usable codebases. For re-use purposes always prefer a composition model by extracting shared concepts into injectable shared components instead. It’s usually better to accept a bit of duplication across loosely coupled classes than trying to super-optimise along DRY principles by inheriting from a common base class or interfaces. Keep Commands, Events, Aggregates and Sagas in a flat hierarchy wherever possible and consider very carefully any time inheritance seems like a possible option.
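As a small illustration of preferring composition, the sketch below shares a validation rule between two unrelated Aggregates by injecting a component rather than introducing a common base class. Everything here is hypothetical (AmountPolicy, PaymentAggregate, RefundAggregate and their messages), and it is plain Java with no framework wiring.

```java
// Shared rule extracted into an injectable component instead of a base class.
class AmountPolicy {
    private final long maxAmount;

    AmountPolicy(long maxAmount) {
        this.maxAmount = maxAmount;
    }

    void check(long amount) {
        if (amount <= 0 || amount > maxAmount) {
            throw new IllegalArgumentException("Invalid amount: " + amount);
        }
    }
}

// Flat, unrelated message types (hypothetical).
record TakePaymentCommand(String paymentId, long amount) {}
record PaymentTakenEvent(String paymentId, long amount) {}
record IssueRefundCommand(String refundId, long amount) {}
record RefundIssuedEvent(String refundId, long amount) {}

// Both Aggregates use the policy, but neither inherits from the other
// or from a generic parent, so each handler stays in one obvious place.
class PaymentAggregate {
    private final AmountPolicy amountPolicy;

    PaymentAggregate(AmountPolicy amountPolicy) {
        this.amountPolicy = amountPolicy;
    }

    PaymentTakenEvent handle(TakePaymentCommand command) {
        amountPolicy.check(command.amount());
        return new PaymentTakenEvent(command.paymentId(), command.amount());
    }
}

class RefundAggregate {
    private final AmountPolicy amountPolicy;

    RefundAggregate(AmountPolicy amountPolicy) {
        this.amountPolicy = amountPolicy;
    }

    RefundIssuedEvent handle(IssueRefundCommand command) {
        amountPolicy.check(command.amount());
        return new RefundIssuedEvent(command.refundId(), command.amount());
    }
}
```

The two handle methods look alike, and that is an acceptable price: each Aggregate can change independently, and nobody has to walk a class hierarchy to find out where a Command is handled.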
Anti-Pattern 7: Building technical solutions from non-business Events
CQRS and Event Sourcing sit at the same level as Domain Driven Design: they are patterns for modelling and structuring systems at the business domain level. They are not intended to be techniques that you use to produce implementations of complex technical patterns.
Axon Framework combines together a really powerful set of tools and concepts. Using all its features, it is quite possible to implement things such as workflow engines or solutions for managing rate-limiting access to external resources. But should we really be doing this?
The answer is no. Attempting to do so is a definite anti-pattern that leads to codebases being much more complex and difficult to maintain than they should be. Given a powerful CQRS and Event Sourcing framework, it’s easy to get drawn into the “when you only have a hammer, then everything looks like a nail” way of thinking.
How to avoid: Only use CQRS and Event Sourcing for dealing with Commands and Events that occur within the domain model. If you experience the urge to start creating technical Events then you need to stop and ask yourself: is there a better technology or library that I could utilise to implement this part of my system? The answer is usually yes.
Axon Framework: Don’t use it as a general purpose framework, even though it has many similar properties. Use it just for its intended purpose and use other frameworks in combination.
Anti-Pattern 8: Creating a differentiation between domain and ephemeral Events
In a properly designed CQRS system, almost all Events that occur within the Command Model should be domain Events that we want to persist into the Event Sourcing store. By this we mean that they are Events that capture the state changes of an Aggregate. In addition, these Events are used to populate the Query Model and may also be used to progress a Saga to the next step in a business workflow.
There are, however, some occasions when the system may want to raise an Event that can indicate a transition from one workflow step to another without the state of the system having been changed. An example of this could be an Event that is scheduled to be sent to a Saga at some time in the future when an invoice becomes payable. This kicks off a new stage of processing that then results in state changes taking place. These types of Events don’t get stored in the Event Sourcing store as they don’t impact system state and are meaningless in any Event replay scenario. That said, these Events should be treated exactly the same way as normal Events; they just never get handled by an Aggregate’s Event Sourcing handlers.
Where this particular anti-pattern kicks in is when a system starts to create a completely different Event mechanism specifically for handling these non-domain Events. Typically this approach names them something like “Ephemeral Events” to differentiate them from those of the domain. There are two common reasons why this anti-pattern emerges:
- The previous anti-pattern is being applied and there are a bunch of technical solutions being implemented using CQRS and it has become apparent that the Events for those shouldn’t be taking up storage in the main Event Sourcing store.
- The architecture has evolved into a true event-driven system and all of its events are being routed through the CQRS framework rather than through a proper event platform.
How to avoid: Consider very carefully any time you come across the need to create an Event that won’t be stored in the Event Sourcing store. Check that you haven’t missed some domain model state change that should be tied to this Event. Even Events that don’t get stored should still have some meaning in the business domain, and any Events that don’t come from the business domain indicate a design smell. If you find that the number of non-Event Sourced Events in your system is growing, then really start considering whether you are, in fact, building a broader event-driven solution and adopt an appropriate eventing platform rather than using your CQRS system to implement ephemeral Events.
Axon Framework: There are some connectors that allow bridging between Axon Framework and other event-driven solutions, such as Apache Kafka. Consider these in preference to using Axon Framework for functionality that falls outside CQRS and Event Sourcing.
Summary
This post has looked at eight different anti-patterns related to CQRS and Event Sourcing. Some of these are also relevant in a wider software engineering context. We’ve also looked at how you can avoid falling into these anti-patterns by considering the correct use of CQRS concepts.
The most important take-away should be that CQRS and Event Sourcing should only be used for Commands and Events that appear in the domain model. These are identified using business facing design techniques such as Event Storming.
As soon as your solution starts to include Commands or Events that are not part of the business domain then this indicates that you are starting to misuse the CQRS pattern. At that point, take a step back and consider what you are building and whether there are better approaches to building these non-domain parts of your solution.