
DDD, Microservices, and Evolutionary Architectures: Antifragile Architectures

18 June 2025 - 11 minutes reading

In the previous article, I explained the importance of Event-Driven Architectures.

In this episode, I’ll explore the concept of antifragility and how it applies to agility and architectural decisions.

Agile & Architecture: A Complicated Relationship

Building on Nassim Taleb’s thinking in The Black Swan, the author emphasizes that risk management is not about predicting foreseeable or programmable events. Instead, it involves dealing with uncertain and non-linear scenarios—precisely the type Taleb describes. This reflection extends the principles of antifragility to organizations. But can we also apply these principles to system architecture?

When it comes to Agility, the role of architecture — or what that role should be — is no longer just a technical concern. It becomes a key factor that can influence the success or failure of Agile initiatives. Often, the architect’s role is weakened or sidelined. A survey conducted by IASA Global among 260 companies revealed that more than 75% had adopted Agile practices. However, fewer than 50% had successfully integrated architecture into those processes. This points to an ongoing struggle for Agile and Architecture to find a collaborative balance.

The paper suggests a third way: embracing Agile through the lens of Antifragility. In other words, rather than trying to control change—or abandoning control altogether—it proposes building systems, both technical and organizational, that benefit from change. This approach makes it possible to design architectures that not only support Agility but enable it, thanks to their ability to remain antifragile in the face of continuous evolution.

VUCA: A New Approach Is Needed

The complexity of today’s environment renders many practices from manuals written over twenty years ago ineffective. As Nassim Taleb states in his books (Antifragile and The Black Swan), what we need is not an approach that tries to control or manage chaos, but the ability to design systems that thrive within it.

We live in a constantly evolving world: business changes rapidly, as do customer demands, which often require frequent updates or new features for software already in use—or still in development. This scenario is captured by the acronym VUCA, representing four key characteristics of the contemporary context: Volatility, Uncertainty, Complexity, and Ambiguity. The ability to adapt and grow in such a volatile and uncertain environment has become more crucial than ever.

We are facing a new phase of the software crisis, first identified in the 1968 NATO report. Today, however, the implications seem even deeper and more widespread.

As Nicholas Negroponte reminds us:

Computing is not about computers anymore. It’s about living.

Given these considerations, it becomes clear that the VUCA world demands a radically new approach.

Embracing Complexity

In the early 2000s, Dave Snowden, while working at IBM, developed a “conceptual framework” to analyze and classify complexity: the Cynefin framework (/kəˈnɛvɪn/), a Welsh term meaning habitat.

The Domains of the Cynefin Model

According to this model, modern business systems are not merely complicated; they are complex. The distinction is significant. A complex system does not respond linearly to inputs or changes. When tackling a business problem, we tend to build a model of the system. However, by definition, a model is a simplification of reality: it cannot replicate the real world in every detail.

As Rebecca Wirfs-Brock aptly points out:

A model is a simplified representation of a thing or phenomenon that intentionally emphasizes certain aspects while ignoring others. Abstraction with a specific use in mind.

Accepting the complex nature of systems means recognizing the limits of modeling and learning to operate in an environment where uncertainty and unpredictability are the norm, not the exception.

What Makes a System Fragile?

Problems arise when the simplified model of reality drifts too far from reality itself. In such cases, the answers it provides, if any, are no longer reliable. Nassim Nicholas Taleb describes this in The Black Swan.

Business problems fall into the category of complex systems. They behave unpredictably, and trying to solve them with unsuitable practices often leads to failure. This is the effect Taleb calls the Platonic fold: the illusion that a theoretical model can fully explain or control reality.

The solution to fragility is not to seek more control, but to accept unpredictability as an intrinsic feature of complexity. In this context, the goal is to design antifragile systems, which not only withstand change but grow stronger because of it.

A system’s fragility largely depends on its internal structure. As discussed in previous articles, the ability to evolve is linked to how its components are connected. Rather than cohesion, the degree of coupling between parts affects flexibility. Systems with loosely coupled components are easier to modify and less likely to produce unexpected side effects.

Change-Oriented Approaches

As early as 1972, David Parnas, a pioneer in software engineering, challenged the traditional approach to modularization. He argued that better software can be built by focusing on what will change over time rather than on immediate features. Along these lines, Juval Löwy’s IDesign approach contrasts functional decomposition with volatility-based decomposition, proposing design alternatives that simplify implementation.

Unlike methods focused solely on functional division, these approaches emphasize the elements most prone to change, moving beyond the logic of immediate requirements. Kjell Jørgen Hole’s book “Anti-fragile ICT Systems” lists the fundamental characteristics of antifragile systems:

  • Modularity: independent components connected to each other;
  • Loose coupling: low degree of coupling between modules;
  • Redundancy: multiple instances of the same component to increase resilience;
  • Diversity: ability to address a problem with different solutions.

To effectively respond to the challenges of the VUCA environment, we must radically rethink how we design systems. As Taleb, Parnas, and Lowy suggest, it is essential to focus on what we do not know, accepting our limits and the natural uncertainty of the future.

This mindset also underpins the approach proposed by Dan North. At the start of any project, we should acknowledge that we are at least at the second level of ignorance, as described in Phillip Armour’s article “The Five Orders of Ignorance”: we do not know what we do not know. North explores these ideas further in his article “Introducing Deliberate Discovery”, which encourages conscious exploration of the unknown.

Residuality Theory

This new approach to software architecture design is central to Barry O’Reilly’s book, “Residues: Time, Change, and Uncertainty in Software Architecture”. The author presents an innovative vision that focuses on the concept of architectural residue: what remains when everything else changes. It offers a practical way to address time, change, and uncertainty in complex systems in a structured manner.

The cover of Barry O’Reilly’s book.

According to the author:

the future of a system is a function of its residues – the leftovers of the system after the impact of a stressor

Barry O’Reilly starts from philosophical premises to arrive at a mathematical formalization of his theory. The starting point is clear: traditional approaches fail to effectively represent concepts such as time, uncertainty, and change. Designing architectural resilience based solely on what we know — or think we know — without stepping outside conventional paradigms does not deliver the expected results.

The idea echoes Plato’s famous “Allegory of the Cave” or, for a more contemporary analogy, the film The Matrix. In the story from the Republic, a prisoner chained inside a cave sees only shadows projected on the wall and mistakes them for reality. Once freed, he discovers the true source of those shadows. After initial confusion and a longing to return to the safety of illusion, he feels compelled to free others. However, not everyone is willing to face the sunlight; some prefer to remain chained in their false security.

This is why the Matrix reference is intentional: it invites us to think beyond established frameworks, exactly as Residuality Theory proposes.

An illustration of Plato’s “Allegory of the Cave.”

The Importance of Stressors

Barry O’Reilly advises against focusing too much on the initial version of the architecture, which he calls the naive architecture. Instead, it is more useful to analyze the stressors: events capable of altering the system’s behavior. A stressor can take many forms. It may range from a simple login error to a financial market crash, from a server outage to the outbreak of a war. Any element that breaks the model designed to solve a business problem can be considered a stressor.
This highlights the need to think outside the box. Lateral thinking is essential here, as Edward de Bono suggests in his book “Six Thinking Hats”.

To identify as many stressors as possible, a thorough exploration of the problem is required. This demands active involvement from the entire team, as discussed earlier. Once the stressors are identified, the software architect must evaluate their impact on the components of the naive architecture.

This task does not require complex tools: a simple Excel sheet is enough. The sheet lists the stressors and highlights, for each one, the potential effect on architectural components. For example, in the case of an ERP system for beer production and sales, we might face very different scenarios.

The sheet listing the stressors for our hypothetical ERP system for beer production and sales.

As can be seen, stressors are not only technical in nature. They can also include unpredictable and sudden events capable of causing significant impacts on the target market.
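The same sheet can be kept as a small data structure instead of a spreadsheet. The sketch below is only an illustration of the idea: the component names, the stressors, and the helpers `affected_components` and `impact_counts` are hypothetical and were invented here for the beer ERP example, not taken from the original sheet.

```python
# Hypothetical stressor/component matrix for the beer ERP example.
# All names below are illustrative assumptions, not the article's data.
COMPONENTS = ["orders", "warehouse", "production", "billing"]

# Each stressor maps to the set of components it is expected to affect.
STRESSORS = {
    "payment provider outage": {"orders", "billing"},
    "hop supply shortage": {"warehouse", "production"},
    "sudden change in alcohol tax": {"billing"},
}


def affected_components(stressor: str) -> list[str]:
    """Return the affected components in the fixed column order."""
    hit = STRESSORS[stressor]
    return [c for c in COMPONENTS if c in hit]


def impact_counts() -> dict[str, int]:
    """Count how many stressors touch each component."""
    return {c: sum(c in hit for hit in STRESSORS.values()) for c in COMPONENTS}


if __name__ == "__main__":
    for name in STRESSORS:
        print(f"{name}: {', '.join(affected_components(name))}")
    print(impact_counts())
```

Components hit by many stressors are natural candidates for a closer look when the architecture is reshaped.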

Boolean Networks

In the late 1960s, Stuart Kauffman introduced the Random Boolean Network (RBN): a network of nodes with binary states (0 or 1), each of which updates its state through a randomly assigned Boolean function of its inputs. This behavior resembles that of complex systems made up of numerous interacting components. During his studies, Kauffman observed that connecting nodes in different ways could change both the stability of the system and the number of possible node configurations, called phase states.
Three key factors emerged that influence the number of phase states: the number of nodes (N), the number of connections between them (K), and the tendency of nodes to behave similarly (P). Kauffman also noted that increasing dependencies among nodes—by creating tighter connections—drastically reduced the number of phase states, sometimes by many orders of magnitude. In such cases, the system tends to stabilize into specific sets of states called attractors.
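To make N, K, P, and attractors more concrete, here is a minimal sketch of a random Boolean network with synchronous updates. It follows the usual textbook formulation rather than any code from Kauffman or O’Reilly, and it rests on stated assumptions: P is taken as the probability that an entry of a node's truth table is 1, and the function names (`build_network`, `step`, `find_attractor`) are invented for this example.

```python
import random


def build_network(n: int, k: int, p: float, seed: int = 0):
    """Create a random Boolean network.

    Each of the n nodes gets k randomly chosen input nodes and a random
    truth table whose entries are 1 with probability p (the bias).
    """
    rng = random.Random(seed)
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [
        [1 if rng.random() < p else 0 for _ in range(2 ** k)] for _ in range(n)
    ]
    return inputs, tables


def step(state, inputs, tables):
    """Synchronously compute the next state of every node."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for source in node_inputs:
            # Build the truth-table index from the input bits.
            index = (index << 1) | state[source]
        new_state.append(table[index])
    return tuple(new_state)


def find_attractor(state, inputs, tables):
    """Iterate until a state repeats and return the repeating cycle."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, inputs, tables)
    return trajectory[seen[state]:]


if __name__ == "__main__":
    n, k, p = 8, 2, 0.5
    inputs, tables = build_network(n, k, p, seed=42)
    rng = random.Random(1)
    start = tuple(rng.randint(0, 1) for _ in range(n))
    attractor = find_attractor(start, inputs, tables)
    print(f"attractor length: {len(attractor)} of {2 ** n} possible states")
```

Because the state space is finite and the update is deterministic, every trajectory must eventually fall into a cycle; that cycle is the attractor. Varying `n`, `k`, and `p` and comparing attractor lengths is a simple way to explore the behavior described above.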

Kauffman’s Boolean networks have influenced numerous studies since attractors appear in every complex system. Despite the wide variability of possible combinations, a consistent order emerges, and the system tends to return to the same group of attractors. In social contexts, this explains the repetition of specific behavioral patterns; in software, the architectural patterns we adopt can be seen as attractors, as they solve recurring problems.

An attractor represents a state toward which the system stably converges. A simple example is how humans are drawn to rest when tired and to food when hungry. Similarly, a good software architect must identify the system’s attractors to ensure its stability and effectiveness over time.

Between Criticality and Balance

In his experiments, Kauffman identified the property of criticality: for certain values of N (nodes) and K (connections), a system becomes resilient to unexpected events without excessively consuming its own resources. In our daily work, this dynamic shows up in the comparison between monolithic systems (low N and K) and microservices-based systems (high N and K). The comparison shows how crucial it is to find the right balance between these two approaches.
A third parameter, P, represents the propensity (or bias) of a node towards certain behaviors. Applied to software architectures, this value serves to constrain and define the behavior of components within the system. For this reason, we adopt interfaces instead of concrete classes for communication between cohesive components. The principles and methodologies we use — such as OOP, SOLID, DRY, among others — aim precisely to properly balance N, K, and P in our systems. The challenge lies in finding this balance, which often relies on conjecture, experience, and continuous comparison.
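As a small illustration of constraining behavior through an interface, the sketch below uses Python's `typing.Protocol`. The `PaymentGateway` contract, the two gateway classes, and `checkout` are hypothetical names chosen for this example; the point is only that the caller depends on the contract, so implementations can be swapped when a stressor hits one of them.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Hypothetical contract: the only behavior callers may rely on."""

    def charge(self, amount_cents: int) -> bool: ...


class CardGateway:
    """Illustrative implementation that accepts any positive amount."""

    def charge(self, amount_cents: int) -> bool:
        return amount_cents > 0


class OfflineGateway:
    """Illustrative fallback that queues charges for later processing."""

    def __init__(self) -> None:
        self.queued: list[int] = []

    def charge(self, amount_cents: int) -> bool:
        self.queued.append(amount_cents)
        return True


def checkout(gateway: PaymentGateway, amount_cents: int) -> str:
    """Depends only on the PaymentGateway contract, not on a concrete class."""
    return "paid" if gateway.charge(amount_cents) else "rejected"


if __name__ == "__main__":
    print(checkout(CardGateway(), 500))
    print(checkout(OfflineGateway(), 500))
```

Neither gateway inherits from `PaymentGateway`; matching the method signature is enough, which keeps the coupling between caller and implementation low.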

This problem is addressed by randomly simulating the operational environment until the architecture shows signs of criticality. This is exactly the purpose of the Excel sheet: to record the level of stress the system is subjected to and verify whether the right balance has been reached.

It is important to distinguish between criticality and correctness. In a complex system, absolute correctness is unattainable, while criticality is a realistic and desirable goal. The role of the software architect is to identify the system’s critical point, not to guarantee mathematical perfection — a task reserved for mathematicians or developers.

As the well-known Software Architecture: The Hard Parts reminds us:

Don’t try to find the best design in software architecture. Instead, strive for the least worst combination of trade-offs.

Searching for Criticality

The first Excel sheet allows us to identify the stressors that could compromise the system’s operation. Once identified, we must intervene — keeping in mind that we are still in the design phase — to create a new system based on the residues of the previous one.
Next, the new system undergoes a further round of testing with updated stressors. As mentioned earlier, a system reaches criticality when it remains stable even when unexpected stressors emerge. The outcome of this phase is a new Excel sheet reflecting the system’s updated state.
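One way to picture this second round is to score both designs against stressors that were not used while reshaping the architecture. The sketch below is only an illustration: the stressor names, the survival judgments, and the `survived` helper are all hypothetical, and in practice they would come from the team's own analysis.

```python
# Hypothetical stressors for the beer ERP example (illustrative names only).
DESIGN_ROUND = ["payment provider outage", "hop supply shortage", "alcohol tax change"]
NEXT_ROUND = ["export ban", "warehouse fire", "courier strike"]

# Assumed judgments: the stressors each architecture is expected to survive.
NAIVE_SURVIVES = {"alcohol tax change", "courier strike"}
RESIDUAL_SURVIVES = {
    "payment provider outage",
    "hop supply shortage",
    "alcohol tax change",
    "courier strike",
    "export ban",
}


def survived(architecture: set[str], stressors: list[str]) -> int:
    """Count how many of the given stressors the architecture survives."""
    return sum(s in architecture for s in stressors)


if __name__ == "__main__":
    for label, arch in (("naive", NAIVE_SURVIVES), ("residual", RESIDUAL_SURVIVES)):
        print(label, survived(arch, DESIGN_ROUND), survived(arch, NEXT_ROUND))
```

The interesting comparison is on the second list: a design that only survives the stressors it was built around has not moved toward criticality.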

The spreadsheet with the stressors derived from the residues.

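In the first column, we list our previous system, also known as the naive system, while in the second column we show the system derived from the residues. At this point, we can calculate the residue index:

$$R_i = \frac{A_{\text{residual}} - A_{\text{naive}}}{A_{\text{residual}} + A_{\text{naive}}}$$

Here $A_{\text{naive}}$ and $A_{\text{residual}}$ are the values recorded in the sheet for the naive and the residual architecture. In the numerator, we place the difference between the residual architecture and the naive architecture, while in the denominator, their sum. For non-negative values, this yields an index ranging between -1 and 1.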

If the index is positive, it indicates that the resulting system is more resilient to unexpected stressors, although further improvement may still be possible. Conversely, if the index is negative, it means the intervention did not yield significant benefits or may have even worsened the situation.

When the index approaches zero, we can reasonably consider that the right balance has been achieved: the system is sufficiently stable to handle unforeseen situations.
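The calculation itself is a one-liner. The sketch below follows the verbal definition given in the text (difference over sum); the function name `residue_index` and the reading of each score as a non-negative number taken from the sheet, for example a count of survived stressors, are assumptions made for this example.

```python
def residue_index(naive_score: float, residual_score: float) -> float:
    """Index as described in the text: (residual - naive) / (residual + naive).

    Assumption: each score is a non-negative number read off the sheet,
    for example the count of stressors that architecture survives.
    """
    if naive_score < 0 or residual_score < 0:
        raise ValueError("scores must be non-negative")
    total = residual_score + naive_score
    if total == 0:
        raise ValueError("both scores are zero; the index is undefined")
    return (residual_score - naive_score) / total


if __name__ == "__main__":
    for naive, residual in ((4, 6), (5, 5), (6, 4)):
        print(naive, residual, round(residue_index(naive, residual), 3))
```

With non-negative scores the result always lies between -1 and 1: positive when the residual architecture scores higher, negative when it scores lower, and zero when another round makes no difference.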

Conclusions

Agility, understood as the ability to quickly adapt to change, is often seen as the key response to constantly evolving contexts. However, as we have seen, agility alone is not always sufficient. It depends on available resources and conditions, and by itself does not guarantee the necessary resilience. A weak architecture can rapidly compromise both the system and the business, limiting adaptability. The concept of Residual Architecture emerges as an innovative approach to designing antifragile systems, grounded in philosophical principles that are later supported by rigorous mathematical proofs.
