I’ve always said that DDD’s core principle is to approach a problem starting from the business needs, and yet here I am writing that DDD has something to say about how we collect data?
Speaking of modern architectures, one of the key strategic patterns of DDD, the Bounded Context, has always been linked to the concept of microservices. Bounded Context, Context Mapping, and Ubiquitous Language are the key patterns for dividing the solution into well-defined areas, with clear and explicit communication contracts. This concerns the impact of DDD in the so-called backend world.
In recent years, not satisfied with having split the backend monolith, we invented microfrontends, a way to divide the frontend into many Bounded Contexts, and here again, the strategic patterns of DDD come to support us, to cope with the growing complexity on the frontend side in modern software solutions.
Peace of mind for us developers seemed within reach: the chain of business problem management finally looked complete, from a microfrontend calling the REST services exposed by its related microservice, down to full mastery of the underlying database, whether a plain CRUD service or one built on CQRS/ES patterns.
But where does data collection fit into all this? And, before that, is there a difference between the data collected in the databases of individual microservices, which we will call Operational Data, and the data collected in Data Lakes, that is, stores meant to gather data for statistical purposes and to support ML models, which we will call Analytical Data?
Operational Data
Operational data is the data saved in the databases of our applications’ microservices; its responsibility is maintaining the current state of the business. It is essential to the choices that the users of our software must constantly make. A very important, indeed fundamental, aspect is that this data is strictly private: it is not accessible by other services. The microservice that manages it decides what to share with the outside world, and how.
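As a minimal sketch of this privacy boundary, consider a hypothetical `OrderService` (the names and fields here are illustrative, not from any specific system): the full operational record stays inside the service, and only a deliberately chosen subset crosses its contract.

```python
from dataclasses import dataclass

# Hypothetical order record: the full operational state, private to the service.
@dataclass
class Order:
    order_id: str
    customer_email: str  # sensitive: never leaves the service
    total: float
    status: str

class OrderService:
    """Owns its operational data; other services see only what it publishes."""
    def __init__(self) -> None:
        self._orders: dict[str, Order] = {}  # private store (the service's database)

    def place_order(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def public_view(self, order_id: str) -> dict:
        # The service decides what to share: status and total, not the email.
        o = self._orders[order_id]
        return {"order_id": o.order_id, "status": o.status, "total": o.total}

svc = OrderService()
svc.place_order(Order("o-1", "ada@example.com", 42.0, "shipped"))
print(svc.public_view("o-1"))  # the customer_email field is not exposed
```

The point is not the mechanics of the in-memory dictionary, of course, but the asymmetry: the internal model and the published contract are two different things, and the owning service mediates between them.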
Analytical Data
Analytical data is the historical, integrated, and aggregated view of the business, created as a byproduct of running it, and it is maintained and used by OLAP (online analytical processing) systems. It is the temporal, often aggregated view of the company’s facts over time, modeled to provide retrospective or predictive insights. Analytical data is optimized for analytical workloads, for training machine learning models, and for creating reports and visualizations; in other words, it is the data directly accessed by analytical consumers.
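A tiny sketch may make the contrast concrete. Assuming some hypothetical operational events (one record per order, invented for illustration), the analytical view is a derived aggregation over time rather than the current state of any single entity:

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational events, e.g. emitted by an orders microservice.
events = [
    {"day": date(2024, 1, 1), "domain": "orders", "amount": 42.0},
    {"day": date(2024, 1, 1), "domain": "orders", "amount": 8.0},
    {"day": date(2024, 1, 2), "domain": "orders", "amount": 15.0},
]

# Analytical view: revenue per day, aggregated from the event history.
daily_revenue: dict[date, float] = defaultdict(float)
for e in events:
    daily_revenue[e["day"]] += e["amount"]

print(dict(daily_revenue))
```

Where the operational store answers "what is the state of order o-1 right now?", this derived view answers "how did revenue evolve over time?", which is exactly the retrospective question analytical consumers ask.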
In all this, what role does DDD have? In the words of its author, Zhamak Dehghani, Data Mesh is a decentralized sociotechnical approach to sharing, accessing, and managing analytical data in complex and large-scale environments, within or between organizations.
Compared to previous approaches in managing analytical data, Data Mesh introduces some changes that the following figure, taken from the eponymous book, brilliantly summarizes.
From the outside, Data Mesh may seem like a return to data silos; the fact of dividing data collection by Domain could suggest a return to the past, but Zhamak Dehghani, who first proposed this approach, has defined four fundamental principles that avoid, among other things, this danger.
These principles are designed to increase the value of data, support agility during business growth, and embrace change in a volatile and complex business context.
But what are these principles?
- Domain Ownership Principle
- Data as a Product Principle
- Self-Service Data Platform Principle
- Computational Governance Principle
To fully understand what really changes compared to a traditional approach, it helps to note that Data Mesh speaks of publishing data rather than importing it: the emphasis shifts from extracting data and loading it into some Data Lake to identifying and using it where it lives.
Let’s look a little more in detail at the properties of each individual principle.
Principle of Domain Ownership
Inspired by the concept of Domain expressed in DDD, data is organized into Domains. In a strict sense, a Domain includes a set of data that is homogeneous with respect to some criteria, such as its origin, aggregation, and consumption. This approach has several aspects we could call revolutionary compared to the traditional way of collecting data. First of all, the data management process is placed under the ownership of an interdisciplinary team whose members all operate within that data’s sphere of competence. Here again the influence of DDD shows: the principle emphasizes the importance of Business Experts to fully understand and exploit the data’s potential.
This differentiates it from a Data Lake model, which channels all operational data into a single environment, and reduces the information entropy that arises because the people overseeing a Data Lake cannot be experts in every relevant domain.
The motivations supporting this principle are diverse:
- The ability to scale data sharing aligned with the axes of organizational growth: increasing the number of data sources, increasing the number of consumers of this data, and increasing the diversity of use cases for consuming the data.
- Optimization of continuous change thanks to the localization of change in business domains.
- Support for agility by reducing the need for synchronization between teams, removing bottlenecks in having a centralized data collection team that refers to a single Data Lake.
- Increase the resilience of machine learning solutions by removing complex intermediate data collection pipelines.
The risk of breaking data down by domain is that of creating silos, potentially compromising integration and overall coherence. The other principles exist precisely to avoid this.
Principle of Data as a Product
Much as Context Mapping resolves the communication relationships between Bounded Contexts, the Principle of Data as a Product indicates how individual domains interact to ensure smooth and efficient management of business processes. We have already mentioned that Data Mesh is oriented toward managing analytical data, that is, data derived from aggregations and/or integrations. It should also be noted that this data can serve different purposes: internal reporting through dashboards and reports; analysis and research, perhaps exploiting Machine Learning models; or even support for the downstream operational processes that generated it.
A Data Product refers to a collection of data, accompanied by the necessary code for its consumption and metadata describing its content, accuracy, age, sources, ownership, consumption methods, in short, its characteristics. It is created within the Data Domain and is intended for both internal consumption by the Domain itself and consumption by other Domains, thanks to a central catalog that registers all data products and ensures adequate interoperability between Domains.
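A minimal sketch of this idea might look as follows; the class and field names (`DataProduct`, `Catalog`, `refreshed_on`, and so on) are my own illustrative choices, not a standard API. The product bundles data, the code to read it, and descriptive metadata, and a central catalog makes it discoverable by other domains.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical sketch: a Data Product bundles data, the code to consume it,
# and metadata describing ownership, sources, and freshness.
@dataclass
class DataProduct:
    name: str
    domain: str
    owner: str
    sources: list[str]
    refreshed_on: date
    rows: list[dict] = field(default_factory=list)

    def read(self) -> list[dict]:
        # the "code for its consumption" shipped with the product
        return list(self.rows)

class Catalog:
    """Central registry so other domains can discover data products."""
    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def discover(self, domain: str) -> list[str]:
        return [p.name for p in self._products.values() if p.domain == domain]

catalog = Catalog()
catalog.register(DataProduct(
    name="orders.daily_revenue", domain="orders", owner="orders-team",
    sources=["orders-service"], refreshed_on=date(2024, 1, 2),
    rows=[{"day": "2024-01-01", "revenue": 50.0}],
))
print(catalog.discover("orders"))  # ['orders.daily_revenue']
```

The design choice worth noticing is that metadata travels with the product: a consumer finding `orders.daily_revenue` in the catalog immediately knows who owns it, where it comes from, and how fresh it is.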
To guarantee all this, a Data Product must meet a minimum set of eight criteria; it must be:
- Discoverable
- Addressable
- Understandable
- Trustworthy
- Natively accessible
- Interoperable
- Valuable on its own
- Secure
Principle of the Self-Service Data Platform
The application of the Domain Ownership principle alone, as we mentioned, carries the risk of generating isolated silos. That risk is mitigated by the Data as a Product principle, which promotes interoperability between the domains. The Self-Service Data Platform principle, in turn, leads to a new generation of data platform services that allow interdisciplinary domain teams to share data. These services focus on eliminating the friction in the end-to-end data sharing journey, from producer to consumer, and they manage a reliable network of interconnected Data Products, in both linear and graph form. One of the purposes of this type of platform is to simplify the user experience in discovering, accessing, and using Data Products. The motivations supporting this principle are:
- Reduce the total cost of decentralized data ownership.
- Abstract the complexity of data management and reduce the cognitive load on domain teams in managing the Data Product lifecycle.
- Mobilize a generation of more generalist technology developers for the development of Data Products, reducing the need for specialization.
- Automate governance policies to create security and compliance standards for all Data Products.
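To hint at what "reducing the cognitive load on domain teams" can mean in practice, here is a hypothetical sketch of a platform offering a single `publish()` call that handles schema validation, registration, and a default access policy on behalf of the team; every name here is invented for illustration.

```python
# Hypothetical self-service platform: one publish() call takes care of the
# plumbing (validation, registration, default policy) so the domain team
# only supplies its data and its declared schema.
class Platform:
    def __init__(self) -> None:
        self.registry: dict[str, dict] = {}

    def publish(self, name: str, schema: dict, rows: list[dict]) -> None:
        # platform-side check: every row carries the declared fields
        for row in rows:
            missing = set(schema) - set(row)
            if missing:
                raise ValueError(f"{name}: rows missing fields {missing}")
        # platform-side defaults: discoverable entry, standard access policy
        self.registry[name] = {"schema": schema, "rows": rows, "policy": "org-default"}

    def discover(self) -> list[str]:
        return sorted(self.registry)

platform = Platform()
platform.publish("orders.daily_revenue",
                 schema={"day": "date", "revenue": "float"},
                 rows=[{"day": "2024-01-01", "revenue": 50.0}])
print(platform.discover())  # ['orders.daily_revenue']
```

The team never writes validation, registration, or policy code itself; that is exactly the friction the platform is meant to absorb, which is also why more generalist developers can build Data Products on top of it.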
Principle of Federated Computational Governance
The aim is to avoid the proliferation of data silos at all costs. This principle leads to the creation of a data governance operational model based on a federated decision-making and responsibility structure, thanks to a team composed of domain representatives, data platform representatives, and business experts. The operational model creates a structure of incentives and responsibilities that balances the autonomy and agility of domains with the global interoperability of the network and the automation of management policies for each Data Product.
The reasons behind this principle are:
- The ability to obtain higher-order value from the aggregation and correlation of independent but interoperable data products.
- Counteract the unintended consequences of domain-oriented decentralization, such as incompatibility and disconnection of domains.
- Enable the inclusion of cross-cutting governance requirements, such as security, privacy, legal compliance, etc., in a distributed Data Product network.
- Reduce the overhead of manual synchronization between domains and the governance function.
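The word "computational" in this principle means policies expressed as code and checked automatically, rather than enforced through manual review. As a rough sketch, under invented names and with made-up product records, a governance function might run every registered product through a shared list of policy checks:

```python
# Hypothetical sketch of computational governance: policies as code, applied
# uniformly to every data product's metadata. Records are illustrative.
products = [
    {"name": "orders.daily_revenue", "owner": "orders-team", "pii": False},
    {"name": "customers.profiles", "owner": "", "pii": True},
]

def has_owner(p: dict) -> bool:
    return bool(p["owner"])

def pii_declared(p: dict) -> bool:
    return "pii" in p

policies = [
    ("every product has an owner", has_owner),
    ("PII classification is declared", pii_declared),
]

violations = [(p["name"], rule) for p in products
              for rule, check in policies if not check(p)]
print(violations)  # [('customers.profiles', 'every product has an owner')]
```

Because the checks run automatically across the whole mesh, the federated team can add or tighten a rule once and have it apply to every domain, which is how the model balances local autonomy with global interoperability.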
The Interaction of the Principles
Zhamak Dehghani’s basic idea is that the four principles are collectively necessary and sufficient. In fact, they complement each other, and each addresses and resolves the challenges generated by the others. The following figure illustrates the interaction between them.
I have certainly only scratched the surface of the problem, and I will surely return to it. DDD has always fascinated me, and I am sure I have not yet learned everything it has to teach; the proof is this approach to data management built on DDD’s own patterns.
To those interested in exploring the topic further, I highly recommend reading the Data Mesh book by its author, Zhamak Dehghani.