search menu icon-carat-right cmu-wordmark

OSDU and the Master Data Repository Pattern

May 2019 Presentation
Einar Landre (Equinor)

This presentation describes the Master Data Repository Pattern, which originated from the need to separate master data lifecycle management from application-specific databases.


Software Engineering Institute




The master data repository pattern originated from the need to separate master data lifecycle management from application-specific databases. Master data represents the business objects that define a sector or industry. In the upstream oil and gas industry, Well and Reservoir are good master data examples. In health care, Patient and Hospital are good examples, and in retail you find Item and Store. Master data objects share most of the characteristics found in domain-driven design entities. They have unique business-defined identity and lineage; that is, they can tell the story of their life. Where they differ is in scope. Domain-driven design teaches how to model entities in a bounded context. Master data objects are entities with global reach, spanning multiple bounded contexts.

Using Well as an example, during geo-modeling the well is defined as a trajectory from a place on the earth's surface to the reservoir with an expected production rate. The task is to find the best possible trajectory. During drilling the well is a trajectory, but now broken down into sections with operations and other attributes required by the construction process. In production the well consists of slots with estimated and measured rates. Historically these three aspects of a Well have been managed by at least three different applications and the data stored in at least three different data stores. The downside to this approach is lost lineage and even loss of identity across stores.

To harvest the value from analytics and machine learning, data must be freed from application-specific stores. The business case is flexible use of lifecycle data independent of lifecycle phase. For an oil company, using actual rates from existing wells when planning new wells has a lot of value. Understanding possible correlations between production rates and well designs becomes dollars on the bottom line. Known approaches to this challenge are data virtualization or the use of a data lake. The main shortcomings of these approaches are lack of lineage and mixed-up identities.

Master data objects are best understood as a graph of life events. The underpinning storage machinery must support this property. Client applications have access to three standardized groups of APIs: ingestion, search, and delivery. In terms of domain-driven design, we create a bounded context responsible for identity and lineage and call it the master data repository.

At the moment of writing, we know there are many challenges ahead. The first demonstration is planned for the end of January, with a working system for Well up and running by the end of Q1 2019 on both AWS and Azure.

As we see it, there are two innovations in play: the collaborative approach among competitors to establish a standard for their master data and the early involvement of global cloud providers to support the standards in their service offerings. Last, we think this pattern has potential within other sectors and are interested in hearing the voice of the SATURN audience.