Entity Aggregation

 


Integration Patterns


Contents

Context

Problem

Forces

Solution

Example

Resulting Context

Testing Considerations

Security Considerations

Operational Considerations

Known Uses

Related Patterns

Context

Enterprise-level data is distributed across multiple repositories in an inconsistent fashion. Existing applications need a single consistent representation of key entities, which are logical groups of related data elements such as Customer, Product, Order, or Account. Moving data between these repositories may not be a viable option.

Problem

How can enterprise data that is redundantly distributed across multiple repositories be effectively maintained by applications?

Forces

The following forces have to be considered in this context:

  • There may be multiple systems of record for the same entity. Business rules and processes could dictate the manner in which the system of record is determined for a given entity. For example, an employee entity is usually defined in human resource management system (HRMS) applications, in payroll applications, and in benefits applications, as well as in other systems. Each system defines its own view of an employee. However, if you are building an employee self-service portal, you need to have a complete view of what constitutes an employee and not just the bits and pieces.
  • Often, semantic dissonance exists between the data values represented within the same entity across repositories. The same data element may not represent the same information in multiple repositories. For example, the data element NumberOfDaysRemaining for a project task might include all days including holidays in one repository, but it might include only workdays in another repository.
  • Even when the data elements are semantically consistent, the information they represent might vary across parallel instances of the data element. In such cases, it may be difficult to determine which instance of the data element is accurate. For example, in a financial institution where there are dedicated repositories for various customer communication channels, the Available Balance in the ATM database may not be the same as the Available Balance in the repository serving the online banking channel.
  • Invalid data might have crept in through other entry points into the repositories. All the entry points may not enforce all the business and data validation rules in a consistent fashion. This is typical of mainframe systems where the validation logic enforced in the screens may be outdated and therefore not enforced in the enterprise's newer applications.
  • Referential integrity of data across multiple repositories may have been violated. This happens due to absent or malfunctioning data synchronization processes. In a manufacturing environment, it is critical that the product data management (PDM) system always be concurrent with the order management system. Orders entered in the order management system that have an invalid reference to a product can violate the referential integrity of the product data across the respective repositories.
  • Applications may need logical subsets of the data elements that may not be available in a single repository. Therefore, business logic may have to be applied to properly group the data elements into the logical subset. For example, a banking customer maintains different kinds of information across various repositories. Personal information is stored in the customer information file repository; account balance is stored in a financial application repository; and loan information is stored in the mortgage lending repository. When the customer accesses the online banking site, the nature of the customer's request determines the subset of the information to be presented. An address change request needs data from the customer information file repository, but an inquiry on the outstanding balance for all accounts and loans requires specific data from all three repositories.
  • Data synchronization processes may already exist between repositories that permit one repository to act as a front end to the other. In these cases, the synchronization mechanism is better left untouched. This is typical where database replication is used across two separate database instances that use the same underlying technology.

Solution

Introduce an Entity Aggregation layer that provides a logical representation of the entities at an enterprise level, with physical connections that support access and updates to their respective instances in the back-end repositories.

This representation is analogous to the Portal Integration pattern, which presents to the end user a unified view of information that is retrieved from multiple applications. Just as the portal layer provides this view for the application front ends, the Entity Aggregation layer provides a unified view across the data in the back-end repositories, as shown in Figure 1.

Figure 1. Entity Aggregation

Establishing Entity Aggregation involves a two-step process:

  1. Defining the enterprise-wide representation that provides a consistent unified representation of the entity.
  2. Implementing a physical bidirectional connection between this representation and its respective instances in the back-end repositories.

The following example explains this process in more detail.

Figure 2. Environment without Entity Aggregation

Figure 2 shows two applications that access their respective back-end repositories for information about the Phone Number entity within two different enterprises: the U.S. Enterprise and the Europe, Middle East, and Africa (EMEA) Enterprise. Both applications maintain the information about the phone number within their respective repositories.

Each application follows the respective domestic convention for representing phone numbers in its back-end repository. The U.S. representation of the entity includes the area code, the exchange, and the number. The EMEA representation, on the other hand, represents the same information using the country code, the city code, the exchange, and the number.

As part of a merger and acquisition exercise, these enterprises merge to form a new logical enterprise. Both applications have to access the information in both repositories. Therefore, the phone number now has to be represented at an enterprise-wide level that includes both the U.S. and the EMEA business units.

Figure 3. Environment with Entity Aggregation

Figure 3 shows the manner in which Entity Aggregation can facilitate the seamless representation of the Phone Number entity across both repositories.

The first step in establishing this layer involves defining the enterprise-wide representation of the Phone Number entity.

The Phone Number entity within the Entity Aggregation layer includes attributes that are unique to each enterprise. The Phone Number entity also includes attributes that are common across both enterprises. Thus, Country Code is included because it is an attribute unique to the EMEA enterprise. Similarly, because Exchange and Number are common attributes across both repository instances, they are also included. Even though Area Code and City Code are unique to each enterprise, their basic representation and purpose are identical. Therefore, the Entity Aggregation layer representation includes the Area Code and uses this field to store the City Code information from the EMEA repository.
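The attribute choices described above can be sketched in Python as follows. This is only an illustration: the class name, the function names, the dictionary keys of the repository records, and the sample digits are all assumptions, not part of the pattern. U.S. records leave the country code empty because that attribute exists only in the EMEA repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhoneNumber:
    """Hypothetical enterprise-wide representation of the Phone Number entity."""
    country_code: Optional[str]  # unique to the EMEA repository; empty for U.S. records
    area_code: str               # holds the U.S. area code or the EMEA city code
    exchange: str                # common to both repositories
    number: str                  # common to both repositories


def from_us_record(record: dict) -> PhoneNumber:
    """Map a U.S. repository record (assumed keys) to the unified entity."""
    return PhoneNumber(
        country_code=None,
        area_code=record["area_code"],
        exchange=record["exchange"],
        number=record["number"],
    )


def from_emea_record(record: dict) -> PhoneNumber:
    """Map an EMEA repository record (assumed keys) to the unified entity."""
    return PhoneNumber(
        country_code=record["country_code"],
        area_code=record["city_code"],  # the city code is stored in the area code field
        exchange=record["exchange"],
        number=record["number"],
    )


# Illustrative records only; the digits are made up.
us = from_us_record({"area_code": "425", "exchange": "555", "number": "0100"})
emea = from_emea_record(
    {"country_code": "44", "city_code": "20", "exchange": "7946", "number": "0000"}
)
```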

The next step involves building the physical connections between the Entity Aggregation layer and the back-end U.S. and EMEA repositories. The technology driving these connections depends on the repository being accessed.

Approach

There are two architectural approaches to implementing Entity Aggregation:

  • Straight-through processing
  • Replication

Depending on the architectural characteristics of the entity instances to be integrated, a combination of these approaches may be required.

Straight-Through Processing

A straight-through processing approach fetches information from the respective back-end repositories in real time and correlates the information into a single unified view. This implies that the Entity Aggregation layer has real-time connectivity to the repositories and should be able to associate the disparate instances of the entity.
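A minimal sketch of straight-through processing follows, assuming two in-memory dictionaries stand in for live repositories and a hypothetical correlation table associates the disparate instances. None of these names come from the pattern itself; in practice each fetcher would be a real-time call to a back-end system.

```python
from typing import Callable, Dict

# Hypothetical back-end repositories, modeled as in-memory dictionaries
# keyed by each repository's own customer identifier.
CONTACT_REPO = {"C-17": {"name": "A. Customer", "city": "Redmond"}}
FINANCE_REPO = {"9001": {"balance": 250.0}}

# Assumed correlation table: unified key -> key used by each repository.
CORRELATION = {"cust-1": {"contact": "C-17", "finance": "9001"}}

# One fetch function per repository; each would be a live call in practice.
FETCHERS: Dict[str, Callable[[str], dict]] = {
    "contact": lambda key: CONTACT_REPO[key],
    "finance": lambda key: FINANCE_REPO[key],
}


def get_customer(unified_key: str) -> dict:
    """Fetch each fragment at request time and merge it into one view."""
    view: dict = {"id": unified_key}
    for repo_name, repo_key in CORRELATION[unified_key].items():
        view.update(FETCHERS[repo_name](repo_key))
    return view


customer = get_customer("cust-1")
```

Because nothing is stored in the aggregation layer, every request depends on all repositories being reachable at that moment.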

Replication

The replication of entities for use by the Entity Aggregation layer is required when the following conditions are true:

  • Real-time connectivity to the repositories is absent.
  • Complicated joins across multiple instances of the same entity in various repositories are required to provide a consistent representation.
  • High performance is vital to the solution.

This approach requires a separate physical repository within the Entity Aggregation layer that stores data conforming to the enterprise-wide representation of the entity. The data in each back-end repository is replicated into the Entity Aggregation repository. This replication requires the implementation of supporting processes to enforce the business rules that validate the data being replicated. Replication should be performed both ways between the back-end repositories and the Entity Aggregation repositories.
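The validation step during replication might look like the following sketch. It shows one direction only (back-end repository to Entity Aggregation repository); the reverse direction would need a similar process. The rule, the record layout, and every name here are assumptions made for illustration.

```python
from typing import Callable, Dict, List

Validator = Callable[[dict], bool]


def has_product_reference(order: dict) -> bool:
    """Example rule (assumed): every replicated order must reference a product."""
    return bool(order.get("product_id"))


def replicate(source_rows: List[dict], target: Dict[str, dict],
              validators: List[Validator]) -> List[dict]:
    """Copy valid rows into the aggregation repository and return the rejects."""
    rejected = []
    for row in source_rows:
        if all(rule(row) for rule in validators):
            target[row["order_id"]] = dict(row)  # copy so the stores stay independent
        else:
            rejected.append(row)
    return rejected


rows = [
    {"order_id": "o-1", "product_id": "p-7"},
    {"order_id": "o-2", "product_id": None},  # invalid product reference
]
aggregation_repo: Dict[str, dict] = {}
rejected = replicate(rows, aggregation_repo, [has_product_reference])
```

Returning the rejected rows, rather than silently dropping them, leaves room for a manual or automated correction process.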

In many respects, this approach offers capabilities very similar to those supported by a data warehouse. Data warehouses originally were built with the intent of summarizing transactional data that could be used for business intelligence and trends analysis. In many large enterprises today, data warehouses have transformed into yet another repository within the enterprise. They do not always serve as the enterprise-wide unified representation of the data. However, such data warehouses have a good baseline definition for enterprise-level entities, and the enterprise-wide representation of an entity can be built on top of this definition.

Design Considerations

Effective design of an Entity Aggregation layer requires several issues to be given due consideration. These issues may be broadly classified as follows:

  • Data representation. Data representation includes entity representation and schema reconciliation. Entity representation is the definition of the enterprise-wide representation of the entity with its attributes and key relationships to other entities. Schema reconciliation involves reconciling the varied definitions of the underlying schema across repositories. In addition to the data representation being defined, the format in which the representation is stored must be established as well.
  • Data identification. Introduction of a new layer of data representation requires the establishment of an appropriate mechanism that uniquely identifies each entity across all repositories, including the Entity Aggregation layer itself. An entity reference is one such mechanism.
  • Data operation. Data operation includes the manner in which transactional operations are performed on the data. This includes Create, Read, Update, and Delete (CRUD) actions, as well as any compensating measures for those actions. For more information about this consideration, see "Inquiry vs. Update" later in this pattern.
  • Data governance. Data governance involves establishing ownership of ongoing maintenance and establishing change management processes to direct the ongoing maintenance of entities and their data elements. These processes also help refine the integration requirements by rationalizing the number of data repositories, if necessary.

Each of these issues is outlined in the following sections.

Entity Representation

Several approaches can be adopted for defining the enterprise-wide representation of the entity.

Entity representations may have to be custom developed to address the specific needs of the enterprise as a whole. This may be the only viable option under the following circumstances:

  • Existing representations within the enterprise represent only a small portion of the enterprise-wide representation of the entity and are not readily extensible to accommodate the needs of the enterprise.
  • Characteristics that are unique to the enterprise cannot be properly reflected within any external representations of the entity.

However, custom representations are not always a financially viable option because they require a regeneration of the existing entities and their relationships.

Instead, a representation that is foreign to all the applications within the enterprise may be a viable approach as long as it still conforms to the core business processes. You could also use current representations that are specific to certain industries for this purpose. In other words, embracing an external representation does not necessarily entail the additional expense of procuring an application.

In other cases, you could choose the representation supported by one of the existing applications within the enterprise. ERP and CRM applications that support and drive the business processes for the enterprise are prime candidates for this approach.

While Entity Aggregation is all about having a single view of entities across the enterprise, entity representations within this layer might have to be adjusted to represent the nuances of individual business units. This is especially true for large international conglomerates that have been forced into being a logical enterprise through acquisitions and mergers of other enterprises that operate as autonomous business units.

Reaching a consensus on the representation within any one of these units can be a challenge. Therefore, reaching a similar consensus across all of these units can be an ambitious goal, if not an impossible one. In these cases, multiple representations (one for each operating unit) might be a more realistic and practical approach.

Schema Reconciliation

Even if the enterprise reaches consensus on the entity representation, the representation within each instance of the entity may still vary across different repositories. Different repositories can hold different schemas for the same entity. The Entity Aggregation layer must harmonize the subtle differences between these schemas in the following ways:

  • Entity Aggregation must account for the representation of all entities held within the different repositories.
  • Entity Aggregation must define a unified schema for all entities which represents a logical consolidation of all the views.
  • Entity Aggregation must effect transformations between each repository's schema and the unified schema.

Note   Sometimes, the term canonical schema is used instead of unified view. This pattern uses the latter term, because canonical schema implies that all the representations share the same schema, which is not always necessary.

Figure 4 shows an example of customer information that is represented in more than one repository. Although the contact repository defines the contact information for a customer, the financial repository defines the credit card details for the customer. The Entity Aggregation layer defines a unified schema that contains all the attributes required for representing the customer entity. The Entity Aggregation layer also defines the mapping between the unified schema and those schemas held by the individual repositories.

Figure 4. Schema reconciliation
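The mapping between the unified schema and the repository schemas can be sketched as a pair of lookup tables. The column names and attribute names below are invented for illustration and do not come from Figure 4; real mappings often also need type and format conversions, which this sketch omits.

```python
# Assumed field mappings: unified attribute -> repository column.
CONTACT_MAP = {"customer_name": "cust_nm", "phone": "tel_no"}
FINANCE_MAP = {"customer_name": "holder", "card_number": "cc_num"}


def to_unified(row: dict, mapping: dict) -> dict:
    """Translate a repository row into unified-schema attributes."""
    return {unified: row[column] for unified, column in mapping.items() if column in row}


def to_repository(entity: dict, mapping: dict) -> dict:
    """Translate unified attributes back into one repository's columns."""
    return {column: entity[unified] for unified, column in mapping.items() if unified in entity}


# Placeholder rows for the contact and financial repositories.
contact_row = {"cust_nm": "A. Customer", "tel_no": "555-0100"}
finance_row = {"holder": "A. Customer", "cc_num": "0000-0000-0000-0000"}

# The unified customer entity combines attributes from both repositories.
customer = {**to_unified(contact_row, CONTACT_MAP), **to_unified(finance_row, FINANCE_MAP)}
```

Keeping the mappings as data means the same two functions serve both inquiries (repository to unified) and updates (unified to repository).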

References

Entity reference is the information required to uniquely identify an entity. Repositories that store instances of a given entity tend to maintain their own unique identifiers for their respective instances to ensure they have full control over internal data consistency. The Entity Aggregation layer should account for this and should be able to map references that point to a single instance. Apart from references that are held by other repositories, the Entity Aggregation layer might create its own reference for an entity instance. The idea here is that the Entity Aggregation layer maintains its own reference to an entity instance and maps this reference to the individual repository's reference. This reduces the coupling between the Entity Aggregation layer and individual repositories because new repositories can be introduced without affecting the Entity Aggregation layer's unified view.

Master Reference

The Entity Aggregation layer uniquely identifies an entity instance by using a reference known as a master reference. A master reference could be:

  • A reference held by one of the repositories. For example, you can designate the reference held by a CRM repository as the master reference for a customer entity.
  • A new reference that the Entity Aggregation layer creates for the entity instance and maps to other references held by different repositories.
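The second option, in which the layer creates its own master reference and maps it to repository references, could be sketched as follows. The ReferenceMap class and its method names are hypothetical, and using a UUID as the master reference is just one possible choice.

```python
import uuid
from typing import Dict, Optional, Tuple


class ReferenceMap:
    """Hypothetical registry mapping master references to repository references."""

    def __init__(self) -> None:
        self._by_master: Dict[str, Dict[str, str]] = {}
        self._by_local: Dict[Tuple[str, str], str] = {}

    def register(self, local_refs: Dict[str, str]) -> str:
        """Create a new master reference for a set of repository references."""
        master = str(uuid.uuid4())
        self._by_master[master] = {}
        for repo, ref in local_refs.items():
            self.add_repository(master, repo, ref)
        return master

    def add_repository(self, master: str, repo: str, ref: str) -> None:
        """Attach a reference from a newly introduced repository."""
        self._by_master[master][repo] = ref
        self._by_local[(repo, ref)] = master

    def resolve(self, master: str, repo: str) -> Optional[str]:
        """Return the reference a given repository uses for this entity."""
        return self._by_master[master].get(repo)

    def master_for(self, repo: str, ref: str) -> Optional[str]:
        """Return the master reference for a repository-specific reference."""
        return self._by_local.get((repo, ref))


refs = ReferenceMap()
master = refs.register({"crm": "CUST-17", "billing": "9001"})
refs.add_repository(master, "support", "T-42")  # new repository, same master reference
```

Adding the support repository did not change the master reference, which is the decoupling benefit described under "References."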

Inquiry vs. Update

The technological solutions available today are more robust for inquiring than they are for updating data in the back-end repositories. Updating has the inherent challenges of maintaining the synchrony of data across repositories.

Note   In the context of this pattern, deleting an entity is considered to be a form of update.

An update request usually contains two elements: a reference that uniquely identifies the instance and an update payload that contains information about the updated attributes and their respective values.

The Entity Aggregation layer uses entity references across all the repositories to perform the inquiries and updates. Although the Entity Aggregation layer maintains the entity reference, the references that are unique to each repository have to be determined before the update is made to the back-end repositories. For more information, see "References."

Compensation

The process of performing a compensating action can be manual or automatic. Business process owners have a strong influence on the manner in which compensating actions should be implemented.

If one of the systems fails to handle the update request, the Entity Aggregation layer should be able to handle this business exception by using one of the following approaches:

  • Request a rollback on all the other updates that have already been made.
  • Run a compensating transaction to reverse the effects of the successful updates that were already completed.
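The compensating-transaction approach can be sketched as follows, assuming an in-memory Repository class that stands in for a real back-end system and a flag that simulates a failure. All names are hypothetical. A production implementation would also have to handle failures of the compensating step itself, which this sketch ignores.

```python
from typing import Dict, List, Tuple


class Repository:
    """In-memory stand-in for a back-end repository (hypothetical)."""

    def __init__(self, name: str, rows: Dict[str, dict], fail: bool = False) -> None:
        self.name = name
        self.rows = rows
        self.fail = fail  # simulates a repository that rejects updates

    def update(self, ref: str, payload: dict) -> dict:
        """Apply the payload and return a copy of the previous row."""
        if self.fail:
            raise RuntimeError(f"{self.name} rejected the update")
        previous = dict(self.rows[ref])
        self.rows[ref].update(payload)
        return previous

    def restore(self, ref: str, previous: dict) -> None:
        """Compensating action: put the previous row back."""
        self.rows[ref] = dict(previous)


def update_entity(targets: List[Tuple[Repository, str]], payload: dict) -> bool:
    """Update every repository; undo the completed updates if any one fails."""
    completed: List[Tuple[Repository, str, dict]] = []
    try:
        for repo, ref in targets:
            completed.append((repo, ref, repo.update(ref, payload)))
        return True
    except RuntimeError:
        # Reverse the successful updates, most recent first.
        for repo, ref, previous in reversed(completed):
            repo.restore(ref, previous)
        return False


crm = Repository("crm", {"c1": {"last_name": "Old"}})
hr = Repository("hr", {"e9": {"last_name": "Old"}}, fail=True)
ok = update_entity([(crm, "c1"), (hr, "e9")], {"last_name": "New"})
```

Each target pairs a repository with its own reference for the entity, so the repository-specific references must be resolved before the update starts.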

Ownership

Although the Entity Aggregation layer represents the unified view of an entity, it is certainly possible to store different fragments of an entity in different systems. Therefore, the system of record is not the same for all fragments.

For example, employee information could be distributed across the payroll and benefits repositories. It is also possible that some information may be owned by multiple systems. For example, attributes such as LastName and FirstName are probably represented in more than one system. In this case, the Entity Aggregation layer should designate a system as an authoritative source for attributes that are represented in more than one system.

This has several implications for the behavior that occurs during inquiries and updates. Attributes will always be fetched from the authoritative source. If the same attribute is represented by another system, those values will be ignored by the Entity Aggregation layer. Updates, on the other hand, have different semantics. When the Entity Aggregation layer receives an update request for an entity, the updates should be propagated to all the constituent systems of record.
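The different semantics of inquiries and updates can be illustrated with a short sketch. The system names, attribute names, and values are assumptions; the point is only that reads consult one designated source per attribute, while updates reach every system that stores the attribute.

```python
# Assumed configuration: which system is authoritative for each attribute.
AUTHORITATIVE = {"last_name": "payroll", "first_name": "payroll", "plan": "benefits"}

# Hypothetical systems of record; note that the last names disagree.
SYSTEMS = {
    "payroll": {"last_name": "Smith", "first_name": "Pat"},
    "benefits": {"last_name": "Smyth", "plan": "Gold"},
}


def read_employee() -> dict:
    """Read each attribute only from its authoritative system."""
    return {attr: SYSTEMS[source][attr] for attr, source in AUTHORITATIVE.items()}


def update_employee(changes: dict) -> None:
    """Propagate each changed attribute to every system that stores it."""
    for record in SYSTEMS.values():
        for attr, value in changes.items():
            if attr in record:
                record[attr] = value


employee = read_employee()  # the benefits copy of last_name is ignored
```

Calling update_employee({"last_name": "Jones"}) would change the value in both payroll and benefits, bringing the non-authoritative copy back in line.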

Change Management

Processes have to be put in place to coordinate changes across all the repositories and the Entity Aggregation layer. In addition to ensuring active participation from the different business process owners and information technology (IT) representatives for each repository, a key step in this process is to ensure that the integrity of the enterprise-wide representation of the entity is not compromised.

Three types of changes to the underlying repositories can directly and significantly affect the Entity Aggregation layer:

  • Configuration. The repository configuration could undergo changes. This would include redeployment of the repository to a new physical location or to a different server. Configuration parameters such as the scheduled downtime for maintenance could affect connectivity to the Entity Aggregation layer as well. In an ideal environment, only the Entity Aggregation layer is directly affected by this change because connections between repositories usually do not exist. However, the other repositories could be indirectly affected by these changes through their connectivity to the Entity Aggregation layer.
  • Data model. The data model could undergo changes within the repository. The enterprise-wide representation of entities supported by the Entity Aggregation layer is significantly affected by these changes. Other repositories that store information in the same domain are affected also.
  • Data values. Changes to transactional data in a repository have the least impact, if any, on the Entity Aggregation layer and on other repositories. However, changes to reference data that spans repositories or to reference data that is used by the Entity Aggregation layer have a significant impact.

Example

Figure 5 shows a scenario where the Stock Trade entity is partitioned across systems based on geographical constraints. Applications that analyze the trends in a given industry require a complete view of the trades across geographical boundaries and systems.

The Entity Aggregation layer consolidates the view across geographical boundaries so that the partitioning of data across the repositories is transparent to the applications that perform trends analysis.

Figure 5. Stock trades scenario
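A rough sketch of this scenario follows, assuming each region's repository is a simple list of trade records. The region names, symbols, sectors, and quantities are invented. The trends query works against the consolidated view and never refers to the partitions directly.

```python
from collections import defaultdict
from typing import Dict, Iterator, List

# Hypothetical regional repositories holding partitions of the Stock Trade entity.
REGIONAL_TRADES: Dict[str, List[dict]] = {
    "americas": [{"symbol": "AAA", "sector": "energy", "quantity": 100}],
    "europe": [
        {"symbol": "BBB", "sector": "energy", "quantity": 40},
        {"symbol": "CCC", "sector": "retail", "quantity": 10},
    ],
}


def all_trades() -> Iterator[dict]:
    """Yield trades from every region, tagging each with its origin."""
    for region, trades in REGIONAL_TRADES.items():
        for trade in trades:
            yield {**trade, "region": region}


def volume_by_sector() -> Dict[str, int]:
    """Example trends query that is unaware of the partitioning."""
    totals: Dict[str, int] = defaultdict(int)
    for trade in all_trades():
        totals[trade["sector"]] += trade["quantity"]
    return dict(totals)


volumes = volume_by_sector()
```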

Resulting Context

Entity Aggregation has the following benefits and liabilities:

Benefits

  • Consensus. Entity Aggregation forces consensus across business and functional units on the manner in which entities are represented at an enterprise level.
  • Single view. Entity Aggregation enables a single view of key business entities such as Customer, Account, Product, and (in the case of healthcare) Patient.
  • Improved access to information. An enterprise-level view of key business entities enables applications to have immediate access to the information pertinent to these entities. Access to information is not constrained by the repositories that house them.
  • Reduced semantic dissonance. Entity Aggregation eliminates semantic dissonance across existing applications that work on the same data elements from multiple repositories.
  • Central location. Entity Aggregation supports a central location for validating data that is populated into the repositories.
  • Reduced change impact. Entity Aggregation reduces the potential impact of changes to the back-end repositories. Depending on the nature of the changes being made, the Entity Aggregation layer can continue to serve the needs of the applications while these changes are in progress.

Liabilities

  • Additional layer. Entity Aggregation introduces an additional architectural layer that could adversely affect end-to-end performance of the solution.
  • Consensus. Entity Aggregation requires consensus across business units on the definition of the enterprise-wide representation of entities.
  • Reengineering applications. Existing applications that are tightly coupled to a given set of repositories would have to be reengineered to accommodate the new architectural layer. Additionally, it is not always possible to reengineer some applications—especially packaged solutions.

Testing Considerations

The following testing considerations apply when adding an Entity Aggregation layer:

  • Depending on the degree to which the Entity Aggregation layer is adopted, all valid combinations of applications and their back-end repositories would have to be regression tested.
  • Existing test cases may have to be revised to reflect the business rules being exercised in the Entity Aggregation layer.
  • Test data available within a given repository must be repurposed to accommodate the enterprise-wide representation of entities in the Entity Aggregation layer.
  • Simultaneous connectivity from the Entity Aggregation layer to all the back-end repositories has to be tested. In the absence of Entity Aggregation, connectivity would have been a requirement only between the various application-repository pairs.

Security Considerations

The Entity Aggregation layer is effective at providing access to information that is pertinent to business entities at an enterprise level. However, applications might be able to obtain access to repositories that may not have been available prior to the introduction of the Entity Aggregation layer. Even though applications might still operate on the same data elements, they might access new repositories through the Entity Aggregation layer. Access privileges for various roles within these applications have to be managed at the Entity Aggregation layer.

Operational Considerations

There are two separate operational aspects to the Entity Aggregation layer:

  • The Entity Aggregation layer has to be operated and monitored as a repository that houses the enterprise-wide representation of entities. Less maintenance of the underlying data in the Entity Aggregation layer is required for the straight-through processing solution than for the replication solution. In the replication solution, the operational aspects that apply to the data in the repositories also apply to the Entity Aggregation layer. In either case, similar operational aspects apply if the Entity Aggregation layer maintains the entity references that are external to all the repositories.
  • Network connectivity between the applications and the Entity Aggregation layer and network connectivity between the Entity Aggregation layer and the repositories are critical components of the overall solution. The straight-through processing solution, in particular, requires concurrent connectivity to all the repositories.

Known Uses

Enterprise Information Integration is another industry term for this approach: an enterprise-wide logical data model that houses the key business entities and has bidirectional physical connections to the back-end repositories where the data is stored.

Some companies provide a logical metadata modeling approach that allows enterprises to reuse models and data for real-time aggregation and caching with update synchronization. These companies initially provided query-only capability, but they are slowly beginning to support bidirectional transfer of data between the Entity Aggregation layer and the back-end repositories.

Related Patterns

Given that the Entity Aggregation layer provides a view of data that is distributed across repositories, Data Integration is closely related to this pattern.
