Structured Integration: Message Payload Content Models: specific or generic (part 1)

When modeling payload data structures for messages, events, service interfaces, or API interfaces, there are some hard choices to be made, with implications for simplicity, reuse, readability, performance, ease of validation, and ease of data mapping.
A very basic and important choice has to do with the specific vs. generic nature of the schemas we define.

A payload schema that's highly specific defines the exact set of data fields required by a service or API signature, and in the case of an event it defines the exact set of data fields that are relevant for that event.
That said, the actual degree of specificity of the payload schema depends on how semantically specific the service/API/event is: there is a big difference, for instance, between a CustomerUpdated event and a CustomerBlockedForBadCredit event. The content model for the first event may include a very large set of customer master data elements (as it is not a semantically specific event), while the content model for the second could contain just the identification of the customer and credit-related data.

On the other hand, a completely generic schema would define just an open, flexible set of key/value pairs whose composition varies based on the service/API or event type it is used in, with no semantics implied by the schema definition per se.

This article (divided into two blog entries, of which this is Part 1) will consider a number of options, going from the most specific to the most generic side of the spectrum, using Purchase Order (PO) events as examples. For example, we want to model on our ESB an event carrying the information that a certain purchase order has been released for sending to the vendor, or another event carrying the information that the PO has been cancelled (before the vendor has shipped it).

In the examples of both Part 1 and Part 2, XML data models will be used, omitting namespaces in order to make the examples simpler to read. For the same reason, technical envelopes with control fields have been purposefully omitted, although real ESB architectures normally define custom envelopes.
However, element names are standardized across the examples (as they should be; see the article The Importance of Data Glossaries).
Although I use XML and XSD here, the high-level considerations may apply to other formats as well.

In this first part of the article, I present two alternatives, which both lean on the "specific" side, leaving the more generic alternatives to Part 2.

Fully specific definition

A straightforward approach to model a PO Release message would be to use a payload like this:
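
As an illustrative sketch only: the four identifying elements are the ones named later in this article (PONumber, CompanyCode, POType, VendorNumber), while the ReleaseDate element and all sample values are assumptions made up for this example.

```xml
<!-- Fully specific event payload: the root element names both the
     business object (purchase order) and the event (send release). -->
<PurchaseOrderSendReleaseEvent>
  <PONumber>4500012345</PONumber>          <!-- sample value -->
  <CompanyCode>1000</CompanyCode>          <!-- sample value -->
  <POType>NB</POType>                      <!-- sample value -->
  <VendorNumber>0000100234</VendorNumber>  <!-- sample value -->
  <ReleaseDate>2014-10-01</ReleaseDate>    <!-- hypothetical element -->
</PurchaseOrderSendReleaseEvent>
```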

The root element name in the event payload identifies at the same time the business object to which the event happens and the nature of the event. This is very specific, although the payload does not include much data in the event message beyond the identification of the PO.

To enrich the event with detailed approval data (e.g., for each approval level: approval date, approver user id, approver remarks), it would be necessary to call a query service (Content Enricher pattern). The choice between information-rich event payloads and "essentials" events would be the subject of a separate discussion.

We would of course define an XML Schema resource in our SOA repository that would include something like the element definition below:
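
A minimal sketch of such an element definition follows. It assumes the same child elements as the payload sketch (ReleaseDate is a hypothetical element) and uses built-in XSD types in place of dedicated simpleType definitions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- One global element per business object / event combination -->
  <xs:element name="PurchaseOrderSendReleaseEvent">
    <xs:complexType>
      <xs:sequence>
        <!-- minOccurs defaults to 1, so every child element is mandatory -->
        <xs:element name="PONumber"     type="xs:string"/>
        <xs:element name="CompanyCode"  type="xs:string"/>
        <xs:element name="POType"       type="xs:string"/>
        <xs:element name="VendorNumber" type="xs:string"/>
        <!-- hypothetical event-specific element -->
        <xs:element name="ReleaseDate"  type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```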

This is an excerpt from the complete XSD file, with type definitions simplified (in practice most of them would refer to separate simpleType definitions).

Here all child elements of PurchaseOrderSendReleaseEvent are mandatory, as we intend this information to be supplied by the event publisher for every single event of this particular type. The schema could of course define additional optional elements, but nevertheless the content model is tailored to the specific combination of business object and event.
A similar schema for the request message of an update service operation could enforce that all required inputs are present for the operation, as they correspond to mandatory child elements.

Let's see how this approach fares against the criteria listed at the beginning of the article.

The approach is clearly extremely simple, and it offers very good readability and ease of validation: it is clear from the schema definitions which elements are mandatory in which functional context, and standard XSD validation can be readily applied.

The main drawback is that it scores low in reuse: it is possible to share the simpleType definitions for the individual data elements but nothing more. It is necessary to define and manage many schemas, due to the high number of Business Object / event combinations (for events) and specific service operations (for services).
From the point of view of data mapping, each mapping is straightforward, but many similar mappings need to be maintained, one for each individual schema. Here, too, there is no reuse and maintenance is heavy.
Performance is good, as only data related to the specific event is included in the document.

Entity-specific definition and derived types

A first step in the direction of having a more reusable payload is to use a common root element that only reflects the Business Object affected, while the event (or service operation) could be expressed via an attribute value.

With XML, we can take advantage of the standard xsi:type attribute defined by XML Schema to indicate the event type (avoiding the use of a custom attribute for this), and we can have the two payload instances below for a PO release and for a PO cancellation, respectively:
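
The two instances might look like the sketches below. The root element name PurchaseOrder, the type names PurchaseOrderSendReleaseType and PurchaseOrderCancellationType, the event-specific elements (ReleaseDate, CancellationDate, CancellationReason) and all values are assumptions chosen for illustration. The xmlns:xsi declaration is kept even though other namespaces are omitted, because the xsi prefix must be declared for the document to be well-formed.

```xml
<!-- PO release: same root element, event expressed via xsi:type -->
<PurchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="PurchaseOrderSendReleaseType">
  <PONumber>4500012345</PONumber>
  <CompanyCode>1000</CompanyCode>
  <POType>NB</POType>
  <VendorNumber>0000100234</VendorNumber>
  <ReleaseDate>2014-10-01</ReleaseDate>
</PurchaseOrder>
```

```xml
<!-- PO cancellation: same root element, different xsi:type and content -->
<PurchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="PurchaseOrderCancellationType">
  <PONumber>4500012345</PONumber>
  <CompanyCode>1000</CompanyCode>
  <POType>NB</POType>
  <VendorNumber>0000100234</VendorNumber>
  <CancellationDate>2014-10-03</CancellationDate>
  <CancellationReason>Ordered in error</CancellationReason>
</PurchaseOrder>
```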

Here, although the two payloads are composed differently, we would like to use a common schema for mapping and ESB broker publication, while retaining the possibility of validating each payload against a restricted schema that is specific to the event type.

The schema definition could contain a generic PurchaseOrderType, having as its mandatory elements only those elements that are populated in every possible PO integration scenario (e.g., PONumber, CompanyCode, POType, VendorNumber), and the other elements defined as optional, as they may be present in one particular scenario but not in others.
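
A sketch of such a generic definition is shown here. The four mandatory elements are the ones listed above; the three optional elements and the global element name PurchaseOrder are hypothetical, and built-in XSD types stand in for dedicated simpleType definitions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Generic ("canonical") purchase order type -->
  <xs:complexType name="PurchaseOrderType">
    <xs:sequence>
      <!-- populated in every PO integration scenario: mandatory -->
      <xs:element name="PONumber"     type="xs:string"/>
      <xs:element name="CompanyCode"  type="xs:string"/>
      <xs:element name="POType"       type="xs:string"/>
      <xs:element name="VendorNumber" type="xs:string"/>
      <!-- scenario-dependent (hypothetical names): optional -->
      <xs:element name="ReleaseDate"        type="xs:date"   minOccurs="0"/>
      <xs:element name="CancellationDate"   type="xs:date"   minOccurs="0"/>
      <xs:element name="CancellationReason" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Common root element shared by all PO events and services -->
  <xs:element name="PurchaseOrder" type="PurchaseOrderType"/>

</xs:schema>
```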

NOTE: this being an example, only three optional elements are shown; in a real case there would be a very large number of optional elements.

From the above generic or "canonical" definition of a purchase order, we can derive a restricted schema definition for each PO event or service. For example, for the events SendRelease and Cancellation:
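
One way to express this in XSD is derivation by restriction, sketched below. The derived type names and the event-specific elements are assumptions, and the base type is repeated so that the example is self-contained. Each derived type lists only the elements relevant to its event, turning the relevant optional elements into mandatory ones and leaving out the others.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Generic base type, repeated here to keep the sketch self-contained -->
  <xs:complexType name="PurchaseOrderType">
    <xs:sequence>
      <xs:element name="PONumber"     type="xs:string"/>
      <xs:element name="CompanyCode"  type="xs:string"/>
      <xs:element name="POType"       type="xs:string"/>
      <xs:element name="VendorNumber" type="xs:string"/>
      <xs:element name="ReleaseDate"        type="xs:date"   minOccurs="0"/>
      <xs:element name="CancellationDate"   type="xs:date"   minOccurs="0"/>
      <xs:element name="CancellationReason" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- SendRelease: ReleaseDate becomes mandatory, cancellation data is excluded -->
  <xs:complexType name="PurchaseOrderSendReleaseType">
    <xs:complexContent>
      <xs:restriction base="PurchaseOrderType">
        <xs:sequence>
          <xs:element name="PONumber"     type="xs:string"/>
          <xs:element name="CompanyCode"  type="xs:string"/>
          <xs:element name="POType"       type="xs:string"/>
          <xs:element name="VendorNumber" type="xs:string"/>
          <xs:element name="ReleaseDate"  type="xs:date"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <!-- Cancellation: cancellation data becomes mandatory, ReleaseDate is excluded -->
  <xs:complexType name="PurchaseOrderCancellationType">
    <xs:complexContent>
      <xs:restriction base="PurchaseOrderType">
        <xs:sequence>
          <xs:element name="PONumber"     type="xs:string"/>
          <xs:element name="CompanyCode"  type="xs:string"/>
          <xs:element name="POType"       type="xs:string"/>
          <xs:element name="VendorNumber" type="xs:string"/>
          <xs:element name="CancellationDate"   type="xs:date"/>
          <xs:element name="CancellationReason" type="xs:string"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="PurchaseOrder" type="PurchaseOrderType"/>

</xs:schema>
```

Because both derived types are restrictions of PurchaseOrderType, any instance that is valid against a derived type is also valid against the generic type, which is what allows a single root element to carry every event.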

Compared with the fully specific schema, this approach doesn't seem to have any advantages at first sight: it still uses a restricted complex type definition for every PO event or service, plus the added generic complex type PurchaseOrderType.

However, in most ESB implementations (e.g., Software AG webMethods), it would be possible to perform mapping and broker publication against the generic (canonical) schema definition and additionally:

have broker subscribers based on the same generic schema with filters based on xsi:type
"downcast" an instance of the generic type into an instance of a restricted type via a single map operation (without having to map field by field, which is error-prone and inefficient)
dynamically validate a payload instance against a restricted type
extend the generic (canonical) type with additional optional fields with no impact on the definition of the derived types, the event / services definitions using the derived types, and the existing mappings between generic type and derived types

Thus, this solution is more flexible than the fully specific definition, while keeping similar levels of readability and performance on the wire. Schema extensibility is good, as it is possible to decouple extended versions of a restricted type from extended versions of the corresponding base type (as long as the derived type version uses only elements that are also defined in the base type version, of course).

The concern remains about having to manage a large number of derived schemas, but with an important difference.
The availability of the generic (canonical) schema definitions makes it much easier in this case to automatically generate the derived schemas based on simple specifications. It would not be too difficult to develop a program that reads the XSD for a generic schema from the SOA repository, also reads some kind of data store (as simple as a property file or as sophisticated as a custom DB) that holds the information about which elements are mandatory for which derived type, and then automatically generates the schema definitions for the derived types in the repository.

Conclusion

The two alternatives presented here offer very good functionality for mapping and validation in exchange for a maintenance burden (also taking into account the practical necessity to version the schema definitions).

The upcoming part 2 of the article will describe alternatives that offer a different tradeoff.

Saturday, October 4, 2014
