Structured Integration: Message Payload Content Models: specific or generic (part 2)

This post continues the previous one (part 1) and presents further alternatives for modeling payload data structures for ESB events or services, using XML and XML Schema.

The second alternative of part 1 was called "Entity-specific definition and derived types", and relied on the xsi:type attribute for two purposes:

To define a restricted type for a generic entity-related XML Schema type (PurchaseOrderType in my example), which can be used for validation and other purposes
To refine the semantics of the message in the context of the Business Object (the Purchase Order), for example specifying whether it represents a release, a cancellation, a modification, etc.

The payload looked like this:
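As a reminder, a minimal sketch of such a payload is shown here. The namespace URI, the OrderNumber element and all values are placeholders assumed for illustration; only PurchaseOrder, PurchaseOrderCancellationType, CancellationDate and CancellationReason are names taken from the discussion.

```xml
<!-- Illustrative instance only: namespace, OrderNumber and values are assumed -->
<PurchaseOrder xmlns="http://example.com/schemas/purchase-order"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:type="PurchaseOrderCancellationType">
  <OrderNumber>4500012345</OrderNumber>
  <CancellationDate>2014-10-10</CancellationDate>
  <CancellationReason>Supplier unable to deliver</CancellationReason>
</PurchaseOrder>
```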

This approach has several benefits, but its biggest disadvantage is that, like the "Fully specific definition" approach, it still requires the definition of a large number of schema types: the generic, "canonical" type (normally by Business Object), plus a potentially large number of restricted types (by BO/event type or by BO/service operation).

The next (more generic) approach to be examined still uses an entity-based "canonical" definition, but without any derived types.

Entity-specific definition with separate validation

Here we want to use the generic "canonical" type without the restricted schemas used under the previous approach.
Since we no longer use the standard xsi:type attribute (which was necessary to identify the restricted schema type), the schema must now define an explicit custom attribute to identify the semantics of the message. In this example, the attribute is named eventType:
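A minimal sketch of such a canonical schema could look as follows. The target namespace and the OrderNumber and OrderDate elements are assumptions made for illustration; a real PurchaseOrderType would carry many more optional elements.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:po="http://example.com/schemas/purchase-order"
            targetNamespace="http://example.com/schemas/purchase-order"
            elementFormDefault="qualified">

  <xsd:element name="PurchaseOrder" type="po:PurchaseOrderType"/>

  <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
      <!-- Mandatory in every scenario (assumed element) -->
      <xsd:element name="OrderNumber" type="xsd:string"/>
      <!-- Optional here; whether they are required depends on eventType -->
      <xsd:element name="OrderDate" type="xsd:date" minOccurs="0"/>
      <xsd:element name="CancellationDate" type="xsd:date" minOccurs="0"/>
      <xsd:element name="CancellationReason" type="xsd:string" minOccurs="0"/>
    </xsd:sequence>
    <!-- Custom attribute carrying the semantic context of the message -->
    <xsd:attribute name="eventType" type="xsd:string" use="required"/>
  </xsd:complexType>

</xsd:schema>
```

The eventType attribute is typed as a plain string in this sketch; an enumeration of the allowed event types would be a stricter alternative.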

Below is a sample event payload conforming to this schema, for a Purchase Order cancellation event:
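An illustrative instance, assuming a placeholder namespace, a hypothetical OrderNumber element and made-up values:

```xml
<!-- Illustrative instance only: namespace, OrderNumber and values are assumed -->
<PurchaseOrder xmlns="http://example.com/schemas/purchase-order"
               eventType="PurchaseOrderCancellation">
  <OrderNumber>4500012345</OrderNumber>
  <CancellationDate>2014-10-10</CancellationDate>
  <CancellationReason>Supplier unable to deliver</CancellationReason>
</PurchaseOrder>
```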

As we can see, the message is practically the same as for the previous approach, just with a different attribute and without the XML Schema Instance (xsi) namespace declaration.

All considerations regarding mapping that apply to the previous approach also apply here. All integration logic that has to do with purchase orders will use instances of the common PurchaseOrder document, mapping in each case those optional elements that pertain to the functional context at hand (the functional context identified through the value of the eventType attribute).

However, under this new approach, we do not have the event-specific derived types anymore. In our example, we no longer have the PurchaseOrderCancellationType definition that enforces CancellationDate and CancellationReason as mandatory elements when eventType="PurchaseOrderCancellation". The inability to validate the payload based on semantic context would be a serious functionality gap.

To fill this gap, we need ways to specify validation logic separately from the schema without writing custom validation code in the ESB layer (which is awkward and not very maintainable). What we need is to specify constraints and validation rules in a declarative fashion.

In our example, the constraint is simple: the CancellationDate and CancellationReason elements must exist if the eventType attribute of the PurchaseOrder element has a value of "PurchaseOrderCancellation". Nevertheless, there are many other types of constraints that would be useful for validation in a semantic context and that cannot always be expressed using XML Schema. They are called co-constraints in the literature. Even more complex validation constraints can be those of an algorithmic nature (for instance, checking that for each order line the value of its GrossAmount element is the sum of the values of its NetAmount and TaxAmount elements).

Granted, one can easily get carried away and specify all kinds of complex business rules to be enforced, which brings us to the important debate on whether (and to what degree) it is appropriate to enforce business rules at the ESB layer rather than at the level of provider applications and resources. I'll leave this discussion for another blog post and focus on the simple need of validating mandatory elements by semantic event type, which is a totally legitimate need at the ESB level.

XML technologies come to the rescue to check co-constraints and algorithmic constraints. Technologies like Schematron and RELAX NG have been available for several years in the form of libraries that can be easily used by an ESB product. However, the most common technology to implement this logic is XSLT (on which, by the way, most Schematron implementations are based).

A simple XSLT stylesheet to check the presence of the additional mandatory elements (CancellationDate and CancellationReason) for a PurchaseOrderCancellation event could be the following:
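A minimal XSLT 1.0 sketch of this check is given here, assuming a placeholder namespace URI for the canonical schema and plain-text error lines as the output format (both are assumptions, as is the wording of the messages):

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:po="http://example.com/schemas/purchase-order">

  <!-- Text output: an empty result means "no validation errors" -->
  <xsl:output method="text"/>

  <xsl:template match="/po:PurchaseOrder">
    <!-- One error line per missing element -->
    <xsl:if test="not(po:CancellationDate)">
      <xsl:text>Missing mandatory element: CancellationDate&#10;</xsl:text>
    </xsl:if>
    <xsl:if test="not(po:CancellationReason)">
      <xsl:text>Missing mandatory element: CancellationReason&#10;</xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The sketch only tests for the existence of the two elements; rejecting empty values as well would need an extra condition such as normalize-space(po:CancellationReason) = ''.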

Its logic should be clear: if the PurchaseOrder document contains the two elements, then the output resulting from the transformation will be empty (= validation OK); otherwise the output will contain one or more validation errors.
After parsing the document and validating it against its canonical schema (which defines which elements are mandatory in all scenarios), the ESB must dynamically invoke the appropriate validation stylesheet based on the runtime value of the attribute that defines the semantic context (eventType in our case).

The dynamic invocation of an XSLT transformation is simple to implement in most ESB tools and has the following advantages compared to using restricted XML schema types:

XSLT validation is much more powerful than XML schema validation to express co-constraints and can even express algorithmic constraints if really required
The stylesheets can be more compact than XML schema restricted type definitions (which must repeat all mandatory elements of the base canonical type)
XSLT stylesheets can be changed and deployed independently of ESB code and XML schemas (which in most ESB tools get "hardwired" into the ESB code at design time), bringing more flexibility to change validation rules without re-deploying whole ESB components
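To illustrate the algorithmic case mentioned earlier (GrossAmount equal to NetAmount plus TaxAmount for each order line), a minimal XSLT 1.0 sketch might look like this. The OrderLine element name and the namespace URI are assumptions, and amounts are assumed to have at most two decimals.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:po="http://example.com/schemas/purchase-order">

  <xsl:output method="text"/>

  <xsl:template match="/po:PurchaseOrder">
    <!-- OrderLine is a hypothetical name for the order line element -->
    <xsl:for-each select="po:OrderLine">
      <!-- Compare in cents to avoid floating-point rounding noise -->
      <xsl:if test="round(100 * po:GrossAmount) != round(100 * (po:NetAmount + po:TaxAmount))">
        <xsl:text>Order line </xsl:text>
        <xsl:value-of select="position()"/>
        <xsl:text>: GrossAmount is not NetAmount + TaxAmount&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

A line with a missing or non-numeric amount evaluates to NaN in the comparison and is therefore reported as well.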

The disadvantages are essentially some additional complexity (the XSLT learning curve) and the performance hit of the XSLT transformation, but the approach remains viable in most cases.

Fully generic payload definition

In business environments where data structure definitions must be highly flexible (for example, in R&D departments), people tend not to want to be bound to fixed schemas for message payloads, but instead favor a fully generic payload structure that can cover arbitrarily composed data entities.

It is true that XML Schema allows open-ended definitions via the <xsd:any> construct, but that has limitations (it cannot be preceded by any optional element definitions) and is not supported by many ESB tools, so it is not considered here.

A very simple but extreme way to implement a generic data structure is to use a collection of key-value pairs (equivalent to a hash map) qualified by a semantic context (eventType). A simple definition of such a generic structure in XML Schema could be the following:
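A minimal sketch of such a schema, assuming a root element named Payload, a placeholder target namespace, and a KVP element that carries the key as an attribute and the value as text content (all of these names and choices are illustrative assumptions):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:g="http://example.com/schemas/generic-payload"
            targetNamespace="http://example.com/schemas/generic-payload"
            elementFormDefault="qualified">

  <xsd:element name="Payload" type="g:PayloadType"/>

  <!-- The whole message: a list of key-value pairs plus its semantic context -->
  <xsd:complexType name="PayloadType">
    <xsd:sequence>
      <xsd:element name="KVP" type="g:KVPType"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:attribute name="eventType" type="xsd:string" use="required"/>
  </xsd:complexType>

  <!-- A single pair: key as attribute, value as string content -->
  <xsd:complexType name="KVPType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="key" type="xsd:string" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

</xsd:schema>
```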

A sample instance of this schema (again for the PO cancellation case) is:
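An illustrative instance, assuming a Payload root element with KVP children in a placeholder namespace; the key names and values are made up for the example:

```xml
<!-- Illustrative instance only: element names, namespace, keys and values are assumed -->
<Payload xmlns="http://example.com/schemas/generic-payload"
         eventType="PurchaseOrderCancellation">
  <KVP key="OrderNumber">4500012345</KVP>
  <KVP key="CancellationDate">2014-10-10</KVP>
  <KVP key="CancellationReason">Supplier unable to deliver</KVP>
</Payload>
```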

Now, the schema definition is no longer entity-specific, and the value of the eventType attribute must convey the complete semantic context of the message.

What are the implications of this approach?

On the plus side:

There is just one very simple canonical schema, which does not even need to be versioned (as opposed to any other type of schema)
All event and service interfaces just use the common generic schema
Mapping from instances is easy using XPath (looking just for the elements that are appropriate in the functional context and ignoring the remaining content)
Dynamic validation based on functional context can still be done using XSLT (or Schematron, or RELAX NG), like for the entity-based approach above. In XSLT we could have (note that now all mandatory elements must be validated via the stylesheet):
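A minimal XSLT 1.0 sketch for the cancellation event, assuming a generic Payload root with KVP children in a placeholder namespace and hypothetical key names (including an OrderNumber key that would be mandatory in every scenario):

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:g="http://example.com/schemas/generic-payload">

  <!-- Text output: an empty result means "no validation errors" -->
  <xsl:output method="text"/>

  <xsl:template match="/g:Payload">
    <!-- Always-mandatory keys must be checked here too, since the schema no longer does it -->
    <xsl:if test="not(g:KVP[@key='OrderNumber'])">
      <xsl:text>Missing mandatory key: OrderNumber&#10;</xsl:text>
    </xsl:if>
    <!-- Keys that are mandatory for a cancellation event -->
    <xsl:if test="not(g:KVP[@key='CancellationDate'])">
      <xsl:text>Missing mandatory key: CancellationDate&#10;</xsl:text>
    </xsl:if>
    <xsl:if test="not(g:KVP[@key='CancellationReason'])">
      <xsl:text>Missing mandatory key: CancellationReason&#10;</xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Note that the values are plain strings in this model, so type checks (for example, that CancellationDate is a valid date) would also have to move into the stylesheet.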

On the other hand, the fully generic approach also has disadvantages:

Since service contracts and event definitions all use a common generic schema, they are not self-describing in terms of the data they exchange.
Most importantly, components using a generic schema do not have message formats enforced at contract level, but everything is enforced at implementation level. Service consumers, service providers, and event subscribers must be aware at runtime of what data model is coming their way. In practice a version attribute is mandatory (in addition to eventType) so that a message recipient can apply the appropriate validation and mapping based on the combination of eventType and version.
Mapping to a fully generic format is less convenient and cannot leverage the drag-and-drop mapping facilities available in virtually all ESB tools. One needs to call a custom-built map(key, value) helper method to add each KVP element to the payload being mapped.
Mapping complex, hierarchical documents is more difficult. In practice the easiest approach is to define hierarchical keys (e.g., under a PO document "/items[10]/productId" could identify the product ID of the 10th line of the order).
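To make the hierarchical-key idea concrete, an illustrative instance with two order lines might look like this. The Payload and KVP element names, the namespace, the event type value and every key other than the "/items[n]/productId" pattern are assumptions.

```xml
<!-- Illustrative instance only: hierarchy is encoded in the key strings -->
<Payload xmlns="http://example.com/schemas/generic-payload"
         eventType="PurchaseOrderRelease">
  <KVP key="/orderNumber">4500012345</KVP>
  <KVP key="/items[1]/productId">P-1001</KVP>
  <KVP key="/items[1]/quantity">5</KVP>
  <KVP key="/items[2]/productId">P-2002</KVP>
  <KVP key="/items[2]/quantity">12</KVP>
</Payload>
```

With the g prefix bound to that namespace, a consumer could then read a single value with an XPath such as /g:Payload/g:KVP[@key='/items[2]/productId'], but reconstructing the whole line structure requires parsing the key strings.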

These points should be carefully taken into consideration before going with the fully generic approach: do you really need that degree of flexibility?

Conclusion

The choices made in the area of message modeling have a major impact on your SOA and your ESB designs. Be sure to consider all pros and cons in the context of your enterprise (degree of agility required, business processes, existing data models, features of available tools).

Sunday, October 12, 2014
