Domain and Information Models


						R E • D A C • T I O N

Simplifying the Domain with an Information Model
A note redefining common concepts for practical use

Abstract:

This note outlines a conceptual architecture framework in which domain modeling is contained by information modeling, and specifies the position of each model in application development. Information modeling is (re)defined as a path to domain model simplicity and as a step from data to information. Parallels with the division of labor between W3C Schema and Schematron are noted.

Domain Model is a common and deeply considered concept in software architecture. Information Model (IM) is also widely used, but with less consistency. There seems to be no clear, consistent application of the one term within the scope of the other. This note defines Domain Model by reference, redefines IM in terms of domain model-based architecture, and outlines best practice usage under these definitions.

DEFINITIONS

Domain models have been defined in terms of a software pattern, of data management, and of practices. The basic task of domain modeling is acquiring and structuring knowledge of a business domain (using the word "business" loosely, since a domain may be more abstract or less economic than that word implies). In brief, a Domain Model is generally represented in the following states:

An abstract model of the elements and associated processes of some real-world thing and its work (e.g. a catalog and flipping through it)
An application layer translating the abstract model into a software machine
A schema defining the abstract model's data

Domain model-centric architecture is simply system design whose first-order concern is defining the business domain separately from other aspects of the system. Information systems architecture covers substantial ground. Starting from the inside with the domain is one valid approach, particularly for data-structure-centered, headless, or product-line components. Organizations with strong interests in reuse, system flexibility, or platform change will be more likely to take a domain-based approach. Some well-known concepts friendly to domain-based architecture include Model Driven Architecture, Product Line Architecture, and Model View Controller (though, ironically, MVC in the Web application space has often sidelined the model, but that's another story).

IM has not been so clearly defined. Usage ranges from the analog of domain model, to the data validation constraints of a schema, to metadata, ontologies, and beyond. We are going to use a simple and intentionally narrow definition here. An IM is a set of rules-based, value-oriented metadata refining and/or extending a domain model by a mapping against attributes of that domain model.

As a simple example, take the domain of a magazine publisher's subscriber management. Each subscriber is modeled as a set of fields collected in a Subscriber entity. The entity is transformed into a class whose properties are mapped to a SQL schema defining related tables. One subscriber must have at least one subscription. A Subscription associates a Subscriber with an Order, and it carries a publication reference. For one reason or another, this information system does not treat publications as first-class constructs. Because of this limitation, the publication reference is merely a string identifier of 30 characters or less. The domain model is clear, but incomplete. What value can fill this field? If we cannot enumerate Publication objects, where do we get a list of publication names? From the IM entry mapped to the publication field of Subscription.
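The publisher example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class names, the `IM_ENTRIES` dictionary keyed by `(entity, field)`, and the publication titles are all hypothetical. The point is only that the domain model declares `publication` as a bounded string, while the IM entry mapped to that field supplies the possible values.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int


@dataclass
class Subscription:
    order: Order
    publication: str  # publications are not first-class: a plain string of 30 characters or less


@dataclass
class Subscriber:
    name: str
    # The domain rule requires at least one subscription; this sketch does not enforce it.
    subscriptions: list[Subscription] = field(default_factory=list)


@dataclass(frozen=True)
class IMEntry:
    """Value-oriented metadata mapped onto one domain model field."""
    maps_to: tuple[str, str]          # (entity, field) in the domain model
    max_length: int
    possible_values: tuple[str, ...]


# Hypothetical IM contents; the titles are invented for illustration.
IM_ENTRIES = {
    ("Subscription", "publication"): IMEntry(
        maps_to=("Subscription", "publication"),
        max_length=30,
        possible_values=("Weekly Review", "Garden Monthly", "Tech Quarterly"),
    ),
}


def publication_names() -> tuple[str, ...]:
    """Answer "what value can fill this field?" from the IM rather than the domain."""
    return IM_ENTRIES[("Subscription", "publication")].possible_values
```

A form offering choices for `Subscription.publication` would call `publication_names()`; the domain classes never need to know where the list came from.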

An IM, as defined, structures one form of metadata. But metadata is a broad concept that may extend beyond the IM to textual descriptions, keywords, qualities, etc. An IM is a set of rigorously defined, value-oriented metadata entries mapped to fields in the domain model. The fields of an IM entry are concerned only with the technical and factual correctness of the values in specific fields of the domain model. Consistent use of a well-specified IM is a step from data to information; other metadata and structures may then bridge from information to knowledge.

MODEL RELATIONSHIPS

From the publisher example we can draw out the most important rule of thumb: information models simplify domain models. An information model simplifies a domain model in two ways:

By limiting the form, quantity or values of the domain model's data set
By limiting the extent of the domain

The first of these is straightforward validation. An IM should know how to limit data, field-by-field, the way a good form interface limits data. If a value is found to be invalid, an IM should be able to explain why. If values for a field must come from another part of the domain, the IM should be able to mediate or reflect the values in order to provide a unified interface for validation.
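Field-by-field limiting with an explanation might look like the following minimal Python sketch. The rule helpers (`required`, `max_length`, `one_of`) and the `current_titles` source are hypothetical. Each rule returns `None` for an acceptable value or a reason string, so every rejection carries its explanation. `one_of` takes a callable rather than a fixed list, which is one way to let the IM mediate values that live elsewhere in the domain.

```python
from typing import Any, Callable, Optional

# A rule returns None when a value is acceptable, or a reason string explaining why not.
Rule = Callable[[Any], Optional[str]]


def required() -> Rule:
    return lambda v: "value is required" if v in (None, "") else None


def max_length(n: int) -> Rule:
    return lambda v: f"longer than {n} characters" if isinstance(v, str) and len(v) > n else None


def one_of(values: Callable[[], list[str]]) -> Rule:
    # The allowed list is fetched at check time, so it can be mediated from another
    # part of the domain instead of being copied into the rule.
    def check(v: Any) -> Optional[str]:
        allowed = values()
        return None if v in allowed else f"{v!r} is not one of {allowed}"
    return check


def validate(record: dict[str, Any], rules: dict[str, list[Rule]]) -> dict[str, list[str]]:
    """Check a record field by field and collect every reason a value is rejected."""
    problems: dict[str, list[str]] = {}
    for field_name, field_rules in rules.items():
        value = record.get(field_name)
        reasons = [r for rule in field_rules if (r := rule(value)) is not None]
        if reasons:
            problems[field_name] = reasons
    return problems


# Hypothetical source of allowed titles, standing in for another part of the domain.
def current_titles() -> list[str]:
    return ["Weekly Review", "Garden Monthly"]


PUBLICATION_RULES = {"publication": [required(), max_length(30), one_of(current_titles)]}
```

With these rules, `validate({"publication": "Weekly Review"}, PUBLICATION_RULES)` returns an empty dict, while an unknown title yields one explanation under the `publication` key.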

The second task of an information model is to limit the domain itself. Were the publisher's system to model publications explicitly, there would be no need for the IM to provide a metadata description of the publication field. Instead, to find appropriate values for publication we would query the collection of Publication objects. The more complete a domain model is, the less necessary an IM becomes. Clearly few domain models are complete; perhaps none are. In fact, one tension of system design should be to find the least complete domain model that both fully answers the problem statement and provides a productive, reasonably non-brittle system.

The IM should be used strategically to contain the domain model. System designers must make a trade-off between complexity and accuracy; the IM is a means of raising the value obtained from that trade-off. To provide high value in containment, the IM must be both simple and general. Generality permits one relatively simple structure to stand in at the interface between the domain and the rest of the world. The general IM concepts are:

Mapping

An element of the information model must map (apply) to one or more fields of the domain model; given a domain model field, the IM must provide an entry for it.

Reflecting

To provide possible values, an IM element may reflect the current state of a domain model field other than the one it is mapped to.

Characterizing

Each element in the information model must provide sufficient information to limit its mapped fields in the domain model.
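The three general concepts can be sketched as one small Python structure. Everything here is an assumption for illustration: `IMElement`, `entry_for`, `unmapped_fields`, and the `Order.title` field that the publication element reflects are hypothetical. `maps_to` covers Mapping, the `reflect` callable covers Reflecting (it reads a field other than the mapped one), and `limits` holds the Characterizing metadata.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical domain model fields, as (entity, field) pairs.
DOMAIN_FIELDS = [("Subscriber", "name"), ("Subscription", "publication")]

# Hypothetical current state of a *different* domain field, Order.title,
# which the publication element reflects.
current_orders = [{"title": "Weekly Review"}, {"title": "Garden Monthly"}, {"title": "Weekly Review"}]


@dataclass
class IMElement:
    maps_to: list[tuple[str, str]]                           # Mapping: one or more domain fields
    reflect: Optional[Callable[[], list[str]]] = None        # Reflecting: where possible values come from
    limits: dict[str, object] = field(default_factory=dict)  # Characterizing: how values are limited

    def possible_values(self) -> Optional[list[str]]:
        return sorted(set(self.reflect())) if self.reflect else None


IM = [
    IMElement(maps_to=[("Subscriber", "name")], limits={"required": True, "max_length": 80}),
    IMElement(
        maps_to=[("Subscription", "publication")],
        reflect=lambda: [order["title"] for order in current_orders],
        limits={"required": True, "max_length": 30},
    ),
]


def entry_for(entity: str, field_name: str) -> IMElement:
    """Mapping: given a domain model field, the IM must provide an entry."""
    for element in IM:
        if (entity, field_name) in element.maps_to:
            return element
    raise KeyError(f"no IM entry for {entity}.{field_name}")


def unmapped_fields() -> list[tuple[str, str]]:
    """Domain fields that break the mapping requirement (should be empty)."""
    return [f for f in DOMAIN_FIELDS if not any(f in e.maps_to for e in IM)]
```

Keeping the mapping check (`unmapped_fields`) separate from lookup makes it easy to run as a consistency test whenever the domain model gains a field.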

Each of these concepts points to a dynamism that domain models lack. While a domain model must, in the usual case, be structurally static, the IM is typically dynamic. Not only is the mapped content of the IM dynamic, but so are the contents of each element, including reflective reverse mappings and any reflected values. Runtime lookup of dynamic information reflected from the domain model is not a requirement of the information model per se (we have said that where the domain contains the information, we look to the domain), but in support of providing a common lookup interface for validation and advice, we expect most IM infrastructure will support this feature.
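The dynamism can be illustrated with a hedged Python sketch of an IM registry whose entries can be replaced while the application runs, and whose reflected values are re-read on every lookup. `InformationModel`, `put`, `possible_values`, and the sample titles are hypothetical names chosen for this example.

```python
from typing import Callable, Optional


class InformationModel:
    """A runtime-mutable registry of IM entries keyed by (entity, field)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], dict] = {}

    def put(self, entity: str, field_name: str, **entry) -> None:
        # Entries can be added or replaced while the application runs;
        # the domain classes they describe do not change.
        self._entries[(entity, field_name)] = entry

    def possible_values(self, entity: str, field_name: str) -> list[str]:
        entry = self._entries[(entity, field_name)]
        source: Optional[Callable[[], list[str]]] = entry.get("source")
        # A reflected source is consulted on every lookup, so the answer tracks
        # the domain's current state; otherwise fall back to a stored list.
        if source is not None:
            return list(source())
        return list(entry.get("values", []))


# Usage with hypothetical data:
im = InformationModel()
im.put("Subscription", "publication", values=["Weekly Review"])  # a static list at first
before = im.possible_values("Subscription", "publication")

published_titles = ["Weekly Review", "Garden Monthly"]  # stands in for live domain state
im.put("Subscription", "publication", source=lambda: published_titles)  # swapped at runtime
published_titles.append("Tech Quarterly")  # the domain changes...
after = im.possible_values("Subscription", "publication")  # ...and the IM answer follows
```

The entry changed twice and the reflected list grew, yet no domain class was redefined; that asymmetry is the dynamism described above.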

Characterization is the most complex concept encompassed by information models. To characterize a field, the fields of an information model element may include:

Domain model correspondence ID
Type
Lifecycle advice (e.g. transient, calculated, static, indexed, mutable, latched, etc.)
Possible values
Source of possible values (for reflection)
Simple validating rules (e.g. required, unique, min, max, default, etc.)
Complex validating rules (e.g. required if, is greater than field, contained by, etc.)
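A characterizing element can be sketched as a Python dataclass whose fields mirror the list above. The names (`Characterization`, `required_if`, `greater_than_field`, and the `Subscription.end_issue` / `start_issue` fields) are hypothetical. Lifecycle advice and the values source are carried as metadata but not acted on, and rules such as unique are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# A complex rule sees the whole record and returns a reason string, or None if satisfied.
ComplexRule = Callable[[dict], Optional[str]]


def required_if(name: str, other: str) -> ComplexRule:
    def rule(rec: dict) -> Optional[str]:
        if rec.get(other) is not None and rec.get(name) is None:
            return f"{name} is required when {other} is set"
        return None
    return rule


def greater_than_field(name: str, other: str) -> ComplexRule:
    def rule(rec: dict) -> Optional[str]:
        a, b = rec.get(name), rec.get(other)
        if a is None or b is None:
            return None  # nothing to compare; absence is required_if's concern
        try:
            ok = a > b
        except TypeError:  # incomparable values cannot satisfy the rule
            ok = False
        return None if ok else f"{name} must be greater than {other}"
    return rule


@dataclass
class Characterization:
    correspondence_id: str                 # domain field this element maps to
    value_type: type
    lifecycle: str = "mutable"             # advice only; not enforced here
    possible_values: Optional[list] = None
    values_source: Optional[str] = None    # field to reflect from; not resolved in this sketch
    required: bool = False
    minimum: Optional[int] = None
    maximum: Optional[int] = None
    default: Any = None
    complex_rules: list[ComplexRule] = field(default_factory=list)

    def check(self, record: dict) -> list[str]:
        name = self.correspondence_id.split(".")[-1]
        value = record.get(name, self.default)
        if value is not None and not isinstance(value, self.value_type):
            # A wrongly typed value makes the remaining rules meaningless.
            return [f"{name} must be of type {self.value_type.__name__}"]
        reasons: list[str] = []
        # Simple rules look only at this field's value.
        if value is None:
            if self.required:
                reasons.append(f"{name} is required")
        else:
            if self.possible_values is not None and value not in self.possible_values:
                reasons.append(f"{name} is not an allowed value")
            if self.minimum is not None and value < self.minimum:
                reasons.append(f"{name} is below {self.minimum}")
            if self.maximum is not None and value > self.maximum:
                reasons.append(f"{name} is above {self.maximum}")
        # Complex rules relate this field to others in the same record.
        reasons += [r for rule in self.complex_rules if (r := rule(record)) is not None]
        return reasons


# Hypothetical element for a Subscription.end_issue field.
END_ISSUE = Characterization(
    correspondence_id="Subscription.end_issue",
    value_type=int,
    minimum=1,
    complex_rules=[required_if("end_issue", "start_issue"),
                   greater_than_field("end_issue", "start_issue")],
)
```

Simple rules inspect one value; complex rules receive the whole record, which is what lets "required if" and "is greater than field" be expressed declaratively.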

This list is not exhaustive. For instance, although not a complete conceptual fit, it is not unreasonable to add permissioning to this list.

It is worth taking a moment to draw a parallel between the concept of IM, as defined here, and Schematron, a rules-based XML validation language. Schematron is an alternative or complement to DTD or W3C Schema (among others) for defining and validating XML instance documents. Just as a domain model is primary and core to an IM, a W3C Schema may be the framework that a Schematron schema extends and limits. Although Schematron has limited mapping or reflecting capability, it covers characterization very well using declarative rules. Schematron could work as an information model for an XML-based system or, with some modification, it could sit at the core of a dynamic, object-based information model, a direction the Cocoon project has begun to explore.
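To make the parallel concrete without a Schematron processor, the Python sketch below emulates Schematron's assert-with-message style. In real Schematron, a `rule` element's `context` attribute and an `assert` element's `test` attribute hold XPath expressions; here, as a stated simplification, the context is an ElementTree path and the test is a Python lambda. The rules and the document shape are invented for the publisher example.

```python
import xml.etree.ElementTree as ET
from typing import Callable, NamedTuple


class Assert(NamedTuple):
    context: str                        # ElementTree path, standing in for a rule's XPath context
    test: Callable[[ET.Element], bool]  # stands in for the assert's XPath test
    message: str                        # reported when the test fails, as an assert's text is


# Hypothetical declarative rules for the publisher's XML.
RULES = [
    Assert(".//subscription", lambda e: e.get("publication") is not None,
           "A subscription must name a publication."),
    Assert(".//subscription", lambda e: len(e.get("publication", "")) <= 30,
           "A publication identifier is at most 30 characters."),
    Assert(".//subscriber", lambda e: len(e.findall("subscription")) >= 1,
           "A subscriber must have at least one subscription."),
]


def check_document(xml_text: str, rules: list[Assert]) -> list[str]:
    """Apply every rule to every node its context selects; collect failure messages."""
    root = ET.fromstring(xml_text)
    failures = []
    for rule in rules:
        for node in root.findall(rule.context):
            if not rule.test(node):
                failures.append(rule.message)
    return failures
```

Note that `.//` selects descendants of the root only, so this sketch expects a wrapper element such as `<subscribers>` around the records.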

An IM in some form is most common in areas of high generality, in severely limited domains, or where user-input data carries data quality requirements. Containment-modeled systems (e.g. repositories, tree structures, etc.) are a strong fit because end users apply their simple structural domain to very specific business cases. Many containment-modeled domains carry elements of IM as first-class properties. An example would be a repository item with a getClassOfContent method, or a version control system's descriptor object with a getRequiredProperties method. In cases like these, an IM provides a clearer, more regular means of classifying fields and limiting the domain. Further, the separation of concerns simplifies and centralizes control over domain use, boosting application development productivity. For a good example of this usage, see the Documentum eContent Server's Data Dictionary.
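A containment-modeled item that defers to a data-dictionary-style IM might be sketched as follows. `RepositoryItem`, `DATA_DICTIONARY`, and the snake_case `get_class_of_content` / `get_required_properties` methods (echoing the method names above) are hypothetical and do not represent the Documentum Data Dictionary API.

```python
from dataclasses import dataclass, field

# Hypothetical data-dictionary-style IM: content class -> required properties.
# An illustration of the idea only, not any product's API.
DATA_DICTIONARY: dict[str, set[str]] = {
    "contract": {"title", "counterparty", "effective_date"},
    "memo": {"title", "author"},
}


@dataclass
class RepositoryItem:
    """A containment-modeled item: a node with a content class and free-form properties."""
    path: str
    content_class: str
    properties: dict[str, str] = field(default_factory=dict)

    def get_class_of_content(self) -> str:
        return self.content_class

    def get_required_properties(self) -> set[str]:
        # The item does not hard-code its rules; it asks the IM for them.
        return DATA_DICTIONARY.get(self.content_class, set())

    def missing_properties(self) -> list[str]:
        return sorted(self.get_required_properties() - set(self.properties))
```

Adding a new content class changes only `DATA_DICTIONARY`; the item class, like the domain model, stays fixed.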

SUMMARY

Information modeling is pervasive in its most rudimentary form as simple input or data validation: client-side form validation, database validation against DDL, XML validation, etc. Making information modeling a more detailed, explicit, and value-oriented part of application design, and implementing IM as a logical component, has the potential to:

Raise data quality
Increase application flexibility
Simplify configuration- and run-time administration
Increase development productivity

In addition, by addressing IM as a first-order architectural concept, system designers may think more effectively about the domain as a narrow, limited problem space. The upshot of IM at this level is a compounding of each of the benefits listed above. And because IM is itself a highly focused, highly general domain with few first-class concepts, it is definitely possible to find or build IM infrastructure once and reuse it many times. Given the prevalence of the problem, the conceptual clarity, and the low cost-to-benefit ratio of implementation, there is little reason not to pursue an IM approach in your next domain-based architecture.

© 1999-2002, d.kershaw. all rights reserved.			Δ