The Journal of Open Computing
Issues in Metadata Exchange

Katherine Hammer

President & CEO

ETI

The importance of an enterprise metadata strategy

Large organizations have long recognized the importance of metadata and now list arriving at an enterprise-wide strategy for metadata management as one of their most important strategic initiatives. However, few (if any) have successfully devised and implemented such a strategy. There are a number of reasons for this disconnect: the nature of the task itself, the state of most software products and interchange initiatives, and the sheer complexity involved, both technical and procedural. This article describes these issues in some detail and argues for accepting that a metadata strategy will most likely have to be evolved rather than designed.

What makes it difficult technically?

There are three major decisions that must be reached when developing an enterprise-wide metadata strategy:

    • Arriving at the definition of a common metamodel
    • Choosing the configuration of products which will be used to acquire, manage, and distribute metadata
    • Defining and enforcing a methodology for the use and maintenance of the above.

What most metamodels don’t represent. Most companies see the need for an industry-specific metamodel but find little support for one from repository or CASE vendors. Moreover, the base metamodels shipped with most metadata products are not sufficiently rich to represent the following kinds of information:

    • Different levels of abstraction
    • Changes over time
    • Business rules
    • Relationships over time and space.

Analogous to the need to represent a logical data model as distinct from the physical one is the need to represent metadata at different levels of abstraction for different types of users. The end users of a data warehouse want to know where the data values they are using for decision support come from, but they do not want the same level of detail that is needed by the people who maintain the warehouse. For example, end users may want to know that the information related to vendors has been drawn from five purchasing systems, but they do not want to know that three or four fields from each source database were used to determine the particular target value used to represent the vendor id. To date, there is no automated way to create this more general type of model from the more detailed metadata required to build and implement the warehouse.
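
To make the two levels concrete, consider the sketch below. The system, field, and rule names are invented rather than drawn from any particular repository product: field-level lineage is recorded at the detail a warehouse maintainer needs, then rolled up into the coarser, source-system view an end user wants.

    from collections import defaultdict
    from dataclasses import dataclass

    # Field-level lineage, at the detail a warehouse maintainer needs.
    # All system, field, and rule names are invented for illustration.
    @dataclass
    class FieldMapping:
        target_field: str     # e.g. "vendor_id" in the warehouse
        source_system: str    # one of the purchasing systems feeding it
        source_fields: list   # the three or four source fields consulted
        rule: str             # how the target value was derived

    detail = [
        FieldMapping("vendor_id", "PURCH_EU",
                     ["VNDR_NO", "VNDR_SFX", "CO_CODE"], "concatenate and zero-pad"),
        FieldMapping("vendor_id", "PURCH_US",
                     ["SUPPLIER", "SITE", "DUNS", "STATUS"], "prefer DUNS when present"),
        FieldMapping("vendor_id", "PURCH_APAC",
                     ["VEND", "BRANCH", "REGION"], "look up in master vendor table"),
    ]

    def summarize(mappings):
        """Roll detailed lineage up to the level an end user cares about:
        which source systems feed each warehouse field."""
        by_target = defaultdict(set)
        for m in mappings:
            by_target[m.target_field].add(m.source_system)
        return {field: sorted(systems) for field, systems in by_target.items()}

    print(summarize(detail))
    # {'vendor_id': ['PURCH_APAC', 'PURCH_EU', 'PURCH_US']}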

Likewise, the interrelationships between databases are likely to change over time. As the schema for a mission-critical database is changed, it is unlikely that the historical data will be rebuilt, either because the necessary data is not available or – more likely – because there is not sufficient downtime on the system in question. In this case, the previous interrelationships are not invalidated; they simply cease to apply going forward. If one were to port an application off the source database, for example, to a newer packaged ERP application, it would be important for the entire history of interrelationships to be available to the analysts and implementers. Access to this metadata is doubly important when the source database exists in a legacy system where reporting and query (and therefore data sampling) capabilities are limited. As a result, one would like the metamodels that come with design and repository products to include some versioning capability as part of their baseline function.
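
One way a metamodel could carry such a history – a sketch only, with hypothetical names rather than any product's actual versioning scheme – is to timestamp each interrelationship and never overwrite it:

    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class SchemaLink:
        """An interrelationship between a source field and a target field,
        valid only for a bounded period of the source schema's history.
        The field names are invented for illustration."""
        source: str
        target: str
        valid_from: date
        valid_to: Optional[date] = None   # None means "still in force"

    history = [
        # Original mapping, retired when the source schema changed in mid-1997.
        SchemaLink("ORDERS.CUST_NO", "warehouse.customer_id",
                   date(1994, 1, 1), date(1997, 6, 30)),
        # Replacement mapping; the historical rows were never rebuilt.
        SchemaLink("ORDERS.CUST_KEY", "warehouse.customer_id",
                   date(1997, 7, 1)),
    ]

    def links_as_of(when, links):
        """Return the interrelationships in force on a given date -- what an
        analyst porting the application off the source system would ask for."""
        return [l for l in links
                if l.valid_from <= when and (l.valid_to is None or when <= l.valid_to)]

    print(links_as_of(date(1996, 3, 1), history))   # the pre-1997 link
    print(links_as_of(date(1998, 1, 1), history))   # the current link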

The specification of business rules and functional logic is another area where there is little consensus among the vendors providing metamodels. In general, the users of most products specify business rules today using one of the following three means:

    • Codeblocks, e.g., fragments of COBOL, C++, BASIC;
    • SQL or some proprietary control language; or
    • Documentation strings.

This is unfortunate for two reasons. First, for products that require codeblocks or some 4GL, it means that the user of the product must have some technical training. Second, and even more importantly, it makes it extremely difficult (if not almost impossible) for products to exchange this kind of information.
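
The difficulty becomes obvious if one imagines a single rule captured in each of the three forms above. The snippets below are purely illustrative; no product's syntax is being quoted.

    # The same hypothetical business rule -- "a vendor is preferred when more
    # than 95 percent of its deliveries arrive on time" -- captured three ways.
    # A tool receiving any one of these forms has no reliable, automated way
    # to recover the other two.

    # 1. A codeblock: executable, but only by a tool that embeds this language.
    def is_preferred(on_time_deliveries, total_deliveries):
        return total_deliveries > 0 and on_time_deliveries / total_deliveries > 0.95

    # 2. SQL (or a proprietary control language): runnable against one engine's
    #    dialect, opaque to tools that do not parse that dialect.
    RULE_SQL = """
    SELECT vendor_id
    FROM   vendor_performance
    WHERE  on_time_deliveries > 0.95 * total_deliveries
    """

    # 3. A documentation string: portable and readable, but not executable at all.
    RULE_DOC = "A vendor is preferred when more than 95% of its deliveries arrive on time."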

Finally, there is the problem that in large multinational corporations the interrelationships between databases are frequently complicated by time dependencies: the feeds reside on different hosts with different periods of peak activity, downtime, and so on, which affects both their accessibility and their consistency.

What vendors are doing

In short, creating a metamodel that simply represents the complexities any large company faces far exceeds the bounds of what’s offered with most metadata management products, even before industry-specific factors are incorporated. As a result, the ability of products to exchange metadata is either fairly primitive or depends on pairwise translators, built either between individual products or between a product and some repository based on a universal metamodel.
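
The appeal of a universal metamodel over purely pairwise translation is largely arithmetic. The comparison below is a back-of-the-envelope illustration, not a count of any actual product configuration:

    def translators_needed(n_tools):
        """Illustrative counts only. Pairwise exchange needs an import and an
        export path between every pair of tools; a shared metamodel needs
        just one import and one export per tool."""
        pairwise = n_tools * (n_tools - 1)   # directed translators between every pair
        via_hub = 2 * n_tools                # one import + one export per tool
        return pairwise, via_hub

    for n in (5, 10, 20):
        print(n, translators_needed(n))
    # 5  (20, 10)
    # 10 (90, 20)
    # 20 (380, 40)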

In general, vendor initiatives can be characterized in one of three ways:

    • The creation of a standards body like CDIF (CASE Data Interchange Format)
    • The creation of a de facto standard by a market driver like Microsoft or Oracle
    • A tactical multi-vendor initiative like that proposed by the Meta Data Coalition.

Standards organizations tend to fail because their goal is to develop an exhaustive model which captures the superset of any functionality currently offered by a vendor, plus any other functionality that seems reasonable or valuable to the organizations driving the initiative. As a result, these efforts tend to take years before the standard is defined or adopted. On the other hand, large market drivers who establish a de facto standard typically favor metamodels that best suit the products they offer, i.e., heavily relational if they are vendors of an RDBMS, and so on. Moreover, such vendors frequently favor APIs rather than import/export formats, since such interfaces are less portable and typically require more development to support. As a result, smaller vendors who choose to adopt the standard proposed by the market driver are more likely to push to make that standard generally accepted, so that the investment need not be repeated for other interfaces and the demands on their development resources are reduced.

The Meta Data Coalition is an example of a multi-vendor initiative committed to a tactical solution to metadata exchange. Founded by six companies in the summer of 1995, the Coalition currently has approximately 50 members, most of which are software vendors. Recognizing that what constitutes metadata will evolve as more and more products operate in distributed, heterogeneous environments, the Coalition opted to define a file-based interchange specification that does not attempt to be exhaustive but allows private metadata to be carried along as part of the exchange. Version 1.1 of MDIS (the Meta Data Interchange Specification) was released in the summer of 1997; to date, seven vendors are compliant and another seven have committed to compliance by the end of 1998.
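
The principle of carrying private metadata along can be sketched as follows. This is an illustration of the idea only, not the actual MDIS file layout, and the field names are invented:

    # A sketch of the design principle only -- not the actual MDIS file layout.
    # The receiving tool interprets the fields it understands and carries any
    # private, tool-specific metadata along untouched, so a later exchange
    # does not lose it.

    KNOWN_FIELDS = {"name", "type", "source", "target"}

    def import_record(record):
        """Split an exchanged record into the shared part this tool understands
        and the private part it merely preserves."""
        shared = {k: v for k, v in record.items() if k in KNOWN_FIELDS}
        private = {k: v for k, v in record.items() if k not in KNOWN_FIELDS}
        return shared, private

    def export_record(shared, private):
        """Re-emit the record with the private metadata carried along."""
        return {**shared, **private}

    incoming = {
        "name": "vendor_id",
        "type": "CHAR(12)",
        "source": "PURCH_US.SUPPLIER",
        "x-origin-tool-screen-position": "3,7",   # private to the originating tool
    }
    shared, private = import_record(incoming)
    assert export_record(shared, private) == incoming   # nothing was dropped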

There is a benefit to the Coalition above and beyond the MDIS specification itself: the size of its collective voice. When Microsoft announced its repository initiative, the Coalition offered to serve as part of its review process and committed to developing an MDIS-to-MS Repository translator so that any MDIS-compliant vendor would be, with little or no work, MS Repository compliant. The work between the various organizations led to a metamodel which is much closer to what will ultimately be needed than would have been likely had such collaboration not taken place.

Taking an iterative approach

The software industry is a long way from full metadata interoperability and, given the kinds of technical problems outlined above, will probably remain so for some time. And yet sufficient attention is being paid to these matters that one can expect significant progress over the next few years. However, because this progress will be evolutionary, the best a company can do is to adopt an iterative approach to developing both its corporate metamodel and its enterprise metadata strategy. Moreover, such an approach is probably the only reasonable course to take, since much of an organization’s metadata describing legacy systems is buried in "secret" schema changes – unknown to the legacy data managers and reflected only in the millions of lines of application code that access these databases. The sheer complexity of discovering this information precludes a purely analytical approach. Rather, this information is typically discovered in the process of implementing some new project such as a data warehouse or application.

The most important thing a company can do is retain this information as it is discovered. By looking for products that acquire and retain metadata as a by-product of providing the organization with increased productivity, and by making sure that those products are also able to flexibly export such information, an organization can position itself to move quickly when a standard emerges.
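
As an illustration of that last point – a sketch with an invented structure, not a reference to any particular product – discovered interrelationships can be retained the moment they surface and exported in whatever form is needed later:

    import csv
    import json
    import sys

    # A minimal sketch of the recommendation above: record interrelationships
    # the moment a project uncovers them, and keep the store exportable so it
    # can be mapped onto whatever interchange standard eventually wins.

    discovered = []

    def record_discovery(source, target, rule, found_during):
        """Retain a mapping as a by-product of the project that uncovered it."""
        discovered.append({"source": source, "target": target,
                           "rule": rule, "found_during": found_during})

    def export_metadata(fmt="json", out=sys.stdout):
        """Flexible export matters more than the particular format chosen today."""
        if fmt == "json":
            json.dump(discovered, out, indent=2)
        elif fmt == "csv":
            writer = csv.DictWriter(out, fieldnames=["source", "target", "rule", "found_during"])
            writer.writeheader()
            writer.writerows(discovered)

    record_discovery("ORDERS.CUST_KEY", "warehouse.customer_id",
                     "undocumented 1997 schema change found in application code",
                     "data warehouse build")
    export_metadata("json")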


(c) Copyright 1998 by the UniForum Association