Object Services Architectures

Metadata Structures for Internet Services

Project Summary

Frank Manola
Object Services and Consulting, Inc.
September 15, 1998

Executive Summary

Today, the World Wide Web is a global information repository of documents primarily represented by syntactically structured HTML tags and MIME extensions. These weak data models do not provide the foundation for command and control situation modeling or enterprise computing, or for a new generation of tools to operate on a more semantically structured, knowledge-based Web. Richer base data models are needed that combine the benefits of emerging Web structuring mechanisms and distributed object services architectures.

The technical objective of this project was to improve the foundation for Web and object model integration. Several candidate technologies from different communities exist -- for instance, HTML and MIME from the Internet community; Webserver from the DARPA JTF/ATD command and control community; ORBs, IDL, Tagged Data from OMG; Java and ActiveX from the component community; Harvest SOIF and Netscape RDM from the search engine community; ODMG ODL and Tsimmis OEM from the database community; Dublin Core, Warwick Framework, and related work from the Internet metadata and digital libraries communities; two different specifications for document object models from Netscape and Microsoft; and RDF and XML from W3C.

Our approach has been to identify the main contender approaches, identify any deficiencies in each approach, identify a convergence approach centered around the use of XML as a basic representation, fill in (some of) the gaps, and transfer our results and lessons learned directly and incrementally to DoD command and control projects like DARPA Advanced Information Technology Services (AITS) Architecture and industry standards organizations, primarily W3C and OMG.

The results of our work have been the identification of a key technical framework for integrating Web and object technologies, a number of Technical Reports and external publications describing this approach, a prototype illustrating one of the possible Web object construction mechanisms, and the injection of ideas from this work into the activities of OMG and W3C.

Problem Statement

The basic data structure in the Web is HTML. It is generally recognized that HTML is too simple to adequately support the requirements of the increasingly complex applications being developed with the Web as a base, such as:

applications that require the Web client to function as the front-end to enterprise applications or mediate between multiple heterogeneous databases,
applications that require more flexibility in distributing processing load between Web servers and clients, and
applications that require the Web client to present different views of the same data to different users or in which intelligent Web agents need to tailor information discovery to the needs of individual users.

Proprietary HTML extensions have been developed to address some of these problems, but none deals with all of them, and together they create barriers to interoperability. The same is true of the proprietary data formats used by particular applications. Their use requires specialized helper applications, plug-ins, or Java applets, which creates interoperability problems and makes it difficult to reuse the data in different applications for new purposes. While some specialized formats are necessary in particular applications (e.g., multimedia), in many cases these formats are used to work around HTML deficiencies in generalized document and data processing.

There is much ongoing work within both the Web and database communities on data structure enhancements to address these issues. Work on similar issues is ongoing within the Object Management Group (OMG) as well. This work has contributed valuable ideas, and the various proposals illustrate similar basic concepts, generally a movement toward some form of simple object model. However, these similarities are often obscured by detailed representational differences, and the work is fragmented and lacks a unifying framework. As a result, individual proposals often lack key capabilities that other proposals contain. Moreover, in most cases these proposals are not well integrated with the emerging industry consensus on Web data structuring technologies.

If the Internet is to support advanced application requirements, it needs both richer individual data structuring mechanisms and a unifying overall framework that supports heterogeneous representations and extensibility and provides metalevel concepts for describing and integrating them.

Objective

The technology objective of the OSA/metadata project was to define a unifying framework for Web data structuring representations that

is extensible,
supports integration of multiple types,
supports the requirements of metadata, annotations, and database applications,
is based on emerging industry technologies such as XML, RDF, Dynamic HTML, and DARPA I*3 database work including OEM,
provides a formal model for database-like operations on these structures, and
provides a query language based on the formal model.

Our technology transfer objectives were

to transfer results of our work to the DARPA AITS command and control projects, and
to converge these efforts by working with W3C and OMG.

Approach

Our approach was to build on and unify key proposals for richer Internet data structuring and metadata mechanisms, including XML, RDF, Dynamic HTML, OEM, and related work, based on an analysis of their common underlying principles. These mechanisms were to be extended with specific metalevel concepts that reify (make first-class) components of the data structure, allowing the structures to be self-describing and to better integrate code and data. Some of these proposals already take limited steps in this direction. Building on these emerging efforts grounds the work in technology that already has considerable support from major Internet technology providers such as Microsoft, Netscape, and Sun. The use of XML, which is a subset of SGML, also links this technology to the use of SGML within the government (e.g., CALS).

We also identified a potential formal basis, grounded in object logics such as F-logic, for applying database operations such as query and view operators to the resulting structures. These logics provide limited second-order capabilities for dealing with the metalevel concepts while retaining first-order semantics, which keeps computation efficient and tractable.

In addition, to promote the integration of the technologies we identified, we shared our results with DARPA AITS architecture projects, published technical reports and papers, and made our results available to W3C and OMG through publications and presentations and through participation in the technical activities of these groups.

This work, which draws on data, metadata, and object capabilities from both database technology and key emerging Web technologies, will be crucial in integrating object service, Web, and database technologies deeply and efficiently to support increasingly demanding enterprise-scale applications.

Limitations of Related Work

The Internet and Web communities have developed a number of "object models" or data structuring principles to represent semistructured data. The database community has developed proposals for "lightweight object models," partly driven by attempts to represent metadata for Web resources. All this work has contributed valuable ideas and, taken as a whole, exhibits important common underlying principles based on the use of tagged data items or attribute/value pairs. However, the individual proposals lack important capabilities that are often contained in other proposals. What is required is that this work be integrated and the best ideas merged. The following paragraphs describe work that is most directly relevant to this effort (other work, such as Harvest's SOIF, is also relevant).

The World Wide Web Consortium (W3C) Resource Description Framework (RDF) effort <http://www.w3.org/Metadata/RDF/> extends the PICS technology for labeling Internet content to support more general metadata requirements. Related work includes Netscape's Meta Content Framework (MCF) <http://www.w3.org/TR/NOTE-MCF-XML/> and Microsoft's XML-Data <http://www.w3.org/TR/1998/NOTE-XML-data>. These efforts define what are effectively metadata type systems, based on collections of attribute/value pairs. They provide a core of good ideas for supporting metadata, such as explicit links from pages representing resources to metadata describing them. However, there are important differences among the various approaches, and the approaches are not integrated with other parallel work, such as the Document Object Model (see below).
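To make the idea of a metadata type system built from attribute/value pairs concrete, the following Python sketch represents metadata as RDF-style (subject, predicate, value) statements, collects everything said about one resource, and follows an explicit link from one resource's metadata to another's. The URIs, the `dc:` property names, and the `Statement`, `describe`, and `related` names are illustrative assumptions; this is neither RDF's XML syntax nor any W3C API.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One attribute/value pair about a resource (an RDF-style triple)."""
    subject: str    # URI of the resource being described
    predicate: str  # attribute name, here a Dublin Core-style element
    value: str      # literal value, or the URI of another resource


# Hypothetical metadata; the URIs and values are illustrative only.
METADATA = [
    Statement("http://example.org/report.html", "dc:creator", "A. Author"),
    Statement("http://example.org/report.html", "dc:date", "1998-09-15"),
    Statement("http://example.org/report.html", "dc:relation",
              "http://example.org/wom.html"),
    Statement("http://example.org/wom.html", "dc:title", "Towards a Web Object Model"),
]


def describe(resource, statements):
    """Group the predicate/value pairs whose subject is `resource`."""
    result = {}
    for s in statements:
        if s.subject == resource:
            result.setdefault(s.predicate, []).append(s.value)
    return result


def related(resource, statements):
    """Follow dc:relation links from a resource to the descriptions of its targets."""
    return {target: describe(target, statements)
            for target in describe(resource, statements).get("dc:relation", [])}
```

Because metadata is kept apart from the resources it describes, new statements can be added about a page without modifying the page itself.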

The RDF and related work define mappings to the Extensible Markup Language (XML) <http://www.w3.org/XML/>, a W3C Recommendation (adopted specification). XML, which is a subset of SGML, allows creation of customized markup languages incorporating user-defined tags and a standardized way of describing those languages (DTDs) that can be understood by generalized clients. XML thus provides direct support for using tagged data items (attribute/value pairs) in Web resources, as opposed to the current need to use ad hoc encodings of data items in terms of HTML tags. XML DTDs are similar in some ways to database schemas, and thus provide a natural target for database information. The linking of resources with their DTDs is similar to the association of a database record with its schema type, and to the association of an object with its type or class definition. The hypertext linking capabilities of XML are greater than those of HTML, including bidirectional and multiway links, and links to spans of text. Work is also underway on tying XML to Java. XML has considerable industry support (e.g., both Netscape and Microsoft). However, XML provides only basic tagged value support. Additional concepts must be added to apply it to extended data and metadata structuring requirements (as illustrated by RDF and related efforts).
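As a minimal sketch of how user-defined XML tags yield tagged data items without ad hoc HTML encodings, the Python code below uses the standard library's `xml.etree.ElementTree` to flatten a document into (path, attributes, text) triples. The `<order>` vocabulary and the `tagged_items` helper are hypothetical. ElementTree parses documents but does not validate them against a DTD, so the schema-like role of DTDs is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document using user-defined tags (no DTD is declared or checked).
DOC = """<order id="A-17">
  <customer>Acme Corp.</customer>
  <item sku="X1" qty="3">Widget</item>
  <item sku="Y2" qty="1">Gadget</item>
</order>"""


def tagged_items(xml_text):
    """Flatten an XML tree into (path, attributes, text) tagged data items."""
    root = ET.fromstring(xml_text)
    items = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()  # whitespace-only text becomes ""
        items.append((here, dict(elem.attrib), text))
        for child in elem:
            walk(child, here)

    walk(root, "")
    return items
```

Each item pairs a tag path with its attribute/value pairs and content, which is the kind of structure a generalized client can process without knowing an application's private encoding.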

W3C's Document Object Model (DOM) effort <http://www.w3.org/DOM/>, based on Dynamic HTML facilities defined by Microsoft and Netscape, extends HTML with an object model allowing scripts or programs to change styles and attributes of page elements (or objects) or even to replace existing elements (or objects) with new ones. This provides a basic way to integrate a page's data with code in the page and provides an explicit metalevel and API. Current W3C specifications provide a DOM for XML as well as for HTML. However, as currently defined, these capabilities are not sufficiently tailorable or general. For example, current specifications lack support for integrating code not co-located on the page (e.g., code that already exists on the client) or for defining application-specific objects based on data on the page, and the work is currently not integrated with metadata work such as RDF.
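The kind of DOM manipulation described above can be sketched with Python's `xml.dom.minidom`, which implements a subset of the W3C DOM interfaces (`getElementsByTagName`, `setAttribute`, `createElement`, `replaceChild`). The page content and the `update_page` function are illustrative assumptions; in a browser the same operations would be performed by a script running against the page's DOM.

```python
from xml.dom import minidom

# Hypothetical page content.
PAGE = '<page><title style="plain">Status</title><p id="msg">Loading</p></page>'


def update_page(xml_text):
    """Use DOM interfaces to restyle one element and replace another."""
    doc = minidom.parseString(xml_text)

    # Change an attribute of an existing element (like a script restyling a page).
    title = doc.getElementsByTagName("title")[0]
    title.setAttribute("style", "bold")

    # Replace an existing element with a newly constructed one.
    old = doc.getElementsByTagName("p")[0]
    new = doc.createElement("p")
    new.setAttribute("id", "msg")
    new.appendChild(doc.createTextNode("Ready"))
    old.parentNode.replaceChild(new, old)

    return doc.documentElement.toxml()
```

Note that the behavior here lives entirely in the calling code; nothing in the page itself says which programs may operate on it, which is the gap in tailorability the paragraph describes.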

Stanford's Tsimmis Object Exchange Model (OEM) and related work by others (e.g., U. Penn.) have also based metadata models on collections of attribute/value pairs, together with extensions such as reifying individual attributes by assigning identifiers to them. This work provides a valuable core of ideas for applying database concepts to this type of data. However, the metadata capabilities of these structures are somewhat limited. They do not explicitly consider capturing type and schema information where it exists, or linking that type information to the structures it describes. The work is also not well integrated with emerging Web technologies such as XML, DOM, and RDF that are likely to change the basic nature of the Web's representation. Finally, these database approaches have so far assumed that the problem they address is querying largely syntactically structured text bases of the kind supported by HTML, an assumption that partly explains their limited technical success. XML-based approaches provide a higher-level, more semantic representational structure, and can assume that information authors themselves are able to supply more semantic structure.
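A rough Python sketch of an OEM-style structure, assuming a simplified reading of the model in which every object, including each individual attribute, carries its own identifier, a label, a type, and either an atomic value or a list of subobject identifiers. The `OEMObject` class, the `&n` identifiers, the `children` helper, and the sample data are hypothetical and are not Tsimmis code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class OEMObject:
    """Simplified OEM-style object: identifier, label, type, and value."""
    oid: str
    label: str                     # attribute name, e.g. "title"
    type: str                      # "complex" or an atomic type name
    value: Union[str, int, list]   # atomic value, or list of subobject oids


# Hypothetical database; every attribute is reified with its own oid.
DB = {o.oid: o for o in [
    OEMObject("&1", "project", "complex", ["&2", "&3", "&4"]),
    OEMObject("&2", "name", "string", "OSA/metadata"),
    OEMObject("&3", "year", "integer", 1998),
    OEMObject("&4", "report", "complex", ["&5"]),
    OEMObject("&5", "title", "string", "Towards a Web Object Model"),
]}


def children(oid, label=None):
    """Return subobjects of a complex object, optionally filtered by label."""
    obj = DB[oid]
    if obj.type != "complex":
        return []
    return [DB[c] for c in obj.value if label is None or DB[c].label == label]
```

Because an attribute such as `&3` has its own identifier, it can be referenced and annotated directly. The sketch also shows the limitation noted above: nothing links these objects to a type or schema describing what a `project` should contain.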

Finally, the OMG has identified a number of requirements similar to those found in the context of the Web. An example is a recent Tagged Data RFP. These requirements involve the use of tagged data items to support semantics-based information exchange between applications, and also support for nesting and the ability to locate objects via tags through layers of nesting. Such high-level communication is considered important in OMG's attempts to define Business Object capabilities. OMG's Property Service provides similar capabilities. These are of interest in showing the recognized need for data organizations, similar to those described above, within OMG's object-oriented distributed architecture. However, these are not yet fully coordinated with emerging Web or database representations.
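To illustrate what locating objects via tags through layers of nesting might involve, here is a Python sketch over nested (tag, value) pairs, with one function that searches for a tag at any depth and another that follows an explicit sequence of tags. The message layout and the `find_all` and `find_path` names are assumptions for illustration; they are not the interfaces requested by the OMG Tagged Data RFP or defined by the Property Service.

```python
# Hypothetical nested tagged data: each item is a (tag, value) pair whose value
# is either atomic or a list of further (tag, value) items.
MESSAGE = ("purchase", [
    ("buyer", [("name", "Acme Corp."), ("id", 42)]),
    ("line", [("part", "X1"), ("qty", 3)]),
    ("line", [("part", "Y2"), ("qty", 1)]),
])


def find_all(item, tag):
    """Return the values of every item with the given tag, at any nesting depth."""
    t, value = item
    found = [value] if t == tag else []
    if isinstance(value, list):
        for child in value:
            found.extend(find_all(child, tag))
    return found


def find_path(item, path):
    """Follow a list of tags down through successive nesting levels."""
    tag, value = item
    if tag != path[0]:
        return []
    if len(path) == 1:
        return [value]
    if not isinstance(value, list):
        return []  # path continues but this value has no nested items
    results = []
    for child in value:
        results.extend(find_path(child, path[1:]))
    return results
```

`find_all` trades precision for convenience (any `qty` anywhere), while `find_path` requires the caller to know the nesting structure; a practical tagged-data facility would likely need both.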

Results

We completed a Technical Report Towards a Web Object Model <http://www.objs.com/OSA/wom.htm>. This report:

described key examples of existing work from the Web, database, and OMG communities (including those mentioned above) that contribute both ideas and technology toward providing the components of a Web object model
identified some key underlying principles behind this work
identified a framework which allows this work to be unified and extended to support the requirements of advanced Web applications for object technology

In particular, the report described how a number of (in some respects) separate "threads" of development in the Web community could be combined to form the basis of a Web object model to address requirements for enhanced Web capabilities. This combination was based on the observation that the fundamental components of any object model are:

data structures that can represent object state
ways to associate behavior (object methods) with the object state
ways for the object methods to access and operate on that state

Extended to the Web environment, this means that Web pages can be considered as state, and objects can be constructed by enhancing those pages with additional metadata that allows the pages to be treated as objects in some object model. In particular, Web pages can be enhanced with metadata consisting of programs that act as object methods with respect to the "state" represented by the Web page. The report also identified key Web technologies to support each of these object model components. We also presented this material at the OMG-DARPA Workshop on Compositional Software Architectures, and to the OMG's Internet and GIS Special Interest Groups.
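The three object model components listed above can be sketched in Python under stated assumptions: a parsed XML page supplies the object state, a separate metadata table (here a plain dictionary, `METHOD_METADATA`) binds method names to programs, and those programs reach the state through the parsed tree. The page content, the `WebObject` class, and this binding mechanism are hypothetical; the report does not prescribe this design.

```python
import xml.etree.ElementTree as ET

# Hypothetical page whose content serves as the object's state.
PAGE = """<account>
  <owner>J. Smith</owner>
  <balance>100</balance>
</account>"""


# Programs that act as methods; each receives the page state as its first argument.
def deposit(state, amount):
    bal = state.find("balance")
    bal.text = str(int(bal.text) + amount)
    return int(bal.text)


def owner(state):
    return state.findtext("owner")


# Hypothetical metadata associating method names with programs.
METHOD_METADATA = {"deposit": deposit, "owner": owner}


class WebObject:
    """Wraps a parsed page (state) with methods supplied by separate metadata."""

    def __init__(self, page_text, methods):
        self._state = ET.fromstring(page_text)
        self._methods = methods

    def invoke(self, name, *args):
        if name not in self._methods:
            raise AttributeError(f"no method {name!r} in metadata")
        return self._methods[name](self._state, *args)

    def page(self):
        """Serialize the (possibly modified) state back to page form."""
        return ET.tostring(self._state, encoding="unicode")
```

Keeping the method bindings outside the page means the same page could be given different behavior by supplying different metadata, which is in the spirit of the loose coupling discussed under Lessons Learned.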

This Technical Report has been widely read on the Web. As a result, we were asked to write the following invited papers:

F. Manola, Towards A Richer Web Object Model, ACM SIGMOD Record 27(1), March 1998, 76-80, <http://www.acm.org/sigmod/sigmod_record> and
F. Manola, Key Technologies for a Web Object Model (tentative title), to appear as the lead article in a special issue of IEEE Internet Computing on Web Object Models, Jan./Feb. 1999, <http://www.objs.com/survey/wom-ieee.htm>.

We also completed a Technical Report Some Web Object Model Construction Technologies <http://www.objs.com/OSA/wom-II.htm>. This report provides further details about a number of specific technologies that will be important in building Web objects. In particular, it:

adds more detail to the overall approach to constructing objects in the Web introduced in the earlier Technical Report, and discusses general considerations for Web object model design.
describes a number of technologies developed (or under development) in the context of the Web that provide parts of the mechanisms required to construct Web objects.
discusses potential applications for Web objects constructed according to these techniques, and discusses how to construct objects in several "real" object models (e.g., OMG IDL, Java, and JavaScript) using these mechanisms.
presents general conclusions to be derived from these technologies.

This latest report indicates that the approach described in our initial Technical Report is viable. Considerable work is in progress on technologies that are relevant to Web object construction, and there are numerous alternative technologies becoming available to address the various parts of the Web object construction problem we have identified. However, further work is required to sort out the various alternative approaches, and to integrate the most promising ones into one, or possibly more, workable combinations.

In conjunction with the OSA/Intermediary Architecture subproject, we also developed a prototype of an extended XML parser which can generate application-specific objects from XML documents, in order to experiment with one form of Web object construction mechanism. This prototype uses XML-defined metadata added to XML documents to define associations between object classes and the XML elements in the document. A White Paper <http://www.objs.com/OSA/XML-to-Java-Mapping.html> describing this work was also completed.
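The following Python sketch is not the prototype's code; it only illustrates the general mechanism of document-embedded metadata that associates XML elements with application classes. It assumes a hypothetical `<classmap element="..." class="..."/>` convention inside the document and a registry (`CLASSES`) of application classes the document is allowed to name, then instantiates one object per mapped element. The element names, the `classmap` syntax, the classes, and `build_objects` are all illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical document carrying its own element-to-class mapping metadata.
DOC = """<catalog>
  <classmap element="book" class="Book"/>
  <classmap element="author" class="Author"/>
  <book isbn="123"><title>XML Basics</title><author>Lee</author></book>
  <book isbn="456"><title>Web Objects</title><author>Kim</author></book>
</catalog>"""


class Book:
    def __init__(self, elem):
        self.isbn = elem.get("isbn")
        self.title = elem.findtext("title")


class Author:
    def __init__(self, elem):
        self.name = elem.text


# Application classes a document may name (hypothetical registry).
CLASSES = {"Book": Book, "Author": Author}


def build_objects(xml_text):
    """Instantiate registered classes for elements named in the document's classmap."""
    root = ET.fromstring(xml_text)
    mapping = {m.get("element"): CLASSES[m.get("class")]
               for m in root.iter("classmap")}
    # root.iter() walks the tree in document order, including nested elements.
    return [mapping[e.tag](e) for e in root.iter() if e.tag in mapping]
```

Restricting the mapping to a registry of known classes, rather than loading whatever class a document names, is one way to keep document-supplied metadata from triggering arbitrary code.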

We also helped form a Web/OMA Integration Working Group of the OMG Internet SIG <http://www.objs.com/isig/home.htm>, with the general goals of:

identifying the relationships (and overlaps) between specifications being developed in the Web and OMG communities, and reducing unnecessary incompatibilities
examining applications that use combinations of OMG and Web technologies, determining technology shortfalls, and recommending solution approaches

We also participated in the activities of the OMG's Object and Reference Model Subcommittee, which is working to identify and clarify OMG's next-generation object model concepts, and in the activities of several other OMG groups that are beginning to look at Web technologies such as XML.

Our participation in W3C activities has been only moderate (although OBJS is a W3C member), but we have submitted input on coordinating the various W3C metadata-related activities, and participated in technical interchanges on W3C-related email lists.

Lessons Learned

There are numerous "threads" of Web technology development, including scripting languages, stylesheets and other presentation facilities, addressing mechanisms (URLs, XLL), data representations (HTML, XML, MIME types), and protocols. The more complex applications currently envisioned for the Web require that these threads be combined in complex ways, often exposing both similarities among technologies previously perceived as separate and unexpected technical gaps. The need to consider new technology combinations mirrors the need to consider new application combinations that integrate aspects of document processing, conventional Web processing, database capabilities, and distributed object architectures. Both the requirements of these new application combinations and the technology combinations needed to address them need to be much better understood than they are now. In particular, the applications of a merger between Web and object technologies are still being clarified. While it is easy to hypothesize about how such merged technologies might be used, concrete matching of hard requirements to actual capabilities is at a very early stage. Much of this work is still driven by "technology push".

There is a need to better understand how standards for defining representations, such as XML, and standards for defining interfaces, such as CORBA, can be used together in providing enhanced interoperability. Distributed object architectures such as CORBA have tended to emphasize interface standards, while the Internet has tended to emphasize representation standards. However, the two approaches are clearly complementary, examples being the role of IIOP in providing CORBA interoperability, and the role of the DOM (essentially a set of interfaces) in providing a means to add behavior to Web pages. Moreover, the two forms of standards will increasingly be used together as, for example, CORBA-based systems increasingly deal with data in domain-specific standard representations.

The concept of "objects" in the context of the Web should not necessarily be identical to that of "objects" in a programming language or conventional distributed object system. The Web generally supports a philosophy of "loose-coupling" (e.g., of data and processing), which makes it highly flexible. This essential flexibility should be preserved in the Web's further technical development, given the diversity and heterogeneity both of the applications the Web must support, and the data and processing resources the Web makes available for possible integration. This means, among other things, that technology integration must be modular, and it must be possible to easily alter connections between data and processing resources to adapt to new requirements. The general approach we have identified attempts to take these requirements into consideration.

The Web's standards process is in many respects still maturing. The W3C has made tremendous progress and done some outstanding technical work, but the incorporation of this work into widely available commercial products is somewhat spotty. This is partly because demand for standards compliance still lags behind demand for new features. The increasing use of the Web for larger-scale and enterprise-critical applications will create much of the pressure needed for standards compliance.

Next Steps

Additional work is needed within W3C on integrating XML, DOM, and the other technologies we have identified, along the lines identified in our framework, to support a full integration of Web and object capabilities. Corresponding work is required within OMG. At the same time, additional work is needed to better understand the applications made possible by such an integration of Web and object capabilities.

Another obvious next step is the development of database-like capabilities based on Web technologies such as XML, RDF, and DOM. We had originally intended to work on such capabilities (in particular, query facilities) in this project. However, we chose not to pursue this activity, deciding instead to concentrate on Web/object integration as the basic foundation for this and other work. The database community has defined extended query facilities (e.g., Lorel, UnQL) to support its semistructured data representations, and has also developed query facilities, together with formal underpinnings, for SGML structures (e.g., OQL-doc). Technology of this type has begun to address Web requirements (e.g., the recent XML-QL submission to W3C <http://www.w3.org/TR/NOTE-xml-ql>), but further work is required in this area. Query-like capabilities also play important roles both in formatting specifications (the ISO DSSSL standard for formatting SGML documents includes SDQL, a limited query notation for identifying parts of SGML structures, and XSL contains a similar notation) and in more advanced Web addressing mechanisms (e.g., the XML linking capabilities). The possible integration of these query capabilities is worth investigating.
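As a rough illustration of the select-then-construct pattern that query languages such as XML-QL apply to XML, the Python sketch below uses the limited XPath subset supported by `xml.etree.ElementTree.findall` to select titles by attribute value and then builds a new result element from them. The bibliography data and the `select_titles` and `construct` names are hypothetical, and the path syntax is ElementTree's, not that of XML-QL, Lorel, UnQL, or XSL.

```python
import xml.etree.ElementTree as ET

# Hypothetical semistructured data: not every entry has the same children.
BIB = """<bib>
  <book year="1998"><title>Title A</title><publisher>P1</publisher></book>
  <book year="1995"><title>Title B</title><publisher>P2</publisher></book>
  <article year="1998"><title>Title C</title></article>
</bib>"""


def select_titles(xml_text, year, kind="*"):
    """Select titles of child entries of the given kind with a matching year attribute."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(f"./{kind}[@year='{year}']/title")]


def construct(titles):
    """Build a new XML result element from the selected values."""
    out = ET.Element("results")
    for t in titles:
        ET.SubElement(out, "title").text = t
    return ET.tostring(out, encoding="unicode")
```

For example, `construct(select_titles(BIB, 1998))` gathers the 1998 titles from both books and articles into a single `<results>` element, while passing `kind="book"` narrows the selection to books only.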

Impact

The Internet today supports a wide variety of data structuring mechanisms, such as HTML, MIME, and many existing and proposed metadata formats (e.g., SOIF, PICS, Warwick Framework). These representations were developed independently for various specialized purposes. The limitations and lack of integration of these mechanisms increasingly create problems in developing advanced Web applications and in providing advanced services for these applications. These problems are particularly evident in applications that require the Web to support rich structures of data, metadata (data about data), and behavior, e.g., where multiple users, not just authors, contribute to a "knowledge base" of hyperlinked information that includes new information, information that comments on or amplifies existing information, and processes (whether application programs, workflows, agents, or other forms) that act on this information. If we can succeed in extending the Web with object capabilities, it should be possible not only to represent all this information, but also to add the OMG-like object services and database-like functionality required to manage it.

This project has identified the foundational basis for supporting more complex data structures and services in the Internet without requiring major departures from current and emerging Web technology. The work also provides guidance toward rationalizing further developments within the Web and OMG communities for better integrating Web and object technologies.

A program immediately benefiting from this project would be the DARPA AITS Architecture project, especially its Webserver component, since the approach we have identified is expected to provide the benefits of the current idiosyncratic Webserver architecture but in a form compatible with emerging industry standards. More broadly, our approach provides a sound direction for combining Web and object technologies into a richer knowledge-based representation, which should benefit both a knowledge-based Web and enterprise computing.

This research is sponsored by the Defense Advanced Research Projects Agency and managed by the U.S. Army Research Laboratory under contract DAAL01-95-C-0112. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, U.S. Army Research Laboratory, or the United States Government.

© Copyright 1997, 1998 Object Services and Consulting, Inc. Permission is granted to copy this document provided this copyright statement is retained in all copies. Disclaimer: OBJS does not warrant the accuracy or completeness of the information in this document.

Last revised: September 15, 1998. Send comments to Frank Manola.