Some Web Object Model Construction Technologies

Frank Manola
Object Services and Consulting, Inc.
 
24 September 1998


Abstract

The World Wide Web is becoming an increasingly important factor in planning for enterprise distributed computing environments. However, as organizations have attempted to employ the Web in increasingly-sophisticated applications, these applications have begun to overlap in complexity the sorts of distributed applications for which distributed object architectures such as OMG's CORBA were originally developed. Since the Web was not originally designed to support such applications, Web application development efforts increasingly run into limitations of the basic Web infrastructure.

A fundamental direction of efforts to address the limitations of current Web data structuring technology has been attempts to integrate aspects of object technology with the basic infrastructure of the Web. A previous report, Towards a Web Object Model [Man98], identified the relationship of parts of the Web to parts of an object model, described some Web standards related to those object model parts, and suggested an overall approach for combining these standards to produce "objects" in the Web. This report describes a number of new Web technologies currently being developed that further the integration of Web and object technology. In particular, this report:

 


Contents

1. Introduction
1.1 Background
1.2 Constructing Objects in the Web
1.3 Purpose of this Report
2. Web Object Model Construction and Design Issues
2.1 Web Object Model Construction Requirements
2.2 Some Web Object Model Design Considerations
3. Technology Examples
3.1 Scripting-Related Technologies
3.2 "XML-Native" Behavior Attachment Mechanisms (and Extensions)
3.3 XML Object Serialization
3.4 XML as a Programming Language
3.5 Object Interface and Messaging Technologies
4. Using these Technologies
4.1 Applications for Web Object Models
4.2 Constructing Objects in "Real" Object Models
5. Conclusions
References
 


1. Introduction

1.1 Background

The World Wide Web is becoming an increasingly important factor in planning for general distributed computing environments, for example, to support external access to enterprise systems and information (e.g., by customers, suppliers, and partners), and to support internal enterprise operations. Organizations perceive a number of advantages in using the Web in enterprise computing, a particular advantage being that it provides an information representation which

However, as organizations have attempted to employ the Web in increasingly-sophisticated applications, these applications have begun to overlap in complexity the sorts of distributed applications for which distributed object architectures such as OMG's CORBA and its surrounding Object Management Architecture (OMA) [OMG97] were originally developed. Since the Web was not originally designed to support such applications, Web application development efforts increasingly run into limitations of the basic Web infrastructure.

If the Web is to be used as the basis of complex enterprise applications, it must provide generic capabilities similar to those provided by the OMA (although these may need to be adapted to the more open, flexible nature of the Web). This involves addressing the provision of higher level services (such as enhanced query and transaction support) and their composition in the Web. However, the basic data structuring capabilities provided by the Web must also be addressed, since the ability to define and apply powerful generic services in the Web, and the ability to generally use the Web to support complex applications, depends crucially on the ability of the Web's underlying data structuring facilities to support these complex applications and services.

The basic data structure of the Web consists of hyperlinked HTML documents, accessed via the HTTP protocol. It is generally recognized that HTML is too simple a data structure to support complex applications [Bos97], e.g.:

A fundamental direction of efforts to address HTML limitations has been attempts to integrate aspects of object technology with the basic infrastructure of the Web. There are a number of reasons for the interest in integrating Web and object technologies:

The HTTP protocol has also shown limitations in its ability to support extended Web applications. The World Wide Web Consortium (W3C) HTTP-NG activity <http://www.w3.org/Protocols/HTTP-NG/> is attempting to address these limitations, also by integrating object technology. Specifically, the activity is developing a new architecture for the HTTP protocol based on a simple, extensible, distributed object-oriented model.

There is much other ongoing work within both the Web and database communities on data structure developments to address Web-related enhancements. Work on similar issues is ongoing within the Object Management Group as well. This work has contributed valuable ideas, and the various proposals illustrate similar basic concepts (generally, movement toward some form of object model). If the Internet is to develop to support advanced application requirements, there is a need for both richer individual data structuring mechanisms, and a unifying overall framework which supports heterogeneous representations and extensibility and provides metalevel concepts for describing and integrating them.

1.2 Constructing Objects in the Web

In a previous technical report, Towards a Web Object Model <http://www.objs.com/OSA/wom.htm> [Man98], we:

In particular, the report described how a number of (in some respects) separate "threads" of development in the Web community could be combined to form the basis of a Web object model to address the requirements for enhanced Web capabilities. This combination is based on the observation that the fundamental components of any object model are:

Extending this idea to the Web environment, the idea is that Web pages (or parts thereof) can be considered as state, and objects can be constructed by enhancing that state with additional metadata that allows the state to be considered as objects in some object model. In particular, we want to enhance Web pages (or parts of them) with metadata consisting of programs that act as object methods with respect to the "state" represented by the Web page (or part thereof).

A more complete integration of object technologies into the Web, as provided by a Web object model, would be the basis both for powerful capabilities for integrating all kinds of data and information and for a wide variety of enhanced services, within a distributed architecture that is widely available and easy to use and extend.

[Man98] noted that progress toward a Web object model requires:

At the same time, the openness of the Web compared to conventional object models needs to be preserved, due to the distinct requirements of the Web environment for openness, scalability, and support for heterogeneity.

The report also identified, as key component technologies to support these requirements:

These same technologies can also play other valuable roles in distributed object architectures (for example, XML could be used as a serialization representation for moving objects within a network).

In addition to using these emerging Web technologies, the report also proposed taking advantage of other existing aspects of the Web, e.g.:

1.3 Purpose of this Report

Overall, [Man98] identified the relationship of parts of the Web to parts of an object model, described some Web standards related to those object model parts, and suggested an overall approach for combining these standards to produce "objects" in the Web. This report goes into more detail about some specific technologies that further the integration of Web and object technology, and could be useful in building objects in the Web. In particular:

As this report will show, considerable work is in progress on technologies that are relevant to object construction in the Web, and there are numerous alternative technologies available to address various parts of the problem. Hence, the integration of Web and object technologies as described here is extremely promising. What will be important will be the emergence of a few relatively popular mechanisms for addressing the various parts of the problem, together with widespread support from the Web infrastructure, products, and standards.

Caveats

The following subsections describing the various technologies include text and specific examples taken from the cited references. As with the previous report, the purpose in doing this is to bring together in one place enough material to illustrate key concepts and the roles they might play in supporting a Web object model, and to give the reader a feel for how generalizations of these concepts might be developed. Hence, this report makes no claims of originality for most of this material, and readers should refer to the cited sources for further details.

In addition, this report does not claim to be totally comprehensive. The specific technologies cited are not necessarily the only technologies in these areas under development; instead, they are cited as examples or illustrations of the types of technologies being developed. New technologies appear quite frequently.

Finally, a number of the sections refer to ongoing activities of the World Wide Web Consortium (W3C). The reader should be aware that in many cases the specifications and capabilities described are works in progress. As a result, some of the details described in this report, as well as the source references, may no longer be completely accurate (or accessible due to changed URLs) by the time this report is read. The latest information on these activities can always be obtained through the main W3C Web page <http://www.w3.org/>.


2. Web Object Model Construction and Design Issues

[Man98] described a general approach for integrating Web technologies to form a Web object model, based on the idea that an "object" in a conventional object model is basically a piece of state with some attached (or associated) programs (methods). In many object model implementations, this idea is exactly reflected in the physical structure of the objects. For example, a Smalltalk object consists of a set of state variables (data), together with a pointer (link) to a class object which contains the object's methods. The structure is roughly:
    Object (state)                Class object
  +---------------+              +-------------+
  | class pointer |------------->| Class data  |
  +---------------+              +-------------+
  | variable 1    |              | method 1    |
  | variable 2    |              | method 2    |
  |   ...         |              |   ...       |
  | variable n    |              | method m    |
  +---------------+              +-------------+
C++ implementations use similar structures. The state is a collection of programming language variables, which (usually) are not visible to anything but the methods (this is referred to as encapsulation). A typical object model has a tight coupling between the methods and state. All the structures (class objects, internal representation of methods and state, etc.) are determined by the programming language implementation, and are created together as necessary. The class (in particular, the methods it defines) defines the way the state should (and will) be interpreted within the system, and hence is a form of metadata for the state. As a result, the link between an object and its class is essentially a metadata link.
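The state/class-pointer structure described above can be mimicked in a few lines. The following is only a minimal sketch in Python, not how Smalltalk or C++ actually lay out objects; the names ACCOUNT_CLASS, new_object, and send are hypothetical and chosen purely for illustration.

```python
# Hypothetical sketch: an object is state plus a link to a class object
# that holds the methods. All names here are illustrative.

def deposit(state, amount):
    state["balance"] += amount
    return state["balance"]

def balance(state):
    return state["balance"]

# The "class object": class data plus a table of methods.
ACCOUNT_CLASS = {
    "name": "Account",
    "methods": {"deposit": deposit, "balance": balance},
}

def new_object(class_object, **variables):
    # The object itself holds only a class pointer and its state variables.
    return {"class": class_object, "state": dict(variables)}

def send(obj, selector, *args):
    # Follow the class pointer (the metadata link) to find the method,
    # then apply that method to this object's state.
    method = obj["class"]["methods"][selector]
    return method(obj["state"], *args)

acct = new_object(ACCOUNT_CLASS, balance=100)
send(acct, "deposit", 50)
print(send(acct, "balance"))
```

In this sketch the only thing that tells the system how to interpret the state is the "class" entry, which is the sense in which the object-to-class link acts as a metadata link.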

Extending this idea to the Web environment, the idea is that Web pages (or parts thereof) can be considered as state, and objects can be constructed by enhancing that state with additional metadata that allows the state to be considered as objects in some object model. In particular, we want to enhance Web pages (or parts of them) with metadata consisting of programs that act as object methods with respect to the "state" represented by the Web page (or part thereof). The resulting structure would, at a minimum, conceptually be something like:

                       +----------+
           +---------->| method 1 |
+-------+  |           +----------+
|  Web  |--+              ...
|  page |--+           
+-------+  |           +----------+
           +---------->| method n | 
                       +----------+
There are already a number of mechanisms used in the Web to integrate code (behavior) with Web pages, so the construction mechanism may be either simple or complex.

Constructing these object model structures requires a number of "pieces" of technology. [Man98] identified the following pieces:

Section 2.1 goes into somewhat more detail on these Web object model "pieces", both identifying additional "subpieces", and also identifying some technologies for implementing those "pieces". Some of these technologies are covered in additional detail in Section 3. Section 2.2 then discusses some considerations that apply to Web object model design.


2.1 Web Object Model Construction Requirements

As noted above, objects consist of a number of different "pieces". Hence, when constructing objects, it is necessary to provide for the implementation of each of those "pieces". This section reviews these pieces, briefly reviews some alternative means for implementing them, and also reviews some additional requirements related to object construction in the Web. The various pieces identified here are in some sense conceptually distinct. However, a given technology may bundle several of these pieces together so that they do not appear as separate concepts.

2.1.1 State Representation

An object's state representation is used to hold various pieces of data that are operated upon by the object's operations, and that retain information from one operation invocation to the next. [Man98] discussed using XML documents as the basic representation for the state of objects in a Web object model, but made no other assumptions about the format of state. That is, there were no assumptions about a specific data/object model for state, about required elements, etc. In general, it is desirable to be able to create objects from any well-formed XML, and hence there should be minimum restrictions on the form of such XML.

These assumptions (or rather, the lack of them) mean that the state can have a very general form, given the generality of XML. For example, the state of an object need not be all in one unit (or file), since XML allows a document to refer to external entities that can contain either XML or data in some arbitrary representation. In addition to referencing other data using entities, XML documents, like HTML pages, can also use hyperlinks to other documents or Web resources, and these could be considered as part of the state defined by the document, depending on the semantics that the software that interprets the object attaches to this linked material. At the same time, the state of an object could be defined as being a subset of an XML document, i.e., a given XML document could represent the state of multiple objects. This is so because an XML document represents a tree of separately-marked elements. A given element could be interpreted as being a separate object, with its type and associated methods identified by the element tag. Some of these techniques are discussed in Sections 3.2 and 3.3. (Introductory material on XML may be found in [Man98], or in a number of recent texts, e.g., [Har98, Hol98, Lig97, Meg98, StL98].)
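The idea that one XML document can hold the state of several objects, each typed by its element tag, might be sketched as follows. This is an illustrative assumption rather than any technology described in this report: the ORDER and ITEM elements, the Order and Item classes, and the TYPES table are all hypothetical, and Python's standard xml.etree.ElementTree module stands in for an XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one XML document holding the state of several objects.
DOC = """
<ORDER id="17">
  <ITEM sku="A1" qty="2" price="3.50"/>
  <ITEM sku="B9" qty="1" price="10.00"/>
</ORDER>
"""

class Item:
    def __init__(self, element):
        self.element = element  # the element is this object's state

    def total(self):
        return int(self.element.get("qty")) * float(self.element.get("price"))

class Order:
    def __init__(self, element):
        self.element = element

    def total(self):
        # Each child element is wrapped as a separate object.
        return sum(wrap(child).total() for child in self.element)

# The element tag selects the type (and so the methods) of each object.
TYPES = {"ORDER": Order, "ITEM": Item}

def wrap(element):
    return TYPES[element.tag](element)

order = wrap(ET.fromstring(DOC))
print(order.total())
```

The document itself is left unchanged; the association between tags and behavior lives entirely in the TYPES table, so a different application could bind the same tags to different classes.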

2.1.2 Methods (Behavior)

An object's methods (programs) implement the operations that can be applied to the object. Supporting object methods imposes a number of requirements:

Behavior associated with Web pages today is represented as binary executables (e.g., plugins, DLLs, ActiveX controls), Java bytecodes, and scripts in various scripting languages. Each of these forms requires its own interpreter. Binary executables are generally interpreted by the hardware. Java bytecodes require the Java Virtual Machine. A scripting language requires an interpreter for that language. In a situation where methods may have multiple representations or languages, the interpreter that must be invoked to interpret a given piece of code must somehow be indicated. This may be done in various ways, e.g., by the file extension of the file containing the program, or by the LANGUAGE or TYPE attribute in an HTML SCRIPT element. The need to support multiple interpreters is illustrated by Microsoft's Internet Explorer 4 (IE4) browser, which supports HTML pages having scripts in either VBScript or JScript (Microsoft's implementation of JavaScript). The SCRIPT element must indicate which scripting language is being used (a given page can contain scripts in both languages).
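Selecting an interpreter from the LANGUAGE attribute of a SCRIPT element can be illustrated with a small dispatch table. The sketch below is an assumption-laden stand-in and not a description of how IE4 works: the interpreter functions are hypothetical placeholders that only report what they would run, and the page is a well-formed fragment so that Python's standard XML parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical interpreters, keyed by the value of a LANGUAGE attribute.
def run_jscript(source):
    return "JScript engine would run: " + source

def run_vbscript(source):
    return "VBScript engine would run: " + source

INTERPRETERS = {
    "jscript": run_jscript,
    "javascript": run_jscript,
    "vbscript": run_vbscript,
}

# A well-formed fragment standing in for an HTML page with two SCRIPT elements.
PAGE = """
<HTML>
  <SCRIPT LANGUAGE="VBScript">MsgBox "hi"</SCRIPT>
  <SCRIPT LANGUAGE="JScript">alert("hi")</SCRIPT>
</HTML>
"""

def dispatch_scripts(page_text):
    results = []
    for script in ET.fromstring(page_text).iter("SCRIPT"):
        # The attribute, not the script text, says which interpreter to use.
        language = script.get("LANGUAGE", "").lower()
        interpreter = INTERPRETERS.get(language)
        if interpreter is None:
            raise ValueError("no interpreter for language: " + repr(language))
        results.append(interpreter(script.text))
    return results

for line in dispatch_scripts(PAGE):
    print(line)
```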

Proposals have also been made to represent programming language constructs directly in HTML and XML (i.e., define elements/tags which represent programming language statements for control flow, function calls, etc.) as yet another representation of Web page behavior. Several such approaches are discussed in Section 3.4. These approaches effectively define another scripting notation, where the special tags are interpreted by their own interpreter to generate the specified behavior.

2.1.3 Method/State Relationships

Object construction involves supporting two types of method/state relationships:

In non-object software, programs and data are separate, and programs are explicitly passed the data they are to operate on. Objects, on the other hand, involve a more explicit specification of which programs (methods) apply to which state. Methods can be associated with the state to which they apply in a number of ways, and for purposes of this paper, we take a rather loose interpretation of how tight this association must be. This is discussed further in Section 2.2.

[Man98] noted that programs, as descriptions of behavior, could be considered a form of metadata, and hence could be described using either embedded or separate RDF resources. For example, an RDF resource associated with a given Web page could contain OBJECT elements (or some corresponding XML constructs) that identify the programs that act as the page's methods. Alternatively, the RDF resource might refer to programs defined as Web resources using some mechanism other than the OBJECT element, and also include a reference (possibly as an OBJECT element) to a "loader" mechanism capable of accessing those programs and providing them to the client on request. RDF resources contain explicit references to the Web pages for which they define metadata. However, they do not require that the Web pages themselves be aware of the existence of this metadata, and hence do not require that the pages be created with (or modified to contain) references to the metadata. Thus, using an RDF-based approach would allow Web pages to be associated with object methods without the pages themselves having to contain references to the methods.
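The key property of this approach, that the metadata points at the page while the page knows nothing of the metadata, can be shown without committing to any RDF syntax. The following sketch deliberately does not use RDF; the record layout, the example.org URLs, and the methods_for function are hypothetical and serve only to illustrate the direction of the references.

```python
# Hypothetical metadata records, kept apart from the pages they describe.
# Each record names the page it is "about" and the programs that act as
# its methods; the pages themselves contain no reference to these records.
METADATA = [
    {"about": "http://example.org/orders/17.xml",
     "methods": {"total": "http://example.org/code/order_total.js"}},
    {"about": "http://example.org/orders/17.xml",
     "methods": {"print": "http://example.org/code/order_print.js"}},
]

def methods_for(page_url, records):
    # Collect every method that any metadata record attaches to the page.
    found = {}
    for record in records:
        if record["about"] == page_url:
            found.update(record["methods"])
    return found

print(sorted(methods_for("http://example.org/orders/17.xml", METADATA)))
```

Because the records are separate resources, new methods can be attached to an existing page by adding a record, without modifying the page.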

However, while methods can be considered a form of metadata, and hence could in principle involve RDF, there are a number of alternative (and generally more straightforward) mechanisms that are also available.

Possibly the most straightforward approach to associating methods with a Web page is to embed either the method itself, or a pointer to it, in the page. Scripts are often implemented as directly embedded methods, using the HTML SCRIPT element to contain the script. Applets are usually implemented by embedding a pointer to the file containing the applet in the page, using the HTML APPLET or OBJECT tag. Scripts can be implemented using pointers to external files as well. In particular, SCRIPT elements can be used to associate collections of scripts with a page, thus forming something akin to an object class. Separately linked style sheets also provide a way of associating sets of behaviors with HTML documents.

These mechanisms for associating methods with HTML pages involve the use of special tags, e.g., OBJECT and SCRIPT, along with built-in mechanisms for processing them. XML per se does not define such standard tags. However, XML defines a number of mechanisms that can be used to define elements that contain scripts or other method representations, or pointers to them. These methods could then be processed (interpreted) by separate interpreters (much as Web browsers refer HTML SCRIPT and OBJECT elements to specific interpreters). In addition, W3C's Extensible Style Language (XSL) <http://www.w3.org/TR/WD-xsl> represents work toward a way of associating formatting behavior with XML documents. Additional XML-related technology is also being developed to support other ways of associating behavior with XML pages. These XML-based technologies are discussed in Section 3.2.

Current HTML mechanisms for associating methods with Web pages deal not only with associating methods with pages as a whole, but also with associating methods with specific parts of Web pages. For example, a given script can be associated with a particular HTML element to provide behavior for that specific element (e.g., to define what happens when the user clicks on it). The same script could also be associated with multiple elements. Stylesheets also provide mechanisms for associating styles with specific HTML elements (or possibly all elements satisfying specific criteria). Mechanisms for associating methods with XML pages must provide corresponding capabilities. This is particularly important in the context of XML, since each XML element type (tag) potentially represents distinct semantics, and hence may need to be associated with behavior specific to those semantics.

Another form of method/state relationship is the way the methods gain access to the state (for reading or updating it). The form of this access is a particularly important consideration in building objects from XML documents, which may have a complex structure, as opposed to the simple "flat" state typically found in programming language objects.

The most straightforward approach for providing this access would be to simply provide each method with access to the character string making up the XML state. However, this would require that the method effectively parse the state in order to find the particular part of the XML document it needed to work on. [Man98] identified the W3C's Document Object Model (DOM) as a means by which methods could conveniently access the (XML-based) state. The DOM provides a generic object-oriented API for XML documents, permitting methods to access individual elements (and other components) of the XML without having to parse the document. The use of an explicit DOM interface to the object state representation also provides another advantage, namely an additional degree of generality in choosing the state representation. This is because, if methods access the state through the DOM interface, any representation (not just XML) could conceivably be used in building an object, provided that a DOM (or DOM-based) interface could be defined for it.
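The difference between handing a method the raw character string and handing it a DOM can be seen in a short example. This sketch uses Python's standard xml.dom.minidom module as a stand-in DOM implementation; the AUTHOR document and the get_email and set_email functions are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical XML state for a single object.
STATE = "<AUTHOR><NAME>A. Writer</NAME><EMAIL>aw@example.org</EMAIL></AUTHOR>"

def get_email(document):
    # Navigate to the EMAIL element through the DOM rather than
    # scanning the character string for it.
    email = document.getElementsByTagName("EMAIL")[0]
    return email.firstChild.data

def set_email(document, address):
    # Update the text node in place; the method never re-parses the state.
    email = document.getElementsByTagName("EMAIL")[0]
    email.firstChild.data = address

doc = parseString(STATE)
set_email(doc, "writer@example.org")
print(get_email(doc))
print(doc.documentElement.toxml())
```

The two functions depend only on the DOM calls, so in principle any state representation that offered the same interface could be substituted for the parsed XML.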

The DOM itself defines only a generic API to an XML document (e.g., it presents the document as a collection of objects of types such as Document, Element, and so on). It does not provide for the construction of application-specific interfaces based on the interpretation an application may assign to specific element names (e.g., the DOM does not provide a specialized interface for an AUTHOR object even though the document's outer element tag may be AUTHOR, and those are the intended semantics of the document). Instead, additional processing must be applied to the XML to provide such interfaces, possibly as specializations of the DOM-defined interfaces. Examples of technology to define such interfaces are described in Sections 3.1 and 3.3.
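An application-specific interface layered over the generic DOM might look like the following. This is a hand-written sketch under stated assumptions, not one of the technologies covered in Section 3: the Author class and the NAME and EMAIL child elements are hypothetical, and Python's xml.dom.minidom again stands in for a DOM implementation.

```python
from xml.dom.minidom import parseString

class Author:
    """Hypothetical application-specific interface over a DOM Element."""

    def __init__(self, element):
        # The generic DOM only says "this is an Element"; the check below
        # is where the application-specific interpretation is applied.
        if element.tagName != "AUTHOR":
            raise ValueError("expected an AUTHOR element")
        self._element = element

    def _text_of(self, tag):
        node = self._element.getElementsByTagName(tag)[0]
        return node.firstChild.data

    @property
    def name(self):
        return self._text_of("NAME")

    @property
    def email(self):
        return self._text_of("EMAIL")

doc = parseString(
    "<AUTHOR><NAME>A. Writer</NAME><EMAIL>aw@example.org</EMAIL></AUTHOR>"
)
author = Author(doc.documentElement)
print(author.name, author.email)
```

Clients of Author see name and email rather than generic Element and Text nodes, which is the kind of specialization the DOM itself leaves to additional processing.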

In some cases, providing a method with access to state needs to be done in what might be considered multiple steps. For example, Java applets associated with a Web page have no native way to access the page directly. Netscape's LiveConnect technology, discussed in Section 3.1, provides such a mechanism.

The approaches discussed above are more-or-less "Web-native" ways to associate behavior with Web pages. A conceptually different approach is that used in providing object interfaces to non-object data (such as relational databases) in distributed object systems using object "wrappers" or "shells". In this approach, ordinary objects in some object-oriented implementation technology are created, which use Web data as their state (e.g., they might retrieve data from the Web when they need access to state information). These approaches are not discussed further here, but are discussed in, e.g., [App97, OH98].

2.1.4 Interfaces and Related Metadata

In addition to methods and state, objects typically define interfaces for use by other objects or external clients. An object's interface is an API that determines which object methods and state are revealed to external clients, and that conceals (encapsulates) the details of the actual method and state representations from those clients. Supporting object interfaces creates several requirements:

The extent to which an interface is needed in constructing objects in the Web depends on how serious the desire is to create "true objects" (e.g., objects with characteristics similar to those of programming language objects). If all that is required is some loose idea of Web pages with behavior, no specific interface technology is required. On the other hand, if it is desired that the XML plus behavior appear to be an object in some well-defined object model, such as a JavaBean or CORBA object, then specific interface support technologies become more relevant.

Generally, it is possible to define objects in many different object models using the same basic internal components (e.g., methods and state). The W3C DOM specifications illustrate how an XML document can be represented as objects in CORBA IDL, ECMAScript, and Java. Section 3.1 illustrates how XML can be used to construct objects in both Java and JavaScript (despite the similarity of the names, these are two distinct object models) using some specific interface support technologies.

The NCITS Object Model Features Matrix [Man97] identifies many different object models, with widely differing characteristics and ways of defining object interfaces. Different object models could also be used for objects in the Web. The details of the structures to be supported in a Web object model depend on the details of the object model we choose to define. For example, many object models are class-based, such as the C++ and Java object models. Choosing a class-based model for the Web would require defining separate class objects to define the various classes. Other object models, such as JavaScript, are prototype-based, and do not require a class object (each object essentially defines itself).

In typical distributed object systems, such as CORBA, interface definitions are sometimes used by stub generators to generate code for accessing objects, and hence the interface definitions themselves may not be needed at run time in processing method invocations. However, in other situations (such as dynamic invocation) it may be necessary to access the interface definitions at run time. As a result, the interface definitions are saved in some form of Interface Repository so they can be accessed when needed. This requires that an explicit representation for interface definitions be considered. In the context of the Web, a straightforward approach would be to represent object interface definitions in XML. A number of Web technologies described in Section 3 use XML to define interfaces. The XML element types (tag names) used in these XML representations are based on the particular object model chosen.
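Representing an interface definition in XML and consulting it at run time, in the spirit of dynamic invocation, could be sketched as follows. The INTERFACE, OPERATION, and PARAM element names are a hypothetical vocabulary invented for this example, not the vocabulary of any technology described in Section 3, and the Account class and invoke function are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML vocabulary for an interface definition.
INTERFACE_XML = """
<INTERFACE name="Account">
  <OPERATION name="deposit"><PARAM name="amount"/></OPERATION>
  <OPERATION name="balance"/>
</INTERFACE>
"""

def load_interface(xml_text):
    # A tiny "interface repository" entry: operation name -> parameter names.
    root = ET.fromstring(xml_text)
    return {
        op.get("name"): [p.get("name") for p in op.findall("PARAM")]
        for op in root.findall("OPERATION")
    }

class Account:
    def __init__(self):
        self._balance = 0

    def deposit(self, amount):
        self._balance += amount

    def balance(self):
        return self._balance

def invoke(target, interface, operation, **arguments):
    # Dynamic invocation: consult the interface definition at run time.
    if operation not in interface:
        raise AttributeError("operation not in interface: " + operation)
    if sorted(arguments) != sorted(interface[operation]):
        raise TypeError("arguments do not match interface for " + operation)
    return getattr(target, operation)(**arguments)

iface = load_interface(INTERFACE_XML)
acct = Account()
invoke(acct, iface, "deposit", amount=25)
print(invoke(acct, iface, "balance"))
```

A stub generator would read the same XML ahead of time and emit code, in which case the definition would not need to be available when the call is made.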

An important supporting technology in the definition of interfaces for objects in the Web (and in expanded applications for XML in general) would be improvements in the ability to define data type information for XML. XML Document Type Definitions (DTDs) are sometimes characterized as being like database schemas, or object class definitions; i.e., as defining the characteristics of a collection of instances (individual XML documents). However, this is only true to a limited extent. David Megginson, in a recent email message to the xml-dev email list, has noted that XML DTDs actually "bundle" several roles (inherited from SGML DTDs):

Unlike a schema or class definition, the DTD for a given document may be partially contained in the document itself (the internal subset). In addition, DTDs provide only very limited support for defining what would be recognized as data types in an object model. Most XML elements are from the single domain "character string". Although external entities having non-character representations can be identified using XML NOTATION specifications (see Section 3.2), these resemble MIME types rather than data types, in that they can be used to identify a processing application, but do not constrain the data at the individual data element level as a conventional data type mechanism would.

A number of proposals have been made for alternative schema and data type facilities for XML, including:

However, none of these is an official W3C specification. In addition, in related work:

It is generally agreed that improvements are needed in XML's structural schema and data typing facilities. There is also considerable agreement that the data typing facilities should be unbundled from the structural schema mechanism. These technologies will not be discussed further here. However, it should be noted that there are plans for W3C's XML activity to begin work on improvements in schema and data typing facilities for XML.

2.1.5 Pulling Objects Together

An interface definition only defines an interface (object API). The effect of the interface (an implementation for it) must also be provided. That is, there must be something that combines the interface, state, and methods into an object (unit of software). In conventional object-oriented programming languages, these "pieces" are compiled into the objects, which themselves (possibly together with some built-in run time support, such as a dispatcher) provide the implementation. In other cases, such as CORBA implementations, the system is designed to allow separate pieces of state and methods to be combined to form objects. In these cases, a conceptually-separate layer of software (in the case of CORBA, an Object Request Broker) is involved. This software (generically speaking) accepts a call on a method of a particular object, finds the state and method code referred to in the request, and invokes the method on the state.
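The role of the conceptually-separate layer of software described above can be sketched generically. This is not a model of a CORBA Object Request Broker; the Broker class, its method names, and the "Order" example are hypothetical, and the sketch shows only the three steps named in the text: accept a call, find the state and method code, and invoke the method on the state.

```python
# Hypothetical broker: state and method code are stored separately and
# combined only when a request arrives.
class Broker:
    def __init__(self):
        self._state = {}    # object id -> (type name, state)
        self._methods = {}  # (type name, method name) -> function

    def register_method(self, type_name, method_name, function):
        self._methods[(type_name, method_name)] = function

    def register_object(self, object_id, type_name, state):
        self._state[object_id] = (type_name, state)

    def request(self, object_id, method_name, *args):
        # Find the state and the method code named in the request,
        # then invoke the method on the state.
        type_name, state = self._state[object_id]
        method = self._methods[(type_name, method_name)]
        return method(state, *args)

def add_item(state, sku):
    state["items"].append(sku)
    return len(state["items"])

broker = Broker()
broker.register_method("Order", "add_item", add_item)
broker.register_object("order-17", "Order", {"items": []})
print(broker.request("order-17", "add_item", "A1"))
```

In a Web setting the state table could hold references to XML documents and the method table references to scripts or other code, with the broker fetching both on demand.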

Both these approaches to "pulling objects together" are illustrated by the technologies discussed in Section 3. In some cases, XML-encoded information, possibly together with other object "pieces", is translated into objects interpretable by an existing object interpreter (e.g., into Java or JavaScript objects). In other cases, a separate interpreter or other piece of software is used to interpret the XML and other object "pieces" as objects (e.g., the Scriptlets runtime engine discussed in Section 3.1.4.2).

2.1.6 Additional Requirements

In addition to the "pieces" identified above, creation of a fully-general Web object model requires addressing a number of additional issues. For example, changes in the architectures of Web clients and servers may be required. The WebBroker technology described in Section 3.5.1 involves adding an HTTP server to the client. [StL97] discusses extended architectures for both Web clients and servers, as well as extended file system (or OODBMS) support, for use with enhanced Web technologies.

Other, more general, architectural issues must also be addressed. For example, if the Web is to be used as a generalized information repository (whether objects are used or not), the issue of updating needs to be addressed. The DOM provides a general interface (at least potentially) for updating XML documents on the client side. However, better mechanisms are needed for conveying these updates to the server, and implementing them there. This includes protocol-level work such as WebDAV (Web Distributed Authoring and Versioning) <http://www.ietf.org/internet-drafts/draft-ietf-webdav-protocol-08.txt>, work on improved security mechanisms, and work on higher-level mechanisms for synchronizing concurrent updates. Such work includes work on versioning, simple locking (which is covered in the WebDAV work), and higher-level database-like transaction mechanisms (the WebDAV work uses XML to represent both method input and output information in its protocol). Additional work is also required on improved Web messaging protocols. A number of mergers of Web and object technologies, such as the WebBroker technology discussed in Section 3.5.1, use the existing HTTP protocol for object-oriented messaging. However, work such as HTTP-NG <http://www.w3.org/Protocols/HTTP-NG/> aims at improvements in the Web's support for "purer" object-oriented messaging (e.g., the use of a binary, rather than a text-based, format). Addressing these and other issues, such as support for improved caching schemes, will make the Web an even better vehicle than it is now for efficiently implementing objects.


2.2 Some Web Object Model Design Considerations

[McG98] notes that: "With the addition of purely electronic media, the already nebulous notion of a document becomes even more so. Electronic document content can include sound, video, virtual reality simulations, and so on. Furthermore, such documents can contain behaviour. The emergence of Object Orientation as a development paradigm has lead to a convergence of the concepts of document and object. Documents that can contain behavior (i.e., executable code) plus state information as well as data can be thought of as persistent objects in OO terminology."

However, creating objects from Web resources involves reconciling two organizational philosophies that appear somewhat distinct. On the one hand, the conventional idea in object-oriented programming is that the behavior defined for an object is "mandatory"; an essential component in defining the semantics of an abstract data type. Having only data, even using semantically-meaningful attribute names as in XML, could allow the data to be used inappropriately, in a way that is not consistent with the actual intended type semantics (which are defined by the operations).

On the other hand, the SGML philosophy (from which XML and much of the Web derive) is that the essential semantics are wrapped up in the data alone (in particular, in the element names). In this philosophy, there are a number of good reasons for not binding operations too tightly to the data (if at all). For example, labeling elements instead of explicitly binding them to code avoids locking information into one program, or even one purpose [DeR97]. This is particularly important in the case of a representation (like XML) with possibly-rich semantics, which may be moving between organizations. For example, a Purchase Order represented in XML, and transferred between two businesses, will probably not be processed by the same programs at each end. Instead, the XML would be linked to distinct (company-specific) processing. So, according to this approach, it is enough to tag the document as being a PURCHASE-ORDER, and rely on a separate mechanism to associate a PURCHASE-ORDER with the appropriate application(s). The use of MIME types in the Internet illustrates the value of this general approach, since it allows the use of different programs on the same data, while also providing information (the MIME type) that helps ensure that only programs capable of processing the data are actually used. The use of XML element names provides a similar form of indirection in associating behavior with data, but at the granularity of individual elements. The same concept underlies the use of a relational database as a generalized information repository which can then be used by multiple applications. The need for care in how tightly behavior is associated with data also shows up in object analysis and design.
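
The loose coupling described above can be sketched in a few lines of present-day JavaScript. This is only an illustration under stated assumptions: the document shape, the handler tables, and the processDocument function are hypothetical names introduced here, not part of any cited technology. The point is that the data carries only a label, and each organization separately associates that label with its own processing.

```javascript
// The transferred document carries labeled data only; no code is bound to it.
// (Hypothetical shape: a real XML document would be parsed into some tree.)
const purchaseOrder = { element: "PURCHASE-ORDER", item: "widgets", quantity: 12 };

// Each business keeps its own association of element names with processing.
const sellerHandlers = {
  "PURCHASE-ORDER": function (po) { return "ship " + po.quantity + " " + po.item; }
};
const buyerHandlers = {
  "PURCHASE-ORDER": function (po) { return "record order for " + po.quantity + " " + po.item; }
};

// The separate mechanism: look up processing by element name at the point of use.
function processDocument(doc, handlers) {
  const handler = handlers[doc.element];
  if (typeof handler !== "function") {
    throw new Error("no application registered for " + doc.element);
  }
  return handler(doc);
}

console.log(processDocument(purchaseOrder, sellerHandlers));
console.log(processDocument(purchaseOrder, buyerHandlers));
```

The same data yields different behavior at each end, and an element name with no registered handler is rejected rather than processed inappropriately, which is the role the text assigns to MIME types at the document level.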

In general, it is desirable to support flexibility in how tightly behavior is associated with data, since in some cases a tightly-coupled object-oriented programming style will be appropriate, while in other cases a looser coupling will be appropriate. For example, in some cases it might be appropriate to agree on a definition of each element type, and possibly even to include a pointer to a machine-accessible version of that definition as metadata (e.g., an ontology), but not to require the use of a specific piece of code to process it. On the other hand, in some cases it may be desirable to provide standard pieces of code to process particular elements (e.g., to implement industry-standard semantics where process interoperability is involved, or industry-standard algorithms for certain kinds of processing exist). In addition, it is desirable to support the ability to change the decision as to how tightly behavior is associated with data, to accommodate changes in requirements or other circumstances.

These requirements for flexibility particularly exist in the Web, given its dynamic and heterogeneous nature. At the same time, however, Web technologies provide the necessary infrastructure to support these requirements. For example, the Web infrastructure supports remote access to code, which makes tightly coupling data with "standard" code more feasible. Moreover, XML and related technologies described in Section 3.2 illustrate that a range of mechanisms exist for supporting both loose and tight coupling of data and behavior. Interface implementation mechanisms such as Object Request Brokers provide another mechanism for flexibly associating data and behavior. A Web object model must support these flexibility requirements, and take maximum advantage of the Web's capabilities in doing so.

The above comments have primarily concerned flexibility in the Web at the object construction level. Flexibility is also required at the model level. An object model for objects in the Web itself needs to be flexible, in the sense of providing support for interoperability among the wide range of data and processing resources on the Web, much as a global data model provides this type of support in a heterogeneous distributed database system [SL90]. Section 3.1 describes some existing forms of interoperability between different object models in the Web. The idea of a global object model for the Web is discussed further in Section 4.2.


3. Technology Examples

There are numerous technologies being developed that apply to aspects of the problem of creating objects in the Web (combining state and behavior in the Web context). This section describes some of them. The primary goal is to identify what aspects of the problem they address, what sorts of lessons we can learn from them, and the state of associated technology. These descriptions should be considered as only a limited snapshot, since technology in these areas is progressing at a very rapid pace.


3.1 Scripting-Related Technologies

Currently, the most straightforward way of merging behavior and Web state is via scripting, using the Dynamic HTML (DHTML) facilities implemented in popular Web browsers. This section describes some current mechanisms for using scripting to associate behavior with HTML Web pages. As the name indicates, DHTML is an HTML technology, rather than an XML technology. Corresponding facilities must be defined for XML in order to allow XML-based scripting, and the primary purpose of this section is to give some examples of requirements for those facilities. One such facility is the W3C's Document Object Model (DOM). DOM was described in [Man98]. DOM defines an API to an HTML or XML document, by defining a representation of the document's contents in terms of a collection of objects. These objects can then be manipulated from scripting (and other) languages. The collection of objects defined in DOM is a generalization of corresponding objects defined in current DHTML implementations. Explicit mechanisms for associating scripts with XML documents must also be defined. Some mechanisms for doing this are described in Section 3.2.

This section also discusses some technologies that go beyond straight HTML. Specifically, Section 3.1.4 describes some techniques which explicitly make use of XML, or XML-like extended tags, together with scripting, for creating components (objects), in several object models, which can be embedded in Web pages.


3.1.1 DHTML and Scripting

DHTML provides a means for scripts to manipulate an HTML document's structure, style, and contents in response to events, to produce a dynamic display. The various implementations of DHTML are thoroughly described in widely-available texts, e.g., [Fla97, Isa97, StL97], so this section will only provide a very brief introduction.

DHTML defines objects that represent both an HTML document (and its contents), and also the browser environment, e.g., the browser and the browser window. A script is associated with the document using an HTML SCRIPT element. The SCRIPT element may either directly contain executable code in a specified scripting language, or associate external script libraries with the document by referencing an external file. Other elements on the page can then include specifications of particular events, e.g., onClick to indicate a user selecting the element, together with the script-defined function to be invoked when that event occurs with respect to that element. An important capability of DHTML (and the reason it is called "Dynamic") is that scripts can dynamically change what is seen by the user. Whenever a script manipulates and changes an attribute of an element or modifies the contents, the document is redisplayed with the new information. The exact DHTML facilities provided currently differ among browser implementations, which is the basic reason for the W3C DOM standards activity.
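
The event-handling pattern just described can be sketched outside a browser. In the following minimal sketch, a plain object stands in for a page element; the heading object and the showDetails function are hypothetical, and in a real page the element would be supplied by the browser and the handler call would be made by the browser when the user clicks.

```javascript
// Plain object standing in for a page element (an assumption for illustration;
// in a browser this would be an object supplied by the host environment).
const heading = {
  tagName: "H1",
  innerText: "Click for details",
  onclick: null
};

// Script-defined function to be invoked when the event occurs.
function showDetails() {
  // Changing the element's contents is what makes the page "dynamic";
  // a browser would redisplay the document after this assignment.
  this.innerText = "Details shown";
}

// Associate the function with the element's click event.
heading.onclick = showDetails;

// Simulate the browser dispatching a click on the element.
heading.onclick();
console.log(heading.innerText);
```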

The JavaScript scripting language supported by Netscape and Microsoft (as JScript) provides the usual means for scripting in DHTML (although Microsoft Internet Explorer also supports VBScript). Some simple examples of JavaScript are distributed throughout this report (see the cited texts for more detailed examples). A particularly interesting aspect of JavaScript is its extremely flexible object model (described in the following section). For example, unlike conventional object-oriented programming languages, JavaScript functions and properties can be dynamically added to any object, and predefined behavior can be overridden. An eval method is also available that can evaluate a passed-in string as a script, and return the result.
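
A short sketch of the three capabilities just mentioned (adding a function to an existing object, overriding predefined behavior, and eval) follows. The report object and its property names are made up for illustration.

```javascript
const report = { title: "Web Object Models" };

// A function can be added to an existing object at any time.
report.describe = function () {
  return "Report: " + this.title;
};

// Predefined behavior can be overridden: replace the inherited toString.
report.toString = function () {
  return "[" + this.title + "]";
};

// eval evaluates a passed-in string as a script and returns the result.
const sum = eval("2 + 3 * 4");

console.log(report.describe());
console.log(String(report));
console.log(sum);
```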

DHTML (at least in principle) allows elements to be dynamically added to a document, either by creating a new element in memory and associating it with the document, or by directly modifying the underlying HTML contents. DHTML exposes all information about the document, including unrecognized elements and attributes. This feature can be used in some cases to create user-defined behavior based on custom elements and attributes. E.g., by adding unrecognized attributes (attributes not defined in HTML) to existing elements, the elements can be provided with additional behavior (thus providing a way to simulate subclassing of an element). In addition, user-defined elements can be added, and behavior attached to them. (The ability to do this depends on the specific browser involved, and also on the way current HTML browsers generally ignore unrecognized elements and attributes).

The HTML OBJECT element allows controls and applets to be included in a page that extend the available behavior. For example, objects can be defined to embed graphics or even other documents directly into the document.

Motivations cited for the use of scripting languages versus more conventional programming languages (in some cases, including Java) include:

DHTML provides support for a number of the "pieces" required in constructing objects in the Web:

Similar facilities will need to be provided to support XML-based scripting. Some of these facilities are defined already (e.g., the DOM for XML); others are discussed in Section 3.2.


3.1.2 ECMAScript Object Model

ECMAScript [ECMA97] is an object-oriented programming language for performing computations and manipulating computational objects within a host environment. ECMAScript is essentially the standardized specification of JavaScript (it is based on both Netscape's JavaScript and Microsoft's JScript scripting languages), and hence was originally designed to be a Web scripting language, providing a mechanism to add behavior to Web pages in browsers and to perform server computation as part of a Web-based client-server architecture. However, ECMAScript can provide core scripting capabilities for a variety of host environments, and therefore the ECMAScript language per se is not intended to be computationally self-sufficient. For example, there are no provisions in the ECMAScript specification for input of external data or output of computed results. Instead, it is expected that the host environment of an ECMAScript program will provide not only the objects and other facilities described in the language specification, but also certain environment-specific host objects, whose description and behavior include properties and functions that provide the necessary additional facilities, and which can be called from an ECMAScript program.

For example, a Web browser (client) provides an ECMAScript host environment for client-side computation which includes objects that represent windows, menus, pop-ups, dialog boxes, anchors, frames, history, cookies, and input/output (these objects are defined by DHTML's Document Object Model, and the DOM being specified by the W3C). Further, the host environment (DHTML in this case) provides a means to attach scripting code to events such as change of focus, page and image loading, selection, form submission, and mouse actions. Scripting code appears within the HTML, and the displayed page is a combination of user interface elements and fixed and computed text and images. The scripting code is reactive to user interactions and there is no need for a main program. A Web server would provide a different ECMAScript host environment for server-side computation which includes objects representing requests, clients, and files, and mechanisms to lock and share data. By using browser-side and server-side scripting together it is possible to distribute computation between the client and server while providing a customized user interface for a Web-based application. Each Web browser and server that supports ECMAScript supplies its own host environment, completing the ECMAScript execution environment.

ECMAScript is object-based: basic language and host facilities are provided by objects, and an ECMAScript program is a cluster of communicating objects. An ECMAScript object is an unordered collection of properties. Properties are containers that hold other objects, primitive values, or functions. A function stored in the property of an object is called a method. A property can also have 0 or more attributes which determine how the property can be used--for example, when the ReadOnly attribute for a property is set to true, any attempt by executed ECMAScript code to change the value of the property has no effect. A type in ECMAScript is a set of data values. A primitive value is a member of one of the following built-in types: Undefined, Null, Boolean, Number, and String; an object is a member of the remaining built-in type Object.
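
The following sketch illustrates these definitions in present-day JavaScript. One assumption should be noted loudly: the language described in [ECMA97] does not let scripts set property attributes themselves, so the read-only property here is modeled with Object.defineProperty, a facility added to the language later. The doc object and its properties are hypothetical.

```javascript
const doc = {
  title: "Purchase Order",           // primitive value of type String
  pages: 3,                          // primitive value of type Number
  author: { name: "A. Buyer" },      // a property holding another object
  summary: function () {             // a function stored in a property: a method
    return this.title + " (" + this.pages + " pages)";
  }
};

// Assumption: writable: false stands in for the ReadOnly attribute.
Object.defineProperty(doc, "id", { value: 42, writable: false, enumerable: true });

try {
  doc.id = 99;   // has no effect in non-strict code
} catch (e) {
  // strict-mode code reports the failed write as a TypeError instead
}

console.log(typeof doc.title, typeof doc.pages, typeof doc.author);
console.log(doc.summary());
console.log(doc.id);
```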

ECMAScript defines a collection of built-in objects which completes the definition of ECMAScript entities. These built-in objects include the Global, Object, Function, Array, String, Boolean, Number, Math, and Date objects. ECMAScript also defines a set of built-in operators which are not, strictly speaking, functions or methods, including various unary operators, mathematical operators, bitwise operators, relational operators, assignment operators, and so on.

ECMAScript syntax intentionally resembles Java syntax, but is relaxed to enable it to serve as an easy-to-use scripting language. For example, a variable is not required to have its type declared, nor are types associated with properties; the values stored in properties can change type at runtime.

ECMAScript does not provide classes such as those in C++, Smalltalk, or Java, but rather, supports constructors which create objects by executing code that allocates storage for the objects and initializes all or part of them by assigning initial values to their properties. All functions including constructors are objects, but not all objects are constructors. Each constructor has a Prototype property which is used to implement prototype-based inheritance and shared properties. Objects are created by using constructors in new expressions, for example, new String("A String") creates a new string object. Invoking a constructor without using new has consequences that depend on the constructor. For example, String("A String") produces a primitive string, not an object. Storage management is handled by a garbage collector.
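
A brief sketch of constructors and of the difference between calling String with and without new follows. The Point constructor is a made-up example.

```javascript
// A constructor initializes the new object by assigning to its properties.
function Point(x, y) {
  this.x = x;
  this.y = y;
}

const p = new Point(1, 2);

// With new, String creates an object; without new, it yields a primitive.
const boxed = new String("A String");
const plain = String("A String");

console.log(p.x + p.y);
console.log(typeof boxed);             // object
console.log(typeof plain);             // string
console.log(Point instanceof Object);  // functions, including constructors, are objects
```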

ECMAScript supports prototype-based inheritance. Every constructor has an associated prototype, and every object created by that constructor has an implicit reference to the prototype associated with that constructor. A prototype may have a non-null implicit reference to its prototype, and so on; this is called the prototype chain. When a reference is made to a property in an object, the reference is to the property of that name in the first object in the prototype chain that contains a property of that name. In other words, first the object mentioned directly is examined for the property; if that object contains the named property, that is the property to which the reference refers; if that object does not contain the named property, the prototype for that object is examined next; and so on. All objects that do not directly contain a particular property that their prototype contains share that property and its value.
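
The lookup rule can be traced with a small example. The Tag constructor and its properties are hypothetical; Object.getPrototypeOf is a later addition to the language, used here only to make the implicit reference visible.

```javascript
function Tag(name) {
  this.name = name;
}

// Properties on the prototype are shared by every object the constructor creates.
Tag.prototype.kind = "element";
Tag.prototype.describe = function () {
  return this.name + ":" + this.kind;
};

const a = new Tag("TITLE");
const b = new Tag("PARA");

// b now contains its own "kind", so lookup stops at b and never reaches the prototype.
b.kind = "block";

console.log(a.describe());   // kind found on the prototype
console.log(b.describe());   // kind found on b itself

// Changing the shared property is visible through a, but not through b.
Tag.prototype.kind = "node";
console.log(a.kind, b.kind);

// The implicit reference from an object to its constructor's prototype.
console.log(Object.getPrototypeOf(a) === Tag.prototype);
```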

Unlike class-based object languages, properties can be added to objects on the fly simply by assigning values to them. That is, constructors are not required to name or assign values to all or any of the object's properties.

There is a unique global object which is created before control enters any execution context. Initially the global object has the following properties:

As control enters execution contexts, and as ECMAScript code is executed, additional properties may be added to the global object, and the initial properties may be changed. This, among other things, provides support for global variables.
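
The connection between the global object and global variables can be seen in a short sketch. It assumes the name globalThis, which present-day JavaScript uses to refer to the global object and which is not part of [ECMA97]; reportYear and nextYear are hypothetical names.

```javascript
// Adding a property to the global object...
globalThis.reportYear = 1998;

// ...makes it available as a global variable: the unqualified name
// reportYear is resolved through the global object.
function nextYear() {
  return reportYear + 1;
}

console.log(nextYear());

// Built-in functions such as parseInt are themselves properties of the global object.
console.log(globalThis.parseInt === parseInt);
```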

ECMAScript's flexibility, as well as the fact that it is a "native" (to Web browsers) way of specifying behavior, allows it to play the role of an integrating technology in associating behavior with Web pages. This will be illustrated in several of the subsequent sections.


3.1.3 LiveConnect

LiveConnect is a technology introduced in Netscape 3.0 for providing an interface between JavaScript and Java [Hal98]. Such an interface is important because, while a JavaScript script has direct access to the page in which it is contained, an ordinary Java applet does not. This is so even though the applet is referenced from the HTML document, and its output might be displayed within the resulting page. LiveConnect allows applets to manipulate frames and windows, control images loaded from HTML, and do other things previously restricted to JavaScript, while also allowing JavaScript to access the capabilities of applets. In addition to its use in Netscape products, LiveConnect software can also be supported within Microsoft Internet Explorer 4.0.

One "direction" of using LiveConnect allows Java applets to be accessed from JavaScript. For example, if JavaScript needs to perform a complex computation, it may be easier to write this as a "hidden" applet called from JavaScript than as a native JavaScript function. JavaScript can access applets via the document.applets array, which is an array of objects representing each applet in the document, and is defined as part of DHTML's Document Object Model. (The W3C's DOM (HTML) Document object has a corresponding applets collection). If the applet is named, it can also be referred to by name, i.e., document.<appletName>. Any public method of the applet can be called by JavaScript. In the following example (from [Hal98]), the applet Acoustics supports a way of computing sound propagation through water, and a Web page is being created that requires that computation. The applet would be included in the page using:

<APPLET CODE="Acoustics" WIDTH=10 HEIGHT=10 NAME="acoustics">
</APPLET>

A JavaScript function calling the public getSignalExcess method of the acoustics applet would then be defined as:

function signalExcess(...) {
   return(document.acoustics.getSignalExcess(...));
}

The JavaScript signalExcess function could then be used wherever needed to compute the required values (in effect, the applet call is wrapped by the JavaScript function). In this case, the applet is referred to by the name document.acoustics.

The other "direction" of using LiveConnect allows applets to access JavaScript. This allows the use of Java syntax to access all JavaScript objects, read and set all available properties, and call any legal method. To use this approach, Netscape's JSObject class must be obtained and imported into the Java applet. This class supports a static method getWindow that allows the applet to obtain a reference to the window that contains it (as an object of type JSObject), as in:

JSObject Window = JSObject.getWindow(this);

The Window object's getMember method can then be used to read properties of the main window, and via the values of these properties, other parts of the window (JavaScript objects) can be accessed.

The ability to access page contents as data (i.e., an API to the page) is clearly a necessity in adding behavior to pages. LiveConnect provides this capability for HTML and Java, and some such capability is also required for XML. The most straightforward means of providing this would be for XML-supporting browsers to directly support DOM interfaces in a range of programming languages (including Java).

3.1.4 Script Components

Two relatively recent technologies, JavaScriptBeans from Netscape, and Scriptlets from Microsoft, illustrate a more thorough integration of Web pages with behavior and interfaces, by using XML (or XML-like) markup, together with scripts, to define components. These technologies combine:

3.1.4.1 JavaScriptBeans

JavaScript Beans (henceforth JSBs), also referred to as JavaScript components, are described in an online article Picking the Newest Crop of Beans, by Emily Vander Veer [VV97] <http://www.sigs.com/publications/objm/9711/vanderveer.html>, and in [Nic98]. JSBs are packages of JavaScript code that support an interface exposing properties, events, and methods. JSBs are implemented by text files with a .jsb extension. The files can be created by hand, or by using Netscape's Component Builder. Each .jsb file defines a JSB's properties, methods, and events, using HTML- (or XML-) like tags.

The following example <http://www.sigs.com/publications/objm/9711/vanderveer.code.html> defines a non-persistent client-side JSB that displays a configurable scrolling message in a browser's status bar. The JSB_PROPERTY and JSB_METHOD elements define the properties and methods, respectively, exposed at the component's interface. To use the AWT 1.1 event model, the additional elements JSB_EVENT and JSB_LISTENER would be used. If a method has parameters, each parameter is specified using a JSB_PARAMETER element inside the corresponding JSB_METHOD element. The JSB_CONSTRUCTOR element contains the JavaScript code (omitted here) for each defined method, and a constructor function (the last function defined) for the JavaScript component. The constructor code of a JSB is similar to the code that would be run to construct a user-defined object in JavaScript. Once the component is instantiated, its scroll(), stop(), and reset() methods will be exposed for invocation from other components.

<JSB>
      <JSB_DESCRIPTOR NAME="netscape.peas.ScrollingBanner" 
            DISPLAYNAME="Scrolling Banner" 
            ENV="client" 
            CLASS="" 
            CUSTOMIZER="" 
            SHORTDESCRIPTION="JavaScript Scrolling Banner">
      <JSB_PROPERTY NAME="msg" 
            DISPLAYNAME="Message" PROPTYPE="JS" TYPE="string" 
            READMETHOD="" WRITEMETHOD="" 
            SHORTDESCRIPTION="Text message to scroll in status bar">
      <JSB_PROPERTY NAME="speed" 
            DISPLAYNAME="Scroll Speed" 
            PROPTYPE="JS" TYPE="number" VALUESET="0:200" 
            READMETHOD="" WRITEMETHOD="" 
            SHORTDESCRIPTION="Scrolling speed">
      <JSB_PROPERTY NAME="position" 
            DISPLAYNAME="Start Position" 
            PROPTYPE="JS" TYPE="" 
            READMETHOD="" WRITEMETHOD="" 
            SHORTDESCRIPTION="Starting position of message">
      <JSB_METHOD NAME="scroll" TYPE="void"> </JSB_METHOD>
      <JSB_METHOD NAME="stop" TYPE="void"> </JSB_METHOD>
      <JSB_METHOD NAME="reset" TYPE="void"> </JSB_METHOD>
      <JSB_CONSTRUCTOR>
            function scroll(){ //snip script }
            function stop(){ //snip script }
            function reset() { //snip script }
            function netscape_peas_ScrollingBanner(params){ //snip script }
      </JSB_CONSTRUCTOR>
</JSB>

Tools that read JSBs, like Netscape's Visual JavaScript, use the JSB file as the basis for automatically-generated JavaScript code, essentially translating the JSB into conventional JavaScript which can be interpreted by the browser. Since the result of a JSB is actually JavaScript, the JSB can be linked with other components on the page (or, using LiveConnect, with applets). JSBs can run on both Netscape Navigator and Microsoft Internet Explorer clients, and on Netscape's Enterprise Server (the JSB definition provides control over whether the bean can run on the client, the server, or both).

If a JSB has no support files, it can be used in a single .jsb file. However, JSBs can be made up of more than one file, including JavaScript code or external Java .class files. In this case, all the files can be packaged into a Jar file, for distribution as a unit.

[Nic98] notes that Visual JavaScript allows CORBA objects to be used in much the same way as JavaBeans and JSBs, through the use of JSBs. To do this, a file containing the IDL definitions for the CORBA objects is loaded into the Visual JavaScript tool. During this process, each IDL interface definition in the file is converted to a JSB, which serves as a wrapper or proxy for the (potentially remote) CORBA object. IDL attributes are converted into JSB properties. Other properties are also added by Visual JavaScript, e.g., an ObjectURL property, which allows the user to specify a URL which identifies the actual location of the CORBA object. At runtime, the CORBA object is found using the value of this property. IDL operations are converted to JavaScript method declarations (with data types converted to Java data types). Again, these methods serve as wrappers for the methods of the CORBA object. At runtime, the component will be connected to an actual CORBA object on the server. The JavaScript program references this object through a property called corbaObject.

Another technology for defining interfaces between JavaScript and CORBA objects is the CORBA Component Scripting response (orbos/97-11-27) from Netscape, Oracle, Visigenic, Sun, and Genesis to the CORBA Scripting Language RFP. This response proposes JavaScript (ECMAScript) as a CORBA scripting language, and is based on the use of LiveConnect technology. The approach involves constructing a JavaScript reflection of CORBA objects, which allows the CORBA object to be manipulated as if it were a JavaScript object. Property access and function calls in JavaScript are translated to accessors and method calls on the corresponding CORBA object. To construct the reflection, the CORBA IDL interface is mapped to a JavaScript object which serves as a prototype. Instances of that interface are represented by a JavaScript object whose prototype is the interface object. Attributes are mapped to properties of the JavaScript object. Methods are mapped to (JavaScript) function-valued properties of the JavaScript object. Invoking these functions from JavaScript causes the appropriate interface method to be invoked through the CORBA Dynamic Invocation Interface.
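
The reflection idea can be sketched in present-day JavaScript. This is not the mechanism of the cited response, which builds on LiveConnect; it uses the Proxy facility of later versions of the language, and the stub, its invoke function, the "_get_"/"_set_" operation naming, and the account interface are all assumptions made for illustration. The stub stands in for a dynamic invocation path to a remote object.

```javascript
// Hypothetical stand-in for a remote object reached through dynamic invocation.
function makeAccountStub() {
  let balance = 100;
  return {
    invoke: function (operation, args) {
      switch (operation) {
        case "_get_balance": return balance;
        case "_set_balance": balance = args[0]; return undefined;
        case "deposit": balance += args[0]; return balance;
        default: throw new Error("NO_IMPLEMENT: " + operation);
      }
    }
  };
}

// Build a JavaScript reflection of the stub: attributes become properties,
// and operations become function-valued properties.
function reflect(stub, attributes, operations) {
  return new Proxy({}, {
    get: function (target, name) {
      if (attributes.includes(name)) return stub.invoke("_get_" + name, []);
      if (operations.includes(name)) {
        return function (...args) { return stub.invoke(name, args); };
      }
      return undefined;
    },
    set: function (target, name, value) {
      if (!attributes.includes(name)) return false;
      stub.invoke("_set_" + name, [value]);
      return true;
    }
  });
}

const account = reflect(makeAccountStub(), ["balance"], ["deposit"]);

console.log(account.balance);      // property read becomes an accessor call
console.log(account.deposit(25));  // function call becomes an operation invocation
account.balance = 10;              // property write becomes a mutator call
console.log(account.balance);
```

From the script's point of view the account behaves like an ordinary JavaScript object, while every access is actually translated into an invocation on the underlying object.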

The response defines a complete language mapping from IDL to JavaScript. This closely parallels the LiveConnect language mapping from Java to JavaScript, which already permits JavaScript to act as an implementation language for JavaBeans and Java objects. No reverse JavaScript to IDL mapping is needed. JavaScript objects, because of runtime typing, can masquerade as objects implementing arbitrary IDL interfaces. This is implemented by using CORBA's Dynamic Skeleton interface to handle CORBA operations with JavaScript function calls or property references. Circumventing CORBA's type system in this way means that some errors will not be detected until runtime. If the JavaScript object does not have the necessary properties to implement an invocation on the interface, the CORBA::NO_IMPLEMENT exception is raised.
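
The opposite direction, in which a JavaScript object stands in for an implementation of an interface, can be sketched as a dispatcher that resolves each incoming operation at runtime. The dispatch function, the "_get_" naming for attribute reads, and the banner object are hypothetical; a real implementation would sit behind CORBA's Dynamic Skeleton Interface and raise the CORBA::NO_IMPLEMENT exception rather than a JavaScript Error.

```javascript
// Hypothetical dispatcher: maps an incoming operation name onto a
// JavaScript function call or property reference, decided at runtime.
function dispatch(target, operation, args) {
  const member = target[operation];
  if (typeof member === "function") {
    return member.apply(target, args);
  }
  // Assumed naming: "_get_<attribute>" requests read a property.
  if (operation.indexOf("_get_") === 0) {
    const attribute = operation.slice(5);
    if (attribute in target) return target[attribute];
  }
  // No compile-time check exists, so the failure only shows up here.
  const error = new Error("no implementation for " + operation);
  error.name = "NO_IMPLEMENT";
  throw error;
}

// Any object with suitable properties can play the role of the implementation.
const banner = {
  msg: "hello",
  scroll: function () { return "scrolling " + this.msg; }
};

console.log(dispatch(banner, "scroll", []));
console.log(dispatch(banner, "_get_msg", []));

try {
  dispatch(banner, "stop", []);
} catch (e) {
  console.log(e.name);
}
```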

3.1.4.2 Scriptlets

Scriptlets <http://www.microsoft.com/sitebuilder/magazine/xmlscript.asp> (originally called "Server Scriptlets") are a Microsoft technology that allows COM components to be written in a combination of XML and a scripting language (see also [deB98, Esp98a,b]). These components can be used in the same way as any other COM component, e.g., they can be used by COM clients such as Microsoft Office, or embedded in Web pages like ActiveX components. Scriptlet support is defined as part of Microsoft's Internet Explorer 5.0 (IE5.0) Developer Preview <http://www.microsoft.com/ie/ie5/overview.htm>.

A scriptlet is defined by a file with a .sct extension, which contains script code and XML markup (using a specialized tag set). An example (from [Esp98a]) is:

<SCRIPTLET>

<REGISTRATION
        Description = "Factorial"
        ProgID = "Factorial.Scriptlet"
        Version = "1.00"
        ClassID = { ***UUID for the class*** } >
</REGISTRATION>

<IMPLEMENTS ID="Automation" TYPE="Automation">
        <METHOD name="Calculate">
                <PARAMETER name="num"/>
        </METHOD>
</IMPLEMENTS>

<SCRIPT language="JScript">

function Calculate(num) {
        if(num<2)
                return 1;
        return num*Calculate(num-1);
}

</SCRIPT>

</SCRIPTLET>

The <IMPLEMENTS> element can also define component properties as well as methods, using a <PROPERTY> tag. A property can be defined either by specifying its name, as in

<PROPERTY name="Text"/>

or by explicitly specifying GET and PUT methods (which refer to functions written in the scripting language), as in

<PROPERTY name="Text">
        <GET internalname="DoGetText"/>
        <PUT internalname="DoPutText"/>
</PROPERTY>

Explicitly specifying GET and PUT methods allows for custom read and write functions, and also allows the definition of read-only and write-only properties (by specifying only a GET method or only a PUT method, respectively).
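
The effect of mapping a public property name onto internal read and write functions can be approximated in plain JavaScript. This sketch is only an analogy: it uses Object.defineProperty from later versions of the language, not the scriptlet runtime, and the function bodies, the trimming behavior, and the Length property are made up for illustration.

```javascript
// Internal state and accessor functions (bodies are hypothetical).
let currentText = "initial";

function DoGetText() {
  return currentText;
}

function DoPutText(value) {
  // A custom write function can validate or normalize the incoming value.
  currentText = String(value).trim();
}

const component = {};

// Public property "Text" backed by the two internal functions.
Object.defineProperty(component, "Text", {
  get: DoGetText,
  set: DoPutText,
  enumerable: true
});

// Supplying only a read function yields a read-only property.
Object.defineProperty(component, "Length", {
  get: function () { return currentText.length; },
  enumerable: true
});

component.Text = "  hello  ";
console.log(component.Text);
console.log(component.Length);
```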

The <REGISTRATION> tag defines the information needed to identify the COM object defined by the scriptlet. The CLSID is used instead of the file name to identify the component (e.g., in referring to it from other files). The association between the component and the file name is stored in the Windows Registry during the registration process.

Part of the implementation of scriptlets is a DLL called scrobj.dll. This is a runtime engine that interprets and executes the XML code of the scriptlet. It also acts as a broker between clients and the scriptlets. A module in the DLL provides a default implementation for the standard COM server interfaces IUnknown, IClassFactory, and IDispatch. The DLL also handles registration and dynamic instantiation.

An application uses the object's ProgID to identify the object. The ProgID is matched to a Registry entry that stores the path both to scrobj.dll and the .sct file that defines the component. Each time the client calls one of the component's exposed methods, scrobj.dll loads the proper script from the linked .sct file. A scriptlet can be inserted in an HTML page by referencing it like an ActiveX control, and invoking its methods from JavaScript.

IE5.0 also defines a mechanism called DHTML Behaviors which allows scriptlets to be associated as behaviors with specific document elements using CSS stylesheets (see Section 3.2.2.2). Scriptlets used to implement DHTML behaviors do not require registration to be used within Web pages (they are downloaded as needed). By adding a behavior handler (by specifying <IMPLEMENTS TYPE="Behavior">), such scriptlets can also expose custom events to the page, access the containing page's DHTML (DOM) objects, and receive event notifications.

Multiple scripting languages can be used for a single scriptlet's methods (a separate <SCRIPT> element is required for each language); the runtime module calls the appropriate script language interpreters as necessary.

Advantages cited for scriptlets are:

3.1.4.3 Discussion
JSBs and Scriptlets illustrate what is really the essence of object construction using Web technologies: how objects in models conventionally used for programming language objects (Java and COM, respectively) can be defined using specialized XML markup and scripting as the representation for the object state, methods, and interfaces, and how these XML-based objects can become parts of more general object architectures. The two technologies take very similar approaches to the XML tags used, how interfaces are defined, how methods and properties are exposed, etc. In JSBs, the XML is effectively "compiled" into JavaScript, so the processor/interpreter is the script interpreter. In Scriptlets, the XML is interpreted at run time by the scrobj.dll engine, which invokes script interpreters as necessary.

With the appropriate interpreters, this same general approach could also be used to directly construct objects in other object models, such as CORBA IDL (Section 3.1.3.1 also noted how what are effectively JSB or JavaScript "wrapper" objects could also be used to support CORBA objects). A further generalization of this approach could also support objects implemented using more general XML structures than the relatively simple ones allowed in JSBs and Scriptlets.


3.2 "XML-Native" Behavior Attachment Mechanisms (and Extensions)

As noted in Section 2, SGML emphasizes an approach which separates specific processing programs from the marked-up data, associating behavior with the tags as needed in external "applications". However, SGML also includes mechanisms (which have been carried over into XML) that can make the association of various kinds of processing with documents more explicit. This section describes these mechanisms, together with some extensions of these ideas currently being investigated for use in the Web.


3.2.1 External Entities and Notations

XML documents have both a logical and a physical structure. The logical structure of an XML document is represented by its elements (delimited by tags). The (abstract) physical structure is represented by a collection of entities. Entities allow the document to be composed of separate physical pieces. An XML document starts with a single top-level entity. The document's DTD may declare additional entities, and, once declared, references to those entities may then be contained in the document's content. A complete logical document is created by starting with the top-level entity, and combining any additional referenced entities.

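There are two general types of entities: general entities, which are referenced from document content using the form &name;, and parameter entities, which are referenced only within the DTD using the form %name;.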

Entities may also be either internal (the entity content is defined in the declaration itself) or external (the entity content is located elsewhere). For example, the following entity declaration defines an internal general entity containing the text "Extensible Markup Language" [Meg98]:
<!ENTITY xml "Extensible Markup Language">
Given this definition, a reference to it of the form &xml; can be included anywhere in attribute values or mixed content. When it is processed, the reference will be expanded to the full text "Extensible Markup Language".
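As a rough illustration of this expansion, the following Python sketch declares the xml entity in an internal DTD subset and lets the standard library parser expand the reference. The wrapper element doc and the surrounding sentence are invented for the example.

```python
import xml.etree.ElementTree as ET

# The <doc> wrapper element is hypothetical; the entity declaration
# is the one from [Meg98] shown above.
DOCUMENT = """<!DOCTYPE doc [
  <!ENTITY xml "Extensible Markup Language">
]>
<doc>&xml; is a subset of SGML.</doc>"""

root = ET.fromstring(DOCUMENT)

# By the time the application sees the element content, the parser
# has already replaced the &xml; reference with the entity text.
expanded_text = root.text
print(expanded_text)
```

The application never sees the reference itself, only the replacement text, which is what makes internal entities a purely physical (rather than logical) structuring device.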

Here, our interest is primarily in general external entities. For example, the following entity declaration (from [Meg98]) defines an external entity containing the text of a document:

<!ENTITY chapter1 PUBLIC "-//megginson//TEXT Chapter 1//EN" 
                         "chap01.xml">
This example illustrates the two ways XML provides for associating the entity name chapter1 with the physical objects that store the entity text: public identifiers and system identifiers. A system identifier in XML is a URI which may be used to retrieve the entity. A public identifier is a logical name (formally defined in ISO 8879, the SGML standard), and is intended to be resolved by looking it up in a catalog to find a corresponding URI. The SGML/Open consortium has defined standard syntax for catalogs to be used in this translation from public identifiers to system identifiers. An informal proposal for XCatalogs <http://www.ccil.org/~cowan/XML/XCatalog.html>, a corresponding specification for use with XML, has been discussed on the xml-dev email list. In the example above, the entity declaration specifies both a public identifier -//megginson//TEXT Chapter 1//EN and a system identifier chap01.xml (when a system identifier is specified by itself, it is preceded by the keyword SYSTEM; when both a system identifier and a public identifier are specified, the public identifier comes first, preceded by PUBLIC, and the system identifier follows, not preceded by SYSTEM).
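The resolution order described above might be sketched as follows in Python. This is a toy catalog held in a dictionary, not the SGML/Open or XCatalog syntax, and the URI it maps to is made up for illustration.

```python
def resolve_entity(public_id, system_id, catalog):
    """Return the URI to fetch for an external entity.

    A public identifier found in the catalog wins; otherwise the
    system identifier from the declaration is used as given.
    """
    if public_id is not None and public_id in catalog:
        return catalog[public_id]
    return system_id


# Hypothetical catalog entry: the target URI is an assumption.
CATALOG = {
    "-//megginson//TEXT Chapter 1//EN": "http://example.org/book/chap01.xml",
}

resolved = resolve_entity(
    "-//megginson//TEXT Chapter 1//EN", "chap01.xml", CATALOG)
fallback = resolve_entity(
    "-//unknown//TEXT Other//EN", "other.xml", CATALOG)
```

Here resolved is taken from the catalog, while fallback is simply the system identifier, since the catalog has no entry for that public identifier. Real catalog formats also provide preference settings that this sketch ignores.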

The entities defined above are examples of parsed entities. The XML processor will attempt to parse the contents of the entity at the point where the entity reference appears in the document text. XML also provides a means to declare general unparsed (non-XML) entities, by specifying the entity as NDATA, and referencing a separate NOTATION declaration, as in the following example from [Bry97]:

<!ENTITY fig1 SYSTEM "fig1.eps" NDATA postscript >

where postscript is defined by:

<!NOTATION postscript SYSTEM "eps.bat" >

The NOTATION declaration is used to assign a unique name to the notation, language, or format of the external entity (postscript in this example).
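To suggest how an application learns about these declarations, the following Python sketch uses the standard library's Expat binding, which reports notation and unparsed entity declarations through callbacks. The wrapper element doc is an assumption; the two declarations are those from the [Bry97] example.

```python
import xml.parsers.expat

# <doc/> is a hypothetical document element added so the text parses.
DOCUMENT = """<!DOCTYPE doc [
  <!NOTATION postscript SYSTEM "eps.bat">
  <!ENTITY fig1 SYSTEM "fig1.eps" NDATA postscript>
]>
<doc/>"""


def collect_declarations(text):
    """Return (notations, unparsed entities) declared in the DTD."""
    notations = {}  # notation name -> system identifier
    unparsed = {}   # entity name -> (system identifier, notation name)
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        unparsed[name] = (system_id, notation_name)

    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(text, True)
    return notations, unparsed


notations, unparsed = collect_declarations(DOCUMENT)
```

The parser does not read fig1.eps; it only tells the application that the entity exists and which notation governs it. What to do with that information is left to the application.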

In this case, the system identifier eps.bat is intended to identify a file to be activated to process the non-XML data (this invocation, and the return to the XML parser, happens in a system-defined way). [DeR97], in discussing the use of this convention in SGML, notes that many creators of DTDs believe that the external identifier in a NOTATION declaration should point to a program that can interpret the notation. Many software products for processing SGML support this by calling the specified program when an entity in the given NOTATION is to be presented. However, in principle, the intention is that the pointer should identify a specification of the notation (e.g., its documentation). [DeR97] notes: "There is a sense in which any program that (perfectly) implements a processor for some notation can be thought of as a "definition" for that notation, but that is quite a broad sense, and not in harmony with the examples SGML gives." A convention for providing both pieces of information is to specify a public identifier that points to the specification, and a system identifier that points to a local processor for it. In the Web, the need for this distinction is partially blurred, since providing a complete URL for the system identifier potentially allows the resource to be accessible from anywhere in the Web, hence making the system identifier more portable. However, the use of a public identifier would support the use of alternative sources for the same processor, as well as handle the problem of the processor changing its location (other mechanisms, such as URNs, would also solve this problem).

The general mechanism provided by XML external entities and notations can potentially be used in a number of ways in supporting object construction in the Web, since GIF, Java, JavaScript, etc. are all potentially examples of "notations" that could be used in XML documents. For example, a GIF image could be included by defining a notation and entity such as:

   <!NOTATION GIF89a PUBLIC 
       "-//CompuServe//NOTATION Graphics Interchange Format 89a//EN">
   [...]
   <!ENTITY fig3 SYSTEM "figure3.gif" NDATA GIF89a >
The same approach could be used to define an entity using any other binary notation (including a user-defined notation).

It is important to note that there is a difference between external entities that are declared in a document, and separate documents that may be referenced from that same document (using, e.g., XML's linking facilities), even though both may be identified by URLs. Unlike separate documents, external entities are intended to be parts of the document in which they are referenced, and a validating XML parser will process these entities in determining whether the document has a valid structure. This difference can be significant when delivering XML documents containing external entities to clients which use non-validating XML parsers. A non-validating parser is not required to process external entities (which are declared in the DTD), and hence the client may miss some of the document's content if the client's parser does not process them. A number of suggestions have been made for dealing with this problem, including:

For representing links to separate documents, the W3C XML Linking Language (XLink) specifications <http://www.w3.org/TR/WD-xlink> are being developed. Together with the XML Pointer Language (XPointer) specifications <http://www.w3.org/TR/WD-xptr> for addressing the internal structures of XML documents, XLinks provide more powerful facilities for defining hyperlinks between (and into) XML resources.

An important use of notations in the context of constructing objects would be the incorporation of scripts into XML documents. One approach to doing this would be to define each script as an external entity having a NOTATION of whatever its scripting language was, e.g., JavaScript, as in:

<!NOTATION JavaScript PUBLIC "+//IDN netscape.com//NOTATION Java
Script//EN" >
This NOTATION definition associates the local name JavaScript to the formal specification identified by the public identifier. As noted above, a SYSTEM identifier could also specify a processing engine (interpreter) to process the script data.

Another approach to doing this, which more closely resembles the way scripts are contained in current HTML documents, was described in a posting to the xml-dev email list by W. Eliot Kimber, using the following example:

<?xml version="1.0"?>
<!DOCTYPE MyDoc [
  <!NOTATION JavaScript PUBLIC "+//IDN netscape.com//NOTATION Java
Script//EN" >
<!-- Script element contains JavaScript code: -->
<!ELEMENT Script (#PCDATA)>
<!ATTLIST Script notation NOTATION (JavaScript) "JavaScript">
...
]>
<MyDoc>
 <Script>This is a JavaScript script</Script>
</MyDoc>
This approach uses script elements rather than entities. Again, the NOTATION definition associates the local name JavaScript to the formal specification identified by the public identifier. A Script element is then defined to contain the actual scripts. An attribute notation is defined for this element to hold the name of the notation which identifies the specific scripting language used (JavaScript is the only possible value specified in this case). Script elements can then be included in the document to hold whatever scripts are needed (in the example, JavaScript is also defined as the default notation, so that it does not explicitly appear in the example Script element).

By analogy with the way many SGML processors deal with notations, the idea is that when the document is processed, the XML processing application sees that the Script elements are defined as having the JavaScript notation, and knows that the named notation governs the interpretation of these elements. The processor would look up the notation information, extract its external identifier, use it to find the processing program for that notation (a dll, plug in, JavaBean, or other executable; the script interpreter in this case), and pass the element to it. The notation processor would then do its processing, returning whatever the interface between the XML processor and the notation processor requires or allows.
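The dispatch just described might look roughly like the following Python sketch. It is only an illustration under several assumptions: the registry of processors keyed by public identifier is invented, the "interpreter" is a stand-in function, the MyDoc content model is filled in so the DTD is complete, and the public identifier is written on one line as JavaScript.

```python
import xml.parsers.expat

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE MyDoc [
  <!NOTATION JavaScript PUBLIC "+//IDN netscape.com//NOTATION JavaScript//EN">
  <!ELEMENT MyDoc (Script*)>
  <!ELEMENT Script (#PCDATA)>
  <!ATTLIST Script notation NOTATION (JavaScript) "JavaScript">
]>
<MyDoc>
 <Script>This is a JavaScript script</Script>
</MyDoc>"""


def process_document(text, processors):
    """Pass the content of each notation-bearing element to the
    processor registered for that notation's public identifier."""
    notations = {}      # notation name -> public identifier
    open_elements = []  # stack of (notation name or None, text chunks)
    results = []
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations[name] = public_id

    def on_start(tag, attributes):
        # Simplification: any attribute called "notation" is treated
        # as the NOTATION-typed attribute. The parser supplies the
        # default value declared in the ATTLIST.
        open_elements.append((attributes.get("notation"), []))

    def on_text(data):
        if open_elements:
            open_elements[-1][1].append(data)

    def on_end(tag):
        notation, chunks = open_elements.pop()
        processor = processors.get(notations.get(notation))
        if processor is not None:
            results.append(processor("".join(chunks)))

    parser.NotationDeclHandler = on_notation
    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_text
    parser.EndElementHandler = on_end
    parser.Parse(text, True)
    return results


def fake_javascript_engine(source):
    # Stand-in for a real script interpreter.
    return "would run: " + source


PROCESSORS = {
    "+//IDN netscape.com//NOTATION JavaScript//EN": fake_javascript_engine,
}
results = process_document(DOCUMENT, PROCESSORS)
```

Note that the Script element in the document carries no explicit notation attribute; the lookup succeeds only because the declared default is applied during parsing.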

Kimber notes that this approach is essentially what is done in Web browsers to process objects with different MIME types. The only difference is that notation types are not pre-defined or necessarily registered anywhere, but are instead defined in XML documents. A similar approach could be used to associate individual (object) methods with specific elements, provided system identifiers could be specified to point to the programs implementing the methods, and the interfaces between the XML processor and the methods were appropriately defined.

Taking full advantage of this approach in creating objects in the Web would require some additional support. For example, XML processors would have to be implemented so as to support this capability. In addition, XML does not currently require that a notation declaration actually reference a processor. While this is a convention in many SGML applications, it is not required there either. Standard syntax is required to associate an external processor with a specific element. This could be done either by expanding the semantics of notations, or by alternative syntax to explicitly deal with this requirement. This would allow arbitrary processing to be associated with various elements, rather than simply hardwiring support for specific notations into the XML processor (along the lines of HTML browsers' support for GIF). One model for this would be to use XML syntax to express the sort of information used in HTML browsers to assign plug ins or helper applications to specific MIME types. Ideally, this information would be considered a form of metadata, rather than as part of the document content itself.

Another requirement is for well-defined interfaces between element processors and XML processors, and between element processors and the elements passed them. These could be defined along the lines of plug in interfaces, possibly using a DOM-based approach for the element interfaces. In addition, it would be desirable for the XML processor (browser) to cache processing programs, rather than always following a URI specified in the metadata. The processor would only retrieve the processor for the specified element if it did not find a local processor for the element.
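The caching behavior suggested here could be as simple as the following Python sketch. The fetch function, the URI, and the processor it returns are all hypothetical stand-ins; a real implementation would also need to deal with versioning and trust.

```python
class ProcessorCache:
    """Keep element processors locally, fetching each at most once."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable: URI -> processor
        self._local = {}     # URI -> processor already retrieved

    def get(self, uri):
        # Follow the URI only if no local processor is available.
        if uri not in self._local:
            self._local[uri] = self._fetch(uri)
        return self._local[uri]


fetched = []  # records every simulated network retrieval


def fake_fetch(uri):
    # Stand-in for downloading a plug-in; returns a trivial processor.
    fetched.append(uri)
    return lambda element_text: element_text.upper()


cache = ProcessorCache(fake_fetch)
first = cache.get("http://example.org/processors/script")
second = cache.get("http://example.org/processors/script")
```

After both calls, fetched contains a single entry, since the second request is satisfied locally.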


3.2.2 Stylesheets

3.2.2.1 Introduction
Unlike HTML, XML per se provides no facilities for defining the presentation aspects of documents (e.g., whether certain text should be in a specific size or color). Instead, the presentation aspects of XML documents are intended to be described using separate stylesheets (and, as a result, stylesheets play a much more important role in XML than they do in HTML). The W3C is currently specifying the Extensible Style Language (XSL) for defining XML stylesheet capabilities. XSL stylesheets enable formatting information to be associated with elements in a source document to produce formatted output. XSL is still very much a work in progress. Until recently, the only XSL specification was a submission <http://www.w3.org/TR/NOTE-XSL> to W3C from Microsoft, Inso Corporation, and others. Microsoft also made an XSL processor available, based on that submission, that provided an idea of some of the facilities that could be expected in XSL processors, together with an XSL Tutorial <http://www.microsoft.com/xml/xsl/> (see also [Car98]). However, recently (August 18, 1998), the W3C published an initial Working Draft <http://www.w3.org/TR/WD-xsl> of the XSL specifications. This Working Draft (WD) is based on the original submission, but differs substantially in a number of important respects. The WD also omits some capabilities included in the original submission, and is explicitly incomplete in some areas. These missing capabilities may be included in later versions of the WD. As a result, some of the discussion below refers to capabilities described in the WD, while other discussion refers to capabilities contained in the original submission, or the Microsoft XSL processor (as illustrating potentially relevant capabilities not yet explicitly supported in the WD).

XSL is based on concepts similar to those defined in the ISO standard Document Style Semantics and Specification Language (DSSSL) [ISO96] used in formatting SGML documents. Following DSSSL, in XSL the conceptual model of formatting an XML document is that of transforming an input tree structure into an output tree structure. The input tree structure is, roughly speaking, the hierarchical element structure of the XML document produced by parsing (in DSSSL this structure is called a grove, which stands for Graph Representation Of property ValuEs; the DOM defines a similar element structure, and the WD defines a similar conceptual model). Hence, roughly speaking, each node represents an element in the input document, and its properties.

As defined in the WD, the stylesheet defines the formatting of a document through a set of template rules (DSSSL and the submission call them construction rules), which specify how the input tree (element) structure of the document is to be converted to a result tree. XSL can be used for both generalized document transformation, and for formatting. When XSL is used purely for structural transformation, the result tree consists of the same types of nodes as were contained in the original tree. However, these may be filtered and reordered, and additional structure may be added. When XSL is used for formatting, the rules are used to specify a transformation from the input tree to a result tree of special formatting objects (DSSSL and the submission call them flow objects; the WD includes both flow objects and some additional kinds of formatting objects). A formatting object has a class, which represents a kind of formatting task, together with a set of named characteristics, which further specify the formatting to be performed. The XSL WD includes definitions of a standard set of formatting object classes (a subset of those defined by DSSSL), instances of which are used to construct the formatting object tree. These formatting objects are expressed using a formatting vocabulary of special XML tags defined in the WD. In other words, formatting is performed by transforming the original XML structure into another XML structure which uses tags that have predefined formatting semantics. Once the formatting object tree is created, the final formatted output (e.g., the display in a Web browser) is produced by processing the tree of formatting objects, and performing the specified formatting tasks (a task performed, e.g., by a Web browser's "rendering engine"). A conforming XSL processor is required to understand the formatting vocabulary and semantics defined in the WD, and implement those formatting semantics.

A template rule contains a pattern, which identifies specific elements in the source document to which specific formatting is to be applied, and a template, which defines a resulting subtree in the result tree. These rules somewhat resemble the situation-action rules used in rule-based expert systems. The stylesheet processor recursively processes source elements to produce a complete result tree. An example of a simple XSL stylesheet (from the WD) is:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/TR/WD-xsl"
  xmlns:fo="http://www.w3.org/TR/WD-xsl/FO"
  result-ns="fo">
  <xsl:template match="/">
    <fo:page-sequence font-family="serif">
       <xsl:process-children/>
    </fo:page-sequence>
  </xsl:template>
  <xsl:template match="para">
    <fo:block font-size="10pt" space-before="12pt">
       <xsl:process-children/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
XSL uses XML namespaces <http://www.w3.org/TR/WD-xml-names> to distinguish elements that are instructions to the XSL processor (prefix xsl:) from elements that specify literal result tree structure (prefix fo:). An XSL stylesheet contains an xsl:stylesheet document element. This element contains xsl:template elements that specify template rules. The above stylesheet constructs a result tree for a sequence of para elements. The result-ns="fo" attribute indicates that a tree using formatting objects is being constructed. The rule for the root node specifies the use of a page sequence formatted with a serif font. The para elements become block formatting objects which are set in 10 point type with a 12 point space before each block (see the WD for further description of the details of stylesheet specification).
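The recursive application of template rules can be imitated in a few lines of Python. The sketch below is not an XSL processor: patterns are reduced to element names (plus "/" for the root), templates to a single result element, and the source document and its doc element are invented. It is meant only to show how the two rules above produce a result tree.

```python
import xml.etree.ElementTree as ET

# Pattern -> (result element name, attributes), mirroring the two rules.
RULES = {
    "/": ("page-sequence", {"font-family": "serif"}),
    "para": ("block", {"font-size": "10pt", "space-before": "12pt"}),
}


def append_text(parent, text):
    """Append character data to the end of parent's content."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def transform(source_root, rules):
    def process_children(source, result):
        append_text(result, source.text)
        for child in source:
            apply_rule(child, result)
            append_text(result, child.tail)

    def apply_rule(source, result):
        template = rules.get(source.tag)
        if template is None:
            # No matching rule: just descend into the children.
            process_children(source, result)
        else:
            name, attributes = template
            process_children(source, ET.SubElement(result, name, attributes))

    name, attributes = rules["/"]
    top = ET.Element(name, dict(attributes))
    apply_rule(source_root, top)
    return top


source = ET.fromstring("<doc><para>One</para><para>Two</para></doc>")
result = transform(source, RULES)
blocks = [block.text for block in result.findall("block")]
```

The result is a page-sequence element containing one block per para, each carrying the formatting characteristics named in its rule.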

The original submission allowed XSL stylesheets to support scripting using ECMAScript. It is likely that some such capability will ultimately be supported in later versions of the WD as well. Stylesheet scripting is supported by some current Web browsers, and DSSSL provides somewhat similar capabilities using a dialect of the Scheme programming language. A future level of the DOM is intended to define a stylesheet object model, and define functionality for manipulating the style information associated with a document, which would make the stylesheet accessible to a scripting language. These scripts could serve a number of different purposes. For example, within a template (an action, using the terminology of the submission), an eval statement containing an ECMAScript expression could be used to compute generated text (e.g., section numbers) during stylesheet processing. In addition, scripts (and other behavioral elements such as <OBJECT> elements) could be specified that are to be inserted into the document, for execution by the browser at run time.

XSL also intends to support an extensibility mechanism. The WD does not provide any details about this mechanism, but the original submission indicated that this mechanism could, e.g., allow the specification of new flow object classes and their characteristics, by defining:

The mechanism for associating stylesheets with XML documents is not yet officially defined. A W3C note <http://www.w3.org/TR/NOTE-xml-stylesheet> describes one possible mechanism, using an XML processing instruction whose target is xml:stylesheet (this processing instruction would have the same semantics as <LINK REL="stylesheet"> in HTML 4.0). Thus, to associate the XSL stylesheet "mystyle" with a given XML document, the following processing instruction would be included in the document:

<?xml:stylesheet href="mystyle.xsl" type="text/xsl"?>

where the type specifies a stylesheet language (xsl, css), and href is a system identifier such as a file name or URL. Multiple processing instructions can be used to associate multiple stylesheets with a given document in this approach. However, alternative linking mechanisms are also possible, so this mechanism may not be the one ultimately adopted by the W3C. For example, as discussed in [Man98], it would also be possible to use the various PICS/RDF mechanisms, defined for associating ratings and other metadata with Web content, to link stylesheets with documents.
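A processor might locate such processing instructions along the lines of the following Python sketch. It uses the standard library's Expat binding with namespace processing left off, since the target contains a colon, and a simple regular expression for the pseudo-attributes. The document element doc is invented, and the regular expression is a simplification that handles only double-quoted values.

```python
import re
import xml.parsers.expat

# <doc/> is a hypothetical document element.
DOCUMENT = """<?xml version="1.0"?>
<?xml:stylesheet href="mystyle.xsl" type="text/xsl"?>
<doc/>"""

# Simplified: name="value" pairs with double quotes only.
PSEUDO_ATTRIBUTE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def find_stylesheets(text):
    """Return one dict of pseudo-attributes per stylesheet instruction."""
    found = []
    parser = xml.parsers.expat.ParserCreate()  # no namespace processing

    def on_processing_instruction(target, data):
        if target == "xml:stylesheet":
            found.append(dict(PSEUDO_ATTRIBUTE.findall(data)))

    parser.ProcessingInstructionHandler = on_processing_instruction
    parser.Parse(text, True)
    return found


stylesheets = find_stylesheets(DOCUMENT)
```

Because the function collects every matching instruction, a document that names several stylesheets yields several entries, in document order.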

The general model of XSL stylesheet processing is that an XML page is combined with additional associated information (style information in this case) by a specific processor (a "rendering engine" in this case) that uses the additional information to do something with the XML (in this case, producing a display), where this "something" can involve adding behavior. Consideration has frequently been given to generalizing this model to handle additional types of processing. Two ways of generalizing this model can be thought of as either (a) generalizing the "rendering" itself (the process that is applied), or (b) generalizing what the "rendering" produces. These types of generalization are discussed in the next two sections.

3.2.2.2 Stylesheets as a Behavior Attachment Mechanism
As noted above, XSL could allow scripts to be specified that are intended to be part of the document content, i.e., inserted into the document, for execution by the browser, as HTML scripts are now. In the original submission, this could be done by outputting a <SCRIPT> flow object (one of several flow objects specially defined in the submission to allow the production of HTML output) during stylesheet processing. For example, the following XSL rule (from the Microsoft XSL tutorial; note the difference in syntax from that used in the WD) generates a <SCRIPT> element in the HTML document output that allows the user to highlight an element by clicking on the output from that element.
<xsl>
 <rule>
  <root/>
  <HTML>
   <HEAD>
    <SCRIPT LANGUAGE="JSCRIPT"><![CDATA[
     function hiLite(e) {
      if (e.style.backgroundColor != 'yellow')
       e.style.backgroundColor = 'yellow';
      else e.style.backgroundColor = 'white';
    } ]]></SCRIPT>
   </HEAD>
   <BODY>
    <children/>
   </BODY>
  </HTML>
 </rule>

 <rule>
  <target-element type="item"/>
  <DIV id='=tagName + formatNumber(childNumber(this),"1")'
    background-color="yellow"
    onClick='="hiLite("+ tagName + formatNumber(childNumber(this),"1")+")"'>
    <children/>
  </DIV>
 </rule>
</xsl>
This effectively defines a mechanism for associating behavior with XML pages (and specific elements in pages) by using the stylesheet as a repository of object behavior, and dynamically associating it with specific parts of the pages during stylesheet processing. The stylesheet acts as a form of object "class", since the same stylesheet can be associated with multiple pages. This approach involves the use of HTML as an intermediate output format, and so is not "pure XML", but the generation of intermediate HTML may be a common scenario in commercial Web browsers for some time.

Behavior to be included as part of the output document content could also be implemented using languages other than ECMAScript, and techniques other than scripting. For example, such behavior could also be specified by defining stylesheet rules that cause links to, e.g., Java objects (corresponding to HTML <OBJECT> elements), scriptlets, etc. to be included in the output page. Use of these techniques in "pure XML" requires an XML-native target representation for scripts or other behavioral components (e.g., the use of external entities with appropriate NOTATION specifications, as described in Section 3.2.1), together with appropriate processors for XML containing those behavioral components.

A recent Netscape submission to W3C takes this general approach further by specifying the concept of Action Sheets <http://www.w3.org/TR/NOTE-AS>. The submission notes that many current Web pages look more like full-fledged software programs than declarative specifications of document structure, due to the fact that an increasing fraction of the content of these pages consists of blocks of script, most of which run in response to user events. Action sheets attempt to provide a mechanism for consolidating the script-encoded behavior of document elements in a reusable package, separate from the structural definition of a document. The concept is particularly useful for XML since, as noted above, while HTML currently contains a SCRIPT element, and HTML elements may contain attributes that specify event handlers (onClick, onMouseOver, etc.), XML currently mandates no specific way of integrating an external scripting language. In the same way that external stylesheet rules can associate presentation properties with specific XML elements, external action sheet rules can associate arbitrary event handlers with specific XML elements or classes of elements.

An action sheet contains a set of productions (rules) somewhat similar in form to XSL template (formatting) rules. Simplifying somewhat, a rule contains a selector (pattern) which defines the document elements to which the rule applies, and an action, which specifies the script to be run for a given action (e.g., event such as onClick). Action sheets would be associated with XML documents in the same way as stylesheets.
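The selector/action structure of such rules can be suggested with a small Python sketch. This is not the Action Sheets syntax from the Netscape submission: selectors are reduced to element names, and the class, the rule, and the sample document are all invented for illustration.

```python
import xml.etree.ElementTree as ET


class ActionSheet:
    """A set of (selector, event, handler) rules kept outside the document."""

    def __init__(self):
        self._rules = []

    def add_rule(self, selector, event, handler):
        self._rules.append((selector, event, handler))

    def dispatch(self, element, event):
        """Run every handler whose rule matches the element and event."""
        return [handler(element)
                for selector, rule_event, handler in self._rules
                if selector == element.tag and rule_event == event]


sheet = ActionSheet()
sheet.add_rule("item", "onClick", lambda e: "highlight " + (e.text or ""))

# Hypothetical document: it contains no script or event attributes at all.
document = ET.fromstring(
    "<list><item>first</item><item>second</item><note>n</note></list>")
outcomes = [sheet.dispatch(child, "onClick") for child in document]
```

The same sheet object could be applied to any number of documents, which is the sense in which the behavior is factored out of the document structure.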

While action sheets are conceptually similar to stylesheets, there are also important differences:

The use of action sheets would make it unnecessary to modify existing DTDs to provide mechanisms for including scripts with documents. Moreover, action sheets allow script-encoded behaviors to be factored and used for elements across multiple documents, similar to the way object classes allow behavior to be shared by multiple objects.

Due to the similarity between stylesheets and action sheets, there is some question as to when authors would use stylesheets and when they would use action sheets. This question will be investigated within the W3C groups working on stylesheet capabilities.

Microsoft has also defined a related technology in Internet Explorer 5.0 called DHTML Behaviors <http://www.microsoft.com/sitebuilder/magazine/ie5behave.asp>. This allows a scriptlet (see Section 3.1.4.2) to be associated with a particular document element using a Cascading Stylesheets (CSS) stylesheet, to provide default behavior for that element. Microsoft has indicated that this technology is being submitted to the W3C. This technology could be extended for use with XSL as well as CSS, since the W3C has indicated an intent to coordinate CSS and XSL properties and objects, with the goal of defining a common underlying formatting model.


3.2.2.3 Stylesheets as an Object Construction Mechanism
An alternative to considering XSL stylesheets as containers for object behavior is to consider them as a way to define a transformation that converts an XML document containing both state and behavior (or containing links to state and behavior) into an arbitrary collection of objects at the client. This is based on the fact that, as described above, XSL stylesheets provide:

The idea is that by extending the set of formatting objects that could be referenced as output in construction rules, and by extending how the "rendering" of those formatting objects takes place, the idea of "stylesheet processing" could be generalized to produce almost anything desired. For purposes of object construction, the desired output is an object or set of objects formed from the XML that implement application-specific semantics. The idea, then, is to characterize the construction of a set of application-oriented objects from information in an XML document by defining a set of formatting (or other result) object classes appropriate to the construction of application objects (instead of objects appropriate to the construction of formatted page output), and defining the appropriate transformations. This would integrate object construction into a general document processing architecture.

For this purpose, the XSL would define the overall transformation between the input XML and the result objects, while the combination of the XML and the result object classes would contain the information to be used in the construction (the state and behavior). The construction mechanism could be entirely declarative, using XSL construction rules, or the construction could be augmented by behavioral specifications in the form of scripts and/or result object methods to perform additional construction activities. "Rendering" in this case could be simply the instantiation of the result objects themselves, together with the provision of an API so they could be accessed at run time. A browser could serve as the "host" for these result objects, and provide access to them via either DOM-based or application-specific object interfaces. Alternatively, specialized "rendering engines", referenced by the result objects (and made available to the browser, e.g., as plug-ins) could be invoked to do further processing on the result objects.

This approach is not unrealistic, since one possible form for an XSL processor would consist of a core which implements the tree transformations, together with a number of specialized backend processors for rendering the XSL-defined flow object classes to various media (e.g., one backend for rendering flow objects to a display, another for rendering flow objects to printers, etc.). All that would be required would be for such a processor to provide a way to augment the "flow object classes" it recognized, and the specialized backend processors it called to "render" them.
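The core-plus-backends arrangement might be sketched in Python as follows. The class, the flattened list standing in for a result tree, and the purchase-order object class are all assumptions made for the example; no existing XSL processor is being described.

```python
class StylesheetProcessor:
    """Core that hands each result object to a registered backend."""

    def __init__(self):
        self._backends = {}  # result object class name -> renderer

    def register(self, class_name, renderer):
        # Augmenting the recognized classes is just another registration.
        self._backends[class_name] = renderer

    def render(self, result_objects):
        output = []
        for class_name, properties in result_objects:
            renderer = self._backends.get(class_name)
            if renderer is None:
                raise LookupError("no backend for " + class_name)
            output.append(renderer(properties))
        return output


processor = StylesheetProcessor()
# A conventional formatting backend ...
processor.register("block", lambda p: "[block] " + p["text"])
# ... and a hypothetical application-specific one.
processor.register("purchase-order", lambda p: {"order": p["id"]})

rendered = processor.render([
    ("block", {"text": "Hello"}),
    ("purchase-order", {"id": "42"}),
])
```

From the core's point of view there is no difference between the two backends; "rendering" a purchase-order simply yields an application object rather than display output.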

A step along these lines is a W3C submission from Hewlett-Packard describing Spice <http://www.w3.org/pub/WWW/TR/NOTE-spice>. Spice is a combination of ideas from DSSSL, CSS, and JavaScript. Spice is an extended version of ECMAScript, designed to make it simple to apply style and behavior to XML documents (it can also be applied to HTML). Spice supports cascading style rules using the same syntax as CSS. These rules name the flow object class to be used to format each element. However, Spice supports not only the predefined set of CSS flow objects, but also downloadable sets of extended flow objects which can be written in Spice itself, or in Java or ActiveX. These flow objects can exploit the full capabilities of the Document Object Model in processing the document content. To further control behavior, event handlers can be written to script flow objects. This allows the document to be dynamically altered after it has been loaded. (Spice differs from XSL primarily in using CSS syntax for style rules and properties.) The W3C Spice submission indicates that HP is developing a Spice compiler as a plug-in.

Further along these lines, discussion on the xml-dev email list has mentioned the idea of a general mechanism for declaring which flow objects a given stylesheet is using, and where to acquire a "rendering engine" (processor) for those objects. Specific industry groups or vendors could then define specialized sets of flow objects, and rendering engines for them, which could be used as the basis for complex applications. In this approach, for example:

 


3.3 XML Object Serialization

Section 3.2.2.3 discussed the use of stylesheets as a way to "attach behavior to XML" by generalizing the concept of "rendering" to create application-specific objects from the combination of the XML document and information in the stylesheet. In this section, we describe several other technologies for constructing application-specific objects using XML documents, but which do not involve the use of stylesheets.

These technologies represent another point on the spectrum of techniques for associating behavior with the elements in XML documents, based on element semantics. The simple end is for a separate application to use the XML document as input, and to perform appropriate actions based on the element tags. What might be thought of as a "mid point" in the spectrum is the W3C Document Object Model, which involves the generation of a collection of objects of generic types, with generic behaviors, based on the contents of an XML document. What we are discussing here (and what was discussed in Section 3.2.2.3) are techniques for creating, from an XML document, one or more objects whose types are based on the specific element types contained in the XML document. This can be thought of as specializing the DOM to produce application-specific objects (as discussed in [Man98]). However, creating these specialized objects need not necessarily involve actually generating the DOM objects first, and creating the application-specific objects from them. Instead, the specialized objects could be created directly based on the XML elements.

These ideas can be illustrated by the example of an XML document which represents a poem. The collection of DOM objects representing this document would have roughly the following structure:

document |-> Element |-> Element |-> Element 
                     |           |-> Element 
                     |           |-> Element |-> Element 
                     | 
                     |-> Element |-> Element 
                                 |-> Element 
                                 |-> Element 
                                 |-> Element
The Element entries are generic DOM objects, all of which have the same behavior (although they would each have distinct state; in particular, it would be possible to determine the distinct element type, i.e., the tag, of each element).

What we would like to have instead is an object structure where the object types are based on the specific element types in the XML document, and where the resulting objects contain type-specific behavior corresponding to those element types. In this case, the resulting structure might be:

    poem |-> front   |-> title
         |           |-> author
         |           |-> revision-history |-> item
         |
         |-> body    |-> stanza
                     |-> stanza
                     |-> stanza
                     |-> stanza
where poem, front, body, title, author, revision-history, and stanza are classes with type-specific behavior. Note that, in this case, the type-specific (poem) structure is in (roughly) one-to-one correspondence with the original XML element structure above. However, in general this need not be the case.
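
The direct construction of type-specific objects described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the poem text, the class names, the BINDINGS table, and the stanza_count method are all hypothetical, and element types with no binding fall back to a generic Node (playing the role of a generic DOM Element).

```python
import xml.etree.ElementTree as ET

# Hypothetical poem document; element names follow the structure shown above.
POEM_XML = """
<poem>
  <front>
    <title>An Example</title>
    <author>A. Poet</author>
    <revision-history><item>first draft</item></revision-history>
  </front>
  <body>
    <stanza>First stanza.</stanza>
    <stanza>Second stanza.</stanza>
  </body>
</poem>
"""

class Node:
    """Generic fallback, playing the role of a generic DOM Element."""
    def __init__(self, text, children):
        self.text = text
        self.children = children

class Poem(Node):
    def stanza_count(self):
        # Type-specific behavior: only a Poem knows how to count its stanzas.
        body = next(c for c in self.children if isinstance(c, Body))
        return len(body.children)

class Front(Node): pass
class Body(Node): pass
class Title(Node): pass
class Stanza(Node): pass

# Assumed binding from element tag names to classes (hypothetical).
BINDINGS = {"poem": Poem, "front": Front, "body": Body,
            "title": Title, "stanza": Stanza}

def build(element):
    """Create a typed object directly from an XML element, recursively."""
    cls = BINDINGS.get(element.tag, Node)
    children = [build(child) for child in element]
    return cls((element.text or "").strip(), children)

poem = build(ET.fromstring(POEM_XML))
print(type(poem).__name__, poem.stanza_count())
```

Note that no intermediate tree of generic objects is kept here: each typed object is created as its element is visited, which is one way of reading "created directly based on the XML elements".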

The section heading refers to "object serialization" because these techniques can be considered as ways to use XML as a representation for serialized objects (and object serialization is sometimes cited as one of the motivations for these techniques). In the case of OOXML, the analogy with "object serialization" is very straightforward: the XML constitutes the direct representation of the object, including elements that contain the object's state, and other elements that contain the object's methods. A method interpreter interprets the method code, which refers to the elements containing state when access to state is necessary.

In the case of MONDO and Coins, the XML acts as a sort of serialized representation for information that allows programming language objects to be constructed from it, using an XML parser together with one or more other components (e.g., a "builder", or "constructor" of some kind). However, "object serialization" is just one way of viewing these latter approaches. They can also be considered as alternative implementations of the generalized "rendering" process discussed in Section 3.2.2.3, in that they start with a document as a "web" of connected state and code, and build objects based on that information. This is also a form of "compilation", in that the process does not interpret the XML as objects directly, but rather builds objects in a different form that are then interpreted by a separate interpreter (whatever can interpret the resulting objects, e.g., the Java Virtual Machine, or the native hardware).


3.3.1 OOXML

OOXML <http://www.digitalairways.com/NiS/ParisXML98/> [Sil98] defines a straightforward way to define objects in XML, by directly adding elements containing method code to XML-represented data. The approach is illustrated by the simple example below, which shows a View method added to a PatientInfo element:
<PatientInfo>
  <FirstName>Donald</FirstName>
  <LastName>Duck</LastName>
  <Gender>Male</Gender>

  <Method>
    <Name>View</Name>
    <Author>NiS</Author>
    <Code>(lambda (this) (print "A Patient Info"))</Code>
  </Method>

</PatientInfo>
As the example illustrates, methods are added simply by using special tags to contain the information associated with the methods. Some of these tags (<Method>, <Code>) are required, so that the method interpreter can find the proper information. However, other metadata (such as <Author>) can also be added. Each XML element is considered an object. Hence, methods are objects too, and can have methods themselves. Lisp is used as the method language.

Method objects can be defined separately from the objects they belong to. This allows object methods to be collected in Stylesheet-like documents to form class-like aggregations of methods. To do this, a <MemberOf> element is added to the XML, as in:

<Method>
  <Name>View</Name>
  <MemberOf>PatientInfo</MemberOf>
  <Author>NiS</Author>
  <Code>(lambda (this) (print "A Patient Info"))</Code>
</Method>

<PatientInfo>
  <FirstName>Donald</FirstName>
  <LastName>Duck</LastName>
  <Gender>Male</Gender>
</PatientInfo>
Inheritance is supported, using an <isa> data member (element). In the following example, if a called method is not found in the PatientInfo object, the Person object will be searched to find it:

<PatientInfo>
  <isa>Person</isa>
  <FirstName>Donald</FirstName>
  <LastName>Duck</LastName>
  <Gender>Male</Gender>
</PatientInfo>
As the examples illustrate, the mechanism for attaching methods to XML used in OOXML is rather straightforward. The cited paper does not discuss many of the details required for a complete implementation. For example, the paper does not discuss object interfaces, how the interpreter for the code is invoked, how the methods access the XML (or other aspects of the invocation context), etc. However, given an appropriate interpreter for this representation, working these issues out could also be relatively straightforward. For example, a Lisp method could be given a DOM-like interface to the object of which it is a part. In this case, the implementation would be much like the implementation of the script-based components discussed in Section 3.1.4, with Lisp playing the role of the scripting language.
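
To make the lookup rule concrete, the following Python sketch resolves a method name by searching an object's own element type first and then following <isa> links. It is not taken from the cited paper: the <Objects> wrapper element, the Bill method, and the find_method function are assumptions made for illustration. The sketch handles only separately defined methods (those with <MemberOf>), and it returns the Lisp code as a string rather than interpreting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the <Objects> wrapper is an assumption, not part of
# OOXML; it simply holds separately defined methods and two objects.
DOC = """
<Objects>
  <Method>
    <Name>View</Name>
    <MemberOf>Person</MemberOf>
    <Code>(lambda (this) (print "A Person"))</Code>
  </Method>
  <Method>
    <Name>Bill</Name>
    <MemberOf>PatientInfo</MemberOf>
    <Code>(lambda (this) (print "Billing"))</Code>
  </Method>
  <Person/>
  <PatientInfo>
    <isa>Person</isa>
    <FirstName>Donald</FirstName>
  </PatientInfo>
</Objects>
"""

root = ET.fromstring(DOC)

# Index method code by (owner element type, method name).
methods = {
    (m.findtext("MemberOf"), m.findtext("Name")): m.findtext("Code")
    for m in root.iter("Method")
}

# Record each element type's parent, taken from its <isa> child if present.
parents = {e.tag: e.findtext("isa") for e in root if e.tag != "Method"}

def find_method(type_name, method_name):
    """Search the object's own type first, then follow <isa> links upward."""
    while type_name is not None:
        code = methods.get((type_name, method_name))
        if code is not None:
            return code
        type_name = parents.get(type_name)
    return None

print(find_method("PatientInfo", "View"))
```

Here View is not defined for PatientInfo, so the search continues to Person and returns the code string found there. Passing that string to a Lisp interpreter, along with some DOM-like handle on the object, would be the remaining step.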


3.3.2 Coins

The idea behind Coins <http://www.jxml.com/coins/> is the use of coupling between XML elements and JavaBeans as a program construction mechanism. Although Coins originally defined an XML-based object serialization mechanism, Coins is pursuing a more general idea, namely the concept of an interconnected web of XML documents, where the documents themselves define the various activities and states of a distributed processing system. In Coins, a program consists of a collection of "runtime documents", each document having a tree of component instances. The components may reference other components in the same document, or components in other documents. Coins basically uses XML both to describe individual components, and to describe how a set of such components should be composed into a complete application.

According to recent information from its developer, Bill la Forge, Coins currently has several parts:

A Mint utility is also provided. This is a programming tool that supports the creation of application-specific coins classes. The input to Mint is an XML document which is loosely based on XSchema (see Section 2.1.4), and describes the coin to be created. The output is the Java source code for a coin class which has the following:

After generating a coin class with Mint, a programmer then extends that class (subclasses it), adding application-specific logic and validation requirements. The programmer must then also update the application bindings document to map from a given element tag name to the application-specific class.


3.3.3 MONDO

MONDO <http://www.chimu.com/projects/mondo> is a generalized architecture for encoding, modeling, and processing information which integrates object-oriented information modeling and descriptive markup. MONDO describes a set of components (e.g., ObjectBuilder, DomainModel, ObjectEncoder), their responsibilities, and the interfaces among those components. It is meant to be open and language neutral. One of the things MONDO can do is serialize and deserialize Java objects to human-readable encodings either in XML, or in a slightly different (but similar) markup language called OML (Object Markup Language).

MONDO has three major subsystems: the ObjectBase, the ObjectBuilder, and the ObjectEncoder. An ObjectBase captures the objects needed to represent a particular part of the world in a computer. The representation is divided between the DomainModel, which captures the static properties (e.g., associations and operations--the classes) of the ObjectBase, and the DomainObjects, which capture the current dynamic state (the instances) of the ObjectBase.

MONDO recipes are instructions for building DomainObjects, which together represent knowledge within a computer. Recipes are serialized as XML or OML text files. MONDO creates a recipe from one of these files by parsing it.

The job of the ObjectBuilder is to build an ObjectBase from an external source. Generally this source will be one of these XML or OML files. Using such a file, the ObjectBuilder reads the text file, parses the text to create a recipe (defining what objects to build, and what pieces to use), and uses the recipe to construct objects within the ObjectBase. The job of the ObjectEncoder is to save sufficient information about an ObjectBase to an external repository so that a similar ObjectBase can be recreated later using an ObjectBuilder.

Construction of runtime objects by the ObjectBuilder usually involves the use of Factories that know how to build particular types of objects. A Factory is an object that can build other objects. Factories support a special Builder-to-Factory interface that allows them to be used by the ObjectBuilder to perform specialized object construction. This is done by configuring the ObjectBuilder to use mappings from recipe names to Factories. As the ObjectBuilder processes the recipe, it calls the appropriate Factory.
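
The relationship between the ObjectBuilder and its Factories can be illustrated with a small Python sketch. The method names (register, build, build_all), the Period fields, and the reduction of a parsed recipe to (name, ingredients) pairs are all assumptions made for illustration; they are not MONDO's actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class Period:
    start: str
    end: str

class PeriodFactory:
    """A Factory: an object that can build other objects."""
    def build(self, ingredients):
        return Period(ingredients["start"], ingredients["end"])

class ObjectBuilder:
    def __init__(self):
        self.factories = {}  # recipe name -> Factory

    def register(self, recipe_name, factory):
        # Configure the mapping from a recipe name to the Factory to call.
        self.factories[recipe_name] = factory

    def build_all(self, recipe):
        # Call the Factory configured for each named step of the recipe.
        return [self.factories[name].build(ingredients)
                for name, ingredients in recipe]

# A parsed recipe, reduced here to (name, ingredients) pairs (hypothetical).
recipe = [("Period", {"start": "1998-01-01", "end": "1998-09-24"})]

builder = ObjectBuilder()
builder.register("Period", PeriodFactory())
object_base = builder.build_all(recipe)
print(object_base)
```

Keeping the name-to-Factory mapping as configuration, rather than hard-coding it in the builder, is what allows the same recipe to produce different runtime objects in different environments.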

A straightforward use of MONDO is as an object serialization mechanism. The following is a form of simple MONDO recipe (in OML) that represents a Java class:

<JavaClass
    name        = "COM.chimu.kernel.basicTypes.Period"
    version     = "v0.1"
    vmRequired  = "1.1"
    description = "This is a simple Period which uses 
                   java.util.Dates as its start and end values"
    bytecodes   = <Binary encoding=hex [[cafebabe...2000a]]>
>
Due to the use of explicit markup, this is readable to both Java and non-Java systems. A non-Java system may not understand the bytecodes, but it can understand everything else and usefully work with that information. Other declarations could be used to associate this class with a particular element type as its implementation. Other forms of implementation (e.g., a DLL) could also be represented in this way. A mapping between OML and XML has been defined; this is relatively straightforward, as the above example suggests.


3.4 XML as a Programming Language

Sections 3.1 and 3.2 discussed how behavior could be associated with HTML and XML pages using scripts written in various text-based scripting languages, such as ECMAScript, which are processed by an interpreter (e.g., in a Web browser) to produce runtime behavior. Another approach to defining behavioral specifications using a textual representation has also been investigated, namely using specialized (XML or other) markup to directly specify constructs in some programming language, then compiling and executing, or interpreting, that markup to produce runtime behavior. This section describes examples of this approach.

An issue, of course, is why XML should be used as the language with which to express programming language statements, as opposed to using a specialized language, such as a scripting language, contained as data in specialized tags (e.g., in <SCRIPT> tags), which references XML as data. An advantage that has been cited for using XML directly is that it is then possible to mix arbitrary amounts of code, comments, metadata, pragmas, etc. (e.g., safety and progress properties, preconditions and postconditions of method calls, stability properties of collections, temporal properties of dependencies) in an inherently extensible way. As the need/ability to use this extra information changes, it could be either considered or excluded from consideration by an interpreter in actually producing an execution. This approach could lead to highly reusable and adaptable behavioral specifications. The approach also represents a move toward a more declarative programming style. Of course, one could also define other types of behavioral specifications, such as state machines or workflows, in XML (and, with an interpreter, execute them if the specifications are detailed enough, or if the definitions only specify control among already-specified executable components).

Similar extensibility ideas apply to the use of XML as a representation for interfaces. Unlike a conventional IDL, the corresponding XML markup would be easy to extend. For example, if a given interface definition language represented by XML markup did not initially include the events raised by an object as part of the interface definition, it would be easy to add them by adding information in a new set of tags. These ideas are further explored in [Pre97]. This paper points out that, using XML for interface descriptions, "additional architectural constraints may be provided which currently are not enforced by any programming language", including such things as protocols (permissible sequences of method invocations) and design patterns.


3.4.1 Meta-HTML

Meta-HTML <http://www.metahtml.com/> is an example of how specialized tags can be used to represent programming constructs. The basic idea behind Meta-HTML is to consider ordinary HTML as a scripting language meant to be interpreted by an interpreter called a "browser". In this metaphor, the HTML tags define actions that specify and control output, receive input from the user, etc. Meta-HTML extends this metaphor by adding tags that specify iteration, data structures, flow control, etc. Meta-HTML tags are treated as functions, with attributes specifying arguments and return results. The intended use is to ease the writing of programs which produce HTML as their primary output format, by mixing control statements within native HTML statements. The Meta-HTML interpreter takes the mixture of statements, interprets non-HTML statements, and passes HTML to the browser to be processed in the normal way.

Meta-HTML includes statements (tags) for, among other things:

For example, in:
<set-var foo="Hello, world!">
<get-var foo>
the <set-var> function returns an empty string as its value (and hence does not affect what is displayed on the containing page), but as a side-effect assigns the string value to the variable foo. The <get-var> function referencing that variable returns the value of that variable to the output (the HTML displayed on the page), with the result that the page displays the string "Hello, world!".

Extending this example, the following "program" illustrates the representation of control flow statements in displaying "Hello, world!" in bold 5 times:

<set-var repeating-string="<strong>Hello, <em>world</em>!</strong>"
         count=5>
<while <gt count 0>>
  <get-var repeating-string>
  <br>
  <set-var count=<sub count 1>>
</while>
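
For readers more familiar with conventional languages, the same control flow can be written in Python as follows. This is only a reading aid: it mirrors the tags one for one, and it does not model the whitespace handling or other output details of an actual Meta-HTML interpreter.

```python
# Mirror of the Meta-HTML program above, written as ordinary Python.
repeating_string = "<strong>Hello, <em>world</em>!</strong>"
count = 5

output = []
while count > 0:                      # <while <gt count 0>>
    output.append(repeating_string)   # <get-var repeating-string>
    output.append("<br>")             # literal HTML, passed through unchanged
    count = count - 1                 # <set-var count=<sub count 1>>

html = "\n".join(output)
print(html)
```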

3.4.2 Curl

Curl <http://curl.lcs.mit.edu/curl/> is a language for creating Web documents with content ranging from simple formatted text to complex interactive applets. Curl provides a rich set of formatting operations similar to those implemented by HTML tags. However, unlike HTML, the Curl formatter can also be extended by users to provide additional functionality, from simple macros to direct control over the positioning of subcomponents. Using a Tk-like interface toolkit of interactive components, Curl makes it easy to build simple interactive web pages. Interactive objects like buttons or editable fields can be viewed as extensions to the basic formatting operations; the same syntax is used to create interactive documents, without the need to learn a separate scripting language.

Other components of an interactive document may require more sophisticated mechanisms than are provided by Curl's interface toolkit. These components can also be developed using Curl, since Curl is fundamentally an object-oriented programming language. Curl expressions, class definitions, and procedure definitions embedded in a Web document are compiled to native code by a built-in on-the-fly compiler, and then executed without the need for an interpreter. Curl provides many of the features of a modern object-oriented programming language, including multiple inheritance, extensible syntax, a strong type system that includes a dynamic "any" type, safe execution through encapsulation of user code, and extensive checking performed both at compile and run time. By using a simple, uniform syntax and semantics, Curl avoids the discontinuities experienced when having to combine HTML, JavaScript, Java, Perl, etc. to create complex Web functionality.

Curl resembles HTML because it can be used as a declarative language for text formatting. Arbitrary text is a valid Curl program, as in HTML, but markup is done using a Lisp-like prefix notation based on curly braces, e.g.,

{example {paragraph This text is {bold bold}.}}
The meaning of an expression in curly braces is determined by the value of the first symbol, bold in the example, which is obtained from the lexical environment in which the code appears. The Curl language processor looks this symbol up in the environment and locates a processing routine for that symbol. Several different kinds of processing routines are frequently used.

If the symbol is defined to be a procedure, then the expressions that follow are parsed as variable names and constants which are passed by value to the procedure, as in an ordinary programming language.

If the symbol is defined to be a form, then the text until the closing curly bracket is parsed into a sequence of strings and values determined by how many curly brace markups are found. In the above example the processing routine for paragraph would receive a vector of three values:

{enumerate
{paragraph "This text is "}
{paragraph <whatever bold returns>}
{paragraph "."}
}
Because a goal of Curl is to elevate the status of I/O compared to traditional languages, all Curl expressions are displayable. A file of Curl code has an implicit paragraph around it. The expression
{code The value is {+ 4 5}}
will display as
The value is 9.
Curl allows (almost) arbitrary combinations of text and program fragments. Semantically, Curl program fragments represent a statically typed, object-oriented language with semantics similar to C++.

Curl documentation notes that an equivalent could be obtained in XML by replacing Curl curly braces with XML markup, e.g., replacing

{bold Curl}
or
{+ 4 5}
by
<bold>Curl</bold>
or
<add>
   <arg>4</arg>
   <arg>5</arg>
</add>
or by using XML attributes, as in
<add arg1="4" arg2="5"/>
The basic idea is the same: an "application" in XML terms (an external interpreter) processes the markup, and returns a display. In some cases, the interpreter interprets the markup as simple text; in other cases, as instructions to perform computations.
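
A minimal Python sketch of such an external interpreter is shown below. It handles only the <add>/<arg> form and treats every other element as text to display; the evaluate function and the rule that an <arg> may contain a nested expression are assumptions for illustration, not part of Curl or of any XML standard.

```python
import xml.etree.ElementTree as ET

def evaluate(element):
    """Interpret an element either as a computation or as text to display."""
    if element.tag == "add":
        # Each <arg> is itself evaluated, so additions can nest.
        return sum(evaluate(arg) for arg in element.findall("arg"))
    if element.tag == "arg":
        nested = list(element)
        return evaluate(nested[0]) if nested else int(element.text)
    # Any other element (e.g. <bold>) is treated as simple text.
    return "".join(element.itertext())

print(evaluate(ET.fromstring("<add><arg>4</arg><arg>5</arg></add>")))
print(evaluate(ET.fromstring("<bold>Curl</bold>")))
```

The first call computes the sum and the second simply returns the text content, matching the two cases described above.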


3.5 Object Interface and Messaging Technologies

Previous sections have discussed XML and other Web technologies that primarily have to do with supporting the state and behavioral aspects of objects (although some of the technologies, such as Scriptlets, support the creation of complete objects, including their interfaces). This section describes some technologies that illustrate the use of Web-native technologies as a way to support other aspects of distributed object systems, specifically interface definition and inter-object messaging. The first two technologies, WebBroker and WIDL, use HTTP as their messaging protocol, and XML both as a way to describe object interfaces and as a representation for object request and response messages. The third technology, HTTP-NG, does things the other way around. Instead of building a distributed object system on top of the Web, HTTP-NG builds a distributed object system under the Web, and converts the current Web to an application of that distributed object system.


3.5.1 WebBroker

DataChannel's WebBroker <http://xml.datachannel.com/WebBroker/> represents an attempt to build a complete Web-native distributed object computing model, based on the use of XML and HTTP. The technology has been submitted to the W3C <http://www.w3.org/TR/1998/NOTE-webbroker>.

WebBroker defines DTDs for XML documents that represent serialized object method call and return messages between software component objects. A calling component sends an objectMethodRequest to another component, and receives an objectMethodResponse in return. WebBroker also uses XML to represent interface definitions for these objects.

In WebBroker, software components become URL-addressable HTTP resources. The Web client contains a Java applet which acts as a client-side broker for remote requests generated by local Java applets. This applet generates XML request messages from these requests and sends them to a server using the HTTP POST method. Request messages include a callback URL to identify the client. A Java servlet on the server formats the XML request into a call to the appropriate server resource. When the response is ready, it is formatted into an XML response message and sent back to the client using an HTTP POST method to the callback URL. The client contains an httpd server (a local HTTP server). This client-side httpd server accepts this response and passes it to the client-side Java applet. The intent is to be able to deal with both COM+ and CORBA objects using this approach.

The following (from the W3C submission) shows an example request message. This particular request is equivalent to a DCE RPC Request PDU.

POST /WebBroker/Application7/Class28/Instance980223 HTTP/1.0
Content-Length: 12345
Accept: text/x-WebBroker

<objectMethodRequest version="0.9">
  <baroque>
   <logicalThread causalityID="12345678-1234-1234-123456789ABC" />
  </baroque>

  <methodCall methodName="isSomethingInStateX">
     <int>-1234</int>
     <stringArray length="2">
        <string>blah blah</string>
        <string>and further more...</string>
     </stringArray>
  </methodCall>
</objectMethodRequest>
The <baroque> element contains information corresponding to the information in an HTML <HEAD>. In this case, the element contains information from a DCOM Object RPC protocol header. The value of the causalityID is a UUID. This identifies the thread that issued the request (so that the response can be associated with the proper thread when it is returned). <methodCall> contains the serialized input parameters which are being sent to the destination component over the Web. The methodName specifies the name of the method being called. Currently, the individual parameters are not individually named. Instead, the proxy and stub generated from the interface definition must marshal data correctly.

The example also illustrates the HTTP POST method used to send such a request to an objectReference. The object reference is a reference to a software component object somewhere on the Web, and is basically a URI (usually a URL), together with a type definition.
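
As a rough illustration of the marshalling step, the Python sketch below builds the body of a request like the one shown above. The build_request function and its fixed signature (one integer and one string array) are assumptions; in WebBroker the proxy would be generated from the interface definition. The <baroque> header and the HTTP POST itself are omitted.

```python
import xml.etree.ElementTree as ET

def build_request(method_name, number, strings):
    """Serialize one int and one string array as an objectMethodRequest."""
    request = ET.Element("objectMethodRequest", version="0.9")
    call = ET.SubElement(request, "methodCall", methodName=method_name)
    # Parameters are positional: they carry a type tag but no name.
    ET.SubElement(call, "int").text = str(number)
    array = ET.SubElement(call, "stringArray", length=str(len(strings)))
    for value in strings:
        ET.SubElement(array, "string").text = value
    return ET.tostring(request, encoding="unicode")

body = build_request("isSomethingInStateX", -1234,
                     ["blah blah", "and further more..."])
print(body)
```

Because the parameters are unnamed, the receiving stub must read them back in exactly the order in which the proxy wrote them.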

An objectMethodResponse message contains either the marshalled output parameters or an exception. XML-Data may be used in future versions to more strongly type the request and response document structures.

The W3C submission describing WebBroker notes several advantages to using XML in a distributed object architecture. For one thing, by using XML, both the CORBA Interface Repository and the COM+ type information in the Windows Registry could be redefined as a collection of interlinked XML documents available on a Web server, eliminating an unnecessary distinction between this metadata and other information. Another advantage is that currently both the (Windows) TypeLib APIs and the CORBA Interface Repository APIs define a very granular interface to information about software components. As a result, remotely accessing information via these APIs can require multiple calls, and considerable overhead. Recasting this information as structured XML documents would allow the Repository server to provide the same information in a single round trip. Using XML could also reduce the amount of code needed in lightweight Web clients to handle object messaging, since they will probably be able to process XML already, and it eliminates the need for extra code to support DCOM or CORBA syntax.

The submission also notes that WebBroker's basic interface definition ideas come from [Pre97], also cited in Section 3.4. The paper discusses construction of code generators to build interfaces to objects in various languages using SGML interface descriptions. The paper also discusses the association of methods with SGML documents (using SGML to define a scripting language along the lines described in Section 3.4).

WebBroker essentially provides some of the basic facilities found in OMG's CORBA technology, but using Web protocols and data structures. Specifically, it illustrates

Like CORBA, in WebBroker the objects "stay put"; i.e., messages are sent to remote objects, rather than moving HTML (or XML) to the client as in conventional Web processing. Also like CORBA, WebBroker does not primarily deal with object implementations, but rather with defining their interfaces and supporting their messaging requirements.

UserLand Software has developed a similar technology, called XML-RPC <http://www.scripting.com/frontier5/xml/code/rpc.html>, for using XML messages and the HTTP POST method as the basis of remote procedure calls, as part of its Frontier 5 Web content development and management environment. In addition, Microsoft is developing a related protocol, called the Simple Object Access Protocol (SOAP), together with UserLand Software and DevelopMentor [InfoWorld, July 13, 1998]. The cited article states: "According to those who saw early demos, SOAP bridges Component Object Model (COM) and Distributed COM objects across the Web and runs natively in Windows NT, Windows 95, and Windows 98. Microsoft has also built SOAP connections to Internet Explorer and to Java, sources said." DataChannel is said to have expressed interest in working with Microsoft and the other vendors on a single XML-based RPC protocol. The idea is that development of a simple XML-based RPC protocol could create the basis of a widely-available "universal ORB" capable of interacting with objects in a wide range of different object models.
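
The flavor of an XML-based RPC message can be seen using the xmlrpc.client module in Python's standard library. That module implements the XML-RPC protocol as later published, which may differ in detail from the Frontier 5 version described here; the procedure name examples.add is hypothetical, and no network request is made.

```python
import xmlrpc.client

# Marshal a call to a hypothetical remote procedure named "examples.add".
request_xml = xmlrpc.client.dumps((4, 5), methodname="examples.add")
print(request_xml)

# Unmarshal it again, as a server would after receiving the HTTP POST body.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

The marshalled text is what would travel as the body of the HTTP POST; the receiving side recovers the method name and the parameter tuple from it.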


3.5.2 WIDL

The Web Interface Definition Language (WIDL) is commercial technology from webMethods, Inc. It is described in a submission to W3C <http://www.w3.org/TR/NOTE-widl>, as well as in [KR97]. WIDL is an application of XML which allows interactions with Web servers to be defined as functional interfaces. These interfaces can be accessed by remote systems using standard Web protocols, and provide the structure necessary for generating client code in languages such as Java, C/C++, COBOL, and Visual Basic. WIDL was described in [Man98]; a subset of that material is included below for completeness.

WIDL allows programmatic interfaces to be defined and managed for Web resources such as:

WIDL definitions provide a mapping between such Web resources and applications written in conventional programming languages such as C/C++, COBOL, Visual Basic, Java, JavaScript, etc., enabling automatic and structured Web access by compatible client programs, including mainstream business applications, desktop applications, applets, Web agents, and server-side Web programs (CGI, etc.). Using WIDL, programs can request Web data and services by making local calls to functions which encapsulate standard Web access protocols and utilize WIDL definitions to provide naming services, change management, error handling, condition processing and intelligent data binding. A browser is not required to drive Web applications. WIDL requires only that target systems be Web-enabled (there are numerous commercial products which allow existing systems to be Web-enabled).

A service defined by WIDL is equivalent to a function call in standard programming languages. At the highest level, WIDL files describe the locations (URLs) of services, input parameters to be submitted (via Get or Post methods) to each service, conditions for successful processing, and output parameters to be returned by each service. In much the same way that DCE or CORBA IDL is used to generate code fragments, or 'stubs', to be included in application development projects, WIDL provides the structure necessary for generating client code in languages such as C/C++, Java, COBOL, and Visual Basic. WIDL also provides XML and HTML parsing and pattern matching facilities to identify and extract specific data elements from Web documents.

The following example illustrates the use of WIDL to define a package tracking service for generic Shipping.

<WIDL NAME="genericShipping" TEMPLATE="Shipping"
      BASEURL="http://www.shipping.com" VERSION="2.0">

<SERVICE NAME="TrackPackage" METHOD="Get" 
         URL="/cgi-bin/track_package"
         INPUT="TrackInput" OUTPUT="TrackOutput" />

<BINDING NAME="TrackInput" TYPE="INPUT">
   <VARIABLE NAME="TrackingNum" TYPE="String" FORMNAME="trk_num" />
   <VARIABLE NAME="DestCountry" TYPE="String" FORMNAME="dest_cntry" />
   <VARIABLE NAME="ShipDate" TYPE="String" FORMNAME="ship_date" />
</BINDING>

<BINDING NAME="TrackOutput" TYPE="OUTPUT">
   <CONDITION TYPE="Failure" REFERENCE="doc.title[0].text" 
              MATCH="Warning Form" REASONREF="doc.p[0].text" />
   <CONDITION TYPE="Success" REFERENCE="doc.title[0].text" 
              MATCH="Foobar Airbill:*" REASONREF="doc.p[1].value" />
   <VARIABLE NAME="disposition" TYPE="String" REFERENCE="doc.h[3].value" />
   <VARIABLE NAME="deliveredOn" TYPE="String" REFERENCE="doc.h[5].value" />
   <VARIABLE NAME="deliveredTo" TYPE="String" REFERENCE="doc.h[7].value" />
</BINDING>

</WIDL>

In this example, the values defined in the TrackInput binding are passed via an HTTP Get message as name-value pairs to a service at http://www.shipping.com/cgi-bin/track_package. Object References such as doc.title[0].text are used in the TrackOutput binding to a) check for successful completion of the service, and b) extract data elements from the document returned by the HTTP request.
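
The mapping performed by the two bindings can be sketched in a few lines of Python. This is only an illustration of the idea, not generated WIDL stub code: the function names are hypothetical, the MATCH attribute is assumed to behave like a shell-style wildcard, and the "Unknown" result for a page that matches neither condition is an assumption of the sketch.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlencode

# Values copied from the WIDL example above.
BASE_URL = "http://www.shipping.com"
SERVICE_URL = "/cgi-bin/track_package"

# TrackInput binding: WIDL variable name -> form field name (FORMNAME).
TRACK_INPUT = {
    "TrackingNum": "trk_num",
    "DestCountry": "dest_cntry",
    "ShipDate": "ship_date",
}


def build_get_url(args):
    """Rename the caller's variables to form names and build the Get URL."""
    pairs = [(TRACK_INPUT[name], args[name]) for name in TRACK_INPUT]
    return BASE_URL + SERVICE_URL + "?" + urlencode(pairs)


def check_conditions(title):
    """Apply the two CONDITION patterns of TrackOutput to the page title.

    Assumption: MATCH is treated as a shell-style wildcard pattern.
    """
    if fnmatchcase(title, "Warning Form"):
        return "Failure"
    if fnmatchcase(title, "Foobar Airbill:*"):
        return "Success"
    return "Unknown"  # hypothetical fallback, not taken from WIDL


if __name__ == "__main__":
    print(build_get_url({"TrackingNum": "12345",
                         "DestCountry": "US",
                         "ShipDate": "1998-09-24"}))
    print(check_conditions("Foobar Airbill: 12345"))
```

A calling program sees only build_get_url and check_conditions; the form field names and the page layout stay in the binding tables, which is where a change to the Web site would be absorbed.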

Like WebBroker, WIDL essentially provides a CORBA-like technology, but using Web protocols and data structures.


3.5.3 HTTP-NG

The goal of the W3C's HTTP-NG project <http://www.w3.org/Protocols/HTTP-NG/> is "to design, implement, and test a new architecture for the HTTP protocol based on a simple, extensible distributed object-oriented model." HTTP-NG represents a longer-term solution to the Web's expansion to include more general distributed applications, based on the idea that layering these applications on top of the current HTTP will result in problems due to unnecessary performance costs, and lack of functionality and generality. By moving the Web to a generic distributed object system as a base, the HTTP-NG project hopes to enable these applications to use this generic distributed object system directly. In particular, the project would like the generic distributed object system to be simple, yet rich enough to meet the semantic and performance requirements of CORBA, DCOM, and Java RMI (without, however, unifying the object models of CORBA, DCOM, and Java RMI).

Specific issues being addressed by HTTP-NG include:

The activity involves two working groups. The Web Characterization and Testing working group is characterizing the kinds of tasks actually performed using HTTP, and the kinds of applications that are being deployed using it now and in the future. The Protocol Design and Prototyping working group is essentially testing the hypothesis: "Can a generic distributed object system be used as the foundation of the Web?" The group is working at three layers:

At the transport layer, the group has defined a MUX layer over TCP/IP to support multiple connections over a single TCP/IP connection, and bi-directional use of the connection for callbacks.
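
The general idea of multiplexing several logical connections over one byte stream can be illustrated with a toy framing scheme. The header layout below (a 2-byte session id followed by a 4-byte payload length) is invented for this sketch and is not the HTTP-NG MUX wire format; the function names are likewise hypothetical.

```python
import struct
from collections import defaultdict

# Hypothetical frame header: session id (2 bytes), payload length (4 bytes).
HEADER = struct.Struct("!HI")


def frame(session_id, payload):
    """Wrap one chunk of a logical session in a length-prefixed frame."""
    return HEADER.pack(session_id, len(payload)) + payload


def demultiplex(stream):
    """Split one byte stream back into the payload of each session.

    Assumes the stream contains only complete frames.
    """
    sessions = defaultdict(bytes)
    offset = 0
    while offset < len(stream):
        session_id, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        sessions[session_id] += stream[offset:offset + length]
        offset += length
    return dict(sessions)


if __name__ == "__main__":
    # Frames of two sessions interleaved on a single "connection".
    wire = frame(1, b"GET /a") + frame(2, b"GET /b") + frame(1, b" HTTP")
    print(demultiplex(wire))
```

Because every frame names its session, either end of the connection can start a new session at any time, which is the property that makes callbacks over the same connection possible.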

The document HTTP-NG Architectural Model <http://www.w3.org/TR/WD-HTTP-NG-architecture> describes the concepts, terminology, and a type system (object model) for HTTP-NG. The object model defined is relatively straightforward, supporting classes, interfaces described by an interface definition language, and interface inheritance, together with distributed garbage collection.

At the RPC layer, the group has defined a simple messaging model, together with an efficient binary wire protocol. This is described in the document HTTP-NG Binary Wire Protocol <http://www.w3.org/TR/WD-HTTP-NG-wire>.

At the Web interface layer, the document HTTP-NG: Web Interfaces <http://info.internet.isi.edu/in-drafts/files/draft-larner-nginterfaces-00.txt> describes a set of formal (object) interfaces (in terms of the HTTP-NG object model) that captures current Web functionality. The Architectural Model document also describes how other applications, such as WebDAV (distributed authoring), could be supported using this approach. WebDAV is seen as particularly interesting, in that it changes the nature of the Web from one where clients primarily read resources to one where clients also author them, and entails such issues as managing multiple simultaneous users trying to read and write the documents.

Success of HTTP-NG would mean a more complete insertion of object technology at the heart of the Web. One consequence of this would be the full integration of object interface concepts into the Web architecture, along with the (more or less existing) object messaging nature of the protocols. This would allow representation and implementation changes to take place behind these interfaces, and allow incorporation of other distributed object ideas more easily. A simple example of the benefits of including interfaces is the effect of defining DOM interfaces instead of just dealing directly with the HTML (or XML) representation. Concepts such as building documents on the fly, using non-HTML implementations/representations behind the interfaces, and supporting and accessing DOM objects on the server then become much more feasible. HTTP-NG would also provide a uniform object-oriented messaging protocol, on top of which arbitrary distributed object applications could be constructed (instead of mapping them into HTTP, as in WebBroker and WIDL), together with additional protocol-level efficiencies. Finally, HTTP-NG provides the potential for the Web to more efficiently (and directly) support higher-level integrations of Web and object technologies, such as the formation of objects using separate Web state and behavioral resources.
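
The DOM point can be made concrete with a short sketch. Python's standard xml.dom.minidom module is used here purely as a convenient stand-in for a DOM implementation (it is not mentioned in this report), and the helper names build_page and item_texts are hypothetical. The document is built on the fly and queried entirely through DOM interfaces, without the program ever handling markup text.

```python
from xml.dom.minidom import getDOMImplementation


def build_page(title, items):
    """Build a small document through DOM calls only (no markup strings)."""
    doc = getDOMImplementation().createDocument(None, "html", None)
    root = doc.documentElement

    head = doc.createElement("head")
    title_el = doc.createElement("title")
    title_el.appendChild(doc.createTextNode(title))
    head.appendChild(title_el)
    root.appendChild(head)

    body = doc.createElement("body")
    ul = doc.createElement("ul")
    for item in items:
        li = doc.createElement("li")
        li.appendChild(doc.createTextNode(item))
        ul.appendChild(li)
    body.appendChild(ul)
    root.appendChild(body)
    return doc


def item_texts(doc):
    """Read the list items back through the same interfaces."""
    return [li.firstChild.data for li in doc.getElementsByTagName("li")]


if __name__ == "__main__":
    page = build_page("Status", ["shipped", "delivered"])
    print(item_texts(page))
    print(page.toxml())  # serialization happens only when it is needed
```

Code written against item_texts does not care whether the document came from parsing a file or was constructed in memory, which is the kind of implementation freedom that interfaces provide.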


4. Using These Technologies

4.1 Applications for Web Object Models

In a general sense, we have been considering an object in a Web object model as being some unit combining state and behavior in the context of the Web, without being too specific about the nature of such units. It is important to consider the reasons for constructing these combinations. The addition of behavior to Web pages for at least some purposes needs little justification. The original (and still the most usual) behavior applied to Web pages was to display them. However, to increase the utility of Web data, people began to want to apply processing to that data, for the same reasons that people apply computer processing to any other kinds of data. Moreover, it was found most desirable to distribute the processing behavior along with the data, as a unit, largely for the same reasons that object technology has been found desirable in other applications: it guarantees that the intended processing semantics are applied to the data.

As a result, people have increasingly been adding behavior to Web pages for a wide range of purposes. Spicing up the presentations is one reason for this. However, increasingly people want to perform serious data processing on this data (perform calculations, validate entries, and other manipulations). In addition, people want to deliver services, not just data, e.g., order entry services, travel services, etc., involving access to data and operations potentially located remotely. For various well-known reasons, some of these services benefit from having at least some of their processing performed at the client side, e.g.:

In a conventional client-server architecture this would require permanent storage of the necessary behavior at the client. However, it is desirable to allow as-required delivery of behavior as well as data, and provide a convenient means to access that behavior. Java objects by themselves provide this type of capability; however, so do mechanisms for attaching behavior to Web pages, including both Java (again) and scripting.

A typical object-oriented 3-tier architecture used in enterprise computing applications is shown below. It uses objects at the top two (client and application server) levels, and a (typically relational) database at the bottom level. Relationships between object shells in the top tiers and database state are maintained by primary keys. Objects on the client are used for presentation purposes, while objects on the application server define business logic.

+---------------+               +------------+   +----+
| normal client |---------------| App Server |---| DB |
+---------------+               +------------+   +----+
                                      A

[Man98] noted that a straightforward form of Web/object integration is the use of Web pages to deliver objects to client processes. Orfali [OH98] shows how this would be done in a 3-tier client/server architecture which also uses CORBA technology. Orfali's main application example is a travel reservation system ("Club Med"). It is a typical 3-tier client/server architecture, in which a Web client is used to provide the client presentation. The client presentation (under the control of a Java Bean) provides multiple subforms (under control of individual Beans). Application logic resides on the 2nd tier, in the form of CORBA server objects, with a relational database on the 3rd tier. The server objects access information on the relational database as needed to perform application processing, and store it persistently in the database. The purpose of the Beans on the client is to permit client-side processing of various activities such as display of options, validation of input data, and running cost calculations, until the transaction to make the reservation is submitted.
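
The division of labor among the three tiers can be sketched as follows. This is a minimal single-process illustration, not the design in [OH98]: plain Python classes stand in for the client Beans and the CORBA server objects, an in-memory SQLite database stands in for the relational tier, and the class names, table layout, and nightly rate are all hypothetical.

```python
import sqlite3


def open_database():
    """Tier 3: a relational database holding the persistent state."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE reservation ("
               "id INTEGER PRIMARY KEY, guest TEXT, nights INTEGER, total REAL)")
    return db


class ReservationServer:
    """Tier 2: server object; its state is tied to the database by primary key."""

    def __init__(self, db):
        self.db = db

    def submit(self, guest, nights, total):
        cur = self.db.execute(
            "INSERT INTO reservation (guest, nights, total) VALUES (?, ?, ?)",
            (guest, nights, total))
        self.db.commit()
        return cur.lastrowid  # the primary key identifies the stored state

    def load(self, key):
        row = self.db.execute(
            "SELECT guest, nights, total FROM reservation WHERE id = ?",
            (key,)).fetchone()
        if row is None:
            return None
        return {"guest": row[0], "nights": row[1], "total": row[2]}


class ReservationForm:
    """Tier 1: client-side object that validates and prices before submitting."""

    NIGHTLY_RATE = 100.0  # hypothetical rate, for illustration only

    def __init__(self, server):
        self.server = server
        self.guest = ""
        self.nights = 0

    def errors(self):
        problems = []
        if not self.guest.strip():
            problems.append("guest name is required")
        if self.nights < 1:
            problems.append("nights must be at least 1")
        return problems

    def running_cost(self):
        return self.nights * self.NIGHTLY_RATE

    def submit(self):
        problems = self.errors()
        if problems:
            raise ValueError("; ".join(problems))
        return self.server.submit(self.guest, self.nights, self.running_cost())


if __name__ == "__main__":
    server = ReservationServer(open_database())
    form = ReservationForm(server)
    form.guest, form.nights = "Ada", 3
    print(server.load(form.submit()))
```

Validation and the running cost are handled entirely in the client object; the server and database are contacted only when the reservation is finally submitted.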

Key reasons cited for using Beans and CORBA are:

The Web is increasingly being substituted for, or added to, the client tier in enterprise architectures to, e.g., provide generalized interfaces to enterprise applications to end users. This can be done using either the approach described in [OH98], or by providing access from the Web server to existing applications via the CGI interface (this is often used when legacy systems are involved), as shown below.

                                      W
+------------+                  +------------+  +-----------+
| Web Client |------------------| Web Server |--| Web pages |
+------------+                  +------------+  +-----------+
                                      | CGI
                                      V
+---------------+               +------------+   +----+
| normal client |---------------| App Server |---| DB |
+---------------+               +------------+   +----+
                                      A

In supporting this type of integration, there is a tendency to merge the Web server (W) and the application server (A) in some way (either loosely or tightly). Oracle's Web Application Server <http://www.oracle.com/> is an example of this approach.

There is certainly a need for these types of architectures, and they are increasingly being used. The integration of the Web and CORBA distributed object systems, as described in [OH98], is particularly important, since it would be undesirable to have to unnecessarily reinvent such technology as CORBAservices (e.g., transactions) and Business Objects in the Web. However, these approaches integrate the Web and objects at a relatively shallow level.

For one thing, these types of architectures, and their associated data representations, do nothing to help integrate enterprise information captured in (typically relational) databases with either the vast amount of data on the Web, or with other forms of enterprise or Internet-accessible data. Generally, many people approaching the Web from an enterprise perspective tend to ignore the Web as a source for operational data (and Web-related representations as data representations for that data), considering the Web as only a presentation mechanism (either as a way to access operational data in enterprise databases, or to provide presentation-only data such as catalogs).

A deeper integration of Web and object concepts provided by a Web object model would allow us to approach the idea of using the Web to integrate all computer accessible data, and make it widely available to anyone needing it. The Web has already shown a considerable amount of success in moving in this direction, integrating not only database data, but also structured documents, email, images, etc. XML provides an increased capability for accomplishing this integration, and its use (or proposed use) in supporting an increasing range of data types and applications supports this idea. A Web object model based on XML plus associated behavior would provide the basis for a general object and data interoperability mechanism, combining the Web as a universal distributed database with an object service infrastructure. The analogy of the Web as a database suggests the importance of dealing with WebDAV-like issues, i.e., the ability to write and update Web data as freely as it is read today. The need for the transaction, versioning, and security mechanisms to support this is a major reason for wanting to integrate an object service infrastructure with the Web.

Such a Web object model would also provide a more flexible means of coupling behavior with that Web data than simply reading the data into a JavaBean or other conventional object when necessary (although this capability is needed too). This is desirable because it would allow us to treat Web behavioral content on a par with Web data content, in terms of its ease of creation, ease of access (and movement), and ease of composition, both with state and with other behavior. This requires the flexibility of being able to view state and behavior as separate pieces of content, but with a mechanism available for associating them as required. The use of Web technologies such as XML and scripting, together with components such as JavaBeans or script-based components, would allow the creation (and composition) of behavioral content by a wider variety of Web content creators and business application developers, with less need to use conventional programming. Web technologies also allow the required behavior to be delivered via the Web with the data, from virtually any source.

Finally, the use of objects in Web object models, such as coupled Web pages and behavior, should be pervasive throughout the architecture, rather than being restricted to client-side presentation aspects. As noted above, today's typical 3-tier Web architectures separate presentation (client interface), business logic, and data levels. Logically, this is perfectly reasonable. However, the same architectures emphasize separate technologies at each level:

This use of separate technologies at the various levels prevents flexible movement of functions between the various levels if this becomes necessary or useful. For example, it eliminates the possibility of downloading, in addition to just the client-side functionality, objects representing some or all of the server-side (or middle tier) functionality to a single machine, for use in disconnected operations.

The use of a general Web object model concept throughout the Web would provide for more flexible operations. This flexibility requires two basic capabilities:

All this can be provided with flexible capabilities for creating objects in Web object models, together with the connectivity and delivery capabilities provided by the Web. Note that this does not necessarily require that the data and behavior be distributed together (although they could be). Rather, the behavior and data may be distributed at different times, and at different frequencies, provided that they can be associated as necessary. For example, behavior used with multiple units of data (e.g., pages) could be cached where it is most frequently required, much as classes are essentially "cached" at the client in typical object DBMS architectures.
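
The separate delivery and late association of behavior and data can be sketched as follows. Everything here is hypothetical: the behavior name "order-total", the dictionary standing in for a remote behavior source, and the page layout are invented for illustration. The point is only that behavior shared by many pages is fetched once, cached, and bound to each page's state at the moment of use.

```python
class BehaviorCache:
    """Holds behavior by name so that it is fetched once and then reused."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that "downloads" behavior by name
        self._cache = {}
        self.fetch_count = 0  # counts actual fetches, for demonstration

    def get(self, name):
        if name not in self._cache:
            self._cache[name] = self._fetch(name)
            self.fetch_count += 1
        return self._cache[name]


# Hypothetical stand-in for behavior published somewhere on the Web.
BEHAVIOR_SOURCE = {
    "order-total": lambda state: sum(
        line["qty"] * line["price"] for line in state["lines"]),
}


def fetch_behavior(name):
    return BEHAVIOR_SOURCE[name]


def invoke(page, cache):
    """Associate a data page with its behavior only when it is needed."""
    behavior = cache.get(page["behavior"])
    return behavior(page["state"])


if __name__ == "__main__":
    cache = BehaviorCache(fetch_behavior)
    pages = [
        {"behavior": "order-total",
         "state": {"lines": [{"qty": 2, "price": 5}]}},
        {"behavior": "order-total",
         "state": {"lines": [{"qty": 1, "price": 7}, {"qty": 3, "price": 1}]}},
    ]
    print([invoke(p, cache) for p in pages], cache.fetch_count)
```

Each page carries only a reference to its behavior, so the data can be delivered frequently while the behavior is delivered rarely.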


4.2 Constructing Objects in "Real" Object Models

Many of the technologies described in Section 3 support the idea of adding behavior to Web pages in a general sense. Moreover, we can reasonably consider Web pages with attached behavior as forms of "weak objects", because they have identity (URLs), behavior, and state. These have shown themselves to be extremely useful in numerous applications. However, there may be additional advantages to be gained by forming "complete" Web objects, e.g., complete with interfaces, and defined according to some "real" object model such as JavaScript, Java, or CORBA IDL.

For example, ordinary Web pages with attached behavior are not internally very modular (although they could be created in a more modular fashion, e.g., locating scripts together on a single page). Providing "real" objects complete with interfaces using Web technologies could assist in addressing (in some instances) the modularity of data and code inside the page, together with coupling and interface issues. This is one of the reasons cited for the development of the script components described in Section 3.1.4. The DOM illustrates some of the advantages of being able to break up a Web page into multiple objects, and further developments of this technology could provide better support for the definition of application-specific, rather than generic, clusters of objects based on data contained in Web pages.

The use of interfaces has other advantages. Ordinary Web pages have no interfaces allowing their behavior to be directly called from other objects (e.g., remotely). The ability to form objects (units to which you send messages to invoke services) with interfaces would improve flexibility, by making it more transparent whether code is moved to the data, data to the code, message to a remote object, etc. In addition, the use of interfaces provides increased implementation independence (as in ordinary object technology) because "clients" become dependent only on the interfaces, not on how the functionality defined at those interfaces is implemented. For example, as noted earlier, a potential advantage of the DOM in providing a way to view a document as a collection of objects is that it allows multiple representations to be used in implementing a document. This is not to say that standard object interfaces should replace standard representations such as XML in the Web. Rather, standard interfaces and standard representations can both play useful roles in distributed computing architectures, in supporting interoperability and implementation flexibility. This is illustrated by the fact that not only are object interfaces being defined in Web technologies such as DOM, but XML is also being proposed for use in a number of recent OMG specifications. For example:

On a more global scale, a Web object model (or several interoperable Web object models) could be used as the basis of a unifying object model for the Web, along the lines of a global (or canonical) data model in a heterogeneous DBMS [SL90], allowing the whole Web to be viewed as objects in this model. Certainly the DOM suggests that this might be in some sense feasible, since it allows arbitrary Web documents to be viewed as sets of objects in a single model (and the DOM interfaces have been mapped to CORBA IDL and Java, two "conventional" object models). Moreover, HTTP-NG is pursuing the line of a single underlying object-oriented infrastructure for the Web (although this work operates only at the infrastructure level, and does not propose unifying all object models).

The Java, CORBA IDL, and JavaScript (ECMAScript) object models would be obvious candidates for such a global model, if this idea were pursued. Due to its flexibility, the JavaScript object model might match the requirements for an overall Web object model better than the "stricter" object models of conventional object-oriented programming languages. Certainly DHTML (and other technologies described in this report) illustrate the flexibility with which JavaScript can be used in associating Web state and behavior, and in pulling different "pieces" together (the individual pieces could be conventional programming language objects).

Whether this line is pursued or not, the technologies in Section 3 show that constructing objects in "real" object models from Web "pieces" is feasible. For example, JavaScript Beans (JSBs) and Scriptlets (Section 3.1.4) illustrated how objects in models conventionally used for programming language objects (Java and COM, respectively) can be defined using specialized XML markup and scripting as the representation for the object state, methods, and interfaces, and how these XML-based objects can become parts of more general object architectures. With the appropriate interpreters (or compilers), the approaches used in these technologies could also be used to directly construct objects in other object models, such as CORBA IDL. A further generalization of this approach could also support objects implemented using more general XML structures than the relatively simple ones allowed in JSBs and Scriptlets.
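
The general approach, an interpreter that turns markup plus script into an object in some target object model, can be sketched with Python as the target. The markup vocabulary below (component, property, method) is invented for this sketch and is neither JSB nor Scriptlet syntax. Properties are assumed to be integers, and method bodies are assumed to be single expressions over "self"; eval is used only to keep the example short and should never be applied to untrusted markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical component description; not JSB or Scriptlet markup.
COMPONENT_XML = """
<component name="Counter">
  <property name="count" value="0"/>
  <property name="step" value="2"/>
  <method name="next" script="self.count + self.step"/>
  <method name="doubled" script="self.count * 2"/>
</component>
"""


def make_method(script):
    """Turn a one-expression script into a method of the generated class."""
    code = compile(script, "<component script>", "eval")

    def method(self):
        # Illustration only: eval is unsafe for untrusted content.
        return eval(code, {"__builtins__": {}}, {"self": self})

    return method


def build_class(xml_text):
    """Construct a Python class from the XML description."""
    root = ET.fromstring(xml_text)
    # Assumption of the sketch: every property value is an integer.
    defaults = {p.get("name"): int(p.get("value"))
                for p in root.findall("property")}

    def __init__(self, **overrides):
        for name, value in {**defaults, **overrides}.items():
            setattr(self, name, value)

    namespace = {"__init__": __init__}
    for m in root.findall("method"):
        namespace[m.get("name")] = make_method(m.get("script"))
    return type(root.get("name"), (object,), namespace)


if __name__ == "__main__":
    Counter = build_class(COMPONENT_XML)
    c = Counter(count=5)
    print(c.next(), c.doubled())
```

The resulting class is an ordinary object in the target model; code that uses it does not need to know that its state, methods, and interface originated as Web content.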

However, while it might be somewhat easier for objects to interoperate if there were one global model, current bridge and other interoperability mechanisms might well make this unnecessary. For example, LiveConnect (3.1.3) demonstrated support for interoperability between Java and JavaScript objects, and also showed how what are effectively JSB or JavaScript "wrapper" objects could be used to provide interoperability with CORBA objects. WebBroker and WIDL (3.5) also illustrated basic wrappering concepts that could be used for objects in different object models. The sort of interoperability provided by these mechanisms is being increasingly used in the Web.

In the near term, there is no need to insist on a single implementation technique for the various "pieces" of Web object models, or even a single object model (although this could certainly be done in the context of a single organization's implementation strategy). This "narrowing down" should probably await further Web technical development, and the market technology "sorting out" process. What is necessary is that the various types of objects constructed using Web technologies be able to interoperate flexibly, both with each other and with conventional programming language objects. The technologies described in Section 3 illustrate increasing support for the type of interoperability required.


5. Conclusions

On the basis of the technologies described in this report, we can draw a number of general conclusions about Web object model construction.

There are a number of different ways to construct objects from Web components, and relevant technology exists to address the various "pieces" of the problem. Moreover, there is considerable development taking place along these lines, and relevant technology is being developed at a very rapid pace.

In particular, there are a number of pieces of Web technology with somewhat overlapping capabilities, and whose relationships with each other are not yet entirely clear, for example, XML DTDs, RDF, XML-Data, and XSchema as metadata representations, and embedded applets and scripts, stylesheets, "action sheets", etc. as behavior representation and attachment mechanisms.

In many cases these technologies represent separate lines of development, addressing only some of the required "pieces". However, the gaps between originally-separate lines of development are gradually being identified and filled in, ways of using them together are being developed, and additional organization is gradually being imposed on their use. At the same time, though, in many cases it is not yet clear what technology(ies) will "win".

The emphasis on "loose binding" and ease of content creation shown in SGML, XML, and the Web in general is important. The associated capabilities need to be considered carefully in integrating the Web and more conventional distributed object technologies such as CORBA.

Considering objects in terms of their various "pieces", as discussed in Section 2, provides a useful basis for considering more general object construction strategies. It provides a better picture of "objects" in the context of the Web, and is helpful in illustrating additional design options.

Technologies exist for constructing objects in "real" object models using Web technology. However, there is no need, at least in the near term, to insist on a single implementation strategy or object model for objects in the Web. Among other things, this could overly constrain development of new technology. Interoperability is the real requirement, and mechanisms exist to deal with many of the interoperability requirements. However, further integration and rationalization of the numerous technologies for dealing with the various "pieces" would ease the task of actually creating such objects, by simplifying design, and reducing the need to spread development over multiple technologies.


Acknowledgements: The author would like to express his thanks to the members of the OBJS team for discussions on various aspects of this report, and to participants in the xml-dev email list for enlightening discussions on various aspects of XML-related technology.


References

[App97] Apple Computer, Inc., Getting Started With WebObjects, [7009.02], 1997.

[Bos97] J. Bosak, XML, Java, and the Future of the Web, <http://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm>, 1997.

[Bry97] M. Bryan, SGML and HTML Explained, Addison Wesley Longman, 1997.

[Car98] D. Carlson, "Document Objects with Style", Object Magazine, 7(12): 14-15, February 1998.

[deB98] M. de Bruijn, "Internet Explorer 5.0--for Intranets Only?", WEBBuilder, 3(9), Sept. 1998, 25-28.

[DD94] S. J. DeRose and D. G. Durand, Making Hypermedia Work: A User's Guide to HyTime, Kluwer, 1994.

[DeR97] S. J. DeRose, The SGML FAQ Book, Kluwer, 1997.

[ECMA97] ECMAScript: A general-purpose, cross-platform programming language, Standard ECMA-262, European Computer Manufacturers Association, June 1997. Available via anonymous ftp from ftp.ecma.ch, library ECMA-ST, files E262-DOC.EXE or E262-PDF.PDF. See also http://www.ecma.ch.

[Esp98a] D. Esposito, "Server Scriptlets", Microsoft Interactive Developer, May 1998, 12-21.

[Esp98b] D. Esposito, Instant DHTML Scriptlets, Wrox Press Ltd., 1998.

[Fla97] D. Flanagan, JavaScript: The Definitive Guide, 2nd. Edition, O'Reilly, 1997.

[Hal98] Marty Hall, Core Web Programming, Prentice Hall, 1998.

[Har98] E. R. Harold, XML: Extensible Markup Language, IDG Books, 1998.

[Hol98] S. Holzner, XML Complete, McGraw-Hill, 1998.

[Isa97] S. Isaacs, Inside Dynamic HTML, Microsoft Press, 1997.

[ISO96] International Standard ISO/IEC 10179:1996(E), Information Technology--Processing languages--Document Style Semantics and Specification Language (DSSSL).

[KR97] R. Khare and A. Rifkin, "XML: A Door to Automated Web Applications", IEEE Internet Computing, 1(4), July-August 1997, 78-87.

[Lig97] R. Light, Presenting XML, Sams.net Publishing, 1997.

[Man97] F. Manola (ed.), "NCITS Technical Committee H7 Object Model Features Matrix", X3H7-93-007v12b, May 25, 1997, http://www.objs.com/x3h7/h7home.htm.

[Man98] F. Manola, Towards a Web Object Model, <http://www.objs.com/OSA/wom.htm>, February 1998.

[McG98] S. McGrath, ParseMe.1st: SGML for Software Developers, Prentice Hall, 1998.

[Meg98] D. Megginson, Structuring XML Documents, Prentice Hall PTR, 1998.

[Nic98] D. Nickerson, Official Netscape JavaBeans Developer's Guide, Ventana Communications Group, 1998.

[OH98] R. Orfali and D. Harkey, Client/Server Programming with Java and CORBA (2nd Edition), John Wiley & Sons, 1998.

[OMG97] Object Management Group, A Discussion of the Object Management Architecture, June, 1997, http://www.omg.org/library/omaindx.htm.

[Ous98] J.K.Ousterhout, "Scripting: Higher-Level Programming for the 21st Century", IEEE Computer, 31(3), March 1998, 23-30. See also http://www.scriptics.com/people/john.ousterhout/scripting.html.

[Pre97] P. Prescod, "Software Component Interface Description in XML", SGML/XML '97 Conference Proceedings, Graphic Communications Association, December 1997. See also <http://itrc.uwaterloo.ca/~papresco>.

[Sil98] N. Silberzahn, "Dealing with the Electronic Patient Record Variability: Object Oriented XML", presentation to the workshop "SGML/XML in Healthcare", GCA SGML/XML Europe '98 Conference, Paris, May 1998 <http://www.digitalairways.com/NiS/ParisXML98/>.

[SL90] A. Sheth and J. Larson, "Federated Database Systems for Managing Distributed, Heterogeneous and Autonomous Databases", ACM Computing Surveys, 22(3), Sept. 1990, 183-236.

[StL97] S. St. Laurent, Dynamic HTML: A Primer, MIS:Press, 1997.

[StL98] S. St. Laurent, XML: A Primer, MIS:Press, 1998.

[VV97] E. Vander Veer, Picking the Newest Crop of Beans, http://www.sigs.com/publications/objm/9711/vanderveer.html.


This research is sponsored by the Defense Advanced Research Projects Agency and managed by the U.S. Army Research Laboratory under contract DAAL01-95-C-0112. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, U.S. Army Research Laboratory, or the United States Government.

© Copyright 1998 Object Services and Consulting, Inc. Permission is granted to copy this document provided this copyright statement is retained in all copies. Disclaimer: OBJS does not warrant the accuracy or completeness of the information in this survey.

This page was written by Frank Manola. Send questions and comments about it to fmanola@objs.com.

Last updated: 9/24/98 fam