CALTECH INFOSPHERES PROJECT

POSITION PAPER

K. Mani Chandy, Adam Rifkin, and the Infospheres Group,
California Institute of Technology

Introduction

The Infospheres project researches compositional ways of obtaining high confidence in dynamically-reconfigurable scalable distributed systems.

Overview. This paper briefly describes the four main attributes of the distributed systems we are exploring: compositionality, scalability, dynamic reconfigurability, and high confidence. We present some of the issues we are considering in our research on such systems, and provide links to our further explorations of these issues.

Compositionality. We believe in the near-future application of the Internet as a communication substrate for the provision of a worldwide pool of distributed persistent interacting objects. Distributed systems constructed on-the-fly from a pool of globally available objects are naturally compositional: objects that can "discover" each other can become the components of temporarily collaborating applications that can be restructured before and during task execution. We are exploring ways to support the dynamic composition of objects over the Internet.

Scalability. We envision that everyone who uses a computer will "own" dozens of objects providing an assortment of personally-customized functionalities; the result will be billions of objects in the worldwide pool that can potentially interact with each other. We foresee the worldwide pool of objects developing as a controlled anarchy grounded in open standards, much in the same way a worldwide pool of documents continues to grow in the World Wide Web. We are investigating the provision of scalable solutions by supporting the decentralized development of objects and their interactions.

Dynamic Reconfigurability. The familiar yet difficult problem of legacy systems is exacerbated in a worldwide pool of potentially long-lived objects, because the environment in which an object is active can change significantly over the course of that object's lifetime. Objects living for years may have to adapt to interact with new classes of objects. We are researching ways to reconfigure objects (and the connections between objects) while they execute.

High Confidence. We consider the rapid adoption of the telephone to be due in part to consumer confidence in its usability (achieved by having a small number of interfaces between telephones) and in its reliability (achieved by having a small number of companies making and installing telephones). Similarly, the rapid adoption of distributed object (or agent) technology also requires that the public gain confidence in the usability and reliability of the technology.

Reliability is more difficult to achieve in the case of a worldwide pool of interacting objects than in telephony, both because objects encapsulate many more potential functionalities, and because the object marketplace itself is more anarchic and dynamic. The desired "cottage industry of software components" will need to allow for many different groups --- from individual high-school students to large software houses --- to add classes and objects to the pool.

Indeed, systems with the attributes that we are investigating (compositionality, scalability, and dynamic reconfigurability) make the provision of high confidence more difficult; however, the integration of these attributes with high confidence is essential to user and developer adoption. To take one example, we require that our systems adapt to environment changes, but we must ensure that they evolve according to the constraints imposed by designers, and not in an ad hoc manner. We are researching ways in which high confidence can be engendered in collaborations between components drawn from a worldwide dynamic (and often anarchic) pool.

Theory and Implementation. Our group works on both theory and implementation. We use temporal logic for reasoning about the correctness of distributed object systems and stochastic processes for reasoning about performance and reliability. In June 1997, we released Infospheres 1, a preliminary version of our infrastructure for developing global distributed systems using Java and Internet technologies. In 1998, we will release a more flexible, robust Infospheres 2; although the implementation uses Java, XML, and Internet technologies, the ideas are applicable to distributed object systems based on other tools such as CORBA and DCOM.

Research Problems

Although space considerations prevent us from discussing all of the problems that we are investigating, we briefly describe some of the key issues in our current research.

Discovery. In a Web with millions of classes and billions of objects, an important issue is adding classes and objects to the Web so that other objects can find the classes for downloading and can find the objects for interactive computing. The research questions we are addressing are: how can items be named uniquely, how can items be published for easier remote discovery, and how can items be described so that they can be found?

Specifications. An object searching the Web for a class carries out the search using a specification, however informal, of the desired class. The specification may range from a set of desired attributes to a temporal logic formula. We are studying methods of specification to support mechanical searching and retrieval of classes and objects.

Verification. One approach to supporting such searches is to publish specifications on the Web in addition to publishing classes and objects. We use relationships called conformances between classes and specifications, and refinements between specifications. An individual or organization can assert, using formal verification or testing, that a class conforms to a specification, or that one specification is a refinement of another. The conformance relationship also indicates the identity of the organization that asserts the conformance, and other information such as the methods used to determine conformance. When an object finds a class that satisfies a specification, it can use the degree of trust in the organization that asserts conformance to decide whether to use that class.
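As an illustration only, the following Python sketch shows how a searching object might combine conformance assertions, refinements, and trust. The numeric trust scores, the threshold, the example URNs, and all class and function names are hypothetical. The sketch also assumes that a class conforming to a specification also satisfies every specification that it refines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conformance:
    class_name: str   # class asserted to conform
    spec_urn: str     # specification it is asserted to conform to
    asserted_by: str  # organization making the assertion
    method: str       # how conformance was determined, e.g. "testing"


def refining_specs(refinements, spec_urn):
    """Return spec_urn plus every spec that refines it, directly or transitively.

    `refinements` is a set of (refining_spec, refined_spec) pairs.
    """
    found = {spec_urn}
    frontier = [spec_urn]
    while frontier:
        target = frontier.pop()
        for refining, refined in refinements:
            if refined == target and refining not in found:
                found.add(refining)
                frontier.append(refining)
    return found


def acceptable_classes(conformances, refinements, wanted_spec, trust, threshold):
    """Classes asserted to satisfy wanted_spec by a sufficiently trusted asserter."""
    specs = refining_specs(refinements, wanted_spec)
    return sorted({
        c.class_name
        for c in conformances
        if c.spec_urn in specs and trust.get(c.asserted_by, 0.0) >= threshold
    })


# Hypothetical published assertions.
conformances = [
    Conformance("FastSorter", "urn:example:stable-sorter", "OrgA", "testing"),
    Conformance("QuickSorter", "urn:example:sorter", "OrgB", "formal verification"),
    Conformance("ShadySorter", "urn:example:sorter", "OrgC", "testing"),
]
refinements = {("urn:example:stable-sorter", "urn:example:sorter")}
trust = {"OrgA": 0.9, "OrgB": 0.8, "OrgC": 0.1}

chosen = acceptable_classes(conformances, refinements, "urn:example:sorter", trust, 0.5)
```

Here ShadySorter is excluded because its asserter falls below the trust threshold, while FastSorter qualifies through the refinement relationship.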

A specification can be written using XML or OCL, allowing it to be readily parsed by objects, and each specification has a URN as its unique name. The document associated with a specification may, in turn, point to the URNs of the specifications of other components. For our purposes, two specifications are equivalent if and only if they are identical documents that reference the same URNs. This approach avoids the problem of equivalent ontologies; later, we plan to use results from other groups researching ontological issues to come up with more general operational definitions of equivalence.

Dynamic Checking. When we compose objects in traditional programming, we can test the objects extensively to ensure that they satisfy their specifications, or we can use formal reasoning to verify them. In contrast, when objects find each other on-the-fly and form their own compositional structures, an object may not have the luxury of testing other components extensively before forming relationships with them. Verification by anyone other than the implementor may not be possible because the source code of the object may be non-disclosable intellectual property. The key question here is: how can we achieve added confidence in a composed object structure given these restrictions?

An approach that we are researching uses certificates and checkers. A certificate is a temporal logic formula, and a checker is an object that probes the information flowing into and out of an object and checks whether the object obeys its certificates. The eventually operator in temporal logic is replaced by a timing requirement; as a result, we replace "every request gets a response eventually" by "a request gets a response within a specified period of time."
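A minimal Python sketch of a checker for the bounded-response certificate follows. It is an illustration under stated assumptions, not the Infospheres implementation: the class name, the event-callback methods, and the explicit `now` timestamps are all hypothetical, and the checker only observes requests and responses without seeing the object's source code.

```python
class BoundedResponseChecker:
    """Checks: 'a request gets a response within `deadline` time units'."""

    def __init__(self, deadline):
        self.deadline = deadline
        self.pending = {}     # request id -> time the request was observed
        self.violations = []  # request ids that broke the certificate

    def on_request(self, request_id, now):
        """Record a request flowing into the observed object."""
        self.pending[request_id] = now

    def on_response(self, request_id, now):
        """Record a response flowing out; flag it if it arrived too late."""
        sent = self.pending.pop(request_id, None)
        if sent is not None and now - sent > self.deadline:
            self.violations.append(request_id)

    def on_tick(self, now):
        """Flag requests that are still unanswered after the deadline."""
        for request_id, sent in list(self.pending.items()):
            if now - sent > self.deadline:
                self.violations.append(request_id)
                del self.pending[request_id]

    def obeys_certificate(self):
        return not self.violations


checker = BoundedResponseChecker(deadline=5)
checker.on_request("r1", now=0)
checker.on_response("r1", now=3)   # answered in time
checker.on_request("r2", now=10)
checker.on_tick(now=16)            # r2 is overdue and gets flagged
```

Passing timestamps explicitly keeps the sketch deterministic; a deployed checker would presumably read a clock instead.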

Selective Assurance. In some situations, we need particular assurance that a component behaves according to specifications for our interactions with it, even if it may be faulty for other interactions. For instance, in a temporary collaboration with an object that inverts matrices, we need to be assured that this object correctly inverts the matrices that we provide to it, even though it may fail for some other matrix. The Web of classes, objects, and specifications also includes checkers that can be composed with objects, and we study ways of composing these different entities systematically (and, in some cases, automatically).
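The matrix example can be sketched as a checker composed with an untrusted inverter: rather than verifying the inverter in general, the checker confirms each result by multiplying it with the input. This Python sketch is illustrative only; the function names and the tolerance are assumptions, and matrices are plain lists of rows.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_inverse(a, candidate, tol=1e-9):
    """True if a * candidate equals the identity matrix within tol."""
    n = len(a)
    if len(candidate) != n or any(len(row) != n for row in candidate):
        return False
    product = mat_mul(a, candidate)
    return all(abs(product[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


def checked_invert(invert, a, tol=1e-9):
    """Call an untrusted inverter and accept its answer only if it checks out."""
    result = invert(a)
    if not is_inverse(a, result, tol):
        raise ValueError("inverter returned a wrong result for this matrix")
    return result


def invert_2x2(m):
    """A correct inverter for invertible 2x2 matrices."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def identity_only(m):
    """A faulty inverter: it is right only when the input is the identity."""
    return [[1.0, 0.0], [0.0, 1.0]]


good = checked_invert(invert_2x2, [[4.0, 7.0], [2.0, 6.0]])
also_fine = checked_invert(identity_only, [[1.0, 0.0], [0.0, 1.0]])
```

The faulty inverter passes the check for the identity matrix and is rejected for other inputs, which is the selective assurance described above: correctness is established only for the interactions that actually occur.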

Reliability. UML and other object modeling languages have been used to design object-based systems carefully. We are researching extensions of UML and OCL specifically to help in obtaining high confidence in dynamic compositions of distributed objects found on the Web. A modeling language for this class of applications has to deal with timeouts, hardware failures, and other situations that are of only secondary concern on systems that run within a single machine.

There are many more reliability aspects to Infospheres 2 than we have space for here. Our research also explores fault tolerance, security, and timing in the context of quality of service guarantees.

Composable Middleware Communication Mechanisms. Objects in Infospheres 2 communicate by remote method calling, by message passing, or by event posting and listening. Each communication mechanism can be implemented in terms of the others, but we have all three in our infrastructure so that we can study different compositional structures using each of them. Here, we have space to discuss only one idea.

Consider a distributed resource management system with resources, providers of resources, and requestors of resources. Traditionally, a requestor queries different providers for different types of resources; for example, if you want to go on a ski trip, you have to call different airlines, car companies, and hotels, to piece together the best deal. The key issue we are researching is: what compositional structures do we need to allow you to merely announce your requirements (for example, ski trip, price, kind of slopes, and hotels) and have the pool of providers respond by giving you proposals?

A specific research area we are investigating is that of instant "middle man" collaborations: an intermediate object can search for objects that provide cars, or hotels, or planes, and create a collaboration between them to put together a complete package for you. Some collaborative structures are best suited for event-based, announce-listen multicast communication, while others are better suited for point-to-point remote method calls. We are studying compositional structures using different communication mechanisms, with a specific emphasis on structure reusability.

The idea of your announcing a requirement --- "I want a ski trip with the following attributes" --- may sound farfetched, but the same application in metacomputing, where requestors announce requests for computational resources such as processors, is meaningful today. [See our paper: A General Resource Reservation Framework for Scientific Computing].
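The announce-and-propose structure, together with a "middle man" that assembles a package, can be sketched in Python as follows. This is a single-process stand-in for announce-listen multicast, offered only as an illustration: the `Marketplace` class, the dictionary-based requirement format, the provider names, and the cheapest-proposal policy are all hypothetical.

```python
class Marketplace:
    """In-process stand-in for announce-listen multicast among providers."""

    def __init__(self):
        self.providers = []

    def register(self, provider):
        self.providers.append(provider)

    def announce(self, requirement):
        """Deliver the requirement to every listener and collect proposals."""
        proposals = []
        for provider in self.providers:
            proposal = provider(requirement)
            if proposal is not None:
                proposals.append(proposal)
        return proposals


def make_provider(name, kind, price):
    """Build a provider that answers only requirements needing its kind."""
    def respond(requirement):
        if kind in requirement["needs"]:
            return {"provider": name, "kind": kind, "price": price}
        return None
    return respond


def assemble_package(marketplace, requirement):
    """Middle man: pick the cheapest proposal per kind, within the budget."""
    cheapest = {}
    for proposal in marketplace.announce(requirement):
        best = cheapest.get(proposal["kind"])
        if best is None or proposal["price"] < best["price"]:
            cheapest[proposal["kind"]] = proposal
    if set(cheapest) != set(requirement["needs"]):
        return None  # some need had no provider
    total = sum(p["price"] for p in cheapest.values())
    return cheapest if total <= requirement["budget"] else None


market = Marketplace()
market.register(make_provider("AirA", "flight", 300))
market.register(make_provider("AirB", "flight", 250))
market.register(make_provider("HotelH", "hotel", 400))
market.register(make_provider("CarC", "car", 100))

package = assemble_package(market, {"needs": {"flight", "hotel"}, "budget": 700})
```

The requestor never contacts a provider directly; it states its needs once, and the intermediary composes the responses.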

Aggregatable Parts. In Infospheres 2, a part is a router that directs the method calls made by an object to other method calls on target objects that can be physically distributed. Parts can be changed on-the-fly by adding, deleting, and redirecting methods as well as changing method implementations. To deal with the complexity of owning many parts, users can construct a hierarchy of parts stored locally in a directory structure.

Composition of parts is achieved by aggregating methods; for example, an object can call a part with method M, and that part then calls a collection of objects with method M' and returns a vector incorporating all of the results. This is particularly useful for applications such as dynamic workflow for a task force whose members work for multiple organizations; one object can call a part acting as a secretary that collates the results from all of the other members of the task force, even as the task force membership changes on-the-fly as the situation warrants.
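A minimal Python sketch of an aggregating part, using the task-force secretary as the example, follows. It runs in a single process and ignores distribution; the `Part` class, its method names, and the `Member` class are hypothetical stand-ins rather than the Infospheres 2 API.

```python
class Part:
    """Routes a method M on the part to a method M' on each target object."""

    def __init__(self):
        self.routes = {}  # part method name -> list of (target, target method name)

    def add_route(self, method, target, target_method):
        self.routes.setdefault(method, []).append((target, target_method))

    def remove_target(self, method, target):
        """Drop a target on-the-fly, for example when a member leaves."""
        self.routes[method] = [
            (t, m) for (t, m) in self.routes.get(method, []) if t is not target
        ]

    def call(self, method, *args):
        """Invoke every routed target and return the vector of results."""
        return [
            getattr(target, target_method)(*args)
            for target, target_method in self.routes.get(method, [])
        ]


class Member:
    def __init__(self, name):
        self.name = name

    def status_report(self, topic):
        return f"{self.name}: {topic} ok"


alice, bob = Member("alice"), Member("bob")
secretary = Part()
secretary.add_route("collect", alice, "status_report")
secretary.add_route("collect", bob, "status_report")

before = secretary.call("collect", "budget")
secretary.remove_target("collect", bob)  # membership changes while running
after = secretary.call("collect", "budget")
```

The caller keeps invoking the same method on the part while the set of targets behind it changes.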

One of our challenges is to provide behavioral reliability as objects are changed on-the-fly. With parts, we can ensure at runtime that if a method M on a part is routed by the part to call method M' on an object, then the semantics of M' on that object correspond to the semantics of M on that part. To achieve this, we employ a "semantics understanding" by exporting assertions and other semantic information through XML.

On-the-Fly Mediation. Finally, one of our biggest research challenges is the problem of automating the connection of different parts when their interface specifications are compatible but not the same. For example, if you find out that you have one method, and the object you call has another method with the same semantics, then you should be able to create a part on-the-fly that intermediates between the two. We are investigating the decentralized generation of protocol stacks to allow different parts to communicate automatically during a collaboration.
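One way to picture such a mediating part is an adapter generated from exported semantic information. The Python sketch below assumes, purely for illustration, that each side publishes a mapping from its method names to URNs identifying their semantics; the `Mediator` class, the URNs, and the `Printer` class are hypothetical.

```python
class Mediator:
    """Forwards the caller's method names to same-semantics target methods."""

    def __init__(self, target, expected_semantics, target_semantics):
        # expected_semantics: caller's method name -> semantics URN
        # target_semantics:   target's method name -> semantics URN
        self._routes = {}
        self._target = target
        by_urn = {urn: name for name, urn in target_semantics.items()}
        for name, urn in expected_semantics.items():
            if urn not in by_urn:
                raise LookupError(f"no target method with semantics {urn}")
            self._routes[name] = by_urn[urn]

    def __getattr__(self, name):
        # Called only for attributes not found normally, i.e. routed methods.
        routes = self.__dict__.get("_routes", {})
        if name not in routes:
            raise AttributeError(name)
        return getattr(self._target, routes[name])


class Printer:
    def emit(self, text):
        return f"printed: {text}"


mediator = Mediator(
    Printer(),
    expected_semantics={"print_document": "urn:example:print"},
    target_semantics={"emit": "urn:example:print"},
)
result = mediator.print_document("hello")
```

The sketch handles only renaming; mediating between differing argument conventions or protocols would need more than a name mapping.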

Links to Other Papers

For more information about our project and some of the ideas we have discussed in this document, please investigate these papers.
  1. Home Page: Caltech Infospheres Project. (http://www.infospheres.caltech.edu)
  2. Abstract: Specification, Composition, and Validation of Distributed Components. (html)
  3. Abstract: A Distributed, Persistent Component Repository. (html)
  4. Paper: Systematic Composition of Objects in Decentralized Distributed Internet Applications: Processes and Sessions. Published, Oxford University Computer Journal, October 1997. (html)
  5. Paper: A General Resource Reservation Framework for Scientific Computing. Accepted for publication, the First International Scientific Computing in Object-Oriented Parallel Environments (ISCOPE) Conference, December 1997. (html)
  6. Paper: Designing Directories in Distributed Systems: A Systematic Framework. Published, High Performance Distributed Computing, August 1996. (postscript)
  7. Paper: Weaving a Web of Trust. Published, World Wide Web Journal, summer 1997. (html)
  8. Paper: Webs of Archived Distributed Computations for Asynchronous Collaboration. Published, Journal of Supercomputing, summer 1997. (html)
  9. Paper: Capturing the State of Distributed Systems with XML. Published, World Wide Web Journal, fall 1997. (html)
  10. Paper: Composing Processes Using Modified Rely-Guarantee Specifications. (compressed postscript)
  11. Paper: Toward High-Confidence Distributed Programming with Java: Reliable Thread Libraries. Published, International Conference on Systems Engineering, July 1996. (compressed postscript)
  12. Paper: Providing Easier Access to Remote Objects in Distributed Systems. Accepted for publication, Hawaii International Conference on System Sciences, January 1998. (html)
  13. Paper: Developing Peer-to-Peer Applications on the Internet: the Distributed Editor, SimulEdit. Accepted for publication, Dr. Dobb's Journal. (html)
  14. Paper: A Framework for Structured Distributed Object Computing. Submitted to Parallel Computing. (html)
  15. Project Report: Virtual Swap Meet: A Distributed Agent Marketplace. (html)
  16. Position Paper: Caltech Infospheres Project. Accepted to Joint W3C/OMG Workshop on Distributed Objects and Mobile Code, June 1996. (html, html slides)

K. Mani Chandy, Adam Rifkin, and the Infospheres Group;

mani@cs.caltech.edu, adam@cs.caltech.edu;
Caltech Infospheres Project, November 21, 1997.