Characterizing Computer-Related Grid Concepts

Frank Manola

Object Services and Consulting, Inc.
fmanola@objs.com

March 30, 1999



Executive Summary

The term grid is increasingly appearing in computer literature, generally referring to some form of system framework into which hardware or software components can be plugged, and which permits easy configuration and creation of new functionality from existing components. The "killer applications" for these grid concepts include computational challenge problems (e.g., codebreaking) requiring supercomputing capabilities, universal availability of customized computing services (e.g., access to one's individual desktop and application suite anywhere in the world), and global integration of information, computing, and other resources for various purposes.  Several DoD and industry programs contain some form of grid concept. However, the idea of a "grid" is a relatively new architectural idea, and not very well understood.  Sometimes the term grid is used loosely in describing systems connecting some collection of distributed resources, while in other cases it is clear that some more advanced set of capabilities is involved.  The purpose of this study is to better characterize the architectural concept of a grid, and describe some grid construction technology and interoperability issues.  In particular, this paper is intended as a step toward defining the generic idea of a grid more carefully.  It also suggests some possible meanings for calling an architecture a grid at various levels of abstraction, the purpose being to identify general technical capabilities which need to be added to more conventional system concepts in order to form "grids".

Fundamentally, a grid is an integrating mechanism or concept.  This concept can be applied at different technical levels of computer systems, e.g.:

Grid-like ideas exist at several of these levels already.  Grid concepts at the level of interconnected computers include computational grids intended for supercomputing applications, and computing fabrics intended to provide ubiquitous computing capabilities.  Grid concepts have also been defined at the level of agents, e.g., the agent grid being developed in DARPA's CoABS program.  In addition, DoD concepts such as the Advanced Battlespace Information System (ABIS) have defined grids at different functional levels, including information, sensor, and engagement grids.

The definitions of "grid" found in dictionaries generally imply some concept of a "network" or "mesh".  This is certainly the generic idea of a grid.  Many things can be referred to as "grids" in this simple sense, including, in the context of computer systems, the Internet, the Web, or the objects in a CORBA-based distributed object system (which form an interconnected network by virtue of the references the objects have to each other).  However, the grid concept used in computational grids, computing fabrics, and the ABIS and CoABS grids implies additional requirements, a stronger cohesiveness.  Typically, such "true grids" are formed by starting with networks of distributed resources, and adding capabilities or services that help further integrate the interconnected resources.  The integration found in such "true grids" involves such things as:

Computational grids and agent grids are not the only levels at which "true grids" can be formed. Data/object grids could be formed by starting with database systems, the Web, and distributed object systems, and providing additional mechanisms for composing resources, enhanced metadata and reflective facilities, and other support mechanisms that support greater levels of resource integration. The development of grid concepts at these various technical levels reflects the fact that simple interconnection technologies at these levels are becoming relatively mature (even though there is still much work to do on these technologies). The emphasis now is on techniques for combining the interconnected resources to solve increasingly complex problems. In particular, there is emphasis on:

Additional work is needed to identify the details of the added functionality required to go beyond simple distributed collections of resources to the formation of "true grids". Grids have been defined at the computation level already. At other levels, more advanced integrating mechanisms have been defined (e.g., federated DBMSs at the data level), but these do not yet approach the level of grids. To advance the use of the grid concept, the general idea of a grid needs to be applied to each of these levels, in order to identify in detail the specific technologies (some of which may already exist, e.g., active DBMS capabilities at the data level) which would enable the creation of grids from collections of distributed resources at each of these levels.

Grids at these individual levels are useful by themselves, but the maximum advantage comes when these different levels of grid capabilities are combined.  There is a need for additional work to develop a unifying technical grid architecture which incorporates these separate grid levels, and identifies mappings between them.  The requirements of true grids at the higher levels (e.g., agent) probably require grid-like functionality at the lower levels (e.g., data and computation) anyway, hence these mappings are needed to guide the implementation of the higher levels in terms of the lower ones.  These functional mappings are similar to the types of end-to-end mappings being investigated in providing Quality of Service guarantees in distributed systems [Man99b].  Building such a combined grid, for example, would appear to involve all the computational grid issues of system management (and associated metadata), distributed computation and load balancing, mobile code, security, etc., as well as the data-, object-, and agent-level versions of those issues (issues which, e.g., reflect the semantics of the entities on whose behalf the agents are functioning, or the resources which the agents wrap). This requires a way of describing resources and capabilities, and resource requirements and tasks, and a way to map between them, at all these levels. The combined grid also involves the need for a way to define higher-level goals, i.e., a way to define the goals of the grid itself, that are optimized by the load-balancing, etc. that is going on. These goals are presumably at a higher-level than those of individual agents (although these might also be characterized as the goals of higher-level agents, or agents of higher authority, rather than goals of the grid per se).

In such an integrated architecture, in addition to the technical levels already mentioned, there is also a need to define additional forms of organization on the available resources. These include such things as the use of multiple functional tiers, the use of Common Schema concepts or enterprise-level ontologies (enterprise-wide agreements on common semantics), and specialized schemas/ontologies for use by specialized user communities (together with mappings to the common definitions where possible). Semantics-based mappings between the different technical levels in such an architecture are also required. Such additional levels of organization provide the basis for more interoperability among the various technical levels of resources contained in the system, and hence help enhance the ability of these resources to operate as a "true grid".

All this work requires additional analysis not only of technical issues, but also of application requirements.  All sorts of technology already exists that is potentially useful in integrating distributed resources.  However, further detailed understanding of application requirements is needed to drive the detailed selection of the combination of technical features needed to form various types of grids.  Work on the CoABS grid is an example of ongoing work of this type.


1.  Introduction

The term grid is increasingly appearing in computer literature, generally referring to some form of system framework into which hardware or software components can be plugged, and which permits easy configuration and creation of new functionality from existing components. The "killer applications" for these grid concepts include computational challenge problems (e.g., codebreaking) requiring supercomputing capabilities, universal availability of customized computing services (e.g., access to one's individual desktop and application suite anywhere in the world), and global integration of information, computing, and other resources for various purposes.  Several DoD and industry programs contain some form of grid concept. However, the idea of a "grid" is a relatively new architectural idea, and not very well understood.  Sometimes the term grid is used loosely in describing systems connecting some collection of distributed resources, while in other cases it is clear that some more advanced set of capabilities is involved.  The purpose of this study is to better characterize the architectural concept of a grid, and describe some grid construction technology and interoperability issues.  In particular, this paper is intended as a step toward defining the generic idea of a grid more carefully.  It also suggests some possible meanings for calling an architecture a grid at various levels of abstraction, the purpose being to identify general technical capabilities which need to be added to more conventional system concepts in order to form "grids".

The grid concept is being applied to computer systems at several different "levels" (e.g., to both systems of computers and systems of agents).  As a result, this study attempts to identify some general characteristics which seem to apply to all sorts of grids, in order to provide a "big picture" in terms of which grid concepts can be better understood, rather than presenting the details of technical issues associated with specific grid concepts (although this is also important).  In Section 2, we give examples of several "grid-like" concepts, in order to provide a background for understanding the grid concept.  In Section 3, we identify some general grid characteristics, based on common characteristics of these examples.  We also look at some important types of computer systems, such as database and distributed object systems, examine the extent to which they resemble grids, and identify some types of facilities which, when added to those systems, would cause them to be considered more "grid-like", based on the characteristics we have identified.  We also discuss the need to combine these various types of grids together into unified architectures, and describe the start of a general approach to doing this.


2.  Grids and Grid-Like Ideas

2.1  Computational Grids

The basic concept of a computational grid is defined in [FK99b]. The term grid is used to indicate an analogy with the electrical power grid. Just as a power grid links sources of electrical power together, and provides for widespread access to and distribution of that power (with associated load-balancing and other services), a computational grid is "a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities". The concept of a grid as an infrastructure is important because "…a computational grid is concerned, above all, with large-scale pooling of resources, whether computer cycles, data, sensors, or people. Such pooling requires significant hardware infrastructure to achieve the necessary interconnections and software infrastructure to monitor and control the resulting ensemble". The intent of a [computational] grid is to provide [computational] services that are dependable, consistent, pervasive, and inexpensive.  The term "grid" is only slowly becoming associated with the concept of creating a giant computational environment out of a distributed collection of files, databases, computers, and external devices. The term "metacomputing" is also frequently used.

The paper identifies five major application classes for computational grids:

For application development, a grid must provide both appropriate programming models and a range of services. The paper notes that there is currently no consensus on what programming model is the most appropriate for a grid environment. Models that have been proposed include:

Services that must be provided include:

Relevant technologies come from areas such as distributed file systems and databases, distributed operating systems (particularly, such areas as load balancing and process and data migration), parallel and distributed programming, and network management.

The paper also notes that "computational infrastructure, like other infrastructures, is fractal, or self-similar at different scales. We have networks between countries, organizations, clusters, and computers; between components of a computer, and even within a single computer." The paper describes systems at the scales of end system, cluster, intranet, and internet, the basic idea being that these constitute different scales at which similar computational services should be provided (mimicking those provided at the smallest scale, in the individual computer). Of course, it is then necessary to look at how those similar services must be provided as the scale changes, since different technologies must typically be employed.

[GFLH98] and [FK99b] describe several projects developing technology for computational grids.  A simple example is PVM (Parallel Virtual Machine) <http://www.epm.ornl.gov/pvm/pvm_home.html>.  PVM is a software package that permits a heterogeneous collection of Unix computers hooked together by a network to be used as a single large parallel computer.  PVM allows users to exploit existing computer hardware to solve large computational problems at minimal additional cost.  PVM is very portable, and the source code has been compiled on a wide variety of machines.  PVM is very widely used, and is a de facto standard for distributed computing world-wide.  A wide range of PVM-related links is available at the PVM home page cited above.  Related facilities are provided by MPI (Message Passing Interface) [GLS94], a community-generated standard for message passing used to interconnect multiple machines.  UCLA's Project Appleseed <http://exodus.physics.ucla.edu/appleseed/appleseed.html> is an example of how MPI can be used to link together a cluster of computers (Macintoshes in this case) to provide "a plug and play parallel computer" in support of numerically-intensive processing.  The Appleseed Web site also contains pointers to further information on MPI.
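The message-passing style supported by packages such as PVM and MPI can be illustrated with a minimal sketch. The example below does not use the PVM or MPI interfaces. It is only an analogue built on Python's standard library, in which threads stand in for networked machines and queues stand in for message channels. The names `worker` and `parallel_sum`, and the way the data is divided into chunks, are assumptions made for illustration.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Receive chunks of numbers as messages and send back partial sums."""
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel message: no more work for this worker
            break
        outbox.put(sum(msg))


def parallel_sum(data, n_workers=4):
    """Scatter `data` to workers by message, then gather and combine results."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    # Scatter: strided slices cover every element exactly once.
    chunks = [data[i::n_workers] for i in range(n_workers)]
    for chunk in chunks:
        inbox.put(chunk)
    # One sentinel per worker, queued after all chunks, shuts each one down.
    for _ in threads:
        inbox.put(None)
    # Gather: exactly one partial sum comes back per chunk.
    total = sum(outbox.get() for _ in chunks)
    for t in threads:
        t.join()
    return total
```

Calling `parallel_sum(list(range(1, 101)))` distributes the hundred numbers over four workers and combines their partial sums. In a real PVM or MPI program the same scatter and gather steps would cross machine boundaries, which is where the heterogeneity and portability concerns discussed above arise.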

Legion <http://www.cs.virginia.edu/~legion/> [GG99] provides an environment in which a collection of workstations, vector supercomputers, and parallel supercomputers connected by LANs and larger-scale networks appears to the user as a single very powerful computer. Legion uses object-oriented design techniques to "simplify the definition, deployment, application, and long-term evolution of grid components". The Legion architecture defines a complete object model that includes object abstractions for compute resources (called host objects), storage systems (called data vault objects), as well as other object classes. Users can use inheritance to specialize the behavior of these objects to support specific requirements, as well as to develop new objects. Legion supports PVM's libraries via emulation libraries. Legion aims to provide a single, coherent, virtual machine addressing scalability, programming ease, heterogeneity, fault tolerance, security for users and resource providers, site autonomy, multilanguage support, and interoperability. The use of reflection (the representation of parts of the underlying system as objects that can be directly operated on to access and change system behavior) is particularly important in Legion. For example, host objects represent Legion processors. One or more host objects run on each computing resource included in Legion. These objects create and manage processes for application-level Legion objects. Object classes invoke the operations of host objects to activate their instances on the computing resources that the host objects represent. Representing computing resources as Legion objects abstracts the heterogeneity of different host computing platforms, and allows resource owners to manage and control their resources within the context of the system. (Reflection is also an important technology in providing systemic properties (sometimes called ilities) such as reliability, survivability, and security, and quality of service characteristics, in large-scale computer systems [Man99b].)
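The idea of representing a computing resource as an object, and of specializing that object through inheritance, can be sketched as follows. This is not Legion's actual object model or interface. The class names `HostObject`, `RestrictedHost`, and `Simulation`, and the methods `accepts` and `create_process`, are hypothetical and chosen only to mirror the description above.

```python
class HostObject:
    """Represents one computing resource and manages processes on it."""

    def __init__(self, name: str, max_processes: int):
        self.name = name
        self.max_processes = max_processes
        self.processes = []

    def accepts(self, owner: str) -> bool:
        """Resource-owner policy hook; the default only checks capacity."""
        return len(self.processes) < self.max_processes

    def create_process(self, object_class: type, owner: str):
        """Activate an instance of an application-level class on this host."""
        if not self.accepts(owner):
            raise RuntimeError(f"{self.name} refused a request from {owner}")
        instance = object_class()
        self.processes.append(instance)
        return instance


class RestrictedHost(HostObject):
    """Specialization by inheritance: the owner admits only listed users."""

    def __init__(self, name, max_processes, allowed):
        super().__init__(name, max_processes)
        self.allowed = set(allowed)

    def accepts(self, owner: str) -> bool:
        return owner in self.allowed and super().accepts(owner)


class Simulation:
    """Stand-in for an application-level object."""
```

A caller activates an instance with `RestrictedHost("node-a", 1, allowed={"alice"}).create_process(Simulation, "alice")`. Because the policy lives in an overridable method of the host object, a resource owner can change admission behavior without touching the code that requests activation, which is the kind of control the reflective design is meant to provide.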

Globus <http://www-fp.globus.org/> [FK99c] is developing basic software infrastructure for computations that integrate geographically distributed computational and information resources. Globus is based on the assumptions that:

Globus thus focuses on defining a toolkit of low-level services for security, communication, resource location, resource allocation, process management, and data access.  These services are then used to implement higher-level services, tools, and programming models.  According to [GFLH98], "Globus has withstood many tests, including a recent one involving battlefield simulations distributed across more than 30 machines and representing the independent activity of more than 100,000 tanks, trucks, and other units."

[FF97a,b,c] discuss the concept of "High-Performance Commodity Computing", the idea that computational grids should be based on emerging commodity network computing technologies such as CORBA, DCOM, and JavaBeans, together with the Web and conventional networking approaches.  The papers discuss a three-tier architecture which integrates these technologies.  This approach is in contrast with the more specialized grid architectures proposed in Legion and Globus (although these could be integrated to support lower-tier services).  The authors particularly emphasize the importance of the emerging "Object Web", integrating the Web, distributed objects, and databases, in the development of computational grid technology.

The focus of much of this work appears to be on large-scale computing problems, although the technology is clearly not limited to those applications.  Other grid concepts discussed below extrapolate ideas in distributed supercomputing to more complex applications.  For example, in distributed supercomputing, the paradigmatic application is often that of a single large computing "job".  The program is run, and a result is produced.  A grid is required simply because the job is too large for a single machine.  In other grid concepts, the application is of a more continuous nature.  This means it must be possible for participants to enter and leave the grid, load distribution is even more dynamic (because the load and its requirements change more dynamically), etc.  The next section describes a new twist on more familiar applications supported by computational grid concepts.


2.2  Computing Fabrics

Another grid-related "vision" is presented in a series of articles describing what is referred to as Computing Fabrics <http://www.infomaniacs.com/>.  As described in these articles, the Computing Fabric consists of nodes, which are packages of processors, memory, and peripherals, linked together by an interconnection facility.  Within the Fabric are regions of nodes and interconnections that are so tightly coupled that they appear to be a single node.  These are called cells.  This tight coupling is obtained using hardware, software, or both.  Cells in the Fabric are then loosely coupled with each other.  The coupling between cells appears differently from the coupling between the components of a node.  The Fabric as a whole, or each cell in it, can grow or shrink in a modular fashion, by adding or removing nodes and links.  Nodes from the Fabric surrounding a cell can join that cell, and nodes within a cell may leave that cell and join the surrounding Fabric.  In addition, cells can divide and merge.  Each cell presents the image of a single system, even though it can consist of many nodes.

The articles give ubiquitous network computing as an example of an application made possible by the Fabric. The first aspect of the application is network computing: each user can access their individual "desktop" (configuration, including all applications, data, etc.) from anywhere on the network. To this is added ubiquitous computing, in which processors, displays, and input devices are everywhere. Users are tracked by sensors, and their location information is used to direct their applications and data to the appropriate devices that are located where the user is located. This changes as the person moves. There is no need for users to explicitly log in to access their computing spaces; they are just "there". The Fabric helps avoid the need for the universal presence of sufficient computing power, displays, and input devices necessary to run whatever applications the user wishes to run locally. In this scenario, processors are located all over, e.g., throughout buildings ("as populous as wall sockets, perhaps more so"), and are interconnected by low latency, high bandwidth connections. When the user is stationary, the user's tasks run on a local cell, consisting of processors in the general vicinity, which work together as a single system. If the tasks require it (and they can be paid for), additional processors can be added (thousands of them, if necessary); the computing resources are configured as required to run the software the user wants to run. As the user moves, their cell moves with them. Processing nodes leave the user's cell when their distance pushes their communications latencies above some threshold level, and are replaced by nodes that enter the cell as the user gets near them. A new generation of wearable processor, display, and input devices rounds out the picture.
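The way a cell follows a moving user can be sketched with a toy model. The articles do not specify how cell membership is computed, so everything here is an assumption: nodes sit at positions along a line, latency is taken to grow linearly with distance, and the names `Node`, `latency_ms`, and `update_cell` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    position: float  # location along a corridor, in metres (hypothetical)


def latency_ms(node: Node, user_position: float) -> float:
    """Toy latency model (an assumption): 0.5 ms per metre of distance."""
    return 0.5 * abs(node.position - user_position)


def update_cell(nodes, user_position, threshold_ms=5.0):
    """Return the names of nodes whose latency is within the threshold."""
    return {n.name for n in nodes
            if latency_ms(n, user_position) <= threshold_ms}
```

With nodes at 0, 8, 16, and 24 metres, a user standing at position 0 gets a cell containing the first two nodes. After the user walks to position 16, the first node has dropped out and the last two have joined, so the cell has moved with the user.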

Technically the concept of Computing Fabrics involves ideas that are somewhat similar to those of the computational grid, but the application focus is somewhat different.  Technologies relevant to the creation of the Computing Fabric concept include:

The authors note that full exploitation of the Computing Fabric concept requires the integration of distributed object technologies and database technologies.  For example, technologies such as Microsoft's Millennium and Sun's Jini support code developed using object technology being automatically distributed using a distributed object infrastructure running atop massively distributed clusters.  Large-scale DBMSs already exploit parallelism and multi-system clustering.  DBMSs need to be further exploded into interoperable components that can more fully utilize Fabrics.  Logical-level models and views become increasingly important as data and processing are distributed over the Fabric, and as data is organized on increasingly large scales.  The Web (and XML) will also need to be included, as representing a large scale distributed data store.  (It is probably worth noting that Computing Fabrics was listed #1 on the Wired January 1999 issue's "Hype List" (p.68)).
 


2.3  DoD C4ISR Grid Concepts

The DoD's Advanced Battlespace Information System (ABIS) [ABIS96] concept describes a set of information services, technologies, and tools to support C4ISR. The ABIS concept was produced by a task force composed of operational and technical personnel from all Services, the JCS, and major DoD agencies involved with C4ISR systems. ABIS is described as acquiring, processing, and delivering information, as needed, to enhance decision making at all echelons involved in operational functions such as sensor-to-shooter correlation, real-time battle management, and multi-dimensional battlespace awareness. The ABIS does not describe an actual system architecture, but it does describe a "capability framework", which organizes the system's functions. The ABIS framework is organized into three layers:

The ABIS information grid is described in the Grid Capabilities Working Group Results section of [ABIS96] <http://www.dtic.mil/dstp/96_docs/abis/volume5/abis501.htm>. The ABIS grid is conceived of as an information environment including communications, processing, information repositories, and value-added services that provide users with an ability to find information, obtain processing services, and exchange information. Warfighters will be able to connect to this grid anywhere and at any time, and will be able to craft their own information environment by selecting the types of services, information, and interfaces that are appropriate to their missions and styles of operations. The grid will provide connectivity and information that will adapt to changing situations and be responsive to the warfighter's need for knowledge. It will adapt to the constraints imposed by connectivity at the tactical levels and will be able to organize resources within the global infrastructure to service the needs of the warfighters. It will provide access and security controls and information warfare defenses that are matched to the operational situation, and it will be managed in accordance with operational needs and priorities. In its initial stages, the grid will integrate existing networks and processing facilities to begin to establish an integrated information environment that spans the existing systems of the services and CINCs. Capabilities will be added in the near term to help manage the total information and the end-to-end services, including extensions to tactical users on the move. Additional capabilities, in the form of automated intelligent agents, will be added to assist the users in finding and retrieving information so that they are not overwhelmed with the massive amount of information and sources available in the grid.

Overall, the ABIS grid is a federated, heterogeneous system-of-systems.   Participants in the grid may include civil, commercial, and foreign organizations.  In the grid, ownership and management of information and services will be structured according to the needs and prerogatives of the participants.  Grid functionality will extend to all types of users in joint and combined operations.  As a result, the grid must cope with the heterogeneity of the commercial world, and of allies and potential coalition partners.

[J695] describes a related concept, called C4I For The Warrior (C4IFTW).  C4IFTW sets forth a 21st century vision of a global information infrastructure referred to as the global grid that will provide virtual connectivity from anywhere to anywhere instantaneously on warrior demand.  This grid connects commanders, sensors, weapons systems, etc., and is made up of a web of computer controlled telecommunications grids that transcends industry, media, government, military, and other nongovernment entities.  The C4IFTW global grid essentially corresponds to the ABIS information grid.  In addition, C4IFTW identifies a sensor/surveillance grid, layered on top of the global grid, and roughly corresponding to the ABIS battlespace awareness capability.

The concept of Network-Centric Warfare [CG98, DC498] develops these ideas somewhat further.  As described in the references, Network-Centric Warfare is a derivative of network-centric computing.  Just as network-centric computing is being exploited to provide competitive advantage in the commercial business sector, the emerging concepts of Network-Centric Warfare exploit information superiority to provide a competitive edge in warfare.  Grid concepts are key elements in Network-Centric Warfare.  In addition to making use of the information grid and sensor grid concepts of ABIS and C4IFTW, Network-Centric Warfare introduces a third element (effectively present in ABIS and C4IFTW, but not explicitly called out as a grid in these descriptions), called the engagement grid.  Specifically, Network-Centric Warfare includes the following grid concepts:

An example of an existing operational architecture that employs network-centric operations to increase combat power is the U.S. Navy's Cooperative Engagement Capability (CEC).  CEC networks the sensors, command and control, and shooters of the Carrier Battle Group's platforms to develop a sensor grid and an engagement grid.  The mission-specific sensor grid generates a high level of battlespace awareness by fusing data from multiple sensors, enabling quantum improvements in track accuracy, continuity, and target identification over standalone sensors.  The CEC engagement grid exploits this awareness by extending the battlespace, and engaging incoming targets in depth with multiple shooters with increased probability of kill.
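Why fusing data from several sensors can improve accuracy over a standalone sensor can be illustrated with a standard statistical technique, inverse-variance weighting. This is not a description of how CEC works; the source does not say which fusion method is used. The sketch assumes independent, unbiased measurements of the same quantity, and the function name `fuse` is hypothetical.

```python
def fuse(estimates):
    """Inverse-variance weighted fusion of independent (value, variance) pairs.

    Returns the fused value and its variance. Assumes the measurements are
    independent and unbiased (an assumption of this sketch).
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total
```

Under these assumptions the fused variance is never larger than the smallest input variance, so each additional sensor can only tighten the estimate. For two measurements of equal variance, the fused value is their average and the variance is halved.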

The Information Superiority Chapter of the 1998 Joint Warfighting Science and Technology Plan [DDRE98] describes the composition of the information, sensor, and engagement grids to form a C4ISR grid that supports DoD's Information Superiority concept (the "degree of dominance in the information domain that permits the conduct of operations without effective opposition").  The plan also identifies high-level functional capabilities required for Information Superiority, which of them are supported by the C4ISR grid, and key technologies the grid must support, including:

All of these DoD grid concepts are very ambitious and powerful, embodying the concept of being able to integrate not only global computing and communications resources, but also sensors, weapons, etc., in extremely flexible, custom-tailored combinations to achieve mission objectives.


2.4  CoABS Grid (Agent Grids)

Another grid concept is found in DARPA ISO's Control of Agent-Based Systems (CoABS) program. Here, the grid concept is applied to agents, since a key "vision" of the program is the concept of a grid as a means of making agent-based systems more interoperable and pervasive. General characteristics of the CoABS grid are described in the CoABS read-ahead document [CoA98], including:

Further developments of CoABS grid ideas are provided in [Ket98, Pis98b].

These bullets suggest that the CoABS grid knows not only about agents, but about their computational requirements (e.g., how they can be broken up into processes, so they can be distributed across multiple computers), and about available computational (and other) resources.  Hence, the CoABS grid concept appears to incorporate both the concepts of "grid" as used in Section 2.1, and Computing Fabric as used in Section 2.2, in the sense of providing a unified, heterogeneous distributed computing environment in which computing resources are seamlessly linked. In addition, the CoABS grid extends the idea upward to the agents that are the "applications" of this distributed computing environment. Agents become both applications whose computations can be distributed within this computing environment, and also resources that can be used by this environment. At the same time, there appears to be an interface between these two layers, so that at least some agents, e.g., those that do load balancing, can operate on the computing level grid.  Furthermore, since the CoABS grid is defined as encompassing other resources (e.g., forces), CoABS grid ideas also appear to be consistent with aspects of the DoD grid concepts of Section 2.3 (although this relationship has, at least so far, not been particularly emphasized).  For example, agents are explicitly mentioned as components of these DoD grid concepts,  and agents could well be the implementations of choice for many of the applications incorporated in these grids.  Agents could also serve as wrappers of resources (and mediators between them) in these architectures.

Building the grid suggested by the above bullets would appear to involve all the computational grid issues of system management (and associated metadata), distributed computation and load balancing, mobile code, security, etc., as well as the "agent-level" versions of those issues (issues which, e.g., reflect the semantics of the entities on whose behalf the agents are functioning, or the resources which the agents wrap). This requires a way of describing resources and capabilities, and resource requirements and tasks, and a way to map between them, at both agent and computational levels. This grid also apparently involves the need for a way of defining higher-level goals, i.e., a way to define the goals of the grid itself, that are optimized by the load-balancing, etc. that is going on. These goals are presumably at a higher-level than those of individual agents (although these might also be characterized as the goals of higher-level agents, or agents of higher authority, rather than goals of the grid per se.)
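The need to describe resources and capabilities, to describe tasks and their requirements, and to map between them under a grid-level goal can be made concrete with a small sketch. It is not drawn from the CoABS design. The `Resource` and `Task` records, the `assign` function, and the choice of "keep load balanced" as the grid-level goal are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    capabilities: set
    capacity: int
    load: int = 0


@dataclass
class Task:
    name: str
    requires: set
    cost: int


def assign(tasks, resources):
    """Greedy mapping of tasks to resources.

    Each task goes to the capable resource with the most spare capacity,
    which serves as a simple stand-in for a grid-level load-balancing goal.
    Tasks that no resource can take are mapped to None.
    """
    plan = {}
    # Place the most expensive tasks first so they are not squeezed out.
    for task in sorted(tasks, key=lambda t: t.cost, reverse=True):
        candidates = [r for r in resources
                      if task.requires <= r.capabilities
                      and r.capacity - r.load >= task.cost]
        if not candidates:
            plan[task.name] = None
            continue
        best = max(candidates, key=lambda r: r.capacity - r.load)
        best.load += task.cost
        plan[task.name] = best.name
    return plan
```

The same matching step could be applied at the agent level (tasks matched to agents by advertised capability) and at the computational level (processes matched to machines), with the output of one level becoming the requirements of the next. A greedy pass like this one is only a placeholder for whatever optimization the grid's own goals would call for.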

All this suggests that one view of the CoABS grid could be that of a combination of a computational grid and an Agent System architecture (or at least a form of one, aimed at federating conventional agent system architectures).  This would mean that it would need to incorporate typical agent system architecture services.  A typical list of these services is given below (see, e.g., [Pis98a; KT98; Paz98a,b; Tho98a (slide 13)]).

It may be noted that the requirements above define a sort of "agentized" Object Services Architecture (OSA).  This in turn raises a number of issues, such as:

Characterizing the Agent Grid [Tho98b] contains a collection of additional issues connected with the CoABS grid.  These issues include both detailed technical issues and broader organizational issues.  For example, the CoABS grid concept involves reconciling the normal autonomy assumptions of agents with the idea that they should be prepared to cooperate with the grid. It will be necessary to determine whether or not individual agents (or other resources made known to the grid) can "hold out" some proportion of their capabilities without making them available to the grid, can assert internal priorities over their use, can require the grid to negotiate for their use, etc.  The CoABS grid characteristics cited above suggest a high degree of centralization of information and control, in the sense that the grid is supposed to have a large amount of knowledge about, and control of, grid components. Normally, a military organization involves a hierarchy that filters detail as information flows up the hierarchy, and a fair degree of autonomy of lower levels in carrying out assigned missions (on the basis of assigned goals and assigned resources, and, obviously, reporting specific details up the chain of command). There is an issue of how much centralization should be assumed, given that there might be communications outages or sudden situation changes. These might make it difficult to obtain the degree of coordination originally assumed, or to reconfigure fast enough to deal with a new situation (e.g., to return "borrowed" resources to their rightful owners).
At the same time, agents that respond to external "orders" are not the only ones that cede some autonomy to others (or to a centralized control mechanism);  agents in systems based on "market" or "economic" models (in which agents have "money", services are "sold", and components engage in buying and selling transactions in order to carry out their activities) effectively cede some autonomy too.  Agents that have economic incentives must interact and cooperate with others to get what they want or need, just as people must work to get money for food and other necessities;  that is, the market mechanism creates a situation in which the agent cannot function with complete autonomy.   Nailing down the specific details of the control and other mechanisms to be incorporated in the CoABS grid is an ongoing process, and there is still considerable work to be done.  [Ket98] is a focus for the evolution of some of these ideas.


2.5  Other Uses of "Grid"

A number of other DARPA activities refer to the idea of a grid.  For example, the ATAIS architecture [BFHH+98] mentions "grid" in a number of places.  The references are rather generic:  in some cases the reference is to an "information grid", in others it could be interpreted as a "communication grid" (in the sense of the Internet), while in others it could be interpreted as references to a full-scale computational grid.  However, since ABIS and related concepts are mentioned in the report (although the use of grids in these concepts does not seem to be explicitly mentioned), it seems reasonable to assume that these ATAIS grid references are based on the ABIS grid concepts described in Section 2.3.  Characterizing the Agent Grid [Tho98b] mentions a number of other examples of grid-like concepts.  However, as with the references in the ATAIS architecture, many of these concepts tend to be somewhat generic as compared with those described in Sections 2.1-2.4.  Section 3 discusses these different senses of the term "grid" in more detail.


3.  Discussion

3.1  The Problem of Defining "Grid"

The descriptions of various grid-like concepts in the previous sections (and in the references cited) help to convey a general idea of what a "grid" might be, although they do not really define what a grid is in any detailed sense (e.g., in the sense of identifying the distinguishing characteristics a system must have in order to be called a grid).  Attempts to come up with a precise definition of "grid" run into difficulties similar to those found in trying to come up with a precise definition of "agent" (similar definitional difficulties have surrounded the word "object", although these have been to some extent reduced by its operational definition in various object systems).  [Bra97b] observes that attempts to define "agents" have taken two approaches: ascription and description.  Definition by ascription recognizes the fact that, while there is often little commonality among the details of various "agent" concepts, they all have a "family resemblance".  This leads to the idea that "agent-ness is in the eye of the beholder".  In other words, definition by ascription says that agent-ness "cannot ultimately be characterized by listing a collection of attributes, but rather consists fundamentally as an attribution on the part of some person" [VV95].  As [Bra97b] notes, "This insight helps us understand why coming up with a once-and-for-all definition of agenthood is so difficult:  one person's 'intelligent agent' is another person's 'smart object';  and today's 'smart object' is tomorrow's 'dumb program'."

The problem with ascription is that it allows practically anything to be described as an agent, making communication about agent concepts difficult among people who do not share the same point of view.  A useful "filter" for using "agent" to describe a piece of software is that it should be useful to do so;  that is, calling something an agent should in some useful sense distinguish it from concepts we already understand.  For example, [Bra97b] quotes [Sho93] as observing:

"It is perfectly coherent to treat a light switch as a (very cooperative) agent with the capability of transmitting current at will, who invariably transmits current when it believes that we want it transmitted and not otherwise;  flicking the switch is simply our way of communicating our desires.  However, while this is a coherent view, it does not buy us anything, since we essentially understand the mechanism sufficiently to have a simpler, mechanistic description of its behavior."

A descriptive definition of an agent, on the other hand, typically involves a set of attributes, which a given agent might have to a greater or lesser extent, one such set being:

However, other sets of attributes exist [Bra97b], and there is much discussion about which attributes best characterize agents.

A similar situation exists in attempting to precisely define "grid".  We can get a general idea of what "gridness" is from the "family resemblance" of the grid examples presented in Section 2.  Further examples of grid-like ideas are presented in Characterizing the Agent Grid [Tho98b].  In addition, that report contains a set of general grid properties which could be used in a descriptive definition of a grid.  [FK99a] contains other sets of grid attributes.  In addition, considering an agent grid as a generalization of an agent system architecture, the list of services in Section 2.4 could be used as descriptive attributes of agent grids, together with sets of attributes given in [HS98].

The definitions of "grid" found in dictionaries generally imply some concept of a "network" or "mesh".  This is certainly the generic idea of a grid.  Many things can be referred to as "grids" in this simple sense, including, in the context of computer systems, the Internet, the Web, or the objects in a CORBA-based distributed object system (which form an interconnected network by virtue of the references the objects have to each other).  However, the grid concepts described in Section 2 imply additional requirements, a stronger cohesiveness.  If we are going to use the term "grid" in a computer context, the example of "mis-ascription" cited above becomes relevant:  in the same sense that it buys us nothing to refer to a light switch as an "agent", it buys us nothing to refer to the Web as a "grid", even if it might be technically accurate to do so.  In other words, if we are going to use a new term such as "grid" to describe particular computer-based systems, it would be helpful to explicitly identify the properties we want to associate with those systems that distinguish them from computer-based systems we are already familiar with (such as the Internet, the Web, distributed object systems, etc.), and for which we already have other names.

In addition, a problem with current descriptions is that the grid concept is relatively new.  As a result, the focus of descriptions is on individual grid concepts and applications, and little attempt has been made to provide a "big picture" that might help unify the various concepts and related technologies.  For example, what is the relationship between a computational grid or computing fabric, the Web (as a form of "information grid"), distributed object systems, and agent grids?  In addition, it is clear that multiple kinds of grids will in some cases be integrated to form more extensive "grids".  This is illustrated by integration of information, sensor, and engagement grids in the DoD architectures described in Section 2.3.  Similarly, while the CoABS grid does not (at least not yet) consider integration of data, distributed object systems, or the Web to any great extent, it seems clear that it will have to in some sense integrate these technologies in order to support its intended applications.

In the sections that follow, we describe some basic ideas for use in characterizing computer-related grids.  In Section 3.2, we discuss some general attributes that seem to apply to computer-related grids.  In Section 3.3, we look at some important types of computer systems, examine their "gridness", and identify some types of facilities which, when added to those systems, would cause them to be considered more "grid-like".  In Section 3.4, we discuss the need to combine the various levels of "grids" together into unified architectures, and the start of a general approach to doing this.  We present some concluding remarks in Section 3.5.


3.2  General Aspects of "Gridness"

By looking at the "family resemblance" of the grid concepts described in Section 2, we can say that a grid is fundamentally an integrating mechanism or concept.  In considering an integrating mechanism, it is useful to focus on:

We can think of the things (resources) to be integrated in a computer-related grid as (in a rough order of increasing "semantic complexity"):

Integrating these things involves:

These, in turn, involve a number of more detailed, but still general, capabilities, including:

A number of observations can be made in connection with the points made so far:

"Gridness" can be thought of as a continuum.  At one end, there is the simple interconnection or network of resources, as in the dictionary definitions of "grid".  We can think of such a network as a "loose grid", if we must use the term "grid" for these networks at all.  At the other end, there are the systems that allow the interconnected resources to function as well-integrated units, as in the grid concepts described in Section 2 (particularly the DoD concepts described in Section 2.3, and the CoABS grid as described in [CoA98]).  We can refer to these systems, which exhibit the characteristics described above (and possibly other defining characteristics not yet identified) in the strongest sense, as "true grids".  This "true grid" endpoint of the "gridness" continuum is, of course, an arbitrary designation.  Systems will exist at various points along the continuum, becoming "stronger grids" as they exhibit these "gridness" characteristics to a greater extent.

A key aspect of grids is composition of resources in a sense that goes beyond simply interconnecting them (although interconnection is clearly required).  The compositional facilities provided by grids can apply at all levels, including hardware/computational power, data and software (software including both individual components and services, and composition including such things as interoperability and formation of aggregates), and agents and people (e.g., formation of communities and teams).  These compositions of resources are applied to "composed tasks" (i.e., tasks that go beyond separately accessing or invoking the individual resources):  in the transportation grid, the composed task is generically "provide access to resources";  in the power grid, it is "provide power";  in computers, it is presumably "provide computation" (or, more abstractly, "perform service/task").  At the agent/human level, the tasks are suitably abstract (e.g., "translate document", which is enabled by the CoABS grid knowing that there is a connected person who understands Arabic).  Ideally, we want these compositions to exhibit a fractal property or, looking at them the other way around, we want composition to exhibit a closure property.  This means that the resource compositions should have characteristics that are similar to those of individual resources at the same level of abstraction, so that we can treat the compositions as resources themselves.  For example, the computational grid seamlessly forms a large virtual computer from individual computers in a network, forming something that looks like yet another computer (which itself could be further aggregated).  Similarly, relational database theory emphasizes the idea that operations on data such as joins should exhibit a closure property, permitting newly formed aggregates of data to be operated on in the same way as the pieces from which they were formed.  A similar idea can apply to agents.
It should be possible to form teams or communities of agents that are interacted with as if they were single agents, with the group transparently dividing up any resulting work that has to be done.  Grids also tend to emphasize the dynamic aspects of composition, i.e., that it should be possible to easily form compositions of resources, then break them up when the resources are no longer needed, for recomposition elsewhere.  In addition, grids tend to involve some level of unified (but not necessarily centralized) management, since grids tend to be thought of as "units".  However, care is needed to match the level of abstraction of the management with the level of abstraction of the grid.  For example, the Internet has a certain amount of load management at the network level, but this does not make it a computational grid, even though it does connect numerous computers.  A level of management at the computational level would be required for that.
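The closure property discussed above (a composition of resources should itself look like a resource, and so be open to further aggregation) can be sketched in a few lines.  The class names (Resource, Computer, Composite) and the abstract work "units" are assumptions made for illustration and do not come from any of the grid systems cited; the sketch also ignores everything a real computational grid must handle, such as scheduling, failure, and security.

```python
from abc import ABC, abstractmethod


class Resource(ABC):
    """Common interface shared by single resources and compositions of them."""

    @abstractmethod
    def capacity(self) -> int:
        """Total units of work this resource can accept."""

    @abstractmethod
    def run(self, units: int) -> dict:
        """Place `units` of work; return {leaf resource name: units placed}."""


class Computer(Resource):
    """A single (leaf) resource."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self._capacity = capacity

    def capacity(self) -> int:
        return self._capacity

    def run(self, units: int) -> dict:
        if units > self._capacity:
            raise ValueError(f"{self.name} cannot accept {units} units")
        return {self.name: units} if units else {}


class Composite(Resource):
    """A composition of resources that is itself a Resource (closure)."""

    def __init__(self, name: str, members):
        self.name = name
        self.members = list(members)

    def capacity(self) -> int:
        return sum(m.capacity() for m in self.members)

    def run(self, units: int) -> dict:
        if units > self.capacity():
            raise ValueError(f"{self.name} cannot accept {units} units")
        placed = {}
        # The caller sees one resource; the division of work is internal.
        for m in self.members:
            share = min(units, m.capacity())
            if share:
                placed.update(m.run(share))
                units -= share
        return placed


if __name__ == "__main__":
    cluster = Composite("cluster", [Computer("a", 4), Computer("b", 2)])
    grid = Composite("grid", [cluster, Computer("c", 3)])  # a composite of a composite
    print(grid.capacity(), grid.run(7))
```

Because Composite implements the same interface as Computer, the "grid" object above can be handed to any code that expects a single computer, which is the sense in which the composition is closed.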

Whether the composition of resources involves movement of the resources depends on the kind of grid and its applications (there is invariably movement of some sort, but not necessarily of the resources).  For example, the composition of resources in a transportation grid necessarily involves moving those resources from where they are to where they are needed.   In a computational grid, the resources are generically "computational capacity".  In conventional computer networks, the capacity itself does not move;  instead, the load is moved.  However, specific groupings of capacity ("virtual capacity") can seem to move as sharing arrangements and interconnections are set up and torn down (as in the case of the Computing Fabric of Section 2.2).  Data is moved in a computer network in the same way that resources are moved on a transportation grid.  In the case of distributed object systems, there can be either movement of load alone (e.g., in CORBA systems, where objects are static, messages representing load are sent to them, and messages representing results are returned), movement of resources (in the case of Java objects), or both (e.g., even in a Java-based network, some services, or special purpose devices such as sensors, may not be able to move).  Similar considerations apply to agent systems.

Grids involve the participants providing to the grid as well as taking from it. There is a great deal of asymmetry in some grid-related technologies that sometimes must be dealt with in order to build "true grids" from these technologies.  For example, it is straightforward to think of connecting personal computers to the Internet in order to access information.  It is less straightforward to think of these personal computers as being part of the Internet in the sense of having their file systems and computational facilities fully integrated with the Internet in order to form a computational grid in the sense of Section 2.1.  To do this, additional technical (and security) issues must be addressed.  From another point of view, it is generally more straightforward to integrate data than it is computational capabilities. Typically this is because (a) the interfaces (for others to gain access to attached computing resources) are not as well developed as they are for data, and (b) the mechanisms for effectively using the added computation are not as well developed either (e.g., in a local network it may be possible to run an application located on someone else's machine, but it is not as easy to distribute a computation over several machines).

The relationship between a grid in a "loose" sense and a grid in the stronger sense of Section 2 is generally that the "loose" grid is or can be used as part of an organization that constitutes a "true grid".  Finding the actual grid may sometimes require considering a wider context, or adding additional technology.   For example, the transport grid (or a subset, like "the railroad grid") may be viewed as just the network of transport connections and the points connected.  However, this grid was created in the context of higher-level desires by people to move/share resources (food and other goods).  It is the unification of the transport links, together with the higher-level control mechanisms (and to some extent the economic system that provides the "tasking") that creates a grid in the stronger sense.  The Internet is another example. At one level, the Internet may be thought of as a loose form of grid, because it provides network connectivity among multiple computers.  However, considering it a grid in a stronger sense requires additional technology.  For example, [ABIS96] notes that while the Internet might be a model for the ABIS information grid, it lacks attributes such as security, and resource allocation based on (mission) priority, needed to support their idea of a grid.  Considering a wider context can also identify relationships between the Internet and a stronger grid concept.  For example, via Internet email, it is possible for people to organize collaborative efforts, integrating the activities of widely-scattered people. This does not mean that the Internet, or Internet email, by itself, constitutes a grid.  However, considering the connected people as part of the "system" enables that system to be thought of more realistically as a grid, with the Internet as a part, and with higher-level organizational strategy and goals being provided by the people involved.  
Similarly, distributed computer networks are at the heart of the computational grids described in Section 2.1, but additional mechanisms must be added to those networks in order to form grids in the stronger sense. Expanding the context can help us see both the grid that was intended, and also what additional components and mechanisms would be necessary to form a "true grid".  This suggests that we might want to look at technologies, such as the Web and distributed object systems, that clearly exhibit certain characteristics that we associate with grids.  However, we want to look at them not as grids in the fullest sense, but as "proto-grids", and then look carefully for the additional technologies that could be added to them to create grids in the stronger sense, as a way of pinning down what a "true grid" really is.

Finally, as stated in the final bullet above, "gridness" seems to imply that the system is "aware of itself" to a certain extent, and has the ability to carry out its tasks "itself", without a great deal of manual intervention.  For example, any interconnected group of distributed computers could be used as a much larger "virtual computer" by employing programmers to cope with all of the distributed programming and other problems necessary to use these resources in specific applications.  That does not mean that this set of distributed computers by itself is a grid.  What differentiates a computational grid is the fact that the grid itself provides services over and above the computers and network to help support the "virtual computer" illusion (possibly to a greater or lesser extent), and alleviate at least some of the detailed programming that would otherwise be necessary.  Similar comments apply to grids at other levels.


3.3  "Gridness" at Different Architectural Levels

As the previous discussion suggests, in considering the idea of computer-related grids, we are faced with computer systems at different "levels" that could exhibit "gridness" characteristics to varying degrees, specifically:

(It should be noted that these are technical levels, and differ from the primarily functional layers of grids (information, sensor, engagement) identified in the DoD grid architectures described in Section 2.3, even though the information grid in these architectures greatly resembles a data/object grid as described below.  Building grids at each of these functional layers would require use of technologies from more than one, and possibly all, of these technical levels.)

General technical issues associated with each of these types of systems are fairly well known (or, in the case of the agent systems, becoming so).  However, we would like to understand something of the extent to which these types of systems can be said to form "true grids" in the sense discussed above, and the sorts of technologies that might be required to form "true grids" from these systems if they can't be considered grids already.  In this section, we briefly look at these important types of computer systems, generally evaluate their "gridness", and identify some types of facilities which, when added to those systems, would cause them to be considered more "grid-like".

We need not say much about grids at the level of computation, since the computational grid is our original, paradigmatic computer-related grid.  Computational grids combine an interconnected network of computers with the control and other technologies necessary to form a "true grid" from these computing resources, and the grid exists to form compositions that are bigger, virtual computers.  The technologies that need to be added to the interconnected computers to form the grid have been introduced in Sections 2.1 and 2.2, and are thoroughly discussed in the cited references.

The dividing line between the technologies needed at this level and at other levels is necessarily fuzzy.  For example, some of the technologies involved at this level are those that provide composition of "computation", not just of "computers", e.g., parallel and distributed programming technologies, such as those provided by PVM.  The need for composition at the level of "computation", not just "computer" (but nevertheless at a fairly low level) is further illustrated by Jini's inclusion of a distributed transaction facility as an integral part of what is essentially a rather basic set of facilities.  Transactions essentially define compositions of computations that are to be considered, from the outside, as single atomic units, and hence help simplify the programming of distributed concurrent computations.
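The sense in which a transaction composes several computations into a single atomic unit can be shown with a small, single-process sketch.  It uses compensating (undo) actions to obtain all-or-nothing behavior; it is not the Jini transaction facility, it does not implement a distributed commit protocol, and the Transaction class with its add and commit methods is a hypothetical name introduced only for this illustration.

```python
class Transaction:
    """Runs a sequence of steps as one unit: either every step takes
    effect, or the effects of the completed steps are undone."""

    def __init__(self):
        self._steps = []  # list of (do, undo) callables

    def add(self, do, undo):
        """Register a step together with the action that reverses it."""
        self._steps.append((do, undo))

    def commit(self) -> bool:
        """Execute all steps in order. On any failure, undo the completed
        steps in reverse order and return False; otherwise return True."""
        undo_log = []
        try:
            for do, undo in self._steps:
                do()
                undo_log.append(undo)
        except Exception:
            for undo in reversed(undo_log):
                undo()
            return False
        return True


if __name__ == "__main__":
    accounts = {"a": 10, "b": 0}

    def debit():
        accounts["a"] -= 5

    def undo_debit():
        accounts["a"] += 5

    def credit():
        raise RuntimeError("remote service unavailable")  # simulated failure

    t = Transaction()
    t.add(debit, undo_debit)
    t.add(credit, lambda: None)
    print(t.commit(), accounts)  # the debit is rolled back
```

From the outside, the caller sees only whether the composed computation happened or not, which is what simplifies the programming of the individual steps.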

Grid-like systems also exist at the level of data.  By analogy with general grid principles, data-level grids would interconnect pieces of data, and enable the interconnected collection of data to be treated as a unit for various purposes.  An obvious candidate for "gridness" at this level is a database.  A database constitutes a data grid in the loose sense, since it forms an interconnected collection of related pieces of data.  However, a database system can also be thought of as more of a "true grid" by considering the compositional and other technologies typically associated with modern database systems.  For example:

Such facilities enable the database to be treated as a unified whole, which is a key characteristic of a grid.

At the same time, conventional DBMSs are limited in their support to just data, and data of relatively limited types at that (object DBMSs, for example, are considered separately below).  We might expect true "data grids" to provide support for many more data types than current DBMSs.  In addition, DBMSs would more closely resemble "true grids" by incorporating additional self-management and organizing facilities.  For example, an active DBMS that monitored its own content, and could automatically incorporate attached new data sources, would exhibit more "true grid" characteristics than current "static" DBMSs.  Ideally, such capabilities would also be extended to allow the connection of heterogeneous databases to form federations, based on common metadata, ontology, and conceptual schema concepts, much more readily than is now the case.  DBMS functionality could also be distributed into the network so that "the network is the DBMS".  This trend is related to the information mediator architectures of the DARPA I*3 and BADD programs, as well as to information agents [Tho98a (slide 14)].

As noted above, we might expect true "data grids" to provide support for many more data types than current DBMSs do.  The World Wide Web is an example of the variety of data that we would expect to be included in a "data grid".  The Web includes a wide variety of data types, including not only HTML pages, but also files of many types (including various document formats, spreadsheets, etc.).  The Web is in many respects a primitive form of distributed database (using its own particular data representations), similar in many respects to early network databases.  Once a page is posted to a Web server, it potentially (assuming it points to other pages, and other pages point to it) becomes part of an interconnected collection of data whose component pages can be readily and uniformly accessed.  However, the mechanisms needed for unifying this collection into a more coherent whole are at a relatively early stage.  Examples of the additional technology needed to make the Web more of a "true grid" include:

The addition of behavior (code, software) to data moves us into the realm of systems based on objects, e.g., distributed object systems such as CORBA-based systems, or object DBMSs.  In such an object system, the basic "grid" is formed of interconnected objects (interconnected by virtue of the references objects contain to other objects).  These objects are pre-packaged units of data and associated software.  If we consider the relationships between the data that forms an object's state and the code which defines its methods as an additional part of the interconnection that forms the "grid", we can also think of an object grid as an interconnected data and software grid.

The Web can increasingly be thought of as a form of object grid as well [Man98a,b; Man99a], due to:

As with the systems discussed in connection with  "data grids", if we add specific additional capabilities or "services" to the basic network (of objects in this case), the systems become more like "true grids".  For example, an object DBMS provides an integrated collection of query, transaction, and other facilities that enable the collection of objects in an object database to behave in a much more cohesive, "grid-like" fashion.  Similarly, the addition of CORBAservices such as transaction, query, trading (yellow pages), etc. services to the basic CORBA-enabled network of distributed objects moves a CORBA-based system in the direction of becoming more grid-like.

However, it is not enough for such services to be defined;  they must also be implemented, and integrated in a seamless way in a given system, in order for that system to begin to have grid properties.  This is a general issue with distributed object systems today:  while the systems provide the basic distributed object interconnection facilities, the additional services which would allow the objects to be combined and used in flexible ways are generally either not very well developed, or not integrated in a very transparent way either with the objects themselves or with each other.  Ideally, what is desired is a seamless "sea of objects" which eliminates or minimizes distinctions between local, persistent, or distributed objects, and in which services are transparently available.  For example, an object DBMS attempts to both minimize the distinction between transient and persistent objects (including the largely-automatic movement of objects off and onto persistent storage) and seamlessly integrate services that can be used with such objects.  A great deal of additional work must be done when using any of today's distributed object systems (including CORBA) to achieve even this level of seamlessness and integration, let alone transparently supporting such capabilities as load balancing or object replication (although there is an OMG Replication Service RFP to which responses are currently being submitted).  Some of these issues are described further in [Man99b].

Both the sets of higher-level services available with current distributed object systems (CORBA, DCOM, Java, and their developments) and the maturity of these services differ greatly.  Some facilities are rapidly being developed for Java which are becoming more slowly available in CORBA (due in some respects to the need in CORBA to deal with platform and language heterogeneity).  Also, for various technical reasons, some of the techniques used in these distributed object systems do not yet scale well to systems containing many millions of objects (although, in spite of this, such systems can and have been implemented using CORBA-based technologies).

There is also a great deal of work needed on better object composition mechanisms, including improved techniques for forming basic objects from separate pieces of data (state) and code (software), and improved techniques for forming higher level components (or "business objects") or other object aggregations, complete with object interfaces, from collections of individual objects.  Better facilities are also needed in many other areas, including:

In summary, there is much work to be done to raise object systems to the level of "true grids", containing well-integrated services, that provide a virtual, distributed, shared object space, and which transparently handle the load balancing, reliability, and other issues associated with "true grids".  At the same time, of course, these systems are attempting to address very difficult problems, about which there is still much debate.  For example, there is a considerable amount of debate in programming and architectural circles as to the extent to which it is practically possible to achieve transparency when dealing with both local and distributed objects (see, e.g., [WWWK94]).

At the agent level, a considerable amount of additional work also needs to be done, as illustrated by the existence of the CoABS program itself, and work on the CoABS grid described in Section 2.4.  An agent grid exhibits all the general requirements (and associated services and issues) of the other grid levels, but "translated" into the agent level.  For example, load balancing at the agent level involves balancing the loads of agents (and thus requires a way to describe the "load" of an agent, and how to tell if an agent is "overloaded"), and composition must address the requirements of agent composition (e.g., into teams), and agent-level division of labor.  The references cited in Section 2.4 describe some of the many issues connected with the development of the CoABS grid, a particular agent-level grid concept.
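As a rough illustration of what describing the "load" of an agent and detecting an "overloaded" agent might involve, the sketch below models load as queue length relative to a per-agent limit and moves tasks from overloaded agents to less loaded teammates.  The Agent class, the limit field (assumed to be greater than zero), and the rebalance policy are all assumptions made for this example; the CoABS grid may define load and rebalancing quite differently (e.g., in terms of the semantics of the tasks involved).

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """An agent with a task queue and a self-declared limit (hypothetical model)."""
    name: str
    limit: int  # number of queued tasks beyond which the agent is overloaded
    queue: list = field(default_factory=list)

    def load(self) -> float:
        """Load expressed as the fraction of the limit currently in use."""
        return len(self.queue) / self.limit

    def overloaded(self) -> bool:
        return len(self.queue) > self.limit


def rebalance(team):
    """Move tasks off overloaded agents onto the least-loaded teammate that
    still has room. Returns the list of (task, from_agent, to_agent) moves."""
    moves = []
    for agent in team:
        while agent.overloaded():
            spare = [a for a in team
                     if a is not agent and len(a.queue) < a.limit]
            if not spare:
                break  # nobody can take more work; leave the agent overloaded
            target = min(spare, key=Agent.load)
            task = agent.queue.pop()
            target.queue.append(task)
            moves.append((task, agent.name, target.name))
    return moves


if __name__ == "__main__":
    team = [
        Agent("a", 2, ["t1", "t2", "t3", "t4"]),
        Agent("b", 2),
        Agent("c", 4, ["x"]),
    ]
    print(rebalance(team))
    print({a.name: a.queue for a in team})
```

Even this simple version shows that agent-level load balancing needs agreed-upon descriptions of load and capacity, and that it raises the autonomy question noted in Section 2.4: an agent must be willing to accept work that the grid moves to it.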


3.4  The Need for Unified Grid Architectures

The CoABS grid (at least as described in [CoA98]), in wishing to control non-agent resources (such as computing resources) as well as agents, raises an important additional requirement concerning grids, namely the need to combine grid capabilities at multiple technical levels.  The same requirement is illustrated by the DoD grid architectures described in Section 2.4.  That is, in addition to the need for the individual technical levels described above to become more grid-like, the resulting grids themselves need to be unified.  An agent-level grid supporting this requirement should provide grid capabilities at the computation and data/object levels in support of agents, as well as grid capabilities at these other levels enabled by agents.  Both these types of support are important in making the maximum use of agent-level capabilities.  For example, agent-level grids (and also object-level grids) can take advantage of the capabilities of underlying computational grids in supporting their load balancing and quality-of-service requirements (particularly where the higher-level grids can interact directly with the lower levels to exert control).  Operational agent grids will also need to interact with data and object systems (which hopefully will become grids at these levels), since much information and software functionality that will need to be accessible to agent grids will continue to exist in these systems.

At the same time, the technical demands of grid concepts at all levels require increasing amounts of "intelligence", collaborative ability, adaptability, component mobility, etc.;  in other words, characteristics frequently associated with agents.  For example, [Bra97b] discusses the use of agent technology in simplifying and enhancing distributed computing capabilities, and in particular enhancing intelligent interoperability in such systems.  One such use is the incorporation of agents as resource managers.  He notes:  "A higher level of interoperability would require knowledge of the capabilities of each system, so that secure task planning, resource allocation, execution, monitoring, and possibly, intervention between the systems could take place.  To accomplish this, an intelligent agent could function as a global resource manager."  On further distributing these functions among multiple agents, he adds:  "A further step toward intelligent interoperability is to embed one or more peer agents within each cooperating system.  Applications request services through these agents at a higher level corresponding more to user intentions than to specific implementations, thus providing a level of encapsulation at the planning level, analogous to the encapsulation provided at the lower level of basic communications protocols."  Agents can also assist in providing better user interfaces for such distributed systems.  As [Bra97b] observes, "In the future, assistant agents at the user interface and resource-managing agents behind the scenes will increasingly pair up to provide an unprecedented level of functionality to people."

[Gen97] also describes the role of agents in enabling interoperability in distributed systems.  In his approach, agents and facilitators are organized into a federated system, in which agents surrender autonomy in exchange for the facilitator's services.  Facilitators coordinate the activities of agents and provide other services such as locating other agents by name (white pages) or by capability (yellow pages), direct communication, content-based routing, message translation, problem decomposition, and monitoring.  On startup, an agent initiates an ACL connection to the local facilitator and provides a description of its capabilities.  It then sends the facilitator requests when it cannot supply its own needs, and is expected to act to the best of its ability to satisfy the facilitator's requests.
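
The registration and lookup pattern described above can be sketched as follows.  This is an illustrative Python fragment only:  the class and method names are hypothetical, plain method calls stand in for ACL messages, and routing is reduced to picking the first agent that advertises a capability.  It is not the design given in [Gen97].

```python
from collections import defaultdict


class Facilitator:
    """Hypothetical facilitator with white-pages and yellow-pages lookup."""

    def __init__(self):
        self._by_name = {}                        # white pages: name -> agent
        self._by_capability = defaultdict(list)   # yellow pages: capability -> agents

    def register(self, agent, capabilities):
        # Called by an agent on startup to describe what it can do.
        self._by_name[agent.name] = agent
        for cap in capabilities:
            self._by_capability[cap].append(agent)

    def find_by_name(self, name):
        return self._by_name.get(name)

    def find_by_capability(self, capability):
        return list(self._by_capability.get(capability, []))

    def request(self, capability, payload):
        # Simplest possible routing: forward to the first registered agent
        # that advertises the capability.
        providers = self.find_by_capability(capability)
        if not providers:
            raise LookupError(f"no agent offers {capability!r}")
        return providers[0].handle(capability, payload)


class EchoAgent:
    """Toy agent that registers itself and answers requests it is sent."""

    def __init__(self, name, capabilities, facilitator):
        self.name = name
        facilitator.register(self, capabilities)

    def handle(self, capability, payload):
        return f"{self.name} handled {capability}: {payload}"
```

An agent that cannot meet a need itself would call request() on its local facilitator, which mirrors the exchange of autonomy for facilitator services described in the text.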

The integration of agents with other levels requires the use of object/component technology, together with reflective (self-referencing) capabilities combined with extensive metadata.  For example, [Bra97b] observes:  "A key enabler is the packaging of data and software into components that can provide comprehensive information about themselves at a fine-grain level to the agents that act upon them.  Over time, large undifferentiated data sets will be restructured into smaller elements that are well-described by rich metadata, and complex monolithic applications will be transformed into a dynamic collection of simpler parts with self-describing programming interfaces.  Ultimately, all data will reside in a 'knowledge soup', where agents assemble and present small bits of information from a variety of data sources on the fly as appropriate to a given context.  In such an environment, individuals and groups would no longer be forced to manage a passive collection of disparate documents to get something done.  Instead, they would interact with active knowledge media that integrate needed resources and actively collaborate with them on their tasks."  The Web, in its role as the beginnings of a data/object grid, can be said to be moving in this direction now.  This is particularly true when technologies for addressing finer-grained portions of Web documents (e.g., XML, and related technologies) and for attaching behavior to Web data are considered [Man98a,b; Man99a].  [Bra97b] also identifies the need for such agent systems to be able to interact with both object systems and more conventional software:  "Ideally, each software component would be 'agent-enabled', however, for practical reasons components may at times still rely on traditional interapplication communication mechanisms rather than agent-to-agent protocols."

Objects provide a generic modeling or abstraction mechanism for looking at the wide range of resources that need to be included at all levels in such a combined system. An object in this sense is simply an encapsulated unit that has identity, an interface (possibly more than one), and communicates via messages with other objects and the "outside". This use of objects mirrors the use of objects as a general modeling mechanism in the ISO Reference Model of Open Distributed Processing (RM-ODP) [ISO95].  RM-ODP is intended to describe any distributed processing system (including, in some cases, the roles of humans that may be involved in the system), and its use of objects as a modeling abstraction is not meant to imply that the system is actually implemented using object-oriented programming techniques.  However, while object abstractions need not necessarily be implemented using object-oriented programming, the use of these abstractions makes the application of object technologies such as CORBA, Jini, etc. relatively straightforward.

Representing the computational and communication components of a computational grid as objects, as illustrated in the Legion system's reflective capabilities, allows these components to be both uniformly represented within the architecture, and managed in a straightforward way by higher level components.  The approach of representing computer or network components as objects for management purposes is well-known in both network and computer system management technologies.  Data can be represented as objects in a straightforward fashion, by defining object interfaces containing get (read) and set (write) operations.  The World Wide Web Consortium Document Object Model [Woo98] is an example of a set of such interfaces designed to provide object-oriented interfaces to Web data.  Such interfaces provide programs and agents with more uniform access to information represented both as data (e.g., in databases, on file systems, or in the Web) and in distributed object systems, and also support the integration of more "intelligence", in the form of behavior, with such data.  Finally, object interfaces can encapsulate "smart things", e.g., more or less smart agents, and human beings.  For example, agents can be modeled as objects (independently of whether they are implemented as objects), in the sense that they are encapsulated things with independent identity, present interfaces to the rest of the world, and communicate with anything outside them via messages sent to interfaces.  Similarly, people can be modeled as objects:  "fmanola@objs.com" is the identifier of an interface to which messages can be sent.
In some cases the messaging protocols between these various kinds of objects will be relatively simple (e.g., conventional object RPC between distributed software objects, or commands sent to hardware), while in other cases they will be more complicated (agent communication language (ACL) sent between agents, or the email flow between people);  however, similar abstraction principles can apply to objects at all levels.
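
The modeling idea in the preceding paragraphs can be illustrated with a small Python sketch.  The class names, the single send() operation, and the example identifiers are assumptions made for illustration;  they are not drawn from RM-ODP, the DOM, or any of the systems cited.

```python
from abc import ABC, abstractmethod


class GridObject(ABC):
    """Hypothetical common abstraction: identity plus a message interface."""

    def __init__(self, identity: str):
        self.identity = identity

    @abstractmethod
    def send(self, message: str, *args):
        """Deliver a message to this object's interface and return any reply."""


class DataObject(GridObject):
    """Data wrapped behind get (read) and set (write) operations."""

    def __init__(self, identity, value=None):
        super().__init__(identity)
        self._value = value

    def send(self, message, *args):
        if message == "get":
            return self._value
        if message == "set":
            self._value = args[0]
            return None
        raise ValueError(f"unsupported message: {message}")


class PersonObject(GridObject):
    """A person modeled as an object; the identity is a mail address."""

    def __init__(self, identity):
        super().__init__(identity)
        self.inbox = []

    def send(self, message, *args):
        # Stand-in for email delivery: messages accumulate in an inbox.
        self.inbox.append((message, args))
```

The point of the sketch is only that very different kinds of resources can sit behind one abstraction (identity, interface, messages), while the protocols carried by the messages vary widely in complexity.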

In such an integrated architecture, there is also a need to define additional forms of organization on the available resources in addition to the technical levels already discussed, together with associated metadata.  For example, large scale distributed object systems increasingly are being designed with 3- (or sometimes multi-) tier architectures [MGHH+98]. These architectures involve the division of the system's components (and object definitions) into functional tiers based on the different functional concerns they address. For example, a typical 3-tier architecture has a tier for objects representing user interface elements, a tier for business or application objects, and a tier for database servers. The business object tier separates out the common definitions of enterprise operations and semantics from the more specialized concerns addressed in the other tiers.  Other examples of such organization include the use of Common Schema concepts [Man98c] or enterprise-level ontologies (enterprise-wide agreements on common semantics), and specialized schemas/ontologies for use by specialized user communities (together with mappings to the common definitions where possible).

Semantics-based mappings between the different technical levels in such an architecture are also required.  For example, the ATAIS architecture document [BFHH+98] describes a series of interoperability levels:  isolated, co-habitable, syntactic, semantic, seamless, and adaptive.  The computational grid idea can be characterized as emphasizing high levels of interoperability on this spectrum, but at a low level of abstraction (i.e., in terms of computing resources).  The agent grid often involves a much higher level of abstraction.  Other levels (e.g., data, objects) are, in a sense, in between these extremes.  Raising the level of abstraction complicates providing "gridness" (deep integration) because the requirements on one side, and the available resources/services on the other, are more semantically heterogeneous (unlike, e.g., "memory" and "CPU bandwidth"), and thus both characterizing them, and matching requirements with resources, becomes harder.  An example of this is the complexity of addressing quality-of-service (QoS) issues, which involves defining mappings between "quality" measures at higher levels, and resource allocations at lower levels [Man99b].
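
To make the spectrum concrete, the interoperability levels named above can be written as an ordered enumeration.  The Python sketch below assumes that the order in which the levels are listed runs from least to most interoperable, and adds a hypothetical "weakest link" rule for combining components;  neither assumption is taken from [BFHH+98].

```python
from enum import IntEnum


class Interop(IntEnum):
    """Interoperability levels, in the order listed in the text."""
    ISOLATED = 0
    CO_HABITABLE = 1
    SYNTACTIC = 2
    SEMANTIC = 3
    SEAMLESS = 4
    ADAPTIVE = 5


def effective_level(levels):
    # Assumption for illustration: a combination of components interoperates
    # no better than its least interoperable member.
    return min(levels)
```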

Such additional levels of organization provide the basis for more interoperability among the various technical levels of resources contained in the system, and hence help enhance the ability of these resources to operate as a "true grid".


3.5  Conclusions and Recommendations

The concept of a "grid" is a generally useful idea, but only if it means something more than an ordinary collection of distributed resources.  Ideally, it implies some higher level integration of distributed resources beyond simply connecting them.  Additional work is needed to identify the details of the added functionality required to go beyond simple distributed collections of resources to the formation of "true grids".

The grid concept can be usefully applied at a number of individual technical levels (computation, data, object, agent).  The development of grid concepts at these various technical levels reflects the fact that simple interconnection technologies at these levels are becoming relatively mature (even though there is still much work to do on these technologies).  The emphasis now is on techniques for combining the interconnected resources to solve increasingly complex problems.  In particular, there is emphasis on:

Grids have been defined at the computation level already.  At other levels, more advanced integrating mechanisms have been defined (e.g., federated DBMSs at the data level), but these do not yet approach the level of grids.  To advance the use of the grid concept, the general idea of a grid needs to be applied to each of these levels, in order to identify in detail the specific technologies (some of which may already exist, e.g., active DBMS capabilities at the data level) which would enable the creation of grids from collections of distributed resources at each of these levels.

Grids at these individual levels are useful by themselves, but the maximum advantage comes when these different levels of grid capabilities are combined.  There is a need for additional work to develop a unifying technical grid architecture which incorporates these separate grid levels, and identifies mappings between them.  The requirements of true grids at the higher levels (e.g., agent) probably require grid-like functionality at the lower levels (e.g., data and computation) anyway;  hence these mappings are needed to guide the implementation of the higher levels in terms of the lower ones.  These functional mappings are similar to the types of end-to-end mappings being investigated in providing Quality of Service guarantees in distributed systems [Man99b].  Building such a combined grid, for example, would appear to involve all the computational grid issues of system management (and associated metadata), distributed computation and load balancing, mobile code, security, etc., as well as the data-, object-, and agent-level versions of those issues (issues which, e.g., reflect the semantics of the entities on whose behalf the agents are functioning, or the resources which the agents wrap).  This requires a way of describing resources and capabilities, and resource requirements and tasks, and a way to map between them, at all these levels.  The combined grid also involves the need for a way to define higher-level goals, i.e., a way to define the goals of the grid itself, that are optimized by the load balancing and related activities taking place.  These goals are presumably at a higher level than those of individual agents (although these might also be characterized as the goals of higher-level agents, or agents of higher authority, rather than goals of the grid per se).
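
As a rough illustration of describing resources and tasks and mapping between them, the Python sketch below uses hypothetical Resource and Task descriptions and a deliberately simple matching rule (same technical level, and every requirement covered by a capability).  The names and the rule are assumptions;  matching at the higher levels would need to take semantics into account, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical description of something a grid can offer."""
    name: str
    level: str                 # e.g. "computation", "data", "object", "agent"
    capabilities: frozenset


@dataclass(frozen=True)
class Task:
    """Hypothetical description of what a task needs."""
    name: str
    level: str
    requirements: frozenset


def match(task, resources):
    """Return resources at the task's level that cover all its requirements."""
    return [
        r for r in resources
        if r.level == task.level and task.requirements <= r.capabilities
    ]
```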

In such an integrated architecture, in addition to the technical levels already mentioned, there is also a need to define additional forms of organization on the available resources.  These include such things as the use of multiple functional tiers, the use of Common Schema concepts or enterprise-level ontologies (enterprise-wide agreements on common semantics), and specialized schemas/ontologies for use by specialized user communities (together with mappings to the common definitions where possible).  Semantics-based mappings between the different technical levels in such an architecture are also required.  Such additional levels of organization provide the basis for more interoperability among the various technical levels of resources contained in the system, and hence help enhance the ability of these resources to operate as a "true grid".

All this work requires additional analysis not only of technical issues, but also of application requirements.  A great deal of technology already exists that is potentially useful in integrating distributed resources.  However, further detailed understanding of application requirements is needed to drive the detailed selection of the combination of technical features needed to form various types of grids.  Work on the CoABS grid is an example of ongoing work of this type.

The discussion in this report does not replace the need to address the detailed technical issues associated with various grid concepts in the cited references.  However, it does provide a way of thinking about the general ideas which these grid concepts have in common, which hopefully can be helpful in attempts to understand and unify them.


Acknowledgements

I wish to thank Craig Thompson, Venu Vasudevan, and Paul Pazandak, all of OBJS, and Richard Ivanetich (IDA), for helpful discussions and for important contributions to the ideas in this paper.


References

[ABIS96]  ABIS Task Force, 1996 Advanced Battlespace Information System (ABIS) Task Force Report, 1996. <http://www.dtic.mil/dstp/96_docs/abis/abis.htm>.

[BFHH+98]  E. Brady, B. Fabian, M. Harrell, F. Hayes-Roth, S. Luce, E. Powell, G. Tarbox, "The Advanced Technology Architecture for Information Superiority", draft 10/16/98.

[Bra97a]  J. M. Bradshaw (ed.), Software Agents, American Assn. for Artificial Intelligence/MIT Press, 1997.

[Bra97b]  J. M. Bradshaw, "An Introduction to Software Agents", in [Bra97a].

[CG98]  A. Cebrowski and J. Garstka, Network-Centric Warfare:  Its Origin and Future, U. S. Naval Institute Proceedings, Vol. 124/11,139, January 1998, 28-35 <http://www.usni.org/Proceedings/Articles98/PROcebrowski.htm>.

[CoA98] DARPA CoABS Read Ahead Package and CoABS Kickoff Meeting, Pittsburgh, July 22-23, 1998.

[DC498]  Directorate for Command, Control, Communications, and Computer Systems, Observations on the Emergence of Network-Centric Warfare, Information Paper, 1998 <http://www.dtic.mil/jcs/j6/education/warfare.html>.

[DDRE98]  Director, Defense Research and Engineering, Joint Warfighting Science and Technology Plan, 1998 <http://www.dtic.mil/dstp/98_docs/jwstp/jwstp.htm>.

[FF97a] G. Fox and W. Furmanski, "Petaops and Exaops: Supercomputing on the Web", IEEE Internet Computing 1(2), March-April 1997.

[FF97b] G. Fox and W. Furmanski, "HPcc as High Performance Commodity Computing", Technical Report, December 1997 <http://www.npac.syr.edu/users/gcf/hpdcbook/HPcc.html>.

[FF97c] G. Fox and W. Furmanski, "High-Performance Commodity Computing", in [FK99a].

[FK99a] I. Foster and C. Kesselman (eds.), The Grid:  Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999. ISBN 1-55860-475-8.

[FK99b] I. Foster and C. Kesselman, "Computational Grids", in [FK99a].

[FK99c]  I. Foster and C. Kesselman, "The Globus Toolkit", in [FK99a].

[Gen97]  M. R. Genesereth, "An Agent-Based Framework for Interoperability", in [Bra97a].

[GLS94]  W. Gropp, E. Lusk, and A. Skjellum, Using MPI:  Portable Parallel Programming with the Message Passing Interface, MIT Press, Cambridge, 1994.

[GFLH98] A. Grimshaw, A. Ferrari, G. Lindahl, and K. Holcomb, "Metasystems", Comm. ACM 41(11), November 1998.

[GG99]  D. Gannon and A. Grimshaw, "Object-Based Approaches", in [FK99a].

[HS98]  M. N. Huhns and M. Singh (eds.), Readings in Agents, Morgan Kaufmann, 1998.

[ISO95]  ISO/IEC JTC1/SC21/WG7 (1995), Reference Model of Open Distributed Processing <http://www.iso.ch:8000/RM-ODP/> (see also <http://www-cs.open.ac.uk/~m_newton/odissey/RMODP.html> and <http://www.dstc.edu.au/AU/research_news/odp/ref_model/ref_model.html>).

[J695] Joint Staff (J6), Joint Pub 6.0:  Doctrine for C4 Systems Support to Joint Operations, 30 May 1995 <http://www.dtic.mil/doctrine/jel/new_pubs/jp6_0.pdf>.

[Ket98] B. Kettler, DARPA CoABS Program:  Use Cases for a Prototype Grid, draft 3.1, 12/15/98, Brian Kettler, ISX Corporation <http://coabs.globalinfotek.com/grid.htm> (password protected).

[KT98] N. Karnik and A. Tripathi, "Design Issues in Mobile-Agent Programming Systems", IEEE Concurrency 6(3), July-September 1998.

[Man98a]  F. Manola, Towards a Web Object Model, Technical Report, Object Services and Consulting, Inc., <http://www.objs.com/OSA/wom.htm>, 1998.

[Man98b]  F. Manola, Some Web Object Model Construction Technologies, Technical Report, Object Services and Consulting, Inc., <http://www.objs.com/OSA/wom-II.htm>, 1998.

[Man98c]  F. Manola, Flexible Common Schema Study, Technical Report, Object Services and Consulting, Inc., December, 1998 <http://www.objs.com/aits/9811-common-schema-report.htm>.

[Man99a]  F. Manola, "Technologies for a Web Object Model", IEEE Internet Computing, 3(1), January/February, 1999.

[Man99b]  F. Manola, Providing Systemic Properties (Ilities) and Quality of Service in Component-Based Systems, Technical Report, Object Services and Consulting, Inc., January 1999 <http://www.objs.com/aits/9901-iquos.html>.

[MGHH+98] F. Manola, et al., "Supporting Cooperation in Enterprise-Scale Distributed Object Systems", in M. P. Papazoglou and G. Schlageter (eds.), Cooperative Information Systems: Trends and Directions, Academic Press, 1998.

[Paz98a]  P. Pazandak, Best of Class Agent System Features, <http://www.objs.com/agility/tech-reports/9809-best-of-class-capabilities.htm>, 1998.

[Paz98b]  P. Pazandak, Next Generation Agent Systems & the CoABS Grid, draft Technical Report, <http://www.objs.com/agility/tech-reports/9810-NGAS.htm>, 1998.

[Pis98a]  A. Piszcz, "Background on Agents for DARPA's NGII Architecture", Mitre Technical Report MTR 98W0000085, August 1998.

[Pis98b]  A. Piszcz, Grid Metaservice Considerations for Control of Agent Based Systems, draft, 3 September, 1998.

[Sho93]  Y. Shoham, "Agent-Oriented Programming", Artificial Intelligence 60(1), 51-92, 1993.

[Tho98a]  C. Thompson, Strawman Agent Reference Architecture, slide presentation, <http://www.objs.com/agility/tech-reports/9808-agent-ref-arch-draft3.ppt>, 1998.

[Tho98b]  C. Thompson, Characterizing the Agent Grid, Technical Report, Object Services and Consulting, Inc., 1998 <http://www.objs.com/agility/tech-reports/9812-grid.html>.

[VV95]  W. Van de Velde, "Cognitive Architectures--From Knowledge Level to Structured Coupling", in L. Steels (ed.), The Biology and Technology of Intelligent Autonomous Agents, Springer Verlag, Berlin, 1995.

[Woo98]  L. Wood, et al., Document Object Model (DOM) Level 1 Specification, W3C Recommendation, World Wide Web Consortium, <http://www.w3.org/TR/REC-DOM-Level-1/>, 1998.

[WWWK94]  J. Waldo, G. Wyant, A. Wollrath, and S. Kendall, A Note on Distributed Computing, SMLI TR-94-29, Sun Microsystems Laboratories, Inc., November 1994 <http://www.smli.com/techrep/1994/abstract-29.html>.
 


This report was prepared by Object Services and Consulting, Inc. (OBJS) under subcontract to the Institute for Defense Analyses (IDA) on its Task A-209, Advanced Information Technology Services Architecture, under contract DASW01 94 C 0054 for the Defense Advanced Research Projects Agency. Publication of this document does not indicate endorsement by the Department of Defense, nor should the contents be construed as reflecting the official position of that agency.

© Copyright 1998, 1999 Object Services and Consulting, Inc. (OBJS)
© Copyright 1998, 1999 Institute for Defense Analyses (IDA)

Permission is granted to copy this document provided this copyright statement is retained in all copies.

Disclaimer: Neither OBJS nor IDA warrant the accuracy or completeness of the information in this report.