Garbage Collection for Distributed Persistent Objects

Sung-Wook Ryu, B. Clifford Neuman
ryu@isi.edu , bcn@isi.edu

University of Southern California/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695

1. Introduction

Distributed object technology provides efficient and safe distribution and reuse of software components in the network. Applications are composed of independent components which are distributed over the network and can communicate with other components. Due to the technology, developers can create reliable, robust, secure, and maintainable distributed applications easier and faster. Distributed object technology enables components of software to plug-and-play, interoperate across the network, and coexist with legacy applications.

A distributed object consists of data and methods along with interfaces through which clients invoke the methods of the object. When a client requests a service to an object, the object is created and executed on its server process. The object replies to the client with results after processing and performing the request. The object is reclaimed as garbage when it is not reachable from any clients or root objects. For this procedure, called garbage collection, CORBA [CORBA-A, 1997; CORBA-S, 1997] uses reference counting, DCOM [Chappell, 1996] uses reference counting and pinging, and Java RMI uses a lease.

The persistent object service enables objects to remain long after its server process terminates. To be persistent, the state of an object should be stored in persistent storage such as disks so that the object can be activated later with the stored previous state. The state of an object is stored and loaded either transparently or explicitly: clients can make their objects persistent explicitly to reuse them later; or objects are stored and loaded transparently by the activation and de-activation mechanism. The persistent object service and the activation mechanism can be used to conserve system resources [Wollrath et al., 1995]. Since a system may contain a considerable number of objects, it is unreasonable that objects remain active and consume valuable system resources although no requests are made on them for a long time. When persistent objects are not reachable from any clients or roots, they should be reclaimed as garbage.

In this position paper, we discuss the issues of garbage collection for distributed persistent objects, and present a practical and efficient garbage collection mechanism.

2. The issues of garbage collection for distributed persistent objects

Garbage collection is a critical problem for administratively decentralized large scale distributed systems which manage distributed persistent objects. This process is important because manual garbage collection is error-prone and it is difficult for clients to maintain the information about references correctly. Many solutions for distributed garbage collection have been suggested. However, they are not suitable for large scale distributed systems because their target systems are small in scale and they cannot collect cyclic garbage.

In the ideal distributed system, objects continue to exist as long as they are reachable from clients or root objects and should be reclaimed when unreachable. In practice, this is difficult to support in administratively decentralized large scale distributed systems because of the following reasons:

Distributed objects and references are dynamically created, deleted, migrated, and shared across the network. Therefore, it is difficult to determine when an object is not reachable, and whether it is safe to reclaim it.
Distributed systems are administratively decentralized, so clients' well-behaving cooperation should not be required.
Distributed systems are very large in scale, so it is impossible to get a global view of clients, objects and their references.
Servers and clients can crash during garbage collection related operations.
Messages can be lost, and the network can be partitioned for a while.

Since persistent objects may remain after their clients and servers terminate, garbage objects might stay in persistent storage forever unless garbage collection is performed. When garbage collection is performed by clients manually only, the same problem happens because manual garbage collection is error-prone and clients' well-behaving cooperation should not be required. If some clients terminate without deleting their references, storage is wasted, which even causes system crash.

CORBA supports the persistent object service and the activation mechanism. When active objects have not received any requests for a long time (e.g., 15 minutes in NEO), they are de-activated and stored on persistent storage. They are activated again when they are accessed. This process is performed transparently. Clients can store the state of their objects explicitly as well. The persistent objects exist until they are explicitly deleted by clients, which means that garbage objects should be reclaimed by clients manually. DCOM also supports the persistent object service. Objects can be stored into a Running Object Table(ROT), and they can be activated later. DCOM does not support a garbage collection mechanisms for the ROT, so the client which created objects should come back and delete the objects. In [Pjama], root objects are created by clients, and the root objects and objects reachable from the root objects are assumed to be alive. Garbage objects which are unreachable from the root objects are reclaimed by the external garbage collector, opjgc.

The current garbage collection mechanisms for active objects in memory are not suitable for persistent objects. Reference counting is simple and scalable, but 1) it cannot collect cyclic garbage objects, and 2) it is impossible to maintain the referencing link counts correctly. Since clients might terminate without reducing their reference counts in a loosely synchronized distributed system, the reference count of an object can be greater than zero, even though no other objects are referencing the object. Both pinging and lease are based on [Birrell et al., 1993] but they cannot collect cyclic garbage. That is, garbage objects in a same cycle will refresh each other, and they never expire. Since persistent objects may remain long after its server process terminates, all garbage including cyclic garbage should be reclaimed completely. Therefore, we need a new garbage collection mechanism which is efficient, scalable and complete.

3. Our approach

We have implemented a garbage collection mechanism for the Prospero directory service which maintains distributed persistent objects [Neuman, 1992]. In our system model, all distributed objects are equal, and there are no special root objects. Objects which have a long lifetime (Time-to-Live) effectively serve as root objects. Objects reachable from clients are assumed to be alive, and unreachable objects are assumed garbage. Each object may have a different reclamation method based on its characteristics. For example, temporary objects are removed directly once they become garbage. However, some important objects such as bank accounts are moved to dormant account repository and notify the customers. Our mechanism consists of two major steps: 1) acyclic garbage is collected by timeouts; and 2) cyclic garbage is collected by last referenceable timestamp propagation and backward inquiry. We assume that all clocks in the network are synchronized to some degree, and crashed networks and servers recover within a finite period of time.

3.1 Timeouts

Information about references are maintained by timeouts which is the same concept as that of RMI's lease. When an object is created by a client, the client assigns a TTL (Time-To-Live) to the object. The new object also gets an expiration time which is the creation time plus the TTL. The TTL is the period for which the object is guaranteed to exist after being refreshed (receiving a refresh message). When a link is made to the object, the TTL is added to the current time, and the resulting expiration time is stored in both the object and the link. Since each link contains the expiration of the target object, clients can send a refresh message before the expiration time. To guarantee that a link will continue to work (the target object remains valid), it must be refreshed before its expiration.

Two optimization techniques are introduced in order to reduce the communication overhead incurred by the refreshing process. First, refresh messages are gathered and sent on a host-to-host basis rather than a client-to-object basis. Second, refreshing frequency is controlled by the likelihood of becoming garbage. [Lieberman & Hewitt, 1983] showed that the likelihood of becoming garbage for a newly created object is higher than that of an old one. Therefore, garbage collection should be performed more frequently for newly created objects and less for old objects. To controlled the frequency of refreshing, the TTL of an object starts from a small number and is increased as the object is accessed.

3.2 Last referenceable timestamp propagation and backward inquiry

Because timeouts do not collect cyclic garbage, last referenceable timestamp propagation and backward inquiry are introduced. In practice, distributed object spaces do not have special root objects which are always alive. For that reason, our garbage collection mechanism uses reachability from clients rather than from roots. Cyclic garbage is reclaimed by two steps: last referenceable timestamp propagation which detects local garbage; and backward inquiry which performs partial synchronization and confirms garbage.

Each object maintains an LRTS (Last Referenceable TimeStamp) which indicates the most recent time when the object was accessible (i.e., reachable) by clients. When an object is accessed by a client, the LRTS of the object is set to the accessed time, and the LRTS (the accessed time) is propagated to all reachable objects recursively when links are refreshed. All reachable objects from the accessed object will get new LRTSs eventually. The LRTS of inaccessible distributed cyclic garbage, however, will stabilize at a value that is less than or equal to the time at which the cycle became inaccessible. Objects whose LRTSs are not more recent than a local threshold are local garbage. Since each system selects its own local threshold, each has a different threshold. Moreover, it is possible that LRTSs might not arrive at target objects on time because they are sent only during the link refreshing process. Therefore, before local garbage is reclaimed, it should be confirmed to be garbage.

The basic idea of backward inquiry is that, before a local garbage object is reclaimed, all referencing objects are examined to see whether they are garbage or not. If all referencing objects are garbage, then the object is also garbage and can be reclaimed. The best way to check referencing objects is to ask them directly to examine themselves whether they are garbage or not, and to reply with results. Since referencing objects refresh their target objects before expiration times, objects can receive refresh messages from all referencing objects. This means that objects can receive information about all referencing objects (i.e., an implicit reference list) within their TTL times, even though they do not have an explicit reference list.

Since messages necessary for cyclic garbage collection are bundled with the messages used for timeouts, the overhead of cyclic garbage collection is minimized.

3.3 Evaluation and implication

Our garbage collection mechanism satisfies the following properties:

The mechanism scales well in terms of the number of distributed objects due to the two optimization techniques of the refreshing mechanism.
No extra messages are necessary for cyclic garbage collection because messages needed for cyclic garbage collection are bundled with refresh messages.
The latency time to collect garbage increases in proportion to the length of the garbage chain, which is scalable.
All garbage including distributed cyclic garbage is reclaimed.
Our mechanism does not require global synchronization of all distributed systems.

We have a plan to apply our garbage collection mechanism on practical distributed persistent object systems such as CORBA, DCOM and RMI. We expect that our garbage collection mechanism will work well for distributed persistent object systems.

References

[Birrell et al., 1993] Andrew Birrel, David Evers, Greg Nelson, Susan Owicki and Edward Wobber, Distributed garbage collection for network objects, Technical Report 116, Digital Equipment Cooperation Systems Research Center, December 15, 1993.

[Chappell, 1996] David Chappell, Understanding ActiveX and OLE, Microsoft Press, 1996

[CORBA-A, 1997] OMG, The Common Object Request Broker: Architecture and Specification Revision 2.1, August 1997.

[CORBA-S, 1997] OMG, CORBAservices: Common Object Services Specification, July 1997.

[Lieberman & Hewitt, 1983] H. Lieberman and C. Hewitt, A real-time garbage collector based on the lifetimes of objects, Commun. ACM, vol. 26, June 1983, pages 419-429.

[Neuman, 1992] B. Clifford Neuman, The virtual system model: A scalable approach to organizing large systems, Ph.D. dissertation, University of Washington, 1992.

[RMI, 1997] Sun Microsystems, Remote Method Invocation Specification, 1997.

[Spence & Atkinson, 1997] Susan Spence and Malcolm Atkinson, A Scalable Model of Distribution Promoting Autonomy of and Cooperation between PJava Object Stores, Proceedings of the Hawaii International Conference on System Sciences, January 1997, Hawaii, USA.

[Pjama] The Forest Project, PJama: Orthogonal Persistence for Java , http://www.sunlabs.com/research/forest.

[Wollrath et al., 1995] Ann Wollrath, Geoff Wyant, and Jim Waldo, Simple Activation for Distributed Objects, USENIX Conference on Object-Oriented Technologies (COOTS) June 26-29, 1995 Monterey, California.