Composing Active Proxies to Extend the Web

Rohit Khare, rohit@uci.edu, University of California at Irvine
Adam Rifkin, adam@cs.caltech.edu, California Institute of Technology

Introduction

In considering the future of "compositional software architectures" as rendered through distributed object systems and on the World Wide Web, it is useful to set aside the hype of new technologies and consider what is already being accomplished with existing infrastructure. On the Web, users and developers have already adopted two powerful ways to compose active processing with information distribution: active pages ("cgi-bin") and active proxies. In this position paper, we focus on the latter as a tool that lets parties beyond the original developer externalize extensions to a software or information architecture.

Our Position

Independent extensibility is a critical affordance of compositional software architectures. To realize the full potential of concurrent evolution by all of a system's stakeholders, architects should be encouraged to support externalized, component-oriented hooks. In particular, active proxies on the Web demonstrate the power of independent evolution and the serendipitous synergy of orthogonal services. Soon, HTTP in conjunction with PEP will systematize this power and bring it to clients and servers as well.

Examples of Active Proxies

When fetching a resource through the Hypertext Transfer Protocol, clients can contact the origin server or an intermediate server that will fetch it on their behalf. One of the most familiar uses of proxied HTTP is caching: many users behind a single caching proxy can benefit from a local copy of their most frequently accessed resources. The caching proxy operates on behalf of users to maintain up-to-date copies from the origin server. Sometimes, it also acts on behalf of the publisher to collect and report usage statistics for its cached resources; the entire cache may also be filled on behalf of the publisher (a "mirror" proxy).
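As a concrete (if anachronistic) illustration, here is a minimal caching-proxy sketch in Python; the class name, port, and cache policy are ours for exposition, and a production cache would also handle expiry, conditional requests, and the usage reporting mentioned above.

```python
# Minimal caching-proxy sketch (illustrative only; names and port are ours,
# and expiry, conditional requests, and usage reporting are elided).
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CACHE = {}  # absolute URL -> (content type, body)

class CachingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A browser configured to use a proxy sends the absolute URL in the
        # request line, so self.path names the origin resource.
        url = self.path
        if url not in CACHE:
            with urllib.request.urlopen(url) as upstream:  # fetch from origin
                CACHE[url] = (upstream.headers.get("Content-Type", "text/html"),
                              upstream.read())
        ctype, body = CACHE[url]
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), CachingProxy).serve_forever()
```

Pointing a browser's proxy setting at localhost:8080 exercises it; every request after the first for a given URL is answered from the local copy.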

There are many other species of proxies, though. Caches merely relay the original resource; active proxies have free rein to extend and emend these resources. Consider these applications (a sketch of such an in-flight transformation follows the list):

Crit-Link Mediator
 Any user can fetch any page on the Web through the crit.org proxy; it's returned with a Crit-Link banner across the top and the collected comments of previous users about that page at the bottom. It's a public annotation service that moves the Web closer to its original vision of a multiparty conversation.

This service was developed by Ka-Ping Yee, a student at Waterloo, based on design work by Eric Drexler at the Foresight Institute -- the ideas can be traced back to Ted Nelson's Xanadu. Alexa is a similar community annotation tool that collects feedback from users about document quality (and also acts as a cache -- in this case, from the multiterabyte Internet Archive of extinct pages).

Lucent Personalized Web Assistant
 Lucent's tool systematically replaces one's real identity with a pseudonymous one on the Net. Using HTTP security features, users log onto the proxy, which then lets them 'register' at sites using escape codes in fill-in forms (\u for a username, \p for a site-specific password, etc.).

This service was developed by several Bell Labs cryptography and Web protocol researchers. Two related services are the Anonymizer, which intercepts cookies, Java, and JavaScript and strips out Referer: and User-Agent: request headers; and NoShit, which strips out graphics characteristic of Web advertising. Conversely, some advertisers weave ads onto public pages and track end-users with the same technology.

Format Converters
Numerous services exist to transform multimedia formats. Low-bandwidth or low-resolution devices like wireless Web palmtops can access proxies that reduce color graphics to black-and-white thumbnails on the server side. Another service of Yee's, the Medium-Independent Notation for Structured Expressions (MINSE), allows authors to embed mathematical expressions and the like, which are automatically compiled into ASCII layouts or embedded graphics for different clients.

Content Translators
Natural-language understanding has progressed to the point of assisting professional translators by producing a first pass. Several companies demonstrate their technology by offering to translate pages as a proxy service.
Remote Processing
The best defense against rogue mobile code is isolation. At least one vendor offers to quarantine Java applets by running them on the proxy and only sending the display output inward to the end-user.
Content Filtering
Many proponents of content selection strategies -- whether for child-protection, political censorship, or enforcing organizational security directives -- posit centralized filtering by proxies. Content labels, digital signatures, and other assertions are inspected, migrating policy enforcement upstream.

Protocol Gateways
HTTP proxies can ease migration between versions of HTTP. In the future, developers expect to multiplex multiple HTTP streams through HTTP-NG ("next generation") adaptors. Security and authentication protocols for Web sites are also often centralized in "Web firewalls". HTTP proxies also stand in for other URL schemes: FTP, Gopher, WAIS, and Ph protocols are commonly available.
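To make the distinction between relaying and emending concrete, an in-flight transformation of the kind these services apply might look like the following sketch; the banner text and ad heuristic are invented and correspond to none of the services above. An active proxy would pass each HTML response through such a step before returning it, where a cache would replay the bytes verbatim.

```python
# Sketch of an in-flight transformation an active proxy might apply
# (invented banner and heuristic; not taken from any service named above).
import re

BANNER = b"<p><em>Annotations collected by the proxy would appear below.</em></p>\n"

def transform(html: bytes) -> bytes:
    # Strip inline images, as an ad-blocking proxy crudely might...
    html = re.sub(rb"<img[^>]*>", b"", html, flags=re.IGNORECASE)
    # ...and prepend a banner, as an annotation service might.
    return BANNER + html

if __name__ == "__main__":
    print(transform(b"<h1>A page</h1><img src='banner-ad.gif'>").decode())
```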
This sampler focuses primarily on extending the Web as an information space, but the same approaches apply to the Web as an active software system. Imagine a proxy that extracts the user interface from a dozen package-tracking services and presents a single meta-interface for any shipper; WebMethods' Web Interface Definition Language points the way to composing Web transactions. Imagine a data-logging proxy that filters the event stream with pluggable strategies; UC Irvine's Expectation-Driven Event Monitoring dynamically composes filters fetched from the Web. Imagine extending a flight-reservation system's command interface to interpose a graphical map; Apple Computer's WebObjects demonstrates an airline UI prototype.

The challenge is that, for software systems, only first-party stakeholders can extend their architectures in these ways today. The information-oriented extensions listed above were not built by first parties: they were developed by outsiders and deployed by outsiders in service of outsiders. Active proxies hold the promise that soon, outsiders will also write new health-plan comparators and mutual-fund trading interfaces and ...

Composing Active Proxies

Want to annotate a Japanese page, stripped of advertisements, served from an HTTP-NG server? Want to book a plane ticket and a hotel room in a single transaction? Active proxies can be neatly reused as black-box components when chained together via HTTP. However, we can envision neater, more efficient ways to enable reuse. The HTTP Protocol Extension Protocol (PEP) transcends the welter of competing APIs to offer a single syntax for naming, specializing, and applying active proxies with finer-grained control. PEP also affords reasoning about compatible and composite extensions.

We are already familiar with many analogues to active proxies as reusable filters. The difference lies in the affordances of the interchange format: UNIX filters operate on ASCII streams; SQL queries operate on relational tables; active proxies and pages operate on Web hypermedia (HTML/XML + HTTP).

The affordances for composition are also similar: manual, sequential composition only. The HTTP specification clearly allows proxy chains to apply several transformations along the way, but in practice none of the services sampled above allows onward chaining; each fetches the actual content from the origin server rather than branching to yet another proxy specified by the end-user. The downside of packaging these extensions as a proxy is the assumption that all users and all destinations are treated the same -- that is, the same function is applied to all inputs and outputs.
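For comparison, onward chaining itself requires little machinery: each proxy need only fetch through a user-specified upstream peer instead of contacting the origin directly. A sketch follows, assuming an upstream proxy listening at localhost:8080 (an address of our choosing, matching the earlier caching sketch).

```python
# Sketch of onward chaining: fetch through an upstream proxy rather than the
# origin server (the upstream address is an assumption for illustration).
import urllib.request

upstream = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": "http://localhost:8080"}))

def chained_fetch(url: str) -> bytes:
    # The request still names the origin resource; the upstream proxy decides
    # whether to answer from cache, transform the response, or chain onward.
    with upstream.open(url) as response:
        return response.read()

if __name__ == "__main__":
    print(len(chained_fetch("http://example.com/")))
```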

This is the universe PEP was designed for -- each PEP module has the same executable power as a proxy, but can be selectively applied to portions of Web space, on behalf of certain users, with known urgency (required or optional), in concert with other extensions (or to the exclusion of conflicting ones), in sequence or in parallel, on selected hops of the HTTP proxy chain. Most significantly, since PEP modules are identified by the URI of the protocol they implement, PEP-aware Web tools can negotiate common sets of compatible modules and settings.

PEP enshrines a philosophy of decentralization. Anyone can publish an extension by maintaining a Web page describing it. Any such module has as much expressive power to rewrite its input as an active page or active proxy. Any resource can be bound to require an extension ("those .quicken files require an http://pep.w3.org/SEA/Encryption/-compatible filter"). Any extension can express its own policy (hop-by-hop or end-to-end; requisite and incompatible co-extensions).
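A sketch of the kind of selective dispatch this enables appears below. The policy-table shape, the example rules, and the user names are ours for illustration and say nothing about PEP's actual wire syntax; the encryption-module URI is simply the one cited above.

```python
# Sketch of selective extension dispatch in the spirit of PEP (the policy
# shape and example rules are ours; this is not PEP's wire syntax).
from dataclasses import dataclass

@dataclass
class ExtensionRule:
    module_uri: str   # modules are named by the URI of the protocol they implement
    url_prefix: str   # the portion of Web space the module applies to
    users: set        # on behalf of which users it is applied
    required: bool    # urgency: must the peer support it, or is it optional?

POLICY = [  # hypothetical bindings; the first reuses the encryption URI cited above
    ExtensionRule("http://pep.w3.org/SEA/Encryption/",
                  "http://intranet.example/", {"*"}, True),
    ExtensionRule("http://example.org/extensions/annotate",
                  "http://", {"ping", "adam"}, False),
]

def modules_for(url: str, user: str):
    """Return (module URI, required?) pairs applicable to this request."""
    return [(rule.module_uri, rule.required)
            for rule in POLICY
            if url.startswith(rule.url_prefix)
            and ("*" in rule.users or user in rule.users)]

if __name__ == "__main__":
    print(modules_for("http://intranet.example/budget.quicken", "rohit"))
```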

Its designers developed applications for content-filtering, electronic-payment selection, and a modular security architecture -- all of which could be composed to, say, purchase encrypted PICS labels. Many of these applications are actually more powerful than active proxies, since PEP allows their functions to be moved into the origin client and server; security decisions can be made at the desktop rather than the firewall.

These benefits are not free, though. Selectively applying active proxy extensions requires more logic in the server to select compatible PEP modules and enforce users' and publishers' policies. In truth, it has been easier for extenders to deploy active pages and active proxies than to design for the future. PEP is still on the IETF standards track two years after its debut. Decentralized extensibility is a tough sell, but we believe it is essential.

Internal extensibility exists and is well supported *within* the black box -- OOA&D and software architecture research give us reason to hope. Outside the box, though, the rise of open information systems on the geodesic network of the Web heralds a political shift in the constituency for extensibility: there are many, many more actors with an interest in the extensibility of your architectures!

References

For more information about the ideas and systems we have discussed in this document...
  1. PEP Working Draft: HTTP Protocol Extension Protocol. Henrik Frystyk Nielsen, Rohit Khare, and Dan Connolly.
  2. Crit-Link Design Paper: The Crit-Link Mediator, a proxy for public annotation of Web pages.
  3. LPWA Design: The Lucent Personalized Web Assistant.
  4. Paper: Application-Specific Proxy Servers as HTTP Stream Transducers. World Wide Web Journal, Winter 1996 (v1n1, proceedings of WWW4). Charles Brooks, Murray S. Mazer, Scott Meeks, and Jim Miller.
  5. Paper: Ubiquitous Advertising on the WWW: Merging Advertising at the Browser. World Wide Web Journal, Summer 1996 (v1n3, proceedings of the Workshop on Web Demographics). Youji Kohda and Susumu Endo.
  6. World Wide Web Proxies. Ari Luotonen and Kevin Altis.
  7. Paper: Weaving a Web of Trust. World Wide Web Journal, Summer 1997 (v2n3). Rohit Khare and Adam Rifkin.
  8. Paper: Capturing the State of Distributed Systems with XML. World Wide Web Journal, Fall 1997 (v2n4). Rohit Khare and Adam Rifkin.
  9. Article: Product Evaluation of WebObjects. Byte, July 1997. Rohit Khare.
  10. Discussion Archive: FoRK Mailing List.
Thanks to Jim Whitehead for reviewing a draft of this position paper.