This document was originally written in January '96 as an internal
OBJS investigation into the feasibility of using current video
conferencing technology as a solution to the reduction in one-to-one
and multi-party face-to-face interaction inherent in the Virtual
Office concept that was later adopted for OBJS operation. When
we started to assemble the Internet Tool Survey, we decided this
study was apropos to the Groupware survey and included it. For
that reason it differs considerably in form and focus
from the other surveys.
In analyzing the deficiencies of our current distributed office environments in meeting our communications needs, I believe that the current combination of telephone, Internet connection, email, and web browser meets most of them. The one facility of a central office that is not currently replicated is face-to-face meetings.
It has been hypothesized that audio conferencing can adequately
meet this need, but sometimes audio is not enough. Sometimes you
really need visual aids, like a white board, to get your point
across. Sometimes it just helps to see facial expressions. Audio
conferencing can be confusing without visual cues as to who is
speaking and who wants to speak, and what the non-verbal reactions
of participants are to the speaker's comments. System support
for audio-conferencing might help, but the vanilla variety is
lacking. Brainstorming meetings are the example most often cited
as requiring face-to-face interaction. The observation has been
made that we don't touch, smell, or taste each other during such
meetings, we only hear and see, so audio/video conferencing (generally
just referred to as video conferencing) at least has the potential
of satisfying those needs.
In general, I've concluded that a combination of audio-, video-,
and data-conferencing is a feasible alternative, in terms of cost,
quality, and effectiveness. I don't feel the technology yet provides
an ideal solution in any of these areas, but I believe it provides
an adequate one. I'm not yet sure exactly what technology I recommend,
but I'm confident that one exists that meets our needs and is
affordable. If we are careful in our selection of technology,
we should be able to adopt improved technology as it emerges.
There are two classes of systems that are appropriate for us to consider: systems that run over LANs and WANs using TCP/IP, and systems that run directly over communications lines using the H.320 and T.120 standards for video conferencing and data-conferencing.
H.320 is a family of video conferencing standards adopted by the ITU-T (the United Nations International Telecommunication Union) that run over a variety of communications lines (T1, F-T1, ISDN BRI or PRI, Switched 56). There are a variety of mostly-commercial video conferencing systems which conform to these standards. Many of these systems are expensive high-end standalone systems meant for point-to-point video conferencing between groups, or remote lecture/classroom formats. Lately, however, these systems have begun to be delivered on PCs for personal video conferencing use, with one or more custom boards for video conversion, compression, and decompression, a camera, and an ISDN BRI connection. The market leaders are PictureTel and Intel, whose products cost from $2K to $5K. In general, these systems are relatively expensive to purchase, requiring special-purpose hardware, and expensive to operate remotely, with ISDN long-distance charges being roughly double regular long-distance charges. To support multi-point conferences, either a Multi-point Conferencing Unit must be purchased (at about $5K/participant) or multi-point bridging charges paid (at about $60/hour/participant).
We saw the PictureTel Live PCS 50 and PCS 100 demonstrated over ISDN BRI (128,000 bps) at PictureTel's office at the Dallas Infomart. The audio was great. The video was not perfect, but well above adequate, with a good quality 352 x 288 pixel color image that only became unacceptably jerky when there was a lot of motion in the image. Data-conferencing software, including a virtual whiteboard and application sharing, was included in the package.
ProShare Video System 200 is somewhat less expensive than PictureTel products, as little as $1K when purchased with local and long-distance ISDN service. It also will run over a LAN (PictureTel said its product soon will), but its 200 kbps bandwidth requirement puts it out of reach for ISDN-connected Internet use. The hardware in this product has some hope of being reusable in other video conferencing products in the future, since Intel sells the video board separately (CU-SeeMe doesn't work with it, however). I doubt the same can be said for PictureTel's hardware, but I don't really know.
In general, the TCP/IP-based solutions have bandwidth problems, at least when employed over the Internet. TCP/IP has been extended with protocols to support multi-cast and real-time delivery. The former allows multiple nodes to receive packets from one source with essentially the same overhead as a single node, which reduces the overhead for multi-person video conferencing. The latter addresses the timeliness of packet delivery, which is desirable for real-time video and audio. MBone, discussed later, is an experimental technology base which uses these new protocols. Without them, the delivery of real-time audio and video is problematic: delivery to multiple users requires network bandwidth proportional to the number of recipients, and the quality of the delivered video and audio is degraded by the various loads on the networks and routers connecting the transmitter with the receiver.
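The bandwidth argument above can be made concrete with a little arithmetic. The following sketch (Python, purely illustrative; the stream rate and receiver count are hypothetical numbers of my own) shows why sender-side bandwidth grows with the audience without multicast but stays flat with it:

```python
# Illustrative sketch (hypothetical numbers): upstream bandwidth a sender
# needs to deliver one video stream to N receivers, with and without
# multicast support in the network.

def unicast_send_kbps(stream_kbps, receivers):
    # Without multicast, the sender transmits a separate, individually
    # addressed copy of every packet to each receiver.
    return stream_kbps * receivers

def multicast_send_kbps(stream_kbps, receivers):
    # With multicast, one transmitted copy serves every receiver that
    # has joined the group, regardless of how many there are.
    return stream_kbps

# A 100 kbps stream to 5 receivers:
print(unicast_send_kbps(100, 5))    # 500 kbps of upstream bandwidth
print(multicast_send_kbps(100, 5))  # 100 kbps, independent of N
```

This is the "packet explosion" that limits how many consumers a single producer can support without multicast.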
I tried out VDOLive, a point-to-point delivery product for pre-recorded audio/video that is heavily promoted on the Internet. VDOLive runs over the Internet without the multi-cast and real-time TCP/IP extensions. I found that it wasn't really satisfactory, at least with my 128 Kbps ISDN connection. While the audio was generally acceptable, with only occasional gaps, video dropouts ranged up to 75%, with frequent periods where no change to the image was discernible, even though it supported only a tiny one-inch-square image. This isn't sufficient for video conferencing. You really couldn't infer much at all from the video image about facial expression, or pick up any other visual cues to augment the audio signal. You might as well have just stuck with the audio, and this wasn't even a real-time video source.
VDOLive uses proprietary compression and communications schemes produced by its server, which the company sells; that is how they make their money. The client is distributed free. In spite of the word "live" in the name, VDOLive doesn't really support the delivery of live video, although I suppose they could claim live delivery of video, since they do deliver and decompress the video incrementally, slightly in advance of the client's consumption of it. They maintain memory buffers of video to smooth out delivery. This has a definite advantage over other video displayers, which must copy entire video files, possibly storing them to disk, before displaying them, which can make viewing a large video on a modest platform impossible.
The metaphor VDOLive supports is video production/distribution. In general, video compression algorithms for canned video delivery (e.g. MPEG) are optimized for decompression, not compression. The assumption is that compression is done once, in a studio on an expensive platform, but decompression is done millions of times on inexpensive platforms (e.g. set-top boxes), so any speed-up in decompression is worth a slow-down in compression. This doesn't make sense for video conferencing, where compression must also take place in real time, although it is feasible for one-to-many video conferences (e.g. lectures) where there is a very capable machine at the point of production. It especially doesn't make sense for the kind of video conferencing we envision, with multiple comparably-equipped participants with PCs or workstations alternating as producers and consumers. In addition to the compression issue, one-to-many distribution also places undue demands on the producer machine. Without multi-cast, the server must send individually-addressed packets to each consumer, thereby limiting the number of consumers supportable by a single producer.
Of course, this has an even greater impact in our situation, where anyone can be a producer, and everyone is likely to have, at best, at least in the short term, an ISDN line, and either a high-end PC or a modest SparcStation. As I stated earlier, VDOLive didn't work adequately over ISDN for just one consumer. Obviously, it couldn't handle more. (Of course, this doesn't mean that VDOLive could never become an adequate video conferencing platform (i.e. with multi-cast and a different compression algorithm), just that it isn't today.)
CU-SeeMe is a free TCP/IP video conferencing system developed at Cornell for the Macintosh. It now runs on PCs under Windows 3.1 and Windows 95, but not, to my knowledge, under Windows NT. To support multi-point video conferencing (up to eight participants viewable simultaneously) without multi-cast, CU-SeeMe uses "reflectors", server programs which run under Unix and provide a single point of contact for all participants in the conference. Every participant connects only to the reflector, so individual participants send and receive video to and from only one site via uni-cast packets. The reflector retransmits the incoming audio and video only to those other sites that have requested it, thereby limiting the total bandwidth used. Still, the reflector needs to be a pretty capable machine with substantial network bandwidth available to it, depending on the number of simultaneous clients it needs to support. The latest version of the reflector supports multi-cast, further reducing retransmission costs, but it still must receive video from many sources independently. The clients do not multicast. There are other systems comparable to CU-SeeMe that run on PCs, but CU-SeeMe seems to be the one in widest use (maybe 100,000 users at this time). White Pine Software has licensed CU-SeeMe and is commercializing it.
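The reflector idea described above can be sketched in a few lines. This is purely my own illustration of the relay logic (the real CU-SeeMe reflector is a Unix network server; the names and data here are invented), not actual product code:

```python
# Minimal in-memory sketch of a reflector: every participant talks only
# to the reflector, which forwards each sender's video only to the
# participants who requested it.

class Reflector:
    def __init__(self):
        # receiver -> set of senders whose video that receiver requested
        self.wants = {}

    def subscribe(self, receiver, sender):
        self.wants.setdefault(receiver, set()).add(sender)

    def relay(self, sender, packet):
        # Return the (receiver, packet) retransmissions the reflector
        # would make: one copy per interested receiver, never echoing
        # the packet back to its sender.
        return [(r, packet) for r, wanted in self.wants.items()
                if sender in wanted and r != sender]

refl = Reflector()
refl.subscribe("ann", "bob")   # ann wants to see bob
refl.subscribe("carl", "bob")  # carl wants to see bob
out = refl.relay("bob", b"frame-1")
# 'out' holds one copy for ann and one for carl; nobody else gets it.
```

Note that the reflector's own incoming bandwidth still grows with the number of senders, which is why it needs a capable machine and a fat network connection.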
I briefly experimented with CU-SeeMe and the QuickCam on the Toshiba (75MHz Pentium) laptop running Windows 95. I connected to a public reflector. I can't remember which, and never knew how capable a machine it was, what kind of connection it had, or what kind of connections the other participants had. Mine was 128K ISDN. While the application was pretty interesting, the performance was totally unacceptable. Even if I only accepted video from one site, I only got a frame every ten seconds or so, at best. Most of the time I'd be lucky to get an entire frame from a site. Also, my audio didn't seem to work. The overall effect was like a slide show: every now and then you'd get a new still image of someone sitting at their desk, but you could never consider this remotely close to acceptable for video conferencing. Of course, this was a one-time, totally-uncontrolled experiment. Once others in OBJS have ISDN, we'll have to run a more controlled experiment. In spite of the failure of this experiment, I still think it's possible this could be a workable minimal solution, given my more positive experience with the QuickCam on the local machine (see below).
Connectix's VideoPhone is CU-SeeMe's direct competitor; they even compare themselves to CU-SeeMe. VideoPhone, the software, sells for about $60. It comes bundled with Connectix's QuickCam, described below, for about $160. VideoPhone touts its support of color (although the QuickCam is b/w), support for standard Video for Windows capture cards (including Intel's Smart Video Recorder Pro, which CU-SeeMe doesn't yet support and which I think is bundled with Intel's ProShare 200; note that a capture card is unnecessary with QuickCam's digital video source), full-duplex audio, a more comprehensive data-conferencing suite, support for IP Multicast, support for any Windows WAV sound device (CU-SeeMe supports only built-in mics), and support for any Video for Windows codec (CU-SeeMe has its own software codec). This is unsubstantiated by me; I've only read about this product. But, given its price, it is worth considering, and I plan to try it.
Under Unix, there are a number of pre-commercial video conferencing systems (e.g. NetVideo (nv), Video conference (vic), INRIA Video conferencing System (ivs)) that rely on the MBone (Multicast Backbone On the Internet). MBone itself is not a product. It's a protocol extension (multicast) that runs over TCP/IP, plus an Internet infrastructure which supports the protocol. Adding multicast to TCP/IP enables delivery of data to multiple sites with essentially the same overhead as delivery to one site, fixing the packet explosion problem and enabling switchable one-to-many video conferencing. Using RTP, the Real-time Transport Protocol, at least makes the bandwidth problem predictable and manageable, as can be observed by participating in an MBone video conference, but, at least at present, it does not satisfy all of our requirements over the Internet. The first reason is that there is currently a 500 Kbps ceiling limiting total MBone traffic through Internet routers. Since current video conferencing technology is not very effective below 100 Kbps, that means only 5 or so video conferences can be scheduled simultaneously. That doesn't appear to leave room for frequent use of the Internet for private, ad hoc video conferences. At present, MBone is used over the Internet only for large scheduled public events, like IETF meetings. There is a proposal to raise that limit, but it is unlikely that the increase will enable the kind of use we require, at least in the near term, at least until most of the Internet is running over ATM. The second reason is that even with multicast, multipoint conferences of the sort we talk about, where everyone is a full participant, both a transmitter and a receiver, cannot be comfortably supported given ISDN bandwidth between the workstation and the Internet.
That's because multicast reduces the number of outgoing video streams to one, but it does nothing to reduce the number of incoming video streams, which remains proportional to the number of participants. As mentioned above, CU-SeeMe reflectors reduce incoming video somewhat by not delivering it to participants who don't want it, but that doesn't help at all in our scenario. Because ISDN BRI is 128,000 bps, and a good quality video conference requires close to that, there is no way to support all these additional incoming video streams in real time. The current solution is that only the current speaker's video is sent, or, better, that only the speaker's video is updated frequently. This is not ideal. The solution might be a "smart reflector" of some sort that can merge all incoming video streams into one efficient composite stream and multicast that to everyone. PictureTel has a product coming out this spring that might address this need, but I haven't heard enough details to be sure. Nevertheless, a reflector, or multipoint server, or smart reflector, or conference manager, whatever you want to call it, is a server of some sort, requiring, at least, more capable hardware and, probably, a higher-bandwidth network connection. At present, I don't see how this server can be avoided in the scenario I present. You may be able to avoid buying special-purpose equipment for it, by using a multipoint video conferencing service, renting space on a reflector, or running a server on a powerful computer with a T1 connection to the Internet or with support for multiple ISDN dialups, but it looks like a server is needed in this picture.
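The back-of-the-envelope arithmetic behind the incoming-stream problem can be sketched as follows (Python, using the numbers quoted above; the 100 Kbps per-stream figure is the rough threshold cited earlier, not a measurement):

```python
# How many full-rate incoming video streams fit through one access line?
# Multicast removes the outgoing fan-out, but each participant still
# receives one stream per other participant.

ISDN_BRI_BPS = 128_000   # ISDN BRI downstream capacity
STREAM_BPS = 100_000     # rough bandwidth of one usable video stream

def incoming_streams_supported(link_bps=ISDN_BRI_BPS, stream_bps=STREAM_BPS):
    return link_bps // stream_bps

# Only one full-rate incoming stream fits over ISDN BRI, which is why
# schemes that send (or frequently update) only the current speaker's
# video are used.
print(incoming_streams_supported())  # 1
```

The same function with a T1's 1,544,000 bps gives 15 streams, which is one reason the server in this picture wants a T1-class connection.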
This is the most widely used MBone video conferencing product. It appears to only be available for Unix at present, so I can't try it out until we get a SparcStation. It's possible someone has ported this, or some other MBone product, to some version of Windows. I'll investigate further.
This is Sun's video conferencing product. It works with the SunVideo board and the SunCamera, and includes a Whiteboard and shared application support. It costs about $2500/seat. I think it was actually developed by InSoft Inc., developers of Communique! and OpenDVE.
To support any of the TCP/IP-based video conferencing systems described above, you only need a network connection (dialup or direct), a capable computer, a sound card, a video camera, and a frame grabber (analog video to digital converter). If the camera is digital, then the frame grabber may not be necessary, as is the case with QuickCam from Connectix.
Video conferencing at 28.8 Kbps dialup rates is not quite a live experience. It looks more like a sequence of still photos, a slide show, than live video. It seems to me to have marginal value: maybe a little better than a regular phone call, but not much.
ISDN is probably the minimum connection required to get real live motion video. We saw that it was adequate in a dedicated mode with the PictureTel system, but I have not yet seen it perform adequately over the Internet. I'm sure that there will be some degradation, but it remains to be seen if the result is acceptable.
I purchased a QuickCam, a low cost ($100) CCD-based B/W video
camera that is plugged into the parallel port and can be used
with several cheap or free video conferencing systems over the
net (e.g. VideoPhone, CU-SeeMe). I installed the camera on my
50MHz 486 running Windows 3.1, because it doesn't yet work with
Windows NT. The advantage of QuickCam is that, because it is CCD-based,
it is fully digital, and, therefore, does not need to be converted
from analog to digital, as is required when the source is
an analog one (e.g. camcorder, VCR, TV signal). When the
source is analog, an additional $500+ board is normally required
to do the conversion (i.e. a frame grabber). The QuickCam directly
delivers digital images up to 320x240 pixels at 16 or 64 levels
of gray, with frame rates up to 24 frames per second. Their quoted
frame rate of 18 per second for a 160 by 120 pixel image would
be adequate for video conferencing in my view, but I do not see
that speed in my installation. I hypothesized that this might
be because of the relatively slow speed and small memory of the
486. However, subsequent experiments on a Toshiba laptop and a
desktop 133MHz Pentium ran at roughly the same speed. In any of
these configurations, the QuickCam runs at a minimally sufficient
speed at 160x120 resolution, but that image is too small if mapped
pixel-for-pixel to my monitor; but it may be that that resolution
is enough, if you could scale the image up. Most of my experimentation
was with the snapshot application that comes with the camera,
which doesn't support scaling. The movie application supports
2x scaling, which still looks good, but still is a little small.
The screensaver blows the image up to the full screen. At that
size, unless you're ten feet away from the screen, the 160x120
image is much too grainy, but it doesn't really get any jerkier.
So, there's still some hope that this could suffice if the video
conferencing software supports scaling to some intermediate size.
CU-SeeMe doesn't. So, a future task is to try out VideoPhone
using the QuickCam.
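The scaling I'm hoping the conferencing software will support is simple in principle. Here is an illustrative sketch (Python, my own toy example; no shipped product's code is shown) of integer-factor nearest-neighbor scaling, the kind of blow-up that could map a 160x120 frame to an intermediate on-screen size:

```python
# Nearest-neighbor integer scaling of a small grayscale frame:
# each source pixel is replicated into a factor x factor block.

def scale_frame(frame, factor):
    # frame is a list of rows; each row is a list of gray levels.
    out = []
    for row in frame:
        # Widen the row by repeating each pixel 'factor' times...
        wide = [px for px in row for _ in range(factor)]
        # ...then repeat the widened row 'factor' times (rows are
        # read-only here, so sharing the same list object is safe).
        out.extend([wide] * factor)
    return out

tiny = [[0, 255],
        [255, 0]]
big = scale_frame(tiny, 2)
# big is 4x4: each pixel of tiny has become a 2x2 block.
```

At 2x this replication looks blocky but acceptable, which matches my experience with the movie application; the graininess at full-screen comes from pushing the same replication much further.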
I think we've proven that you can video conference effectively and affordably with the PictureTel system. Further investigation will help us select the actual technology we use, on the basis of our business interests and cost and quality. Some of the major unresolved issues for me are:
H.320-based systems should be interoperable, but I'd like to see that proven. That might require a trip to a show.
TCP/IP-based systems are generally not interoperable, because there are no widely accepted standards for video conferencing over TCP/IP, although there are efforts to define such standards. There are de facto standards on PCs, and on Unix, but to my knowledge, Unix and PC systems will not interoperate. I'm not even sure that PC systems interoperate with each other, or that Unix systems do. A current lack of interoperability is not a killer for us, though: these are software products requiring no special hardware, so only a software port or upgrade might be needed down the line to reach compatibility when a standard emerges.
Our video conferencing server requirement to support multipoint
conferencing intersects with our possible requirement for a
company-wide server. This interaction needs more investigation.
This research is sponsored by the Defense Advanced Research Projects Agency and managed by the U.S. Army Research Laboratory under contract DAAL01-95-C-0112. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied of the Defense Advanced Research Projects Agency, U.S. Army Research Laboratory, or the United States Government.
© Copyright 1996 Object Services and Consulting, Inc. Permission is granted to copy this document provided this copyright statement is retained in all copies. Disclaimer: OBJS does not warrant the accuracy or completeness of the information on this page.
This page was written by Steve Ford. Send questions and comments about it to firstname.lastname@example.org.
Last updated: 04/22/96 7:57 PM