Virtual Environments and Environmental Instruments

Stephen R. Ellis

NASA Ames Research Center
Moffett Field, California 94035
and
University of California, Berkeley
Berkeley, California 94720
silly@eos.arc.nasa.gov

Excerpts from Simulated and Virtual Realities, K. Carr and R. England, eds. Taylor & Francis, London. pp. 11-51, reproduced with permission [1]. Omissions are by request of the publisher and are indicated with ellipses (...).

Table of Contents

1. Communication and Environments

 

1.1 Virtual environments are media
1.2 Optimal design
1.3 Extensions of the desk-top metaphor
1.4 Environments
1.5 Sense of physical reality

 

2. Virtualization

 

2.1 Definition of virtualization
2.2 Levels of virtualization
2.3 Environmental viewpoints and controlled elements
2.4 Breakdown by technological functions
2.5 Spatial and environmental instruments

 

3. Origins of Virtual Environments

 

3.1 Early visionaries
3.2 Vehicle simulation and three-dimensional cartography
3.3 Physical and logical simulation
3.4 Scientific and medical visualization
3.5 Teleoperation and telerobotics and manipulative simulation
3.6 Photography, cinematography, video technology
3.7 Role of engineering and physiological models

 

4. Virtual Environments: Performance and Trade-offs

 

Notes

 

Bibliography and References

1. Communication and Environments

1.1 Virtual environments are media

Virtual environments created through computer graphics are communications media (Licklider et al., 1978). Like other media, they have both physical and abstract components. Paper, for example, is a communication medium, but the paper itself is only one possible physical embodiment of the abstraction of a two-dimensional surface onto which marks may be made. Consequently, there are alternative instantiations of the same abstraction. As an alternative to paper, for example, the Apple Newton series of intelligent information appliances resembles handwriting-recognizing magic slates on which users write commands and data with a stylus (see Apple Computer Co., 1992). The corresponding abstraction for head-coupled, virtual image, stereoscopic displays that synthesize a coordinated sensory experience is an environment. Recent advances and cost reductions in the underlying technology used to create virtual environments have made possible the development of new interactive systems that can subjectively displace their users to real or imaginary remote locations.

Different expressions have been used to describe these synthetic experiences. Terms like "virtual world" or "virtual environment" seem preferable since they are linguistically conservative, less subject to journalistic hyperbole, and easily related to well-established usage, as in the term "virtual image" of geometric optics. These so-called "virtual reality" media caught the international public imagination several years ago as a qualitatively new human-machine interface (Pollack, 1989; D'Arcy, 1990; Stewart, 1991; Brehde, 1991), but they in fact arise from continuous development in several technical and nontechnical areas during the past 25 years (Ellis, 1990, 1993; Brooks, 1988; Kalawsky, 1993). Because of this history, it is important to ask why displays of this sort have only recently captured public attention.

The reason for the recent attention stems mainly from a change in the perception of the accessibility of the technology. Though its roots, as discussed below, can be traced to the beginnings of flight simulation and telerobotics displays, recent drops in the cost of interactive 3D graphics systems and miniature video displays have made it realistic to consider a wide variety of new applications for virtual environment displays. Furthermore, many video demonstrations in the mid-1980's gave the impression that this interactive technology was indeed ready to go. In fact, at that time, considerable development was needed before it could be practicable, and these design needs still persist for many applications. Nevertheless, virtual environments can indeed become Ivan Sutherland's "ultimate computer display"; but to ensure that they provide effective communications channels between their human users and their underlying environmental simulations, they must be designed.


1.2 Optimal design

A well-designed human-machine interface affords the user an efficient and effortless flow of information between the device and its human operator. When users are given sufficient control over the pattern of this interaction, they themselves can evolve efficient interaction strategies that match the coding of their communications to the machine to the characteristics of their communication channel (Zipf, 1949; Mandelbrot, 1982; Ellis and Hitchcock, 1986; Grudin and Norman, 1993). Successful interface design should strive to reduce this adaptation period by analysis of the users' task and their performance limitations and strengths. This analysis requires understanding of the operative design metaphor for the interface in question, i.e. its abstract or formal description.

The dominant interaction metaphor for the human-computer interface changed in the 1980's. Modern graphical interfaces, like those first developed at Xerox PARC (Smith et al., 1982) and used for the Apple Macintosh, have transformed the "conversational" interaction from one in which users "talked" to their computers to one in which they "acted out" their commands within a "desktop" display. This so-called desktop metaphor provides the users with an illusion of an environment in which they enact system or application program commands by manipulating graphical symbols on a computer screen.


1.3 Extensions of the desk-top metaphor

Virtual environment displays represent a three-dimensional generalization of the two-dimensional desk-top metaphor [2]. The central innovation in the concept, first stated and elaborated by Ivan Sutherland (1965; 1970) and Myron Krueger (1977; 1983) with respect to interactive graphics interfaces, was that the pictorial interface generated by the computer could become a palpable, concrete illusion of a synthetic but apparently physical environment. In Sutherland's terms, this image would be the "ultimate computer display." These synthetic environments may be experienced from either egocentric or exocentric viewpoints. That is to say, the users may appear to actually be immersed in the environment or see themselves represented as a "You are here" symbol (Levine, 1984) which they can control through an apparent window into an adjacent environment.

The objects in this synthetic space, as well as the space itself, may be programmed to have arbitrary properties. However, the successful extension of the desk-top metaphor to a full "environment" requires an understanding of the necessary limits to programmer creativity in order to ensure that the environment is comprehensible and usable. These limits derive from human experience in real environments and illustrate a major connection between work in telerobotics and virtual environments. For reasons of simulation fidelity, previous telerobotic and aircraft simulations, which have many of the aspects of virtual environments, also have had to take explicitly into account real-world kinematic and dynamic constraints in ways now usefully studied by the designers of totally synthetic environments (Hashimoto et al., 1986; Bussolari et al., 1988; Kim et al., 1988; Tachi et al., 1989; Bejczy et al., 1990; Sheridan, 1992; Cardullo, 1993).


1.4 Environments

Successful synthesis of an environment requires some analysis of the parts that make up the environment. The theater of human activity may be used as a reference for defining an environment and may be thought of as having three parts: a content, a geometry, and a dynamics (Ellis, 1991).

Decomposition of an environment into its abstract functional components.

Content

The objects and actors in the environment are its content. These objects may be described by vectors which identify their position, orientation, velocity, and acceleration in the environmental space, as well as other distinguishing characteristics such as their color, texture, and energy. This vector is thus a description of the properties of the objects. The subset of all the terms of the characteristic vector which is common to every actor and object of the content may be called the position vector. Though the actors in an environment may for some interactions be considered objects, they are distinct from objects in that, in addition to characteristics, they have capacities to initiate interactions with other objects. The basis of these initiated interactions is the storage of energy or information within the actors, and their ability to control the release of this stored information or energy after a period of time. The self is a distinct actor in the environment which provides a point of view establishing the frame of reference from which the environment may be constructed. All parts of the environment that are exterior to the self may be considered the field of action. As an example, the balls on a billiard table may be considered the content of the billiard table environment, and the cue ball controlled by the pool player may be considered the self. The additional energy and information that makes the cue ball an actor is imparted to it by the cue controlled by the pool player and his knowledge of game rules.

 

Geometry

The geometry is a description of the environmental field of action. It has dimensionality, metrics, and extent. The dimensionality refers to the number of independent descriptive terms needed to specify the position vector for every element of the environment. The metrics are systems of rules that may be applied to the position vector to establish an ordering of the contents and to establish the concept of a geodesic, or the loci of minimal-distance paths between points in the environmental space. The extent of the environment refers to the range of possible values for the elements of the position vector. The environmental space or field of action may be defined as the Cartesian product of all the elements of the position vector over their possible ranges. An environmental trajectory is a time-history of an object through the environmental space. Since kinematic constraints may preclude an object from traversing the space along some paths, these constraints are also part of the environment's geometric description.

 

Dynamics

The dynamics of an environment are the rules of interaction among its contents describing their behaviour as they exchange energy or information. Typical examples of specific dynamical rules may be found in the differential equations of Newtonian dynamics describing the responses of billiard balls to impacts of the cue ball. For other environments, these rules also may take the form of grammatical rules or even of look-up tables for pattern-match-triggered action rules. For example, a syntactically correct command typed at a computer terminal can cause execution of a program with specific parameters. In this case the meaning and information of the command plays the role of the energy, and the resulting rate of change in the logical state of the affected device plays the role of acceleration.

This analogy suggests the possibility of developing a semantic or informational mechanics in which some measure of motion through the state space of an information processing device may be related to the meaning or information content of the incoming messages. In such a mechanics, the proportionality constant relating the change in motion to the message content might be considered the semantic or informational mass of the program. A principal difficulty in developing a useful definition of "mass" from this analogy is that information processing devices typically can react in radically different ways to slight variations in the surface structure of the content of the input. Thus it is difficult to find a technique to analyze the input to establish equivalence classes analogous to alternate distributions of substance with equivalent centres of mass. The centre-of-gravity rule for calculating the centre of mass is an example of how various apparently variant mass distributions may be reduced to a smaller number of equivalent objects in a way simplifying consistent theoretical analysis as might be required for a physical simulation on a computer.

The usefulness of analyzing environments into these abstract components, content, geometry, and dynamics, primarily arises when designers search for ways to enhance operator interaction with their simulations. For example, this analysis has organized the search for graphical enhancements for pictorial displays of aircraft and spacecraft traffic (McGreevy and Ellis, 1986; Ellis et al., 1987; Grunwald and Ellis, 1988, 1991, 1993). However, it also can help organize theoretical thinking about what it means to be in an environment through reflection concerning the experience of physical reality.
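As a minimal sketch of how this decomposition might map onto a simulation, the Python fragment below models the billiard-table example. The class and function names (Body, Geometry, collide, step) and the numerical values are illustrative assumptions, not taken from the chapter, and the dynamics are restricted to frictionless motion with equal-mass elastic collisions.

```python
import math
from dataclasses import dataclass

RADIUS = 0.03  # ball radius in table units (illustrative value)


@dataclass
class Body:
    """Content: one object or actor and part of its characteristic vector."""
    name: str
    position: list          # position vector, common to every element
    velocity: list
    is_self: bool = False   # the actor that supplies the point of view


@dataclass
class Geometry:
    """Geometry: dimensionality, metric, and extent of the field of action."""
    extent: tuple           # one (low, high) range per position term

    @property
    def dimensionality(self):
        return len(self.extent)

    def metric(self, p, q):
        return math.dist(p, q)   # Euclidean distance on a flat table

    def constrain(self, body):
        # Kinematic constraint: cushions reflect a ball back into the extent.
        for i, (low, high) in enumerate(self.extent):
            if body.position[i] < low + RADIUS:
                body.position[i] = low + RADIUS
                body.velocity[i] = abs(body.velocity[i])
            elif body.position[i] > high - RADIUS:
                body.position[i] = high - RADIUS
                body.velocity[i] = -abs(body.velocity[i])


def collide(a, b, geometry):
    """Dynamics rule: an equal-mass elastic impact exchanges the velocity
    components that lie along the line joining the two centres."""
    dist = geometry.metric(a.position, b.position)
    if dist == 0 or dist > 2 * RADIUS:
        return
    normal = [(qb - qa) / dist for qa, qb in zip(a.position, b.position)]
    closing = sum((va - vb) * n
                  for va, vb, n in zip(a.velocity, b.velocity, normal))
    if closing <= 0:         # the balls are already separating
        return
    for i, n in enumerate(normal):
        a.velocity[i] -= closing * n
        b.velocity[i] += closing * n


def step(content, geometry, dt):
    """Advance every element of the content by one time step."""
    for body in content:
        for i in range(geometry.dimensionality):
            body.position[i] += body.velocity[i] * dt
        geometry.constrain(body)
    for i, a in enumerate(content):
        for b in content[i + 1:]:
            collide(a, b, geometry)


table = Geometry(extent=((0.0, 1.0), (0.0, 0.5)))
cue = Body("cue", [0.2, 0.25], [1.0, 0.0], is_self=True)
target = Body("target", [0.5, 0.25], [0.0, 0.0])
for _ in range(40):
    step([cue, target], table, dt=0.01)
```

In this head-on case the cue ball stops and the target ball leaves with the cue ball's velocity. Changing only the Geometry (for example, its extent) or only the collision rule alters the environment without touching the other two components, which is the practical benefit of keeping them separate.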


1.5 Sense of physical reality

Our sense of physical reality is a construction derived from the symbolic, geometric, and dynamic information directly presented to our senses. But it is noteworthy that many of the aspects of physical reality are only presented in incomplete, noisy form. For example, though our eyes provide us only with a fleeting series of snapshots of only parts of objects present in our visual world, through a priori "knowledge" brought to perceptual analysis of our sensory input, we accurately interpret these objects to continue to exist in their entirety [3] (Gregory, 1968, 1980, 1981; Hochberg, 1986). Similarly, our goal-seeking behaviour appears to filter noise by benefiting from internal dynamical models of the objects we may track or control (Kalman, 1960; Kleinman et al., 1970). Accurate perception consequently involves considerable a priori knowledge about the possible structure of the world. This knowledge is under constant recalibration based on error feedback. The role of error feedback has been classically mathematically modeled during tracking behaviour (McRuer and Weir, 1969; Jex et al., 1966; Hess, 1987) and notably demonstrated in the behavioural plasticity of visual-motor coordination (Welch, 1978; Held et al., 1966; Held and Durlach, 1991) and in vestibular and ocular reflexes (Jones et al., 1984; Zangemeister and Hansen, 1985; Zangemeister, 1991).
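The idea that an internal model combined with error feedback can filter noisy input may be illustrated with a scalar Kalman filter (Kalman, 1960). The sketch below is a generic textbook form, not a model taken from the chapter; the assumption of a nearly static quantity, the noise variances, the sample measurements, and the function name are all illustrative choices.

```python
def kalman_track(measurements, meas_var, process_var=0.0,
                 estimate=0.0, estimate_var=1.0):
    """Scalar Kalman filter for a quantity assumed to be (nearly) constant.

    Returns a list of (estimate, gain) pairs, one per measurement.
    """
    history = []
    for z in measurements:
        # Predict: the internal model says the state persists, while
        # uncertainty grows by the process variance.
        estimate_var += process_var
        # Update: correct the prediction by a fraction of the error.
        gain = estimate_var / (estimate_var + meas_var)
        estimate += gain * (z - estimate)
        estimate_var *= (1.0 - gain)
        history.append((estimate, gain))
    return history


# Noisy observations of a true value of 1.0 (made-up numbers).
history = kalman_track([1.2, 0.8, 1.1, 0.9, 1.0], meas_var=0.1)
```

As confidence in the internal estimate grows, the gain falls, so each new observation shifts the estimate less than the one before it.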

Thus, a large part of our sense of physical reality is a consequence of internal processing rather than being something that is developed only from the immediate sensory information we receive. Our sensory and cognitive interpretive systems are predisposed to process incoming information in ways that normally result in a correct interpretation of the external environment, and in some cases they may be said to actually "resonate" with specific patterns of input that are uniquely informative about our environment (Gibson, 1950; Heeger, 1989; Koenderink and van Doorn, 1977; Regan and Beverley, 1979).

These same constructive processes are triggered by the displays used to present virtual environments. However, since the incoming sensory information is mediated by the display technology, these constructive processes will be triggered only to the extent the displays provide high perceptual fidelity. Accordingly, virtual environments can come in different stages of completeness, which may be usefully distinguished by their extent of what may be called "virtualization".


2. Virtualization

2.1 Definition of virtualization

Virtualization may be defined as the "process by which a viewer interprets patterned sensory impressions to represent objects in an environment other than that from which the impressions physically originate." A classical example would be that of a virtual image as defined in geometrical optics. A viewer of such an image sees the rays emanating from it as if they originated from a point that could be computed by the basic lens law rather than from their actual location.
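For a concrete instance of the lens law mentioned above, the following sketch uses the thin-lens equation 1/s_o + 1/s_i = 1/f with the common "real is positive" sign convention, in which a negative image distance marks a virtual image on the object's side of the lens. The function name and the numerical values are illustrative assumptions.

```python
import math


def image_distance(focal_length, object_distance):
    """Thin-lens law solved for the image distance s_i.

    A negative result means the image is virtual: the rays only appear
    to diverge from a point on the same side of the lens as the object.
    """
    inverse = 1.0 / focal_length - 1.0 / object_distance
    if inverse == 0.0:
        return math.inf      # object at the focal point: image at infinity
    return 1.0 / inverse


# An object 40 mm from a 50 mm converging lens (inside the focal length).
s_i = image_distance(50.0, 40.0)          # about -200 mm, a virtual image
magnification = -s_i / 40.0               # about 5, upright and enlarged
```

The viewer's eye receives rays that physically leave the object 40 mm away, yet they are interpreted as coming from a point roughly 200 mm behind the lens, which is the sense in which the image is virtual.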

Virtualization most clearly applies to the two sense modalities associated with remote stimuli, vision and audition. In audition as in vision, stimuli can be synthesized so as to appear to be originating from sources other than their physical origin (Wightman and Kistler, 1989a, 1989b). But carefully designed haptic stimuli that provide illusory senses of contact, shape and position clearly also show that virtualization can be applied to other sensory dimensions (Lackner, 1988). In fact, one could consider the normal functioning of the human sensory systems as the special case in which the interpretation of patterned sensory impressions results in the perception of real objects in the surrounding physical environment, which are in fact the physical energy sources. In this respect perception of reality resolves to the case in which, through a process of Cartesian systematic doubt, it is impossible for an observer to refute the hypothesis that the apparent source of the sensory stimulus is indeed the physical source.

Virtualization, however, extends beyond the objects to the spaces in which they themselves may move. Consequently, a more detailed discussion of what it means to virtualize an environment is required.


2.2 Levels of virtualization

. . .


2.3 Environmental viewpoints and controlled elements

. . .


2.4 Breakdown by technological functions

. . .


2.5 Spatial and environmental instruments

Like the computer graphics pictures drawn on a display surface, the enveloping synthetic environment created by a head-mounted display may be designed to convey specific information. Thus, just as a spatial display generated by computer graphics may be transformed into a spatial instrument by selection and coupling of its display parameters to specific communicated variables, so too may a synthetic environment be transformed into an environmental instrument by design of its content, geometry, and dynamics (Ellis and Grunwald, 1989a,b). Transformations of virtual environments into useful environmental instruments, however, are more constrained than those used to make spatial instruments because the user must actually inhabit the environmental instrument. Accordingly, the transformations and coupling of actions to effects within an environmental instrument must not diverge too far from those transformations and couplings actually experienced in the physical world, especially if the instrument is to be used without disorientation, poor motor coordination, and motion sickness. Thus, spatial instruments may be developed from a greater variety of distortions in the viewing geometry and scene content than environmental instruments. Environmental instruments, however, may be well-designed if their creators have appropriate theoretical and practical understanding of the constraints. Thus, the advent of virtual environment displays provides a veritable cornucopia of opportunity for research in human perception, motor-control, and interface technology.


3. Origins of Virtual Environments

3.1 Early visionaries

The obvious, intuitive appeal that virtual environment technology has is probably rooted in the human fascination with vicarious experiences in imagined environments. In this respect, virtual environments may be thought of as originating with the earliest human cave art (Fagan, 1985), though Lewis Carroll's Through the Looking-Glass (1883) certainly is a more modern example of this fascination.

Fascination with alternative, synthetic realities has been continued in more contemporary literature. Aldous Huxley's "feelies" in Brave New World (1932) were also a kind of virtual environment, a cinema with sensory experience extended beyond sight and sound. A similar fascination must account for the popularity of microcomputer role-playing adventure games such as Wizardry (Greenberg and Woodhead, 1980). Motion pictures, and especially stereoscopic movies, of course, also provide examples of noninteractive spaces (Lipton, 1982). Theatre provides an example of a corresponding performance environment which is more interactive and has been discussed as a source of useful metaphors for human interface design (Laurel, 1991).

The contemporary interest in imagined environments has been particularly stimulated by the advent of sophisticated, relatively inexpensive, interactive techniques allowing the inhabitants of these environments to move about and manually interact with computer graphics objects in three-dimensional spaces. This kind of environment was envisioned in the science fiction plots (Daley, 1982) of the movie TRON (1981) and in William Gibson's Neuromancer (1984), yet the first actual synthesis of such a system using a head-mounted stereo display was made possible much earlier in the middle 1960's by Ivan Sutherland, who developed special-purpose fast graphics hardware specifically for the purpose of experiencing computer synthesized environments through head-mounted graphics displays (Sutherland, 1965, 1970).

Another early synthesis of a synthetic, interactive environment was implemented by Myron Krueger in the 1970's using back-projection and video processing techniques (Krueger, 1977, 1983, 1985). Unlike the device developed by Sutherland, Krueger's environment was projected onto a wall-sized screen. In Krueger's VIDEOPLACE, the users' images appear in a two-dimensional graphic video world created by a computer. The VIDEOPLACE computer analyzed video images to determine when an object was touched by an inhabitant, and it could then generate a graphic or auditory response. One advantage of this kind of environment is that the remote video-based position measurement does not necessarily encumber the user with position sensors. A more recent and sophisticated version of this mode of experience of virtual environments is the implementation from the University of Illinois called, with apologies to Plato, the "Cave" (Cruz-Neira et al., 1992).


3.2 Vehicle simulation and three-dimensional cartography

Probably the most important source of virtual environment technology comes from previous work in fields associated with the development of realistic vehicle simulators, primarily for aircraft (Rolfe and Staples, 1986; CAE Electronics, 1991; McKinnon and Kruk, 1991; Cardullo, 1993) but also automobiles (Stritzke, 1991) and ships (Veldhuyzen and Stassen, 1977; Schuffel, 1987). The inherent difficulties in controlling the actual vehicles often require that operators be highly trained. Since acquiring this training on the vehicles themselves could be dangerous or expensive, simulation systems synthesize the content, geometry, and dynamics of the control environment for training and for testing of new technology and procedures.

These systems usually cost millions of dollars and have recently involved helmet-mounted displays to recreate part of the environment (Lypaczewski et al., 1986; Barrette et al., 1990; Furness, 1986, 1987; Kaiser Electronics, 1990). Declining costs have now brought the cost of a virtual environment display down to that of an expensive workstation and made possible "personal simulators" for everyday use (Foley, 1987; Fisher et al., 1986; Kramer, 1992; Bassett, 1992).

The simulator's interactive visual displays are made by computer graphics hardware and algorithms. Development of special-purpose hardware, such as matrix multiplication devices, was an essential step that enabled generation of real-time, that is, greater than 20 Hz, interactive three-dimensional graphics (Sutherland, 1965, 1970; Myers and Sutherland, 1968). More recent examples are the "geometry engine" (Clark, 1980, 1982) and the "reality engine" in Silicon Graphics IRIS workstations. These "graphics engines" now can project literally millions of shaded or textured polygons, or other graphics primitives, per second (Silicon Graphics, 1993). Though this number may seem large, rendering of naturalistic objects and surfaces can require hundreds of thousands of polygons. Efficient software techniques are also important for improved three-dimensional graphics performance. "Oct-tree" data structures, for example, have been shown to dramatically improve processing speed for inherently volumetric structures (Jenkins and Tanimoto, 1980; Meagher, 1984). Additionally, special variable resolution rendering techniques for head-mounted systems also can be implemented to match the variable resolution of the human visual system and thus not waste computer resources rendering polygons that the user would be unable to see (Netravali and Haskell, 1988; Cowdry, 1986; Hitchner and McGreevy, 1993).
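A rough calculation shows why a rendering rate of millions of polygons per second is less generous than it sounds. The sketch below assumes an illustrative rate of one million polygons per second and the 20 Hz real-time threshold mentioned above; the stereo option, which renders the scene once per eye, and the function name are assumptions added for illustration.

```python
def polygons_per_frame(polygons_per_second, frame_rate_hz, stereo=False):
    """Polygon budget available for each displayed image."""
    views = 2 if stereo else 1
    return polygons_per_second / (frame_rate_hz * views)


mono_budget = polygons_per_frame(1_000_000, 20)                 # 50,000 per frame
stereo_budget = polygons_per_frame(1_000_000, 20, stereo=True)  # 25,000 per eye
```

Under these assumptions either budget falls short of the hundreds of thousands of polygons that a naturalistic scene can require, which is why efficient data structures and variable-resolution rendering matter.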

Since vehicle simulation may involve moving-base simulators, programming the appropriate correlation between visual and vestibular simulation is crucial for a complete simulation of an environment. Moreover, failure to match these two stimuli correctly can lead to motion sickness (AGARD, 1988). Paradoxically, however, since the effective travel of most moving-base simulators is limited, designers must learn to introduce subthreshold visual-vestibular mismatches to produce illusions of greater freedom of movement. These allowable mismatches are built into so-called "washout" models (Bussolari et al., 1988; Curry et al., 1976) and are key elements for creating illusions of extended movement. For example, a slowly implemented pitch-up of a simulator can be used as a dynamic distortion to create an illusion of forward acceleration. Understanding the tolerable dynamic limits of visual-vestibular miscorrelation will be an important design consideration for wide field-of-view head-mounted displays.
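The pitch-up illusion can be sketched numerically. Tilting the cab by an angle theta lets a component g sin(theta) of gravity act along the pilot's longitudinal axis, which can be felt as sustained forward acceleration provided the rotation itself goes unnoticed. The 3 deg/s rate limit, the arcsine tilt law, and all names below are illustrative assumptions, not values taken from the cited washout models.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def tilt_schedule(target_accel, duration, dt=0.01, max_rate_deg_s=3.0):
    """Rate-limited pitch-up toward the tilt that mimics target_accel.

    Returns a list of (pitch_rad, apparent_accel) samples, one per step.
    """
    ratio = max(-1.0, min(1.0, target_accel / G))
    target_pitch = math.asin(ratio)
    max_step = math.radians(max_rate_deg_s) * dt
    pitch = 0.0
    samples = []
    for _ in range(round(duration / dt)):
        change = max(-max_step, min(max_step, target_pitch - pitch))
        pitch += change                      # never faster than the limit
        samples.append((pitch, G * math.sin(pitch)))
    return samples


# Mimic a sustained 1 m/s^2 forward acceleration over three seconds.
samples = tilt_schedule(target_accel=1.0, duration=3.0)
final_pitch_deg = math.degrees(samples[-1][0])   # a little under 6 degrees
```

The rate limit is the design parameter of interest here: a lower limit keeps the rotation less detectable but delays the onset of the simulated acceleration.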

The use of informative distortion is also well-established in cartography (Monmonier, 1991) and is used to help create convincing three-dimensional environments for simulated vehicles. Cartographic distortion is also obvious in global maps which must warp a spherical surface into a plane (Cotter, 1966; Robinson et al., 1984) and three-dimensional maps, which often use significant vertical scale exaggeration (6-20x) to clearly present topographic features. Explicit informative geometric distortion is sometimes incorporated into maps and cartograms presenting geographically indexed statistical data (Tobler, 1963, 1976; Tufte, 1983, 1990; Bertin, 1967/1983), but the extent to which such informative distortion may be incorporated into simulated environments is constrained by the user's movement-related physiological reflexes. If the viewer is constrained to actually be in the environment, deviations from a natural environmental space can cause disorientation and motion sickness (Crampton, 1990; Oman, 1991). For this reason, virtual space or virtual image formats are more suitable when successful communication of the spatial information may be achieved only through spatial distortions. Even in these formats the content of the environment may have to be enhanced by aids such as graticules to help the user discount unwanted aspects of the geometric distortion (McGreevy and Ellis, 1986; Ellis et al., 1987; Ellis and Hacisalihzade, 1990).

In some environmental simulations the environment itself is the object of interest. Truly remarkable animations have been synthesized from image sequences taken by NASA spacecraft which mapped various planetary surfaces. When electronically combined with surface altitude data, the surface photography can be used to synthesize flights over the surface through positions never reached by the spacecraft's camera (Hussey, 1990). Recent developments have made possible the use of these synthetic visualizations of planetary and Earth surfaces for interactive exploration and they promise to provide planetary scientists with the new capability of "virtual planetary exploration" (NASA, 1990; Hitchner, 1992; McGreevy, 1993).


3.3 Physical and logical simulation

Visualization of planetary surfaces suggests the possibility that not only the substance of the surface may be modeled but also its dynamic characteristics. Dynamic simulations for virtual environments may be developed from ordinary high-level programming languages like Pascal or C, but this usually requires considerable time for development. Interesting alternatives for this kind of simulation have been provided by simulation and modeling languages such as SLAM II, with a graphical display interface, and TESS (Pritsker, 1986). These very high-level languages provide tools for defining and implementing continuous or discrete dynamic models. They can facilitate construction of precise systems models (Cellier, 1991).


The process of representing a graphic object in virtual space allows a number of different opportunities to introduce informative geometric distortions or enhancements. These either may be a modification of the transforming matrix during the process of object definition or may be modifications of an element of a model. These modifications may take place (1) in an object-relative coordinate system used to define the object's shape, or (2) in an affine or even curvilinear object shape transformation, or (3) during the placement transformation that positions the transformed object in world coordinates, or (4) in the viewing transformation, or (5) in the final viewport transformation. The perceptual consequences of informative distortions are different depending on where they are introduced. For example, object transformations will not impair perceived positional stability of objects displayed in a head-mounted format, whereas changes of the viewing transformation, such as magnification, will.

Another alternative made possible by graphical interfaces to computers is a simulation development environment in which the simulation is created through manipulation of icons representing its separate elements, such as integrators, delays, or filters, so as to connect them into a functioning virtual machine. A microcomputer program called Pinball Construction Set, published in 1982 by Bill Budge, is a widely distributed early example of this kind of simulation system. It allowed the user to create custom simulated pinball machines on the computer screen simply by moving icons from a toolkit into an "active region" of the display where they would become animated. A more detailed example of this kind of simulator was written as educational software by Warren Robinett. This program, called Rocky's Boots (Robinett, 1982), allowed users to connect icons representing logic circuit elements, that is, and-gates and or-gates, into functioning logic circuits that were animated at a slow enough rate to reveal their detailed functioning. More complete versions of this type of simulation have now been incorporated into graphical interfaces to simulation and modeling languages and are available through widely distributed object-oriented interfaces such as the interface builder distributed with NeXT® computers.
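Returning to the five stages listed in the figure caption above, at which informative distortions can enter, the chain of transformations can be sketched as a product of matrices. To stay short, the example uses two-dimensional homogeneous coordinates and only scaling and translation; the helper names and numerical values are illustrative assumptions.

```python
def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def scale(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def translate(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def to_display(point, shape, placement, viewing, viewport):
    """Carry a point from object-relative coordinates (stage 1) through
    the shape (2), placement (3), viewing (4), and viewport (5) stages."""
    m = matmul(viewport, matmul(viewing, matmul(placement, shape)))
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


corner = (1.0, 1.0)                       # a vertex in object coordinates
place_a, place_b = translate(2.0, 0.0), translate(-3.0, 1.0)
viewport = matmul(translate(320.0, 240.0), scale(100.0, -100.0))

# Baseline: no distortion anywhere.
b_plain = to_display(corner, scale(1, 1), place_b, scale(1, 1), viewport)

# Distortion at stage 2: exaggerate object A's height only.
a_tall = to_display(corner, scale(1, 3), place_a, scale(1, 1), viewport)
b_same = to_display(corner, scale(1, 1), place_b, scale(1, 1), viewport)

# Distortion at stage 4: magnify the view, which moves every object.
b_magnified = to_display(corner, scale(1, 1), place_b, scale(2, 2), viewport)
```

Because the shape-stage change touches only object A, object B's displayed position is unchanged, whereas the viewing-stage magnification shifts B as well. This mirrors the caption's point that changes to the viewing transformation affect the apparent stability of the whole scene.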

The dynamical properties of virtual spaces and environments may also be linked to physical simulations. Prominent, noninteractive examples of this technique are James Blinn's physical animations in the video physics courses, The Mechanical Universe and Beyond the Mechanical Universe (Blinn, 1987, 1991). These physically correct animations are particularly useful in providing students with subjective insights into dynamic three-dimensional phenomena such as magnetic fields. Similar educational animated visualizations have been used for courses on visual perception (Kaiser et al., 1990) and computer-aided design (Open University and BBC, 1991). Physical simulation is more instructive, however, if it is interactive, and interactive virtual spaces have been constructed which allow users to interact with nontrivial physical simulations by manipulating synthetic objects whose behaviour is governed by realistic dynamics (Witkin et al., 1987, 1990). Particularly interesting are interactive simulations of anthropomorphic figures moving according to realistic limb kinematics and following higher level behavioural laws (Zeltzer and Johnson, 1991).

Unusual environments sometimes have unusual dynamics. The orbital motion of a satellite in a low earth orbit (upper panels) changes when thrust v is made either in the direction of orbital motion, V0, (left) or opposed to orbital motion (right), as indicated by the change of the original orbit (dashed lines) to the new orbit (solid line). When the new trajectory is viewed in a frame of reference relative to the initial thrust point on the original orbit (Earth is down, orbital velocity is to the right, see lower panels), the consequences of the burn appear unusual. Forward thrusts (left) cause nonuniform, backward, trochoidal movement. Backward thrusts (right) cause the reverse.
Some unusual natural environments are difficult to work in because their inherent dynamics are unfamiliar and may be nonlinear. The immediate environment around an orbiting spacecraft is an example. When expressed in a spacecraft-relative frame of reference known as "local-vertical-local-horizontal", the consequences of manoeuvring thrusts become markedly counter-intuitive and nonlinear (NASA, 1985). Consequently, a visualization tool designed to allow manual planning of manoeuvres in this environment has taken account of these difficulties (Grunwald and Ellis, 1988, 1991, 1993; Ellis and Grunwald, 1989b). This display system most directly assists planning by providing visual feedback of the consequences of the proposed plans. Its significant features enabling interactive optimization of orbital manoeuvres include an "inverse dynamics" algorithm that removes control nonlinearities. Through a "geometric spreadsheet", the display creates a synthetic environment that provides the user control of thruster burns which allows independent solutions to otherwise coupled problems of orbital manoeuvring. Although this display is designed for a particular space application, it illustrates a technique that can be applied generally to interactive optimization of constrained nonlinear functions.
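
The counter-intuitive backward drift that follows a forward burn can be illustrated with the Clohessy-Wiltshire (Hill) equations, a standard linearized model of motion relative to a circular orbit. The sketch below is not the algorithm used in the display described above. The function name, the 92-minute period, and the 0.1 m/s burn are illustrative assumptions, and the linearization leaves out the nonlinear effects that the text mentions.

```python
import math

def cw_after_prograde_burn(dv, n, t):
    """Relative position (radial, along_track) in metres, t seconds after an
    impulsive burn of dv m/s along the direction of orbital motion.

    Closed-form Clohessy-Wiltshire solution for an object that starts at the
    origin of the local-vertical-local-horizontal frame of a circular orbit
    with mean motion n (rad/s). This is a linearized approximation.
    """
    radial = (2.0 * dv / n) * (1.0 - math.cos(n * t))
    along_track = (dv / n) * (4.0 * math.sin(n * t) - 3.0 * n * t)
    return radial, along_track

# Illustrative values (assumptions): ~92-minute low earth orbit, 0.1 m/s burn.
period = 92.0 * 60.0
n = 2.0 * math.pi / period
for fraction in (0.05, 0.25, 0.5, 0.75, 1.0):
    r, a = cw_after_prograde_burn(0.1, n, fraction * period)
    print(f"{fraction:4.2f} orbit: radial {r:8.1f} m, along-track {a:9.1f} m")
```

In this model the object moves ahead only briefly, then rises and falls behind; after one full orbit it returns to its original altitude displaced backward by 6π·dv/n along the track.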

3.4 Scientific and medical visualization

Visualizing physical phenomena may be accomplished not only by constructing simulations of the phenomena but also by graphs and plots of the physical parameters themselves (Blinn, 1987, 1991). For example, multiple time functions of force and torque at the joints of a manipulator or limb while it is being used for a test movement may be displayed (see, for example, Pedotti et al., 1978) or a simulation of the test apparatus in question itself may be interactively animated.

Successive CAT scan x-ray images may be digitized and used to synthesize a volumetric data set which then may be electronically processed to identify specific tissue. Here bone is isolated from the rest of the data set and presents a striking image that even non-radiologists may be tempted to interpret. Forthcoming hardware will give physicians access to this type of volumetric imagery for the cost of a car. Different tissues in volumetric data sets from CAT scan X-ray slices may be given arbitrary visual properties by digital processing in order to aid visualization. In this image tissue surrounding the bone is made partially transparent so as to make the skin surface as well as the underlying bone of the skull clearly visible. This processing is an example of enhancement of the content of a synthetic environment. (Photograph courtesy of Octree Corporation, Cupertino, CA)
One application for which a virtual space display was demonstrated some time ago in a commercial product is the visualization of volumetric medical data (Meagher, 1984). These images are typically constructed from a series of two-dimensional slices of CAT, PET, or MRI images in order to allow doctors to visualize normal or abnormal anatomical structures in three dimensions. Because the different tissue types may be identified digitally, the doctors may perform an "electronic dissection" and selectively remove particular tissues. In this way truly remarkable skeletal images may be created which currently aid orthopaedic and cranio-facial surgeons to plan operations. These volumetric databases also are useful for shaping custom-machined prosthetic bone implants and for directing precision robotic boring devices to achieve a precise fit between implants and surrounding bone (Taylor et al., 1990). Though these static databases have not yet been presented to doctors as full virtual environments, existing technology is adequate to develop improved virtual space techniques for interacting with them and may be able to enhance the usability of the existing displays for teleoperated surgery (Green et al., 1992; UCSD Medical School, 1994; Satava and Ellis, 1994). Related scene-generation technology can already render detailed images of this sort based on architectural drawings and can allow prospective clients to visualize walkthroughs of buildings or furnished rooms that have not yet been constructed (Greenberg, 1991; Airey et al., 1990; Nomura et al., 1992).
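
The "electronic dissection" described above can be sketched as a per-voxel classification applied to a stack of slices. The intensity range standing in for bone, the function names, and the toy data below are all hypothetical; practical systems rely on calibrated intensity ranges and considerably more elaborate segmentation.

```python
def stack_slices(slices):
    """Build a volume from 2-D slices, indexed as volume[z][y][x]."""
    return [[list(row) for row in s] for s in slices]

def isolate_tissue(volume, lo, hi, background=0):
    """Keep voxels whose intensity lies in [lo, hi]; blank out all others."""
    return [[[v if lo <= v <= hi else background for v in row]
             for row in s] for s in volume]

# Toy data set of two 3x3 slices. Treating values of 200 and above as
# bone is an assumption made only for this illustration.
slices = [
    [[10, 40, 10], [40, 250, 40], [10, 40, 10]],
    [[10, 40, 10], [40, 230, 40], [10, 40, 10]],
]
volume = stack_slices(slices)
bone_only = isolate_tissue(volume, lo=200, hi=255)
print(bone_only[0][1])
```

Making the surrounding tissue partially transparent rather than removing it would amount to mapping each intensity range to an opacity value instead of to the background.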

3.5 Teleoperation and telerobotics and manipulative simulation

A proximity operations planning display presents a virtual space that enables operators to plan orbital manoeuvres despite counter-intuitive, nonlinear dynamics and operational constraints, such as plume impingement restrictions. The operator may use the display to visualize his proposed trajectories. Violations of the constraints appear as graphics objects, i.e. circles and arcs, which inform him of the nature and extent of each violation. This display provides a working example of how informed design of a planning environment's symbols, geometry, and dynamics can extend human planning capacity into new realms. (Photograph courtesy of NASA)
The second major technical influence on the development of virtual environment technology is research on teleoperation and telerobotic simulation (Goertz, 1964; Vertut and Coiffet, 1986; Sheridan, 1992). Indeed, virtual environments existed before the name itself, as telerobotic and teleoperations simulations. The display technology, however, in these cases was usually panel-mounted rather than head-mounted. Two notable exceptions were the head-controlled/head-referenced display developed for control of remote viewing systems by Raymond Goertz at Argonne National Laboratory (Goertz et al., 1965) and a head-mounted system developed by Charles Comeau and James Bryan of Philco (Comeau and Bryan, 1961). The development of these systems anticipated many of the applications and design issues that confront the engineering of effective virtual environment systems. Their discussions of the field-of-view/image resolution trade-off are strikingly contemporary. A key difficulty, then and now, was the lack of a convenient and precise head tracker. The currently popular, electromagnetic, six-degree-of-freedom position tracker developed by Polhemus Navigation (Raab et al., 1979; also see Ascension Technology Corp., 1990; Polhemus Navigation Systems, 1990; Barnes, 1992) consequently was an important technological advance. Interestingly, this was anticipated by similar work at Philco (Comeau and Bryan, 1961) which was limited, however, to electromagnetic sensing of orientation. Other techniques for tracking head position may use accelerometers, optical tracking hardware (CAE Electronics, 1991; Wang et al., 1990), or acoustic systems (Barnes, 1992). These more modern sensors are much more convenient than those used in the pioneering work of Goertz and Sutherland, who used mechanical position sensors, but the important dynamic characteristics of these sensors have only recently begun to be fully described (Adelstein, Johnston and Ellis, 1992).

Virtual environment technology may assist visualization of the results of aerodynamic simulations. Here a DataGlove is used to control the position of a "virtual" source of smoke in a wind-tunnel simulation so the operator can visualize the local pattern of air flow. In this application the operator uses a viewing device incorporating TV monitors (McDowall et al., 1990) to present a stereo view of the smoke trail around the test model also shown in the desk-top display on the table (Levit and Bryson, 1991). (Photograph courtesy of NASA)
A second key component of a teleoperation work station, or of a virtual environment, is a sensor for coupling hand position to the position of the end-effector at a remote work site. The earlier mechanical linkages used for this coupling have been replaced by joysticks or by more complex sensors that can determine hand shape, as well as position. Modern joysticks are capable of measuring simultaneously all three rotational and three translational components of motion. Some of the joysticks are isotonic (BASYS, 1990; CAE Electronics, 1991; McKinnon and Kruk, 1991) and allow significant travel or rotation along the sensed axes, whereas others are isometric and sense the applied forces and torques without displacement (Spatial Systems, 1990). Though the isometric sticks with no moving parts benefit from simpler construction, the kinematic coupling in the user's hand makes it difficult for him to use them to apply signals in one axis without cross-coupled signals in other axes. Consequently, these joysticks use switches for shutting down unwanted axes during use. Careful design of the breakout forces and detents for the different axes on the isotonic sticks allows a user to minimize cross-coupling in control signals while separately controlling the different axes (CAE Electronics, 1991; McKinnon and Kruk, 1991).

Visual virtual environment display systems have three basic parts: a head-referenced visual display, head and/or body position sensors, and a technique for controlling the visual display based on head and/or body movement. One of the earliest systems of this sort, shown above, was developed by Philco engineers (Comeau and Bryan, 1961) using a head-mounted, biocular, virtual image viewing system, a Helmholtz coil electromagnetic head-orientation sensor, and a remote TV camera slaved to head orientation to provide the visual image. Today this would be called a telepresence viewing system. The first system to replace the video signal with a totally synthetic image produced through computer graphics was demonstrated by Ivan Sutherland for very simple geometric forms (Sutherland, 1965).
Although the mechanical bandwidth might have been only of the order of 2-5 Hz, the early mechanical linkages used for telemanipulation provided force-feedback conveniently and passively. In modern electronically coupled systems force-feedback or "feel" must be actively provided, usually by electric motors. Although systems providing six degrees of freedom with force-feedback on all axes are mechanically complicated, they have been constructed and used for a variety of manipulative tasks (Bejczy and Salisbury, 1980; Hannaford, 1989; Jacobson et al., 1986; Jacobus et al., 1992; Jacobus, 1992). Interestingly, force-feedback appears to be helpful in the molecular docking work at the University of North Carolina in which chemists manipulate molecular models of drugs in a computer graphics physical simulation in order to find optimal orientations for binding sites on other molecules (Ouh-young et al., 1989).

High-fidelity force-feedback requires electromechanical bandwidths over 30 Hz. Most manipulators do not have this high a mechanical response. A force-reflecting joystick with these characteristics, however, has been designed and built (Adelstein and Rosen, 1991, 1992). Because of the required dynamic characteristics for high fidelity, it is not compact and is carefully designed to protect its operators from the strong, high-frequency forces it is capable of producing (see Fisher et al. (1990) for some descriptions of typical manual interface specifications; also Brooks and Bejczy (1986) for a review of control sticks).

Manipulative interfaces may provide varying degrees of manual dexterity. Relatively crude interfaces for rate-controlled manipulators may allow experienced operators to accomplish fine manipulation tasks. Access to this level of proficiency, however, can be aided by coordinated displays of high visual resolution, by use of position control derived from inverse kinematic analysis of the manipulator, by more intuitive control of the interface, and by more anthropomorphic linkages on the manipulator.

A high-fidelity, force-reflecting two-axis joystick designed to study human tremor. (Photograph courtesy of B. Dov Adelstein)
An early example of a dexterous, anthropomorphic robotic end-effector is the hand by Tomovic and Boni (Tomovic and Boni, 1962). A more recent example is the Utah/MIT hand (Jacobson et al., 1984). Such hand-like end effectors with large numbers of degrees of freedom may be manually controlled directly by hand-shape sensors, for example, the Exos exoskeleton hand (Exos, 1990).

Significantly, the users of the Exos hand often turn off a number of the joints, raising the possibility that there may be a limit to the number of degrees of freedom usefully incorporated into a dexterous master controller (Marcus, 1991). Less bulky hand shape measurement devices have also been developed using fiber optic or other sensors (Zimmerman et al., 1987; W Industries, 1991). Use of these alternatives, however, involves significant trade-offs of resolution, accuracy, force-reflection and calibration stability as compared with the more bulky sensors. A more recent hand-shape measurement device has been developed that combines high static and dynamic positional fidelity with intuitive operation and convenient donning and doffing (Kramer, 1992).

Experienced operators of industrial manipulator arms (centre) can develop great dexterity (see drawing on right) even with ordinary two-degree-of-freedom, joystick interfaces (left) for the control of robot arms with adequate mechanical bandwidth. Switches on the control box shift control to the various joints on the arm. The source of the dexterity illustrated here is the high dynamic fidelity of the control, a fidelity that needs to be reproduced if supposedly more natural haptic virtual environment interfaces are to be useful (Photographs courtesy of Deep Ocean Engineering, San Leandro, CA).
As suggested by the informal comments of Exos hand-master users who shut down apparently unneeded degrees of freedom on their hand-shape sensor, the number of degrees of freedom that need to be monitored by sensors used for virtual environment displays can become the subject of formal investigations. For example, the head position sensors used on the Fakespace Boom constrain natural head roll in the coronal plane. Furthermore, anecdotal observations of architectural walk-throughs with closed, head-mounted displays have indicated that it was better to disable head-roll tracking because, combined with sensor lag, it seemed to make the use of the display unpleasant (J. Nomura, Personal communication, 1993). Accordingly, one might reasonably ask what benefits in position and orientation display head-roll tracking provides. This question has been investigated for manipulative interaction with targets placed within arm's length of a user. The results show that if large head rotations with respect to the torso (greater than about 50°) are not required, head-roll tracking provides only minor improvements in the users' ability to rotate objects into alignment with targets presented in a virtual environment (Adelstein and Ellis, 1993). Thus, roll-tracking and roll-compensation on some telepresence camera platforms may be unnecessary if the user's interface to the control of the remote device does not require large head-torso rotations.

3.6 Photography, cinematography, video technology

An exoskeleton hand-shape measurement system in a dexterous hand master using accurate Hall-effect flexion sensors which is suitable to drive a dexterous end-effector. (Photograph courtesy of Exos, Inc, Burlington, MA)
 
Since photography, cinema, and television are formats for presenting synthetic environments, it is not surprising that technology associated with special effects for these media has been applied to virtual environments. The LEEP optics, which are commonly used in many "virtual reality" stereo-viewers, were originally developed for a stereoscopic camera system using matched camera and viewing optics to cancel the aberrations of the wide angle lens. The LEEP system field of view is approximately 110° x 55°, but it depends on how the measurement is taken (Howlett, 1991). Though this viewer does not allow adjustment for interpupillary distance, its large entrance pupil (30 mm radius) removes the need for such adjustment. The stereoscopic image pairs used with these optics, however, are presented 62 mm apart, closer together than the average interpupillary distance. This choice is a useful design feature which reduces the likelihood that average users need to diverge their eyes to achieve binocular fusion.

 
Apparatus used to study the benefits of incorporating head-roll tracking into a head-mounted telepresence display. The left panel shows a stereo video camera mounted on a 3-degree-of-freedom platform that is slaved in orientation to the head orientation of an operator wearing a head-mounted video display at a remote site. The operator sees the video images from the camera and uses them to reproduce the orientation and position of rectangular test objects distributed on matching cylindrical work surfaces.
An early development of a more complete environmental illusion through cinematic virtual space was Morton Heilig's "Sensorama." It provided a stereo, wide-field-of-view, egocentric display with coordinated binaural sound, wind, and odour effects (Heilig, 1955). A more recent, interactive virtual space display was implemented by the MIT Architecture Machine Group in the form of a video-disk-based, interactive map of Aspen, Colorado (Lippman, 1980). The interactive map provided a video display of what the user would have seen had he actually been there moving through the town. Similar interactive uses of video-disk technology have been explored at the MIT Media Lab (Brand, 1987). One feature that probably distinguishes the multimedia work mentioned here from the more scientific and engineering studies reported previously, is that the media artists, as users of the enabling technologies, have more interest in synthesizing highly integrated environments including sight, sound, touch, and smell. A significant part of their goal is the integrated experience of a "synthetic place". On the other hand, the simulator designers are only interested in capturing the total experience insofar as this experience helps specific training and testing. Realism is itself not their goal, but effective communication and training are.

3.7 Role of engineering and physiological models

Since the integration of the equipment necessary to synthesize a virtual environment represents such a technical challenge in itself, there is a tendency for groups working in this area to focus their attention only on collecting and integrating the individual technologies for conceptual demonstrations in highly controlled settings. The videotaped character of many of these demonstrations of early implementations has often suggested system performance far beyond that of the technology actually available. The visual resolution of the cheaper, wide field displays using LCD technology has often been, for example, implicitly exaggerated by presentation techniques using overlays of users wearing displays and images taken directly from large-format graphics monitors. In fact, the users of many of these displays are, for practical purposes, legally blind.
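
The "legally blind" remark can be checked with simple arithmetic. The sketch below assumes that one display pixel is the smallest resolvable detail, that 20/20 acuity corresponds to resolving detail of about 1 arcmin, and that legal blindness is taken as the common criterion of 20/200 or worse. The pixel count and field of view are hypothetical values, not measurements of any particular product.

```python
def arcmin_per_pixel(fov_deg, pixels):
    """Average angular size of one pixel across the field of view."""
    return fov_deg * 60.0 / pixels

def snellen_denominator(arcmin):
    """Approximate Snellen equivalent 20/x for a given resolvable detail."""
    return 20.0 * arcmin

# Hypothetical wide-field LCD: 320 pixels spread across 100 degrees.
res = arcmin_per_pixel(fov_deg=100.0, pixels=320)
print(f"{res:.2f} arcmin per pixel, roughly 20/{snellen_denominator(res):.0f}")
```

With these assumed numbers each pixel subtends 18.75 arcmin, an equivalent acuity of about 20/375, which is well beyond the 20/200 criterion.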

A graphic model of a manipulator arm electronically superimposed on a video signal from a remote worksite to assist users who must contend with time delay in their control actions (Photograph courtesy of JPL, Pasadena, CA).
Accomplishment of specific tasks in real environments, however, places distinct real performance requirements on the simulation of which visual resolution is just an example. These requirements may be determined empirically for each task, but a more general approach is to use human performance models to help specify them. There are good general collections that can provide this background design data (e.g. Borah et al., 1978; Boff et al., 1986; Elkind et al., 1989) and there are specific examples of how scientific and engineering knowledge and computer-graphics-based visualization can be used to help designers conform to human performance constraints (Monheit and Badler, 1990; Phillips et al., 1990; Larimer et al., 1991). Useful sources on human sensory and motor capacities relevant to virtual environments are also available (Brooks and Bejczy, 1986; Howard, 1982; Blauert, 1983; Goodale, 1990; Durlach et al., 1991; Ellis et al., 1993).

Because widely available current technology limits the graphics and simulation update rate in virtual environments to less than 20 Hz, understanding the control characteristics of human movement, visual tracking, and vestibular responses is important for determining the practical limits to useful work in these environments. Theories of grasp, manual tracking (Jex et al., 1966), spatial hearing (Blauert, 1983; Wenzel, 1991), vestibular response, and visual-vestibular correlation (Oman, 1991; Oman et al., 1986) all can help to determine performance guidelines.

Predictive knowledge of system performance is not only useful for matching interfaces to human capabilities, but it is also useful in developing effective displays for situations in which human operators must cope with significant time lags, for example those > 250 ms, or other control difficulties. In these circumstances, accurate dynamic or kinematic models of the controlled element allow the designer to give the user control over a predictor which he may move to a desired location and which will be followed by the actual element (Hashimoto et al., 1986; Bejczy et al., 1990).
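
A minimal sketch of the predictor idea under a pure transport delay: the operator's commands reach the controlled element only after a fixed number of time steps, while a locally simulated predictor responds at once. The trivial kinematic model used here (the element simply takes up the commanded position), the class name, and the delay length are assumptions for illustration and are not the methods of the cited systems.

```python
from collections import deque

class DelayedElement:
    """Remote element that receives each command delay_steps ticks late."""

    def __init__(self, delay_steps, initial=0.0):
        # Pre-fill the link so the element holds its initial position
        # until the first command has crossed the delay.
        self._link = deque([initial] * delay_steps)
        self.position = initial

    def step(self, command):
        """Send one command and return the element's position this tick."""
        self._link.append(command)
        self.position = self._link.popleft()
        return self.position

remote = DelayedElement(delay_steps=3)
predictor = 0.0
for command in [1.0, 2.0, 3.0, 3.0, 3.0, 3.0]:
    predictor = command              # predictor graphic moves immediately
    actual = remote.step(command)    # actual element lags behind
    print(f"predictor {predictor:.1f}  actual {actual:.1f}")
```

The operator positions the predictor without waiting for feedback, and the actual element arrives at the same place three ticks later.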

Though very expensive, the CAE Fiber Optic Helmet Mounted display, FOHMD (left panel), is one of the highest-performance virtual environment systems used as a head-mounted aircraft simulator display. It can present an overall visual field 162° x 83.5° with 5-arcmin resolution with a high resolution inset of 24° x 18° of 1.5 arcmin resolution. It has a bright display, 30 foot-lambert, and a fast, optical head-tracker: 60-Hz sampling, with accelerometer augmentation. The Kaiser WideEye® display (right panel) is a roughly comparable, monochrome device designed for actual flight in aircraft as a head-mounted heads-up display. It has a much narrower field of view (monocular: 40°, or binocular with 50% overlap 40° x 60°; visual resolution is approximately 3 arcmin). (Photographs courtesy of CAE Electronics, Montreal, Canada; Kaiser Electronics, San José, CA)
Another source of guidelines is the performance and design of existing high-fidelity systems themselves. Of the virtual environment display systems, probably the one with the best visual display is the CAE Fiber Optic Helmet Mounted Display or the "FOHMD" (Lypaczewski et al., 1986; Barrette et al., 1990) which is used in military aircraft simulators. It presents two 83.5° monocular fields of view with adjustable binocular overlap, typically of about 38° in early versions, giving a full horizontal field-of-view of up to 162°. Similarly, the Wright-Patterson Air Force Base Visually Coupled Airborne Systems Simulator or "VCASS" display, also presents a very wide field of view, and has been used to study the consequences of field-of-view restriction on several visual tasks (Wells and Venturino, 1990). Their results support other reports that indicate that visual performance is influenced by increased field-of-view, but that this influence wanes as fields of view greater than 60° are used (Hatada et al., 1980).

A significant feature of the FOHMD is that the 60-Hz sampling of head position had to be augmented by signals from helmet-mounted accelerometers to perceptually stabilize the graphics imagery during head movement. Without the accelerometer signals, perceptual stability of the enveloping environment requires head-position sampling over 100 Hz, as illustrated by well-calibrated teleoperations viewing systems developed in Japan (Tachi et al., 1984, 1989). In general, it is difficult to calibrate the head-mounted, virtual image displays used in these integrated systems. One solution is to use a see-through system and to compare the positions of real objects and superimposed computer-generated objects (Hirose et al., 1990, 1992; Ellis and Bucher, 1994; Janin et al., 1993; Rolland, 1994).
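
One generic way in which inertial signals can supplement a slowly sampled position sensor is to extrapolate the pose between samples. The sketch below shows constant-acceleration extrapolation of a single head-pose coordinate. It is not a description of the FOHMD's actual method; the number of display updates per sample and all numerical values other than the 60-Hz rate are made up for illustration.

```python
def extrapolate(value, rate, accel, dt):
    """Constant-acceleration extrapolation of one pose coordinate over dt s."""
    return value + rate * dt + 0.5 * accel * dt * dt

SAMPLE_DT = 1.0 / 60.0   # position samples arrive every 1/60 s (from the text)
SUBSTEPS = 4             # hypothetical display updates per position sample

# Last sampled angle, estimated rate, and measured acceleration (made up).
angle, rate, accel = 10.0, 120.0, 600.0   # deg, deg/s, deg/s^2
for k in range(1, SUBSTEPS + 1):
    dt = k * SAMPLE_DT / SUBSTEPS
    print(f"t+{dt * 1000:5.2f} ms: {extrapolate(angle, rate, accel, dt):.3f} deg")
```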

Technical descriptions with performance data for fully integrated systems have not been generally available or accurately detailed (Fisher et al., 1986; Stone, 1991a,b), but this situation should change as reports are published in a number of journals, e.g. IEEE Computer Graphics and Applications; Computer Systems in Engineering; Presence: the Journal of Teleoperations and Virtual Environments; Pixel: the Magazine of Scientific Visualization; Ergonomics; and Human Factors. Compendiums of the human factors design issues are available (e.g. Ellis et al., 1993), and there are books collecting manufacturers' material which ostensibly describes the performance of the component technology (e.g. Kalawsky, 1993). But due to the absence of standards and the novelty of the equipment, developers are likely to find these descriptions still incomplete and sometimes misleading. Consequently, users of the technology must often measure the basic performance characteristics of components themselves (e.g. Adelstein et al., 1992).

4. Virtual Environments: Performance and Trade-offs

. . .

Notes

  1. Earlier versions of some parts of this paper appeared as "Nature and origin of virtual environments: a bibliographical essay," in Computer Systems in Engineering, 2 (4), 321-346, 1991 and as "What are virtual environments?" in IEEE Computer Graphics and Applications, 14 (1), 17-22, 1994.
  2. Higher dimensional displays have also been described. See Inselberg (1985) or Feiner and Beshers (1990) for alternative approaches.
  3. This "knowledge" should not be thought of as the conscious, abstract knowledge that is acquired in school. It rather takes the form of tacit acceptance of specific constraints on the possibilities of change such as those that are reflected in Gestalt Laws, e.g. common fate or good continuation. Its origin may be sought in the phylogenetic history of a species, shaped by the process of natural selection and physical law, and documented by the history of the earth's biosphere.

Bibliography and References

Advanced Displays and Spatial Perception Laboratory
Human Information Processing Research Branch
Moffett Field, CA 94035-1000