DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF WASHINGTON TECHNICAL REPORT ABSTRACTS - 94-06-01 -> 95-02-03 Obtaining Electronic Technical Reports: Technical reports in this list with an asterisk (*) following the TR number are available on-line as of April 1995 via FTP (ftp.cs.washington.edu), or the WWW (http://www.cs.washington.edu/research/tr/techreports.html) Obtaining Hardcopy Technical Reports: To obtain a technical report on paper, sent to you via Snail Mail, please contact . If you need additional information or assistance, please contact: ABSTRACTS TR # 94-06-01 * Hugues Hoppe "Surface Reconstruction from Unorganized Points" This thesis describes a general method for automatic reconstruction of accurate, concise, piecewise smooth surfaces from unorganized 3D points. Instances of surface reconstruction arise in numerous scientific and engineering applications, including reverse-engineering - the automatic generation of CAD models from physical objects. Previous surface reconstruction methods have typically required additional knowledge, such as structure in the data, known surface genus, or orientation information. In contrast, the method outlined in this thesis requires only the 3D coordinates of the data points. From the data, the method is able to automatically infer the topological type of the surface, its geometry, and the presence and location of features such as boundaries, creases, and corners. The reconstruction method has three major phases: 1) initial surface estimation, 2) mesh optimization, and 3) piecewise smooth surface optimization. A key ingredient in phase 3, and another principal contribution of this thesis, is the introduction of a new class of piecewise smooth representations based on subdivisions. The effectiveness of the three-phase reconstruction method is demonstrated on a number of examples using both simulated and real data. Phases 2 and 3 of the surface reconstruction method can also be used to approximate existing surface models. By casting surface approximation as a global optimization problem with an energy function that directly measures deviation of the approximation from the original surface, models are obtained that exhibit excellent accuracy to conciseness tradeoffs. Examples of piecewise linear and piecewise smooth approximations are generated for various surfaces, including meshes, NURBS surfaces, CSG models, and implicit surfaces. TR # 94-06-02 * "Experiences with the UWTester in Computer Science and Engineering Education" Neil R. McKenzie, Carl Ebeling, Larry McMurchie, Gaetano Borriello Teaching students the complete process of circuit design, simulation, implementation, test and debug is a daunting task. Even though design description tools and circuit compilers have kept up with the increasing levels of integration found in current implementation media such as FPGAs and microcontrollers, it has become increasingly difficult for students to test and debug the complex hardware projects enabled by these modern tools and devices. In this paper we describe an inexpensive digital circuit testing environment that greatly simplifies the test and debug experience. This environment, consisting of the tester hardware and its corresponding software, enables students to experience the challenges of testing and debugging without the expense of costly commercial hardware testers or the drudgery of setting up ad hoc hardware test jigs. 
The UWTester has been in use in the Department of Computer Science and Engineering at the University of Washington for the past two years in both undergraduate and graduate hardware design courses as well as independent study and research projects. TR # 94-06-03 * "Exploiting Shared Memory for Protected Services" Rene W. Schmidt The client/server model is commonly used to provide sharing and protection of data. Using this model, distrusted applications (clients) can access and possibly modify shared data, while guaranteeing data integrity. To ensure this data integrity, the shared data is managed by a server that is located in a private protection domain. Calls to the server domain utilize a protected procedure call mechanism, i.e., the calls execute within the server's domain. While this model provides safety, it can suffer in performance, due to the domain-crossing costs required by the protected procedure calls. "Porc" is a toolkit for building object-based client/server applications. Its main goals are (1) to provide efficient access to protected objects and (2) to hide the protection boundary from clients. In Porc, protected server objects are represented as local proxy objects in the client; the difference between a local object and a protected object is thus transparent to the client. This thesis describes three extensions of the Porc toolkit in order to provide fast access to protected objects. The extensions are all based on read-only sharing. First, read-only methods must be defined, which are server procedures that do not modify server state. Read-only methods can be directly invoked by clients without requiring a protected procedure call. Second, version-based synchronization is needed: a mechanism that can be used to efficiently synchronize read-only methods. Third, shared proxies are used to avoid the cost of local proxy creation in clients, making it faster to traverse complex pointer structures. Shared proxies also increase transparency by reducing pointer aliasing problems. TR # 94-06-04 * "Automatic Synthesis of Device Drivers for Hardware/Software Co-design" Elizabeth Walkup, Gaetano Borriello Automatically synthesizing device drivers, the hardware and software needed to interface a device to a processor, is an important element of hardware/software co-design. Driver software consists of the sequences of instructions needed for the processor to control its interactions with the device. Driver hardware consists of the digital logic necessary to physically connect the devices and generate signal events while meeting timing constraints. We describe an approach that begins with device specifications in the form of timing and state diagrams and determines which signals can be controlled directly from software and which require indirect control through intervening hardware. Minimizing this hardware requires solving a simultaneous scheduling and partitioning problem whose goal is to limit the number of wires whose events are not directly generated by the processor software. We show that even the simplest version of this problem is NP-hard and discuss a heuristic solution that should work well in practical situations. TR # 94-06-05 * "Lecture Notes on Message Routing in Parallel Machines" Martin Tompa These are the lecture notes from CSE 522, a graduate algorithms course I taught at the University of Washington in Spring 1994. The topic of the course was Message Routing in Parallel Machines.
These notes are not intended to be a survey of that area, however, as there are numerous important results that I would have liked to cover but did not have time. TR # 94-06-06 * "The Meerkat Multicomputer: Tradeoffs in Multicomputer Architecture" Robert C. Bedichek A central problem preventing the wide application of distributed memory multicomputers has been their high price, especially for small installations. High prices are due to long design times, support for scaling to thousands of nodes, and high production costs. This thesis demonstrates a new approach that combines some carefully chosen restrictions on scaling with a software-intensive methodology. We used a concrete design, Meerkat, as a vehicle for exploring the multicomputer design space. Meerkat is a distributed memory multicomputer architecture effective in the range from several to hundreds of processors. It uses a two-dimensional, passive backplane to connect nodes composed of processors, memory, and I/O devices. The interconnect is conceptually simple and inexpensive to design and build, has low latency, and provides high bandwidth on long messages. However, it does not scale to thousands of processors, provide high bandwidth on short messages, or implement cache coherent shared memory. This thesis describes Meerkat's architecture, the niche that Meerkat fills, and the rationale for our design choices. We present performance results obtained from our hardware prototype and a calibrated simulator to demonstrate that parallel numerical workloads work well on Meerkat. TR # 94-07-02 * "System Support for Efficient Network Communication" Chandramohan A. Thekkath Recent advances in processor and network technology have significantly changed the hardware base for distributed systems. However, high-speed networks and fast processors alone are not sufficient to deliver the end-to-end performance that users of these systems expect; good software design is critical. This thesis explores software mechanisms needed to support efficient network access for the next generation of workstations and networks. The thesis proposes a new communication mechanism or model based on remote access (read and write) to protected memory segments. This model has two key aspects. First, it simplifies input data demultiplexing and reduces data copying costs. This allows users of the model very efficient access to remote data. A second and more significant feature of the model is that it separates the transfer of data from the transfer of control. That is, a remote data access can complete at the target process without invoking a thread of control within the target. The communication model is implemented on DECstation workstations connected by an ATM network and provides performance close to the limits imposed by the hardware. Using example applications, we show how the model can provide excellent performance for both parallel applications and conventional distributed system mechanisms like remote procedure call (RPC). However, we argue that conventional communication mechanisms, e.g., message passing or RPC, do not fully exploit the potential of newer networks. These mechanisms closely couple the transfer of both control and data. We demonstrate, using the read/write communication model, that separating data transfer and control transfer in distributed systems can both (1) eliminate unnecessary and expensive control transfers and (2) facilitate tighter coupling of a distributed system.
This has the potential to increase performance and reduce server load, which supports scaling in the face of an increasing number of clients. TR # 94-07-03 * "Minimal Adaptive Routing on the Mesh with Bounded Queue Size" Donald D. Chinn, Tom Leighton, Martin Tompa An adaptive routing algorithm is one in which the path a packet takes from its source to its destination may depend on other packets it encounters. Such algorithms potentially avoid network bottlenecks by routing packets around "hot spots." Minimal adaptive routing algorithms have the additional advantage that the path each packet takes is a shortest one. For a large class of minimal adaptive routing algorithms, we present an Omega(n^2/k^2) bound on the worst case time to route a static permutation of packets on an n x n mesh or torus with nodes that can hold up to k >= 1 packets each. This is the first nontrivial lower bound on adaptive routing algorithms. The argument extends to more general routing problems, such as the h-h routing problem. It also extends to a large class of dimension order routing algorithms, yielding an Omega(n^2/k^2) time bound. To complement these lower bounds, we present two upper bounds. One is an O(n^2/k^2) time dimension order routing algorithm that matches the lower bound. The other is the first instance of a minimal adaptive routing algorithm that achieves O(n) time with constant sized queues per node. We point out why the latter algorithm is outside the model of our lower bounds. TR # 94-07-04 * "Separating Data and Control Transfer in Distributed Operating Systems" Chandramohan A. Thekkath, Henry M. Levy, Edward D. Lazowska Advances in processor architecture and technology have resulted in workstations in the 100+ MIPS range. As well, newer local-area networks such as ATM promise a ten- to hundred-fold increase in throughput, much reduced latency, greater scalability, and greatly increased reliability, when compared to current LANs such as Ethernet. We believe that these new network and processor technologies will permit tighter coupling of distributed systems at the hardware level, and that distributed systems software should be designed to benefit from that tighter coupling. In this paper, we propose an alternative way of structuring distributed systems that takes advantage of a communication model based on remote network access (reads and writes) to protected memory segments. A key feature of the new structure, directly supported by the communication model, is the separation of "data transfer" and "control transfer." This is in contrast to the structure of traditional distributed systems, which are typically organized using message passing or remote procedure call (RPC). In RPC-style systems, data and control are inextricably linked -- all RPCs must transfer both data and control, even if the control transfer is unnecessary. We have implemented our model on DECstation hardware connected by an ATM network. We demonstrate how separating data transfer and control transfer can eliminate unnecessary control transfers and facilitate tighter coupling of the client and server. This has the potential to increase performance and reduce server load, which supports scaling in the face of an increasing number of clients. For example, for a small set of file server operations, our analysis shows a 50% decrease in server load when we switched from a communications mechanism requiring both control transfer and data transfer, to an alternative structure based on pure data transfer.
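The separation of data transfer from control transfer described in TR # 94-07-02 and TR # 94-07-04 can be made concrete with a small sketch. The C++ fragment below appends records to a log kept in a memory segment exported by a server; the RemoteSegment class, its interface, and the log layout are invented for this illustration and are not the interface defined in the reports, and the segment is simulated with a local buffer so the example is self-contained. The point is that each append is a pure data transfer: in the real model the bytes move over the network into the server's protected segment without scheduling a server thread, whereas an RPC-based design would transfer control to the server on every operation.

#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

// Local stand-in for a protected memory segment exported by a server.  In the
// model described above, read() and write() would move bytes across the ATM
// network into the server's segment without dispatching a server thread; here
// a plain buffer is used so the sketch is self-contained.  All names and the
// log layout are assumptions made for this illustration.
class RemoteSegment {
public:
    explicit RemoteSegment(std::size_t bytes) : mem_(bytes, 0) {}
    void write(std::size_t off, const void* src, std::size_t len) {
        std::memcpy(mem_.data() + off, src, len);   // pure data transfer
    }
    void read(std::size_t off, void* dst, std::size_t len) const {
        std::memcpy(dst, mem_.data() + off, len);
    }
private:
    std::vector<std::uint8_t> mem_;
};

// Client-side append to a log kept in the server's segment.  An RPC-based
// design would transfer control to the server on every append; here the
// server runs only when it later chooses to scan the log.
class SegmentLog {
public:
    explicit SegmentLog(RemoteSegment& seg) : seg_(seg) {}
    void append(const std::string& record) {
        std::uint64_t tail = 0;
        seg_.read(0, &tail, sizeof(tail));                       // current end of log
        seg_.write(kData + tail, record.data(), record.size());  // record bytes
        tail += record.size();
        seg_.write(0, &tail, sizeof(tail));                      // publish new end
        // A real client would rely on the model's synchronization support here;
        // this sketch ignores concurrent writers.
    }
private:
    static constexpr std::size_t kData = 64;   // records start past the header
    RemoteSegment& seg_;
};

int main() {
    RemoteSegment segment(4096);
    SegmentLog log(segment);
    log.append("open file");      // no control transfer to the server
    log.append("write block");
    std::uint64_t tail = 0;
    segment.read(0, &tail, sizeof(tail));
    std::cout << "log bytes written: " << tail << "\n";
}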
TR # 94-07-05 * "Hardware and Software Support for Efficient Exception Handling" Chandramohan A. Thekkath and Henry M. Levy Program-synchronous exceptions, for example, breakpoints, watchpoints, illegal opcodes, and memory access violations, provide information about "exceptional" conditions, interrupting the program and vectoring to an operating system handler. Over the last decade, however, programs and run-time systems have increasingly employed these mechanisms as a performance optimization to detect "normal" and "expected" conditions. Unfortunately, current architecture and operating system structures are designed for exceptional or erroneous conditions, where performance is of secondary importance, rather than normal conditions. Consequently, this has limited the practicality of such hardware-based detection mechanisms. We propose both hardware and software structures that permit efficient handling of synchronous exceptions by user-level code. We demonstrate a software implementation that reduces exception-delivery cost by an order of magnitude on current RISC processors, and show the performance benefits of that mechanism for several example applications. TR # 94-07-06 "Self-Explanatory Simulation for an Electronic Encyclopedia" Franz G. Amador This dissertation presents a new, interpretive architecture for self-explanatory simulation that is designed to support highly interactive, simulation-based electronic encyclopedia articles. Unlike previous compilation-based architectures, this interpretive architecture can rapidly respond to external modifications to the simulated model, such as a user's disconnecting the components of a simulated refrigerator while interacting with an encyclopedia article on refrigeration. This architecture is fully implemented as the Pika self-explanatory simulator, which prepares for simulation the most complex reported test example of Forbus and Falkenhainer's SIMGEN .Mk2, the state-of-the-art self-explanatory simulator compiler, about 2000 times faster than does SIMGEN .Mk2. Once simulation has begun, Pika employs an incremental update strategy that, for a realistic thermodynamic model, reduces the time needed to respond to changes in the model, internal or external, by a factor of eight over the initial simulation-preparation time. Unlike SIMGEN, Pika's modeling language allows non-directional algebraic and differential equations. Through novel use of a constraint hierarchy, Pika identifies state variables, chooses default quantity values, decides how to use each equation, and symbolically solves equations as necessary to prepare for numeric integration. TR # 94-07-07 * "Implementing Constraint Imperative Programming Languages: the Kaleidoscope'93 Virtual Machine" Gus Lopez, Bjorn Freeman-Benson, and Alan Borning Constraint Imperative Programming (CIP) languages integrate declarative constraints with imperative state and destructive assignment, yielding a powerful new programming paradigm. However, CIP languages are difficult to implement efficiently due to complex interactions between the two donor paradigms. Neither the virtual machines for classical object-oriented languages, nor those for existing constraint languages, are suitable for implementing CIP languages, as each assumes a purely imperative or a purely declarative computation model. We have developed a new virtual machine for CIP languages, the K-machine, an imperative machine with an incremental constraint solver and a constraint-based, rather than value-based, data store.
This virtual machine allows user-defined constraints to be defined using constraint constructor definitions, which are the CIP analog to method definitions. Similar to methods, these constructors are able to reference variables indirectly through many levels of pointers. The K-machine maintains relations between objects in the presence of state change to these indirectly referenced objects. The K-machine is capable of supporting a wide variety of CIP languages, including our most recent: Kaleidoscope'93. TR # 94-08-01 * "Mediators: Easing the Design and Evolution of Integrated Systems" Kevin J. Sullivan People benefit from tightly integrated systems that can be designed, realized, and evolved affordably. Unfortunately, common software design methods do not easily accommodate requirements for tightly integrated systems. Indeed, when used to meet such requirements, common methods tend to yield unnecessarily complex structures that complicate design and realization and that inhibit subsequent evolution. After substantiating this claim, I present the "mediator method" as a solution. This method combines "behavioral entity-relationship (ER) modeling" for design with a "mediator" approach to implementation. The mediator method is better than common methods for designing, realizing, and evolving many integrated systems. I support this claim both by rational argument from simplifying models and by carefully reflecting on software development experiences in which the method was used. One of these efforts led to the design of Prism, a system used to plan radiation treatments for cancer patients, now in clinical use at the University of Washington Cancer Center. I present Prism as a case study on the use of the mediator design method. The mediator method represents a contribution of significant value because it allows ordinary software developers to design, realize, and evolve more effective integrated systems more affordably with common programming languages and without costly or restrictive new mechanisms. TR # 94-09-01 * "The PRESTO Application Suite" Radhika Thekkath and Susan J. Eggers This report describes a group of coarse- and medium-grain, explicitly-parallel applications that have been made available via the World-Wide Web. These programs have been written using the PRESTO user-level threads library by students at the University of Washington and Rice University. PRESTO provides a C++ based environment for writing object-oriented parallel programs for shared-memory multiprocessors. The library provides basic classes useful for writing parallel programs, among them thread manipulation routines for concurrency and synchronization primitives. TR # 94-09-02 * "Optimizing Static Calendar Queues" K. Bruce Erickson, Richard E. Ladner, Anthony LaMarca The calendar queue is an important implementation of a priority queue which is particularly useful in discrete event simulators. In this paper we present an analysis of the static calendar queue which maintains N active events. A step of the discrete event simulator removes and processes the event with the smallest associated time and inserts a new event whose associated time is the time of the removed event plus a random increment with mean "mu." We demonstrate that for the infinite bucket calendar queue the optimal bucket width is approximately delta_{opt} = sqrt(2b/c) (mu/N) where b is the time to process an empty bucket and c the incremental time to process a list element.
With bucket width chosen to be delta_{opt}, the expected time to process an event is approximately minimized at the constant c + sqrt(2bc) + d, where d is the fixed time to process an event. We show that choosing the number of buckets to be O(N) yields a calendar queue with performance equal to or almost equal to the performance of the infinite bucket calendar queue. TR # 94-09-03 * "Reflecting Source Code Relations in Higher-Level Models of Software Systems" Gail C. Murphy, David Notkin, and Kevin Sullivan Reasoning about an existing software system in terms of high-level models - such as entity-relationship diagrams, data-flow diagrams, object communication diagrams, etc. - is both necessary and dangerous. The necessity arises because the source code for a system is so complex that no individual can fully understand it. The danger arises from the risk that a software engineer may make an inappropriate decision due to discrepancies between the source and a high-level model. We present a formal model, algorithm and tool that reduce the risk of reasoning about the system in terms of a high-level model when performing a software engineering task. Given a high-level model posited by an engineer, a low-level model of the source (e.g., a call graph), and a mapping between the two, we compute and report both the consistencies and inconsistencies between the high-level and source models. Reported consistencies increase the engineer's belief that reasoning in terms of the high-level model is trustworthy. Inconsistencies warn the engineer to consider carefully whether additional analysis is needed before continuing with the planned activity. TR # 94-09-04 * "Scheduling Issues in the Co-Synthesis of Reactive Real-Time Systems" Pai Chou, Elizabeth Walkup, Gaetano Borriello Many embedded control applications must respect intricate timing requirements on their interactions with the external environment. These constraints are derived from response time, rate of execution, and low-level signaling requirements. Currently, most of these systems are being designed in an ad hoc manner. Many tools assume the designer has already finalized the scheduling, while most schedulers make simplifying assumptions and often cannot handle general timing constraints. In this paper, we discuss the scheduling issues that must be addressed by co-synthesis tools for embedded systems and outline possible approaches to the problems. Our perspective is based on experience with Chinook, a hardware-software co-synthesis system for reactive real-time systems, currently under development at the University of Washington. Chinook is initially targeting embedded applications without operating system support. From a high-level specification and a device library, Chinook synthesizes both interface hardware and a software program to realize the design. TR # 94-09-05 * "Reducing False Sharing on Shared Memory Multiprocessors through Compile Time Data Transformations" Tor Jeremiassen and Susan Eggers We have developed compiler algorithms that analyze coarse-grained, explicitly parallel programs and restructure their shared data to minimize the number of false sharing misses. The algorithms analyze the per-process data accesses to shared data, use this information to pinpoint the data structures that are prone to false sharing, and choose an appropriate transformation to reduce it. The algorithms eliminated an average (across the entire workload) of 64% of false sharing misses, and in two programs more than 90%.
However, how well the reduction in false sharing misses translated into improved execution time depended heavily on the memory subsystem architecture and previous programmer efforts to optimize for locality. On a multiprocessor with a large cache configuration and high cache miss penalty, the transformations improved the execution time of programmer-unoptimized applications by as much as 60%. However, on programs where previous programmer efforts to improve data locality had reduced the original amount of false sharing, and on a multiprocessor with a small cache configuration and cache miss penalty, the gains were more modest. TR # 94-09-07 * "A Framework for Selective Recompilation in the Presence of Complex Intermodule Dependencies" Compilers and other programming environment tools derive information from the source code of programs; derived information includes compiled code, interprocedural summary information, and call graph views. If the source program changes, the derived information needs to be updated. We present a simple framework for maintaining intermodule dependencies, embodying different tradeoffs in terms of space usage, speed of processing, and selectivity of invalidation, that eases the implementation of incremental update of derived information. Our framework augments a directed acyclic graph representation of dependencies with factoring nodes (to save space) and filtering nodes (to increase selectivity), and it includes an algorithm for efficient invalidation processing. We show how several schemes for selective recompilation, such as smart recompilation, filter sets for interprocedural summary information, and dependencies for whole-program optimization of object-oriented languages, map naturally onto our framework. For this latter application, by exploiting the facilities of our framework, we are able to reduce the number of lines of source code recompiled by a factor of seven over a header file-based scheme, and by a factor of two over the previous state-of-the-art selective dependency mechanism without consuming additional space. TR # 94-09-08 "Efficient Replication Management in Distributed Systems" Michael Rabinovich Replication is a critical aspect of large-scale distributed systems. Without it, the size of a system is limited by factors such as the risk of component failures, the overloading of popular services, and access latency to remote parts of the system. Replication overcomes these problems by allowing service to continue despite failures using remaining replicas, and by distributing requests for a given service among multiple server replicas. However, replicated systems incur significant performance overhead for maintaining multiple replicas and keeping them mutually consistent. Managing replication efficiently is therefore important in building large-scale distributed systems. This dissertation concentrates on quorum-based replication management. It proposes several ways to manage replication efficiently using different types of quorums. First, we study structure-based quorums, which are attractive because they result in low-overhead replica management when the number of replicas is high. We propose a way to significantly improve system availability in protocols using these quorums. We also study in depth the performance of a particular class of these quorums based on a grid structure. 
Second, for voting-based quorums, we propose a way to include new or repaired replicas into the system and exclude failed, disconnected, or deactivated replicas, asynchronously with user operations. Thus, such reconfiguration does not involve any service interruption and can be done more freely for various purposes, e.g., improving data availability, migrating data, or redistributing load. Third, we propose a way to efficiently manage read-one-write-all (ROWA) replication. Protocols based on ROWA quorums maintain consistency by reading one copy of the data and writing all copies. They are especially attractive because they provide for highly efficient and fault-tolerant read operation, and because many services and data in distributed systems fall under the mostly-read category. The obvious weakness of ROWA, which has prevented it from being widely used in commercial wide-area networks, is lack of fault-tolerance for writes. This thesis proposes a new ROWA protocol that provides fault-tolerant write operations without any significant deterioration of read properties. This protocol offers immediate benefits to current systems. Finally, the thesis studies some issues that are common for all types of quorums, including semantics of write operations and efficient protocols for transaction commitment. TR # 94-09-09 * "Architectural Support for Compiler-Generated Data-Parallel Programs" Alexander C. Klaiber To fully realize the advantages of parallel processing demands the design of efficient communication mechanisms. Existing communication architectures span a spectrum ranging from message passing to remote-memory access, shared memory and cache-only architectures. These communication architectures are often used (and designed to be used) directly by the programmer. However, in the future we can expect more programs to be written in high-level parallel languages and compiled to the specific parallel target; the compiler will hide the details of the underlying hardware from the programmer. The communication architecture should then be designed with the compiler, not the programmer, in mind. The goal of our work is to improve communication performance for programs that are written in a high-level parallel language and then compiled to a specific communication architecture. To make this task manageable, we focus on the class of data-parallel languages and we pick C* as one representative for our experiments. We evaluate three competing communication architectures - message-passing, remote memory access, and cache-coherent shared-memory -- for a set of benchmarks written in C* and compiled to the respective architecture. We show that the message-passing model has several inherent advantages for these benchmarks, resulting in less interconnect traffic and less time spent waiting for messages to traverse the interconnect. On the other hand, the message-passing architecture requires the CPU to perform significantly more work per message than the other architectures. This communication overhead destroys much of the message passing model's advantage. We propose a language-oriented communication architecture that retains the advantages of the message-passing model, yet (in cooperation with the compiler) significantly reduces the communication overhead. To do so, we first identify a small set of low-level communication and synchronization primitives that are well matched to the needs of C*. 
We then design a network interface that is tuned to these primitives and describe the C* compilation for this base; our network interface includes hardware for remote read/write requests plus counter-based synchronization support. We simulate and measure our compiled C* benchmarks on a traditional message-passing interface as well as our language-oriented design; our measurements demonstrate that our design is effective at reducing communication-related CPU overhead. TR # 94-09-10 "Constraint Satisfaction and Debugging for Interactive User Interfaces" Michael Sannella Many user interface toolkits use constraint solvers to maintain geometric relationships between graphic objects, or to connect the graphics to the application data structures. One efficient and flexible technique for maintaining constraints is multi-way local propagation, where constraints are represented by sets of method procedures. This dissertation examines the use of local propagation constraint solvers in user interface toolkits, and presents three new systems: (1) The SkyBlue constraint solver. SkyBlue is an incremental constraint solver that uses local propagation to maintain a set of constraints as individual constraints are added and removed. If all of the constraints cannot be satisfied, SkyBlue leaves weaker constraints unsatisfied in order to satisfy stronger constraints (maintaining a constraint hierarchy). SkyBlue is a more general successor to the DeltaBlue algorithm that satisfies cycles of methods by calling external cycle solvers and supports multi-output methods. (2) The Multi-Garnet user interface development system. Garnet is a user interface toolkit that incorporates a constraint solver to maintain one-way constraints between object fields. The Multi-Garnet package extends Garnet to support multi-way constraints and constraint hierarchies. Multi-Garnet has been used to construct complex user interfaces that would have been difficult to construct just using Garnet's constraint solver. (3) The CNV user interface builder and debugger. CNV includes a set of debugging tools to help programmers understand the behavior of complex constraint networks. One tool uses a new algorithm to generate alternate constraint solutions that would have been produced if SkyBlue had chosen different methods to satisfy the constraints. TR # 94-09-11 * "Wavelets for Computer Graphics: A Primer" Eric J. Stollnitz, Tony D. DeRose, David H. Salesin Wavelets are a mathematical tool for hierarchically decomposing functions. Wavelets allow any function to be described in terms of a coarse overall shape, plus details that range from broad to narrow. Regardless of whether the function of interest is an image, a curve, or a surface, wavelets provide an elegant technique for representing the levels of detail present. This primer is intended to provide those working in computer graphics with some intuition for what wavelets are, as well as to present the mathematical foundations necessary for studying and using them. We include as examples two applications of wavelets: image compression, and multiresolution editing of curves and surfaces. TR # 94-09-12 * "Integrating Coherency and Recoverability in Distributed Systems" Michael J. Feeley, Jeffrey S. Chase, Vivek R. Narasayya, and Henry M. Levy We propose a technique for maintaining coherency of a "transactional" distributed shared memory, used by applications accessing a shared persistent store.
Our goal is to improve support for fine-grained distributed data sharing in collaborative design applications, such as CAD systems and software development environments. In contrast, traditional research in distributed shared memory has focused on supporting parallel programs; in this paper, we show how distributed programs can benefit from this shared-memory abstraction as well. Our approach, called log-based coherency, integrates coherency support with a standard mechanism for ensuring recoverability of persistent data. In our system, transaction logs are the basis of both recoverability and coherency. We have prototyped log-based coherency as a set of extensions to RVM (Satyanarayanan et al. '94), a runtime package supporting recoverable virtual memory. Our prototype adds coherency support to RVM in a simple way that does not require changes to existing RVM applications. We report on our prototype and its performance, and discuss its relationship to other DSM systems. TR # 94-10-01 * "Global Illumination of Glossy Environments using Wavelets and Importance" Per Christensen, Eric Stollnitz, David Salesin, Tony DeRose We show how importance-driven refinement and a wavelet basis can be combined to provide an efficient solution to the global illumination problem with glossy and diffuse reflections. Importance is used to focus the computation on the interactions having the greatest impact on the visible solution. Wavelets are used to provide an efficient representation of radiance, importance, and the transport operator. We discuss a number of choices that must be made when constructing an algorithm for a finite-element solution method for glossy global illumination. The main contributions of this paper are a four-dimensional wavelet representation for spatially- and angularly-varying radiance distributions (which allows for sparse representation and fast evaluation), the use of the standard wavelet decomposition of the transport operator, and the formulation of a type of importance suited for light transport using a finite element radiance representation. Our implementation supports curved surfaces and spatially-varying anisotropic BRDFs. We use a final gathering step to improve the visual quality of the solution. TR # 94-10-02 "N.W. Laboratory for Integrated Systems Semiannual Technical Report #6" Gaetano Borriello, Steven M. Burns, Carl Ebeling, Lawrence Snyder The goal of our work is to develop synthesis tools for high-performance custom VLSI and embedded systems. Our focus is on applications in real-time control and communications. The major elements of our research include: - Specification methods for timing constraints and delay information. - Timing analysis and verification algorithms. - Timing-driven synthesis techniques for hardware and software. - Tools for the analysis of interfaces and synthesis of glue logic and software drivers. - Infrastructure for the rapid-prototyping of board-level designs. - Implementation of real systems to ground the CAD work. TR # 94-10-03 * "Optimistic Trace-driven Simulation" Xiaohan Qin and Jean-Loup Baer Parallel simulation of multiprocessor architectures is a promising direction because a parallel system provides the high computation and storage capabilities that are required by detailed architectural simulation. Additionally, the behavior of the target system exhibits natural parallelism. In this paper, we consider the evaluation of the memory hierarchy of multiprocessor systems via parallel trace-driven simulation.
We present a Time Warp-like parallel trace-driven simulation algorithm and data structures. The overhead components of the optimistic algorithm as well as a number of optimization strategies are discussed. The performance of the optimistic parallel simulator as implemented on a KSR-2 is reported. Our results show that significant speedup can be achieved for applications which have good data locality. As the amount of shared reference misses increases, the frequent communication and synchronization overhead inherent in the applications also limits the speedup of the parallel trace-driven simulation. TR # 94-10-04 * "A Comparative Study of Conservative and Optimistic Trace-driven Simulation" Xiaohan Qin and Jean-Loup Baer In this paper, we consider the evaluation of the memory hierarchy of multiprocessor systems via parallel trace-driven simulation. We study two parallel simulation schemes: a conservative one using an algorithm proposed by Lin et al., whose main characteristic is to insert the shared references from every trace in all other traces, and an optimistic one using a Time Warp-like algorithm. We compare, qualitatively and quantitatively, the major causes of overhead and the overall performance of the two methods. In addition, we discuss the trade-offs in terms of implementation and debugging effort and of application to more general architectural simulation. The optimistic scheme is more complex but, in general, has slightly better performance, is more general, and does not require preprocessing. TR # 94-10-05 * "Scheduling Memory Constrained Jobs on Distributed Memory Parallel Computers" Cathy McCann and John Zahorjan While the parallel use of many processors is a major attraction of scalable multiprocessors for large applications, another important feature of such machines is the large amount of physical memory they make available. Despite these resources, truly large applications may be limited not only by the amount of computation they require, but also by the amount of data they must operate on to solve problems of interest. In this paper, we consider the problem of multiprocessor scheduling of jobs whose memory requirements place lower bounds on the fraction of the machine required in order to execute. We address three primary questions in this work: 1. How can a parallel machine be multiprogrammed with minimal overhead when jobs have minimum memory requirements? 2. To what extent does the inability of an application to repartition its workload during runtime affect the choice of processor allocation policy? 3. How rigid should the system be in attempting to provide equal resource allocation to each runnable job in order to minimize average response time? TR # 94-10-06 * "ZPL Language Reference Manual" Calvin Lin This document presents a complete description of the ZPL language. The language is presented without motivation in a bottom-up fashion, starting with lexical elements and then building upon these to introduce the ZPL data types, variables, expressions, and statements. A table of contents and an extensive index are found at the end. ZPL is a data parallel language that allows arrays and subarrays to be manipulated as whole entities. The language provides constructs that eliminate tedious and error-prone array indexing. The language's conciseness is illustrated by the SIMPLE computational fluid dynamics benchmark, which is approximately 5000 lines when coded in C plus message passing, but only 500 lines and considerably more readable when coded in ZPL.
ZPL programs have sequential semantics and control flow, which considerably simplifies the program development and debugging process. A major goal of ZPL is to provide portability and performance across diverse parallel computers. As such, ZPL is built upon the Phase Abstractions parallel programming model. TR # 94-11-01 * "Optimal One-Way Sorting on a One-Dimensional Sub-Bus Array" James D. Fix and Richard E. Ladner The problem of sorting on a one-dimensional sub-bus array of the processors is addressed. A one-dimensional sub-bus array has a bus connecting the processors which can be segmented into sub-buses on which an active processor can broadcast. The sub-bus broadcast capability is implemented on the MasPar MP-1 and MP-2 parallel computers. The sub-bus broadcast operation makes possible a new class of parallel sorting algorithms which can be characterized by "parallel insertion steps." We restrict our attention to the class of sorting methods where parallel insertion steps are in the same direction, say left. For this class of "one-way sorting strategies" we prove a lower bound and give two strategies, both of which achieve the lower bound. Because our parallel insertion model is quite general, it is not necessarily the case that a sorting strategy in the model can be efficiently implemented on a real sub-bus array of processors. The left greedy sort cannot be efficiently implemented, but the left adaptive insertion sort can be efficiently implemented. The two sorting strategies have different properties and each is interesting in its own right. TR # 94-11-02 "Wirec 3.2 Tutorial and Reference Manual" Larry McMurchie and Carl Ebeling Wirec is a system for describing the structure of electronic systems. It allows the designer to use either procedural constructs or schematic symbols and wires to describe circuit structure. The procedural constructs, in this case C++, give the power and flexibility to describe structures algorithmically. Schematic symbols and wires have the advantage of being graphical and therefore intuitive means of representing circuits. Wirec allows the designer to embed C++ code in schematics and schematics in C++ code, and therefore mix the two modes of representation at a very fine grain. TR # 94-11-03 * "A Characterization of the `Dynamic Markov Compression' FSM with Finite Conditioning Contexts" Suzanne Bunton The well-known but poorly understood "Dynamic Markov Compression" (DMC) algorithm is rigorously analyzed. DMC is one of many sophisticated on-line stochastic data models from the literature that may be combined with arithmetic coding to perform lossless data compression. We define and prove correct the first finite-context characterization of DMC model states. Our minimal characterization proves that DMC's ergodic, unifilar, finite-order Markov models do not belong to the useful class of FSMX sources. Nonetheless, our characterization enables meaningful comparison of DMC models with the many popular FSMX models in the literature in terms of model structure and a priori statistical assumptions. Lastly, the characterization exposes principled solutions for improving DMC's already competitive performance, and for reducing DMC's excessive memory consumption, which prevents DMC models from being used in practice. 
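For readers unfamiliar with DMC, the following C++ sketch shows the model-update step (per-state transition counts plus state cloning) as it is usually presented in the literature following Cormack and Horspool. It is included only to make the notion of DMC model states concrete; the thresholds, the one-state initial machine, and all names are illustrative assumptions rather than details taken from the report.

#include <cstdio>
#include <vector>

// Rough sketch of the Dynamic Markov Compression model update, in the spirit
// of Cormack and Horspool's original description.  Thresholds, the trivial
// initial machine, and all names are illustrative, not taken from TR 94-11-03.
struct DmcModel {
    struct State {
        double count[2];   // observed 0- and 1-transitions leaving this state
        int    next[2];    // successor state for each bit value
    };

    std::vector<State> states;
    int cur = 0;

    DmcModel() {
        // Degenerate initial machine: one state looping to itself.
        states.push_back(State{{0.2, 0.2}, {0, 0}});
    }

    // Probability estimate that the next bit is a 1, from the current state.
    double p1() const {
        const State& s = states[cur];
        return s.count[1] / (s.count[0] + s.count[1]);
    }

    // Update the model after observing bit b (0 or 1).
    void update(int b) {
        const double kMinFrom = 2.0, kMinOther = 2.0;            // cloning thresholds
        int t = states[cur].next[b];
        double from = states[cur].count[b];                      // traffic on this transition
        double into = states[t].count[0] + states[t].count[1];   // ~ total visits to t
        if (from > kMinFrom && into - from > kMinOther) {
            // Clone t: the new state is reached only via this transition, and
            // t's counts are split in proportion to the traffic each copy keeps.
            State clone = states[t];
            double ratio = from / into;
            for (int x = 0; x < 2; ++x) {
                clone.count[x]      = states[t].count[x] * ratio;
                states[t].count[x] -= clone.count[x];
            }
            int tnew = static_cast<int>(states.size());
            states.push_back(clone);
            states[cur].next[b] = tnew;
            t = tnew;
        }
        states[cur].count[b] += 1.0;
        cur = t;
    }
};

int main() {
    DmcModel m;
    const int bits[] = {1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1};
    for (int b : bits) {
        std::printf("p(next=1) = %.3f  states = %zu\n", m.p1(), m.states.size());
        m.update(b);
    }
}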
TR # 94-11-07 "Determination of DRAM Operating Limits with Applications in Self-Tuning" Chris Fisher, Jean-Paul Tea, Sean Mirkes, Steve Rabin, Ted Kehl Dynamic random access memory is commonly used for storing large amounts of data due to its relatively inexpensive price and its small space requirements. The purpose of this research is to investigate the possibilities for enhancing the performance of DRAM. This includes determining the minimal timing for RAS and CAS, investigating data and address dependency, and determining the amount of jitter in DRAM circuitry. TR # 94-11-08 * "Wit: An Infrastructure for Wireless Palmtop Computing" Terri Watson Wit is a system infrastructure that supports wireless connection of palmtop computers to a wired local area network (LAN). The system includes a primitive windowing system, non-preemptive threads, reliable and unreliable connections, and a Tcl interpreter which allows applications to dynamically extend the palmtop system and user interface by downloading sequences of Tcl code. Together, these services provide an environment for exploring issues in mobile computing for small devices by supporting a wide range of solutions for hiding low bandwidth constraints, and permitting disconnected operation through local execution of application functions. TR # 94-12-01 * "Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis" Jeffrey Dean, David Grove, and Craig Chambers Optimizing compilers for object-oriented languages apply static analysis and other techniques to try to deduce precise information about the possible classes of the receivers of messages; if successful, dynamically-dispatched messages can be replaced with direct procedure calls and potentially further optimized through inline-expansion. By examining the complete inheritance graph of a program, which we call "class hierarchy analysis," the compiler can improve the quality of static class information and thereby improve run-time performance. In this paper we present class hierarchy analysis and describe techniques for implementing this analysis effectively in both statically- and dynamically-typed languages and also in the presence of multi-methods. We also discuss how class hierarchy analysis can be supported in an interactive programming environment and, to some extent, in the presence of separate compilation. Finally, we assess the bottom-line performance improvement due to class hierarchy analysis alone and in combination with two other "competing" optimizations, profile-guided receiver class prediction and method specialization. TR # 94-12-02 "A ZPL Programming Guide" Lawrence Snyder This guide seeks to be a complete introduction to the ZPL programming language, assuming the reader is experienced with some imperative language such as C, Fortran, Pascal, or the like. Though precise and in most instances thorough, the guide does not attempt to be a reference manual for ZPL. Rather, it illustrates typical ZPL usage and explains in an intuitive way how the constructs work. Emphasis is placed on teaching the reader to be a ZPL programmer. Scientific computations are used as examples throughout. Computer scientists familiar with the area of programming high performance parallel computers and interested in an accelerated route through the guide are advised to study Fig. 1, skim "A Quick Tour," read "ZPL Array Concepts" and "Parallel Execution - Two Important Details."
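Returning to class hierarchy analysis (TR # 94-12-01 above), the following C++ sketch illustrates the basic reasoning: given the complete inheritance graph, a dynamically dispatched call can be replaced by a direct call whenever every class at or below the receiver's static type resolves the message to the same method definition. The data structures and names here are invented for illustration and are not those used in the report.

#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy whole-program view of a class hierarchy.  Names and structures are
// illustrative only, not those used in TR 94-12-01.
struct Hierarchy {
    std::map<std::string, std::string>              parent;    // class -> superclass ("" = root)
    std::map<std::string, std::vector<std::string>> children;  // class -> direct subclasses
    std::map<std::string, std::set<std::string>>    defines;   // method -> classes defining it

    void addClass(const std::string& cls, const std::string& super) {
        parent[cls] = super;
        if (!super.empty()) children[super].push_back(cls);
    }

    // Method lookup: walk up from 'cls' to find the class whose definition
    // of 'method' would actually run at run time.
    std::string lookup(std::string cls, const std::string& method) const {
        while (!cls.empty()) {
            if (defines.count(method) && defines.at(method).count(cls)) return cls;
            cls = parent.at(cls);
        }
        return "";  // no applicable definition
    }

    // Class hierarchy analysis for one call site: a send of 'method' to a
    // receiver statically typed 'cls' can become a direct call iff every
    // class in the receiver's cone (cls plus all transitive subclasses)
    // resolves the method to the same definition.
    bool canDevirtualize(const std::string& cls, const std::string& method) const {
        std::set<std::string> targets;
        std::vector<std::string> work{cls};
        while (!work.empty()) {
            std::string c = work.back();
            work.pop_back();
            targets.insert(lookup(c, method));
            auto it = children.find(c);
            if (it != children.end())
                work.insert(work.end(), it->second.begin(), it->second.end());
        }
        return targets.size() == 1 && !targets.count("");
    }
};

int main() {
    Hierarchy h;
    h.addClass("Shape", "");
    h.addClass("Circle", "Shape");
    h.addClass("Square", "Shape");
    h.defines["area"] = {"Circle", "Square"};  // every subclass overrides area()
    h.defines["name"] = {"Shape"};             // name() is never overridden

    std::cout << h.canDevirtualize("Shape", "area")  << "\n";  // 0: two possible targets
    std::cout << h.canDevirtualize("Circle", "area") << "\n";  // 1: unique target
    std::cout << h.canDevirtualize("Shape", "name")  << "\n";  // 1: unique target
}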
TR # 94-12-03 * "Runtime Support for Dynamic Space-Based Applications on Distributed Memory Multiprocessors" Immaneni Ashok "Dynamic space-based" applications are simulations of objects moving through a closed k-dimensional space subject to mutual forces. There are a wide variety of such applications, differing in the kinds of objects and forces being simulated. These applications exhibit strong data locality patterns, but these patterns change during the computation as objects change position in the simulated space. To achieve good performance when run on distributed memory multiprocessors, two conflicting goals must be addressed: the strong spatial data locality must be exploited, and the computational load must be balanced across the processors. These optimizations can be done statically, by either the programmer or the compiler, or dynamically, by handwritten application code or a runtime system. This dissertation investigates the issues involved in designing a specialized runtime system that provides convenience of programming as well as efficient execution. First, we address the issues in designing a programming interface for dynamic space-based applications. We propose a new programming model that enables the development of parallel codes that involve very few additional lines of code and very few additional concepts beyond those required to construct a sequential version of the application. Second, we address the issues in data partitioning and dynamic load balancing. We propose heuristics that can be effectively used to determine a good partitioning scheme based on the application execution characteristics and the machine characteristics. We propose a novel dynamic load balancing scheme that uses a "non-uniform, adaptive load discretization" method to estimate the load distribution, a "hierarchical" scheme to balance the load, and a "predictive" method to determine how often to rebalance. We show that the schemes that we propose are very effective in reducing the execution overheads due to communication, load imbalance, and load balancing. Third, we study the issues in adapting to processor reallocations performed by the operating system on a multiprogrammed system. We show that it is advantageous for the runtime system to remap the data, unless the processor allocation changes too often or there is very little time left over to finish the job. TR # 94-12-07 * "Wait Free Algorithms for Heaps" Greg Barnes This paper examines algorithms to implement heaps on shared memory multiprocessors. A natural model for these machines is an asynchronous parallel machine, in which the processors are subject to arbitrary delays. On such machines, it is desirable for algorithms to be "wait-free," meaning that each thread makes progress independent of the other threads executing on the machine. We present a wait-free algorithm to implement heaps. The algorithms are similar to the general approach given in an earlier work with optimizations that allow many threads to work on the heap simultaneously, while still guaranteeing a strong serializability property. TR # 95-01-01 * "TRIBORS: A Triplet-Based Object Recognition System" Kari Pulli This paper describes TRIBORS, a triplet-based object recognition system. Object recognition is facilitated by probability models, which are statistical data structures describing the appearance of object features when viewed from a point in a view class.
The features are arranged into triplets, and the appearance of the triplets in the camera image is encoded in a 2D affine-invariant manner. The initial probability model is obtained from a set of synthesized images. Once objects in real camera images have been recognized using the initial probability model, the results of analyzing these images can be used to construct more realistic probability models. The use of these improved models results in faster object recognition and improved classifications. TR # 95-01-02 * "Multiresolution Analysis of Arbitrary Meshes" Matthias Eck, Tony DeRose, Tom Duchamp, Hugues Hoppe, Michael Lounsbery, Werner Stuetzle In computer graphics and geometric modeling, shapes are often represented by triangular meshes. With the advent of laser scanning systems, meshes of extreme complexity are rapidly becoming commonplace. Such meshes are notoriously expensive to store, transmit, and render, and are awkward to edit. Multiresolution analysis offers a simple, unified, and theoretically sound approach to dealing with these problems. A multiresolution representation of a mesh consists of a simple base mesh together with a sequence of local corrections of decreasing magnitude. By omitting small correction terms we can efficiently construct approximations with fewer triangles for compact storage and fast transmission and rendering. When editing meshes it is often desirable to modify the gross shape without affecting surface details. We can accomplish this by modifying terms in the approximation with large spatial extent. Lounsbery et al. have recently developed a technique for creating multiresolution representations for a restricted class of meshes with "subdivision connectivity." Unfortunately, meshes encountered in practice typically do not meet this requirement. In this paper we present a method for overcoming the subdivision connectivity restriction, meaning that completely arbitrary meshes can now be converted to multiresolution form. The method is based on the approximation of an arbitrary initial mesh M by a mesh M^J that has subdivision connectivity and is guaranteed to be within a specified tolerance. Multiresolution approximation of an arbitrary mesh thus proceeds in two steps: in the first step we approximate M by M^J, and in the second step we convert M^J to a multiresolution representation using the methods of Lounsbery et al. The key ingredient of our algorithm is the construction of a parametrization of M over a simple domain. We expect this parametrization to be of use in other contexts, such as texture mapping or the approximation of complex meshes by NURBS patches. TR # 95-01-04 * "Automated Bargaining Agents (Preliminary Results)" Ying Sun and Daniel Weld Rapid commercialization of the Internet and development of the National Information Infrastructure are likely to change the nature of retailing and commerce in profound ways. These changes pose both challenges and opportunities for AI, such as how to design intelligent assistants that help humans cope with increasing complexities of electronic commerce. In this paper, we present the first steps towards creating "automated bargaining agents" -- intelligent assistants that can reason about the relative supply and demand for goods and services and negotiate to reach a good deal. We first review the economic and game theoretic background for such an endeavor. We then present a simple economic model of electronic commerce and describe several bargaining strategies.
Preliminary experimental results suggest that sophisticated, adaptive strategies perform better than simple discounting approaches over a range of economic conditions. TR # 95-01-06 "Fast Multiresolution Image Querying" Charles E. Jacobs, Adam Finkelstein, David H. Salesin We present a method for searching in an image database using a query image that is similar to the intended target. The query image may be a hand-drawn sketch or a (potentially low-quality) scan of the image to be retrieved. Our searching algorithm makes use of multiresolution decompositions of the query and database images. The coefficients of these decompositions are distilled into small "signatures" for each image. We introduce an "image querying metric" that operates on these signatures, and which includes parameters that can be tuned, using a statistical analysis, to accommodate the kinds of image distortions found in different types of image queries. The resulting algorithm is simple, fast, and requires very little storage overhead for the database of signatures. Our experiments with 176 queries in a database of 1024 images show dramatic improvement, in both speed and success rate, over using an L^1 or L^2 norm. TR # 95-01-07 * "Clustering for Glossy Global Illumination" Per Christensen, Dani Lischinski, Eric Stollnitz, David Salesin We present a new clustering algorithm for global illumination in complex environments. The new algorithm extends previous work on clustering for radiosity to allow for nondiffuse (glossy) reflectors. We treat each cluster as a point source with an angular distribution of radiant intensity, and we derive an error bound for transfers between these clusters. The algorithm groups input surfaces into a hierarchy of clusters, and then permits clusters to interact only if the error bound is below an acceptable tolerance. We show that the algorithm is asymptotically more efficient than previous clustering algorithms even when restricted to ideally diffuse environments. Finally, we demonstrate a working implementation of the algorithm on a complex architectural environment. TR # 95-01-08 * "Extending the Applicability and Improving the Performance of Runtime Parallelization" Shun-Tak Leung and John Zahorjan When static analysis of a sequential loop fails to yield reliable information on its dependence structure, a parallelizing compiler is left with three alternatives: it can take the conservative option of emitting code for a sequential execution; it can optimistically emit code to speculatively execute the loop as a DOALL; or it can emit inspector-executor code to determine the actual dependence structure at runtime and to respect it in a parallel execution. The first approach is certain to yield a slow execution. The second approach gives very good results when the loop can in fact be executed as a DOALL, but is of no help otherwise. In this paper we concentrate on the final approach, runtime parallelization through the inspector-executor method. We have two goals in this work. The first is to expand the class of loop to which the approach may be applied by removing restrictions on the loop dependence structures that it can handle. To achieve this goal, we introduce new forms of the inspector and executor that together remove all restrictions on the loop dependence structure. Thus, we show how to parallelize a class of loop that previously would have compelled the compiler to emit sequential code. 
Our second goal is to improve the performance of the inspector-executor approach through specializations applicable when static analysis yields some (weak) information about the array indexing functions used in assignments. We validate our work through a set of examples designed to illustrate the fundamental performance tradeoffs characterizing the specialized implementations, using results taken from executions on 32 processors of a KSR1. TR # 95-02-01 * "Implementing Lightweight Remote Procedure Calls in the Mach 3 Operating System" Virgil Bourassa and John Zahorjan The Mach 3 operating system makes extensive use of remote procedure calls (RPCs) to provide services to client applications. Although the existing Mach 3 RPC facility is highly optimized, the incorporation of a Lightweight Remote Procedure Call (LRPC) facility further reduces this critical cost. This paper describes the implementation of LRPC in the Mach 3 operating system, reviewing the original LRPC concept and implementation in the TAOS operating system, the issues involved with the Mach 3 microkernel operating system, and the resulting design of LRPC_Mach3. Performance measurements indicate that the resulting implementation provides a 31% reduction in latency for a minimal RPC, with even more significant benefits for more complicated RPCs. TR # 95-02-02 * "Sound and Efficient Closed-World Reasoning for Planning" Oren Etzioni, Keith Golden, and Daniel Weld Closed-world reasoning is the process of inferring that a logical sentence is false based on its absence from a knowledge base, or the inability to derive it. Previous work on circumscription, autoepistemic logic, and database theory has explored logical axiomatizations of closed-world reasoning, and investigated computational tractability for propositional theories. Work in planning has traditionally made the closed-world "assumption" but has avoided closed-world reasoning. We take a middle position, and describe a novel method for closed-world reasoning over the first-order theories of action used by planning algorithms such as NONLIN, TWEAK, and UCPOP. We show the method to be both sound and efficient. The method has been incorporated into the XII planner, which supports our Internet softbot (software robot). In our experiments, closed-world inference consistently averaged about 2 milliseconds, while updates averaged approximately 1.2 milliseconds. TR # 95-02-03 * "Lightweight Source Model Extraction" Gail Murphy and David Notkin Reverse engineers depend on the automatic extraction of information from source code. Some useful kinds of information -- source models -- are well-known: call graphs, file dependences, etc. Predicting every kind of potentially useful information is impossible. We have developed a lightweight approach for generating flexible and tolerant source model extractors from lexical specifications. The approach is lightweight in that the specifications are relatively small and easy to write. It is flexible in that there are few constraints on the information in the source that can be extracted (e.g., we can extract from macros, comments, etc.). It is tolerant in that the information can be extracted from source that cannot necessarily be compiled or linked. In essence, we scan for source constructs that contribute to the specified source model while ignoring constructs that do not contribute to that source model.
We have developed tools to support this approach and applied the tools to the extraction of a number of different source models (file dependences, event interactions, call graphs) from multiple source code formats (C, C++, CLOS). We discuss our approach and describe its application to extract source models not available using existing systems; for example, we compute the invoked-by relation over Field tools. We compare and contrast our approach to the conventional approach of generating source models from a program database.
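To give a flavor of the lightweight extraction idea in TR # 95-02-03, the sketch below scans C-like text with purely lexical patterns and emits a calls(caller, callee) source model, even from fragments that would not compile. The actual approach is driven by user-written lexical specifications rather than the two hard-coded patterns shown here; everything in this example (names, patterns, the input fragment) is illustrative.

#include <iostream>
#include <regex>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Minimal flavor of lightweight, lexical source model extraction: scan C-like
// text for lines that look like function definitions and for call sites, and
// emit a calls(caller, callee) relation, ignoring everything else.  The real
// approach in TR 95-02-03 is driven by user-written lexical specifications;
// the hard-coded patterns below are just an illustration.
std::set<std::pair<std::string, std::string>>
extractCalls(const std::string& source) {
    static const std::regex defRe(R"(^\s*\w[\w\s\*]*\b(\w+)\s*\([^;]*\)\s*\{?\s*$)");
    static const std::regex callRe(R"((\w+)\s*\()");
    std::set<std::pair<std::string, std::string>> calls;
    std::string line, current = "<toplevel>";
    std::istringstream in(source);
    while (std::getline(in, line)) {
        std::smatch m;
        if (std::regex_match(line, m, defRe)) {      // looks like a definition
            current = m[1].str();
            continue;
        }
        auto begin = std::sregex_iterator(line.begin(), line.end(), callRe);
        for (auto it = begin; it != std::sregex_iterator(); ++it)
            calls.insert({current, (*it)[1].str()}); // looks like a call site
    }
    return calls;
}

int main() {
    // Deliberately incomplete fragment: it would not compile or link, but the
    // lexical scan still recovers a useful call graph.
    std::string fragment =
        "static int parse(FILE *f)\n"
        "{\n"
        "    token = next_token(f);\n"
        "    return reduce(stack, token);\n";
    for (const auto& edge : extractCalls(fragment))
        std::cout << edge.first << " -> " << edge.second << "\n";
    // prints: parse -> next_token, parse -> reduce
}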