DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF WASHINGTON TECHNICAL REPORT ABSTRACTS - FROM 93-06-01 -> 94-01-09 Obtaining Electronic Technical Reports: Technical reports in this list with an asterisk (*) following the TR number are available on-line as of April 1995 via FTP (ftp.cs.washington.edu) or the WWW (http://www.cs.washington.edu/research/tr/techreports.html) Obtaining Hardcopy Technical Reports: To obtain a technical report on paper, sent to you via Snail Mail, please contact . If you need additional information or assistance, please contact: ABSTRACTS TR #93-06-01 * "Performance of Chaos and Oblivious Routers Under Non-Uniform Traffic" Melanie L. Fulgham and Lawrence Snyder The Chaos router is a nonminimal adaptive packet router that is both deadlock free and probabilistically livelock free. Unlike its predecessors, the Chaos router uses randomization to provide an efficient way to avoid livelock. The Chaos router has been compared in software simulation studies to state-of-the-art oblivious routers (i.e., dimension-order routers) on the hypercube, mesh, and torus topologies. The results, based on simulations of two different workloads, one uniform and one non-uniform, show that in general the Chaos router is superior to the oblivious router. However, two workloads alone are not representative of the typical types of traffic encountered in practice. In this paper five additional non-uniform workloads are simulated. Each was chosen to represent one of the various types of communication encountered in typical programs. The workloads, compared on both the 256-node torus and hypercube topologies, show considerable variety and reveal insight into the effectiveness of Chaotic adaptive routing on non-uniform loads. The data presented include the saturation point of the networks under the various loads, the throughput as a function of presented load, latency, and total delay of packets through the network. Collectively, these data give a much clearer and more complete characterization of the Chaos and oblivious routers' behavior on non-uniform traffic. TR #93-06-02 * "Managing Abstraction-Induced Complexity" David Keppel Abstraction reduces the apparent complexity of an implementation by hiding all but "the most relevant" details. However, no interface is suitable for all the users of the implementation, for exactly the same reason: each user has a slightly different view of what is "most relevant." Thus, although abstractions can reduce complexity, working around their limitations can introduce other complexities. Some abstractions are designed to minimize the added complexity. This paper examines five common abstraction models and uses common examples to contrast the tradeoffs presented by each model. TR #93-06-03 * "An Algorithm for Probabilistic Planning" Nicholas Kushmerick, Steve Hanks, and Daniel Weld We define the probabilistic planning problem in terms of a probability distribution over initial world states, a boolean combination of propositions representing the goal, a probability threshold, and actions whose effects depend on the execution-time state of the world and on random chance. Adopting a probabilistic model complicates the definition of plan success: instead of demanding a plan that "provably" achieves the goal, we seek plans whose probability of success exceeds the threshold. In this paper, we present BURIDAN, an implemented least-commitment planner that solves problems of this form. We prove that the algorithm is both sound and complete.
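As a hedged illustration of the success criterion just defined (a generic Monte Carlo sketch, not BURIDAN's actual evaluation algorithm; the function names are hypothetical), a plan's success probability can be estimated by sampling initial states and chance outcomes and comparing the estimate with the threshold:

    import random

    def estimate_success(sample_initial_state, execute_plan, goal_holds,
                         threshold, trials=10000, seed=0):
        """Monte Carlo estimate of P(plan achieves goal) versus a threshold.

        sample_initial_state(rng) draws a world state from the initial distribution,
        execute_plan(state, rng) simulates the plan's chance outcomes on that state,
        goal_holds(state) tests the goal formula in the resulting state.
        """
        rng = random.Random(seed)
        successes = 0
        for _ in range(trials):
            state = sample_initial_state(rng)
            final = execute_plan(state, rng)
            successes += goal_holds(final)
        p = successes / trials
        return p, p >= threshold

    # Toy domain: a noisy "flip-to-heads" action must leave the coin heads-up.
    init = lambda rng: {"heads": rng.random() < 0.5}
    plan = lambda s, rng: {"heads": rng.random() < 0.8}
    goal = lambda s: s["heads"]
    print(estimate_success(init, plan, goal, threshold=0.7))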
We then explore BURIDAN's efficiency by contrasting four algorithms for plan evaluation, using a combination of analytic methods and empirical experiments. We also describe the interplay between generating plans and evaluating them, and discuss the role of search control in probabilistic planning. TR # 93-06-04 * "Utility Models for Goal-Directed Decision-Theoretic Planners" Peter Haddawy and Steve Hanks AI planning agents are goal-directed: success is measured in terms of whether or not an input goal is satisfied, and the agent's computational processes are driven by those goals. A decision-theoretic agent, on the other hand, has no explicit goals - success is measured in terms of its preferences or a utility function that respects those preferences. The two approaches have complementary strengths and weaknesses. Symbolic planning provides a computational theory of plan generation, but under unrealistic assumptions: perfect information about and control over the world and a restrictive model of actions and goals. Decision theory provides a normative model of choice under uncertainty, but offers no guidance as to how the planning options are to be generated. This paper unifies the two approaches to planning by describing utility models that support rational decision making while retaining the goal information needed to support plan generation. We develop an extended model of goals that involves temporal deadlines and maintenance intervals, and allows partial satisfaction of the goal's temporal and atemporal components. We then incorporate that goal information into a utility model for the agent. Goal information can be used to establish whether one plan has higher expected utility than another, but without computing the expected utility values directly: we present a variety of results showing how plans can be compared rationally, but using information only about the probability that they will satisfy goal-related propositions. We then demonstrate how this model can be exploited in the generation and refinement of plans. TR # 93-06-05 * "Benchmarks, Testbeds, Controlled Experimentation, and the Design of Agent Architectures" Steve Hanks, Martha Pollack, Paul Cohen The methodological underpinnings of AI are slowly changing. Benchmarks, testbeds, and controlled experimentation are becoming more common. While we are optimistic that this change can solidify the science of AI, we also recognize a set of difficult issues concerning the appropriate use of this methodology. We discuss these issues as they relate to research on agent design. We survey existing testbeds for agents, and argue for appropriate caution in their use. We end with a debate on the proper role of experimental methodology in the design and validation of planning agents. TR # 93-06-06 * "Shade: A Fast Instruction-Set Simulator for Execution Profiling" Robert F. Cmelik and David Keppel Shade is an instruction-set simulator and custom trace generator. Application programs are executed and traced under the control of a user-supplied trace analyzer. To reduce communication costs, Shade and the analyzer are run in the same address space. To further improve performance, code which simulates and traces the application is dynamically generated and cached for reuse. Current implementations run on SPARC systems and, to varying degrees, simulate the SPARC (Versions 8 and 9) and MIPS I instruction sets. 
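For readers unfamiliar with the translate-and-cache technique just mentioned, the following toy sketch (a made-up two-instruction machine, not Shade's actual code) shows how dynamically generated simulation code can be cached and reused:

    def make_translator(program):
        """Toy analogue of a translate-and-cache simulator loop: each simulated
        instruction is 'translated' once into a Python closure and cached, so
        later executions reuse the cached translation instead of re-decoding."""
        cache = {}                                  # pc -> translated closure

        def translate(pc):
            op, arg = program[pc]                   # decode once, at translate time
            if op == "add":
                def block(regs):
                    regs[0] += arg
                    return pc + 1                   # fall through to next instruction
            elif op == "jmp":
                def block(regs):
                    return arg                      # branch to a fixed target
            elif op == "halt":
                def block(regs):
                    return None
            else:
                raise ValueError(op)
            return block

        def run(regs):
            pc = 0
            while pc is not None:
                block = cache.get(pc)
                if block is None:                   # translation-cache miss
                    block = cache[pc] = translate(pc)
                pc = block(regs)
            return regs

        return run

    # A tiny program: add 5 three times, then halt.
    prog = {0: ("add", 5), 1: ("add", 5), 2: ("add", 5), 3: ("halt", None)}
    print(make_translator(prog)([0]))               # -> [15]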
This paper describes the capabilities, design, implementation, and performance of Shade, and discusses instruction set emulation in general. Shade improves on its predecessors by providing their various tracing capabilities together in a single tool. Shade is also fast: Running on a SPARC and simulating a SPARC, SPEC 89 benchmarks run about 2.3 times slower for floating-point programs and 6.2 times slower for integer programs. Saving trace data costs more, but Shade provides fine control over tracing, so users pay a collection overhead only for data they actually need. Shade is also extensible so that analyzers can examine arbitrary target state and thus collect special information that Shade does not "know" how to collect. TR #93-06-08 * "Data Locality On Shared Memory Computers Under Two Programming Models" Ton A. Ngo and Lawrence Snyder Data locality is a well-recognized requirement for the development of any parallel application, but the emphasis on data locality varies with the programming model. In this paper, we examine two models at the extreme ends of the spectrum with respect to one general class of parallel machines. We present experiments on shared-memory machines that indicate that programs written in the nonshared-memory programming model generally offer better data locality and thus better performance. We study the LU decomposition problem and a Molecular Dynamics simulation on five shared-memory machines with widely differing architectures, and analyze the effect of the programming models on data locality. The comparison is done through a simple analytical model and measurements on the machines. TR #93-06-09 * "A Beginner's Guide to the Truckworld Simulator" Steve Hanks, Dat Nguyen and Chris Thomas The Truckworld is a multi-agent planning testbed that features uncertainty, exogenous change, complex interaction among objects in the world, semi-realistic sensing, and facilities for communication and other interaction among agents. This document is designed to give the first-time user a feel for the testbed's capabilities, and also provides instructions on how to obtain the code and run a standard configuration of the simulator. TR # 93-06-10 * "Modeling a Dynamic and Uncertain World I: Symbolic and Probabilistic Reasoning about Change" Steve Hanks and Drew McDermott Intelligent agency requires some ability to predict the future. An agent must ask itself what is presently its best course of action given what it now knows about what the world will be like when it intends to act. This paper presents a system that uses a probabilistic model to reason about the effects of an agent's proposed actions on a dynamic and uncertain world, computing the probability that relevant propositions will hold at a specified point in time. The model allows for incomplete information about the world, the occurrence of exogenous (unplanned) events, unreliable sensors, and the possibility of an imperfect causal theory. The system provides an application program with answers to questions of the form "is the probability that \phi will hold in the world at time t greater than \tau?" It is unique among algorithms for probabilistic temporal reasoning in that it tries to limit its inference according to the proposition, time, and probability threshold provided by the application. The system will also notify the application if subsequent evidence invalidates its answer to a query.
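A minimal sketch of the kind of threshold query just described, under the assumption that trajectories of the world can be sampled; the stopping rule is a generic Hoeffding bound, used here only to illustrate how the probability threshold itself can limit the amount of inference performed (this is not the paper's actual algorithm):

    import math, random

    def probability_exceeds(sample_outcome, tau, delta=0.05, max_samples=100000, seed=0):
        """Answer "is P(phi holds at time t) > tau?" by sampling trajectories,
        stopping as soon as a Hoeffding confidence bound resolves the comparison,
        so queries with comfortable margins terminate early.

        sample_outcome(rng) -> True/False: one simulated trajectory's verdict on phi.
        Returns (answer, estimate, samples_used); answer is None if undecided.
        """
        rng = random.Random(seed)
        hits = 0
        for n in range(1, max_samples + 1):
            hits += sample_outcome(rng)
            p_hat = hits / n
            eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))   # Hoeffding half-width
            if p_hat - eps > tau:
                return True, p_hat, n
            if p_hat + eps < tau:
                return False, p_hat, n
        return None, hits / max_samples, max_samples

    # Toy query: phi holds on about 60% of trajectories; is that greater than 0.5?
    print(probability_exceeds(lambda rng: rng.random() < 0.6, tau=0.5))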
TR #93-07-01 * "Data Prefetching for High-Performance Processors" Tien-Fu Chen Recent technological advances are such that the gap between processor cycle times and memory cycle times is growing. Techniques to reduce or tolerate large memory latencies become essential for achieving high processor utilization. In this dissertation, we propose and evaluate data prefetching techniques that address the data access penalty problem. First, we propose a hardware-based data prefetching approach for reducing memory latency. The basic idea of the prefetching scheme is to keep track of data access patterns in a reference prediction table (RPT) organized as an instruction cache. It includes three variations of the design of the RPT and associated logic: a generic design, a lookahead mechanism, and a correlated scheme. They differ mostly in the timing of the prefetching. We evaluate the three schemes by simulating them in a uniprocessor environment using the ten SPEC benchmarks. The results show that the prefetching scheme effectively eliminates a major portion of the data access penalty and is particularly suitable for an on-chip design and a primary-secondary cache hierarchy. Next, we study and compare the substantive performance gains that could be achieved with hardware-controlled and software-directed prefetching on shared-memory multiprocessors. Simulation results indicate that both hardware and software schemes can handle programs with regular access patterns. The hardware scheme is good at manipulating dynamic information, whereas software prefetching has the flexibility of prefetching larger blocks of data and of dealing with complex data access patterns. The execution overhead of the additional prefetching instructions may decrease the software prefetching performance gains. An approach that combines software and hardware schemes is shown to be very promising for reducing the memory latency with the least overhead. Finally, we study non-blocking caches that can tolerate read and write miss penalties by exploiting the overlap between post-miss computations and data accesses. We show that hardware data prefetching caches generally outperform non-blocking caches. We derive a static instruction scheduling algorithm to order instructions at compile time. The algorithm is shown to be effective in exploiting the instruction parallelism available in a basic block for non-blocking loads. TR #93-07-02 * "Specification, Simulation, and Verification of Timing Behavior" Tod Amon Temporal behavior needs to be formally specified, validated, and verified if systems that interface with the outside world are to be synthesized from high-level specifications. Due to the high level of abstraction, the work presented in this thesis applies to systems that ultimately can be implemented using hardware, software, or a combination of both. In the area of specification, a generalization of the event-graph specification paradigm is presented. The model supports the expression of complex functionality using an operational semantics and cleanly integrates structure into the event-based paradigm. Temporal relationships between system events are specified using a denotational semantics that relies on both chronological and causal relationships to identify the discrete event occurrences being constrained. It is of critical importance that a design representation support user validation. A simulator, called OEsim, has been implemented to support validation.
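As a hedged illustration of the kind of constraint such a simulator must check (hypothetical event names and a deliberately simple min/max separation constraint, not OEsim's actual semantics):

    def check_separation(trace, cause, effect, lo, hi):
        """Check a min/max separation constraint over an event trace.

        trace is a list of (time, event) pairs in nondecreasing time order; the
        constraint requires each 'effect' occurrence to follow the most recent
        'cause' occurrence by a delay in [lo, hi]. Returns a list of violations.
        """
        violations = []
        last_cause = None
        for time, event in trace:
            if event == cause:
                last_cause = time
            elif event == effect:
                if last_cause is None:
                    violations.append((time, "no preceding %r" % cause))
                elif not (lo <= time - last_cause <= hi):
                    violations.append((time, "delay %g outside [%g, %g]"
                                             % (time - last_cause, lo, hi)))
        return violations

    # Hypothetical trace: 'ack' must follow 'req' within 2..5 time units.
    trace = [(0, "req"), (3, "ack"), (10, "req"), (18, "ack")]
    print(check_separation(trace, "req", "ack", 2, 5))   # the second ack violates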
The simulator can report timing constraint violations, and detect inconsistencies between the operational and denotational specifications. Synthesis algorithms can exploit guarantees regarding temporal behavior to produce efficient implementations, and often need timing information in order to properly synthesize designs from abstract specifications. Two verification tools are presented in this dissertation. The first demonstrates the feasibility and benefits of performing symbolic timing verification, in which variables are used in place of numbers. The second addresses a fundamental problem: determining how synchronization affects the temporal behavior of concurrent systems. TR #93-08-01 * "A (More) Formal Definition of Communicating Real-Time State Machines" Alan C. Shaw The language of communicating real-time state machines is defined precisely in three parts. First, the syntax of a single machine and of a set of connected machines is described. Then, the static semantics is described as the set of execution paths obtained through a static analysis. Finally, the dynamic semantics is defined by specifying a simulation algorithm that produces execution traces or histories. The most difficult and interesting aspect is that dealing with time. TR #93-08-02 "Three-Dimensional Shape from Color Photometric Stereo" Per Christensen and Linda Shapiro Computer vision systems can be used to determine the shape of real three-dimensional objects for purposes of object recognition and pose estimation or for CAD applications. One method that has been developed is "photometric stereo." This method uses several images taken from the same viewpoint, but with different lighting, to determine the three-dimensional shape of an object. Most previous work in photometric stereo has been with gray-tone images; color images have only been used for dielectric materials. In this paper we describe a procedure for color photometric stereo, which recovers the shape of a colored object from two or more color images of the object under white illumination. This method can handle different types of materials, such as composites and metals, and can employ various reflection models such as the Lambertian, dichromatic, and Torrance-Sparrow models. For composite materials, colored metals, and dielectrics there are two advantages to utilizing color information: at each pixel, there are more constraints on the orientation, and the result is less sensitive to noise. Hence the shape can be found more accurately. The method has been tested on both artificial and real images of objects of various materials. TR #93-09-01 * "Building Softbots for UNIX (Preliminary Report)" Oren Etzioni, Neal Lesh and Richard Segal AI is moving away from "toy tasks" such as block stacking towards real-world problems. This trend is positive, but the amount of preliminary groundwork required to tackle a real-world task can be daunting. We advocate software environments, such as operating systems or databases, as domains for agent research. The cost, effort, and expertise required to develop and systematically experiment with software agents is relatively low. Furthermore, software environments circumvent many thorny, but peripheral, research issues that are inescapable in other environments. Thus, software environments enable us to test agents in a real world yet focus on core AI research issues. To support this claim, we describe our project to develop UNIX "softbots" (software robots) - intelligent agents that interact with UNIX.
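A minimal sketch, assuming a hypothetical fact representation, of what a softbot "sensing action" might look like: a UNIX command is executed and its output is turned into facts a planner can consume (an illustration only, not the softbot's actual interface):

    import subprocess

    def sense_directory(path="."):
        """Run a UNIX command and report its output as facts for a planner.
        The ("in-dir", name, path) fact format is made up for this sketch."""
        output = subprocess.run(["ls", "-1", path], capture_output=True,
                                text=True, check=True).stdout
        return [("in-dir", name, path) for name in output.splitlines()]

    for fact in sense_directory("."):
        print(fact)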
Existing softbots accept a diverse set of high-level goals, generate and execute plans to achieve these goals in real time, and recover from errors when necessary. TR #93-09-02 * "The Interaction Between Static Typing and Frameworks" Gail Murphy and David Notkin Frameworks capture the commonalities in design and implementation between a family of related applications and are typically expressed in an object-oriented language. Software engineers use frameworks to reduce the cost of building complex applications. This paper characterizes the operations of instantiation, refinement and extension used to build applications from frameworks and explores how these operations are supported by static typing policies of common object-oriented languages. We found that both conservative contravariant and covariant static typing policies were effective in supporting the operations of framework instantiation and framework extension. However, both policies were ineffective at supporting the operation of framework refinement. Although it does not support the refinement operation itself, covariance is sufficiently expressive to support the instantiation of a properly refined framework. This result illustrates how programming languages can at times complicate rather than simplify the engineering of software. TR #93-09-03 * "Prism: A Case Study in Behavioral Entity-Relationship Modeling and Design" Kevin Sullivan, Ira Kalet, and David Notkin We describe the collaboratively developed "Prism," a tightly integrated but architecturally flexible environment for modeling, visualizing, and simulating radiation treatment plans for cancer patients. Despite significant advances in function, we built Prism at low cost in effort and code size. The key was using "behavioral entity-relationship (ER) modeling" to craft an architecture, and objects representing "abstract behavioral types (ABTs)" to implement it. In more detail, we model a system as a set of independently defined entities used directly by clients but made to work together by behavioral relationships connecting them. We implement this model in an imperative programming framework by representing both entities and relationships as instances of classes that define, announce, and register with events in addition to defining and calling operations. Prism provided a realistic test of the behavioral ER modeling and design method. TR #93-09-04 * "Kaleidoscope: A Constraint Imperative Programming Language" Gus Lopez, Bjorn Freeman-Benson, and Alan Borning The Constraint Imperative Programming (CIP) family of languages integrates constraints and imperative, object-oriented programming. In addition to combining the useful features of both paradigms, there are synergistic effects of this integration, such as the ability to define constraints over user-defined domains. We discuss characteristics of the CIP family and provide a rationale for its creation. The synergy of constraints and objects imposes additional challenges for the provision of constructs, such as object identity and class membership, that are well-understood in conventional language paradigms. We discuss the benefits and challenges of combining the constraint and imperative paradigms, and present our current ideas in the context of the design and implementation of the Kaleidoscope'93 language. TR #93-09-05 * "The Meerkat Multicomputer" Robert B. Bedichek and Curtis Brown Meerkat is a distributed memory multicomputer architecture that scales to hundreds of processors. 
Meerkat uses a two-dimensional passive backplane to connect nodes composed of processors, memory, and I/O devices. The interconnect is conceptually simple and inexpensive to design and build, has low latency, and provides high bandwidth on long messages. However, it does not scale to thousands of processors, does not provide high bandwidth on short messages, and does not provide cache coherent shared memory. Our hypothesis is that many general-purpose, database, and parallel numerical workloads work well on systems with Meerkat's characteristics. We describe the Meerkat architecture, the niche that Meerkat fills, the motivation behind our design choices, and give performance results obtained from our hardware prototype and a calibrated simulator. TR #93-09-06 * "UCPOP User's Manual - Version 2.0" Anthony Barrett, Keith Golden, Scott Penberthy and Daniel Weld The UCPOP partial order planning algorithm handles a subset of the KRSL action representation that corresponds roughly with Pednault's ADL language. In particular, UCPOP operates with actions defined using conditionals and nested quantification. This manual describes a COMMON LISP implementation of UCPOP, explains how to set up and run the planner, details the action representation's syntax, and outlines the main data structures and how they are manipulated. Improvements in Version 2.0 include: * Declarative specification of search control rules * Universal quantification over dynamic universes (i.e., object creation and destruction) * Domain axioms * Predicates expanding to Lisp code * Larger set of domain theories and search functions for testing * Expanded user's manual * Improved CLIM-based graphic plan-space browser TR #93-09-08 "The Truckworld Manual" Dat Nguyen, Steve Hanks and Chris Thomas The Truckworld is a multi-agent planning testbed that has been designed with the following goals in mind: * To model complex causal interactions between different entities in the simulated world, thereby providing a rich problem domain for the experimenter. * To provide a model of sensing that includes incompleteness, noise, and cost. * To support uncertainty in the world through agent commands that yield uncertain effects and exogenous events that occur beyond the agent's control. * To provide an interface that assists the user in running and customizing a simulation. * To support interaction among multiple agents. This manual provides an overview of the Truckworld simulator, details the steps to run a simulation, and presents the specification language for customizing the simulation. TR #93-09-09 * "Protocol Compilation: High-Performance Communication for Parallel Programs" Edward W. Felten One obstacle to the use of distributed-memory multicomputers has been the disappointing performance of communication primitives. Although the hardware is capable of moving data very quickly, the software has been unable to exploit this potential. The main cause of this "communication gap" is "protocol overhead." In addition to moving the application's data between processors, communication systems must engage in other communication and synchronization to ensure correct execution. Protocol overhead is an inherent attribute of parallel computation - it occurs on all machines, under all programming models. To reduce the effect of protocol overhead, I advocate the use of "tailored protocols" - communication protocols custom-designed to work with a particular application program.
A tailored protocol takes advantage of advance knowledge about the behavior of the application program to "slim down" the protocol. Unfortunately, designing tailored protocols by hand is very difficult. The process of designing tailored protocols can, however, be automated. A "protocol compiler" is a tool that takes an application program as input, and produces as output a tailored protocol for that program. This tailored protocol is plug-compatible with the existing message-passing library, so its use is transparent to the application program. The bulk of this dissertation discusses the principles underlying the design of a protocol compiler, including the forms of program analysis the protocol compiler must carry out. In addition to this general discussion, I describe the details of Parachute, a working prototype I have constructed. Parachute generates tailored protocols for data-parallel programs running on the Intel series of multicomputers. Performance models predict, and experiments with Parachute on real applications confirm, that using a protocol compiler leads to a large decrease in communication time. Experiments indicate that Parachute, despite several engineering compromises in its design, cuts communication time by 17%. Reductions in overall application running time range from 0% to 20%, with an average reduction of 7.3%. More aggressive implementation, or new hardware, will lead to even larger speedups. TR #93-10-01 "On the Recognition of Arabic Documents" Badr Al-Badr This paper explores methodologies for optical character recognition with a special emphasis on the Arabic language, using five papers as guides. The features of written Arabic are described, and a general model for OCR systems, MOCR, is defined. Each of the five papers is reviewed. Using the model, the approaches in those papers are decomposed and compared. The paper concludes by summarizing the contributions of each of the papers and outlining directions for future research. TR #93-10-02 * "Complexity of Sub-Bus Mesh Computations" Anne Condon, Richard E. Ladner, Jordan Lampe and Rakesh Sinha We investigate the time complexity of several fundamental problems on the sub-bus mesh parallel computer with p processors. TR #93-10-03 "Lower Bounds on Universal Traversal Sequences Based on Chains of Length 5" Jonathan Buss and Martin Tompa Universal traversal sequences for cycles require length at least proportional to n^(1.43), improving the previous bound of n^(1.33). For d at least 3, universal traversal sequences for d-regular graphs require length at least proportional to d^(0.57) n^(2.43). For constant d, the best previous bound was n^(2.33). TR #93-10-04 * "Impact of Sharing-Based Thread Placement on Multithreaded Architectures" Radhika Thekkath and Susan J. Eggers Multithreaded architectures context switch between instruction streams to hide memory access latency. Although this improves processor utilization, it can increase cache interference and degrade overall performance. One technique to reduce the interconnect traffic is to co-locate threads that share data on the same processor. Multiple threads sharing data in the cache should reduce compulsory and invalidation misses, thereby improving execution time. To test this hypothesis, we compared a variety of thread placement algorithms via trace-driven simulation of fourteen coarse- and medium-grain parallel applications on several multithreaded architectures. Our results contradict the hypothesis.
Rather than decreasing, compulsory and invalidation misses remained nearly constant across all placement algorithms, for all processor configurations, even with an infinite cache. That is, sharing-based placement had no (positive) effect on execution time. Instead, load balancing was the critical factor that affected performance. Our results were due to one or both of the following reasons: (1) the sequential and uniform access of shared data by the application's threads, and (2) the insignificant number of data references that require interconnect access, relative to the total number of instructions. TR #93-10-05 * "Multiresolution Analysis for Surfaces of Arbitrary Topological Type" Tony D. DeRose, Michael Lounsbery, Joe Warren Multiresolution analysis provides a useful and efficient tool for representing shape and analyzing features at multiple levels of detail. Although the technique has met with considerable success when applied to univariate functions, images, and more generally to functions defined on R^n, to our knowledge it has not been extended to functions defined on surfaces of arbitrary genus. In this report, we demonstrate that multiresolution analysis can be extended to surfaces of arbitrary genus using techniques from subdivision surfaces. We envision many applications for this work, including automatic level-of-detail control in high-performance graphics rendering, compression of CAD models, and acceleration of global illumination algorithms. We briefly sketch one of these applications, that of automatic level-of-detail control of polyhedral surfaces. TR #93-11-01 * "Processor Allocation Policies for Message-Passing Parallel Computers" Cathy McCann and John Zahorjan When multiple jobs compete for processing resources on a parallel computer, the operating system kernel's processor allocation policy determines how many and which processors to allocate to each. In this paper we investigate the issues involved in constructing a processor allocation policy for large scale, message-passing parallel computers supporting a scientific workload. We make four specific contributions: * We define the concept of "efficiency preservation" as a characteristic of processor allocation policies. Efficiency preservation is the degree to which the decisions of the processor allocator degrade the processor efficiencies experienced by individual applications relative to their efficiencies when run alone. * We identify the interplay between the kernel processor allocation policy and the application load distribution policy as a determinant of efficiency preservation. * We specify the details of two families of processor allocation policies, called "Equipartition" and "Folding." Within each family, different member policies cover a range of efficiency preservation values, from very high to very low. * By comparing policies within each family as well as between families, we show that high efficiency preservation is essential to good performance, and that efficiency preservation is a more dominant factor in obtaining good performance than is equality of resource allocation. We employ four evaluation techniques in our work. A simple algebraic model, coupled with Monte-Carlo simulation, is used to obtain the basic efficiency preservation characteristics of the policies under static workloads. A Markovian birth-death model is used to estimate mean response times under homogeneous job arrivals.
Finally, a simulation model is used to explore performance under heterogeneous workloads, and to evaluate the fairness of the policies. TR #93-11-02 * "Evaluating Runtime-Compiled Value Specific Optimization" David Keppel, Susan Eggers and Robert Henry Traditional compiler optimizations are either data-independent or optimize around common data values while retaining correct behavior for uncommon values. This paper examines "value-specific" data-dependent optimizations (VSO), where code is optimized at runtime around particular input values. Because VSO optimizes for the specific case, the resulting code is more efficient. However, since optimization is performed at runtime, the performance improvement must more than pay for the runtime compile costs. We describe two VSO implementation techniques and compare the performance of applications that have been implemented using both VSO and static code. The results demonstrate that VSO produces better code, and that the runtime compile costs are often recovered even for reasonable input sizes. The machine-independent implementations showed speedups of up to 1.5 over static C code, and the machine-dependent versions showed speedups of up to 4.3 over static assembly code. TR #93-11-03 * "The Challenges of Mobile Computing" George H. Forman and John Zahorjan Advances in wireless networking technology have engendered a new paradigm of computing, called "mobile computing," in which users carrying portable devices have access to a shared infrastructure independent of their physical location. This provides flexible communication between people and continuous access to networked services. Mobile computing is expected to revolutionize the way computers are used. This paper is a survey of the fundamental software design pressures particular to mobile computing. The issues discussed arise from three essential requirements: the use of wireless networking, the ability to change locations, and the need for unencumbered portability. Promising approaches to address these challenges are identified, along with their shortcomings. TR #93-12-01 "Planning with Continuous Change" J. Scott Penberthy This thesis presents ZENO, a computer program that generates plans of action for achieving goals in a continuously changing world. The program searches a space whose nodes are partially completed plans and whose arcs are the modifications that separate one plan from the next. Actions and goals in the world are described in a language combining first-order logic with piecewise linear constraints. Actions occur over extended intervals of time and can update the world at any time during their execution. These properties enable the program to reason about continuous change, metric preconditions, metric effects, deadline goals, and simultaneous action. The planning algorithms are developed, proven sound and complete, then demonstrated through an empirical study of the computer program as it solves simple problems. TR #93-12-02 * "Multi-Resolution Tiling" David Meyers This paper describes an efficient method for constructing a tiling between a pair of planar contours. The problem is of interest in a number of domains, including medical imaging, biological research and geological reconstructions. Our method requires O(n) space and appears to require O(n log(n)) time for average inputs, compared to the O(n^2 log(n)) time and O(n^2) space required by the optimizing algorithm of Fuchs, Kedem and Uselton.
The results computed by our algorithm are in many cases very nearly the same as those of their optimizing algorithm, but at a small fraction of the computational cost. The performance improvement makes the algorithm usable for large contours in an interactive system. The use of multiresolution analysis also allows for tiling after partial reconstruction of the original contours from a low-resolution version, with controlled loss of detail. The use of lower resolution approximations to the original contours can result in significant savings in the time required to display a reconstructed object, and in the space required to store it. TR #93-12-03 * "Multicomputer Interconnection Network Channel Design" Kevin Bolding Massively parallel multicomputers require a high-performance interconnection network. The physical channels which connect nodes in the network are the components that most commonly restrict the amount of data that can be communicated between nodes of the network. Several different schemes for maximizing the bandwidth of physical interconnection channels are examined. Conventional technology centers on 5V electrical signalling. Within this paradigm, there are several different methods of maximizing the throughput of a channel, given its limited bandwidth. Simplex and duplex channels are examined, as well as arbitration protocols and transmission line pipelining. Non-conventional technologies which achieve higher bandwidth transmission through low-voltage signalling and fiber optic communication are also examined. For the present, it is expected that most networks will continue to use electrical signalling, including low-voltage signalling, because fiber optic hardware is too expensive. As technology changes, though, it may become more common for networks to be built out of optical or other new technologies. TR #93-12-04 * "Probabilistic Planning with Information Gathering and Contingent Execution" Denise Draper, Steve Hanks, and Dan Weld One way of coping with uncertainty in the planning process is to plan to gather information about the world at execution time, building a plan contingent on that information. The literature on decision making discusses the concept of information-producing actions, the value of information, and plans contingent on information learned from tests, but these concepts are missing from AI representations and algorithms for plan generation. This paper presents a planning algorithm that models information-producing (sensing) actions and constructs plans whose executions depend on the nature of the information gathered from sensors. We build on the BURIDAN probabilistic planning algorithm [Kushmerick et al., 1993], extending the action representation to model the behavior of imperfect sensors, and combine it with a framework for contingent action that extends the CNLP algorithm for conditional execution. The result, C-BURIDAN, is an implemented planner that extends the functionality of both BURIDAN and CNLP. TR #93-12-05 * "Concord: Re-Thinking the Division of Labor in a Distributed Shared Memory System" J. William Lee A collection of processors connected by a network --- either a distributed-memory multiprocessor or a network of workstations --- offers a potentially scalable and cost-effective solution for running scientific and engineering applications. Programming this kind of machine, however, has proved to be difficult.
One promising approach is a distributed shared memory system that lets the programmer specify the granularity of coherence, thereby avoiding the introduction of false sharing. But the challenge in this approach is to offer ease of programming while maintaining high performance. The Concord distributed shared memory system meets this challenge by carefully splitting the responsibilities among the programmer, the compiler, and the runtime system in three areas: programming, debugging, and performance tuning. The runtime system and the compiler cooperate to let the programmer easily specify arbitrary units of sharing, including dynamic data structures. The runtime system and the compiler also cooperate to support runtime error checking. The programmer may specify information as part of performance tuning, and the runtime system can use this information to coalesce coherence messages. A prototype of Concord has allowed us to program real applications on an Intel iPSC/2 in a short period of time and achieve reasonable speedup. TR #93-12-06 * "Performance of User-Level Communication on Distributed-Memory Multiprocessors with an Optimistic Protocol" J. William Lee The scalability and performance of parallel applications on distributed-memory multiprocessors depend to a great extent on the performance of communication. Although the communication hardware on distributed-memory multiprocessors can transfer data quickly, applications often fail to achieve communication performance matching the hardware's capacity. Two key impediments to achieving high performance communication are flow control and protection boundaries. This paper proposes and evaluates two techniques to reduce the cost of flow control and the cost of crossing protection boundaries. First, to optimize the common case, processors may send messages optimistically without ensuring that the destination processor of the message has enough buffer space to receive the message. Second, to avoid crossing protection boundaries through expensive traps and software interrupts, the kernel and the user-level address space may cooperate and communicate asynchronously through shared data structures. Implemented using these two techniques, our prototype system on an Intel iPSC/2 improves the performance of message passing primitives by a factor of up to 2.5 and the performance of real applications by up to 30%. TR #93-12-07 "Codebook Organization to Enhance Maximum A Posteriori Detection of Progressive Transmission of Vector Quantized Images Over Noisy Channels" Ren-Yuh Wang, Eve A. Riskin and Richard Ladner We describe a new way to organize a full search vector quantization codebook so that images encoded with it can be sent progressively and have immunity against channel noise. The codebook organization guarantees that the most significant bits (MSBs) of the codeword index are most important to the overall image quality and are highly correlated. Simulations show that the effective channel error rates of the MSBs can be substantially lowered by implementing a Maximum A Posteriori (MAP) detector similar to one suggested by Phamdo and Farvardin. The performance of the scheme is close to that of Pseudo-Gray coding at lower bit error rates and outperforms it at higher error rates. No extra bits are used for channel error correction. TR #93-12-08 * "Faster Dynamic Linking for SPARC V8 and System V.4" David Keppel and Stephen Russell Dynamic linking is again becoming common, and processor pipeline depths are increasing.
These factors make it worth retuning dynamic linker implementations. This note examines the current dynamic linking scheme used by System V.4 running on SPARC V8 processors and shows four alternative implementations that can improve the performance of calls to dynamically-linked functions. Improvements come from reducing the number of instructions that must be executed to perform a call; from reducing the dynamically cached size of the linkage table; and from performing control transfers using instruction immediates instead of indirecting off of a register. In the best case, the overhead of calling a dynamically linked function is reduced from four instructions with a register indirect to a single instruction and no register indirect. The paper also discusses issues and techniques that may be useful for other systems that perform dynamic linking using similar techniques. This note also briefly considers dynamic relinking. TR #93-12-09 * "Abstractions for Portable, Scalable Parallel Programming" Gail A. Alverson, William G. Griswold, Calvin Lin, David Notkin and Lawrence Snyder In parallel programming, the need to manage communication costs, load imbalance, and irregularities in the computation puts substantial demands on the programmer. Key properties of the architecture, such as the number of processors and the costs of communication, must be exploited to achieve good performance. Coding these properties directly into a program compromises the portability and flexibility of the code because significant changes are usually needed to port or enhance the program. We describe a parallel programming model that supports the concise, independent description of key aspects of a parallel program---such as data distribution, communication, and boundary conditions---without reference to machine idiosyncrasies. The independence of such components improves portability by allowing the components of a program to be tuned independently, and encourages reuse by supporting the composition of existing components. The architecture-sensitive aspects of a computation are isolated from the rest of the program, reducing the need to make extensive changes to port a program. This model is effective in exploiting both data parallelism and functional parallelism. This paper provides programming examples, compares this work to related languages, and presents performance results. TR #94-01-01 * "Piecewise Smooth Surface Reconstruction" Hugues Hoppe, Tony DeRose, Tom Duchamp, Hubert Jin, John McDonald, Werner Stuetzle We present a general method for automatic reconstruction of accurate, concise, piecewise smooth surface models from scattered range data. The method can be used in a variety of applications such as reverse engineering, that is, the construction of CAD models from physical objects. Novel aspects of the method are its ability to model surfaces of arbitrary topological type and to recover sharp features such as creases and corners. The method has proven to be effective, as demonstrated by a number of examples using both simulated and real data. A key ingredient in the method, and a principal contribution of this paper, is the introduction of a new class of piecewise smooth surface representations based on subdivision. These surfaces have a number of properties that make them ideal for use in surface reconstruction: they are simple to implement, they can model sharp features concisely, and they can be fit to scattered range data using an unconstrained optimization procedure.
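Subdivision, the key ingredient just mentioned, is easiest to see in one dimension. The sketch below applies Chaikin's corner-cutting scheme to a closed polygon; it is only a curve analogue offered for intuition, not the piecewise smooth surface scheme the report introduces:

    def chaikin_step(points):
        """One corner-cutting subdivision step (Chaikin's scheme) on a closed
        polygon: each edge contributes two new points at its 1/4 and 3/4 marks.
        Repeating the step converges to a smooth (quadratic B-spline) curve."""
        refined = []
        n = len(points)
        for i in range(n):
            (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
            refined.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            refined.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        return refined

    poly = [(0, 0), (1, 0), (1, 1), (0, 1)]     # a square
    for _ in range(3):                          # three rounds of subdivision
        poly = chaikin_step(poly)
    print(len(poly), poly[:2])                  # 32 points approximating a rounded square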
TR #94-01-02 "Tractable Closed-World Reasoning with Updates" Keith Golden, Oren Etzioni and Dan Weld Closed-world reasoning is the process of inferring that a logical sentence is false based on its absence from a knowledge base, or the inability to derive it. Previous work on circumscription, autoepistemic logic, and database theory has explored logical axiomatizations of closed-world reasoning, but has not addressed computational tractability. Work in planning has traditionally made the closed-world "assumption" but has avoided closed-world reasoning. We take a middle position, and describe a tractable method for closed-world reasoning over a restricted logical theory of the sort utilized by planning algorithms such as "strips," "nonlin," and "tweak." We show the method to be both sound and tractable (linear-time updates), and incorporate it into the UCPOP planner. Experiments utilizing our Internet softbot (software robot) demonstrate that the method can substantially improve the softbot's performance by eliminating redundant information gathering. TR #94-01-03 "To Sense or Not to Sense? (A Planner's Question)" Keith Golden, Oren Etzioni and Dan Weld Classical planners have traditionally made the closed world assumption --- facts absent from the planner's world model are false. More recent planners, which plan with incomplete information, make the open world assumption --- the truth value of a fact absent from the planner's model is unknown, and must be sensed. The open world assumption leads to two difficulties: (1) How can the planner determine the scope of a universally quantified goal? (previous incomplete-information planners do not accept such goals) (2) When is a sensory action "redundant," yielding information already known to the planner? This paper describes the fully-implemented "xii" planner which represents and reasons about "local closed world information" (LCW) to handle universally quantified goals and avoid redundant sensing. We report on experiments, utilizing our UNIX softbot (software robot), which demonstrate that "LCW" can substantially improve the softbot's performance by eliminating redundant information gathering. TR #94-01-04 * "A Parallel Trace-driven Simulator: Implementation and Performance" Xiaohan Qin and Jean-Loup Baer The simulation of parallel architectures requires an enormous amount of CPU cycles and, in the case of trace-driven simulation, of disk storage. In this paper, we consider the evaluation of the memory hierarchy of multiprocessor systems via parallel trace-driven simulation. We refine Lin et al.'s original algorithm, whose main characteristic is to insert the shared references from every trace in all other traces, by reducing the amount of communication between simulation processes. We have implemented our algorithm on a KSR-1. Results of our experiments on traces of four applications and three different cache coherence protocols show that parallel trace-driven simulation yields significant speedups over its sequential counter-part. The communication overhead is not substantial compared to the dominant overhead due to the processing of replicated inserted references. We also investigate filtering techniques for multiprocessor traces. We show how to filter--in parallel--private and shared references. Our technique generates filtered traces for various block sizes in a single pass. 
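The general idea behind trace filtering can be sketched with a small direct-mapped "filter cache" that keeps only the references that miss in it; this is a generic illustration with made-up parameters, not the authors' parallel, multi-block-size technique:

    def filter_trace(addresses, block_size=32, num_sets=64):
        """Pass each reference through a small direct-mapped filter cache and
        keep only the misses, shrinking the trace while preserving the
        references that larger simulated caches would still need to see."""
        tags = [None] * num_sets
        filtered = []
        for addr in addresses:
            block = addr // block_size
            index = block % num_sets
            if tags[index] != block:         # miss in the filter cache: keep it
                tags[index] = block
                filtered.append(addr)
        return filtered

    trace = [0, 4, 8, 4096, 0, 4, 8192, 8196]
    print(filter_trace(trace))               # repeated same-block hits are dropped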
As expected, the simulation of filtered traces is much faster, but parallel simulation of filtered traces is not as effective, since the ratio of unfiltered shared to private references is now much larger. TR #94-01-05 * "Importance-Driven Wavelet Radiance" Per H. Christensen, Eric J. Stollnitz, David H. Salesin and Tony D. DeRose In this paper, we show how the ideas of importance-based transport and wavelet analysis can be combined to provide an efficient solution to the radiance transport problem. Importance is used to focus the computation on just those interactions having the greatest impact on the final visible solution. Wavelets are used to provide an efficient method for representing radiance, importance, and the transport operator itself. The framework we describe is a very general one that supports curved surfaces and spatially-varying anisotropic BRDFs. We discuss preliminary results from an implementation that demonstrate the potential of the theoretical methods described. TR #94-01-06 * "Multiresolution Curves" Adam Finkelstein and David H. Salesin We describe a multiresolution curve representation, based on wavelets, that allows a curve to be edited at any scale while preserving its details, to be smoothed at continuous levels, and to be approximated to within any given error tolerance for scan conversion. The algorithms we describe are simple to implement, and efficient in storage (requiring no more storage than the original m control points) and speed (typically linear in m). We focus primarily on B-spline curves; however, we also demonstrate how the methods described can be applied to other types of objects, including hand-drawn strokes with additional properties like thickness, and tensor-product surfaces. TR #94-01-07 * "Interactive Pen-and-ink Illustration" Michael P. Salisbury, Sean E. Anderson, Ronen Barzel, David H. Salesin We present an interactive system for creating pen-and-ink illustrations. The system makes use of stroke textures---collections of strokes arranged in different patterns---to generate texture and tone. The user "paints" with a desired stroke texture to achieve a desired tone, and the computer draws all of the individual strokes. Included in the system is support for using scanned or rendered images as a reference, which provides the user with guides for outline and tone. By following these guides closely, the illustration system can be used for interactive digital halftoning, in which stroke textures are applied to convey details that would otherwise be lost in this black-and-white medium. By removing the burden of placing individual strokes from the user, the illustration system makes it possible to create fine stroke work with a purely mouse-based interface. Thus, this approach holds promise for bringing high-quality black-and-white illustration to the world of personal computing and desktop publishing. TR #94-01-08 * "Computer-Generated Pen-and-Ink Illustration" Georges Winkenbach and David H. Salesin This paper describes the principles of traditional pen-and-ink illustration, and shows how a great number of them can be implemented as part of an automated rendering system. It introduces "stroke textures," which can be used for achieving both texture and tone with line drawing. Stroke textures also allow resolution-dependent rendering, in which the choice of strokes used in an illustration is appropriately tied to the resolution of the target medium.
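As a rough illustration of how tone can be achieved with strokes (a deliberately simplified model with hypothetical parameters, not the stroke textures used by these systems), the number of hatching strokes needed for a region follows from the target tone and the ink each stroke deposits:

    import random

    def hatch_region(width, height, target_tone, stroke_width=1.0, seed=0):
        """Place enough vertical hatching strokes in a width x height region
        that the inked fraction approximates target_tone (0 = white, 1 = black).
        Returns the x-positions of the strokes."""
        area = width * height
        ink_per_stroke = stroke_width * height
        n_strokes = round(target_tone * area / ink_per_stroke)
        rng = random.Random(seed)
        return sorted(rng.uniform(0, width) for _ in range(n_strokes))

    print(len(hatch_region(100, 100, target_tone=0.3)))   # about 30 strokes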
We demonstrate these techniques using complex architectural models, including Frank Lloyd Wright's "Robie House." TR #94-01-09 * "Multiresolution Painting and Compositing" Deborah F. Berman, Jason T. Bartell, David H. Salesin We describe a representation for "multiresolution images"---images that have different resolutions in different places---and methods for creating such images using painting and compositing operations. These methods are very easy to implement, and they are efficient in both memory and speed. At a particular resolution, the representation requires space proportional only to the amount of detail actually present, and the most common painting operations, "over" and "erase," require time proportional only to the number of pixels displayed. Finally, we show how "fractional-level zooming" can be implemented in order to allow a user to display and edit portions of a multiresolution image at any arbitrary size.
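For concreteness, the "over" operation mentioned above is the standard Porter-Duff compositing operator; a minimal sketch, assuming premultiplied RGBA pixels with channels in [0, 1] (an assumption about representation, not necessarily the one used in the paper):

    def over(a, b):
        """Porter-Duff 'over': composite premultiplied RGBA pixel a over pixel b."""
        ra, ga, ba, aa = a
        rb, gb, bb, ab = b
        k = 1.0 - aa                         # how much of b shows through a
        return (ra + k * rb, ga + k * gb, ba + k * bb, aa + k * ab)

    # Half-transparent red painted over opaque white:
    red = (0.5, 0.0, 0.0, 0.5)               # premultiplied (1, 0, 0) at alpha 0.5
    white = (1.0, 1.0, 1.0, 1.0)
    print(over(red, white))                  # -> (1.0, 0.5, 0.5, 1.0)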