Winter 1992 Technical Report Abstracts
From 91-12-01 To 92-05-02
Computer Science and Engineering, University of Washington

TR # 91-12-01
``Predicting Program Execution Times by Analyzing Static and Dynamic Program Paths''
ChangYun Park

This paper describes a method to predict guaranteed and tight deterministic execution time bounds of a program. The basic prediction technique is a static analysis based on a simple timing schema for source-level language constructs, which gives accurate predictions in many cases. Using powerful user-provided information, dynamic path analysis refines looser predictions by eliminating infeasible paths and decomposing the possible execution behaviors in a path-wise manner. The overall prediction cost scales with the desired precision, which is controlled through the amount of information provided. We introduce a formal path model for dynamic path analysis, where user execution information is represented by a set of program paths. With a well-defined practical high-level interface language, user information can be used in an easy and efficient way. We also introduce a method to verify given user information with known program verification techniques. Initial experiments with a timing tool show that safe and tight predictions are possible for a wide range of programs. The tool can also provide predictions for interesting subsets of program executions.

TR # 91-12-02
``On the Convergence of a Pyramidal Euclidean Distance Transform''
S. Tanimoto and P.-E. Danielsson

We have explored the possibility of accelerating the discrete Euclidean distance transform by employing a parallel pyramid machine. An algorithm was implemented which we call the Pyramidal Euclidean Distance Transform (PEDT). It is a pyramidal version of a straightforward algorithm for parallel mesh computers, and it computes the discrete Euclidean distance map for a given binary image. The results of experiments and analysis show that in many practical cases, the pyramidal version converges significantly more quickly than a parallel mesh-based version. In addition, the pyramidal version computes successive approximations to the correct distance map in such a way that early approximations are quite close to the final result. This property allows the user to choose between speed and accuracy in a manner not available with other methods. Because of its progressive character, the pyramidal version may be especially appropriate for use in systems with real-time requirements.

TR # 91-12-03
``Surface Reconstruction from Unorganized Points''
Hugues Hoppe, Tony DeRose, Tom Duchamp, John McDonald, Werner Stuetzle

We describe and demonstrate an algorithm that takes as input an unorganized set of points {x_1, ..., x_n} subset IR^3 on or near an unknown manifold U, and produces as output a triangulated polyhedron that approximates U. Neither the topology, the presence of boundaries, nor the geometry of U is assumed to be known in advance --- all are inferred automatically from the data. This problem arises in a variety of practical situations such as range scanning an object from multiple view points, recovery of biological shapes from two-dimensional slices, and interactive surface sketching.

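The following sketch (in Python, with made-up sample data and a made-up neighborhood size k) illustrates the kind of local estimate a point-based reconstruction method can build on: fitting a tangent plane to the k nearest samples and using it to estimate a signed distance. It is intended only as an illustration of the general idea, not as the algorithm of TR 91-12-03.

    # Illustration only (not the algorithm of TR 91-12-03): fit a plane to the
    # k nearest samples around a query point and report the signed distance to
    # that plane.  Sample data and k are made up.
    import numpy as np

    def tangent_plane(neighborhood):
        """Fit a plane to a small set of 3-D points: returns (centroid, unit normal)."""
        centroid = neighborhood.mean(axis=0)
        # The normal is the direction of least variance in the neighborhood.
        _, _, vt = np.linalg.svd(neighborhood - centroid)
        return centroid, vt[-1]

    def signed_distance(q, samples, k=8):
        """Signed distance from q to the plane fitted at its k nearest samples
        (the sign is only locally consistent in this toy version)."""
        nearest = samples[np.argsort(np.linalg.norm(samples - q, axis=1))[:k]]
        centroid, normal = tangent_plane(nearest)
        return float(np.dot(q - centroid, normal))

    # Toy data: noisy samples of the plane z = 0; a query point 0.3 above it.
    rng = np.random.default_rng(0)
    samples = np.c_[rng.uniform(-1, 1, (200, 2)), 0.01 * rng.normal(size=200)]
    print(signed_distance(np.array([0.0, 0.0, 0.3]), samples))   # roughly +/- 0.3
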
TR # 92-01-01
``Adaptive Guided Self-Scheduling''
Derek L. Eager and John Zahorjan

Loops are an important source of parallelism in application programs. The iterations of such loops may be scheduled statically (at compile time) or dynamically (at run-time) onto the processors of a parallel machine. Dynamic scheduling methods, although offering increased robustness and flexibility relative to static approaches, must be carefully designed if they are to simultaneously achieve low overhead and good load balance. In this paper we propose Adaptive Guided Self-Scheduling, an enhancement of the Guided Self-Scheduling (GSS) approach to the dynamic scheduling of loops with independent iterations. Adaptive Guided Self-Scheduling addresses the two principal weaknesses of GSS: excessive overhead caused by fine-grained scheduling of the final few loop iterations when iteration execution times are small, and load imbalance resulting from consecutive iteration execution times that vary widely but in a correlated way. We address the first weakness by adaptively adjusting the scheduling granularity if, at run time, loop iterations are determined to be so short that overhead would otherwise be excessive. We address the second weakness by providing the option of wrapped assignment of loop iterations, assigning processors groups of iterations sampled uniformly throughout the iteration space rather than blocks of consecutive iterations. Performance measurements are presented for an implementation of Adaptive Guided Self-Scheduling on a Sequent Symmetry multiprocessor.

TR # 92-01-02
``A Performance Study of Memory Consistency Models''
Richard N. Zucker and Jean-Loup Baer

Recent advances in technology are such that the speed of processors is increasing faster than memory latency is decreasing. Therefore the relative cost of a cache miss is becoming more important. However, the full cost of a cache miss need not be paid every time in a multiprocessor. The frequency with which the processor must stall on a cache miss can be reduced by using a relaxed model of memory consistency. In this paper, we present the results of instruction-level simulation studies on the relative performance benefits of using different models of memory consistency. Our vehicle of study is a shared-memory multiprocessor with processors and associated write-back caches connected to global memory modules via an Omega network. The benefits of the relaxed models, and their increasing hardware complexity, are assessed with varying cache size, line size, and number of processors. We find that substantial benefits can be accrued by using relaxed models, but the magnitude of the benefits depends on the architecture being modeled, the benchmarks, and how the code is scheduled. We did not find any major difference in levels of improvement among the various relaxed models.

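As a rough illustration of the two ideas described in TR 92-01-01 above, the sketch below computes guided self-scheduling chunk sizes (each chunk is a fraction of the remaining iterations, with an optional minimum chunk size) and a simple cyclic form of wrapped iteration assignment. The loop bounds, processor count, and minimum-chunk threshold are made up; this is not the authors' Sequent Symmetry implementation.

    # Sketch of the ideas in TR 92-01-01 above (not the authors' implementation).
    from math import ceil

    def guided_chunks(n_iters, n_procs, min_chunk=1):
        """Yield (start, size) chunks the way a guided self-scheduling queue
        hands them out: each chunk is ceil(remaining / n_procs), never smaller
        than min_chunk (an adaptive variant can raise min_chunk for short loops)."""
        start, remaining = 0, n_iters
        while remaining > 0:
            size = min(remaining, max(min_chunk, ceil(remaining / n_procs)))
            yield start, size
            start, remaining = start + size, remaining - size

    def wrapped_assignment(n_iters, n_procs):
        """One simple form of wrapped assignment: deal iterations out cyclically,
        so each processor's work is spread across the whole iteration space."""
        return {p: list(range(p, n_iters, n_procs)) for p in range(n_procs)}

    print(list(guided_chunks(100, 4, min_chunk=4)))  # shrinking chunks: 25, 19, 14, ...
    print(wrapped_assignment(10, 4))                 # {0: [0, 4, 8], 1: [1, 5, 9], ...}
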
TR # 92-01-03
``Managing Applications Using the Simple Network Management Protocol - A SYBASE Monitoring Prototype''
Julia Lee Jaundalderis

This experiment uses the Internet-standard Network Management Framework and the SNMP to integrate the management of a SYBASE SQL server into a network management system. In doing so, we show the feasibility of integrating application and network management and explore design and implementation issues. SYBASE is a distributed relational database management system. Project objectives include: defining specific SYBASE managed objects in a Management Information Base extension and exploring issues involved in building such extensions; building a prototype SNMP agent to monitor SYBASE; customizing a network management station to communicate with the SYBASE agent and perform SYBASE-specific processing; exploring design and implementation issues in using SNMP to manage applications; comparing the management capabilities of our SNMP-based management system with existing SYBASE management procedures; and considering the management of other applications based upon our findings.

TR # 92-02-01
``Performance Issues in Non-blocking Synchronization on Shared-Memory Multiprocessors''
Juan Alemany and Edward W. Felten

This paper considers the performance of Herlihy's standard protocol [Her90, Her91a] for implementing non-blocking concurrent objects. We identify two problems hindering performance of this protocol: resources wasted by attempted non-blocking operations that fail, and the cost of unnecessary data copying. We demonstrate these effects experimentally, and propose techniques to address them. To reduce the price of failing non-blocking operations, we introduce a protocol that maintains information about the utilization of the shared object; experiments show that this protocol performs significantly better than the alternatives. To reduce the cost of unnecessary data copying, we introduce an optimistic protocol that avoids copying, except in the case where a thread of control blocks or is preempted during its attempted update.

TR # 92-02-02
``A Centralized Token-Based Algorithm for Distributed Mutual Exclusion''
Ed Lazowska

In this paper, we present a new centralized algorithm for distributed mutual exclusion that combines the advantages of existing centralized and token-based algorithms: it takes only two messages to acquire the lock when it is not held by another process, and only one message to pass the lock from one process to another. At the same time, our algorithm generates significantly less message traffic than non-centralized token-based algorithms. We also show that, within our approach, the performance of the algorithm can be further improved by providing policies that allow the application to make use of information on the pattern of lock requests, if available. Finally, we discuss the issue of load sharing and show that the central node is unlikely to be the bottleneck in the system. All this allows us to conclude that the long-neglected centralized approach is a better solution for the distributed mutual exclusion problem than the non-centralized token-based approach.

TR # 92-03-00
``Research Review 1992''
Computer Science and Engineering Faculty

Out of print (a current one is available each year).

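A toy, single-process simulation of a centralized token scheme in the spirit of TR 92-02-02 above is sketched below. It is one plausible arrangement consistent with the message counts the abstract cites (two messages to acquire a free lock, one token-transfer message to pass it on); the bookkeeping by which the releasing process learns the next requester is elided, and the class and message names are invented rather than taken from the report's protocol.

    # Toy simulation only; not the report's protocol specification.
    from collections import deque

    class Coordinator:
        def __init__(self):
            self.holder = None        # process currently holding the token
            self.waiting = deque()    # requests that arrived while the token was held

        def request(self, proc):
            # Message 1: REQUEST from proc to the coordinator.
            if self.holder is None:
                # Message 2: the coordinator sends the token back; 2 messages total.
                self.holder = proc
                print(f"{proc} acquired the free lock (2 messages)")
            else:
                self.waiting.append(proc)

        def release(self, proc):
            assert proc == self.holder
            if self.waiting:
                # One message: the token goes from the old holder to the next waiter.
                self.holder = self.waiting.popleft()
                print(f"{proc} passed the lock to {self.holder} (1 message)")
            else:
                self.holder = None

    c = Coordinator()
    c.request("P1"); c.request("P2"); c.release("P1"); c.release("P2")
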
TR # 92-03-01
``Distributed Shared Memory with Versioned Objects''
Michael J. Feeley and Henry M. Levy

Distributed Object Memory (DOM) is an abstraction that represents a distributed-memory system as a single shared container of language-level objects. The goal of DOM is to simplify programming of parallel applications for such systems. All accesses to shared memory are made relative to objects that reside in one or more node-local memories. For example, in Amber, a DOM system for a network of workstations, remote references are transparent at the language level and are implemented using either remote procedure call or object replication and migration. While DOM can greatly simplify distribution for many application classes, it is not well suited for all domains (parallel-scientific codes, in particular). To address the shortcomings of DOM for such domains, we introduce Versioned DOM (VDOM). In VDOM a version number is associated with each object. Multiple versions of objects can coexist and may be cached in local memories as needed to increase concurrency. Object coherence is driven by synchronization methods implicitly associated with each object. Explicit versioning is used as the basis for a memory consistency model that facilitates efficient fine-grain sharing of objects. The units of VDOM coherence are fragment objects, from which language-level objects are constructed; this fragmentation solves the false-sharing problem for language-level objects. The performance of VDOM primitives compared to standard DOM coherence is presented, along with speedup results for two parallel applications implemented using VDOM.

TR # 92-03-02
``How to Use a 64-Bit Virtual Address Space''
Jeffrey S. Chase, Henry M. Levy, Miche Baker-Harvey, and Edward D. Lazowska

Most operating systems execute programs in private address spaces communicating through messages or files. The traditional private address space model was developed for 16- and 32-bit architectures, on which virtual addresses are a scarce resource. The recent appearance of architectures with flat 64-bit virtual addressing opens an opportunity to reconsider our use of virtual address spaces. In this paper we argue for an alternative addressing model, in which all programs and data reside in a single global virtual address space shared by multiple protection domains. Hardware-based memory protection exists within the single address space, providing firewalls as strong as in conventional systems. We explore the tradeoffs in the use of a global virtual address space relative to the private address space model. We contend that a shared address space can eliminate obstacles to efficient sharing and exchange of data structures that are inherent in private address space systems. The shared address space offers advantages similar to those of some capability-based computer systems, but can be implemented on standard page-based hardware, without the performance costs and restricted data model typical of capability-based systems. We introduce Opal, an operating system to support this model on paged 64-bit architectures. The key feature of Opal is a flat virtual address space that includes data on long-term storage and across the network.

TR # 92-03-03
``Improving Fault Tolerance and Supporting Partial Writes in Structured Coterie Protocols for Replicated Objects''
Ed Lazowska and Michael Rabinovich

This paper presents a new technique for efficiently controlling replicas in distributed systems. Conventional structured coterie protocols are efficient but incur a penalty of reduced availability in exchange for the performance gain. Further, the performance advantage can only be fully realized when write operations always replace the old data item with the new value instead of updating a portion of the data item. Our new approach significantly improves availability while allowing partial write operations. After presenting our general approach, we apply it to an existing structured coterie protocol and analyze the availability of the resulting protocol. We also show that other classes of protocols can make use of our approach.

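As background for TR 92-03-03 above, the sketch below checks the defining properties of a coterie (every two quorums intersect, and no quorum contains another) on a toy majority coterie over five replicas. This is generic quorum machinery for illustration only, not the report's new protocol.

    # Generic background, not the report's protocol: a coterie is a set of
    # quorums in which every two quorums intersect and none contains another.
    from itertools import combinations

    def is_coterie(quorums):
        quorums = [frozenset(q) for q in quorums]
        pairwise_intersect = all(a & b for a, b in combinations(quorums, 2))
        minimal = not any(a < b for a in quorums for b in quorums)
        return pairwise_intersect and minimal

    # Majority quorums over 5 replicas: any 3 of {1, ..., 5}.
    majority = list(combinations(range(1, 6), 3))
    print(is_coterie(majority))           # True: two 3-element subsets of 5 must meet
    print(is_coterie([{1, 2}, {3, 4}]))   # False: disjoint quorums could both succeed
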
TR # 92-03-04
``Color Photometric Stereo''
Per H. Christensen

This report describes a method for recovering the three-dimensional shape of an object from two or more two-dimensional color images of the object. The illumination geometry is different in each image, hence the method is called color photometric stereo. At each visible point on the surface, each image imposes a constraint on the possible surface orientations, and if the constraints are sufficiently strong, the orientation can be determined. Given the orientation at many points of the surface, the three-dimensional shape of the object can be determined. Previously, little work has been done on the use of color for photometric stereo, and only for dielectric materials has this combination been used. The main contribution of this report is to extend the use to other types of materials, such as composite materials and metals. The approach used to derive the surface orientation from a color is numerical; therefore the reflection model does not have to be analytically invertible. The reflection models that can be used with color photometric stereo include, for example, the Lambertian, Phong, and Torrance-Sparrow models. The advantage of using color images rather than gray-scale images is twofold: at each pixel there are more constraints on the possible orientation, and hence fewer images of the same object are necessary to recover the shape; also, the noise sensitivity will be lower (assuming that the random noise in the three color components is independent). The method has been tested on both artificial and real images of composite surfaces.

TR # 92-03-05
``A Sublinear Space, Polynomial Time Algorithm for Directed s-t Connectivity''
Greg Barnes, Jonathan F. Buss, Walter L. Ruzzo, and Baruch Schieber

Directed s-t connectivity is the problem of detecting whether there is a path from vertex s to vertex t in a directed graph. We present the first known deterministic sublinear space, polynomial time algorithm for directed s-t connectivity. For n-vertex graphs, our algorithm can use as little as n/2^(Theta(sqrt(log n))) space while still running in polynomial time.

TR # 92-03-06
``An Expanded System for Coordinate-Free Geometric Programming''
William J. R. Longabaugh

Coordinate-free affine geometric programming has demonstrated its usefulness in graphics and computer-aided geometric design research. This thesis describes an expanded system for coordinate-free geometric programming that includes affine, vector, and projective spaces, and that provides the ability to easily move between the various geometric viewpoints. By providing this broader view of geometry, the system makes it possible to apply coordinate-free geometric reasoning to a wider variety of problems.

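The classical gray-scale Lambertian case mentioned in TR 92-03-04 above admits a simple illustration: with at least three images under known light directions, each pixel's intensities give a small least-squares problem for the scaled normal. The light directions and synthetic intensities below are made up; the report's numerical, color-based method is more general.

    # Classical gray-scale Lambertian photometric stereo (the special case that
    # TR 92-03-04 above generalizes).  Light directions and intensities are made up.
    import numpy as np

    L = np.array([[0.0, 0.0, 1.0],       # one (roughly unit) light direction per image
                  [0.7, 0.0, 0.714],
                  [0.0, 0.7, 0.714]])

    def recover_normal(intensities, lights=L):
        """At one pixel, solve lights @ g = intensities for g = albedo * normal."""
        g, *_ = np.linalg.lstsq(lights, np.asarray(intensities, dtype=float), rcond=None)
        albedo = np.linalg.norm(g)
        return g / albedo, albedo

    # Synthetic check: a normal of (0, 0, 1) with albedo 0.8.
    true_normal, rho = np.array([0.0, 0.0, 1.0]), 0.8
    intensities = rho * L @ true_normal
    normal, albedo = recover_normal(intensities)
    print(np.round(normal, 3), round(albedo, 3))   # ~[0. 0. 1.] 0.8
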
TR # 92-03-08
``Behavioral Abstraction''
Kevin J. Sullivan and David Notkin

In this paper, we present `behavior abstraction' as a way of thinking about and organizing software systems. Whereas a data abstraction is a model of an entity solely in terms of the operations that can be applied to it, a behavior abstraction is a model of an entity in terms of both the operations it supports and the activities that it exhibits. We adopt the term behavior abstraction for two reasons. First, we support abstraction through the combination of explicit interfaces and encapsulation. Second, we encourage the modeling of behavior; operations cause behaviors, while activities reflect them. Benefits of behavior abstraction include improved correspondence between specifications and designs; improved abstraction and encapsulation; and behavioral composability of independent types.

TR # 92-03-09
``Lightweight Shared Objects in a 64-Bit Operating System''
Jeffrey S. Chase, Henry M. Levy, Edward D. Lazowska and Miche Baker-Harvey

Object-oriented models are a popular basis for supporting uniform sharing of data and services in operating systems, distributed programming systems, and database systems. We term systems that use objects for these purposes `object sharing systems'. Operating systems in common use hamper the development of object sharing systems because their addressing models are nonuniform, making uniform object naming expensive and difficult to implement. We argue that emerging 64-bit architectures make it practical to support uniform naming at the virtual addressing level, eliminating the key implementation problem for object sharing systems. We describe facilities for object-based sharing of persistent data and services in Opal, an operating system we are developing for paged 64-bit architectures. The distinctive feature of Opal is that object sharing is supported in a runtime library, above a single virtual address space that maps all primary and secondary storage in a network.

TR # 92-03-11
``The Case for Application-Specific Communication Protocols''
Edward W. Felten

Message-passing programs are heavily dependent on the performance of communication primitives. While communication hardware has gotten much faster in the last few years, the communication performance achieved by application programs has not improved so dramatically. We argue that this `communication gap' is inherent in the design of conventional message-passing systems, which are based on general-purpose communication protocols implemented in the operating system. Closing the gap requires fundamental changes in system design. We review the arguments for user-level implementation of communication protocols. We discuss how the architecture and operating system can be structured to provide user-level communication without compromising system protection. We further argue that general-purpose communication protocols, whether implemented at user level or in the kernel, are inherently slow. Optimum performance requires a protocol designed to take advantage of the behavior of a particular application program. We introduce the notion of a `protocol compiler', a program that generates an application-specific protocol based on information about an application's communication behavior. As a concrete example, we sketch the design of a protocol compiler that can generate nearly optimal communication code for a class of data-parallel applications.

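The operations-versus-activities distinction drawn in TR 92-03-08 above can be illustrated informally: a data abstraction would expose only the operations below, while a behavior abstraction also exposes the activity (`changed') that those operations cause. The class and method names are invented for illustration and are not the paper's notation.

    # Informal illustration of the distinction in TR 92-03-08 above; names invented.
    class Counter:
        def __init__(self):
            self._value = 0
            self._observers = []        # parties interested in the "changed" activity

        # Operations: clients invoke these to act on the entity.
        def increment(self):
            self._value += 1
            self._announce()

        def value(self):
            return self._value

        # Activity: the entity announces what it exhibits; observers merely react.
        def on_changed(self, callback):
            self._observers.append(callback)

        def _announce(self):
            for callback in self._observers:
                callback(self._value)

    c = Counter()
    c.on_changed(lambda v: print("counter changed to", v))
    c.increment()                       # prints: counter changed to 1
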
TR # 92-04-01
``Acquiring Search-Control Knowledge via Static Analysis''
Oren Etzioni

Explanation-Based Learning (EBL) is a widely-used technique for acquiring search-control knowledge. Recently, Prieditis, van Harmelen, and Bundy pointed to the similarity between Partial Evaluation (PE) and EBL. However, EBL utilizes training examples whereas PE does not. It is natural to inquire, therefore, whether PE can be used to acquire search-control knowledge, and if so, at what cost. This paper answers these questions by means of a case study comparing Prodigy/EBL, a state-of-the-art EBL system, and Static, a PE-based analyzer of problem-space definitions. When tested in Prodigy/EBL's benchmark problem spaces, Static generated search-control knowledge that was up to three times as effective as the knowledge learned by Prodigy/EBL, and did so from twenty-six to seventy-seven times faster. The paper describes Static's algorithms and compares its performance to Prodigy/EBL's, noting when Static's superior performance will scale up and when it will not. The paper concludes with several lessons for the design of EBL systems, suggesting hybrid PE/EBL systems as a promising direction for future research.

TR # 92-04-02
``Computer Science in Japanese Universities''
David Notkin and Richard D. Schlichting

This paper presents some impressions of computer science in Japanese universities based on the authors' sabbatical visits. The focus is primarily on such structural aspects of the discipline as departmental organization, faculty and student populations, funding, research activity, and computing facilities. Perhaps the key observation is that Japanese cultural practices influence the way in which computer science is approached in Japanese universities to a larger degree than might be imagined.

TR # 92-04-03
``Learning Mathematics via Image Processing: A Rationale and a Research Direction''
Steven L. Tanimoto

Digital image processing offers several possible new approaches to the teaching of a variety of mathematical concepts at the middle-school and high-school levels. There is reason to believe that this approach will be successful in reaching some ``at-risk'' students that other approaches miss. Since digital images can be made to reflect almost any aspect of the real world, some students may have an easier time taking an interest in them than they might with artificial figures or images resulting from other graphics-oriented approaches. Using computer-based tools such as image processing operators, curve-fitting operators, shape analysis operators, and graphical synthesis, students may explore a world of mathematical concepts starting from the psychologically ``safe'' territory of their own physical and cultural environments. There is reason to hope that this approach will be particularly successful with students from diverse backgrounds, girls and members of minority groups, because the imagery used in experiments can easily be tailored to individual tastes. This approach can be explored in a variety of ways, including the development of modules either to replace components of existing curricula or to augment them. Some possible modules covering both traditional and non-traditional topics for school mathematics are described. Some of the issues related to evaluation of such modules are also discussed.

TR # 92-04-04
``A Performance Analysis of Network I/O in Shared-Memory Multiprocessors''
Chandramohan A. Thekkath, Derek L. Eager, Edward D. Lazowska, and Henry M. Levy

Network I/O performance has not kept pace with processor performance in modern multiprocessor workstations and servers. This paper studies this problem in the context of two such systems: a 5-processor DEC SRC Firefly and a 20-processor Sequent Symmetry. We devise an analytic model of network I/O, parameterize it using measurements of the two systems, and use it to study the performance of these systems as they currently exist and to explore certain feasible modifications that can enhance performance. Our experiments indicate that network controllers designed for uniprocessors may not be suitable for multiprocessor configurations.
They also suggest that a high performance I/O bus can significantly improve performance for larger packets. Finally, they lead us to conclude that, unlike in uniprocessors, the performance improvement due to simpler transport protocols is comparatively small in multiprocessors. These results arise more from the increased parallelism that is available with multiprocessors than just from the increased processing power.

TR # 92-04-05
``An Algorithm of Choices for Solving QR Factorization''
Calvin Lin and Lawrence Snyder

Parallel architectures are very different, yet users who wish to widely distribute their parallel programs want algorithms that run well on all parallel machines. We call such algorithms `Algorithms of Choice'. This concept is introduced and illustrated using four variants of the Modified Gram-Schmidt method (for solving QR factorization) on five different parallel machines: the Sequent Symmetry, NCUBE/7, two instances of the iPSC/2, and the BBN TC2000 Butterfly. These machines exhibit extreme architectural diversity, including shared and distributed memory, and three types of communication structure. QR factorization presents difficult challenges for parallel algorithms due to complex memory access, load balancing requirements, and granularity effects. Unlike previous studies, we consider column as well as row decompositions. A key result is the successful use of the CTA model of parallel computation. A second result is the identification of the best architecture-independent algorithm for the Modified Gram-Schmidt method.

TR # 92-04-06
``Index Assignment for Progressive Transmission of Full Search Vector Quantization''
Eve A. Riskin, Les E. Atlas, Richard Ladner and Ren-Yuh Wang

The question of progressive image transmission for full search vector quantization is addressed via codeword index assignment. Namely, we develop three new methods of assigning indices to a vector quantization codebook and formulate these assignments as labels of nodes of a full search progressive transmission tree. This tree defines, from the bottom up, the binary merging of codewords for successively smaller-sized codebooks. The binary representation of the path through the tree represents the progressive transmission code. The methods of designing the tree which we apply are Kohonen's self-organizing neural net, a modification of the common splitting technique for the generalized Lloyd algorithm, and, borrowing from optimization theory, minimum cost perfect matching. Empirical experiments were run on a medical image data base to compare the signal-to-noise ratio (SNR) of the techniques on the intermediate as well as the final images. While the neural net technique worked reasonably well, the other two methods performed better and were close in SNR to each other. We also compared our results to tree-structured vector quantizers and confirmed that full search VQ has a slightly higher SNR.

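For reference alongside TR 92-04-05 above, the sketch below is the ordinary sequential Modified Gram-Schmidt factorization on which the report's four parallel variants are based; none of the report's row or column decompositions, nor the CTA model, is represented here.

    # Textbook sequential Modified Gram-Schmidt (reference only; not the
    # parallel decompositions studied in TR 92-04-05 above).
    import numpy as np

    def mgs_qr(A):
        """Return Q (orthonormal columns) and R (upper triangular) with A = Q @ R."""
        Q = np.array(A, dtype=float)
        n = Q.shape[1]
        R = np.zeros((n, n))
        for k in range(n):
            R[k, k] = np.linalg.norm(Q[:, k])
            Q[:, k] /= R[k, k]
            for j in range(k + 1, n):   # immediately orthogonalize the remaining columns
                R[k, j] = Q[:, k] @ Q[:, j]
                Q[:, j] -= R[k, j] * Q[:, k]
        return Q, R

    A = np.random.default_rng(1).normal(size=(6, 4))
    Q, R = mgs_qr(A)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(4)))   # True True
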
TR # 92-04-07
``Interactive Proof Systems with Polynomially Bounded Strategies''
Anne Condon and Richard Ladner

Interactive proof systems in which the Prover is restricted to have a polynomial size strategy are investigated. Restricting the computation tree visible to the Prover to polynomial size, or restricting the Verifier to a logarithmically bounded number of coin flips, guarantees a polynomial size strategy. The additional restriction of logarithmic space is also investigated. A main result of the paper is that interactive proof systems in which the Prover is restricted to a polynomial size strategy are equivalent to MA, the Merlin-Arthur games defined by Babai and Moran [1]. Polynomial tree size is also equivalent to MA, but when logarithmic space is added as a restriction, the power of polynomial tree size reduces to NP. A logarithmically bounded number of coin flips is equivalent to NP, and when logarithmic space is added as a restriction, the power is not diminished. The proof that NP subseteq IP(log-space, log-random-bits) illustrates an interesting application of the new ``fingerprinting'' method of Lipton [14]. Public interactive proof systems which have polynomial size strategies are also investigated.

TR # 92-05-02
``High Performance Parallel/Distributed Computing [1991 CISE Institutional Infrastructure Proposal]''
Computer Science and Engineering Faculty

This technical report contains excerpts from our 1991 CISE Institutional Infrastructure proposal and addendum. Because a large cross-section of the department participated in this effort, the material provides a reasonable picture of our history, status, and direction. Included are summaries of each of our major areas of research, along with bibliographies of recent faculty publications in each topic. The proposal also outlines our infrastructure and facilities, as well as the results from prior awards. Faculty credentials, recent doctoral candidates and their theses, and overall department statistics are summarized in tables.
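
The complexity-class relationships stated in TR 92-04-07 above can be summarized compactly. In the informal notation below, IP(...) denotes interactive proof systems restricted as indicated; this is a paraphrase of the abstract, not a statement taken from the report itself.

    % Informal paraphrase of the results quoted in the TR 92-04-07 abstract.
    \begin{align*}
      \mathrm{IP}(\text{poly-size Prover strategy}) &= \mathrm{MA} \\
      \mathrm{IP}(\text{poly tree size})            &= \mathrm{MA} \\
      \mathrm{IP}(\text{log space, poly tree size}) &= \mathrm{NP} \\
      \mathrm{IP}(\text{log random bits})           &= \mathrm{NP} \\
      \mathrm{NP} &\subseteq \mathrm{IP}(\text{log space, log random bits})
    \end{align*}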