DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
                    UNIVERSITY OF WASHINGTON
           TECHNICAL REPORT ABSTRACTS - 95-03-01 -> 95-07-02


Obtaining Electronic Technical Reports:
  Technical reports in this list with an asterisk (*) following the
  TR number are available on-line as of July 1995
  via FTP (ftp.cs.washington.edu), or
  the WWW (http://www.cs.washington.edu/research/tr/techreports.html)

Obtaining Hardcopy Technical Reports:
  To obtain a technical report on paper, sent to you via Snail Mail,
  please contact <tr-request@cs.washington.edu>.

If you need additional information or assistance, please contact:
  <tr-request@cs.washington.edu>

                          ABSTRACTS
  

TR # 95-03-01 *
Darren C. Cronquist
"Simultaneous Place and Route for Wire-Constrained FPGAs"

Simulated annealing placement algorithms which use minimum wire length
metrics based on rectilinear approximations fail to accurately account
for an FPGA's routing resources since the number of logic block
interconnections could be limited, causing certain placements to rely
on resources which may not exist.  In this paper we present a
simulated annealing-based placement algorithm which performs a simple
but effective route after each swap.  We will show that, on
wire-constrained FPGAs, our algorithm is a better evaluator for a
given placement than the faster wire length metrics.  In particular,
on the Triptych 3-input RLB 4x16 array the algorithm achieves final
delays ranging from 3.5% to 21.5% faster than delays yielded by a cost
function tailored for the architecture.  In addition, the algorithm
demonstrates its adaptability by producing even better results on the
most recent variant of the Triptych architecture.  Finally, we will
show that our method can be implemented without an unreasonable
increase in execution time.

TR # 95-03-02 *
Gail C. Murphy, David Notkin, and Kevin Sullivan
"Software Reflexion Models: Bridging the Gap between Source and
High-Level Models"

Software engines often use high-level models (for instance, box and
arrow sketches) to reason and communicate about an existing software
system.  One problem with high-level models is that they are almost
inaccurate with respect to the system's source code.  We have developed
an approach that helps an engineer use a high-level model of the
structure of an existing software system as a lens through which to
see a model of that system's source code.  In particular, an engineer
defines a high-level model and specifies how the model maps to the
source.  A tool then computes a software reflexion model that shows
where the engineer's high-level model agrees with and where it differs
from a source model.

The paper addresses practical aspects and provides a formal
characterization of reflexion models, as well as including discussions
of experiences with reflexion models and of related work.  The primary
example describes the application of reflexion models to NetBSD, an
implementation of Unix comprising 250,000 lines of C code.  In only an
hour or two the engineer computed several reflexion models that
provided him with a useful, global overview of the structure of the
NetBSD virtual memory subsystem.  Each of the reflexion models took
about 30 seconds to compute on a SPARC20.

TR # 95-03-03 *
William Yost
"Cost Effective Fault Tolerance for Network Routing"

As the size of interconnection networks for multicomputers increases,
it becomes clear that some degree of fault tolerance will be required
in order to maintain system reliability at an acceptable level.  The
interconnection networks contain large numbers of relatively
unreliable components.  However, cost and performance concerns
mandate that the addition of fault tolerance to the network have an
unobtrusive impact on network design.  The target design space
requires reasonable reliability while avoiding exorbitant additional
costs.  Because routing messages around failed components may require
non-minimal routes, it makes sense to examine routers which, by
design, allow packets to take non-minimal routes.  However, to provide
for effective and efficient fault tolerance, a unified, coherent
scheme for fault management is required.  Chaotic routing, a
non-minimal adaptive routing scheme, is augmented with a limited
amount of hardware to support fault detection, identification, and
reconfiguration so that the network can maintain reliable operation in
the presence of faults.  The Express Broadcast network is introduced;
a low overhead control network orthogonal to the data network that
can, in general, provide for fine-grained control of the network, and
specifically, provides for global control and synchronization of fault
management procedures for Chaos.  A high-level design is presented in
which fault tolerance is greatly enhanced by the addition of
functionality to targeted clocks while maintaining the structure of
Chaotic routing and avoiding large incremental costs.


TR # 95-03-04 *
Pai Chou, Ross Ortega, Gaetano Borriello
"The Chinook Hardware/Software Co-Synthesis System"

Embedded systems are becoming more commonplace and are being
designed by larger numbers of designers with ever tighter
constraints on design time.  Unfortunately, computer aided design
tools for embedded systems have not kept pace with these trends and
are still fundamentally identical to those used twenty years ago.
The Chinook co-synthesis system, under development at the University
of Washington, addresses the automation of the most time-consuming
and error-prone tasks in embedded controller design, namely: the
synthesis of interface hardware and software needed to integrate
system components; the migration of functions between
processors and/or custom logic; and the co-simulation of system
specifications before, during, and after synthesis.  In this
paper, we describe the principal elements of the Chinook system and
discuss its application to a variety of embedded designs.


TR # 95-03-05 *
Steven L. Tanimoto
"Fast Median Filtering Algorithms for Mesh Computers"

Two fast algorithms for median filtering of images using parallel
computers having 2-D mesh interconnections are given.  Both algorithms
assume that an n x n image is loaded onto the mesh with one processing
element per pixel.  One algorithm performs median filtering over d x d
neighborhoods in O(d^2) time and works with pixel values in an
arbitrarily large range.  This algorithm, while theoretically
suboptimal, achieves a lower constant than a previously published
asymptotically-optimal algorithm and is simpler to program.  The
second algorithm assumes that the range of pixel values is limited and
relatively small, and it accomplishes median filtering in O(d) time.


TR # 95-03-06 *
Donald D. Chinn
"Packet Routing in Multiprocessor Networks"

Large multiprocessor systems have the potential to solve large
problems by breaking them into subtasks and solving them in parallel.
When breaking a problem into subtasks and combining the results,
processors must communicate by sending packets to each other.  The
time it takes for packets to move across the network that connects the
processors will become more significant as multiprocessor systems get
larger.

The mesh and torus, where processors are arranged in a grid pattern,
are popular networks because of their simplicity and their efficient
use of space when physically realized.  There is a simple algorithm to
route packets in these networks, called the dimension order algorithm.
In this algorithm, the path of a packet is fixed once it enters the
network, regardless of any other traffic in the network.

In an adaptive routing algorithm, the path a packet takes from its
source to its destination may depend on packets it encounters.
Minimal adaptive routing algorithms have the additional advantage that
the path each packet takes is a shortest one.

One benchmark for a routing algorithm's performance is how quickly it
can route an arbitrary static permutation, where each processor is the
source for at most one packet and the destination for at most one
packet.  For a large class of minimal adaptive routing algorithms,
this dissertation presents an $\Omega (n^2/k^2)$ bound on the worst
case time to route a static permutation of packets on an $n \times n$
mesh or torus with nodes that can hold up to $k \geq 1$ packets each.
This is the first nontrivial lower bound on adaptive routing
algorithms.  The argument extends to a large class of dimension order
routing algorithms, yielding an $\Omega (n^2/k)$ time bound.

To complement these lower bounds, this dissertation gives two upper
bounds: an $O((n^2/k) + n)$ time dimension order routing algorithm and
the first instance of a minimal adaptive routing algorithm that
achieves $O(n)$ time with constant sized queues per node.

This dissertation also presents experimental results for two
nonminimal routing algorithms.  The results suggest that these
algorithms route permutations similar to the ones in the lower bound
above in time that is superlinear in $n$.


TR # 95-04-01
Geoffrey M. Voelker and Brian N. Bershad
"Mobisaic: An Information System for a Mobile Wireless Computing
Environment"

Mobisaic is a World Wide Web information system designed to serve
users in a mobile wireless computing environment.  Mobisaic extends
the Web by allowing documents to both refer and react to potentially
changing contextual information, such as one's current location in the
wireless network.  Mobisaic relies on client-side processing of
HyperText Markup Language documents that support two new concepts:
Dynamic Uniform Resource Locators (URLs) and Active Documents.  A
dynamic URL is one whose results depend upon the state of the user's
mobile context at the time it is resolved.  An active document
automatically updates its contents in response to changes in a user's
mobile context.  This paper describes the design of Mobisaic, the
mechanism it uses for representing a user's mobile context, and the
extensions made to the syntax and function of Uniform Resource
Locators and HyperText Markup Language documents to support mobility.


TR # 95-04-02 *
James P. Ahrens and Charles D. Hansen
Cost-Effective Data-Parallel Load Balancing

Load balancing algorithms improve a program's performance on
unbalanced datasets, but can degrade performance on balanced datasets,
because unnecessary load redistributions occur. This paper presents a
cost-effective data-parallel load balancing algorithm which performs
load redistributions only when the possible savings outweigh the
redistribution costs. Experiments with a data-parallel polygon
renderer show a performance improvement of up to a factor of 33 on
unbalanced datasets and a maximum performance loss of only 27 percent
on balanced datasets when using this algorithm.


TR # 95-04-03 *
Lauren J. Bricker, Steven L. Tanimoto, Alex I. Rothenberg, Danny C.
Hutama, and Tina H. Wong
"Multiplayer Activities that Develop Mathematical Coordination"

Four computer applications are presented that encourage students to
develop "mathematical coordination" - the ability to manipulate
numerical variables in cooperation with other students so as to
achieve a definite goal.  The programs enable a form of
computer-supported cooperative learning (CSCL).  In this paper we
describe the rationale and design of the programs, the results of an
informal evaluation, and possible future work. The games were
developed using a special software and hardware environment that
facilitates the rapid prototyping of computer-based
cooperative-learning materials.  This research is part of an ongoing
project entitled "Mathematics Experiences Through Image Processing"
whose objective is to develop and test educational materials that
introduce K-12 students to mathematical ideas within the context of
digital image processing activities.


TR # 95-04-04 *
Scott Hauck, Gaetano Borriello, Carl Ebeling
"Achieving High-Latency, Low-Bandwidth Communication: Logic Emulation
Interfaces"

There is a large amount of interest in using multi-FPGA systems for
logic emulation and rapid-prototyping of digital systems.  One
difficulty with this approach is the handling of the external
interfaces of the system.  In this paper we describe a generic
interface transducer, a board capable of handling the external
interfaces of the system under prototype, allowing the emulation to
operate in the target environment.  We also describe how several
protocols can be implemented on this board, including NTSC video,
digital audio, PCMCIA, and VMEbus. 


TR # 95-07-02 *
Michael VanHilst and David Notkin
"Using C++ Templates to Implement Role-Based Designs"

Within the object-oriented technology community, much recent work on
design reuse has focused on role-based collaborations distributed
across multiple objects.  Many benefits can be derived by mapping
role-based designs directly into implementations, including greater
ease in maintaining the connection between designs and implementations
under change, and the opportunity for code reuse along with design
reuse.  Current efforts in role-based designs don't provide these
benefits.  We provide a method for mapping role-based designs into
implementation, preserving the design without unnecessary constraints
on the design structures.  Roles are represented as parameterized
classes, where the parameters represent the types of the participants
in the collaboration.  Composition of roles is implicit in the binding
of parameters to classes in the implementation.  The bindings are
created at compile time by type definitions that are separate from the
role implementations.  In this paper we focus on the use of templates
and typedef statements in the C++ language as the supporting
mechanisms.