DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF WASHINGTON TECHNICAL REPORT ABSTRACTS - 95-03-01 -> 95-07-02 Obtaining Electronic Technical Reports: Technical reports in this list with an asterisk (*) following the TR number are available on-line as of July 1995 via FTP (ftp.cs.washington.edu), or the WWW (http://www.cs.washington.edu/research/tr/techreports.html) Obtaining Hardcopy Technical Reports: To obtain a technical report on paper, sent to you via Snail Mail, please contact . If you need additional information or assistance, please contact: ABSTRACTS TR # 95-03-01 * Darren C. Cronquist "Simultaneous Place and Route for Wire-Constrained FPGAs" Simulated annealing placement algorithms which use minimum wire length metrics based on rectilinear approximations fail to accurately account for an FPGA's routing resources since the number of logic block interconnections could be limited, causing certain placements to rely on resources which may not exist. In this paper we present a simulated annealing-based placement algorithm which performs a simple but effective route after each swap. We will show that, on wire-constrained FPGAs, our algorithm is a better evaluator for a given placement than the faster wire length metrics. In particular, on the Triptych 3-input RLB 4x16 array the algorithm achieves final delays ranging from 3.5% to 21.5% faster than delays yielded by a cost function tailored for the architecture. In addition, the algorithm demonstrates its adaptability by producing even better results on the most recent variant of the Triptych architecture. Finally, we will show that our method can be implemented without an unreasonable increase in execution time. TR # 95-03-02 * Gail C. Murphy, David Notkin, and Kevin Sullivan "Software Reflexion Models: Bridging the Gap between Source and High-Level Models" Software engines often use high-level models (for instance, box and arrow sketches) to reason and communicate about an existing software system. One problem with high-level models is that they are almost inaccurate with respect to the system's source code. We have developed an approach that helps an engineer use a high-level model of the structure of an existing software system as a lens through which to see a model of that system's source code. In particular, an engineer defines a high-level model and specifies how the model maps to the source. A tool then computes a software reflexion model that shows where the engineer's high-level model agrees with and where it differs from a source model. The paper addresses practical aspects and provides a formal characterization of reflexion models, as well as including discussions of experiences with reflexion models and of related work. The primary example describes the application of reflexion models to NetBSD, an implementation of Unix comprising 250,000 lines of C code. In only an hour or two the engineer computed several reflexion models that provided him with a useful, global overview of the structure of the NetBSD virtual memory subsystem. Each of the reflexion models took about 30 seconds to compute on a SPARC20. TR # 95-03-03 * William Yost "Cost Effective Fault Tolerance for Network Routing" As the size of interconnection networks for multicomputers increases, it becomes clear that some degree of fault tolerance will be required in order to maintain system reliability at an acceptable level. The interconnection networks contain large numbers of relatively unreliable components. However, cost and performance concerns mandate that the addition of fault tolerance to the network have an unobtrusive impact on network design. The target design space requires reasonable reliability while avoiding exorbitant additional costs. Because routing messages around failed components may require non-minimal routes, it makes sense to examine routers which, by design, allow packets to take non-minimal routes. However, to provide for effective and efficient fault tolerance, a unified, coherent scheme for fault management is required. Chaotic routing, a non-minimal adaptive routing scheme, is augmented with a limited amount of hardware to support fault detection, identification, and reconfiguration so that the network can maintain reliable operation in the presence of faults. The Express Broadcast network is introduced; a low overhead control network orthogonal to the data network that can, in general, provide for fine-grained control of the network, and specifically, provides for global control and synchronization of fault management procedures for Chaos. A high-level design is presented in which fault tolerance is greatly enhanced by the addition of functionality to targeted clocks while maintaining the structure of Chaotic routing and avoiding large incremental costs. TR # 95-03-04 * Pai Chou, Ross Ortega, Gaetano Borriello "The Chinook Hardware/Software Co-Synthesis System" Embedded systems are becoming more commonplace and are being designed by larger numbers of designers with ever tighter constraints on design time. Unfortunately, computer aided design tools for embedded systems have not kept pace with these trends and are still fundamentally identical to those used twenty years ago. The Chinook co-synthesis system, under development at the University of Washington, addresses the automation of the most time-consuming and error-prone tasks in embedded controller design, namely: the synthesis of interface hardware and software needed to integrate system components; the migration of functions between processors and/or custom logic; and the co-simulation of system specifications before, during, and after synthesis. In this paper, we describe the principal elements of the Chinook system and discuss its application to a variety of embedded designs. TR # 95-03-05 * Steven L. Tanimoto "Fast Median Filtering Algorithms for Mesh Computers" Two fast algorithms for median filtering of images using parallel computers having 2-D mesh interconnections are given. Both algorithms assume that an n x n image is loaded onto the mesh with one processing element per pixel. One algorithm performs median filtering over d x d neighborhoods in O(d^2) time and works with pixel values in an arbitrarily large range. This algorithm, while theoretically suboptimal, achieves a lower constant than a previously published asymptotically-optimal algorithm and is simpler to program. The second algorithm assumes that the range of pixel values is limited and relatively small, and it accomplishes median filtering in O(d) time. TR # 95-03-06 * Donald D. Chinn "Packet Routing in Multiprocessor Networks" Large multiprocessor systems have the potential to solve large problems by breaking them into subtasks and solving them in parallel. When breaking a problem into subtasks and combining the results, processors must communicate by sending packets to each other. The time it takes for packets to move across the network that connects the processors will become more significant as multiprocessor systems get larger. The mesh and torus, where processors are arranged in a grid pattern, are popular networks because of their simplicity and their efficient use of space when physically realized. There is a simple algorithm to route packets in these networks, called the dimension order algorithm. In this algorithm, the path of a packet is fixed once it enters the network, regardless of any other traffic in the network. In an adaptive routing algorithm, the path a packet takes from its source to its destination may depend on packets it encounters. Minimal adaptive routing algorithms have the additional advantage that the path each packet takes is a shortest one. One benchmark for a routing algorithm's performance is how quickly it can route an arbitrary static permutation, where each processor is the source for at most one packet and the destination for at most one packet. For a large class of minimal adaptive routing algorithms, this dissertation presents an $\Omega (n^2/k^2)$ bound on the worst case time to route a static permutation of packets on an $n \times n$ mesh or torus with nodes that can hold up to $k \geq 1$ packets each. This is the first nontrivial lower bound on adaptive routing algorithms. The argument extends to a large class of dimension order routing algorithms, yielding an $\Omega (n^2/k)$ time bound. To complement these lower bounds, this dissertation gives two upper bounds: an $O((n^2/k) + n)$ time dimension order routing algorithm and the first instance of a minimal adaptive routing algorithm that achieves $O(n)$ time with constant sized queues per node. This dissertation also presents experimental results for two nonminimal routing algorithms. The results suggest that these algorithms route permutations similar to the ones in the lower bound above in time that is superlinear in $n$. TR # 95-04-01 Geoffrey M. Voelker and Brian N. Bershad "Mobisaic: An Information System for a Mobile Wireless Computing Environment" Mobisaic is a World Wide Web information system designed to serve users in a mobile wireless computing environment. Mobisaic extends the Web by allowing documents to both refer and react to potentially changing contextual information, such as one's current location in the wireless network. Mobisaic relies on client-side processing of HyperText Markup Language documents that support two new concepts: Dynamic Uniform Resource Locators (URLs) and Active Documents. A dynamic URL is one whose results depend upon the state of the user's mobile context at the time it is resolved. An active document automatically updates its contents in response to changes in a user's mobile context. This paper describes the design of Mobisaic, the mechanism it uses for representing a user's mobile context, and the extensions made to the syntax and function of Uniform Resource Locators and HyperText Markup Language documents to support mobility. TR # 95-04-02 * James P. Ahrens and Charles D. Hansen Cost-Effective Data-Parallel Load Balancing Load balancing algorithms improve a program's performance on unbalanced datasets, but can degrade performance on balanced datasets, because unnecessary load redistributions occur. This paper presents a cost-effective data-parallel load balancing algorithm which performs load redistributions only when the possible savings outweigh the redistribution costs. Experiments with a data-parallel polygon renderer show a performance improvement of up to a factor of 33 on unbalanced datasets and a maximum performance loss of only 27 percent on balanced datasets when using this algorithm. TR # 95-04-03 * Lauren J. Bricker, Steven L. Tanimoto, Alex I. Rothenberg, Danny C. Hutama, and Tina H. Wong "Multiplayer Activities that Develop Mathematical Coordination" Four computer applications are presented that encourage students to develop "mathematical coordination" - the ability to manipulate numerical variables in cooperation with other students so as to achieve a definite goal. The programs enable a form of computer-supported cooperative learning (CSCL). In this paper we describe the rationale and design of the programs, the results of an informal evaluation, and possible future work. The games were developed using a special software and hardware environment that facilitates the rapid prototyping of computer-based cooperative-learning materials. This research is part of an ongoing project entitled "Mathematics Experiences Through Image Processing" whose objective is to develop and test educational materials that introduce K-12 students to mathematical ideas within the context of digital image processing activities. TR # 95-04-04 * Scott Hauck, Gaetano Borriello, Carl Ebeling "Achieving High-Latency, Low-Bandwidth Communication: Logic Emulation Interfaces" There is a large amount of interest in using multi-FPGA systems for logic emulation and rapid-prototyping of digital systems. One difficulty with this approach is the handling of the external interfaces of the system. In this paper we describe a generic interface transducer, a board capable of handling the external interfaces of the system under prototype, allowing the emulation to operate in the target environment. We also describe how several protocols can be implemented on this board, including NTSC video, digital audio, PCMCIA, and VMEbus. TR # 95-07-02 * Michael VanHilst and David Notkin "Using C++ Templates to Implement Role-Based Designs" Within the object-oriented technology community, much recent work on design reuse has focused on role-based collaborations distributed across multiple objects. Many benefits can be derived by mapping role-based designs directly into implementations, including greater ease in maintaining the connection between designs and implementations under change, and the opportunity for code reuse along with design reuse. Current efforts in role-based designs don't provide these benefits. We provide a method for mapping role-based designs into implementation, preserving the design without unnecessary constraints on the design structures. Roles are represented as parameterized classes, where the parameters represent the types of the participants in the collaboration. Composition of roles is implicit in the binding of parameters to classes in the implementation. The bindings are created at compile time by type definitions that are separate from the role implementations. In this paper we focus on the use of templates and typedef statements in the C++ language as the supporting mechanisms.