Winter 1992 Technical Report Abstracts
From 91-12-01 To 92-05-02
Computer Science and Engineering, University of Washington

TR # 91-12-01
``Predicting Program Execution Times by Analyzing Static and Dynamic Program Paths''
ChangYun Park

This paper describes a method to predict guaranteed and tight deterministic execution time bounds of a program. The basic prediction technique is a static analysis based on a simple timing schema for source-level language constructs, which gives accurate predictions in many cases. Using powerful user-provided information, dynamic path analysis refines looser predictions by eliminating infeasible paths and decomposing the possible execution behaviors in a path-wise manner. The overall prediction cost scales with the desired precision, which is controlled through the amount of information provided. We introduce a formal path model for dynamic path analysis, where user execution information is represented by a set of program paths. With a well-defined practical high-level interface language, user information can be used in an easy and efficient way. We also introduce a method to verify given user information with known program verification techniques. Initial experiments with a timing tool show that safe and tight predictions are possible for a wide range of programs. The tool can also provide predictions for interesting subsets of program executions.

TR # 91-12-02
``On the Convergence of a Pyramidal Euclidean Distance Transform''
S. Tanimoto and P.-E. Danielsson

We have explored the possibility of accelerating the discrete Euclidean distance transform by employing a parallel pyramid machine. An algorithm was implemented which we call the Pyramidal Euclidean Distance Transform (PEDT). It is a pyramidal version of a straightforward algorithm for parallel mesh computers, and it computes the discrete Euclidean distance map for a given binary image. The results of experiments and analysis show that in many practical cases, the pyramidal version converges significantly more quickly than a parallel mesh-based version. In addition, the pyramidal version computes successive approximations to the correct distance map in such a way that early approximations are quite close to the final result. This property allows the user to choose between speed and accuracy in a manner not available with other methods. Because of its progressive character, the pyramidal version may be especially appropriate for use in systems with real-time requirements.

TR # 91-12-03
``Surface Reconstruction from Unorganized Points''
Hugues Hoppe, Tony DeRose, Tom Duchamp, John McDonald, Werner Stuetzle

We describe and demonstrate an algorithm that takes as input an unorganized set of points {x_1, ..., x_n} subset IR^3 on or near an unknown manifold U, and produces as output a triangulated polyhedron that approximates U. Neither the topology, the presence of boundaries, nor the geometry of U is assumed to be known in advance --- all are inferred automatically from the data. This problem arises in a variety of practical situations such as range scanning an object from multiple view points, recovery of biological shapes from two-dimensional slices, and interactive surface sketching.

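The following sketch (in Python, with made-up sample data and a made-up neighborhood size k) illustrates the kind of local estimate a point-based reconstruction method can build on: fitting a tangent plane to the k nearest samples and using it to estimate a signed distance. It is intended only as an illustration of the general idea, not as the algorithm of TR 91-12-03.

    # Illustration only (not the algorithm of TR 91-12-03): fit a plane to the
    # k nearest samples around a query point and report the signed distance to
    # that plane.  Sample data and k are made up.
    import numpy as np

    def tangent_plane(neighborhood):
        """Fit a plane to a small set of 3-D points: returns (centroid, unit normal)."""
        centroid = neighborhood.mean(axis=0)
        # The normal is the direction of least variance in the neighborhood.
        _, _, vt = np.linalg.svd(neighborhood - centroid)
        return centroid, vt[-1]

    def signed_distance(q, samples, k=8):
        """Signed distance from q to the plane fitted at its k nearest samples
        (the sign is only locally consistent in this toy version)."""
        nearest = samples[np.argsort(np.linalg.norm(samples - q, axis=1))[:k]]
        centroid, normal = tangent_plane(nearest)
        return float(np.dot(q - centroid, normal))

    # Toy data: noisy samples of the plane z = 0; a query point 0.3 above it.
    rng = np.random.default_rng(0)
    samples = np.c_[rng.uniform(-1, 1, (200, 2)), 0.01 * rng.normal(size=200)]
    print(signed_distance(np.array([0.0, 0.0, 0.3]), samples))   # roughly +/- 0.3
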
TR # 92-01-01
``Adaptive Guided Self-Scheduling''
Derek L. Eager and John Zahorjan

Loops are an important source of parallelism in application programs. The iterations of such loops may be scheduled statically (at compile time) or dynamically (at run-time) onto the processors of a parallel machine. Dynamic scheduling methods, although offering increased robustness and flexibility relative to static approaches, must be carefully designed if they are to simultaneously achieve low overhead and good load balance. In this paper we propose Adaptive Guided Self-Scheduling, an enhancement of the Guided Self-Scheduling (GSS) approach to the dynamic scheduling of loops with independent iterations. Adaptive Guided Self-Scheduling addresses the two principal weaknesses of GSS: excessive overhead caused by fine-grained scheduling of the final few loop iterations when iteration execution times are small, and load imbalance resulting from consecutive iteration execution times that vary widely but in a correlated way. We address the first weakness by adaptively adjusting the scheduling granularity if, at run time, loop iterations are determined to be so short that overhead would otherwise be excessive. We address the second weakness by providing the option of wrapped assignment of loop iterations, assigning processors groups of iterations sampled uniformly throughout the iteration space rather than blocks of consecutive iterations. Performance measurements are presented for an implementation of Adaptive Guided Self-Scheduling on a Sequent Symmetry multiprocessor.

TR # 92-01-02
``A Performance Study of Memory Consistency Models''
Richard N. Zucker and Jean-Loup Baer

Recent advances in technology are such that the speed of processors is increasing faster than memory latency is decreasing. Therefore the relative cost of a cache miss is becoming more important. However, the full cost of a cache miss need not be paid every time in a multiprocessor. The frequency with which the processor must stall on a cache miss can be reduced by using a relaxed model of memory consistency. In this paper, we present the results of instruction-level simulation studies on the relative performance benefits of using different models of memory consistency. Our vehicle of study is a shared-memory multiprocessor with processors and associated write-back caches connected to global memory modules via an Omega network. The benefits of the relaxed models, and their increasing hardware complexity, are assessed with varying cache size, line size, and number of processors. We find that substantial benefits can be accrued by using relaxed models, but the magnitude of the benefits depends on the architecture being modeled, the benchmarks, and how the code is scheduled. We did not find any major difference in levels of improvement among the various relaxed models.

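As a rough illustration of the two ideas described in TR 92-01-01 above, the sketch below computes guided self-scheduling chunk sizes (each chunk is a fraction of the remaining iterations, with an optional minimum chunk size) and a simple cyclic form of wrapped iteration assignment. The loop bounds, processor count, and minimum-chunk threshold are made up; this is not the authors' Sequent Symmetry implementation.

    # Sketch of the ideas in TR 92-01-01 above (not the authors' implementation).
    from math import ceil

    def guided_chunks(n_iters, n_procs, min_chunk=1):
        """Yield (start, size) chunks the way a guided self-scheduling queue
        hands them out: each chunk is ceil(remaining / n_procs), never smaller
        than min_chunk (an adaptive variant can raise min_chunk for short loops)."""
        start, remaining = 0, n_iters
        while remaining > 0:
            size = min(remaining, max(min_chunk, ceil(remaining / n_procs)))
            yield start, size
            start, remaining = start + size, remaining - size

    def wrapped_assignment(n_iters, n_procs):
        """One simple form of wrapped assignment: deal iterations out cyclically,
        so each processor's work is spread across the whole iteration space."""
        return {p: list(range(p, n_iters, n_procs)) for p in range(n_procs)}

    print(list(guided_chunks(100, 4, min_chunk=4)))  # shrinking chunks: 25, 19, 14, ...
    print(wrapped_assignment(10, 4))                 # {0: [0, 4, 8], 1: [1, 5, 9], ...}
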
TR # 92-01-03
``Managing Applications Using the Simple Network Management Protocol - A SYBASE Monitoring Prototype''
Julia Lee Jaundalderis

This experiment uses the Internet-standard Network Management Framework and the SNMP to integrate the management of a SYBASE SQL server into a network management system. In doing so, we show the feasibility of integrating application and network management and explore design and implementation issues. SYBASE is a distributed relational database management system. Project objectives include: defining specific SYBASE managed objects in a Management Information Base extension and exploring issues involved in building such extensions; building a prototype SNMP agent to monitor SYBASE; customizing a network management station to communicate with the SYBASE agent and perform SYBASE-specific processing; exploring design and implementation issues in using SNMP to manage applications; comparing the management capabilities of our SNMP-based management system with existing SYBASE management procedures; and considering the management of other applications based upon our findings.

TR # 92-02-01
``Performance Issues in Non-blocking Synchronization on Shared-Memory Multiprocessors''
Juan Alemany and Edward W. Felten

This paper considers the performance of Herlihy's standard protocol [Her90, Her91a] for implementing non-blocking concurrent objects. We identify two problems hindering performance of this protocol: resources wasted by attempted non-blocking operations that fail, and the cost of unnecessary data copying. We demonstrate these effects experimentally, and propose techniques to address them. To reduce the price of failing non-blocking operations, we introduce a protocol that maintains information about the utilization of the shared object; experiments show that this protocol performs significantly better than the alternatives. To reduce the cost of unnecessary data copying, we introduce an optimistic protocol that avoids copying, except in the case where a thread of control blocks or is preempted during its attempted update.

TR # 92-02-02
``A Centralized Token-Based Algorithm for Distributed Mutual Exclusion''
Ed Lazowska

In this paper, we present a new centralized algorithm for distributed mutual exclusion that combines the advantages of existing centralized and token-based algorithms: it takes only two messages to acquire the lock when it is not held by another process, and only one message to pass the lock from one process to another. At the same time, our algorithm generates significantly less message traffic than non-centralized token-based algorithms. We also show that, within our approach, the performance of the algorithm can be further improved by providing policies that allow the application to make use of information on the pattern of lock requests, if available. Finally, we discuss the issue of load sharing and show that the central node is unlikely to be the bottleneck in the system. All this allows us to conclude that the long-neglected centralized approach is a better solution for the distributed mutual exclusion problem than the non-centralized token-based approach.

TR # 92-03-00
``Research Review 1992''
Computer Science and Engineering Faculty

Out of print (a current one is available each year).

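A toy, single-process simulation of a centralized token scheme in the spirit of TR 92-02-02 above is sketched below. It is one plausible arrangement consistent with the message counts the abstract cites (two messages to acquire a free lock, one token-transfer message to pass it on); the bookkeeping by which the releasing process learns the next requester is elided, and the class and message names are invented rather than taken from the report's protocol.

    # Toy simulation only; not the report's protocol specification.
    from collections import deque

    class Coordinator:
        def __init__(self):
            self.holder = None        # process currently holding the token
            self.waiting = deque()    # requests that arrived while the token was held

        def request(self, proc):
            # Message 1: REQUEST from proc to the coordinator.
            if self.holder is None:
                # Message 2: the coordinator sends the token back; 2 messages total.
                self.holder = proc
                print(f"{proc} acquired the free lock (2 messages)")
            else:
                self.waiting.append(proc)

        def release(self, proc):
            assert proc == self.holder
            if self.waiting:
                # One message: the token goes from the old holder to the next waiter.
                self.holder = self.waiting.popleft()
                print(f"{proc} passed the lock to {self.holder} (1 message)")
            else:
                self.holder = None

    c = Coordinator()
    c.request("P1"); c.request("P2"); c.release("P1"); c.release("P2")
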
TR # 92-03-01
``Distributed Shared Memory with Versioned Objects''
Michael J. Feeley and Henry M. Levy

Distributed Object Memory (DOM) is an abstraction that represents a distributed-memory system as a single shared container of language-level objects. The goal of DOM is to simplify programming of parallel applications for such systems. All accesses to shared memory are made relative to objects that reside in one or more node-local memories. For example, in Amber, a DOM system for a network of workstations, remote references are transparent at the language level and are implemented using either remote procedure call or object replication and migration. While DOM can greatly simplify distribution for many application classes, it is not well suited for all domains (parallel-scientific codes, in particular). To address the shortcomings of DOM for such domains, we introduce Versioned DOM (VDOM). In VDOM a version number is associated with each object. Multiple versions of objects can coexist and may be cached in local memories as needed to increase concurrency. Object coherence is driven by synchronization methods implicitly associated with each object. Explicit versioning is used as the basis for a memory consistency model that facilitates efficient fine-grain sharing of objects. The units of VDOM coherence are fragment objects, from which language-level objects are constructed; this fragmentation solves the false-sharing problem for language-level objects. The performance of VDOM primitives compared to standard DOM coherence is presented, along with speedup results for two parallel applications implemented using VDOM.

TR # 92-03-02
``How to Use a 64-Bit Virtual Address Space''
Jeffrey S. Chase, Henry M. Levy, Miche Baker-Harvey, and Edward D. Lazowska

Most operating systems execute programs in private address spaces communicating through messages or files. The traditional private address space model was developed for 16- and 32-bit architectures, on which virtual addresses are a scarce resource. The recent appearance of architectures with flat 64-bit virtual addressing opens an opportunity to reconsider our use of virtual address spaces. In this paper we argue for an alternative addressing model, in which all programs and data reside in a single global virtual address space shared by multiple protection domains. Hardware-based memory protection exists within the single address space, providing firewalls as strong as in conventional systems. We explore the tradeoffs in the use of a global virtual address space relative to the private address space model. We contend that a shared address space can eliminate obstacles to efficient sharing and exchange of data structures that are inherent in private address space systems. The shared address space offers advantages similar to those of some capability-based computer systems, but can be implemented on standard page-based hardware, without the performance costs and restricted data model typical of capability-based systems. We introduce Opal, an operating system to support this model on paged 64-bit architectures. The key feature of Opal is a flat virtual address space that includes data on long-term storage and across the network.

TR # 92-03-03
``Improving Fault Tolerance and Supporting Partial Writes in Structured Coterie Protocols for Replicated Objects''
Ed Lazowska and Michael Rabinovich

This paper presents a new technique for efficiently controlling replicas in distributed systems. Conventional structured coterie protocols are efficient but incur a penalty of reduced availability in exchange for the performance gain. Further, the performance advantage can only be fully realized when write operations always replace the old data item with the new value instead of updating a portion of the data item. Our new approach significantly improves availability while allowing partial write operations. After presenting our general approach, we apply it to an existing structured coterie protocol and analyze the availability of the resulting protocol. We also show that other classes of protocols can make use of our approach.

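As background for TR 92-03-03 above, the sketch below checks the defining properties of a coterie (every two quorums intersect, and no quorum contains another) on a toy majority coterie over five replicas. This is generic quorum machinery for illustration only, not the report's new protocol.

    # Generic background, not the report's protocol: a coterie is a set of
    # quorums in which every two quorums intersect and none contains another.
    from itertools import combinations

    def is_coterie(quorums):
        quorums = [frozenset(q) for q in quorums]
        pairwise_intersect = all(a & b for a, b in combinations(quorums, 2))
        minimal = not any(a < b for a in quorums for b in quorums)
        return pairwise_intersect and minimal

    # Majority quorums over 5 replicas: any 3 of {1, ..., 5}.
    majority = list(combinations(range(1, 6), 3))
    print(is_coterie(majority))           # True: two 3-element subsets of 5 must meet
    print(is_coterie([{1, 2}, {3, 4}]))   # False: disjoint quorums could both succeed
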
TR # 92-03-04
``Color Photometric Stereo''
Per H. Christensen

This report describes a method for recovering the three-dimensional shape of an object from two or more two-dimensional color images of the object. The illumination geometry is different in each image, hence the method is called color photometric stereo. At each visible point on the surface, each image imposes a constraint on the possible surface orientations, and if the constraints are sufficiently strong, the orientation can be determined. Given the orientation at many points of the surface, the three-dimensional shape of the object can be determined. Previously, little work has been done on the use of color for photometric stereo, and only for dielectric materials has this combination been used. The main contribution of this report is to extend the use to other types of materials, such as composite materials and metals. The approach used to derive the surface orientation from a color is numerical; therefore the reflection model does not have to be analytically invertible. The reflection models that can be used with color photometric stereo include, for example, the Lambertian, Phong, and Torrance-Sparrow models. The advantage of using color images rather than gray-scale images is twofold: at each pixel there are more constraints on the possible orientation, and hence fewer images of the same object are necessary to recover the shape; also, the noise sensitivity will be lower (assuming that the random noise in the three color components is independent). The method has been tested on both artificial and real images of composite surfaces.

TR # 92-03-05
``A Sublinear Space, Polynomial Time Algorithm for Directed s-t Connectivity''
Greg Barnes, Jonathan F. Buss, Walter L. Ruzzo, and Baruch Schieber

Directed s-t connectivity is the problem of detecting whether there is a path from vertex s to vertex t in a directed graph. We present the first known deterministic sublinear space, polynomial time algorithm for directed s-t connectivity. For n-vertex graphs, our algorithm can use as little as n/2^(Theta(sqrt(log n))) space while still running in polynomial time.

TR # 92-03-06
``An Expanded System for Coordinate-Free Geometric Programming''
William J. R. Longabaugh

Coordinate-free affine geometric programming has demonstrated its usefulness in graphics and computer-aided geometric design research. This thesis describes an expanded system for coordinate-free geometric programming that includes affine, vector, and projective spaces, and that provides the ability to easily move between the various geometric viewpoints. By providing this broader view of geometry, the system makes it possible to apply coordinate-free geometric reasoning to a wider variety of problems.

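The classical gray-scale Lambertian case mentioned in TR 92-03-04 above admits a simple illustration: with at least three images under known light directions, each pixel's intensities give a small least-squares problem for the scaled normal. The light directions and synthetic intensities below are made up; the report's numerical, color-based method is more general.

    # Classical gray-scale Lambertian photometric stereo (the special case that
    # TR 92-03-04 above generalizes).  Light directions and intensities are made up.
    import numpy as np

    L = np.array([[0.0, 0.0, 1.0],       # one (roughly unit) light direction per image
                  [0.7, 0.0, 0.714],
                  [0.0, 0.7, 0.714]])

    def recover_normal(intensities, lights=L):
        """At one pixel, solve lights @ g = intensities for g = albedo * normal."""
        g, *_ = np.linalg.lstsq(lights, np.asarray(intensities, dtype=float), rcond=None)
        albedo = np.linalg.norm(g)
        return g / albedo, albedo

    # Synthetic check: a normal of (0, 0, 1) with albedo 0.8.
    true_normal, rho = np.array([0.0, 0.0, 1.0]), 0.8
    intensities = rho * L @ true_normal
    normal, albedo = recover_normal(intensities)
    print(np.round(normal, 3), round(albedo, 3))   # ~[0. 0. 1.] 0.8
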
TR # 92-03-08
``Behavioral Abstraction''
Kevin J. Sullivan and David Notkin

In this paper, we present `behavior abstraction' as a way of thinking about and organizing software systems. Whereas a data abstraction is a model of an entity solely in terms of the operations that can be applied to it, a behavior abstraction is a model of an entity in terms of both the operations it supports and the activities that it exhibits. We adopt the term behavior abstraction for two reasons. First, we support abstraction through the combination of explicit interfaces and encapsulation. Second, we encourage the modeling of behavior; operations cause behaviors, while activities reflect them. Benefits of behavior abstraction include improved correspondence between specifications and designs; improved abstraction and encapsulation; and behavioral composability of independent types.

TR # 92-03-09
``Lightweight Shared Objects in a 64-Bit Operating System''
Jeffrey S. Chase, Henry M. Levy, Edward D. Lazowska and Miche Baker-Harvey

Object-oriented models are a popular basis for supporting uniform sharing of data and services in operating systems, distributed programming systems, and database systems. We term systems that use objects for these purposes `object sharing systems'. Operating systems in common use hamper the development of object sharing systems because their addressing models are nonuniform, making uniform object naming expensive and difficult to implement. We argue that emerging 64-bit architectures make it practical to support uniform naming at the virtual addressing level, eliminating the key implementation problem for object sharing systems. We describe facilities for object-based sharing of persistent data and services in Opal, an operating system we are developing for paged 64-bit architectures. The distinctive feature of Opal is that object sharing is supported in a runtime library, above a single virtual address space that maps all primary and secondary storage in a network.

TR # 92-03-11
``The Case for Application-Specific Communication Protocols''
Edward W. Felten

Message-passing programs are heavily dependent on the performance of communication primitives. While communication hardware has gotten much faster in the last few years, the communication performance achieved by application programs has not improved so dramatically. We argue that this `communication gap' is inherent in the design of conventional message-passing systems, which are based on general-purpose communication protocols implemented in the operating system. Closing the gap requires fundamental changes in system design. We review the arguments for user-level implementation of communication protocols. We discuss how the architecture and operating system can be structured to provide user-level communication without compromising system protection. We further argue that general-purpose communication protocols, whether implemented at user level or in the kernel, are inherently slow. Optimum performance requires a protocol designed to take advantage of the behavior of a particular application program. We introduce the notion of a `protocol compiler', a program that generates an application-specific protocol based on information about an application's communication behavior. As a concrete example, we sketch the design of a protocol compiler that can generate nearly optimal communication code for a class of data-parallel applications.

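The operations-versus-activities distinction drawn in TR 92-03-08 above can be illustrated informally: a data abstraction would expose only the operations below, while a behavior abstraction also exposes the activity (`changed') that those operations cause. The class and method names are invented for illustration and are not the paper's notation.

    # Informal illustration of the distinction in TR 92-03-08 above; names invented.
    class Counter:
        def __init__(self):
            self._value = 0
            self._observers = []        # parties interested in the "changed" activity

        # Operations: clients invoke these to act on the entity.
        def increment(self):
            self._value += 1
            self._announce()

        def value(self):
            return self._value

        # Activity: the entity announces what it exhibits; observers merely react.
        def on_changed(self, callback):
            self._observers.append(callback)

        def _announce(self):
            for callback in self._observers:
                callback(self._value)

    c = Counter()
    c.on_changed(lambda v: print("counter changed to", v))
    c.increment()                       # prints: counter changed to 1
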
TR # 92-04-01
``Acquiring Search-Control Knowledge via Static Analysis''
Oren Etzioni

Explanation-Based Learning (EBL) is a widely-used technique for acquiring search-control knowledge. Recently, Prieditis, van Harmelen, and Bundy pointed to the similarity between Partial Evaluation (PE) and EBL. However, EBL utilizes training examples whereas PE does not. It is natural to inquire, therefore, whether PE can be used to acquire search-control knowledge, and if so, at what cost. This paper answers these questions by means of a case study comparing Prodigy/EBL, a state-of-the-art EBL system, and Static, a PE-based analyzer of problem-space definitions. When tested in Prodigy/EBL's benchmark problem spaces, Static generated search-control knowledge that was up to three times as effective as the knowledge learned by Prodigy/EBL, and did so from twenty-six to seventy-seven times faster. The paper describes Static's algorithms and compares its performance to Prodigy/EBL's, noting when Static's superior performance will scale up and when it will not. The paper concludes with several lessons for the design of EBL systems, suggesting hybrid PE/EBL systems as a promising direction for future research.

TR # 92-04-02
``Computer Science in Japanese Universities''
David Notkin and Richard D. Schlichting

This paper presents some impressions of computer science in Japanese universities based on the authors' sabbatical visits. The focus is primarily on such structural aspects of the discipline as departmental organization, faculty and student populations, funding, research activity, and computing facilities. Perhaps the key observation is that Japanese cultural practices influence the way in which computer science is approached in Japanese universities to a larger degree than might be imagined.

TR # 92-04-03
``Learning Mathematics via Image Processing: A Rationale and a Research Direction''
Steven L. Tanimoto

Digital image processing offers several possible new approaches to the teaching of a variety of mathematical concepts at the middle-school and high-school levels. There is reason to believe that this approach will be successful in reaching some ``at-risk'' students that other approaches miss. Since digital images can be made to reflect almost any aspect of the real world, some students may have an easier time taking an interest in them than they might with artificial figures or images resulting from other graphics-oriented approaches. Using computer-based tools such as image processing operators, curve-fitting operators, shape analysis operators, and graphical synthesis, students may explore a world of mathematical concepts starting from the psychologically ``safe'' territory of their own physical and cultural environments. There is reason to hope that this approach will be particularly successful with students from diverse backgrounds, girls and members of minority groups, because the imagery used in experiments can easily be tailored to individual tastes. This approach can be explored in a variety of ways, including the development of modules either to replace components of existing curricula or to augment them. Some possible modules covering both traditional and non-traditional topics for school mathematics are described. Some of the issues related to evaluation of such modules are also discussed.

TR # 92-04-04
``A Performance Analysis of Network I/O in Shared-Memory Multiprocessors''
Chandramohan A. Thekkath, Derek L. Eager, Edward D. Lazowska, and Henry M. Levy

Network I/O performance has not kept pace with processor performance in modern multiprocessor workstations and servers. This paper studies this problem in the context of two such systems: a 5-processor DEC SRC Firefly and a 20-processor Sequent Symmetry. We devise an analytic model of network I/O, parameterize it using measurements of the two systems, and use it to study the performance of these systems as they currently exist and to explore certain feasible modifications that can enhance performance. Our experiments indicate that network controllers designed for uniprocessors may not be suitable for multiprocessor configurations.
They also suggest that a high performance I/O bus can significantly improve performance for larger packets. Finally, they lead us to conclude that, unlike in uniprocessors, the performance improvement due to simpler transport protocols is comparatively small in multiprocessors. These results arise more from the increased parallelism that is available with multiprocessors than just from the increased processing power.

TR # 92-04-05
``An Algorithm of Choices for Solving QR Factorization''
Calvin Lin and Lawrence Snyder

Parallel architectures are very different, yet users who wish to widely distribute their parallel programs want algorithms that run well on all parallel machines. We call such algorithms `Algorithms of Choice'. This concept is introduced and illustrated using four variants of the Modified Gram-Schmidt method (for solving QR factorization) on five different parallel machines: the Sequent Symmetry, NCUBE/7, two instances of the iPSC/2, and the BBN TC2000 Butterfly. These machines exhibit extreme architectural diversity, including shared and distributed memory, and three types of communication structure. QR factorization presents difficult challenges for parallel algorithms due to complex memory access, load balancing requirements, and granularity effects. Unlike previous studies, we consider column as well as row decompositions. A key result is the successful use of the CTA model of parallel computation. A second result is the identification of the best architecture-independent algorithm for the Modified Gram-Schmidt method.

TR # 92-04-06
``Index Assignment for Progressive Transmission of Full Search Vector Quantization''
Eve A. Riskin, Les E. Atlas, Richard Ladner and Ren-Yuh Wang

The question of progressive image transmission for full search vector quantization is addressed via codeword index assignment. Namely, we develop three new methods of assigning indices to a vector quantization codebook and formulate these assignments as labels of nodes of a full search progressive transmission tree. This tree defines, from the bottom up, the binary merging of codewords for successively smaller-sized codebooks. The binary representation of the path through the tree represents the progressive transmission code. The methods of designing the tree which we apply are Kohonen's self-organizing neural net, a modification of the common splitting technique for the generalized Lloyd algorithm, and, borrowing from optimization theory, minimum cost perfect matching. Empirical experiments were run on a medical image data base to compare the signal-to-noise ratio (SNR) of the techniques on the intermediate as well as the final images. While the neural net technique worked reasonably well, the other two methods performed better and were close in SNR to each other. We also compared our results to tree-structured vector quantizers and confirmed that full search VQ has a slightly higher SNR.

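For reference alongside TR 92-04-05 above, the sketch below is the ordinary sequential Modified Gram-Schmidt factorization on which the report's four parallel variants are based; none of the report's row or column decompositions, nor the CTA model, is represented here.

    # Textbook sequential Modified Gram-Schmidt (reference only; not the
    # parallel decompositions studied in TR 92-04-05 above).
    import numpy as np

    def mgs_qr(A):
        """Return Q (orthonormal columns) and R (upper triangular) with A = Q @ R."""
        Q = np.array(A, dtype=float)
        n = Q.shape[1]
        R = np.zeros((n, n))
        for k in range(n):
            R[k, k] = np.linalg.norm(Q[:, k])
            Q[:, k] /= R[k, k]
            for j in range(k + 1, n):   # immediately orthogonalize the remaining columns
                R[k, j] = Q[:, k] @ Q[:, j]
                Q[:, j] -= R[k, j] * Q[:, k]
        return Q, R

    A = np.random.default_rng(1).normal(size=(6, 4))
    Q, R = mgs_qr(A)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(4)))   # True True
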
TR # 92-04-07
``Interactive Proof Systems with Polynomially Bounded Strategies''
Anne Condon and Richard Ladner

Interactive proof systems in which the Prover is restricted to have a polynomial size strategy are investigated. Restricting the computation tree visible to the Prover to polynomial size, or restricting the Verifier to a logarithmically bounded number of coin flips, guarantees a polynomial size strategy. The additional restriction of logarithmic space is also investigated. A main result of the paper is that interactive proof systems in which the Prover is restricted to a polynomial size strategy are equivalent to MA, the Merlin-Arthur games defined by Babai and Moran [1]. Polynomial tree size is also equivalent to MA, but when logarithmic space is added as a restriction, the power of polynomial tree size reduces to NP. A logarithmically bounded number of coin flips is equivalent to NP, and when logarithmic space is added as a restriction, the power is not diminished. The proof that NP subseteq IP(log-space, log-random-bits) illustrates an interesting application of the new ``fingerprinting'' method of Lipton [14]. Public interactive proof systems which have polynomial size strategies are also investigated.

TR # 92-05-02
``High Performance Parallel/Distributed Computing [1991 CISE Institutional Infrastructure Proposal]''
Computer Science and Engineering Faculty

This technical report contains excerpts from our 1991 CISE Institutional Infrastructure proposal and addendum. Because a large cross-section of the department participated in this effort, the material provides a reasonable picture of our history, status, and direction. Included are summaries of each of our major areas of research, along with bibliographies of recent faculty publications in each topic. The proposal also outlines our infrastructure and facilities, as well as the results from prior awards. Faculty credentials, recent doctoral candidates and their theses, and overall department statistics are summarized in tables.
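
The complexity-class relationships stated in TR 92-04-07 above can be summarized compactly. In the informal notation below, IP(...) denotes interactive proof systems restricted as indicated; this is a paraphrase of the abstract, not a statement taken from the report itself.

    % Informal paraphrase of the results quoted in the TR 92-04-07 abstract.
    \begin{align*}
      \mathrm{IP}(\text{poly-size Prover strategy}) &= \mathrm{MA} \\
      \mathrm{IP}(\text{poly tree size})            &= \mathrm{MA} \\
      \mathrm{IP}(\text{log space, poly tree size}) &= \mathrm{NP} \\
      \mathrm{IP}(\text{log random bits})           &= \mathrm{NP} \\
      \mathrm{NP} &\subseteq \mathrm{IP}(\text{log space, log random bits})
    \end{align*}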