DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF WASHINGTON TECHNICAL REPORT ABSTRACTS - FROM 93-06-01 -> 94-01-09 Obtaining Electronic Technical Reports: Technical reports in this list with an asterisk (*) following the TR number are available on-line as of April 1995 via FTP (ftp.cs.washington.edu) or the WWW (http://www.cs.washington.edu/research/tr/techreports.html) Obtaining Hardcopy Technical Reports: To obtain a technical report on paper, sent to you via Snail Mail, please contact . If you need additional information or assistance, please contact: ABSTRACTS TR #93-06-01 * "Performance of Chaos and Oblivious Routers Under Non-Uniform Traffic" Melanie L. Fulgham and Lawrence Snyder The Chaos router is a nonminimal adaptive packet router that is both deadlock free and probabilistically livelock free. Unlike its predecessors, the Chaos router uses randomization to provide an efficient way to avoid livelock. The Chaos router has been compared in software simulation studies to state-of-the-art oblivious routers (i.e., dimension-order routers) on the hypercube, mesh, and torus topologies. The results, based on simulations of two different workloads, one uniform and one non-uniform, show that in general the Chaos router is superior to the oblivious router. However, two workloads alone are not representative of the typical types of traffic encountered in practice. In this paper five additional non-uniform workloads are simulated. Each was chosen to represent one of the various types of communication encountered in typical programs. The workloads, compared on both the 256-node torus and hypercube topologies, show considerable variety and reveal insight into the effectiveness of Chaotic adaptive routing on non-uniform loads. The data presented include the saturation point of the networks under the various loads, the throughput as a function of presented load, latency, and total delay of packets through the network. Collectively, these data give a much clearer and more complete characterization of the Chaos and oblivious routers' behavior on non-uniform traffic. TR #93-06-02 * "Managing Abstraction-Induced Complexity" David Keppel Abstraction reduces the apparent complexity of an implementation by hiding all but "the most relevant" details. However, no interface is suitable for all the users of the implementation, for exactly the same reason: each user has a slightly different view of what is "most relevant." Thus, although abstractions can reduce complexity, working around their limitations can introduce other complexities. Some abstractions are designed to minimize the added complexity. This paper examines five common abstraction models and uses common examples to contrast the tradeoffs presented by each model. TR #93-06-03 * "An Algorithm for Probabilistic Planning" Nicholas Kushmerick, Steve Hanks, and Daniel Weld We define the probabilistic planning problem in terms of a probability distribution over initial world states, a boolean combination of propositions representing the goal, a probability threshold, and actions whose effects depend on the execution-time state of the world and on random chance. Adopting a probabilistic model complicates the definition of plan success: instead of demanding a plan that "provably" achieves the goal, we seek plans whose probability of success exceeds the threshold. In this paper, we present BURIDAN, an implemented least-commitment planner that solves problems of this form. We prove that the algorithm is both sound and complete.
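As a hedged illustration of the success criterion just defined (a generic Monte Carlo sketch, not BURIDAN's actual evaluation algorithm; the function names are hypothetical), a plan's success probability can be estimated by sampling initial states and chance outcomes and comparing the estimate with the threshold:

    import random

    def estimate_success(sample_initial_state, execute_plan, goal_holds,
                         threshold, trials=10000, seed=0):
        """Monte Carlo estimate of P(plan achieves goal) versus a threshold.

        sample_initial_state(rng) draws a world state from the initial distribution,
        execute_plan(state, rng) simulates the plan's chance outcomes on that state,
        goal_holds(state) tests the goal formula in the resulting state.
        """
        rng = random.Random(seed)
        successes = 0
        for _ in range(trials):
            state = sample_initial_state(rng)
            final = execute_plan(state, rng)
            successes += goal_holds(final)
        p = successes / trials
        return p, p >= threshold

    # Toy domain: a noisy "flip-to-heads" action must leave the coin heads-up.
    init = lambda rng: {"heads": rng.random() < 0.5}
    plan = lambda s, rng: {"heads": rng.random() < 0.8}
    goal = lambda s: s["heads"]
    print(estimate_success(init, plan, goal, threshold=0.7))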
We then explore BURIDAN's efficiency by contrasting four algorithms for plan evaluation, using a combination of analytic methods and empirical experiments. We also describe the interplay between generating plans and evaluating them, and discuss the role of search control in probabilistic planning. TR # 93-06-04 * "Utility Models for Goal-Directed Decision-Theoretic Planners" Peter Haddawy and Steve Hanks AI planning agents are goal-directed: success is measured in terms of whether or not an input goal is satisfied, and the agent's computational processes are driven by those goals. A decision-theoretic agent, on the other hand, has no explicit goals - success is measured in terms of its preferences or a utility function that respects those preferences. The two approaches have complementary strengths and weaknesses. Symbolic planning provides a computational theory of plan generation, but under unrealistic assumptions: perfect information about and control over the world and a restrictive model of actions and goals. Decision theory provides a normative model of choice under uncertainty, but offers no guidance as to how the planning options are to be generated. This paper unifies the two approaches to planning by describing utility models that support rational decision making while retaining the goal information needed to support plan generation. We develop an extended model of goals that involves temporal deadlines and maintenance intervals, and allows partial satisfaction of the goal's temporal and atemporal components. We then incorporate that goal information into a utility model for the agent. Goal information can be used to establish whether one plan has higher expected utility than another, but without computing the expected utility values directly: we present a variety of results showing how plans can be compared rationally, but using information only about the probability that they will satisfy goal-related propositions. We then demonstrate how this model can be exploited in the generation and refinement of plans. TR # 93-06-05 * "Benchmarks, Testbeds, Controlled Experimentation, and the Design of Agent Architectures" Steve Hanks, Martha Pollack, Paul Cohen The methodological underpinnings of AI are slowly changing. Benchmarks, testbeds, and controlled experimentation are becoming more common. While we are optimistic that this change can solidify the science of AI, we also recognize a set of difficult issues concerning the appropriate use of this methodology. We discuss these issues as they relate to research on agent design. We survey existing testbeds for agents, and argue for appropriate caution in their use. We end with a debate on the proper role of experimental methodology in the design and validation of planning agents. TR # 93-06-06 * "Shade: A Fast Instruction-Set Simulator for Execution Profiling" Robert F. Cmelik and David Keppel Shade is an instruction-set simulator and custom trace generator. Application programs are executed and traced under the control of a user-supplied trace analyzer. To reduce communication costs, Shade and the analyzer are run in the same address space. To further improve performance, code which simulates and traces the application is dynamically generated and cached for reuse. Current implementations run on SPARC systems and, to varying degrees, simulate the SPARC (Versions 8 and 9) and MIPS I instruction sets. 
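For readers unfamiliar with the translate-and-cache technique just mentioned, the following toy sketch (a made-up two-instruction machine, not Shade's actual code) shows how dynamically generated simulation code can be cached and reused:

    def make_translator(program):
        """Toy analogue of a translate-and-cache simulator loop: each simulated
        instruction is 'translated' once into a Python closure and cached, so
        later executions reuse the cached translation instead of re-decoding."""
        cache = {}                                  # pc -> translated closure

        def translate(pc):
            op, arg = program[pc]                   # decode once, at translate time
            if op == "add":
                def block(regs):
                    regs[0] += arg
                    return pc + 1                   # fall through to next instruction
            elif op == "jmp":
                def block(regs):
                    return arg                      # branch to a fixed target
            elif op == "halt":
                def block(regs):
                    return None
            else:
                raise ValueError(op)
            return block

        def run(regs):
            pc = 0
            while pc is not None:
                block = cache.get(pc)
                if block is None:                   # translation-cache miss
                    block = cache[pc] = translate(pc)
                pc = block(regs)
            return regs

        return run

    # A tiny program: add 5 three times, then halt.
    prog = {0: ("add", 5), 1: ("add", 5), 2: ("add", 5), 3: ("halt", None)}
    print(make_translator(prog)([0]))               # -> [15]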
This paper describes the capabilities, design, implementation, and performance of Shade, and discusses instruction set emulation in general. Shade improves on its predecessors by providing their various tracing capabilities together in a single tool. Shade is also fast: Running on a SPARC and simulating a SPARC, SPEC 89 benchmarks run about 2.3 times slower for floating-point programs and 6.2 times slower for integer programs. Saving trace data costs more, but Shade provides fine control over tracing, so users pay a collection overhead only for data they actually need. Shade is also extensible so that analyzers can examine arbitrary target state and thus collect special information that Shade does not "know" how to collect. TR #93-06-08 * "Data Locality On Shared Memory Computers Under Two Programming Models" Ton A. Ngo and Lawrence Snyder Data locality is a well-recognized requirement for the development of any parallel application, but the emphasis on data locality varies with the programming model. In this paper, we examine two models at the extreme ends of the spectrum with respect to one general class of parallel machines. We present experiments on shared-memory machines that indicate that programs written in the nonshared-memory programming model generally offer better data locality and thus better performance. We study the LU decomposition problem and a Molecular Dynamics simulation on five shared-memory machines with widely differing architectures, and analyze the effect of the programming models on data locality. The comparison is done through a simple analytical model and measurements on the machines. TR #93-06-09 * "A Beginner's Guide to the Truckworld Simulator" Steve Hanks, Dat Nguyen and Chris Thomas The Truckworld is a multi-agent planning testbed that features uncertainty, exogenous change, complex interaction among objects in the world, semi-realistic sensing, and facilities for communication and other interaction among agents. This document is designed to give the first-time user a feel for the testbed's capabilities, and also provides instructions on how to obtain the code and run a standard configuration of the simulator. TR # 93-06-10 * "Modeling a Dynamic and Uncertain World I: Symbolic and Probabilistic Reasoning about Change" Steve Hanks and Drew McDermott Intelligent agency requires some ability to predict the future. An agent must ask itself what is presently its best course of action given what it now knows about what the world will be like when it intends to act. This paper presents a system that uses a probabilistic model to reason about the effects of an agent's proposed actions on a dynamic and uncertain world, computing the probability that relevant propositions will hold at a specified point in time. The model allows for incomplete information about the world, the occurrence of exogenous (unplanned) events, unreliable sensors, and the possibility of an imperfect causal theory. The system provides an application program with answers to questions of the form "is the probability that \phi will hold in the world at time t greater than \tau?" It is unique among algorithms for probabilistic temporal reasoning in that it tries to limit its inference according to the proposition, time, and probability threshold provided by the application. The system will also notify the application if subsequent evidence invalidates its answer to a query.
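A minimal sketch of the kind of threshold query just described, under the assumption that trajectories of the world can be sampled; the stopping rule is a generic Hoeffding bound, used here only to illustrate how the probability threshold itself can limit the amount of inference performed (this is not the paper's actual algorithm):

    import math, random

    def probability_exceeds(sample_outcome, tau, delta=0.05, max_samples=100000, seed=0):
        """Answer "is P(phi holds at time t) > tau?" by sampling trajectories,
        stopping as soon as a Hoeffding confidence bound resolves the comparison,
        so queries with comfortable margins terminate early.

        sample_outcome(rng) -> True/False: one simulated trajectory's verdict on phi.
        Returns (answer, estimate, samples_used); answer is None if undecided.
        """
        rng = random.Random(seed)
        hits = 0
        for n in range(1, max_samples + 1):
            hits += sample_outcome(rng)
            p_hat = hits / n
            eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))   # Hoeffding half-width
            if p_hat - eps > tau:
                return True, p_hat, n
            if p_hat + eps < tau:
                return False, p_hat, n
        return None, hits / max_samples, max_samples

    # Toy query: phi holds on about 60% of trajectories; is that greater than 0.5?
    print(probability_exceeds(lambda rng: rng.random() < 0.6, tau=0.5))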
TR #93-07-01 * "Data Prefetching for High-Performance Processors" Tien-Fu Chen Recent technological advances are such that the gap between processor cycle times and memory cycle times is growing. Techniques to reduce or tolerate large memory latencies become essential for achieving high processor utilization. In this dissertation, we propose and evaluate data prefetching techniques that address the data access penalty problem. First, we propose a hardware-based data prefetching approach for reducing memory latency. The basic idea of the prefetching scheme is to keep track of data access patterns in a reference prediction table (RPT) organized as an instruction cache. It includes three variations of the design of the RPT and associated logic: a generic design, a lookahead mechanism, and a correlated scheme. They differ mostly in the timing of the prefetching. We evaluate the three schemes by simulating them in a uniprocessor environment using the ten SPEC benchmarks. The results show that the prefetching scheme effectively eliminates a major portion of the data access penalty and is particularly suitable for an on-chip design and a primary-secondary cache hierarchy. Next, we study and compare the substantive performance gains that could be achieved with hardware-controlled and software-directed prefetching on shared-memory multiprocessors. Simulation results indicate that both hardware and software schemes can handle programs with regular access patterns. The hardware scheme is good at manipulating dynamic information, whereas software prefetching has the flexibility of prefetching larger blocks of data and of dealing with complex data access patterns. The execution overhead of the additional prefetching instructions may decrease the software prefetching performance gains. An approach that combines software and hardware schemes is shown to be very promising for reducing the memory latency with the least overhead. Finally, we study non-blocking caches that can tolerate read and write miss penalties by exploiting the overlap between post-miss computations and data accesses. We show that hardware data prefetching caches generally outperform non-blocking caches. We derive a static instruction scheduling algorithm to order instructions at compile time. The algorithm is shown to be effective in exploiting the instruction parallelism available in a basic block for non-blocking loads. TR #93-07-02 * "Specification, Simulation, and Verification of Timing Behavior" Tod Amon Temporal behavior needs to be formally specified, validated, and verified if systems that interface with the outside world are to be synthesized from high-level specifications. Due to the high level of abstraction, the work presented in this thesis applies to systems that ultimately can be implemented using hardware, software, or a combination of both. In the area of specification, a generalization of the event-graph specification paradigm is presented. The model supports the expression of complex functionality using an operational semantics and cleanly integrates structure into the event-based paradigm. Temporal relationships between system events are specified using a denotational semantics that relies on both chronological and causal relationships to identify the discrete event occurrences being constrained. It is of critical importance that a design representation support user validation. A simulator, called OEsim, has been implemented to support validation.
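As a hedged illustration of the kind of constraint such a simulator must check (hypothetical event names and a deliberately simple min/max separation constraint, not OEsim's actual semantics):

    def check_separation(trace, cause, effect, lo, hi):
        """Check a min/max separation constraint over an event trace.

        trace is a list of (time, event) pairs in nondecreasing time order; the
        constraint requires each 'effect' occurrence to follow the most recent
        'cause' occurrence by a delay in [lo, hi]. Returns a list of violations.
        """
        violations = []
        last_cause = None
        for time, event in trace:
            if event == cause:
                last_cause = time
            elif event == effect:
                if last_cause is None:
                    violations.append((time, "no preceding %r" % cause))
                elif not (lo <= time - last_cause <= hi):
                    violations.append((time, "delay %g outside [%g, %g]"
                                             % (time - last_cause, lo, hi)))
        return violations

    # Hypothetical trace: 'ack' must follow 'req' within 2..5 time units.
    trace = [(0, "req"), (3, "ack"), (10, "req"), (18, "ack")]
    print(check_separation(trace, "req", "ack", 2, 5))   # the second ack violates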
The simulator can report timing constraint violations, and detect inconsistencies between the operational and denotational specifications. Synthesis algorithms can exploit guarantees regarding temporal behavior to produce efficient implementations, and often need timing information in order to properly synthesize designs from abstract specifications. Two verification tools are presented in this dissertation. The first demonstrates the feasibility and benefits of performing symbolic timing verification, in which variables are used in place of numbers. The second addresses a fundamental problem: determining how synchronization affects the temporal behavior of concurrent systems. TR #93-08-01 * "A (More) Formal Definition of Communicating Real-Time State Machines" Alan C. Shaw The language of communicating real-time state machines is defined precisely in three parts. First, the syntax of a single machine and of a set of connected machines is described. Then, the static semantics is described as the set of execution paths obtained through a static analysis. Finally, the dynamic semantics is defined by specifying a simulation algorithm that produces execution traces or histories. The most difficult and interesting aspect is that dealing with time. TR #93-08-02 "Three-Dimensional Shape from Color Photometric Stereo" Per Christensen and Linda Shapiro Computer vision systems can be used to determine the shape of real three-dimensional objects for purposes of object recognition and pose estimation or for CAD applications. One method that has been developed is "photometric stereo." This method uses several images taken from the same viewpoint, but with different lighting, to determine the three-dimensional shape of an object. Most previous work in photometric stereo has been with gray-tone images; color images have only been used for dielectric materials. In this paper we describe a procedure for color photometric stereo, which recovers the shape of a colored object from two or more color images of the object under white illumination. This method can handle different types of materials, such as composites and metals, and can employ various reflection models such as the Lambertian, dichromatic, and Torrance-Sparrow models. For composite materials, colored metals, and dielectrics there are two advantages to utilizing color information: at each pixel, there are more constraints on the orientation, and the result is less sensitive to noise. Hence the shape can be found more accurately. The method has been tested on both artificial and real images of objects of various materials. TR #93-09-01 * "Building Softbots for UNIX (Preliminary Report)" Oren Etzioni, Neal Lesh and Richard Segal AI is moving away from "toy tasks" such as block stacking towards real-world problems. This trend is positive, but the amount of preliminary groundwork required to tackle a real-world task can be daunting. We advocate software environments, such as operating systems or databases, as domains for agent research. The cost, effort, and expertise required to develop and systematically experiment with software agents is relatively low. Furthermore, software environments circumvent many thorny, but peripheral, research issues that are inescapable in other environments. Thus, software environments enable us to test agents in a real world yet focus on core AI research issues. To support this claim, we describe our project to develop UNIX "softbots" (software robots) - intelligent agents that interact with UNIX.
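A minimal sketch, assuming a hypothetical fact representation, of what a softbot "sensing action" might look like: a UNIX command is executed and its output is turned into facts a planner can consume (an illustration only, not the softbot's actual interface):

    import subprocess

    def sense_directory(path="."):
        """Run a UNIX command and report its output as facts for a planner.
        The ("in-dir", name, path) fact format is made up for this sketch."""
        output = subprocess.run(["ls", "-1", path], capture_output=True,
                                text=True, check=True).stdout
        return [("in-dir", name, path) for name in output.splitlines()]

    for fact in sense_directory("."):
        print(fact)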
Existing softbots accept a diverse set of high-level goals, generate and execute plans to achieve these goals in real time, and recover from errors when necessary. TR #93-09-02 * "The Interaction Between Static Typing and Frameworks" Gail Murphy and David Notkin Frameworks capture the commonalities in design and implementation between a family of related applications and are typically expressed in an object-oriented language. Software engineers use frameworks to reduce the cost of building complex applications. This paper characterizes the operations of instantiation, refinement and extension used to build applications from frameworks and explores how these operations are supported by static typing policies of common object-oriented languages. We found that both conservative contravariant and covariant static typing policies were effective in supporting the operations of framework instantiation and framework extension. However, both policies were ineffective at supporting the operation of framework refinement. Although it does not support the refinement operation itself, covariance is sufficiently expressive to support the instantiation of a properly refined framework. This result illustrates how programming languages can at times complicate rather than simplify the engineering of software. TR #93-09-03 * "Prism: A Case Study in Behavioral Entity-Relationship Modeling and Design" Kevin Sullivan, Ira Kalet, and David Notkin We describe the collaboratively developed "Prism," a tightly integrated but architecturally flexible environment for modeling, visualizing, and simulating radiation treatment plans for cancer patients. Despite significant advances in function, we built Prism at low cost in effort and code size. The key was using "behavioral entity-relationship (ER) modeling" to craft an architecture, and objects representing "abstract behavioral types (ABTs)" to implement it. In more detail, we model a system as a set of independently defined entities used directly by clients but made to work together by behavioral relationships connecting them. We implement this model in an imperative programming framework by representing both entities and relationships as instances of classes that define, announce, and register with events in addition to defining and calling operations. Prism provided a realistic test of the behavioral ER modeling and design method. TR #93-09-04 * "Kaleidoscope: A Constraint Imperative Programming Language" Gus Lopez, Bjorn Freeman-Benson, and Alan Borning The Constraint Imperative Programming (CIP) family of languages integrates constraints and imperative, object-oriented programming. In addition to combining the useful features of both paradigms, there are synergistic effects of this integration, such as the ability to define constraints over user-defined domains. We discuss characteristics of the CIP family and provide a rationale for its creation. The synergy of constraints and objects imposes additional challenges for the provision of constructs, such as object identity and class membership, that are well-understood in conventional language paradigms. We discuss the benefits and challenges of combining the constraint and imperative paradigms, and present our current ideas in the context of the design and implementation of the Kaleidoscope'93 language. TR #93-09-05 * "The Meerkat Multicomputer" Robert B. Bedichek and Curtis Brown Meerkat is a distributed memory multicomputer architecture that scales to hundreds of processors. 
Meerkat uses a two-dimensional passive backplane to connect nodes composed of processors, memory, and I/O devices. The interconnect is conceptually simple and inexpensive to design and build, has low latency, and provides high bandwidth on long messages. However, it does not scale to thousands of processors, does not provide high bandwidth on short messages, and does not provide cache coherent shared memory. Our hypothesis is that many general-purpose, database, and parallel numerical workloads work well on systems with Meerkat's characteristics. We describe the Meerkat architecture, the niche that Meerkat fills, the motivation behind our design choices, and give performance results obtained from our hardware prototype and a calibrated simulator. TR #93-09-06 * "UCPOP User's Manual - Version 2.0" Anthony Barrett, Keith Golden, Scott Penberthy and Daniel Weld The UCPOP partial order planning algorithm handles a subset of the KRSL action representation that corresponds roughly with Pednault's ADL language. In particular, UCPOP operates with actions defined using conditionals and nested quantification. This manual describes a COMMON LISP implementation of UCPOP, explains how to set up and run the planner, details the action representation's syntax, and outlines the main data structures and how they are manipulated. Improvements in Version 2.0 include: * Declarative specification of search control rules * Universal quantification over dynamic universes (i.e., object creation and destruction) * Domain axioms * Predicates expanding to Lisp code * Larger set of domain theories and search functions for testing * Expanded user's manual * Improved CLIM-based graphic plan-space browser TR #93-09-08 "The Truckworld Manual" Dat Nguyen, Steve Hanks and Chris Thomas The Truckworld is a multi-agent planning testbed that has been designed with the following goals in mind: * To model complex causal interactions between different entities in the simulated world, thereby providing a rich problem domain for the experimenter. * To provide a model of sensing that includes incompleteness, noise, and cost. * To support uncertainty in the world through agent commands that yield uncertain effects and exogenous events that occur beyond the agent's control. * To provide an interface that assists the user in running and customizing a simulation. * To support interaction among multiple agents. This manual provides an overview of the Truckworld simulator, details the steps to run a simulation, and presents the specification language for customizing the simulation. TR #93-09-09 * "Protocol Compilation: High-Performance Communication for Parallel Programs" Edward W. Felten One obstacle to the use of distributed-memory multicomputers has been the disappointing performance of communication primitives. Although the hardware is capable of moving data very quickly, the software has been unable to exploit this potential. The main cause of this "communication gap" is "protocol overhead." In addition to moving the application's data between processors, communication systems must engage in other communication and synchronization to ensure correct execution. Protocol overhead is an inherent attribute of parallel computation - it occurs on all machines, under all programming models. To reduce the effect of protocol overhead, I advocate the use of "tailored protocols" - communication protocols custom-designed to work with a particular application program.
A tailored protocol takes advantage of advance knowledge about the behavior of the application program to "slim down" the protocol. Unfortunately, designing tailored protocols by hand is very difficult. The process of designing tailored protocols can, however, be automated. A "protocol compiler" is a tool that takes an application program as input, and produces as output a tailored protocol for that program. This tailored protocol is plug-compatible with the existing message-passing library, so its use is transparent to the application program. The bulk of this dissertation discusses the principles underlying the design of a protocol compiler, including the forms of program analysis the protocol compiler must carry out. In addition to this general discussion, I describe the details of Parachute, a working prototype I have constructed. Parachute generates tailored protocols for data-parallel programs running on the Intel series of multicomputers. Performance models predict, and experiments with Parachute on real applications confirm, that using a protocol compiler leads to a large decrease in communication time. Experiments indicate that Parachute, despite several engineering compromises in its design, cuts communication time by 17%. Reductions in overall application running time range from 0% to 20%, with an average reduction of 7.3%. More aggressive implementation, or new hardware, will lead to even larger speedups. TR #93-10-01 "On the Recognition of Arabic Documents" Badr Al-Badr This paper explores methodologies for optical character recognition with a special emphasis on the Arabic language, using five papers as guides. The features of written Arabic are described, and a general model for OCR systems, MOCR, is defined. Each of the five papers is reviewed. Using the model, the approaches in those papers are decomposed and compared. The paper concludes by summarizing the contributions of each of the papers and outlining directions for future research. TR #93-10-02 * "Complexity of Sub-Bus Mesh Computations" Anne Condon, Richard E. Ladner, Jordan Lampe and Rakesh Sinha We investigate the time complexity of several fundamental problems on the sub-bus mesh parallel computer with p processors. TR #93-10-03 "Lower Bounds on Universal Traversal Sequences Based on Chains of Length 5" Jonathan Buss and Martin Tompa Universal traversal sequences for cycles require length at least proportional to n^(1.43), improving the previous bound of n^(1.33). For d at least 3, universal traversal sequences for d-regular graphs require length at least proportional to d^(0.57) n^(2.43). For constant d, the best previous bound was n^(2.33). TR #93-10-04 * "Impact of Sharing-Based Thread Placement on Multithreaded Architectures" Radhika Thekkath and Susan J. Eggers Multithreaded architectures context switch between instruction streams to hide memory access latency. Although this improves processor utilization, it can increase cache interference and degrade overall performance. One technique to reduce the interconnect traffic is to co-locate threads that share data on the same processor. Multiple threads sharing data in the cache should reduce compulsory and invalidation misses, thereby improving execution time. To test this hypothesis, we compared a variety of thread placement algorithms via trace-driven simulation of fourteen coarse- and medium-grain parallel applications on several multithreaded architectures. Our results contradict the hypothesis.
Rather than decreasing, compulsory and invalidation misses remained nearly constant across all placement algorithms, for all processor configurations, even with an infinite cache. That is, sharing-based placement had no (positive) effect on execution time. Instead, load balancing was the critical factor that affected performance. Our results were due to one or both of the following reasons: (1) the sequential and uniform access of shared data by the application's threads, and (2) the insignificant number of data references that require interconnect access, relative to the total number of instructions. TR #93-10-05 * "Multiresolution Analysis for Surfaces of Arbitrary Topological Type" Tony D. DeRose, Michael Lounsbery, Joe Warren Multiresolution analysis provides a useful and efficient tool for representing shape and analyzing features at multiple levels of detail. Although the technique has met with considerable success when applied to univariate functions, images, and more generally to functions defined on R^n, to our knowledge it has not been extended to functions defined on surfaces of arbitrary genus. In this report, we demonstrate that multiresolution analysis can be extended to surfaces of arbitrary genus using techniques from subdivision surfaces. We envision many applications for this work, including automatic level-of-detail control in high-performance graphics rendering, compression of CAD models, and acceleration of global illumination algorithms. We briefly sketch one of these applications, that of automatic level-of-detail control of polyhedral surfaces. TR #93-11-01 * "Processor Allocation Policies for Message-Passing Parallel Computers" Cathy McCann and John Zahorjan When multiple jobs compete for processing resources on a parallel computer, the operating system kernel's processor allocation policy determines how many and which processors to allocate to each. In this paper we investigate the issues involved in constructing a processor allocation policy for large scale, message-passing parallel computers supporting a scientific workload. We make four specific contributions: * We define the concept of "efficiency preservation" as a characteristic of processor allocation policies. Efficiency preservation is the degree to which the decisions of the processor allocator degrade the processor efficiencies experienced by individual applications relative to their efficiencies when run alone. * We identify the interplay between the kernel processor allocation policy and the application load distribution policy as a determinant of efficiency preservation. * We specify the details of two families of processor allocation policies, called "Equipartition" and "Folding." Within each family, different member policies cover a range of efficiency preservation values, from very high to very low. * By comparing policies within each family as well as between families, we show that high efficiency preservation is essential to good performance, and that efficiency preservation is a more dominant factor in obtaining good performance than is equality of resource allocation. We employ four evaluation techniques in our work. A simple algebraic model, coupled with Monte-Carlo simulation, is used to obtain the basic efficiency preservation characteristics of the policies under static workloads. A Markovian birth-death model is used to estimate mean response times under homogeneous job arrivals.
Finally, a simulation model is used to explore performance under heterogeneous workloads, and to evaluate the fairness of the policies. TR #93-11-02 * "Evaluating Runtime-Compiled Value Specific Optimization" David Keppel, Susan Eggers and Robert Henry Traditional compiler optimizations are either data-independent or optimize around common data values while retaining correct behavior for uncommon values. This paper examines "value-specific" data-dependent optimizations (VSO), where code is optimized at runtime around particular input values. Because VSO optimizes for the specific case, the resulting code is more efficient. However, since optimization is performed at runtime, the performance improvement must more than pay for the runtime compile costs. We describe two VSO implementation techniques and compare the performance of applications that have been implemented using both VSO and static code. The results demonstrate that VSO produces better code, and that the runtime compile costs are often recovered even for reasonable input sizes. The machine-independent implementations showed speedups of up to 1.5 over static C code, and the machine-dependent versions showed speedups of up to 4.3 over static assembly code. TR #93-11-03 * "The Challenges of Mobile Computing" George H. Forman and John Zahorjan Advances in wireless networking technology have engendered a new paradigm of computing, called "mobile computing," in which users carrying portable devices have access to a shared infrastructure independent of their physical location. This provides flexible communication between people and continuous access to networked services. Mobile computing is expected to revolutionize the way computers are used. This paper is a survey of the fundamental software design pressures particular to mobile computing. The issues discussed arise from three essential requirements: the use of wireless networking, the ability to change locations, and the need for unencumbered portability. Promising approaches to address these challenges are identified, along with their shortcomings. TR #93-12-01 "Planning with Continuous Change" J. Scott Penberthy This thesis presents ZENO, a computer program that generates plans of action for achieving goals in a continuously changing world. The program searches a space whose nodes are partially completed plans and whose arcs are the modifications that separate one plan from the next. Actions and goals in the world are described in a language combining first-order logic with piecewise linear constraints. Actions occur over extended intervals of time and can update the world at any time during their execution. These properties enable the program to reason about continuous change, metric preconditions, metric effects, deadline goals, and simultaneous action. The planning algorithms are developed, proven sound and complete, then demonstrated through an empirical study of the computer program as it solves simple problems. TR #93-12-02 * "Multi-Resolution Tiling" David Meyers This paper describes an efficient method for constructing a tiling between a pair of planar contours. The problem is of interest in a number of domains, including medical imaging, biological research and geological reconstructions. Our method requires O(n) space and appears to require O(n log(n)) time for average inputs, compared to the O(n^2 log(n)) time and O(n^2) space required by the optimizing algorithm of Fuchs, Kedem and Uselton.
The results computed by our algorithm are in many cases very nearly the same as those of their optimizing algorithm, but at a small fraction of the computational cost. The performance improvement makes the algorithm usable for large contours in an interactive system. The use of multiresolution analysis also allows for tiling after partial reconstruction of the original contours from a low-resolution version, with controlled loss of detail. The use of lower resolution approximations to the original contours can result in significant savings in the time required to display a reconstructed object, and in the space required to store it. TR #93-12-03 * "Multicomputer Interconnection Network Channel Design" Kevin Bolding Massively parallel multicomputers require a high-performance interconnection network. The physical channels which connect nodes in the network are the components that most commonly restrict the amount of data that can be communicated between nodes of the network. Several different schemes for maximizing the bandwidth of physical interconnection channels are examined. Conventional technology centers on 5V electrical signalling. Within this paradigm, there are several different methods of maximizing the throughput of a channel, given its limited bandwidth. Simplex and duplex channels are examined, as well as arbitration protocols and transmission line pipelining. Non-conventional technologies which achieve higher bandwidth transmission through low-voltage signalling and fiber optic communication are also examined. For the present, it is expected that most networks will continue to use electrical signalling, including low-voltage signalling, because fiber optic hardware is too expensive. As technology changes, though, it may become more common for networks to be built out of optical or other new technologies. TR #93-12-04 * "Probabilistic Planning with Information Gathering and Contingent Execution" Denise Draper, Steve Hanks, and Dan Weld One way of coping with uncertainty in the planning process is to plan to gather information about the world at execution time, building a plan contingent on that information. The literature on decision making discusses the concept of information-producing actions, the value of information, and plans contingent on information learned from tests, but these concepts are missing from AI representations and algorithms for plan generation. This paper presents a planning algorithm that models information-producing (sensing) actions and constructs plans whose executions depend on the nature of the information gathered from sensors. We build on the BURIDAN probabilistic planning algorithm [Kushmerick et al., 1993], extending the action representation to model the behavior of imperfect sensors, and combine it with a framework for contingent action that extends the CNLP algorithm for conditional execution. The result, C-BURIDAN, is an implemented planner that extends the functionality of both BURIDAN and CNLP. TR #93-12-05 * "Concord: Re-Thinking the Division of Labor in a Distributed Shared Memory System" J. William Lee A collection of processors connected by a network --- either a distributed-memory multiprocessor or a network of workstations --- offers a potentially scalable and cost-effective solution for running scientific and engineering applications. Programming this kind of machine, however, has proved to be difficult.
One promising approach is a distributed shared memory system that lets the programmer specify the granularity of coherence, thereby avoiding the introduction of false sharing. But the challenge in this approach is to offer ease of programming while maintaining high performance. The Concord distributed shared memory system meets this challenge by carefully splitting the responsibilities among the programmer, the compiler, and the runtime system in three areas: programming, debugging, and performance tuning. The runtime system and the compiler cooperate to let the programmer easily specify arbitrary units of sharing, including dynamic data structures. The runtime system and the compiler also cooperate to support runtime error checking. The programmer may specify information as part of performance tuning, and the runtime system can use this information to coalesce coherence messages. A prototype of Concord has allowed us to program real applications on an Intel iPSC/2 in a short period of time and achieve reasonable speedup. TR #93-12-06 * "Performance of User-Level Communication on Distributed-Memory Multiprocessors with an Optimistic Protocol" J. William Lee The scalability and performance of parallel applications on distributed-memory multiprocessors depend to a great extent on the performance of communication. Although the communication hardware on distributed-memory multiprocessors can transfer data quickly, applications often fail to achieve communication performance matching the hardware's capacity. Two key impediments to achieving high performance communication are flow control and protection boundaries. This paper proposes and evaluates two techniques to reduce the cost of flow control and the cost of crossing protection boundaries. First, to optimize the common case, processors may send messages optimistically without ensuring that the destination processor of the message has enough buffer space to receive the message. Second, to avoid crossing protection boundaries through expensive traps and software interrupts, the kernel and the user-level address space may cooperate and communicate asynchronously through shared data structures. Implemented using these two techniques, our prototype system on an Intel iPSC/2 improves the performance of message passing primitives by a factor of up to 2.5 and the performance of real applications by up to 30%. TR #93-12-07 "Codebook Organization to Enhance Maximum A Posteriori Detection of Progressive Transmission of Vector Quantized Images Over Noisy Channels" Ren-Yuh Wang, Eve A. Riskin and Richard Ladner We describe a new way to organize a full search vector quantization codebook so that images encoded with it can be sent progressively and have immunity against channel noise. The codebook organization guarantees that the most significant bits (MSBs) of the codeword index are most important to the overall image quality and are highly correlated. Simulations show that the effective channel error rates of the MSBs can be substantially lowered by implementing a Maximum A Posteriori (MAP) detector similar to one suggested by Phamdo and Farvardin. The performance of the scheme is close to that of Pseudo-Gray coding at lower bit error rates and outperforms it at higher error rates. No extra bits are used for channel error correction. TR #93-12-08 * "Faster Dynamic Linking for SPARC V8 and System V.4" David Keppel and Stephen Russell Dynamic linking is again becoming common, and processor pipeline depths are increasing.
These factors make it worth retuning dynamic linker implementations. This note examines the current dynamic linking scheme used by System V.4 running on SPARC V8 processors and shows four alternative implementations that can improve the performance of calls to dynamically-linked functions. Improvements come from reducing the number of instructions that must be executed to perform a call; from reducing the dynamically cached size of the linkage table; and from performing control transfers using instruction immediates instead of indirecting off of a register. In the best case, the overhead of calling a dynamically linked function is reduced from four instructions with a register indirect to a single instruction and no register indirect. The paper also discusses issues and techniques that may be useful for other systems that perform dynamic linking using similar techniques. This note also briefly considers dynamic relinking. TR #93-12-09 * "Abstractions for Portable, Scalable Parallel Programming" Gail A. Alverson, William G. Griswold, Calvin Lin, David Notkin and Lawrence Snyder In parallel programming, the need to manage communication costs, load imbalance, and irregularities in the computation puts substantial demands on the programmer. Key properties of the architecture, such as the number of processors and the costs of communication, must be exploited to achieve good performance. Coding these properties directly into a program compromises the portability and flexibility of the code because significant changes are usually needed to port or enhance the program. We describe a parallel programming model that supports the concise, independent description of key aspects of a parallel program---such as data distribution, communication, and boundary conditions---without reference to machine idiosyncrasies. The independence of such components improves portability by allowing the components of a program to be tuned independently, and encourages reuse by supporting the composition of existing components. The architecture-sensitive aspects of a computation are isolated from the rest of the program, reducing the need to make extensive changes to port a program. This model is effective in exploiting both data parallelism and functional parallelism. This paper provides programming examples, compares this work to related languages, and presents performance results. TR #94-01-01 * "Piecewise Smooth Surface Reconstruction" Hugues Hoppe, Tony DeRose, Tom Duchamp, Hubert Jin, John McDonald, Werner Stuetzle We present a general method for automatic reconstruction of accurate, concise, piecewise smooth surface models from scattered range data. The method can be used in a variety of applications such as reverse engineering, that is, the construction of CAD models from physical objects. Novel aspects of the method are its ability to model surfaces of arbitrary topological type and to recover sharp features such as creases and corners. The method has proven to be effective, as demonstrated by a number of examples using both simulated and real data. A key ingredient in the method, and a principal contribution of this paper, is the introduction of a new class of piecewise smooth surface representations based on subdivision. These surfaces have a number of properties that make them ideal for use in surface reconstruction: they are simple to implement, they can model sharp features concisely, and they can be fit to scattered range data using an unconstrained optimization procedure.
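Subdivision, the key ingredient just mentioned, is easiest to see in one dimension. The sketch below applies Chaikin's corner-cutting scheme to a closed polygon; it is only a curve analogue offered for intuition, not the piecewise smooth surface scheme the report introduces:

    def chaikin_step(points):
        """One corner-cutting subdivision step (Chaikin's scheme) on a closed
        polygon: each edge contributes two new points at its 1/4 and 3/4 marks.
        Repeating the step converges to a smooth (quadratic B-spline) curve."""
        refined = []
        n = len(points)
        for i in range(n):
            (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
            refined.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            refined.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        return refined

    poly = [(0, 0), (1, 0), (1, 1), (0, 1)]     # a square
    for _ in range(3):                          # three rounds of subdivision
        poly = chaikin_step(poly)
    print(len(poly), poly[:2])                  # 32 points approximating a rounded square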
TR #94-01-02 "Tractable Closed-World Reasoning with Updates" Keith Golden, Oren Etzioni and Dan Weld Closed-world reasoning is the process of inferring that a logical sentence is false based on its absence from a knowledge base, or the inability to derive it. Previous work on circumscription, autoepistemic logic, and database theory has explored logical axiomatizations of closed-world reasoning, but has not addressed computational tractability. Work in planning has traditionally made the closed-world "assumption" but has avoided closed-world reasoning. We take a middle position, and describe a tractable method for closed-world reasoning over a restricted logical theory of the sort utilized by planning algorithms such as "strips," "nonlin," and "tweak." We show the method to be both sound and tractable (linear-time updates), and incorporate it into the UCPOP planner. Experiments utilizing our Internet softbot (software robot) demonstrate that the method can substantially improve the softbot's performance by eliminating redundant information gathering. TR #94-01-03 "To Sense or Not to Sense? (A Planner's Question)" Keith Golden, Oren Etzioni and Dan Weld Classical planners have traditionally made the closed world assumption --- facts absent from the planner's world model are false. More recent planners, which plan with incomplete information, make the open world assumption --- the truth value of a fact absent from the planner's model is unknown, and must be sensed. The open world assumption leads to two difficulties: (1) How can the planner determine the scope of a universally quantified goal? (previous incomplete-information planners do not accept such goals) (2) When is a sensory action "redundant," yielding information already known to the planner? This paper describes the fully-implemented "xii" planner which represents and reasons about "local closed world information" (LCW) to handle universally quantified goals and avoid redundant sensing. We report on experiments, utilizing our UNIX softbot (software robot), which demonstrate that "LCW" can substantially improve the softbot's performance by eliminating redundant information gathering. TR #94-01-04 * "A Parallel Trace-driven Simulator: Implementation and Performance" Xiaohan Qin and Jean-Loup Baer The simulation of parallel architectures requires an enormous amount of CPU cycles and, in the case of trace-driven simulation, of disk storage. In this paper, we consider the evaluation of the memory hierarchy of multiprocessor systems via parallel trace-driven simulation. We refine Lin et al.'s original algorithm, whose main characteristic is to insert the shared references from every trace in all other traces, by reducing the amount of communication between simulation processes. We have implemented our algorithm on a KSR-1. Results of our experiments on traces of four applications and three different cache coherence protocols show that parallel trace-driven simulation yields significant speedups over its sequential counter-part. The communication overhead is not substantial compared to the dominant overhead due to the processing of replicated inserted references. We also investigate filtering techniques for multiprocessor traces. We show how to filter--in parallel--private and shared references. Our technique generates filtered traces for various block sizes in a single pass. 
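The general idea behind trace filtering can be sketched with a small direct-mapped "filter cache" that keeps only the references that miss in it; this is a generic illustration with made-up parameters, not the authors' parallel, multi-block-size technique:

    def filter_trace(addresses, block_size=32, num_sets=64):
        """Pass each reference through a small direct-mapped filter cache and
        keep only the misses, shrinking the trace while preserving the
        references that larger simulated caches would still need to see."""
        tags = [None] * num_sets
        filtered = []
        for addr in addresses:
            block = addr // block_size
            index = block % num_sets
            if tags[index] != block:         # miss in the filter cache: keep it
                tags[index] = block
                filtered.append(addr)
        return filtered

    trace = [0, 4, 8, 4096, 0, 4, 8192, 8196]
    print(filter_trace(trace))               # repeated same-block hits are dropped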
As expected, the simulation of filtered traces is much faster, but parallel simulation of filtered traces is not as effective, since the ratio of unfiltered shared to private references is now much larger. TR #94-01-05 * "Importance-Driven Wavelet Radiance" Per H. Christensen, Eric J. Stollnitz, David H. Salesin and Tony D. DeRose In this paper, we show how the ideas of importance-based transport and wavelet analysis can be combined to provide an efficient solution to the radiance transport problem. Importance is used to focus the computation on just those interactions having the greatest impact on the final visible solution. Wavelets are used to provide an efficient method for representing radiance, importance, and the transport operator itself. The framework we describe is a very general one that supports curved surfaces and spatially-varying anisotropic BRDFs. We discuss preliminary results from an implementation that demonstrate the potential of the theoretical methods described. TR #94-01-06 * "Multiresolution Curves" Adam Finkelstein and David H. Salesin We describe a multiresolution curve representation, based on wavelets, that allows a curve to be edited at any scale while preserving its details, to be smoothed at continuous levels, and to be approximated to within any given error tolerance for scan conversion. The algorithms we describe are simple to implement, and efficient in storage (requiring no more storage than the original m control points) and speed (typically linear in m). We focus primarily on B-spline curves; however, we also demonstrate how the methods described can be applied to other types of objects, including hand-drawn strokes with additional properties like thickness, and tensor-product surfaces. TR #94-01-07 * "Interactive Pen-and-ink Illustration" Michael P. Salisbury, Sean E. Anderson, Ronen Barzel, David H. Salesin We present an interactive system for creating pen-and-ink illustrations. The system makes use of stroke textures---collections of strokes arranged in different patterns---to generate texture and tone. The user "paints" with a desired stroke texture to achieve a desired tone, and the computer draws all of the individual strokes. Included in the system is support for using scanned or rendered images as a reference, which provides the user with guides for outline and tone. By following these guides closely, the illustration system can be used for interactive digital halftoning, in which stroke textures are applied to convey details that would otherwise be lost in this black-and-white medium. By removing the burden of placing individual strokes from the user, the illustration system makes it possible to create fine stroke work with a purely mouse-based interface. Thus, this approach holds promise for bringing high-quality black-and-white illustration to the world of personal computing and desktop publishing. TR #94-01-08 * "Computer-Generated Pen-and-Ink Illustration" Georges Winkenbach and David H. Salesin This paper describes the principles of traditional pen-and-ink illustration, and shows how a great number of them can be implemented as part of an automated rendering system. It introduces "stroke textures," which can be used for achieving both texture and tone with line drawing. Stroke textures also allow resolution-dependent rendering, in which the choice of strokes used in an illustration is appropriately tied to the resolution of the target medium.
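As a rough illustration of how tone can be achieved with strokes (a deliberately simplified model with hypothetical parameters, not the stroke textures used by these systems), the number of hatching strokes needed for a region follows from the target tone and the ink each stroke deposits:

    import random

    def hatch_region(width, height, target_tone, stroke_width=1.0, seed=0):
        """Place enough vertical hatching strokes in a width x height region
        that the inked fraction approximates target_tone (0 = white, 1 = black).
        Returns the x-positions of the strokes."""
        area = width * height
        ink_per_stroke = stroke_width * height
        n_strokes = round(target_tone * area / ink_per_stroke)
        rng = random.Random(seed)
        return sorted(rng.uniform(0, width) for _ in range(n_strokes))

    print(len(hatch_region(100, 100, target_tone=0.3)))   # about 30 strokes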
We demonstrate these techniques using complex architectural models, including Frank Lloyd Wright's "Robie House." TR #94-01-09 * "Multiresolution Painting and Compositing" Deborah F. Berman, Jason T. Bartell, David H. Salesin We describe a representation for "multiresolution images"---images that have different resolutions in different places---and methods for creating such images using painting and compositing operations. These methods are very easy to implement, and they are efficient in both memory and speed. At a particular resolution, the representation requires space proportional only to the amount of detail actually present, and the most common painting operations, "over" and "erase," require time proportional only to the number of pixels displayed. Finally, we show how "fractional-level zooming" can be implemented in order to allow a user to display and edit portions of a multiresolution image at any arbitrary size.
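For concreteness, the "over" operation mentioned above is the standard Porter-Duff compositing operator; a minimal sketch, assuming premultiplied RGBA pixels with channels in [0, 1] (an assumption about representation, not necessarily the one used in the paper):

    def over(a, b):
        """Porter-Duff 'over': composite premultiplied RGBA pixel a over pixel b."""
        ra, ga, ba, aa = a
        rb, gb, bb, ab = b
        k = 1.0 - aa                         # how much of b shows through a
        return (ra + k * rb, ga + k * gb, ba + k * bb, aa + k * ab)

    # Half-transparent red painted over opaque white:
    red = (0.5, 0.0, 0.0, 0.5)               # premultiplied (1, 0, 0) at alpha 0.5
    white = (1.0, 1.0, 1.0, 1.0)
    print(over(red, white))                  # -> (1.0, 0.5, 0.5, 1.0)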