DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY OF WASHINGTON
TECHNICAL REPORT ABSTRACTS - FROM 93-01-01 TO 93-05-08

TR # 93-01-01
"Mesh Optimization"
Hugues Hoppe, Tony DeRose, Tom Duchamp, John McDonald and Werner Stuetzle

We present a method for solving the following problem: Given a set of data points scattered in three dimensions and an initial triangular mesh M_0, produce a mesh M, of the same topological type as M_0, that fits the data well and has a small number of vertices. Our approach is to minimize an energy function that explicitly models the competing desires of conciseness of representation and fidelity to the data. We show that mesh optimization can be effectively used in at least two applications: surface reconstruction from unorganized points, and mesh simplification (the reduction of the number of vertices in an initially dense mesh of triangles).

TR # 93-01-02
"Hierarchical Constraint Logic Programming"
Molly Wilson and Alan Borning

Constraint Logic Programming (CLP) is a general scheme for extending logic programming to include constraints. It is parameterized by D, the domain of the constraints. However, CLP(D) languages, as well as most other constraint systems, only allow the programmer to specify constraints that must hold. In many applications, such as interactive graphics, planning, document formatting, and decision support, one needs to express preferences as well as strict requirements. If we wish to make full use of the constraint paradigm, we need ways to represent these defaults and preferences declaratively, as constraints, rather than encoding them in the procedural parts of the language. We describe a scheme for extending CLP(D) to include both required and preferential constraints. An arbitrary number of strengths of preference are allowed. We present a theory of such constraint hierarchies, and an extension, Hierarchical Constraint Logic Programming, of the CLP scheme to include constraint hierarchies. We give an operational, model-theoretic, and fixed-point semantics for the HCLP scheme. Finally, we describe two interpreters we have written for instances of the HCLP scheme, give example programs, and discuss related work.

TR # 93-02-01
"Time-Space Tradeoffs for Undirected Graph Traversal"
Paul Beame, Allan Borodin, Prabhakar Raghavan, Walter L. Ruzzo, and Martin Tompa

We prove time-space tradeoffs for traversing undirected graphs, using a variety of structured models that are all variants of Cook and Rackoff's "Jumping Automata for Graphs". Our strongest tradeoff is a quadratic lower bound on the product of time and space for graph traversal. For example, achieving linear time requires linear space, implying that depth-first search is optimal. Since our bound in fact applies to nondeterministic algorithms for nonconnectivity, it also implies that closure under complementation of nondeterministic space-bounded complexity classes is achieved only at the expense of increased time. To demonstrate that these structured models are realistic, we also investigate their power. In addition to admitting well known algorithms such as depth-first search and random walk, we show that one simple variant of this model is nearly as powerful as a Turing machine. Specifically, for general undirected graph problems, it can simulate a Turing machine with only a constant factor increase in space and a polynomial factor increase in time.
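For orientation, the quadratic tradeoff claimed in the abstract above can be restated as the display below, where N measures the size of the input graph (whether vertices or edges is not stated in the abstract, so this reading is an assumption); the second line is the special case quoted above.

    S * T = Omega(N^2)                  (product of space and time for traversal)
    T = O(N)  implies  S = Omega(N)     (linear time forces linear space)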
TR # 93-02-02
"Building FIFO and Priority-Queuing Spin Locks from Atomic Swap"
Travis S. Craig

We present practical new algorithms for FIFO or priority-ordered spin locks on shared-memory multiprocessors with an atomic swap instruction. Different versions of these queuing spin locks are designed for machines with coherent-cache and NUMA memory models. We include extensions to provide nested lock acquisition, conditional locking, timeout of lock requests, and preemption of waiters. These locks apply to both real-time and non-real-time parallel systems, and we include a comparison of the traits of several lock schemes aimed at those environments. Our main technical contributions are our techniques and algorithms that provide tight control over lock grant order, use only the atomic swap instruction, use at most one (local-only) spin for lock acquisition and no spinning for lock release, and need only O(L + P) space on either a coherent-cache or NUMA machine.

TR # 93-02-04
"A Note on the Least Common Multiples of Dense Sets of Integers"
Wendy Thrash

No abstract for this report.

TR # 93-02-05
"Simulation of Multiprefix PRAMs by Unbounded Fan-in Circuits"
Paul Beame and Rakesh Sinha

We show that PRAMs that have multiprefix (scan) operations available as primitives may be simulated by unbounded fan-in circuits with gates for related operations. The complexity bounds depend on the word-size of the PRAM but not on the multiprefix operations or the instruction set of the processors.

TR # 93-03-01
"Implementing Network Protocols at User Level"
Chandramohan A. Thekkath, Thu D. Nguyen, Evelyn Moy and Edward D. Lazowska

Traditionally, network transport protocols have been implemented in the kernel due to performance and security concerns. However, considerations of code maintenance, ease of debugging, customization, and the simultaneous existence of multiple protocols argue for separating the implementation into more manageable user-level libraries. Our approach does not sacrifice performance, protection, or functionality compared to monolithic implementations. We begin by motivating the need for user-level protocol implementations and placing our approach in the context of previous work. We then describe our system, which has been implemented on Mach workstations connected not only to traditional Ethernet, but also to a more modern network (the DEC SRC AN1). Based on our experience, we discuss the implications for host-network controller design and for overall system structure to support efficient user-level implementations of communication protocols.

TR # 93-03-02
"A Performance Study of a New Grid Protocol and General Grid Structures for Replicated Data"
Akhil Kumar, Michael Rabinovich and Rakesh Sinha

Recently there has been considerable interest in the study of replica control protocols which are based on organizing several copies of an object into logical structures, such as rectangular grids. In addition to high availability, another objective in exploiting such structures is to improve the degree of load sharing in a system. In this paper, we extend the scope of grid structures to general grids, which allow holes in various positions of a rectangular structure and are useful to consider because they often produce availabilities that are higher than those of solid grids, where every position must be occupied by a node. In addition to proposing an improvement to the existing grid protocol, we also offer new insights into the performance of the grids, from both availability and load sharing points of view. Algorithms for designing grids to maximize availability, both independently and in conjunction with a load-sharing constraint, are given.
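For orientation only, the sketch below checks the classical quorum rule for a solid rectangular grid of copies: one available copy per column forms a read quorum, and a read quorum plus one fully available column forms a write quorum. It is a hedged illustration of the structures this line of work starts from, not of the improved protocol or of the general grids with holes studied in the report above; the array layout and function names are assumptions made for the sketch.

    /* Hedged sketch of the classical solid-grid quorum rule, for orientation only;
     * the report above studies an improved protocol and more general grids with holes.
     * up[r][c] is nonzero if the copy at row r, column c is currently available
     * (the layout and names are assumptions made for this sketch). */
    #include <stdbool.h>
    #include <stdio.h>

    #define ROWS 3
    #define COLS 4

    /* Read quorum: at least one available copy in every column (a "column cover"). */
    bool has_read_quorum(int up[ROWS][COLS]) {
        for (int c = 0; c < COLS; c++) {
            bool covered = false;
            for (int r = 0; r < ROWS; r++)
                if (up[r][c]) { covered = true; break; }
            if (!covered) return false;
        }
        return true;
    }

    /* Write quorum: a column cover plus every copy of some single column. */
    bool has_write_quorum(int up[ROWS][COLS]) {
        if (!has_read_quorum(up)) return false;
        for (int c = 0; c < COLS; c++) {
            bool whole_column = true;
            for (int r = 0; r < ROWS; r++)
                if (!up[r][c]) { whole_column = false; break; }
            if (whole_column) return true;
        }
        return false;
    }

    int main(void) {
        int up[ROWS][COLS] = {      /* column 0 is fully available */
            {1, 0, 1, 1},
            {1, 1, 0, 1},
            {1, 0, 1, 0},
        };
        printf("read quorum:  %s\n", has_read_quorum(up) ? "yes" : "no");   /* yes */
        printf("write quorum: %s\n", has_write_quorum(up) ? "yes" : "no");  /* yes */
        return 0;
    }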
TR # 93-03-03
"Latency Analysis of TCP on an ATM Network"
Alec Wolman, Geoff Voelker, and Chandramohan A. Thekkath

In this paper, we characterize the latency of TCP on an ATM network. Latency reduction is a difficult task, and careful analysis is the first step towards reduction. We investigate the impact of both the network controller and the protocol implementation on latency. We find that a low latency network controller has a significant impact on the overall latency even for a reliable transport protocol such as TCP, and that replacing the ULTRIX TCP implementation with the BSD 4.4 alpha implementation improves the latency by up to 20%. We also characterize the impact on latency of some widely discussed improvements to TCP, such as header prediction and combining the checksum calculation with data copying.

TR # 93-03-04
"Communication Complexity: Iterative Techniques for Lower Bounds"
Joan T. Lawry

We add to the repertoire of lower bound techniques for the two-party communication complexity of Boolean functions by adapting the iterative lower bound technique previously used for search problems to decision problems. Using this approach, we show the existence of a Boolean function whose deterministic complexity is the square of its nondeterministic complexity, with epsilon-error randomized complexity no better than deterministic complexity, to within a constant factor. However, our input domain is only a portion of the set that would usually be associated with our function in the two-party communication model. Nonetheless, our result suggests that, for the class of functions with maximal separation between deterministic and nondeterministic complexity, the decision problem is nontrivial in that a randomized protocol for a decision problem has a significantly lower tolerance for error than one for a search problem.

TR # 93-03-05
"The Cecil Language - Specifications and Rationale"
Craig Chambers

Cecil is a new purely object-oriented language intended to support rapid construction of high-quality, extensible software. Cecil combines multi-methods with a classless object model, object-based encapsulation, and optional static type checking. Cecil's static type system distinguishes between subtyping and code inheritance, but Cecil enables these two graphs to be described with a single set of declarations, optimizing the common case where the two graphs are parallel. Cecil includes a fairly flexible form of parameterization, including both explicitly parameterized objects, types, and methods and implicitly parameterized methods related to the polymorphic functions commonly found in functional languages. By making type declarations optional, Cecil aims to support mixed exploratory and production programming styles. This document describes the design of the Cecil language as of March, 1993. It mixes the specification of the language with discussion of design issues and explanations of the reasoning that led to various design decisions.

TR # 93-03-06
"Pointers versus Arithmetic in PRAMs"
Patrick W. Dymond, Faith E. Fich, Naomi Nishimura, Prabhakar Ragde, Walter L. Ruzzo

Manipulation of pointers in shared data structures is an important communication mechanism used in many parallel algorithms. Indeed, many fundamental algorithms do essentially nothing else. A Parallel Pointer Machine (or PPM) is a parallel model having pointers as its principal data type.
PPMs have been characterized as PRAMs obeying two restrictions --- first, restricted arithmetic capabilities, and second, the CROW memory access restriction (Concurrent Read, Owner Write, a commonly occurring special case of CREW). We present results concerning the relative power of PPMs (and other arithmetically restricted PRAMs) versus CROW PRAMs having ordinary arithmetic capabilities. First, we prove lower bounds separating PPMs from CROW PRAMs. For example, any step-by-step simulation of an n-processor CROW PRAM by a PPM requires time Omega(log log n) per step. Second, we show that this lower bound is tight --- we give such a step-by-step simulation using O(log log n) time per step. As a corollary, we obtain sharply improved PPM algorithms for a variety of problems, including deterministic context-free language recognition.

TR # 93-03-07
"How Reductions to Sparse Sets Collapse the Polynomial-time Hierarchy: A Primer"
Paul Young

This expository paper is based on lectures given while the author was a Visiting Professor in the Dipartimento di Scienze dell'Informazione, Universita degli Studi di Milano, in November and December, 1991. The author is grateful for the support given by the Consiglio Nazionale delle Ricerche. Presented also at the 4th European Summer School in Logic, Language and Information, Essex, England, August 1992. This revision of September, 1992, corrects a number of inaccurate statements involving Kadin's theorem which appeared in an earlier version of this paper appearing in the Summer, 1992, issue of SIGACT News.

TR # 93-04-01
"Adhara: Runtime Support for Dynamic Space-Based Applications on Distributed Memory MIMD Multiprocessors"
Immaneni Ashok and John Zahorjan

Dynamic space-based applications are simulations of objects moving through a closed k-dimensional space subject to mutual forces. There are a wide variety of such applications, differing in the kinds of objects and forces being simulated. Dynamic space-based applications exhibit strong data locality patterns, but these patterns change during the computation as objects change position in the simulated space. To achieve good performance when run on distributed memory multiprocessors, two conflicting goals must be addressed: the strong spatial data locality must be exploited, and the computational load must be balanced across the processors. These optimizations can be done statically, by either the programmer or the compiler, or dynamically, by handwritten application code or a runtime system. Relying on the programmer for these functions imposes an unreasonable burden on her, as it is a hard and time-consuming process to develop the code required. At the same time, the compiler cannot be expected to automatically generate code leading to good performance, since it cannot extract the space-based data dependencies from the program source. Thus, the most appropriate approach to providing the needed functions is through a specialized runtime system that can be used with any dynamic, space-based application. In this paper we describe the design and the programming interface of ADHARA, a runtime system tailored to the needs of dynamic space-based applications. Using a specific plasma physics application as an example, we show that ADHARA enables the development of efficient parallel codes that involve very few additional lines of code and very few additional concepts beyond those required to construct a sequential version of the application.
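As a hedged sketch of the kind of spatial bookkeeping a runtime such as ADHARA must perform (the report does not give its interface, and all structures and names below are illustrative assumptions), objects can be binned into grid cells by position so that interacting neighbors fall into the same or adjacent cells, and whole cells can be reassigned among processors to rebalance load as objects move.

    /* Minimal sketch of spatial binning for a space-based simulation runtime; the
     * structures and names are illustrative assumptions, not ADHARA's interface. */
    #include <math.h>
    #include <stdio.h>

    typedef struct { double x, y; } Position;

    typedef struct {
        int    cells_per_side;   /* grid resolution */
        double cell_size;        /* edge length of one cell */
    } SpaceGrid;

    /* Map a position to a cell index; nearby objects map to the same or adjacent cells. */
    int cell_of(const SpaceGrid *g, Position p) {
        int cx = (int)floor(p.x / g->cell_size);
        int cy = (int)floor(p.y / g->cell_size);
        if (cx < 0) cx = 0;
        if (cx >= g->cells_per_side) cx = g->cells_per_side - 1;
        if (cy < 0) cy = 0;
        if (cy >= g->cells_per_side) cy = g->cells_per_side - 1;
        return cy * g->cells_per_side + cx;
    }

    int main(void) {
        SpaceGrid g = { 8, 1.0 };              /* an 8 x 8 grid of unit cells */
        Position  p = { 3.7, 5.2 };
        /* A load balancer would keep a cell -> processor map and rewrite it (migrating
         * the affected objects) when per-processor populations drift apart. */
        printf("object at (%.1f, %.1f) is binned into cell %d\n", p.x, p.y, cell_of(&g, p));
        return 0;
    }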
TR # 93-04-02 "Sharing and Protection in a Single Address Space Operating System" Jeffrey S. Chase, Henry M. Levy, Michael J. Feeley, and Edward D. Lazowska The appearance of 64-bit address space architectures, such as the DEC Alpha, HP PA-RISC, and MIPS R4000, signals a radical shift in the amount of address space available to operating systems and applications. This shift provides the opportunity to reexamine fundamental operating system structure; specifically, to change the way that operating systems use address space. This paper explores memory sharing and protection support in Opal, a single address space operating system designed for wide-address architectures. Opal threads execute within protection domains in a single shared virtual address space. Sharing is simplified, because addresses are context independent. There is no loss of protection, because addressability and access are orthogonal; the right to access a segment is determined by the protection domain in which a thread executes. This model enables beneficial code and data sharing patterns that are currently prohibitive, due in part to the inherent restrictions of multiple address spaces, and in part to Unix programming style. We have designed and implemented an Opal prototype using the Mach 3.0 microkernel as a base. Our implementation demonstrates how a single address space structure can be supported alongside of other environments on a modern microkernel operating system, using modern wide-address architectures. This paper justifies the Opal model and its goals for sharing and protection, presents the system and its abstractions, describes the prototype implementation, and reports experience with integrated applications. TR # 93-04-03 "Efficient Support for Multicomputing on ATM Networks" Oren Etzioni, Hank Levy, Richard Segal, and Chandramohan A. Thekkath The emergence of a new generation of networks will dramatically increase the attractiveness of loosely-coupled multicomputers based on workstation clusters. The key to achieving high performance in this enviroment is efficient network access, because the cost of remote access dictates the granularity of parallelism that can be supported. Thus in addition to traditional distribution mechanisms such as RPC, workstation clusters should support lightweight communication paradigms for executing parallel applications This paper describes a simple communication model based on the notion of remote memory access. Applications executing on one host can perform direct memory read or write operations on user-defined remote memory buffers. We have implemented a prototype system based on this model using commercially available workstations and ATM networks. Our prototype uses kernel-based emulation of remote read and write instructions, implemented through unused processor opcodes; thus, applications (or runtime libraries) see direct machine support for remote memory access. We show that this model can be supported safely and efficiently on current systems; for example, a 40-byte remote write operation completes in 30 microseconds on our 140 Mb/s ATM -based network with 19.5 SPECint DECstation hosts. Using example applications, we show how this underlying model can provide excellent performance for both parallel applications and traditional distributed system services. 
TR # 93-04-04 "OS Agents: Using AI Techniques in the Operating System Environment" Oren Etzioni, Henry Levy, Richard Segal, and Chandramohan Thekkath While recent decades have brought substantial change to the form of the operating system interface, the power of operating system commands has remained nearly constant. Conventional commands, whether visual or textual, specify one particular action to perform. To carry out a complex task, such as reducing disk utilization, the user is forced to explicitly specify each of the necessary steps. Traditional command-language extension mechanisms, such as shell scripts and pipes, enable the user to aggregate and compose various commands, but force him or her to write and debug programs --- a formidable challenge for naive users. This paper presents a goal-oriented approach to the operating system command interface, realized through an implementation we call OS agents. Using OS agents, the user simply specifies a goal to accomplish, and the OS agent decides how to accomplish that goal using its knowledge base of the system state and its commands. The OS agent dynamically synthesizes the appropriate command sequence, issues the required commands and system calls, handles errors, and retries commands if necessary. With OS agents, we have applied AI planning and learning techniques to the operating system environment to increase the power of the user's commands. We have implemented OS agents within a distributed Unix environment. Our experience indicates that it is practical to incorporate novel ideas of automatic planning and learning into contemporary operating systems with a modest amount of work and little performance penalty. In this paper we present OS agents and their operation, and describe the general-purpose mechanism we have provided to flexibly and efficiently support the needs of OS agents within the Unix operating system. TR # 93-04-05 "Vector Prefix Addition on Sub-Bus Mesh Computers" Richard E. Ladner, Jordan Lampe and Richard Rogers We investigate parallel algorithms for the sub-bus mesh computer architecture which is exemplified by the MasPar MP-1 computer. We consider the problems of vector addition (vector "reduce-add") and vector prefix addition (vector "scan-add"). Both scalar addition and scalar prefix addition can be solved in parallel time O(log p) on sub-bus mesh computers with p processors. We present a time analysis of two- dimensional algorithms. We present a time analysis of the algorithms in an abstract model of the sub-bus mesh computer. We then compare our analytical results with the timings from an actual implementation of the algorithms on a MasPar MP-1 with 1024 processors. The comparison demonstrates, to a degree, the validity of the abstract model. TR # 93-04-06 "Fast Data Breakpoints" David Keppel Debuggers allow a user to halt a program and examine its state. The debugger stops execution when some user-specified condition is satisfied: Code breakpoints halt program execution when a particular instruction is executed. Data breakpoints halt program execution when a variable is referenced. Code breakpoints are supported directly in hardware on most machines and are fast. Data breakpoints, however, are notoriously slow. This note describes how data breakpoints can be made fast. TR # 93-04-07 "An Exponential Separation between the Matching Principle and the Pigeonhole Principle" Paul Beame and Toniann Pitassi The combinatorial matching principle states that there is no perfect matching on an odd number of vertices. 
TR # 93-04-07
"An Exponential Separation between the Matching Principle and the Pigeonhole Principle"
Paul Beame and Toniann Pitassi

The combinatorial matching principle states that there is no perfect matching on an odd number of vertices. This principle generalizes the pigeonhole principle, which states that for a fixed bipartition of the vertices, there is no perfect matching between them. Therefore, it follows from recent lower bounds for the pigeonhole principle that the matching principle requires exponential-size bounded-depth Frege proofs. Ajtai [Ajt90] previously showed that the matching principle does not have polynomial-sized bounded-depth Frege proofs even with the pigeonhole principle as an axiom schema. His proof utilizes nonstandard model theory and is nonconstructive. We improve Ajtai's lower bound from barely superpolynomial to exponential and eliminate the nonstandard model theory. Our lower bound is also related to the inherent complexity of particular search classes (see [Pap91]). In particular, oracle separations between the complexity classes PPA and PPAD, and between PPA and PPP, also follow from our techniques ([BP93a]).

TR # 93-04-08
"An Automatic Verification Technique for Communicating Real-Time State Machines"
Sitaram C. V. Raju

We describe an automatic verification technique for distributed real-time systems that are specified as Communicating Real-Time State Machines (CRSMs). CRSMs are timed state machines that communicate synchronously over uni-directional channels. The proposed approach is to model the behavior of the system of (an expressive subclass of) CRSMs by a timed reachability graph. The system behavior of CRSMs is characterized by a time-stamped trace of communication events. We provide a decision procedure for verifying timing and safety properties (specified in a notation based on Real-Time Logic) of the reachability graph, and hence of the corresponding system of CRSMs. We also present a condition for the existence of deadlock in a system of CRSMs. Finally, we briefly describe an implementation of a verifier program based on the above algorithms.

TR # 93-04-09
"Building Counting Networks from Larger Balancers"
Edward W. Felten, Anthony LaMarca, and Richard Ladner

We introduce a generalization of the counting networks of Aspnes, Herlihy, and Shavit. Our counting networks are constructed using k-balancers, rather than the 2-balancers of Aspnes et al. For reasonable values of k, k-balancers and 2-balancers can be implemented with equal efficiency on existing computers. Our k-bitonic networks have depths ranging from O(1) to O(log^2 w), where w is the number of inputs to the network. The depth of our networks varies with the choice of k; choosing the optimal k depends on a tradeoff between the desire for a shallow network and the desire for low contention. We present the k-bitonic construction, prove its correctness, and introduce some useful variations to the construction. We then compare the performance of our networks with the networks of Aspnes et al., using simulation of idealized counting networks. Our networks perform at least as well in all cases, and typically have throughput about 25% higher than the networks of Aspnes et al.
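The k-balancer primitive generalized above can be specified with the small sketch below: a k-balancer forwards successive tokens to its k output wires in round-robin order, so no two output counts differ by more than one. The single shared counter here only illustrates the specification; the point of a counting network is to obtain the same balanced behavior from many small balancers with far less contention than one hot counter. The names and the use of C11 atomics are assumptions made for the sketch.

    /* Specification-level sketch of a k-balancer (k = 2 gives the balancer of
     * Aspnes et al.); a counting network composes many such balancers. */
    #include <stdatomic.h>
    #include <stdio.h>

    typedef struct {
        atomic_uint count;   /* tokens that have traversed this balancer */
        unsigned    k;       /* fan-out */
    } balancer_t;

    /* A token enters; return the index of the output wire it leaves on. */
    unsigned balancer_traverse(balancer_t *b) {
        unsigned n = atomic_fetch_add(&b->count, 1u);
        return n % b->k;
    }

    int main(void) {
        balancer_t b = { 0, 4 };                 /* a 4-balancer */
        unsigned out[4] = {0};
        for (int t = 0; t < 10; t++)
            out[balancer_traverse(&b)]++;
        for (int i = 0; i < 4; i++)
            printf("output %d received %u tokens\n", i, out[i]);   /* counts 3,3,2,2 */
        return 0;
    }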
TR # 93-05-01
"Hierarchical Constraint Logic Programming"
Molly Wilson

Many problems in computer science involve keeping track of relations. These may be relations among graphical objects on a display, arithmetic relations among numbers, information about which nodes in a graph are adjacent to each other, or requirements about job scheduling, to name just a few. These relationships can quickly grow complex as the number of objects in a system increases. Frequently, these problems can be solved more easily when these relations can be represented declaratively: when there is a language that can capture the underlying constraints on the objects without depending on how the actual relations are to be solved. The notion of a constraint language arises from this desire for abstraction. Constraints provide an elegant means of stating relationships and of declaratively characterizing properties among objects that can be maintained by an underlying system.

TR # 93-05-02
"A Periodic Object Model for Real-Time Systems"
H. R. Callison

We introduce time-sensitive objects (TSOs), a data-oriented model for real-time systems. The TSO model imposes time constraints on object values through validity intervals and object histories. Periodic objects, a class of objects within the TSO model, are described in detail and compared with more traditional periodic processes. We identify advantages of periodic objects, including greater scheduling independence, more opportunity for concurrency, and tolerance of timing faults.

TR # 93-05-03
"The Practical Application of Retiming to the Design of High-Performance Systems"
Brian Lockyear and Carl Ebeling

Many advances have been made recently in the theory of circuit retiming, especially for circuits that use level-sensitive latches. In spite of this, automatic retiming tools have seen relatively little use in practice. One reason for this is the lack of good speedup results when retiming has been applied to real circuits. Another reason is that retiming has used a rather simple circuit model which reduces its utility in practice. This paper addresses both of these issues. We suggest that the reason for the poor results reported for retiming is that retiming has been applied too late in the design process, when there is little flexibility for performance improvement. We give an example of using retiming early in the design process to achieve better performance while at the same time simplifying the design process itself. We then describe an extension to the retiming circuit model that includes clock skew as well as latch propagation delay, setup, and hold parameters. Including these parameters allows retiming to generate the fastest circuit subject to a given amount of clock skew, or to generate the most robust circuit with respect to skew for a given clock frequency. This gives level-clocked circuits yet another advantage over edge-clocked circuits, since edge-clocked circuits require margin for clock skew while level-clocked circuits can be retimed to be inherently skew-tolerant. We illustrate these techniques using a serial-parallel multiplier circuit.

TR # 93-05-04
"Minimizing the Effect of Clock Skew Via Circuit Retiming"
Brian Lockyear and Carl Ebeling

Clock skew is often cited as an impediment to designing high-performance synchronous circuits. Clock skew reduces the performance of a circuit by reducing the time available for computation, and it may even cause circuit failure by allowing race conditions. In this paper, we show how to use retiming to reduce the effect of clock skew in both edge-clocked and level-clocked circuits. We include both fixed clock skew, for example skew which results from clock distribution wiring and buffering, and variable clock skew, which results from uncontrolled variation in process parameters and operating conditions. By including fixed skew in the maximum constraint equations, retiming can find the fastest circuit given that skew. By including variable skew, retiming can also generate the circuit that tolerates the most variation in clock skew for a given clock frequency. Clock skew can also be used to speed up circuits, using a technique similar to retiming described by Fishburn. We describe a method for combining this technique, which uses added clock skew, with retiming to optimize the performance of edge-clocked circuits.
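For orientation on the two retiming reports above, the standard edge-clocked timing constraints that a skew-aware retimer must respect can be written as below, where d_max and d_min are the longest and shortest combinational delays from register u to register v, T is the clock period, and s is the worst-case skew between their clocks. This is a simplified textbook restatement (register clock-to-output delays are folded into the path delays); the reports themselves also treat level-sensitive latches, whose constraints are more involved.

    d_max(u,v) + t_setup + s <= T      (long-path, or setup, constraint: skew enters the "maximum constraint equations")
    d_min(u,v) >= t_hold + s           (short-path, or hold, constraint guarding against race conditions)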
TR # 93-05-05
"Training Compilers to Make Better Inlining Decisions"
Jeffrey Dean and Craig Chambers

Optimizing implementations for object-oriented languages rely on aggressive inlining to achieve good performance. Sometimes the compiler is over-eager in its quest for good performance, however, and inlines too many methods that merely increase compile time and consume extra compiled code space with little benefit in run-time performance. We have designed and implemented a new approach to inlining decision making in which the compiler performs inlining experimentally and records the results in a database that can be consulted to guide future inlining decisions for the same routine at call sites that have similar static information. Central to our approach is a new technique, called type group analysis, that calculates how much of the static information available at a call site was profitably used during inlining. The results of type group analysis enable the compiler to compute a generalization of the actual static information for a particular experiment, significantly increasing reuse of database entries. Preliminary results indicate that compile time is almost cut in half with only a 15% reduction in run-time performance.

TR # 93-05-07
"Asynchronous Design Methodologies: An Overview"
Scott Hauck

Asynchronous design has been an active area of research since at least the mid-1950s, but has yet to achieve widespread use. We examine the benefits and problems inherent in asynchronous computations, and in some of the more notable design methodologies. These include Huffman asynchronous circuits, burst-mode circuits, Micropipelines, template-based and Trace Theory-based delay-insensitive circuits, Signal Transition Graphs, Change Diagrams, and compilation-based quasi-delay-insensitive circuits.

TR # 93-05-08
"Pixel Arithmetic in Mathematics Education Using Image Processing"
Steve Tanimoto, James R. King, and Dennis Lee

A major challenge to mathematics educators of teenage students nowadays is motivating them to pay attention to the subject. We recently began a program of software development and testing which aims to interest 14-year-old students in mathematical activities. Our vehicle is image processing on the personal computer. A central aspect of our approach is the use of "pixel arithmetic" as a computational system with its own semantics, which is related to, but distinct from, that of real-number arithmetic. In this article, the necessity for pixel arithmetic is explained, together with insights about the pedagogical context in which it arises.
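As one concrete reading of "pixel arithmetic" (the abstract above does not fix the semantics, so the clamping convention below is an assumption made only for illustration), arithmetic on 8-bit pixels can saturate at the ends of the representable range rather than overflow as real-number or machine-integer arithmetic would.

    /* Illustrative saturating pixel arithmetic on 8-bit pixels: results are clamped
     * to 0..255, one common convention and an assumption made only for this sketch. */
    #include <stdio.h>
    #include <stdint.h>

    typedef uint8_t pixel;

    /* Saturating addition: 200 + 100 is 255, not 300 as in ordinary arithmetic. */
    static pixel pixel_add(pixel a, pixel b) {
        int sum = (int)a + (int)b;
        return (pixel)(sum > 255 ? 255 : sum);
    }

    /* Saturating subtraction: 50 - 80 is 0, not -30. */
    static pixel pixel_sub(pixel a, pixel b) {
        int diff = (int)a - (int)b;
        return (pixel)(diff < 0 ? 0 : diff);
    }

    int main(void) {
        printf("200 + 100 -> %d\n", pixel_add(200, 100));   /* prints 255 */
        printf(" 50 -  80 -> %d\n", pixel_sub(50, 80));     /* prints 0   */
        return 0;
    }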