Simultaneous Multithreading: Maximizing On-Chip Parallelism

Dean M. Tullsen, Susan J. Eggers, and Henry M. Levy,

The increase in component density on modern microprocessors has led to a substantial increase in on-chip parallelism. In particular, modern superscalar RISCs can issue several instructions to independent functional units each cycle. However, the benefit of such superscalar architectures is ultimately limited by the parallelism available in a single thread.

This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. In the most general case, the binding between thread and functional unit is completely dynamic. We present several models of simultaneous multithreading and compare them with wide superscalar, fine-grain multithreaded, and single-chip, multiple-issue multiprocessing architectures. To perform these evaluations, we simulate a simultaneous multithreaded architecture based on the DEC Alpha 21164 design, and execute code generated by the Multiflow trace scheduling compiler. Our results show that: (1) No single latency-hiding technique is likely to produce acceptable utilization of wide superscalar processors. Increasing processor utilization will therefore require a new approach, one that attacks multiple causes of processor idle cycles. (2) Simultaneous multithreading is such a technique. With our machine model, an 8-thread, 8-issue simultaneous multithreaded processor sustains over 5 instructions per cycle, while a single-threaded processor can sustain fewer than 1.5 instructions per cycle with similar resources and issue bandwidth. (3) Multithreaded workloads degrade cache performance relative to single-thread performance, as previous studies have shown. We evaluate several cache configurations and demonstrate that private instruction and shared data caches provide excellent performance regardless of the number of threads. (4) Simultaneous multithreading is an attractive alternative to single-chip multiprocessors. We show that simultaneous multithreaded processors with a variety of organizations are all superior to conventional multiprocessors with similar resources.

While simultaneous multithreading has excellent potential to increase processor utilization, it can add substantial complexity to the design. We examine many of these complexities and evaluate alternative organizations in the design space.

Proceedings of the 22rd Annual International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 1995.

To get the PostScript file, click here.