PARA'04 State-of-the-Art
in Scientific Computing
June 20-23, 2004

Updated: January 31, 2004

Thread-Level Speculation and Chip-Multiprocessors: An Overview

F. J. Villa, M. E. Acacio and J. M. Garcia
Faculty of Informatics
Campus de Espinardo
30071 Espinardo, Murcia, Espana
Email: fj.villa@ditec.um.es

Traditionally, scientific problems have been solved on mid- and large-range multiprocessors. These high-performance, expensive systems are oriented to programs with high computational requirements, but there are alternative architectures with lower cost and complexity that are suitable for solving small and medium-sized scientific problems.

Superscalar processors are a common approach when high-performance computing is required. However, the complexity of the issue logic and data dependences will limit the performance of these processors in the near future. Very Long Instruction Word (VLIW) processors are another architecture that has appeared in the last decade. This type of processor uses long instructions made up of several shorter instructions that execute in parallel. Both superscalar and VLIW processors try to extract instruction-level parallelism (ILP) in order to reduce the execution time of applications; however, typical integer scientific programs have very low ILP, so other forms of parallelization are needed to achieve the performance these problems require.

Simultaneous multithreading (SMT) is a novel technique aimed at maximizing on-chip parallelism. An SMT core can issue instructions belonging to several independent threads to multiple functional units each cycle. With this mechanism, it is possible to exceed the instruction throughput of a single-threaded wide superscalar by increasing the utilization of the execution units. This technique has been adopted, for example, in the POWER5 processor design [3].
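The utilization argument above can be illustrated with a toy issue-slot model. This is only a sketch under strong assumptions: the function names, the issue width of 4 and the per-cycle counts of ready instructions are all hypothetical, and instructions that are not issued in a cycle are simply dropped rather than carried over, which a real core would not do.

```python
def issued_per_cycle(ready_by_thread, issue_width):
    """Fill the issue slots of one cycle from the threads, in order."""
    slots = issue_width
    issued = 0
    for ready in ready_by_thread:
        take = min(ready, slots)  # a thread cannot use more slots than remain
        issued += take
        slots -= take
    return issued


def utilization(trace, issue_width):
    """Fraction of issue slots used over a trace.

    trace is a list of cycles; each cycle lists, per thread, how many
    instructions are ready to issue (hypothetical numbers).
    """
    total = sum(issued_per_cycle(cycle, issue_width) for cycle in trace)
    return total / (issue_width * len(trace))


# One low-ILP thread alone on a 4-wide core (assumed values).
single_thread = [[1], [2], [1], [0]]
# The same thread sharing the core with a second, independent thread.
two_threads = [[1, 2], [2, 1], [1, 3], [0, 2]]

print(utilization(single_thread, 4))  # 0.25
print(utilization(two_threads, 4))    # 0.75
```

In this model the second thread fills slots that the first thread leaves empty, which is the effect the paragraph attributes to SMT.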

Finally, thread-level speculation (TLS) makes it possible to extract speculative parallelism from sequential applications. TLS has been successfully used to parallelize benchmarks from suites such as SPEC95 and SPEC2000, which allows them to run on a single-chip multiprocessor [4]. A single-chip multiprocessor [1,2], also called a chip-multiprocessor or CMP, is a multiprocessor architecture in which several processor cores are integrated on a single chip. CMPs are a promising approach for executing both sequential scientific applications (parallelized with a technique such as TLS) and parallel ones, providing the benefits of traditional multiprocessors at a lower cost.
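As a rough illustration of the idea behind TLS, the following sketch emulates it in software for a loop of the form mem[writes[i]] = mem[reads[i]] + 1. It is not the mechanism of any of the cited systems: the function names, the epoch size and the squash policy are assumptions made for this example. Iterations are grouped into epochs that all run speculatively from the same memory snapshot, each epoch buffers its writes and records what it read, and epochs commit in program order. An epoch that read a location written by an earlier epoch is squashed and re-executed.

```python
def run_sequential(memory, reads, writes):
    """Reference semantics: the original sequential loop."""
    mem = list(memory)
    for r, w in zip(reads, writes):
        mem[w] = mem[r] + 1
    return mem


def run_speculative(memory, reads, writes, epoch_size=2):
    """Emulate speculative epochs with in-order commit (hypothetical sketch)."""
    mem = list(memory)
    n = len(reads)
    epochs = [range(s, min(s + epoch_size, n)) for s in range(0, n, epoch_size)]

    def execute(epoch, base):
        read_set, buffer = set(), {}
        for i in epoch:
            r, w = reads[i], writes[i]
            if r in buffer:            # value produced inside this epoch
                value = buffer[r]
            else:                      # value exposed to earlier epochs
                value = base[r]
                read_set.add(r)
            buffer[w] = value + 1      # writes stay buffered until commit
        return read_set, buffer

    # Conceptually these run in parallel, all from the same snapshot.
    snapshot = list(mem)
    results = [execute(epoch, snapshot) for epoch in epochs]

    committed_writes = set()
    squashes = 0
    for epoch, (read_set, buffer) in zip(epochs, results):
        if read_set & committed_writes:
            # Dependence violation: the epoch may have read stale data,
            # so squash it and re-execute against the committed state.
            squashes += 1
            read_set, buffer = execute(epoch, mem)
        for addr, value in buffer.items():
            mem[addr] = value
        committed_writes.update(buffer)
    return mem, squashes


# A loop with a cross-epoch dependence: iteration 2 reads what iteration 1 wrote.
chain_reads, chain_writes = [0, 1, 2, 3], [1, 2, 3, 4]
print(run_speculative([0] * 6, chain_reads, chain_writes))
# A loop with no dependences between iterations: no epoch is squashed.
free_reads, free_writes = [0, 1, 2, 3], [4, 5, 6, 7]
print(run_speculative([1, 2, 3, 4, 0, 0, 0, 0], free_reads, free_writes))
```

In both cases the final memory matches run_sequential, which is the property speculation has to preserve; only the number of squashed epochs differs.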

In this paper, we summarize recent proposals related to these high-performance architectures, with special emphasis on thread-level speculation and single-chip multiprocessors. We show how these ongoing developments can provide the potential for solving increasingly difficult problems, such as high-performance linear algebra algorithms and scientific applications. Additionally, we outline the approach we are following in this field.

References:
[1] L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets and B. Verghese. "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing". In Proceedings of 27th International Symposium on Computer Architecture, pages 282-293, Vancouver, Canada, June 2000.
[2] L. Hammond, B. A. Hubbert, M. Siu, M. K. Prabhu, M. Chen and K. Olukotun. "The Stanford Hydra CMP". IEEE Micro, 20(2):71-84, March 2000.
[3] R. Kalla, B. Sinharoy and J. Tendler. "Simultaneous Multithreading Implementation in POWER5". In 2003 Hotchips, Stanford, CA, August 2003.
[4] The STAMPede Project. http://www-2.cs.cmu.edu/~stampede/


Jerzy Wasniewski
2004-01-31