PARA'04 State-of-the-Art
in Scientific Computing
June 20-23, 2004 (Home page)

Updated: 6 February 2004

Towards Automatic Performance Tuning of Matrix Triple Products Based on Matrix Structure

Eun-Jin Im, Kookmin University
Seoul, Korea
email: ejim@eecs.berkeley.edu
and
Ismail Bustany
Barcelona Design Inc., U.S.A.
email: Ismail.Bustany@barcelonadesign.com
and
Cleve Ashcraft
Livermore Software Technology Corporation, U.S.A.
email: cleveashcraft@earthlink.net
and
James W. Demmel
U.C.Berkeley, U.S.A.
email: demmel@eecs.berkeley.edu
and
Katherine A. Yelick
U.C.Berkeley, U.S.A.
email: yelick@eecs.berkeley.edu

Sparse matrices arise in many scientific and engineering applications. However, the speed of operating on sparse matrices tends to be far below the theoretical peak performance of the processor, largely due to the low ratio of arithmetic operations to data accesses and the use of indirect data structures. In prior work we demonstrated the effectiveness of using data structure and algorithm transformations, such as register and cache blocking, to significantly improve the performance of sparse matrix-vector multiplication. The problem of automatically selecting optimization parameters, such as block size, proved to be the most difficult part of this work, so we developed an automatic tuning system called Sparsity to select these parameters based on performance heuristics and search. In more recent work by the Berkeley BeBOP group, this approach has been applied to several other operations, including sparse triangular solve, $A*A^T*x$, and to symmetric matrices.

One of the themes in this work is the identification of patterns within the sparse matrix, such as small dense sub-blocks and symmetry, that can be used to optimize performance. Performance modeling shows that for this class of optimizations, the current performance is near an upper bound for most matrices on most machines, so additional improvements in performance need to come from new data structures and algorithms.

In this paper we look at an important sparse matrix kernel that commonly arises in solving primal-dual optimization problems in many applications such as circuit design, optimal control, portfolio optimization, etc. This kernel is often a bottleneck in solving the large-scale symmetric definite sparse matrix problems associated with primal-dual methods. The sparse matrix patterns are particular to the primal-dual problem (e.g. linear, quadratic, etc.) and the application type. In this work, we address the following matrix problem: $P=A*H*A^T$ where A is a sparse matrix and H is a block diagonal matrix with each diagonal block of H, $H_i$, having a rank-1 structure, i.e. $H_i = D_i + r_i * r_i^t$ for a diagonal matrix $D_i$ and a column vector $r_i$.

The matrix triple product can be treated as two generic sparse matrix products by storing both A and its transpose and performing a full sweep through both instances. However, this generic approach places unnecessary demand on the memory systems and ignores the specialized structure of H. Instead, we introduce a single pass algorithm that exploits the block diagonal structure of the matrix H; our algorithm use fewer floating point operations and roughly half the memory of the two-pass algorithm. The speed-up of the one-phase scheme over the two-phase scheme is 2.12 on 900 MHz Intel Itanium 2, 1.87 on a 900 MHz Sun Ultra 3, and 1.29 on 1.8 GHz Intel Pentium 4, and the one-phase algorithm saves nearly half the total memory usage of the two-phase algorithm. We use memory system performance modeling to explain the improvements in performance, most of which come from the reduction in memory traffic.

This paper is the first step in expanding the class of operations and optimization strategies used by Sparsity. Not only is the matrix triple product is a higher level operation than those we considered in previous work, but the use of algebraic transformations to take advantage of the specialized structure of H represent a new class of optimizations that we believe will be important in other application areas.

Home page


2004-02-06