PARA'04 State-of-the-Art
in Scientific Computing
June 20-23, 2004 (Home page)

Updated: 6 February 2004

Study of Load Balancing Strategies for Finite Element Computations on Heterogeneous Clusters

Kalyani Munasinghe, University of Ruhuna, Sri Lanka
and
Richard Wait, Uppsala University, Sweden
email: kalyani@cc.ruh.ac.lk

In this paper we study strategies for redistributing the load in an adaptive finite element computation performed on a cluster of workstations. The load balancing has two parts: first there is a need to calculate the quantity of computational load to be redistributed, then it is necessary to identify the nature of the data transfers between the processors. The cluster is assumed to be a heterogeneous, multi-user computing environment so that efficient load balancing within the application must take external factors into account. At any time all the users of the network are competing for resources. The performance of a particular processor, as a component in the parallel (message passing) computation, depends on both static factors, such as the processor hardware, and dynamic factors, such as the system load and the activities of other users. For each processor, the external factors can be condensed into a single parameter, the load index, which is a normal! ised measure of the current spare capacity of the processor available to the application.

The system load can be measured in a variety of ways, for example there are UNIX utilities to capture different aspects of processor activity. It is possible to monitor other users in terms of keyboard activity and mouse clicks. Some machines on the network may be subject to special restrictions on the use by distributed applications. They may only be available at particular times or when certain other users are absent. Thus a mouse click may indicate a machine is no longer available and all the existing load must be removed and redistributed immediately. The experiments reported here were conducted with different formulations of the load index to show the differences in the re-balancing of the load and the effect on the overall run-time of the application on a multi-user network of workstations. The data for calculating the load is collected by agents. Using a load index, it is possible to calculate a distribution of the current computational load within the! application that would, in some sense, be ideal under the current system conditions. This assumes that the total cost of the redistributed computation is unaltered. This is not usually the case in distributed finite element computations. It also assumes that the load can be redistributed sufficiently rapidly so that the system parameters used to compute the load index are still relevant.

The second part of the dynamic load-balancing in a finite element application is therefore concerned with identifying which load components are to be redistributed and with moving the associated data as quickly as possible. The finite element grid is divided into sub-domains and each sub-domain is associated with a different processor. On a network, it is assumed that all processors are connected, but the topology of the finite element sub-domains can be interpreted as a processor topology and hence for each processor it is possible to define set of neighbours. If the sub-domains are modified, then this topology may change. In a finite element analysis, the quantity of computation on a processor is proportional to the size of the sub-domain plus some contribution from the neighbours (the edge cuts). We consider schemes that modify the sub-domains by, in general, moving data to adjacent processors: however, when dramatic changes in the load are required, such as! a processor being no-longer available, a more fundamental repartitioning of the grid is needed. The numerical experiments show the efficiency of the modification strategies and the effect on the overall computation time.

Home page


2004-02-06