Dynamic Load Balancing
DLB is a library devoted to speeding up hybrid parallel applications (see fig. 1). At the same time, DLB improves the efficient use of the computational resources inside a computing node.
The DLB library improves the load balance of the outer level of parallelism by redistributing the computational resources at the inner level of parallelism. This readjustment of resources is done dynamically at runtime.
This dynamism allows DLB to react to different sources of imbalance: algorithm, data, hardware architecture and resource availability, among others.
Fig. 2 shows a typical life cycle when developing an HPC application. This process is time consuming for the programmer or expert and consumes computational resources. Moreover, for highly tuned applications this cycle must be repeated for each architecture the application will run on.
Using DLB does not require a previous performance analysis of the application (see fig. 3). It therefore significantly reduces the time programmers spend modifying the application, debugging, and analyzing the sources of performance loss, as well as the computational resources needed to run the performance tests.
The DLB library uses an interposition technique at runtime; therefore, in most cases it is not necessary to modify or recompile the application (see fig. 4).
The DLB approach of redistributing the computational resources at runtime according to the instantaneous demand can improve performance in different situations:
Who can use DLB?
Any application written in C, C++ or Fortran in any of the supported parallel programming models.
The currently supported parallel programming models are the following:
We are open to adding support for more programming models at both the inner and outer levels of parallelism.
How does DLB work?
DLB uses the malleability of the inner level of parallelism to change the number of threads of the different processes running in the same node. There are different load balancing algorithms implemented within DLB. They all rely on this main idea, but they target different types of applications or situations.
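The main idea can be illustrated with a toy model. The sketch below is not DLB code and does not use the DLB API: the `Process` class, the function names, the even split of idle CPUs among the processes that are still working, and the assumption of perfect scaling with fractional CPUs are all hypothetical simplifications chosen for illustration. It compares the time to finish an imbalanced step when each process keeps only its own CPUs against the time when the CPUs of finished processes are handed to the ones still running.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str    # hypothetical label for a process in the node
    work: float  # work units the process must complete in this step
    cpus: int    # CPUs the process owns


def run_static(procs):
    """Each process uses only its own CPUs; the slowest one sets the time."""
    return max(p.work / p.cpus for p in procs)


def run_with_lending(procs):
    """CPUs of finished processes are lent to the processes still working.

    Assumption: idle CPUs are split evenly among the active processes and
    work scales perfectly with the (possibly fractional) CPU count.
    """
    remaining = {p.name: p.work for p in procs}
    owned = {p.name: p.cpus for p in procs}
    total_cpus = sum(owned.values())
    elapsed = 0.0
    active = [n for n, w in remaining.items() if w > 0]
    while active:
        # CPUs whose owner has already finished are available for lending.
        idle = total_cpus - sum(owned[n] for n in active)
        share = {n: owned[n] + idle / len(active) for n in active}
        # Advance time until the next active process finishes.
        dt = min(remaining[n] / share[n] for n in active)
        elapsed += dt
        for n in active:
            remaining[n] -= dt * share[n]
        active = [n for n in active if remaining[n] > 1e-12]
    return elapsed


# Two processes in one node, 4 CPUs each, with a 1:3 load imbalance.
procs = [Process("rank0", 40, 4), Process("rank1", 120, 4)]
print(run_static(procs))        # 30.0
print(run_with_lending(procs))  # 20.0
```

In this model the lightly loaded process finishes after 10 time units and its 4 CPUs then help the heavily loaded one, so the step ends at 20 instead of 30 time units.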
What does DLB need?
DLB needs more than one process running in the same computing node (with shared memory). These processes can be MPI processes of the same application or processes of different applications.
The processes must use a shared memory parallel programming model (the current version supports OpenMP and OmpSs).
The shared memory programming model must be malleable, both from the point of view of the programming model (OpenMP and OmpSs are highly malleable) and from the point of view of the application (it must not rely on a fixed number of threads).
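As a rough illustration of malleability from the application's point of view, the sketch below uses a Python thread pool as a stand-in for a parallel region; it is not OpenMP, OmpSs or DLB code, and `parallel_sum` and its parameters are hypothetical names. The point it tries to show is that the work partition is derived from the thread count available at the moment the phase starts, rather than from a value fixed at startup, so the result stays correct when that count changes between phases.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(data, num_threads):
    """Sum `data` using however many threads are available right now.

    The chunk size is computed from the current `num_threads` on every
    call, so nothing in the function depends on a fixed thread count.
    """
    # Ceiling division so that every element lands in exactly one chunk.
    chunk = max(1, -(-len(data) // num_threads))
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return sum(pool.map(sum, chunks))


data = list(range(1000))
# The thread count changes from one phase to the next, as it could when
# a runtime adds or removes threads; the result is the same every time.
for threads in (4, 1, 7):
    print(threads, parallel_sum(data, threads))
```

An application that instead sized its buffers or split its iterations by a thread count cached at startup would break, or leave resources unused, as soon as the number of threads was changed underneath it.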
fig. 1: Hybrid application with nested parallelism
fig. 2: Typical HPC application life cycle
fig. 3: HPC application life cycle with DLB
fig. 4: DLB interposition mechanism used with MPI applications
fig. 5: Example of DLB load balancing algorithm