
Dynamic Load Balancing



DLB is a library, developed since 2009, devoted to speeding up hybrid parallel applications and maximizing the utilization of computational resources.

Figure 1: DLB interaction with the HPC software stack
  • It is a dynamic library transparent to the user
    • No need to analyze or modify the application
    • Transversal to the different layers of the HPC software stack (see fig. 1)
  • Maximizes the utilization of computational resources
    • Improves the load balance of hybrid applications (see fig. 2)
    • Manages the number of threads in the shared memory level
    • Compatible with MPI, OpenMP and OmpSs. (We are open to adding support for more programming models at both the inner and outer levels of parallelism.)
  • Since version 3.0, DLB includes three independent and complementary modules:
    • LeWI: Lend When Idle
    • DROM: Dynamic Resource Ownership Management
    • TALP: Tracking Application Live Performance
Figure 2: Hybrid application with nested parallelism

LeWI: Lend When Idle

This module aims at optimizing the performance of hybrid applications without prior analysis or code modification.

LeWI will improve the load balance of the outer level of parallelism by redistributing the computational resources at the inner level of parallelism. This readjustment of resources will be done dynamically at runtime.

This dynamism allows DLB to react to different sources of imbalance: algorithm, data, hardware architecture, and resource availability, among others.

The DLB approach of redistributing computational resources at runtime, depending on the instantaneous demand, can improve performance in different situations:

Figure 3: Example of the DLB LeWI load balancing algorithm
  • Hybrid applications with an imbalance problem at the outer level of parallelism
  • Hybrid applications with an imbalance problem at the inner level of parallelism
  • Hybrid applications with serialized parts of the code
  • Multiple applications with different parallelism patterns

Fig. 3 shows an example of LeWI. In this case, the application is running two MPI processes on a computing node, with two OpenMP threads each. When MPI process 1 reaches a blocking MPI call, it lends its assigned CPUs (numbers 1 and 2) to the second MPI process running on the same node, allowing MPI process 2 to finish its computation faster.
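As an illustration only, the mini-app below (not part of DLB; file name, iteration counts, and launch line are assumptions) reproduces this pattern in plain MPI+OpenMP C code: one rank finishes its work early and blocks in MPI_Barrier while the other is still computing. The comments mention the DLB_ARGS variable used to enable LeWI; the user guide documents the exact preload and configuration steps.

```c
/* imbalanced.c -- toy hybrid MPI+OpenMP kernel reproducing the pattern of
 * fig. 3: one rank finishes early and blocks in MPI_Barrier while the
 * other rank still has work to do.
 *
 * Indicative build and launch (2 ranks, 2 threads each):
 *   mpicc -fopenmp imbalanced.c -o imbalanced
 *   OMP_NUM_THREADS=2 mpirun -np 2 ./imbalanced
 * With LeWI enabled (e.g. DLB_ARGS="--lewi"; see the user guide for the
 * exact preload setup), the CPUs of the blocked rank can be lent to the
 * busy rank until the barrier completes. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

static double work(long iters) {
    double acc = 0.0;
    #pragma omp parallel for reduction(+:acc)   /* inner (OpenMP) level */
    for (long i = 0; i < iters; ++i)
        acc += (double)i * 1e-9;
    return acc;
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Outer-level imbalance: rank 1 gets 4x the work of rank 0. */
    long iters = (rank == 0) ? 100000000L : 400000000L;
    double t0 = MPI_Wtime();
    double acc = work(iters);

    /* Rank 0 reaches this blocking call early; with LeWI its idle CPUs
     * can be borrowed by rank 1 until the barrier completes. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("rank %d: acc=%g, elapsed=%.2fs\n", rank, acc, MPI_Wtime() - t0);
    MPI_Finalize();
    return 0;
}
```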

DROM: Dynamic Resource Ownership Management

This module allows reassigning computational resources from one process to another depending on the demand.

DROM offers an API that can be used by an external entity, e.g., a job scheduler or a resource manager.

With the DROM API a CPU can be removed from a running application and given to another one (see the sketch after this list):

  • To a new application to allow collocation of applications
  • To an existing application to speed up its execution
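The sketch below shows how such an external entity might shrink the CPU mask of a running DLB process. It is only a sketch: the header name (dlb_drom.h), the functions used (DLB_DROM_Attach, DLB_DROM_SetProcessMask, DLB_DROM_Detach), their signatures, and the flags value are taken from the DROM interface as described in the user guide and should be checked against the installed DLB version.

```c
/* drom_manager.c -- hypothetical external manager that takes CPUs away
 * from a running DLB-enabled application, e.g. to give them to a newly
 * launched application on the same node. Names and signatures are
 * assumptions to be verified against the DLB user guide. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <dlb_drom.h>           /* DROM public interface (assumed header) */

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    int pid = atoi(argv[1]);    /* PID of the DLB-enabled application */

    DLB_DROM_Attach();          /* connect to the DLB shared memory */

    /* Restrict the target process to CPUs 0 and 1, freeing the rest of
     * its CPUs for another application. */
    cpu_set_t new_mask;
    CPU_ZERO(&new_mask);
    CPU_SET(0, &new_mask);
    CPU_SET(1, &new_mask);
    DLB_DROM_SetProcessMask(pid, &new_mask, 0 /* no flags; see user guide */);

    DLB_DROM_Detach();
    return 0;
}
```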

TALP: Tracking Application Live Performance

TALP is a lightweight, portable, extensible, and scalable tool for online parallel performance measurement. The efficiency metrics reported by TALP allow HPC users to evaluate the parallel efficiency of their executions, both post-mortem and at runtime.

The API that TALP provides allows the running application or resource managers to collect performance metrics at runtime. This makes it possible to adapt the execution dynamically based on the collected metrics. The set of metrics collected by TALP is well defined, independent of the tool, and consolidated.
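As a sketch of how an application might use this API, the example below registers a TALP monitoring region around a compute phase and reports its metrics. The header and function names (dlb_talp.h, DLB_MonitoringRegion*) follow the TALP interface described in the user guide; the exact signatures should be checked for the installed DLB version, and the program is assumed to run with TALP enabled (e.g. DLB_ARGS="--talp").

```c
/* talp_region.c -- sketch of collecting TALP metrics from inside the
 * application through a monitoring region. Names and signatures are
 * assumptions based on the DLB user guide. */
#include <dlb_talp.h>            /* TALP public interface (assumed header) */

/* Stand-in for the application phase we want to measure. */
static void solver_iteration(void) {
    volatile double acc = 0.0;
    for (long i = 0; i < 10000000L; ++i)
        acc += (double)i;
}

int main(void) {
    /* Register a named region once, then time each of its executions. */
    dlb_monitor_t *mon = DLB_MonitoringRegionRegister("solver");

    for (int step = 0; step < 10; ++step) {
        DLB_MonitoringRegionStart(mon);
        solver_iteration();
        DLB_MonitoringRegionStop(mon);
    }

    /* Report the accumulated metrics for this region; a resource manager
     * could instead query them at runtime and adapt the execution. */
    DLB_MonitoringRegionReport(mon);
    return 0;
}
```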


Downloads and User Guide

The DLB source code is distributed under the terms of the LGPL-3.0 license and can be downloaded here. You can also visit our GitHub page.

The DLB user guide is available online here.

A DLB tutorial can be downloaded here.
