New release of the DLB library (2.0) offers a new asynchronous API for runtime systems and the novel DROM module, which provides a public API to manage the computational resources assigned to running processes
The Computer Sciences department at BSC is proud to announce the release of DLB 2.0. The Dynamic Load Balance library (DLB) improves the load balance of parallel applications at runtime. “Fixing load imbalance on applications is not only important to improve a single application’s performance, but it is also key to boost the utilization of supercomputing systems,” says Marta Garcia, lead scientist of the DLB tool.
“DLB has a tremendous potential to address for free the imbalance issues in hybrid applications that would otherwise require significant refactoring efforts,” says Jesús Labarta, director of the Computer Sciences department.
DLB is currently helping to improve load balance in several European projects, such as the Human Brain Project, HPC Europa 3 and MontBlanc 3, and it is used in a wide range of applications from different domains, including neuroscience, computational mechanics, molecular dynamics, cosmological simulations and climate modeling.
“DLB is our preferred tool to mitigate imbalances occurring on Alya executions. These imbalances appear spontaneously or come from inaccurate load distributions. DLB solves both problems at runtime, acting only when necessary, making our code much more resilient for modern HPC systems. We save millions of CPU hours every year by using DLB,” says Ricard Borrell, Senior Researcher at BSC’s CASE department.
The newest version of DLB (2.0) introduces the DROM module, which allows external entities (e.g. a job scheduler or resource manager) to request a change in the resources assigned to a running process. The DLB library is now organized into two modules, LeWI and DROM, which are independent of each other but can work in coordination.
More specifically, in this new version, apart from several bug fixes, we have introduced the following new features:
DROM (Dynamic Resource Ownership Management) module.
DROM offers an API for external entities (e.g. a job scheduler or resource manager). It allows CPUs to be removed from a running process and assigned to a new process or to an existing one.
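As a rough illustration of the idea, the sketch below models CPU ownership as plain Python sets and moves CPUs from one process to another. It does not call DLB: the function `reassign_cpus`, the process IDs and the rule that a donor keeps at least one CPU are all assumptions made for this example.

```python
# Toy model of the DROM idea: an external entity (a hypothetical scheduler)
# shrinks the CPU set of a running process and gives the freed CPUs to another.
# This does not call DLB; every name here is illustrative.

def reassign_cpus(masks, donor, receiver, cpus):
    """Return a new {pid: set_of_cpu_ids} map with `cpus` moved to `receiver`."""
    cpus = set(cpus)
    if not cpus <= masks[donor]:
        raise ValueError("donor does not own all requested CPUs")
    if cpus == masks[donor]:
        # Assumption of this sketch, not a documented DLB rule.
        raise ValueError("donor must keep at least one CPU")
    new_masks = {pid: set(mask) for pid, mask in masks.items()}
    new_masks[donor] -= cpus
    new_masks.setdefault(receiver, set()).update(cpus)
    return new_masks


masks = {1001: {0, 1, 2, 3}}  # one running process owns four CPUs
masks = reassign_cpus(masks, donor=1001, receiver=1002, cpus={2, 3})
print(masks)
```

In this model the receiver may be a new process (created on demand in the map) or an existing one, mirroring the two cases described above.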
Asynchronous version of the LeWI (Lend When Idle) load balancing algorithm.
The LeWI load balancing algorithm can work in either synchronous or asynchronous mode. The new asynchronous mode lets the runtime and DLB interact without polling.
New DLB public API
- Clearer, with unified names
- More exhaustive, supporting more use cases
Callback system for parallel runtimes
The callback system allows registering functions as callbacks for DLB actions, providing a friendly interface for integrating new parallel runtimes with DLB.
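The general pattern behind such a callback system can be sketched as follows. This is a toy registry, not DLB's interface: the classes `CallbackRegistry` and `ToyRuntime`, the event names and the callback signatures are hypothetical and only show how a runtime can be notified of resource changes instead of polling for them.

```python
# Toy callback registry. Not DLB's API: the classes, event names and
# callback signatures below are hypothetical.

class CallbackRegistry:
    def __init__(self):
        self._callbacks = {}

    def register(self, event, func):
        """Associate `func` with `event`, replacing any previous callback."""
        self._callbacks[event] = func

    def notify(self, event, *args):
        """Invoke the callback for `event`; return False if none is registered."""
        func = self._callbacks.get(event)
        if func is None:
            return False
        func(*args)
        return True


class ToyRuntime:
    """Stand-in for a parallel runtime that tracks the CPUs it may use."""

    def __init__(self, cpus):
        self.cpus = set(cpus)

    def enable_cpu(self, cpu):
        self.cpus.add(cpu)

    def disable_cpu(self, cpu):
        self.cpus.discard(cpu)


runtime = ToyRuntime({0, 1})
registry = CallbackRegistry()
registry.register("enable_cpu", runtime.enable_cpu)
registry.register("disable_cpu", runtime.disable_cpu)

registry.notify("enable_cpu", 2)   # a CPU becomes available to this runtime
registry.notify("disable_cpu", 0)  # a CPU is taken back
print(sorted(runtime.cpus))        # [1, 2]
```

A new runtime would only need to supply its own functions for the events it cares about, which is what makes this style of interface convenient for integration.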
Support for interoperability of multiple runtimes
DLB supports several parallel runtimes sharing computational resources within the same process.
New mechanism to set DLB options based on the DLB_ARGS environment variable
All the options passed to DLB are now contained in a single environment variable, which makes DLB easier to configure and makes errors in option settings easier to detect.
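To show why a single variable helps with error detection, here is a minimal sketch that parses an options string and reports anything it does not recognise. The parser is not DLB's own, and the option names in `KNOWN_OPTIONS` are illustrative assumptions; the authoritative list is in the DLB documentation.

```python
import os
import shlex

# Options accepted by this sketch. The names are illustrative assumptions,
# not a statement of what DLB accepts.
KNOWN_OPTIONS = {"--lewi", "--drom", "--mode"}


def parse_args_variable(value):
    """Split an options string into a dict and collect unknown option names."""
    options, errors = {}, []
    for token in shlex.split(value):
        name, sep, arg = token.partition("=")
        if name not in KNOWN_OPTIONS:
            errors.append(name)
            continue
        # Flags without "=value" are stored as True.
        options[name] = arg if sep else True
    return options, errors


# The last option is misspelled on purpose.
os.environ["DLB_ARGS"] = "--lewi --mode=async --lewy"
options, errors = parse_args_variable(os.environ["DLB_ARGS"])
print(options)  # {'--lewi': True, '--mode': 'async'}
print(errors)   # ['--lewy']
```

Because every option arrives through one string, a typo such as the one above can be reported in one place rather than being silently ignored.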
You can freely download DLB (distributed as open source under the LGPL-3.0 license) and get more information at DLB’s website.