Programming Models @ BSC

Boosting parallel computing research since 1989

Contributions at the Joint Laboratory for Petascale Computing

- Written by Xavier Teruel

Workshop: Sophia Antipolis, France
Event Date: June 9-11, 2014
Speakers: Jesus Labarta, Victor Lopez and Florentino Sainz


  • Presentation of BSC activities (Jesus Labarta)
  • DLB: Dynamic Load Balancing Library (Victor Lopez)
  • DEEP Collective offload (Florentino Sainz)


Dynamic Load Balancing Library: DLB is a dynamic library designed to speed up hybrid applications by improving their load balance with little or no intervention from the user. The idea behind the library is to redistribute the computational resources of the inner level of parallelism (OpenMP, OmpSs) to improve the load balance of the outer level of parallelism (MPI). The DLB library uses a runtime interposition technique, so no prior analysis or modification of the application is necessary, although finer control is also available through an API. Finally, we present a case study with CESM (Community Earth System Model), a global climate model that provides computer simulations of the Earth's climate. The application already uses a hybrid parallel programming model (MPI+OpenMP), so with few changes to the source code we compiled it to use the OmpSs programming model, whose high malleability DLB can exploit.
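The principle can be sketched with a toy model (an illustration of the load-balancing idea only, not the DLB API): an MPI step finishes when its slowest rank finishes, so moving cores from lightly loaded ranks to heavily loaded ones lowers the overall time.

```python
# Toy model of hybrid load balancing: a fixed pool of cores is divided
# among MPI-like ranks, and the step time is set by the slowest rank.

def parallel_time(work, cores):
    """Each rank's time = its work / its cores; the step ends with the slowest."""
    return max(w / c for w, c in zip(work, cores))

def static_split(work, total_cores):
    """Even split: what a plain MPI+OpenMP run with fixed threads does."""
    per_rank = total_cores // len(work)
    return [per_rank] * len(work)

def balanced_split(work, total_cores):
    """DLB-style idea: give each rank cores proportional to its load
    (at least one core each), keeping the pool size constant."""
    total_work = sum(work)
    cores = [max(1, round(total_cores * w / total_work)) for w in work]
    # Fix rounding so the pool size is preserved.
    while sum(cores) > total_cores:
        cores[cores.index(max(cores))] -= 1
    while sum(cores) < total_cores:
        cores[cores.index(min(cores))] += 1
    return cores

work = [10.0, 30.0, 40.0, 20.0]    # imbalanced load across 4 ranks
static = static_split(work, 16)     # [4, 4, 4, 4]
dynamic = balanced_split(work, 16)  # heavy ranks receive more cores
print(parallel_time(work, static))   # 10.0: bound by the 40-unit rank
print(parallel_time(work, dynamic))  # lower: imbalance is absorbed
```

The redistribution only works because the inner runtime (OpenMP/OmpSs) is malleable, i.e. it can change its number of active threads while the application runs; this is why the CESM port to OmpSs benefits DLB.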

DEEP Collective offload: We present a new extension of the OmpSs programming model that allows users to dynamically offload C/C++ or Fortran code from one or many nodes to a group of remote nodes. The remote nodes executing the offloaded code can communicate with each other through MPI. The extension aims to improve the programmability of current and future Exascale supercomputers, which combine different types of processors and interconnection networks that have to work together to obtain the best performance. A good example of such an architecture is the DEEP project, which has two separate clusters (CPUs and Xeon Phis). With our technology, which works on any architecture that fully supports MPI, users can easily offload work from the CPU cluster to the accelerator cluster without having to fall back to the CPU cluster to perform MPI communications.
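The pattern can be illustrated with a small self-contained sketch (hypothetical code, not the OmpSs offload syntax): a host hands one task to a group of workers, and the workers cooperate among themselves before returning a single result; the worker-to-worker exchange stands in for the MPI communication between remote nodes.

```python
# Conceptual sketch of "collective offload": the host launches a group of
# workers that compute on local chunks and exchange partial results with
# each other (a stand-in for MPI inside the offloaded group).

from multiprocessing import Process, Queue

def worker(rank, size, chunk, peer_queues, result_queue):
    local = sum(chunk)                  # local computation on this chunk
    # Exchange partial results with every peer in the group.
    for r in range(size):
        if r != rank:
            peer_queues[r].put(local)
    total = local + sum(peer_queues[rank].get() for _ in range(size - 1))
    if rank == 0:                       # one worker reports back to the host
        result_queue.put(total)

def collective_offload(data, size):
    """Host side: split the data and launch a group of cooperating workers."""
    peer_queues = [Queue() for _ in range(size)]
    result_queue = Queue()
    chunks = [data[i::size] for i in range(size)]
    procs = [Process(target=worker,
                     args=(r, size, chunks[r], peer_queues, result_queue))
             for r in range(size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return result_queue.get()

if __name__ == "__main__":
    print(collective_offload(list(range(100)), 4))  # 4950
```

The point of the sketch is the communication topology: without support for MPI among the offloaded processes, each partial result would have to travel back to the host to be combined, which is exactly the round trip the OmpSs extension avoids.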
