You can get a stable DLB package from the links below, or visit https://github.com/bsc-pm/dlb for the most up-to-date version. Also check our installation guide.
- dlb-3.6.0+beta1.tar.gz (Release Notes)
- dlb-3.5.3.tar.gz (Release Notes)
- dlb-3.5.2.tar.gz (Release Notes)
- dlb-3.5.1.tar.gz (Release Notes)
- dlb-3.5.0.tar.gz (Release Notes)
- dlb-3.4.1.tar.gz (Release Notes)
- dlb-3.4.tar.gz (Release Notes)
- dlb-3.3.1.tar.gz (Release Notes)
- dlb-3.3.tar.gz (Release Notes)
- dlb-3.2.tar.gz (Release Notes)
- dlb-3.1.tar.gz (Release Notes)
- dlb-3.0.2.tar.gz
- dlb-3.0.1.tar.gz
- dlb-3.0.tar.gz (Release Notes)
- dlb-2.1.tar.gz (Release Notes)
- dlb-2.0.2.tar.gz
- dlb-2.0.1.tar.gz
- dlb-2.0.tar.gz (Release Notes)
- dlb-1.3.2.tar.gz
- dlb-1.3.1.tar.gz
- dlb-1.3.tar.gz (Release Notes)
- dlb-1.2.tar.gz (Release Notes)
- dlb-1.1.tar.gz (Release Notes)
- dlb-1.0.tar.gz (Release Notes)
Version 3.6.0-beta1
Added
- Initial support for GPU TALP metrics (includes NVIDIA and AMD support)
- New, more robust support for MPI Fortran 2008 bindings
- New binary dlb_mpi to check affinity in MPI environments
- Flags --gpu-affinity and --uuid for dlb and dlb_mpi to show GPU visibility
- New instrumentation events for OMPT callbacks
- New API DLB_DROM_SetProcessMaskStr to set masks using a human-readable input
Changed
- --talp-output-file now creates missing directories when possible
Fixed
- Fixed TALP Global region not being started when neither MPI nor OpenMP is used
- Replace PAPI reset calls with regular reads to improve performance
- Add several PAPI init checks
- Fixed several LeWI features in async mode
Version 3.5.3 (2025-09-08)
Added
- Added documentation for DLB_DROM_FLAGS_NONE argument
- Other minor documentation corrections
Changed
- Remove MPI Fortran 2008 bindings check at configure time; bindings remain disabled pending new interception method
Fixed
- Fixed compilation error with GCC 15
- Fixed errors in the OpenMP thread manager during initialization
- Fixed some OpenMP thread manager logic for SMT systems
- Fixed bug in DLB_MonitoringRegionReset; region was not being removed from an internal list of open regions
- Stop instrumentation after MPI_Finalize to avoid unwanted interactions with external libraries
Version 3.5.2 (2025-05-02)
Added
- Add dlb_mpi binary to display CPU affinity and MPI rank
- Add more verbose messages for OMPT events
- Add implicit-task-end OMPT event to improve consistency with Cray OpenMP
Fixed
- Fixed several errors with CPU topology parsing
- Removed deprecated options and struct members in examples
Version 3.5.1 (2025-03-05)
Added
- Add --disable-sphinx-doc, --disable-doxygen, and --disable-pandoc options to the configure script to disable the automatic detection of each tool
Fixed
- Fixed several bugs with CPU affinity masks detection and parsing
- POP metrics conditional printing improved for MPI/OpenMP detection
- Fix some compilation errors with clang-19, nvc, and nvfortran
- Improved documentation in user guide
- Several other minor fixes
Version 3.5.0 (2024-12-03)
Added
- Asynchronous support for classic LeWI
- Several SMT enhancements for LeWI policies
- Allow overriding LeWI classic/mask with --lewi-affinity
- TALP POP metrics now include experimental OpenMP hybrid metrics
- TALP global region is now exposed in the API
- TALP-Pages, a new tool for Continuous Performance Monitoring in static HTML pages
- Add flag --talp-region-select to filter active regions
- SLURM integration via dlb_taskset
- CMake config for other projects to link with DLB
- Several examples and documentation reworked
- DLB version information can be accessed through the API
Changed
- --talp-summary has been simplified: pop-metrics now also includes raw metrics if using an output file, and process metrics now include node identifiers
- TALP now only stores monitoring regions in shared memory if --talp-external-profiler is set
- TALP output structure has been reworked
- TALP main region is now called “Global”
Fixed
- LeWI mask now correctly supports threads blocked in MPI calls while pinned to multiple CPUs
- Add sanity checks for hardware counters in TALP
- Print JSON and CSV files in the proper locale
Deprecated
- --talp-summary values pop-raw and node are deprecated
- TALP output format XML is now deprecated
- --talp-regions-per-proc flag is deprecated in favor of a new experimental --shm-size-multiplier flag
- Several fields in dlb_monitor_t are now deprecated
- Several fields in dlb_pop_metrics_t are now deprecated
- DLB_MonitoringRegionGetMPIRegion deprecated in favor of DLB_MonitoringRegionGetGlobal
- DLB_Stats_GetCpuStateIdle functionality no longer provided
- DLB_Stats_GetCpuStateOwned functionality no longer provided
- DLB_Stats_GetCpuStateGuested functionality no longer provided
Version 3.4.1 (2024-08-16)
Fixed
- Fix an error in the shared memory alignment that was causing segmentation faults when compiling with -march=native
- Avoid registering role-shifting callbacks for unrelated OpenMP thread managers
- Update examples with supported options
- Fix some parameters in the Fortran’08 interface
- Be more resilient if PAPI fails to initialize
- Enhance compatibility with other systems
- Quote string names in CSV files
- Several other minor fixes
Version 3.4 (2023-12-22)
Added
- PAPI support for TALP metrics
- libdlb_mpic.so and libdlb_mpic_*.so are C-only MPI libraries that may be built using --enable-c-mpi-library at configure time
- Functions to reset, stop, start and report monitoring regions now accept the special argument DLB_MPI_REGION for the implicit region
- Function DLB_TALP_QueryPOPNodeMetrics for third-party applications to query POP metrics. Requires --talp-external-profiler.
- Named barriers and several API functions to manage them
- Added --lewi-barrier and --lewi-barrier-select to fine-tune which barriers activate LeWI
- Added --lewi-color to select a specific key only for LeWI
Changed
- libdlb_mpif.so and libdlb_mpif_*.so are no longer built by default, only if --enable-fortran-mpi-library is set at configure time
- Flag --quiet now only suppresses INFO and VERBOSE messages; added new flag --silent to keep the old functionality of suppressing all messages
- Refactor DLB_TALP_CollectNodeMetrics to DLB_TALP_CollectPOPNodeMetrics and add communication efficiency
- TALP now appends to CSV files if they already exist
Fixed
- Fixed incorrectly generated code for MPI_Initialized and MPI_Finalized
Deprecated
- --lewi-ompt no longer accepts “mpi” or “aggressive” as values. Automatic LeWI via synchronization calls is now done with --lewi-mpi-calls for MPI and --lewi-barrier or --lewi-barrier-select for DLB Barriers.
Version 3.3.1 (2023-05-18)
Fixed
- Fixed incorrectly generated code for MPI_Initialized and MPI_Finalized
Version 3.3 (2023-05-15)
Added
- Free agent and Role-shift OMPT thread managers to support LeWI with both implementations
- Flag --ompt-thread-manager to select which OpenMP implementation to use
- MPI Fortran 2008 bindings
- TALP flag --talp-output-file to generate output files in different formats
- New TALP collective functions to gather and compute metrics: DLB_TALP_CollectPOPMetrics and DLB_TALP_CollectNodeMetrics
Changed
- libdlb_mpi.so and libdlb_mpi_*.so now have both C and Fortran MPI symbols
Fixed
- Fixed DROM pre-initialization if child had empty cpuset affinity
- Fixed --lewi-max-parallelism
- Fixed several TALP bugs
- Fixed some finalization errors during MPI finalize
- Fixed cpuset parsing when provided a non-contiguous mask
Version 3.2 (2022-03-16)
Added
- Flag --verbose to enable all verbose modes
- Flag --talp-summary=pop-raw to print raw POP metrics
- Flag --lewi-respect-cpuset to allow LeWI to use CPUs not yet registered
Changed
- DROM can now steal all CPUs from one process
- DROM can now inherit a subset of CPUs from another process
- DLB_DROM_SetProcessMask applied to oneself no longer requires a DLB_pollDROM
- DLB_Lend in OpenMP applications now invokes the OpenMP runtime to change the number of threads
Fixed
- Fixed TALP regions enabled or registered only on some processes
- Fixed minor option parsing
Version 3.1 (2021-11-05)
Added
- New --lewi-mpi-calls value: none
- New MPI runtime version check during initialization
- Experimental meson build files
- Add better support for getting/setting process mask from own process
Changed
- Enable --barrier by default
- Rename --lewi-mpi to its opposite, --lewi-keep-one-cpu, and change the default behavior
- Rename --talp-summary=app to --talp-summary=pop-metrics and make it the default value
- Rename TALP to Tracking Application Live Performance
- CPU priority now follows a better topology order
- Properly clean up shared memory during initialization
Fixed
- Fixed several TALP issues
- Improved support for OMPT
- Proper shared memory clean up if running under dlb_run
Deprecated
- Python viewer scripts are no longer installed
Version 3.0 (2021-01-15)
Added
- New TALP module: Tracking Application Low-level Performance
- TALP Monitoring Regions for user-defined regions
- Allow processes to attach to / detach from the Barrier module
- Improve the verbose messages for some modules
- Allow partial instrumentation of some events
- Man pages for DLB commands
Changed
- DLB library now always prints to stderr, DLB binaries may still use stdout
- Dropped support for the old binary mask format 1000b in favor of 0b0001
- DLB_ARGS variable now takes precedence over DLB_Init argument
Fixed
- Some callbacks not being invoked when the action involved some successful actions and some others not allowed
- Several DROM inconsistencies
- Several minor fixes
- Minor documentation fixes
Version 2.1 (2019-07-15)
Added
- OMPT full support
- Add function to API: DLB_UnsetMaxParallelism
- New binary dlb_run to preinitialize masks, needed for OMPT applications
- Improved dlb_shm --list output
- New verbose option affinity to print hardware information
- Add option --quiet to silence all info and warning messages
- New test mechanism based on LIT
Changed
- DLB_Lend no longer keeps the current CPU
- PreInit service now handles the timeout if the synchronous flag is provided
- DROM now accepts registering and setting empty masks
- Verbose options now affect all library versions
Fixed
- Added some mechanisms to clean shared memories when the program aborts
- Fixed several race conditions with the asynchronous messages
- Fixed an issue where --lewi-affinity was being ignored
- Adapt Fortran headers to be fixed-form compatible
Version 2.0 (2017-12-21)
Added
- Callback system
- API for subprocesses
- New interaction mode choice (synchronous as it was before, or asynchronous)
- Doxygen HTML and man pages documentation
- Test coverage support
- OMPT experimental support
Changed
- API reworked
- A petition for a CPU now registers the process into a CPU petition queue. If this CPU becomes available, DLB can schedule which process will acquire it
- A petition for an unspecified CPU registers the process into a global petition queue; this queue has lower priority than the CPU queue
- DLB Acquire no longer schedules because it forces the acquisition; DLB Borrow does schedule, but only looks for idle CPUs and never creates a CPU request
- DLB options print format reworked; DLB_ARGS is now used to pass options to DLB
- Fortran Interface is now a Fortran include with ISO C bindings
- Shared Memory synchronization mechanism is now managed using a pthread spinlock
- DROM services now use the same shmem handler, so there is no need to call DLB_Init and DLB_DROM_Init
- DROM services Get/Set process mask are now asynchronous functions
Fixed
- Test suite now follows a Unit Testing approach and does not need undocumented external tools
- Several minor bugs
- PreInit service now correctly handles environment variables and allows forks
Deprecated
- DLB_MASK is no longer used; only registered CPUs may be used by other processes
- DLB_DROM_GetCpus service has been removed
- LB_OPTION type of environment variables is deprecated
Version 1.3 (2017-07-12)
Added
- Shared Memory consistency checks upon registration
- The binary dlb_taskset can now be used to launch processes under DLB
- Service DLB_Barrier to perform a barrier between all processes in the node
- DLB Services now return an error code
- New User Option LB_PRIORITY to manage how CPUs are distributed according to HW locality
- Add services for DROM to PreInit and PostFinalize to manage processes that will fork/exec
- Add services to check if the process mask needs to be changed
Changed
- Refactor policy-based Shared Memory into two general-purpose ones: cpuinfo and procinfo
- DROM interface is now considered stable. External processes can manage the CPU ownership of DLB processes
- DLB options are now parsed through the environment variable DLB_ARGS
Fixed
- Several minor bugs
Deprecated
- Signal handler feature is no longer supported
Version 1.2 (2016-12-23)
Changed
- Improved User Guide
- Improved configure script and macros
Fixed
- Compatibility with gcc6
- Compatibility with icc17
- Compatibility with C11 standard
- Several minor bugs
Version 1.1 (2016-02-04)
Added
- Policy Autonomous Lewi Mask to micro-manage each thread individually
- DLB services to init/finalize without MPI support
- DLB services to enable/disable or mark a serial/parallel sections of code
- DLB service to modify User Options during program execution
- New library independent from MPI
- Binary dlb_shm to manage Shared Memory from CLI
- Binary dlb_taskset to manage processes’ mask from CLI
- BG/Q support
- Add LB_MASK environment variable to specify a DLB_MASK common to all processes
- Experimental implementation of DLB_Stats interface
- Experimental implementation of DLB_DROM interface
- Signal handler feature to clean up DLB on termination
- In-code Doxygen documentation
- User guide using Sphinx generator
- Distributable examples
Changed
- Replace IPC with POSIX Shared Memory
- Replace Shared Memory synchronization mechanism from semaphores to pthread mutexes
- Shared Memory creation is now decentralized from rank 0 and it will be created when necessary
- Shared Memory internal structure reworked to host information about ownership and guesting of CPUs
- CPUs from DLB_MASK can guest any process, but they will not have an owner
- Improve support for detection of Shared Memory runtimes
Fixed
- Compatibility with gcc 4.4 and 4.5
- Compatibility with MPI-3 standard
- Fortran interfaces now use the correct types
- Several minor bugs
Version 1.0 (2013-12-11)
Added
- DLB Interface
- MPI Interception Interface
- User options configured by environment variables LB_*
- Lend When Idle algorithm implementation
- DPD
- Extrae instrumentation
- Paraver configs for DLB events
- Inter-Process Communication (IPC) Shared Memory
- Inter-Process Communication (IPC) Sockets
- Thread binding interface
- LeWI policies to support SMPSS and OpenMP
- LeWI mask policy to support OmpSs
- RaL and PERaL policies for redistribution of resources
- HWLOC optional dependency to query HW info
- Scheduling decisions based on HW locality
- Binary dlb