3.10.1. Benchmarking
Nanos6 loads an optimized runtime variant when the version.debug option is disabled, which is the default.
The optimized runtime variants are compiled with high optimization flags, do not perform validity checks, and do not provide debug information.
To get optimal performance, benchmark OmpSs-2 applications with an optimized version (version.debug=false) and no instrumentation (version.instrument=none).
Additionally, Nanos6 offers an extra performance option called Turbo, which enables even more aggressive optimizations.
This option is disabled by default, but it can be enabled by setting turbo.enabled=true.
Currently, this option enables two Intel® floating-point (FP) unit optimizations in all tasks: flush-to-zero (FZ) and denormals are zero (DAZ).
However, please note that these FP optimizations treat denormal (subnormal) values as zero, so they could alter the precision of floating-point computations.
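To make the precision caveat concrete, the following minimal sketch emulates the effect of flushing subnormal values to zero in pure Python. It only illustrates the numerical behaviour; it is not how Nanos6 or the hardware implements FZ/DAZ, and the helper names (`flush_to_zero`, `halve_repeatedly`) are hypothetical.

```python
import math
import sys

# Smallest positive normal double; any nonzero value smaller in magnitude is subnormal.
MIN_NORMAL = sys.float_info.min


def flush_to_zero(x: float) -> float:
    """Emulate FZ/DAZ for one value: a subnormal becomes a zero with the same sign."""
    if x != 0.0 and abs(x) < MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def halve_repeatedly(x: float, steps: int, flush: bool) -> float:
    """Halve x `steps` times, optionally flushing each intermediate result."""
    for _ in range(steps):
        x = x / 2.0
        if flush:
            x = flush_to_zero(x)
    return x


if __name__ == "__main__":
    ieee = halve_repeatedly(MIN_NORMAL, 4, flush=False)    # stays a tiny nonzero subnormal
    flushed = halve_repeatedly(MIN_NORMAL, 4, flush=True)  # collapses to zero
    print(ieee > 0.0, flushed == 0.0)
```

With gradual underflow the value remains nonzero, whereas the flushed computation loses it entirely. Code that divides by, or compares against, such tiny values may therefore behave differently when Turbo is enabled.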
To obtain the best performance, we recommend combining an optimized version without instrumentation (version.debug=false and version.instrument=none) with the Turbo feature (turbo.enabled=true), the discrete dependency implementation (version.dependencies=discrete), and the Jemalloc memory allocator.
Jemalloc must be enabled at configuration time by passing --with-jemalloc to the Nanos6 configure script.
See the Nanos6 build requirements for what is needed to support jemalloc.
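The recommended runtime options above can be collected in one place by a small benchmark launcher. The sketch below assumes that Nanos6 reads comma-separated `option=value` overrides from a `NANOS6_CONFIG_OVERRIDE` environment variable; that variable is not mentioned in this section, so check it against the Nanos6 configuration documentation for your version. The helper names are hypothetical, and Jemalloc is not included because it is chosen at build time rather than at run time.

```python
import os

# Option names and values as recommended in the text above.
RECOMMENDED = {
    "version.debug": "false",
    "version.instrument": "none",
    "version.dependencies": "discrete",
    "turbo.enabled": "true",
}


def build_override(options: dict) -> str:
    """Join options into a comma-separated list of option=value pairs."""
    return ",".join(f"{name}={value}" for name, value in options.items())


def benchmark_env(options: dict = RECOMMENDED, base: dict = None) -> dict:
    """Return a copy of the environment with the override variable set.

    ASSUMPTION: NANOS6_CONFIG_OVERRIDE is the variable the runtime reads.
    """
    env = dict(os.environ if base is None else base)
    env["NANOS6_CONFIG_OVERRIDE"] = build_override(options)
    return env


if __name__ == "__main__":
    print(build_override(RECOMMENDED))
```

The returned dictionary could then be passed as `env=benchmark_env()` to `subprocess.run` when launching the application under test, so that every benchmark run uses the same settings.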