Executing and controlling the number of CPUs
Nanos6 applications can be compiled and executed in this way:
# Compile OmpSs-2 program with LLVM/Clang
$ clang -fompss-2 app.c -o app
# Execute on all available cores of the current session
$ ./app
The number of cores used is controlled by running the application through the taskset command. For instance:
# Execute on cores 0, 1, 2 and 4
$ taskset -c 0-2,4 ./app
Runtime configuration options
The behaviour of the Nanos6 runtime can be tuned after compilation by means of a configuration file.
All former Nanos6 environment variables are obsolete and will be ignored by the runtime system.
Currently, the supported configuration file format is TOML v1.0.0-rc1.
The default configuration file is named nanos6.toml and can be found in the share directory of the Nanos6 installation:
$INSTALLATION_PREFIX/share
Note
The nanos6.toml file was recently moved from $INSTALLATION_PREFIX/share/doc/nanos6/scripts to the $INSTALLATION_PREFIX/share directory.
To override the default configuration, it is recommended to copy the default file and change the relevant options. The runtime uses the first configuration file found, in the following order:
1. The file pointed to by the NANOS6_CONFIG environment variable.
2. The file nanos6.toml found in the current working directory.
3. The file nanos6.toml found in the installation path (default file).
The configuration file is organized into different sections and subsections.
Option names have the format <section>[.<subsection>].<option_name>. For instance, the dependency system that the runtime should load is specified by the version.dependencies option.
Inside the TOML configuration file, that option is placed inside the version section as follows:
[version]
dependencies = "discrete"
Alternatively, if the configuration has to be changed programmatically and creating new files is not practical, the configuration options can be overridden using the NANOS6_CONFIG_OVERRIDE environment variable.
The content of this variable has to be in the format option1=value1,option2=value2,option3=value3,..., that is, a comma-separated list of assignments.
For example, the following command selects the discrete dependency implementation and enables CTF instrumentation:
$ NANOS6_CONFIG_OVERRIDE="version.dependencies=discrete,version.instrument=ctf" ./ompss-program
By default, the runtime system emits a warning during initialization when it detects environment variables that start with the NANOS6 prefix but are not relevant to it.
The exceptions are the two variables described above, NANOS6_CONFIG and NANOS6_CONFIG_OVERRIDE, as well as the NANOS6_HOME variable.
The latter is not relevant for the Nanos6 runtime system, but it is often defined by users and is relevant for the OmpSs-2 LLVM-based compiler.
The warning can be disabled by setting the loader.warn_envars configuration option to false.
Runtime variants
There are several Nanos6 runtime variants, each one focusing on different aspects of parallel executions: performance, debugging, instrumentation, etc.
OmpSs-2 applications do not need to be recompiled to run with instrumentation, e.g., to extract Extrae traces or to generate additional information.
This is instead controlled through configuration options at run time.
Users can select a specific Nanos6 variant when running an application by setting the version.instrument, version.debug and version.dependencies configuration variables.
This section explains the different values for these variables.
The instrumentation is specified by the version.instrument configuration variable and can take the following values:
version.instrument = "none"
This is the default value and does not enable any kind of instrumentation. This is the variant that should be used when executing performance experiments, since it adds no instrumentation overhead. Continue reading this section for more information about performance runs with Nanos6.
version.instrument = "ovni"
Instrumented with ovni to generate Paraver traces. See: Generating ovni traces.
version.instrument = "extrae"
Instrumented to produce Paraver traces. See: Generating Extrae traces.
version.instrument = "ctf"
Instrumented to produce CTF traces and convert them to the Paraver format. See: Generating CTF traces.
version.instrument = "verbose"
Instrumented to emit a log of the execution. See: Verbose instrumentation.
version.instrument = "lint"
Instrumented to support the OmpSs-2@Linter tool.
Note
The graph and stats instrumentations have recently been removed and are no longer available.
By default, Nanos6 loads runtime variants that are compiled with high optimization flags and with most internal assertions turned off.
This is the configuration that should be used, along with the default none instrumentation, for benchmarking experiments.
However, these are not the only options that matter for obtaining the best performance.
See Benchmarking for more information.
Sometimes it is useful to run an OmpSs-2 program with debug information for debugging purposes (e.g., when a program crashes).
The runtime system provides the version.debug option to load a runtime variant that has been compiled without optimizations and with all internal assertions turned on.
The default value of this option is false (no debug), but it can be set to true to enable debug information.
Note that enabling this option significantly decreases the performance of the runtime system.
Additionally, every instrumentation variant is available in both an optimized and a debug version.
Finally, the last configuration variable used to specify a runtime variant is version.dependencies, which is explained in the next section.
Task data dependencies
The Nanos6 runtime has support for different dependency implementations.
The discrete dependencies are the default dependency implementation.
This is the most optimized implementation, but it does not fully support the OmpSs-2 dependency model since it does not support region dependencies.
If the user program requires region dependencies (e.g., to detect dependencies among partially overlapping dependency regions), Nanos6 provides the regions implementation, which is fully spec-compliant.
The latter is also the only implementation that supports OmpSs-2@Cluster.
The dependency implementation can be selected at run time through the version.dependencies configuration variable. The available implementations are:
version.dependencies = "discrete"
Default and optimized implementation that does not support region dependencies. Region syntax is accepted, but each region behaves as a discrete dependency on its first address. It scales better than the regions implementation thanks to its simpler logic, and is functionally similar to the traditional OpenMP model.
version.dependencies = "regions"
Supports all dependency features. Default implementation in OmpSs-2@Cluster installations.
In cases where an OmpSs-2 program requires region dependency support, we recommend adding the declarative assert directive to any of the program source files, as shown below.
Before the program starts, the runtime system then checks whether the loaded dependency implementation is regions and aborts the execution if it is not.
#pragma oss assert("version.dependencies==regions")
int main() {
// ...
}
Notice that the assert directive can also be used to check whether the runtime is using discrete dependencies.
Task scheduler
The scheduling infrastructure provides the following configuration options to modify the behavior of the task scheduler:
scheduler.policy (default: fifo)
Specifies whether ready tasks are added to the ready queue using a FIFO (fifo) or a LIFO (lifo) policy.
scheduler.immediate_successor (default: 0.75)
Probability of enabling the immediate successor feature to improve cache data reuse between successor tasks. If enabled, when a CPU finishes a task, it starts executing the successor task (computed through their data dependencies).
scheduler.priority (default: true)
Boolean indicating whether the scheduler should consider the task priorities defined by the user in the task’s priority clause.
Task worksharings
Important
Task worksharings, which were implemented by the task for clause, are no longer part of the OmpSs-2 specification.
Stack size
By default, Nanos6 allocates stacks of 8 MB for its worker threads. In some codes this may not be enough. For instance, when converting Fortran codes, some global variables may need to be converted into local variables. This may substantially increase the amount of stack required to run the code and may exceed the space that is available.
To solve that problem, the stack size can be set through the misc.stack_size configuration variable.
Its value is expressed in bytes, but it also accepts the K, M, G, T and E suffixes, which are interpreted as power-of-2 multipliers.
For instance:
[misc]
stack_size = "16M"