2.5.3. How to execute hybrid (MPI+OmpSs) programs
You have an MPI source code annotated with OmpSs directives and you wonder how to compile and execute it. Some additional steps are also needed to obtain a Paraver trace file.
2.5.3.1. Compilation
The idea here is to compile and link the source code with the Mercurium compiler, adding any libraries, include paths and flags that the MPI compilers normally use. MPI compilers usually offer an option to obtain all this information. For example, the OpenMPI compiler wrappers offer the flags -showme:compile and -showme:link (others have -compile-info and -link-info; consult your specific compiler documentation). To compile your application you should do something like:
$ mcc --ompss $(mpicc -showme:compile) -c hello.c
$ mcc --ompss $(mpicc -showme:link) hello.o -o hello
If you need instrumentation, just remember to include the --instrumentation flag.
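For instance, an instrumented build of the previous example could look like this (a sketch based on the commands above; we name the binary hello_instr, as used later in this section):

$ mcc --ompss --instrumentation $(mpicc -showme:compile) -c hello.c
$ mcc --ompss --instrumentation $(mpicc -showme:link) hello.o -o hello_instr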
2.5.3.2. Execution
Just use the usual commands to execute an MPI application (consult your specific documentation). Regarding OmpSs, the only requirement is that you export all the needed environment variables on each MPI node. For example, to execute the previous binary with 2 MPI nodes and 3 threads per node using the SLURM queue manager:
#@ job_name = hello
#@ total_tasks = 2
#@ wall_clock_limit = 00:20:00
#@ tasks_per_node = 1
#@ cpus_per_task = 3
export NX_ARGS="--pes 3"
srun ./hello
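If you launch the application directly with mpirun instead of through the batch system, the launcher must still propagate the Nanos++ variables to every process. A possible sketch with OpenMPI (its -x option forwards an environment variable to the spawned processes; other MPI implementations provide their own mechanism):

$ mpirun -n 2 -x NX_ARGS="--pes 3" ./hello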
When running more than one process per node, it is recommended to double-check which CPUs each OmpSs process is using. Job schedulers usually place MPI processes into disjoint CPU sets so they do not overlap, but mpirun does not do this by default. You can see which CPUs each process is using by adding the --summary flag to the NX_ARGS environment variable:
$ NX_ARGS="--summary" mpirun -n 4 --cpus-per-proc 4 ./a.out 2>&1 | grep CPUs
MSG: [?] === Active CPUs: [ 4, 5, 6, 7, ]
MSG: [?] === Active CPUs: [ 8, 9, 10, 11, ]
MSG: [?] === Active CPUs: [ 12, 13, 14, 15, ]
MSG: [?] === Active CPUs: [ 0, 1, 2, 3, ]
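If the reported CPU sets overlap, you can ask mpirun to bind each process to its own group of cores. A possible sketch using the newer OpenMPI mapping options (equivalent in spirit to --cpus-per-proc above; check the documentation of your mpirun version for the exact syntax):

$ NX_ARGS="--summary" mpirun -n 4 --map-by slot:PE=4 --bind-to core ./a.out 2>&1 | grep CPUs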
2.5.3.3. Instrumentation
The steps required to instrument both levels of parallelism are very similar to the process described in the Extrae section. The only difference is that we also have to preload the Extrae MPI library to intercept the MPI events.
When an MPI application is launched through an MPI driver (mpirun) or a job-scheduler-specific command (srun), it is not always clear whether the environment variables are exported; or you may need to define a variable after the MPI launcher has already done its setup but just before your binary starts. This is why we consider it good practice to define the environment variables we wish to set in a shell script file, which each spawned process will run. For instance, we create a script file trace.sh with execution permission that contains:
#!/bin/bash
### Other options ###
export NX_ARGS= ...
#####################
export NX_INSTRUMENTATION=extrae
export EXTRAE_CONFIG_FILE=extrae.xml
export LD_PRELOAD=$EXTRAE_HOME/lib/libnanosmpitrace.so # or libnanosmpitracef.so for Fortran
"$@"  # run the application binary passed as argument, preserving any extra arguments
We then run our MPI application like this:
$ mpirun --some-flags ./trace.sh ./hello_instr
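The same wrapper also works with a job-scheduler launcher; for example, with SLURM something like the following should behave equivalently (a sketch, the exact flags depend on your site configuration):

$ srun ./trace.sh ./hello_instr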
If the extrae.xml file was configured as recommended in the Extrae section, a hybrid MPI+OmpSs Paraver trace will be merged automatically after the execution ends. Otherwise, just run:
$ mpi2prv -f TRACE.mpits
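The merger also accepts a name for the resulting Paraver trace; for example, something like the following (the -o option selects the output file name; check the options of your Extrae version's mpi2prv if unsure):

$ mpi2prv -f TRACE.mpits -o hello_instr.prv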