2.5.3. How to execute hybrid (MPI+OmpSs) programs

This section explains how to compile and execute an MPI source code annotated with OmpSs directives, and describes the additional steps needed to obtain a Paraver trace file.

2.5.3.1. Compilation

The idea is to compile and link the source code with the Mercurium compiler, adding the libraries, include paths and flags that the MPI compiler wrappers normally use. MPI compiler wrappers usually offer an option that prints this information. For example, the Open MPI wrapper offers the flags -showme:compile and -showme:link (others have -compile-info and -link-info; consult your specific compiler documentation). To compile your application you should do something like:

$ mcc --ompss $(mpicc -showme:compile) -c hello.c
$ mcc --ompss $(mpicc -showme:link) hello.o -o hello

If you need instrumentation, remember to include the --instrumentation flag.
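The two build steps can be wrapped in a small helper. The following is only a sketch: build_cmds is a hypothetical function name, it assumes an Open MPI-style wrapper that understands -showme:compile and -showme:link, and it assumes that --instrumentation, when requested, is passed to both the compile and the link step. It prints the commands instead of running them, so they can be inspected first.

```shell
#!/bin/sh
# build_cmds SOURCE [EXTRA_FLAG]
# Hypothetical helper: prints the two mcc commands for a hybrid MPI+OmpSs C source.
# Assumption: the MPI wrapper is Open MPI's mpicc (-showme:compile / -showme:link).
build_cmds() {
    src=$1                 # for example hello.c
    extra=$2               # for example --instrumentation, or empty
    base=${src%.c}         # strip the .c suffix to name the object and the binary
    # The format strings are single-quoted, so $(mpicc ...) is printed literally
    # and is only expanded later by the shell that executes the printed lines.
    printf 'mcc --ompss %s$(mpicc -showme:compile) -c %s\n' "${extra:+$extra }" "$src"
    printf 'mcc --ompss %s$(mpicc -showme:link) %s.o -o %s\n' "${extra:+$extra }" "$base" "$base"
}

# Show the commands for an instrumented build of hello.c
build_cmds hello.c --instrumentation
```

Once the printed commands look right, they can be executed by piping them to a shell, for example build_cmds hello.c | sh.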

2.5.3.2. Execution

Use the usual commands to execute an MPI application (consult your specific documentation). Regarding OmpSs, the only requirement is that you export all the needed environment variables on each MPI node. For example, to execute the previous binary on 2 MPI nodes with 3 threads per node using the SLURM queue manager:

#@ job_name = hello
#@ total_tasks = 2
#@ wall_clock_limit = 00:20:00
#@ tasks_per_node = 1
#@ cpus_per_task = 3

export NX_ARGS="--pes 3"
srun ./hello

When running more than one process per node, it is recommended to double-check the CPUs used by each OmpSs process. Job schedulers usually place MPI processes into different CPU sets so that they don’t overlap, but mpirun does not do so by default. You can see which CPUs each process is using by adding the --summary flag to the NX_ARGS environment variable:

$ NX_ARGS="--summary" mpirun -n 4 --cpus-per-proc 4 ./a.out 2>&1 | grep CPUs
MSG: [?] === Active CPUs:    [ 4, 5, 6, 7, ]
MSG: [?] === Active CPUs:    [ 8, 9, 10, 11, ]
MSG: [?] === Active CPUs:    [ 12, 13, 14, 15, ]
MSG: [?] === Active CPUs:    [ 0, 1, 2, 3, ]
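With many processes it is easy to miss an overlap by eye. The following sketch filters the summary output and prints every CPU id that appears in more than one "Active CPUs" line. The function name find_cpu_overlap is hypothetical, and the parsing assumes the line format shown above.

```shell
#!/bin/sh
# find_cpu_overlap: reads --summary output on stdin and prints each CPU id
# that is listed by more than one process. Prints nothing if there is no overlap.
# Steps: keep the "Active CPUs" lines, keep only the text between the last
# pair of brackets, put one CPU id per line, then report duplicated ids.
find_cpu_overlap() {
    grep 'Active CPUs' |
        sed 's/.*\[\(.*\)\].*/\1/' |
        tr -cs '0-9' '\n' |
        grep -v '^$' |
        sort -n |
        uniq -d
}

# Demonstration with two made-up summary lines that share CPUs 2 and 3.
printf '%s\n' \
    'MSG: [?] === Active CPUs:    [ 0, 1, 2, 3, ]' \
    'MSG: [?] === Active CPUs:    [ 2, 3, 4, 5, ]' | find_cpu_overlap
```

The demonstration prints 2 and 3, one per line. In a real run the function would be placed at the end of the pipeline in place of grep CPUs; empty output then means that no CPU is shared.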

2.5.3.3. Instrumentation

The steps required to instrument both levels of parallelism are very similar to the process described in the Extrae section. The only difference is that we have to preload the Extrae MPI library to intercept the MPI events.

When an MPI application is launched through an MPI driver (mpirun) or a job scheduler specific command (srun), it is not always clear whether the environment variables are exported to every process. You may also need to define a variable after the MPI launcher has already done its setup but just before your binary starts. For this reason we consider it good practice to set the environment variables in a shell script that each spawned process runs. For instance, we create a script file trace.sh, with execution permission, that contains:

#!/bin/bash
### Other options ###
export NX_ARGS= ...
#####################
export NX_INSTRUMENTATION=extrae
export EXTRAE_CONFIG_FILE=extrae.xml
export LD_PRELOAD=$EXTRAE_HOME/lib/libnanosmpitrace.so  # or libnanosmpitracef.so for Fortran
"$@"  # run the wrapped command; unlike $*, this keeps arguments with spaces intact

We then run our MPI application like this:

$ mpirun --some-flags ./trace.sh ./hello_instr
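A variant of trace.sh could choose the preload library by language and stop early if the library cannot be read. This is only a sketch: the function extrae_preload_lib and the TRACE_LANG variable are hypothetical names introduced for illustration, and EXTRAE_HOME is assumed to point at the Extrae installation as in the script above.

```shell
#!/bin/sh
# extrae_preload_lib LANGUAGE
# Hypothetical helper: prints the Extrae MPI library to preload.
# "fortran" selects libnanosmpitracef.so; anything else selects libnanosmpitrace.so.
extrae_preload_lib() {
    case $1 in
        fortran) printf '%s\n' "$EXTRAE_HOME/lib/libnanosmpitracef.so" ;;
        *)       printf '%s\n' "$EXTRAE_HOME/lib/libnanosmpitrace.so" ;;
    esac
}

# Act as a launcher only when a command was passed, e.g. ./trace.sh ./hello_instr
if [ "$#" -gt 0 ]; then
    # TRACE_LANG is a made-up variable for this sketch; it defaults to C.
    lib=$(extrae_preload_lib "${TRACE_LANG:-c}")
    # Stop here rather than start the application without the MPI tracing library.
    [ -r "$lib" ] || { echo "trace.sh: cannot read $lib" >&2; exit 1; }
    export NX_INSTRUMENTATION=extrae
    export EXTRAE_CONFIG_FILE=extrae.xml
    export LD_PRELOAD=$lib
    # exec replaces the script with the application, keeping its arguments intact.
    exec "$@"
fi
```

It would be launched in the same way as the original script; a Fortran binary would set TRACE_LANG=fortran in the environment before calling mpirun.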

If extrae.xml was configured as recommended in the Extrae section, the hybrid MPI+OmpSs Paraver trace is merged automatically when the execution ends. Otherwise, run:

$ mpi2prv -f TRACE.mpits