3.3.3.1.2. Extrae
- Configuration String:
NX_INSTRUMENTATION=extrae
or NX_ARGS="--instrumentation=extrae"
- Description: It generates a Paraver trace using the Extrae library.
Paraver is a flexible parallel program visualization and analysis tool based on an easy-to-use Motif GUI. Paraver was developed in response to the need for a qualitative global perception of application behavior through visual inspection, followed by a detailed quantitative analysis of the problems found. Paraver provides a large amount of information that helps decide where to invest programming effort when optimizing an application.
Before getting any trace, the application must be compiled with instrumentation support. Compile your application using Mercurium with the --instrument flag:
$ mcc --ompss --instrument my_program.c -o my_program
To get a Paraver trace, execute your OmpSs program with the Extrae instrumentation plugin, which is selected through the environment variable NX_INSTRUMENTATION. This enables the Extrae plugin, which translates all internal events into Paraver events using the Extrae library. Extrae is configured entirely through an XML file whose location is given by the EXTRAE_CONFIG_FILE environment variable:
$ export NX_INSTRUMENTATION=extrae
$ export EXTRAE_CONFIG_FILE=config.xml
$ ./my_program
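The run above silently depends on both environment variables being set correctly. The following is a minimal pre-flight sketch, not part of OmpSs or Extrae: the function name check_extrae_env is hypothetical, and it only checks the two variables described above.

```shell
# Hypothetical helper (not provided by OmpSs or Extrae): verify the tracing
# environment before launching an instrumented binary.
check_extrae_env() {
    # The Extrae plugin is only loaded when this variable selects it.
    if [ "${NX_INSTRUMENTATION:-}" != "extrae" ]; then
        echo "NX_INSTRUMENTATION is not set to 'extrae'" >&2
        return 1
    fi
    # Extrae reads its settings from the XML file named by this variable.
    if [ -z "${EXTRAE_CONFIG_FILE:-}" ] || [ ! -r "$EXTRAE_CONFIG_FILE" ]; then
        echo "EXTRAE_CONFIG_FILE is unset or not readable" >&2
        return 1
    fi
    return 0
}
```

A possible use is `check_extrae_env && ./my_program`, so that the program is only started when the environment looks usable.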
Important
The Extrae library installation includes several example XML files that can serve as a starting point (see $EXTRAE_DIR/share/example). Take one of them from your Extrae installation and devote some time to tuning its parameters. The file is divided into different sections, in which you can enable or disable individual Extrae features.
3.3.3.1.2.1. XML’s merge section enabled (recommended)
If the merge section is enabled, the merge process is invoked automatically after the application run. You can enable or disable this Extrae feature by setting the enabled flag to yes or no. The value of this XML node (the text between the <merge> and </merge> tags) is used as the trace file name (my_program_trace.prv in the following example). The meaning of the other flags is described in the Extrae User’s Guide:
<merge enabled="yes"
    synchronization="default"
    binary="my_program"
    tree-fan-out="16"
    max-memory="512"
    joint-states="yes"
    keep-mpits="no"
    sort-addresses="yes"
    remove-files="yes"
>
    my_program_trace.prv
</merge>
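After a run with the merge section enabled, it can be useful to confirm that the trace was actually written. The sketch below is an assumption-laden example: trace_complete is a hypothetical helper, and it assumes the merged trace consists of the .prv file named in the XML plus the companion .pcf and .row files that normally accompany a Paraver trace.

```shell
# Hypothetical helper (not an Extrae tool): check that a merged Paraver
# trace and its companion files exist and are non-empty.
trace_complete() {
    base=${1%.prv}   # accept either "name" or "name.prv"
    for ext in prv pcf row; do
        if [ ! -s "$base.$ext" ]; then
            echo "missing or empty: $base.$ext" >&2
            return 1
        fi
    done
    return 0
}
```

For the configuration above, `./my_program && trace_complete my_program_trace.prv` would report a problem if any of the three files is absent.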
3.3.3.1.2.2. XML’s merge section disabled
If the merge section is disabled (or you are not using the XML configuration file), the Extrae library produces several files containing all the information related to the execution once the instrumented program has run. There are as many .mpit files as tasks and threads that were running in the target application. Each file contains the information gathered by the corresponding task/thread in raw binary format. There is also a single .mpits file that contains a list of the related .mpit files and, if the DynInst-based instrumentation package was used, an additional .sym file that contains symbolic information gathered by the DynInst library.
In order to use Paraver, those intermediate files (i.e., the .mpit files) must be merged and translated into the Paraver trace file format. To do so, merge all the intermediate trace files into a single trace file using one of the mergers available in the Extrae bin directory.
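The manual merge step can be sketched as follows. This is an illustration under stated assumptions, not the only way to merge: it assumes EXTRAE_DIR points at the Extrae installation, that the sequential merger in its bin directory is named mpi2prv, and that the run left an index file called TRACE.mpits. The wrapper name merge_trace is hypothetical.

```shell
# Hypothetical wrapper around the Extrae merger (assumed: $EXTRAE_DIR/bin/mpi2prv).
merge_trace() {
    mpits=${1:-TRACE.mpits}             # index of .mpit files (assumed default name)
    output=${2:-my_program_trace.prv}   # name of the resulting Paraver trace
    merger="${EXTRAE_DIR:-}/bin/mpi2prv"

    if [ ! -f "$mpits" ]; then
        echo "index file not found: $mpits" >&2
        return 1
    fi
    if [ ! -x "$merger" ]; then
        echo "merger not found: $merger" >&2
        return 1
    fi
    # -f names the .mpits index, -o names the output trace file.
    "$merger" -f "$mpits" -o "$output"
}
```

Calling `merge_trace` with no arguments in the directory where the program ran would then try to produce my_program_trace.prv from TRACE.mpits. Check the Extrae User’s Guide for the mergers and options available in your installation.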
3.3.3.1.2.3. Using Hardware Counters (PAPI)
To get Paraver trace files that include hardware counters (HWC), first verify that the Extrae library has been installed with PAPI support. Then turn on the generation of PAPI hardware counter events in the counters section of the Extrae XML configuration file. Here is an example of that section:
<counters enabled="yes">
  <cpu enabled="yes" starting-set-distribution="1">
    <set enabled="yes" domain="all" changeat-time="1s">
      PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_L1_DCM
      <sampling enabled="yes" period="100000000">PAPI_TOT_CYC</sampling>
    </set>
    <set enabled="yes" domain="all" changeat-time="1s">
      PAPI_TOT_INS,PAPI_TOT_CYC,PAPI_FP_INS
    </set>
  </cpu>
  <network enabled="no" />
  <resource-usage enabled="no" />
</counters>
Processor performance counters are configured in the <cpu> node. The user can define several sets within the <cpu> node using <set> nodes, but only one set is active at any given time. The example defines two sets. The first set reads PAPI_TOT_INS (total instructions), PAPI_TOT_CYC (total cycles) and PAPI_L1_DCM (level 1 data cache misses). The second set reads PAPI_TOT_INS (total instructions), PAPI_TOT_CYC (total cycles) and PAPI_FP_INS (floating point instructions). To change the active set, use the changeat-time attribute, which specifies the minimum time to hold the set (changeat-time="1s" in the example).
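Not every PAPI event exists on every processor, so it helps to know exactly which counters a configuration file asks for. The sketch below uses a hypothetical helper, list_papi_events, that prints each distinct PAPI_* name found in an XML file. It is a plain text search, so it also reports counters from disabled sets and from sampling nodes.

```shell
# Hypothetical helper (not an Extrae tool): list the distinct PAPI event
# names mentioned anywhere in an Extrae XML configuration file.
list_papi_events() {
    grep -o 'PAPI_[A-Z0-9_]*' "$1" | LC_ALL=C sort -u
}
```

Run on a file containing the counters section above, `list_papi_events config.xml` prints PAPI_FP_INS, PAPI_L1_DCM, PAPI_TOT_CYC and PAPI_TOT_INS, one per line. If the PAPI utilities are installed, that list can be compared by hand against the events reported by `papi_avail -a`.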
See also the list of PAPI Standard Events by Architecture and the Manuals of the BSC Performance Tools.