3.3.3. Instrumentation modules

The instrumentation flag selects which of the available instrumentation plug-ins to use. An instrumentation plug-in generates output that describes one or more aspects of the actual execution. The plug-in is in charge of translating internal Nanos++ events into the desired output: it may simply print each event description on the screen as soon as it occurs, generate a trace file that can be processed by another software component, build a dependence graph (taking into account only dependence events), or call an external service when a given type of event occurs. In general, events take place inside the runtime, but they can also be generated by application programmers or introduced by the compiler (e.g. outlined functions created by the Mercurium compiler).

The instrumentation plug-in can be specified by the NX_INSTRUMENTATION environment variable or the --instrumentation option included in the NX_ARGS environment variable:

$ export NX_INSTRUMENTATION=<module>
$ ./my_program

$ export NX_ARGS="--instrumentation[ |=]<module> ..."
$ ./my_program
--instrumentation=<module>
 It sets the instrumentation backend to be used during the execution. Module can be one of the following options: empty, print_trace, extrae, graph, ayudame or tasksim tdg.

By default, only a subset of Nanos++ events is enabled (see the Nanos++ events section for further details). You can also use the following options to specify the set of events that will be instrumented in a given execution.

--instrument-default=<mode>
 Sets the default instrumentation event set. mode can be none, which disables all events, or all, which enables all events. Alternatively, mode can select one of the instrumentation profiles: advanced, developer or user. The Interval/Punctual events table shows which events are generated at each of these levels. (Default value is user).
--instrument-enable=<event-type>
 Adds events to the current instrumentation event list. event-type can be one of the following options: state, enabling all the state events; ptp, enabling all the point-to-point events; or a specific event-prefix, enabling those interval and/or punctual events whose name matches event-prefix. Several instrument-enable options can be combined in the same execution environment (e.g. NX_ARGS="... --instrument-enable=state --instrument-enable=for").
--instrument-disable=<event-type>
 Removes events from the current instrumentation event list. event-type can be one of the following options: state, disabling all the state events; ptp, disabling all the point-to-point events; or a specific event-prefix, disabling those interval and/or punctual events whose name matches event-prefix. Several instrument-disable options can be combined in the same execution environment (e.g. NX_ARGS="... --instrument-disable=state --instrument-disable=for").
--instrument-cpuid, --no-instrument-cpuid
 Add cpuid event when binding is disabled (expensive).
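
As an illustration of how these options combine, the following sketch assembles an NX_ARGS value from a plug-in name, a default profile, and lists of event types to enable and disable. The helper nx_instrument_args is hypothetical (it is not part of Nanos++), my_program stands for any application built against the runtime, and the order in which the options are emitted is an assumption.

```shell
#!/bin/sh
# Sketch only: nx_instrument_args is a hypothetical helper, not a Nanos++ tool.
# It prints an NX_ARGS string built from the options described above.
nx_instrument_args() {
    module=$1      # instrumentation plug-in, e.g. extrae or print_trace
    profile=$2     # none, all, advanced, developer or user
    enable=$3      # space-separated event types to add
    disable=$4     # space-separated event types to remove

    args="--instrumentation=$module --instrument-default=$profile"
    for ev in $enable; do
        args="$args --instrument-enable=$ev"
    done
    for ev in $disable; do
        args="$args --instrument-disable=$ev"
    done
    printf '%s\n' "$args"
}

# Start from the user profile, add point-to-point events, drop state events.
NX_ARGS=$(nx_instrument_args extrae user "ptp" "state")
export NX_ARGS
echo "NX_ARGS=$NX_ARGS"

# ./my_program    # run the application with the settings above
```

Keeping the option string in one place makes it easier to switch profiles between runs without retyping every --instrument-enable or --instrument-disable entry.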

Nanos++ implements several instrumentation plug-ins. The following sections describe each of the available plug-ins.

3.3.3.2. Nanos++ events

3.3.3.2.1. State events

In the Nanos++ programming model, a thread can represent either a POSIX thread or a runtime Work Descriptor, depending on the instrumentation model in use.

## Event Description
00 NANOS_NOT_CREATED Thread has not been created yet.
01 NANOS_NOT_RUNNING Thread is executing nothing (e.g. a Work Descriptor which is not in execution or sub state when state is enabled).
02 NANOS_STARTUP Thread is executing runtime start up.
03 NANOS_SHUTDOWN Thread is executing runtime shut down.
04 NANOS_ERROR Error while instrumenting.
05 NANOS_IDLE Thread is on idle loop.
06 NANOS_RUNTIME Thread is executing generic runtime code (not defined in other state).
07 NANOS_RUNNING Thread is executing user code: the main thread executing main(), or any thread executing an outlined user function. This state is directly related to the “user-code” event, which registers the Work Descriptor Id.
08 NANOS_SYNCHRONIZATION Thread is executing a synchronization mechanism: team barrier, task wait, synchronized condition (wait and signal), wait on dependencies, set/unset/try locks (through Nanos API), acquiring a single region, waitOnCondition idle loop and waking up a blocked Work Descriptor.
09 NANOS_SCHEDULING Thread is calling a scheduler policy method, submitting a new task, or a Work Descriptor is yielding the thread to a new one.
10 NANOS_CREATION Thread is creating a new Work Descriptor or setting translate function.
11 NANOS_MEM_TRANSFER_IN Thread is copying data to cache.
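
When a tool reports states by number, a small lookup can make the output easier to read. The sketch below maps the values in the ## column of the table above to their names. The helper nanos_state_name is hypothetical, and whether a given plug-in emits exactly these numeric values is an assumption to check against its output.

```shell
#!/bin/sh
# Sketch only: nanos_state_name is a hypothetical helper, not part of Nanos++.
# It translates a state number from the table above into its name,
# accepting both zero-padded ("07") and plain ("7") forms.
nanos_state_name() {
    case "$1" in
        0|00) echo NANOS_NOT_CREATED ;;
        1|01) echo NANOS_NOT_RUNNING ;;
        2|02) echo NANOS_STARTUP ;;
        3|03) echo NANOS_SHUTDOWN ;;
        4|04) echo NANOS_ERROR ;;
        5|05) echo NANOS_IDLE ;;
        6|06) echo NANOS_RUNTIME ;;
        7|07) echo NANOS_RUNNING ;;
        8|08) echo NANOS_SYNCHRONIZATION ;;
        9|09) echo NANOS_SCHEDULING ;;
        10)   echo NANOS_CREATION ;;
        11)   echo NANOS_MEM_TRANSFER_IN ;;
        *)    echo UNKNOWN; return 1 ;;   # value not listed in the table
    esac
}

nanos_state_name 08
```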

3.3.3.2.2. Point to Point events

## Event Description
0 NANOS_WD_DOMAIN WD’s Identifier
1 NANOS_WD_DEPENDENCY Data Dependency
2 NANOS_WAIT Hierarchical Dependency (e.g. “omp taskwait”)
3 NANOS_WD_REMOTE Remote Workdescriptor Execution (among nodes)
4 NANOS_XFER_PUT Transfer PUT (Data Send)
5 NANOS_XFER_GET Transfer GET (Data Request)

Only in cluster branch:

## Event Description
6 NANOS_AM_WORK Sending WDs (master -> slave)
7 NANOS_AM_WORK_DONE WD is done (slave -> master)

3.3.3.2.3. Interval/Punctual events

### Event Description ADV DEV USR
001 api Nanos Runtime API (interval) X X  
002 wd-id Work Descriptor id (interval) X X  
003 cache-copy-in Transfer data into device cache (interval)      
004 cache-copy-out Transfer data to main memory (interval)      
005 cache-local-copy Local copy in device memory (interval)      
006 cache-malloc Memory allocation in device cache (interval)      
007 cache-free Memory free in device cache (interval)      
008 cache-hit Hit in the cache (interval)      
009 copy-in Copying WD inputs (interval)      
010 copy-out Copying WD outputs (interval)      
011 user-funct-name User Function Name (interval) X X X
012 user-code User Code (wd) (interval)      
013 create-wd-id Create WD Id: (punctual)      
014 create-wd-ptr Create WD pointer: (punctual) X    
015 wd-num-deps Create WD num. deps. (punctual) X    
016 wd-deps-ptr Create WD dependence pointer (punctual) X    
017 lock-addr Lock address (interval) X    
018 num-spins Number of Spins (punctual) X X  
019 num-yields Number of Yields (punctual) X X  
020 time-yields Time on Yield (in nsecs) (punctual) X X  
021 user-funct-location User Function Location (interval) X X X
022 num-ready Number of ready tasks in the queues (punctual) X X X
023 graph-size Number of tasks in the graph (punctual) X X X
024 loop-lower Loop lower bound (punctual) X X  
025 loop-upper Loop upper bound (punctual) X X
026 loop-step Loop step (punctual) X X  
027 in-cuda-runtime Inside CUDA runtime (interval) X X  
028 xfer-size Transfer size (punctual) X X  
029 cache-wait Cache waiting for something (interval)      
030 chunk-size Chunk size (punctual) X X  
031 num-sleeps Number of sleeps (punctual) X X  
032 time-sleeps Time on Sleep (in nsecs) (punctual) X X  
033 num-scheds Number of scheduler operations (punctual) X X  
034 time-scheds Time in scheduler operations (in nsecs) (punctual) X X  
035 sched-versioning Versioning scheduler decisions (punctual) X    
036 dependence Dependence analysis: the system has found a new dependence (punctual) X X
037 dep-direction Dependence direction (punctual) X X  
038 wd-priority WD’s priority (punctual) X X  
039 in-opencl-runtime In OpenCL runtime (interval) X X  
040 taskwait Taskwait (interval) X X X
041 set-num-threads Set/change number of threads (punctual) X X X
042 cpuid CPU id (punctual), disabled by default X X X
043 dep-address Dependence address (punctual) X X  
044 copy-data-in Work descriptor id that is copying data in X X  
045 cache-copy-data-in Work descriptor id that is copying data in X X
046 cache-copy-data-out Work descriptor id that is copying data out X X  
047 sched-affinity-const Constraint used in affinity scheduler X X  
048 in-mpi-runtime Inside MPI runtime X X  
049 wd-ready Workdescriptor becomes ready X    
050 wd-blocked Workdescriptor becomes blocked X    
051 parallel-outline-fct Parallel outline function X    
052 async-thread Asynchronous thread state events X X  
053 copy-in-gpu Asynchronous memory copy from host to device      
054 copy-out-gpu Asynchronous memory copy from device to host      
055 gpu-wd-id GPU’s work descriptor id X X  
056 wd-criticality Work descriptor criticality X X  
057 blev-overheads Total overheads of botlev scheduler X X  
058 blev-overheads-break Overheads of botlev scheduler X X  
059 critical-wd-id A critical work descriptor is submitted X X  
060 copy-dir-devices Asynchronous memory copy between host and device X X X
061 concurrent-task Number of concurrent tasks in the ready queue X X
062 network-transfer Network transfer to node X X  
064 thread-numa-node NUMA node of the worker thread X X  
065 wd-numa-node NUMA node assigned to the work descriptor X X  
066 steal If the work descriptor to be executed is the result of a steal operation X
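
Because --instrument-enable and --instrument-disable accept an event-prefix, it can help to preview which event names a prefix would select before running the application. The sketch below does this for a subset of the names in the table above. The helper match_prefix is hypothetical, and it assumes that Nanos++ treats event-prefix as a plain "starts with" test, which should be verified against the runtime's behaviour.

```shell
#!/bin/sh
# Subset of event names copied from the Interval/Punctual events table.
events="cache-copy-in cache-copy-out cache-local-copy cache-malloc cache-free cache-hit cache-wait copy-in copy-out"

# Sketch only: match_prefix is a hypothetical helper, not part of Nanos++.
# It prints every name in $events that starts with the given prefix.
# Assumption: event-prefix matching is a simple "starts with" comparison.
match_prefix() {
    for name in $events; do
        case "$name" in
            "$1"*) printf '%s\n' "$name" ;;
        esac
    done
}

# Preview what --instrument-enable=cache might select.
match_prefix cache
```

Under that assumption, the prefix cache selects the seven cache-* events in the list, while the prefix copy selects only copy-in and copy-out.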