2.3.3.3. Instrumentation modules
The instrumentation flag selects which of the available instrumentation plug-ins to use.
An instrumentation plug-in generates output that describes one or more aspects of the actual execution.
The plug-in is in charge of translating internal Nanos++ events into the desired output: it may simply print each event description on the screen as soon as it occurs, generate a trace file that can be processed by another software component, build a dependence graph (taking into account only dependence events), or call an external service when a given type of event occurs.
In general, events take place inside the runtime, but they can also be generated by application programmers or introduced by the compiler (e.g. outlined functions created by the Mercurium compiler).
The instrumentation plug-in can be specified by the NX_INSTRUMENTATION environment variable or by the --instrumentation option included in the NX_ARGS environment variable:
$ export NX_INSTRUMENTATION=<module>
$ ./my_program
$ export NX_ARGS="--instrumentation[ |=]<module> ..."
$ ./my_program
--instrumentation=<module>
    Sets the instrumentation backend to be used during the execution. <module> can be one of the following options: empty, print_trace, extrae, graph, ayudame, tasksim or tdg.
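The two ways of selecting a plug-in described above can be sketched as follows. This is only an illustration: print_trace is taken from the module list above, the application itself is not launched here, and in practice only one of the forms would be needed.

```shell
# Form 1: dedicated environment variable.
export NX_INSTRUMENTATION=print_trace

# Form 2: option inside NX_ARGS, written with "=" ...
export NX_ARGS="--instrumentation=print_trace"
# ... or with a space between the option and the module name.
export NX_ARGS="--instrumentation print_trace"

# Show what the runtime would see in its environment.
echo "NX_INSTRUMENTATION=$NX_INSTRUMENTATION"
echo "NX_ARGS=$NX_ARGS"
```

After either form has been exported, the application is started as usual (for instance ./my_program, as in the example above).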
By default, only a subset of the Nanos++ events is enabled (see the Nanos++ events section for further details).
Additionally, you can use the following options to specify the set of events that will be instrumented in a given execution.
--instrument-default=<mode>
    Sets the default instrumentation event set. <mode> can be none, which disables all events, or all, which enables all events. Alternatively, one of the instrumentation profiles can be chosen: advanced, developer or user. The interval and punctual events table shows which events are generated when each of these levels is active. (Default value is user.)
--instrument-enable=<event-type>
    Adds events to the current instrumentation event list. <event-type> can be state, which enables all the state events; ptp, which enables all the point-to-point events; or a specific event-prefix, which enables those interval and/or punctual events whose name matches event-prefix. Several instrument-enable options can be combined in the same execution environment (e.g. NX_ARGS="... --instrument-enable=state --instrument-enable=for").
--instrument-disable=<event-type>
    Removes events from the current instrumentation event list. <event-type> can be state, which disables all the state events; ptp, which disables all the point-to-point events; or a specific event-prefix, which disables those interval and/or punctual events whose name matches event-prefix. Several instrument-disable options can be combined in the same execution environment (e.g. NX_ARGS="... --instrument-disable=state --instrument-disable=for").
--instrument-cpuid, --no-instrument-cpuid
    Add cpuid event when binding is disabled (expensive).
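As a minimal sketch of how the options above combine, the following shell helper builds an NX_ARGS string that starts from an empty event set and then enables only the requested event types. The function name nx_instrument_args is hypothetical and not part of Nanos++; the sketch also assumes that the enable options are applied on top of the selected default set, as the option descriptions suggest.

```shell
# Hypothetical helper (not part of Nanos++): prints an NX_ARGS string that
# selects a plug-in, disables all events, and re-enables the listed ones.
nx_instrument_args() {
    module=$1
    shift
    args="--instrumentation=$module --instrument-default=none"
    for ev in "$@"; do
        args="$args --instrument-enable=$ev"
    done
    printf '%s\n' "$args"
}

# Enable state events, point-to-point events and events matching "taskwait".
NX_ARGS=$(nx_instrument_args print_trace state ptp taskwait)
export NX_ARGS
echo "NX_ARGS=$NX_ARGS"
```

Starting from none keeps the output small and makes the enabled set explicit; starting from one of the profiles and adding a few prefixes is an equally valid choice.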
Nanos++ implements several instrumentation plug-ins.
The following sections describe each of the available plug-ins.
2.3.3.3.1. Plug-in description
2.3.3.3.2. Nanos++ events
2.3.3.3.2.1. State events
In the Nanos++ programming model, a thread can represent either a POSIX thread or a runtime Work Descriptor, depending on the instrumentation model in use.
| ## | Event | Description |
| --- | --- | --- |
| 00 | NANOS_NOT_CREATED | Thread has not been created yet. |
| 01 | NANOS_NOT_RUNNING | Thread is executing nothing (e.g. a Work Descriptor which is not in execution or sub state when state is enabled). |
| 02 | NANOS_STARTUP | Thread is executing runtime start up. |
| 03 | NANOS_SHUTDOWN | Thread is executing runtime shut down. |
| 04 | NANOS_ERROR | Error while instrumenting. |
| 05 | NANOS_IDLE | Thread is on idle loop. |
| 06 | NANOS_RUNTIME | Thread is executing generic runtime code (not defined in other state). |
| 07 | NANOS_RUNNING | Thread is executing user code: either the main thread executing main() or a thread executing an outlined user function. This state is directly related to the “user-code” event, which registers the Work Descriptor id. |
| 08 | NANOS_SYNCHRONIZATION | Thread is executing a synchronization mechanism: team barrier, task wait, synchronized condition (wait and signal), wait on dependencies, set/unset/try locks (through Nanos API), acquiring a single region, waitOnCondition idle loop and waking up a blocked Work Descriptor. |
| 09 | NANOS_SCHEDULING | Thread is calling a scheduler policy method, submitting a new task, or a Work Descriptor is yielding the thread to a new one. |
| 10 | NANOS_CREATION | Thread is creating a new Work Descriptor or setting translate function. |
| 11 | NANOS_MEM_TRANSFER_IN | Thread is copying data to cache. |
2.3.3.3.2.2. Point to Point events
| ## | Event | Description |
| --- | --- | --- |
| 0 | NANOS_WD_DOMAIN | WD’s Identifier |
| 1 | NANOS_WD_DEPENDENCY | Data Dependency |
| 2 | NANOS_WAIT | Hierarchical Dependency (e.g. “omp taskwait”) |
| 3 | NANOS_WD_REMOTE | Remote Work Descriptor Execution (among nodes) |
| 4 | NANOS_XFER_PUT | Transfer PUT (Data Send) |
| 5 | NANOS_XFER_GET | Transfer GET (Data Request) |
Only in cluster branch:
| ## | Event | Description |
| --- | --- | --- |
| 6 | NANOS_AM_WORK | Sending WDs (master -> slave) |
| 7 | NANOS_AM_WORK_DONE | WD is done (slave -> master) |
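State and point-to-point events can be switched off as whole groups. The sketch below assumes the --instrument-default and --instrument-disable options behave as described earlier in this section; the choice of the extrae plug-in is arbitrary and the application is not launched.

```shell
# Start from every event, then drop the state and point-to-point groups,
# leaving only interval and punctual events in the output.
export NX_ARGS="--instrumentation=extrae --instrument-default=all --instrument-disable=state --instrument-disable=ptp"
echo "NX_ARGS=$NX_ARGS"
```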
2.3.3.3.2.3. Interval/Punctual events
| ### | Event | Description | ADV | DEV | USR |
| --- | --- | --- | --- | --- | --- |
| 001 | api | Nanos Runtime API (interval) | X | X | |
| 002 | wd-id | Work Descriptor id (interval) | X | X | |
| 003 | cache-copy-in | Transfer data into device cache (interval) | | | |
| 004 | cache-copy-out | Transfer data to main memory (interval) | | | |
| 005 | cache-local-copy | Local copy in device memory (interval) | | | |
| 006 | cache-malloc | Memory allocation in device cache (interval) | | | |
| 007 | cache-free | Memory free in device cache (interval) | | | |
| 008 | cache-hit | Hit in the cache (interval) | | | |
| 009 | copy-in | Copying WD inputs (interval) | | | |
| 010 | copy-out | Copying WD outputs (interval) | | | |
| 011 | user-funct-name | User Function Name (interval) | X | X | X |
| 012 | user-code | User Code (wd) (interval) | | | |
| 013 | create-wd-id | Create WD id (punctual) | | | |
| 014 | create-wd-ptr | Create WD pointer (punctual) | X | | |
| 015 | wd-num-deps | Create WD num. deps. (punctual) | X | | |
| 016 | wd-deps-ptr | Create WD dependence pointer (punctual) | X | | |
| 017 | lock-addr | Lock address (interval) | X | | |
| 018 | num-spins | Number of Spins (punctual) | X | X | |
| 019 | num-yields | Number of Yields (punctual) | X | X | |
| 020 | time-yields | Time on Yield (in nsecs) (punctual) | X | X | |
| 021 | user-funct-location | User Function Location (interval) | X | X | X |
| 022 | num-ready | Number of ready tasks in the queues (punctual) | X | X | X |
| 023 | graph-size | Number of tasks in the graph (punctual) | X | X | X |
| 024 | loop-lower | Loop lower bound (punctual) | X | X | |
| 025 | loop-upper | Loop upper bound (punctual) | X | X | |
| 026 | loop-step | Loop step (punctual) | X | X | |
| 027 | in-cuda-runtime | Inside CUDA runtime (interval) | X | X | |
| 028 | xfer-size | Transfer size (punctual) | X | X | |
| 029 | cache-wait | Cache waiting for something (interval) | | | |
| 030 | chunk-size | Chunk size (punctual) | X | X | |
| 031 | num-sleeps | Number of sleeps (punctual) | X | X | |
| 032 | time-sleeps | Time on Sleep (in nsecs) (punctual) | X | X | |
| 033 | num-scheds | Number of scheduler operations (punctual) | X | X | |
| 034 | time-scheds | Time in scheduler operations (in nsecs) (punctual) | X | X | |
| 035 | sched-versioning | Versioning scheduler decisions (punctual) | X | | |
| 036 | dependence | Dependence analysis, the system has found a new dependence (punctual) | X | X | |
| 037 | dep-direction | Dependence direction (punctual) | X | X | |
| 038 | wd-priority | WD’s priority (punctual) | X | X | |
| 039 | in-opencl-runtime | In OpenCL runtime (interval) | X | X | |
| 040 | taskwait | Taskwait (interval) | X | X | X |
| 041 | set-num-threads | Set/change number of threads (punctual) | X | X | X |
| 042 | cpuid | CPU id (punctual), disabled by default | X | X | X |
| 043 | dep-address | Dependence address (punctual) | X | X | |
| 044 | copy-data-in | Work descriptor id that is copying data in | X | X | |
| 045 | cache-copy-data-in | Work descriptor id that is copying data in | X | X | |
| 046 | cache-copy-data-out | Work descriptor id that is copying data out | X | X | |
| 047 | sched-affinity-const | Constraint used in affinity scheduler | X | X | |
| 048 | in-mpi-runtime | Inside MPI runtime | X | X | |
| 049 | wd-ready | Work descriptor becomes ready | X | | |
| 050 | wd-blocked | Work descriptor becomes blocked | X | | |
| 051 | parallel-outline-fct | Parallel outline function | X | | |
| 052 | async-thread | Asynchronous thread state events | X | X | |
| 053 | copy-in-gpu | Asynchronous memory copy from host to device | | | |
| 054 | copy-out-gpu | Asynchronous memory copy from device to host | | | |
| 055 | gpu-wd-id | GPU’s work descriptor id | X | X | |
| 056 | wd-criticality | Work descriptor criticality | X | X | |
| 057 | blev-overheads | Total overheads of botlev scheduler | X | X | |
| 058 | blev-overheads-break | Overheads of botlev scheduler | X | X | |
| 059 | critical-wd-id | A critical work descriptor is submitted | X | X | |
| 060 | copy-dir-devices | Asynchronous memory copy between host and device | X | X | X |
| 061 | concurrent-task | Number of concurrent tasks in the ready queue | X | X | |
| 062 | network-transfer | Network transfer to node | X | X | |
| 064 | thread-numa-node | NUMA node of the worker thread | X | X | |
| 065 | wd-numa-node | NUMA node assigned to the work descriptor | X | X | |
| 066 | steal | Whether the work descriptor to be executed is the result of a steal operation | X | | |
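Several events in the table (for example the cache-* family) are not part of any profile and must be enabled explicitly. A minimal sketch, assuming event-prefix matching works as described for --instrument-enable and using the extrae plug-in purely as an example:

```shell
# Keep the default "user" profile and add every event whose name
# matches the "cache" prefix (cache-copy-in, cache-malloc, cache-hit, ...).
export NX_ARGS="--instrumentation=extrae --instrument-default=user --instrument-enable=cache"
echo "NX_ARGS=$NX_ARGS"
```

A narrower prefix such as cache-copy would restrict the addition to the copy-related cache events under the same assumption.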