.. index:: pair: instrumentation; plugins

Instrumentation modules
=======================

The instrumentation flag selects which of the available instrumentation plug-ins to use. An instrumentation plug-in generates output that describes one or more aspects of the actual execution. It is in charge of translating internal Nanos++ events into the desired output: printing each event description on the screen as soon as it occurs, generating a trace file that other software components can process, building a dependence graph (taking into account only dependence events), or calling an external service when a given type of event occurs. Events usually take place inside the runtime, but they can also be generated by application programmers or introduced by the compiler (e.g. outlined functions created by the Mercurium compiler).

.. highlight:: bash

The instrumentation plug-in can be specified through the ``NX_INSTRUMENTATION`` environment variable or through the ``--instrumentation`` option included in the ``NX_ARGS`` environment variable::

   $ export NX_INSTRUMENTATION=<module>
   $ ./my_program

   $ export NX_ARGS="--instrumentation[ |=]<module> ..."
   $ ./my_program

--instrumentation=<module>
   Sets the instrumentation backend to be used during the execution. ``module`` can be one of the following options: ``empty``, ``print_trace``, ``extrae``, ``graph``, ``ayudame``, ``tasksim`` or ``tdg``.

By default, only a subset of the Nanos++ events is enabled (see the Nanos++ events section for further details). The following options specify the set of events that will be instrumented for a given execution.

--instrument-default=<mode>
   Sets the default instrumentation event set. ``mode`` can be ``none``, which disables all events, or ``all``, which enables all events. Alternatively, ``mode`` can select one of the instrumentation profiles ``advanced``, ``developer`` or ``user``. The Interval/Punctual events table shows which events are generated when each of these profiles is active. The default value is ``user``.

--instrument-enable=<event-type>
   Adds events to the current instrumentation event list. ``event-type`` can be ``state``, which enables all the **state** events, ``ptp``, which enables all the **point-to-point** events, or a specific *event-prefix*, which enables those **interval** and/or **punctual** events whose names match *event-prefix*. Several instrument-enable options can be combined in the same execution environment (e.g. ``NX_ARGS="... --instrument-enable=state --instrument-enable=for"``).

--instrument-disable=<event-type>
   Removes events from the current instrumentation event list. ``event-type`` can be ``state``, which disables all the **state** events, ``ptp``, which disables all the **point-to-point** events, or a specific *event-prefix*, which disables those **interval** and/or **punctual** events whose names match *event-prefix*. Several instrument-disable options can be combined in the same execution environment (e.g. ``NX_ARGS="... --instrument-disable=state --instrument-disable=for"``).

--instrument-cpuid, --no-instrument-cpuid
   Adds the cpuid event when binding is disabled (expensive).

Nanos++ implements several instrumentation plug-ins. The following sections describe each of the available plug-ins.

Plug-in description
-------------------

.. toctree::
   :maxdepth: 1

   run-programs-plugin-instrument-empty
   run-programs-plugin-instrument-extrae
   run-programs-plugin-instrument-tasksim
   run-programs-plugin-instrument-ayudame
   run-programs-plugin-instrument-tdg

.. index:: single: Events; Nanos State Events

Nanos++ events
--------------

State events
""""""""""""

In the Nanos++ programming model, a thread can represent either a POSIX thread or a runtime Work Descriptor, depending on the instrumentation model in use.

.. tabularcolumns:: |l|l|J|

+----+-----------------------+-----------------------------------------------------------+
| ## | Event                 | Description                                               |
+====+=======================+===========================================================+
| 00 | NANOS_NOT_CREATED     | Thread has not been created yet.                          |
+----+-----------------------+-----------------------------------------------------------+
| 01 | NANOS_NOT_RUNNING     | Thread is executing nothing (e.g. a Work Descriptor which |
|    |                       | is not in execution or sub state when state is enabled).  |
+----+-----------------------+-----------------------------------------------------------+
| 02 | NANOS_STARTUP         | Thread is executing runtime start up.                     |
+----+-----------------------+-----------------------------------------------------------+
| 03 | NANOS_SHUTDOWN        | Thread is executing runtime shut down.                    |
+----+-----------------------+-----------------------------------------------------------+
| 04 | NANOS_ERROR           | Error while instrumenting.                                |
+----+-----------------------+-----------------------------------------------------------+
| 05 | NANOS_IDLE            | Thread is in the idle loop.                               |
+----+-----------------------+-----------------------------------------------------------+
| 06 | NANOS_RUNTIME         | Thread is executing generic runtime code (not defined in  |
|    |                       | another state).                                           |
+----+-----------------------+-----------------------------------------------------------+
| 07 | NANOS_RUNNING         | Thread is executing user code: the main thread executing  |
|    |                       | main(), or a thread executing an outlined user function.  |
|    |                       | This state is directly related to the "user-code" event,  |
|    |                       | which registers the Work Descriptor id.                   |
+----+-----------------------+-----------------------------------------------------------+
| 08 | NANOS_SYNCHRONIZATION | Thread is executing a synchronization mechanism: team     |
|    |                       | barrier, task wait, synchronized condition (wait and      |
|    |                       | signal), wait on dependencies, set/unset/try locks        |
|    |                       | (through Nanos API), acquiring a single region,           |
|    |                       | waitOnCondition idle loop and waking up a blocked         |
|    |                       | Work Descriptor.                                          |
+----+-----------------------+-----------------------------------------------------------+
| 09 | NANOS_SCHEDULING      | Thread is calling a scheduler policy method, submitting   |
|    |                       | a new task, or a Work Descriptor is yielding the thread   |
|    |                       | to a new one.                                             |
+----+-----------------------+-----------------------------------------------------------+
| 10 | NANOS_CREATION        | Thread is creating a new Work Descriptor or setting       |
|    |                       | the translate function.                                   |
+----+-----------------------+-----------------------------------------------------------+
| 11 | NANOS_MEM_TRANSFER_IN | Thread is copying data to cache.                          |
+----+-----------------------+-----------------------------------------------------------+

.. index:: triple: traces; Paraver; Point-to-point events

Point to Point events
"""""""""""""""""""""

.. tabularcolumns:: |l|l|J|

+----+-----------------------+-----------------------------------------------------------+
| ## | Event                 | Description                                               |
+====+=======================+===========================================================+
| 0  | NANOS_WD_DOMAIN       | WD's Identifier                                           |
+----+-----------------------+-----------------------------------------------------------+
| 1  | NANOS_WD_DEPENDENCY   | Data Dependency                                           |
+----+-----------------------+-----------------------------------------------------------+
| 2  | NANOS_WAIT            | Hierarchical Dependency (e.g. "omp taskwait")             |
+----+-----------------------+-----------------------------------------------------------+
| 3  | NANOS_WD_REMOTE       | Remote Work Descriptor Execution (among nodes)            |
+----+-----------------------+-----------------------------------------------------------+
| 4  | NANOS_XFER_PUT        | Transfer PUT (Data Send)                                  |
+----+-----------------------+-----------------------------------------------------------+
| 5  | NANOS_XFER_GET        | Transfer GET (Data Request)                               |
+----+-----------------------+-----------------------------------------------------------+

.. index::
   triple: traces; Paraver; Point-to-point events (cluster)
   pair: cluster; Point-to-point events

Only in cluster branch:

.. tabularcolumns:: |l|l|J|

+----+-----------------------+-----------------------------------------------------------+
| ## | Event                 | Description                                               |
+====+=======================+===========================================================+
| 6  | NANOS_AM_WORK         | Sending WDs (master -> slave)                             |
+----+-----------------------+-----------------------------------------------------------+
| 7  | NANOS_AM_WORK_DONE    | WD is done (slave -> master)                              |
+----+-----------------------+-----------------------------------------------------------+

.. index:: single: punctual events

Interval/Punctual events
""""""""""""""""""""""""

.. tabularcolumns:: |l|l|J|l|l|l|

+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| ### | Event                | Description                                        | ADV | DEV | USR |
+=====+======================+====================================================+=====+=====+=====+
| 001 | api                  | Nanos Runtime API (interval)                       | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 002 | wd-id                | Work Descriptor id (interval)                      | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 003 | cache-copy-in        | Transfer data into device cache (interval)         |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 004 | cache-copy-out       | Transfer data to main memory (interval)            |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 005 | cache-local-copy     | Local copy in device memory (interval)             |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 006 | cache-malloc         | Memory allocation in device cache (interval)       |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 007 | cache-free           | Memory free in device cache (interval)             |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 008 | cache-hit            | Hit in the cache (interval)                        |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 009 | copy-in              | Copying WD inputs (interval)                       |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 010 | copy-out             | Copying WD outputs (interval)                      |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 011 | user-funct-name      | User Function Name (interval)                      | X   | X   | X   |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 012 | user-code            | User Code (wd) (interval)                          |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 013 | create-wd-id         | Create WD Id: (punctual)                           |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 014 | create-wd-ptr        | Create WD pointer: (punctual)                      | X   |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 015 | wd-num-deps          | Create WD num. deps. (punctual)                    | X   |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 016 | wd-deps-ptr          | Create WD dependence pointer (punctual)            | X   |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 017 | lock-addr            | Lock address (interval)                            | X   |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 018 | num-spins            | Number of Spins (punctual)                         | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 019 | num-yields           | Number of Yields (punctual)                        | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 020 | time-yields          | Time on Yield (in nsecs) (punctual)                | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 021 | user-funct-location  | User Function Location (interval)                  | X   | X   | X   |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 022 | num-ready            | Number of ready tasks in the queues (punctual)     | X   | X   | X   |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 023 | graph-size           | Number of tasks in the graph (punctual)            | X   | X   | X   |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 024 | loop-lower           | Loop lower bound (punctual)                        | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 025 | loop-upper           | Loop upper bound (punctual)                        | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+

.. tabularcolumns:: |l|l|J|l|l|l|

+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| ### | Event                | Description                                        | ADV | DEV | USR |
+=====+======================+====================================================+=====+=====+=====+
| 026 | loop-step            | Loop step (punctual)                               | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 027 | in-cuda-runtime      | Inside CUDA runtime (interval)                     | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 028 | xfer-size            | Transfer size (punctual)                           | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 029 | cache-wait           | Cache waiting for something (interval)             |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 030 | chunk-size           | Chunk size (punctual)                              | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 031 | num-sleeps           | Number of sleeps (punctual)                        | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 032 | time-sleeps          | Time on Sleep (in nsecs) (punctual)                | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 033 | num-scheds           | Number of scheduler operations (punctual)          | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 034 | time-scheds          | Time in scheduler operations (in nsecs) (punctual) | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 035 | sched-versioning     | Versioning scheduler decisions (punctual)          | X   |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 036 | dependence           | Dependence analysis: the system has found a new    | X   | X   |     |
|     |                      | dependence (punctual)                              |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 037 | dep-direction        | Dependence direction (punctual)                    | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 038 | wd-priority          | WD's priority (punctual)                           | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 039 | in-opencl-runtime    | In OpenCL runtime (interval)                       | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 040 | taskwait             | Taskwait (interval)                                | X   | X   | X   |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 041 | set-num-threads      | Set/change number of threads (punctual)            | X   | X   | X   |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 042 | cpuid                | CPU id (punctual) - disabled by default            | X   | X   | X   |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 043 | dep-address          | Dependence address (punctual)                      | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 044 | copy-data-in         | Work descriptor id that is copying data in         | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 045 | cache-copy-data-in   | Work descriptor id that is copying data in         | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 046 | cache-copy-data-out  | Work descriptor id that is copying data out        | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 047 | sched-affinity-const | Constraint used in affinity scheduler              | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 048 | in-mpi-runtime       | Inside MPI runtime                                 | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 049 | wd-ready             | Work descriptor becomes ready                      | X   |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 050 | wd-blocked           | Work descriptor becomes blocked                    | X   |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+

.. tabularcolumns:: |l|l|J|l|l|l|

+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| ### | Event                | Description                                        | ADV | DEV | USR |
+=====+======================+====================================================+=====+=====+=====+
| 051 | parallel-outline-fct | Parallel outline function                          | X   |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 052 | async-thread         | Asynchronous thread state events                   | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 053 | copy-in-gpu          | Asynchronous memory copy from host to device       |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 054 | copy-out-gpu         | Asynchronous memory copy from device to host       |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 055 | gpu-wd-id            | GPU's work descriptor id                           | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 056 | wd-criticality       | Work descriptor criticality                        | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 057 | blev-overheads       | Total overheads of botlev scheduler                | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 058 | blev-overheads-break | Overheads of botlev scheduler                      | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 059 | critical-wd-id       | A critical work descriptor is submitted            | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 060 | copy-dir-devices     | Asynchronous memory copy between host and device   | X   | X   | X   |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 061 | concurrent-task      | Number of concurrent tasks in the ready queue      | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 062 | network-transfer     | Network transfer to node                           | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 064 | thread-numa-node     | NUMA node of the worker thread                     | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 065 | wd-numa-node         | NUMA node assigned to the work descriptor          | X   | X   |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
| 066 | steal                | Whether the work descriptor to be executed results | X   |     |     |
|     |                      | from a steal operation                             |     |     |     |
+-----+----------------------+----------------------------------------------------+-----+-----+-----+
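
The options and event names listed in this section can be combined in a single ``NX_ARGS`` value. The following is only a sketch under stated assumptions: it assumes that the ``extrae`` plug-in is available in the installation, that starting from ``--instrument-default=none`` and then adding events behaves as the option descriptions suggest, and that ``./my_program`` is a hypothetical application binary. The selected events (``state`` and the ``taskwait`` prefix) are chosen for illustration only::

   # Build the option string step by step so each choice stays visible.
   NX_ARGS="--instrumentation=extrae"              # assumed plug-in choice
   NX_ARGS="$NX_ARGS --instrument-default=none"    # start with no events
   NX_ARGS="$NX_ARGS --instrument-enable=state"    # add all state events
   NX_ARGS="$NX_ARGS --instrument-enable=taskwait" # add events matching this prefix
   export NX_ARGS

   # Show the final value before launching anything.
   echo "NX_ARGS=$NX_ARGS"

   # Hypothetical binary: run it only if it exists in the current directory.
   if [ -x ./my_program ]; then
       ./my_program
   fi

Printing the composed value first makes it easy to check which events were requested when a trace turns out larger or smaller than expected.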