Dynamic Load Balance 3.6.1+32-59d1
#include <dlb_talp.h>
POP metrics (of one monitor) collected among all processes
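A typical workflow for obtaining these metrics is to register a TALP monitoring region, start and stop it around the code of interest, and then collect the aggregated POP metrics. The sketch below is illustrative only: it assumes the monitoring-region API and the `dlb_pop_metrics_t` / `DLB_TALP_CollectPOPMetrics` names from `dlb_talp.h`, and the `DLB_SUCCESS` error code from `dlb_errors.h`; check the headers of your installed DLB version for the exact signatures.

```c
#include <stdio.h>
#include <mpi.h>
#include <dlb_talp.h>    /* monitoring regions, dlb_pop_metrics_t */
#include <dlb_errors.h>  /* DLB_SUCCESS */

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);

    /* Register a named monitoring region (name is illustrative) */
    dlb_monitor_t *monitor = DLB_MonitoringRegionRegister("solver");

    DLB_MonitoringRegionStart(monitor);
    /* ... computation to be measured ... */
    DLB_MonitoringRegionStop(monitor);

    /* Collect the region's POP metrics, aggregated among all processes */
    dlb_pop_metrics_t metrics;
    if (DLB_TALP_CollectPOPMetrics(monitor, &metrics) == DLB_SUCCESS) {
        printf("Region %s: parallel efficiency = %.2f\n",
               metrics.name, metrics.parallel_efficiency);
    }

    MPI_Finalize();
    return 0;
}
```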
| Field | Description |
|---|---|
| char name[DLB_MONITOR_NAME_MAX] | Name of the monitor |
| int num_cpus | Total number of CPUs used by the processes that have used the region |
| int num_mpi_ranks | Total number of MPI processes that have used the region |
| int num_nodes | Total number of nodes used by the processes that have used the region |
| float avg_cpus | Average number of CPUs used in the region; only meaningful if LeWI is enabled |
| int num_gpus | TBD |
| double cycles | Total number of CPU cycles elapsed in the region during useful time |
| double instructions | Total number of instructions executed during useful time |
| int64_t num_measurements | Number of times the region has been started and stopped, among all processes |
| int64_t num_mpi_calls | Number of executed MPI calls, combined among all MPI processes |
| int64_t num_omp_parallels | Number of encountered OpenMP parallel regions, combined among all processes |
| int64_t num_omp_tasks | Number of encountered OpenMP tasks, combined among all processes |
| int64_t num_gpu_runtime_calls | Number of executed GPU runtime calls, combined among all processes |
| int64_t elapsed_time | Accumulated elapsed time (in nanoseconds) inside the region |
| int64_t useful_time | Accumulated CPU time (in nanoseconds) of useful computation in the application |
| int64_t mpi_time | Accumulated CPU time (in nanoseconds), not useful, spent in MPI |
| int64_t omp_load_imbalance_time | Accumulated CPU time (in nanoseconds), not useful, spent due to load imbalance in OpenMP parallel regions |
| int64_t omp_scheduling_time | Accumulated CPU time (in nanoseconds), not useful, spent inside OpenMP parallel regions due to scheduling and overhead, not counting load imbalance |
| int64_t omp_serialization_time | Accumulated CPU time (in nanoseconds), not useful, spent outside OpenMP parallel regions |
| int64_t gpu_runtime_time | Accumulated CPU time (in nanoseconds), not useful, spent in GPU runtime calls |
| double min_mpi_normd_proc | MPI time, normalized at process level, of the process with the least MPI time |
| double min_mpi_normd_node | MPI time, normalized at node level, of the node with the least MPI time |
| int64_t gpu_useful_time | |
| int64_t gpu_communication_time | |
| int64_t gpu_inactive_time | |
| int64_t max_gpu_useful_time | |
| int64_t max_gpu_active_time | |
| float parallel_efficiency | Overall parallel efficiency of the application, in the range [0.0, 1.0] |
| float mpi_parallel_efficiency | Parallel efficiency of the MPI level |
| float mpi_communication_efficiency | Efficiency lost due to MPI data transfer and serialization |
| float mpi_load_balance | MPI load balance coefficient |
| float mpi_load_balance_in | Intra-node MPI load balance coefficient |
| float mpi_load_balance_out | Inter-node MPI load balance coefficient |
| float omp_parallel_efficiency | Parallel efficiency of the OpenMP level |
| float omp_load_balance | OpenMP load balance coefficient inside parallel regions |
| float omp_scheduling_efficiency | Efficiency of the OpenMP scheduling inside parallel regions |
| float omp_serialization_efficiency | Efficiency lost due to OpenMP threads idling outside of parallel regions |
| float device_offload_efficiency | Efficiency of host-to-device offloading |
| float gpu_parallel_efficiency | TBD |
| float gpu_load_balance | TBD |
| float gpu_communication_efficiency | TBD |
| float gpu_orchestration_efficiency | TBD |