5. nOS-V Library

This section describes how to use and configure the nOS-V library.

5.1. Linking with nOS-V

nOS-V applications can be linked using the -lnosv linker flag. The main API is available in the nosv.h header.
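As a minimal sanity check, the following sketch simply initializes and shuts down the runtime. It assumes a standard installation where the compiler can locate nosv.h and libnosv; the error handling is illustrative, based on nOS-V calls returning 0 on success:

#include <nosv.h>
#include <stdio.h>

int main(void)
{
     /* Initialize the nOS-V runtime; this also loads the configuration file */
     if (nosv_init() != 0) {
          fprintf(stderr, "nosv_init failed\n");
          return 1;
     }

     /* ... create, submit and wait for tasks here ... */

     /* Shut down the runtime */
     if (nosv_shutdown() != 0) {
          fprintf(stderr, "nosv_shutdown failed\n");
          return 1;
     }

     return 0;
}

The program can then be compiled and linked as follows (the compiler choice is illustrative):

$ gcc example.c -o example -lnosv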

5.2. Runtime configuration

nOS-V comes with an extensive set of tuning, configuration and customization features. There are two complementary ways to change the nOS-V runtime configuration: through the configuration file and through presets.

5.2.1. Configuration file

The main way to change the configuration is through the nosv.toml configuration file. When a nOS-V program calls nosv_init(), the library searches for a suitable configuration file in the following order (an example follows the list):

  1. The file specified through the NOSV_CONFIG environment variable

  2. A file named nosv.toml in the current directory

  3. The default configuration file, located in the installation directory
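For instance, the first mechanism can be used to point nOS-V to a specific configuration file; the path below is hypothetical:

$ export NOSV_CONFIG=/path/to/custom/nosv.toml
$ ./nos-v-application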

After the configuration file has been loaded, the override configuration is parsed: a set of comma-separated TOML-like assignments specified in the NOSV_CONFIG_OVERRIDE environment variable. The following example uses the override mechanism to change some options:

$ export NOSV_CONFIG_OVERRIDE="shared_memory.name=example,turbo.enabled=true"
$ ./nos-v-application

All configuration options are described in the default nosv.toml file, which is reproduced here:

# Default nOS-V configuration file.
# Note that every process in the same nOS-V instance should load an identical configuration file;
# otherwise, the behaviour is unspecified

# Miscellaneous configurations
[misc]
     # Default stack size when creating worker threads
     stack_size = "8M"

# Shared memory configuration
[shared_memory]
     # Name of the shared memory section. This variable acts as an "id" for each nOS-V instance
     # Two applications with the same shared memory name will share a nOS-V instance, while different names
     # will result in independent nOS-V instances. Note that both the name of the shared memory as well as the
     # isolation level have to match for nOS-V instances to connect
     name = "nosv"
     # Isolation level of nOS-V. Possible values are:
     # "process" - Each process has a separate nOS-V instance
     # "user" - nOS-V instances are shared for the same user
     # "group" - nOS-V instances are shared for the same group
     # "public" - nOS-V instances are shared for the whole system
     isolation_level = "user"
     # Size of the shared memory section. Choose powers of two
     size = "2G"
     # Start of the shared memory mapping
     # This option should be a memory address encoded in hexadecimal (starting with 0x)
     start = 0x0000200000000000

# Scheduler configuration
# These parameters allow fine-tuning the scheduler's behaviour
[scheduler]
     # Number of logical CPUs mapped to a single scheduler SPSC queue for ready tasks
     # The minimum is 1, and there is no maximum. Try to choose numbers that evenly divide the number
     # of CPUs in the system.
     # A lower number will yield more scheduling throughput and is adequate when using
     # multiple task creator processes.
     # A higher number will yield lower scheduling latency and is adequate when using one or few
     # task creator processes
     cpus_per_queue = 1
     # Number of tasks that are grabbed from a queue when processing ready tasks.
     # Lowering the number may be better if there are very few ready tasks during execution.
     # Increasing the batch may help if your CPU has bad single-thread performance.
     # In general, the default value of 64 should result in a good trade-off
     queue_batch = 64
     # Scheduler quantum in ns. This is a guideline for how long one process' tasks should be executed
     # before switching to the next process. However, as nOS-V tasks are not preemptable, it is not enforced.
     # This parameter is especially relevant for the nosv_schedpoint function, which will try to schedule
     # once every quantum_ns nanoseconds.
     # A lower value will cause more inter-process context switches but may provide more uniform progress,
     # while a higher value will minimize context switches but may stall applications for longer
     quantum_ns = 20000000 # nanoseconds
     # Size of the queues that are used to place tasks into the scheduler
     # A good value should be a multiple of queue_batch
     in_queue_size = 256
     # Enable the immediate successor policy, which controls whether nOS-V honors the NOSV_SUBMIT_IMMEDIATE
     # flag to execute a task immediately after the current one
     immediate_successor = true

# CPU Governor configuration
# Controls the policy that nOS-V follows to block idle CPUs to save energy and resources
[governor]
     # There is a choice between three different governor policies:
     # - hybrid: CPUs will spin for governor.spins before going to sleep when no work is available
     # - idle: CPUs will sleep immediately when no work is available
     # - busy: CPUs will never sleep
     # In general, use idle when targeting minimum power usage, and busy when targeting maximum performance
     # The default is hybrid as it provides a good balance between power and performance.
     policy = "hybrid"
     # Number of times a CPU will spin without sleeping in the "hybrid" policy.
     # When "idle" or "busy" are selected, this setting is ignored
     spins = 10000

# Topology config
[topology]
     # CPU binding for the execution. The valid values are "inherit", which inherits the CPU mask of the
     # first process that initializes the runtime, or a CPU mask in hexadecimal with the format "0xFFFF"
     binding = "inherit"
     # It is also possible to define the composition of the NUMA nodes, which will override the composition
     # that nOS-V can autodiscover through libnuma.
     # It should be defined as an array of strings, each being a CPU list. Use the system IDs for the CPUs.
     # Consider the following example in a system with 2 NUMA Nodes, each one with 8 cores:
     # numa_nodes = ["0-3,8-11", "4-7,12-15"]
     # Complex sets are disjoint sets of cores.
     # complex_sets = ["0,1", "2,3", "4,5", "6,7"]
     # Set to true if you want to print the topology at nOS-V initialization. Useful to check the computing
     # resources available to nOS-V.
     print = false

# Task affinity config
[task_affinity]
     # Default affinity for all created tasks. This will be the default value for nosv task affinity in the
     # current process. This setting is normally used in conjunction with "binding" and makes it possible to
     # easily combine multiple processes in the same nOS-V instance.
     # Values can be: "all" or "cpu-X", "numa-X", "complex-X" where "X" is the physical index of the resource.
     # For example, "numa-1" will set affinity of all tasks to the NUMA Node 1.
     default = "all"
     # Default affinity policy, either "strict" or "preferred". When the default affinity is "all", this value
     # is not used.
     default_policy = "strict"

# Thread affinity config
[thread_affinity]
     # Enable affinity compatibility support. The affinity of an attached thread is changed by nOS-V. Enabling
     # this option makes standard {sched,pthread}_{get,set}affinity calls return the affinity the thread had
     # before attaching, as if it were not attached. This is useful for libraries that read the thread affinity
     # to set up resources at startup, such as OpenMP or MPI threads.
     compat_support = true


# Debug options
[debug]
     # Dump all the configuration options nOS-V is running with, their final parsed values and the
     # path of the config file being used
     dump_config = false

# Hardware Counters configuration
[hwcounters]
     # Whether to print verbose information if a backend is enabled
     verbose = false
     # The enabled HWCounter backends. Possible options: "papi", "none"
     backend = "none"
     # The list of PAPI counters to read. By default, only "PAPI_TOT_INS" and "PAPI_TOT_CYC" are read
     papi_events = [
          "PAPI_TOT_INS",
          "PAPI_TOT_CYC"
     ]

# Enabling turbo will cause nOS-V to set architecture-specific optimization flags on every created
# or attached thread. On x86, this will enable the FTZ and DAZ flags of the SSE FPU,
# causing a significant performance increase in floating-point applications, but disabling IEEE-754
# compliance.
[turbo]
     enabled = true

# Monitoring capabilities and configuration.
[monitoring]
     enabled = false
     # Whether to print verbose information if monitoring is enabled
     verbose = false

# Instrumentation. Only applies if nOS-V was configured with --with-ovni
[instrumentation]
     # Select the instrumentation to be used. Current options are "none" or "ovni".
     version = "none"

# Configuration specific to the ovni instrumentation
[ovni]
     # Instrumentation levels [0-4] regulate the amount of events that are emitted into an ovni trace.
     # Higher levels emit more events, giving more information but incurring larger runtime overheads
     # and occupying more space in the final trace.
     level = 2
     # It is also possible to select a more fine-grained list of event groups, for advanced users who
     # want to disable some events selectively.
     # This list is parsed in order, starting from an empty list; the keyword "all" denotes all event types,
     # and the token "!" disables event types.
     # Thus, a list of ["all", "!task"] enables everything and then disables only task-related events.
     # Note that with this option uncommented, the instrumentation level will be ignored
     ## events = ["basic", "worker", "scheduler", "scheduler_submit", "scheduler_hungry", "memory", "api_attach",
     ##           "api_create", "api_destroy", "api_submit", "api_pause", "api_yield", "api_waitfor", "api_schedpoint",
     ##           "api_mutex_lock", "api_mutex_trylock", "api_mutex_unlock", "api_barrier_wait", "api_cond_wait",
     ##           "api_cond_signal", "api_cond_broadcast", "task", "kernel", "breakdown" ]
     #
     # Size in bytes of the ring buffer for kernel events (per thread)
     kernel_ringsize = "4M"
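Any of the options above can also be changed through the NOSV_CONFIG_OVERRIDE mechanism described earlier. For instance, the following run tunes the scheduler and the CPU governor; the particular values are illustrative only:

$ export NOSV_CONFIG_OVERRIDE="scheduler.cpus_per_queue=4,governor.policy=idle"
$ ./nos-v-application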

5.2.2. Presets

In addition to the configuration file, nOS-V supports some convenience configuration presets, which can be activated through the NOSV_PRESET environment variable. These presets are applied after parsing the configuration, and change some of the runtime options to achieve a specific effect. The currently available presets are listed below, followed by a usage example:

  • isolated, which runs each nOS-V application in a separate shared memory, disabling global scheduling between processes

  • shared-mpi, which runs nOS-V applications with global scheduling in user scope, binds applications to all available CPUs in the system, and establishes default affinities for each process to the set of affine CPUs that were configured through the taskset command or MPI.
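For example, to run an application in its own shared memory using the isolated preset:

$ export NOSV_PRESET=isolated
$ ./nos-v-application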

5.3. Debugging

Although nOS-V APIs have error checking, it may be useful to build the library with debug symbols enabled in order to pinpoint bugs and issues. To do so, you can use the --enable-debug flag at configure time.

Furthermore, it is also possible to build nOS-V with Address Sanitizer enabled. In this mode, the runtime will assist ASAN by poisoning areas of the shared memory when they are not in use, which may be useful to find memory bugs in user programs that end up corrupting nOS-V memory. To enable ASAN mode, you can use the --enable-asan flag at configure time.
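Both options are passed when configuring the build. A sketch of the corresponding invocations, assuming a standard Autotools build of nOS-V:

$ ./configure --enable-debug    # build with debug symbols
$ ./configure --enable-asan     # build with Address Sanitizer support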

5.4. Generating ovni traces

nOS-V can generate execution traces with the ovni library, which produces lightweight binary traces that can then be transformed into a Paraver trace. It is worth noting that multiple nOS-V programs running in the same or different instances can be combined to create a coherent system-wide trace.

To enable the generation of ovni traces, nOS-V must be configured with the --with-ovni option. Once nOS-V has been built with ovni support, it is up to the user to enable it, as it is disabled by default. To enable ovni instrumentation, a user must set the instrumentation.version configuration option to ovni.
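For example, ovni instrumentation can be enabled through the override mechanism described earlier:

$ export NOSV_CONFIG_OVERRIDE="instrumentation.version=ovni"
$ ./nos-v-application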

Instrumentation points can be enabled or disabled in either a fine-grained manner through the ovni.events configuration variable or in a more coarse-grained manner using the ovni.level variable.

The trace will be left in the ovni/ directory and can be transformed into a Paraver trace with the ovniemu utility. The Paraver configuration files (views) can be found in the ovni/cfg directory.
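For instance, assuming the trace was generated in the ovni/ directory of the current working directory, a basic invocation of the emulator could be:

$ ovniemu ovni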

See the ovni documentation for more details.