Advanced Mechanisms

NODES also supports several advanced mechanisms and prototypes. In this section, we briefly introduce them.

Coroutine Support

Coroutine support is available when a compiler with C++20 support is used. To compile with coroutine support, pass the -fcoroutines flag, as shown in the example below:

clang++ -std=c++20 -fcoroutines -o test-coroutines.bin test-coroutines.cpp

For more detailed examples of coroutine usage, see the correctness tests in the tests/correctness/coroutine subdirectory and the original paper on coroutines:

Arnau Cinca, Aleix Roca, Kevin Sala, Raúl Peñacoba, David Álvarez, Vicenç Beltran. Enhancing OmpSs-2 Suspendable Tasks by Combining Operating System and User-Level Threads with C++ Coroutines. In 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Milano, Italy, 2025, pp. 67–80. https://doi.org/10.1109/IPDPS64566.2025.00015

Taskiter Construct

For iterative applications whose loops generate an identifiable task directed acyclic graph (DAG) that repeats at every iteration, the taskiter construct optimizes execution as follows:

The first iteration of the loop is executed, and its generated DAG is recorded.
The recorded DAG is transformed to generate a directed cyclic task graph (DCTG).
The generated DCTG is automatically optimized and utilized to execute the remaining iterations.
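The first two steps can be pictured with a toy model. The sketch below is not how NODES implements taskiter: RecordedGraph, recordFirstIteration, addBackEdges and buildExample are hypothetical names, and each task is assumed to declare a single inout access. It records the dependencies of one iteration and then adds the cross-iteration edges that close the DAG into a cycle. Step three, the optimized execution of the remaining iterations, is not modeled.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Toy model only: these types are hypothetical, not NODES data structures.
using Edge = std::pair<std::size_t, std::size_t>; // (predecessor, successor)

struct RecordedGraph {
    std::vector<int> accessed;      // accessed[t] = element that task t declares inout
    std::vector<Edge> forwardEdges; // dependencies inside one iteration (the DAG)
    std::vector<Edge> backEdges;    // dependencies from one iteration to the next
};

// Step 1: record the DAG of the first iteration. A task depends on the
// previous task that accessed the same element.
RecordedGraph recordFirstIteration(const std::vector<int> &accessesInCreationOrder) {
    RecordedGraph g;
    g.accessed = accessesInCreationOrder;
    std::map<int, std::size_t> lastAccessor;
    for (std::size_t t = 0; t < g.accessed.size(); ++t) {
        auto it = lastAccessor.find(g.accessed[t]);
        if (it != lastAccessor.end())
            g.forwardEdges.emplace_back(it->second, t);
        lastAccessor[g.accessed[t]] = t;
    }
    return g;
}

// Step 2: close the graph into a cycle. The last task touching an element in
// iteration i must precede the first task touching it in iteration i + 1.
void addBackEdges(RecordedGraph &g) {
    std::map<int, std::size_t> first, last;
    for (std::size_t t = 0; t < g.accessed.size(); ++t) {
        first.emplace(g.accessed[t], t); // emplace keeps the earliest task
        last[g.accessed[t]] = t;         // overwritten until the latest task
    }
    for (const auto &[element, lastTask] : last)
        g.backEdges.emplace_back(lastTask, first.at(element));
}

// Two tasks per element j, each with an inout access on element j.
RecordedGraph buildExample(int tilesize) {
    std::vector<int> accesses;
    for (int j = 0; j < tilesize; ++j) {
        accesses.push_back(j);
        accesses.push_back(j);
    }
    RecordedGraph g = recordFirstIteration(accesses);
    addBackEdges(g);
    return g;
}
```

For buildExample(2), the recorded DAG has the edges 0→1 and 2→3, and the back edges 1→0 and 3→2 turn it into a cyclic graph that can stand for every later iteration.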

The following code, extracted from tests/correctness/taskiter/taskiter-unroll.cpp, shows an example usage of the taskiter construct:

#pragma oss taskiter shared(A) unroll(UNROLL)
for (int i = 0; i < NUM_ITERS; ++i) {
    for (int j = 0; j < tilesize; ++j) {
        #pragma oss task shared(A) firstprivate(j) inout(A[j])
        {
            A[j]++;
        }

        #pragma oss task shared(A) firstprivate(j) inout(A[j])
        {
            A[j]++;
        }
    }
}

#pragma oss taskwait
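To sanity-check the result of this loop, a serial reference can help. The sketch below assumes that A starts zero-initialized and that the unroll clause does not change the computed values. serialReference is a hypothetical helper, not part of the test file.

```cpp
#include <vector>

// Serial reference for the loop above: every iteration increments each
// element twice, once per task.
std::vector<int> serialReference(int numIters, int tilesize) {
    std::vector<int> A(tilesize, 0); // assumption: A starts at zero
    for (int i = 0; i < numIters; ++i) {
        for (int j = 0; j < tilesize; ++j) {
            A[j]++; // first task on A[j]
            A[j]++; // second task on A[j]
        }
    }
    return A;
}
```

Under these assumptions, every A[j] should equal 2 * NUM_ITERS after the taskwait.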

More information about the taskiter construct and its benefits can be found in its original paper.

Inner Parallelism

NODES offers a mechanism to parallelize inner runtime code using OmpSs-2 without having to pass through the compiler. Below we list the main features of this mechanism:

It is enabled through the runtime.enable_inner_parallelism configuration variable.
At the moment, only specific parts of the taskiter code leverage its potential.
InnerParallelism::taskLoop allows parallelizing an inner loop within the NODES runtime as-is.
InnerParallelism::taskWait synchronizes the internally created tasks up to that point.
Synchronizing these tasks is entirely the developer's responsibility, because the implicit taskwait in programs does not wait for them to finish.
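The create-then-wait pattern can be illustrated with a single-threaded toy model. This section does not show the real signatures of InnerParallelism::taskLoop and InnerParallelism::taskWait, so the sketch::taskLoop and sketch::taskWait functions below are hypothetical stand-ins with assumed parameters. They only model the rule that work created by the loop is not guaranteed to be finished until an explicit wait.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the pattern only; the real InnerParallelism
// signatures in NODES may differ.
namespace sketch {

// Tasks that have been created but not yet finished.
inline std::vector<std::function<void()>> pending;

// Split [begin, end) into chunks and create one deferred task per chunk.
inline void taskLoop(std::size_t begin, std::size_t end, std::size_t chunk,
                     std::function<void(std::size_t)> body) {
    if (chunk == 0) chunk = 1;
    for (std::size_t lo = begin; lo < end; lo += chunk) {
        std::size_t hi = (end - lo < chunk) ? end : lo + chunk;
        pending.push_back([lo, hi, body] {
            for (std::size_t i = lo; i < hi; ++i) body(i);
        });
    }
}

// Finish every task created up to this point.
inline void taskWait() {
    std::vector<std::function<void()>> ready;
    ready.swap(pending);
    for (auto &task : ready) task();
}

} // namespace sketch

// Returns {sum observed before the wait, sum observed after the wait}.
std::pair<int, int> demo() {
    std::vector<int> data(8, 0);
    sketch::taskLoop(0, data.size(), 3, [&data](std::size_t i) { data[i] = 1; });

    int before = 0;
    for (int v : data) before += v; // in this model, no task has run yet

    sketch::taskWait(); // explicit synchronization by the developer

    int after = 0;
    for (int v : data) after += v; // every element has now been written
    return {before, after};
}
```

In this model the results only become visible after the explicit wait. With a real parallel runtime the state before the wait would simply be unspecified, which is why the wait cannot be left to the program's implicit taskwait.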