Advanced Mechanisms
NODES also supports several advanced mechanisms and prototypes. In this section, we briefly introduce them.
Coroutine Support
Coroutine support is available when the program is built with a compiler that supports C++20. To enable it, pass the -fcoroutines flag, as shown in the example below:
clang++ -std=c++20 -fcoroutines -o test-coroutines.bin test-coroutines.cpp
For more detailed examples on the usage of Coroutines, check our correctness tests in the tests/correctness/coroutine subdirectory, and the original published paper on coroutines:
Arnau Cinca, Aleix Roca, Kevin Sala, Raúl Peñacoba, David Álvarez, Vicenç Beltran. Enhancing OmpSs-2 Suspendable Tasks by Combining Operating System and User-Level Threads with C++ Coroutines. In 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Milano, Italy, 2025, pp. 67–80. https://doi.org/10.1109/IPDPS64566.2025.00015
Taskiter Construct
For an iterative application whose loops generate the same task directed acyclic graph (DAG) at every iteration, the taskiter construct optimizes the execution of those loops as follows:
The first iteration of the loop is executed, and its generated DAG is recorded.
The recorded DAG is transformed to generate a directed cyclic task graph (DCTG).
The generated DCTG is automatically optimized and utilized to execute the remaining iterations.
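The three steps above can be pictured with a small, purely conceptual model in standard C++. This is a sketch under stated assumptions, not the NODES implementation: the names RecordedTask, recordFirstIteration, buildCyclicGraph, replay, and runTaskiterModel are hypothetical, each task is assumed to access a single array element with inout semantics, and the replay is sequential, whereas a real runtime would schedule ready tasks concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of one recorded task.
struct RecordedTask {
    std::size_t element;         // array element accessed with inout semantics
    std::function<void()> body;  // work performed by the task
};

using Edge = std::pair<std::size_t, std::size_t>;  // (predecessor, successor)

// Step 1: execute the first iteration and record every task it creates.
std::vector<RecordedTask> recordFirstIteration(std::vector<int> &A) {
    std::vector<RecordedTask> tasks;
    for (std::size_t j = 0; j < A.size(); ++j) {
        for (int k = 0; k < 2; ++k) {  // two tasks per element
            RecordedTask t{j, [&A, j] { A[j]++; }};
            t.body();
            tasks.push_back(std::move(t));
        }
    }
    return tasks;
}

// Step 2: derive the DAG edges of one iteration, then add one back edge per
// element (last task -> first task). The back edges make the graph cyclic.
std::vector<Edge> buildCyclicGraph(const std::vector<RecordedTask> &tasks,
                                   std::size_t numElements) {
    const std::size_t none = tasks.size();  // sentinel: no task seen yet
    std::vector<std::size_t> first(numElements, none), last(numElements, none);
    std::vector<Edge> edges;
    for (std::size_t id = 0; id < tasks.size(); ++id) {
        const std::size_t e = tasks[id].element;
        assert(e < numElements);
        if (last[e] != none)
            edges.push_back({last[e], id});  // dependency within the iteration
        else
            first[e] = id;
        last[e] = id;
    }
    for (std::size_t e = 0; e < numElements; ++e)
        if (last[e] != none)
            edges.push_back({last[e], first[e]});  // dependency across iterations
    return edges;
}

// Step 3: run the remaining iterations from the recording, without creating
// new tasks. Recording order satisfies every edge built above.
void replay(const std::vector<RecordedTask> &tasks, int iterations) {
    for (int it = 0; it < iterations; ++it)
        for (const RecordedTask &t : tasks)
            t.body();
}

// Driver: record once, build the cyclic graph, replay numIters - 1 times.
std::vector<int> runTaskiterModel(std::size_t tilesize, int numIters) {
    assert(numIters >= 1);
    std::vector<int> A(tilesize, 0);
    const std::vector<RecordedTask> tasks = recordFirstIteration(A);
    const std::vector<Edge> edges = buildCyclicGraph(tasks, tilesize);
    (void) edges;  // a concurrent scheduler would consume these; the model does not
    replay(tasks, numIters - 1);
    return A;
}
```

Calling runTaskiterModel(4, 10) returns a vector of four elements, each incremented twice per iteration. In this model, the saving comes from step 3: task creation and dependency discovery happen only once, and later iterations reuse the recorded graph.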
The following code, extracted from tests/correctness/taskiter/taskiter-unroll.cpp, shows an example usage of the taskiter construct:
#pragma oss taskiter shared(A) unroll(UNROLL)
for (int i = 0; i < NUM_ITERS; ++i) {
    for (int j = 0; j < tilesize; ++j) {
        #pragma oss task shared(A) firstprivate(j) inout(A[j])
        {
            A[j]++;
        }
        #pragma oss task shared(A) firstprivate(j) inout(A[j])
        {
            A[j]++;
        }
    }
}
#pragma oss taskwait
More information about the taskiter construct and its benefits can be found in its original paper.
Inner Parallelism
NODES offers a mechanism to parallelize inner runtime code using OmpSs-2 without having to pass through the compiler. Below we list the main features of this mechanism:
It is enabled through the runtime.enable_inner_parallelism configuration variable.
At the moment, only specific parts of the taskiter code leverage its potential.
InnerParallelism::taskLoop parallelizes an inner loop within the NODES runtime as-is.
InnerParallelism::taskWait synchronizes the internally created tasks up to that point.
Task synchronization falls entirely on the developer, as the implicit taskwait in programs does not enforce the completion of these tasks.