Taskloop construct

Programmers may specify that a for loop be taskified with the taskloop construct. The taskloop construct specifies that the iteration space of the subsequent for loop should be split into blocks. The workers of the runtime system then coordinate, to the best of their ability, to execute these blocks concurrently. As with the task construct, programmers can use several clauses to separate, coordinate, and synchronize the iterations of the loop.

The syntax of the taskloop construct is as follows:

#pragma oss taskloop [clauses]
for-loop-block

Listed next are the clauses that are valid (and equivalent) for both the task and taskloop constructs:

  • private(<list>)

  • firstprivate(<list>)

  • shared(<list>)

  • depend(<type>: <memory-reference-list>)

  • <depend-type>(<memory-reference-list>)

  • reduction(<operator>:<memory-reference-list>)

  • cost(<expression>)

Furthermore, the taskloop construct uniquely accepts the following clauses:

  • grainsize(<expression>)

  • collapse(<expression>)

For a detailed description of the private, firstprivate, shared, depend, <depend-type>, reduction, and cost clauses, see the Task construct section, as the features, requirements, and restrictions are equivalent for both constructs.

The grainsize clause sets the granularity of the blocks of iterations into which the iteration space of the loop is split. The expression in this clause must evaluate to a strictly positive integer value. If this clause is omitted, the default granularity of a taskloop depends on the implementation of the runtime. For instance, in the case of NODES, the default granularity divides the iteration space among the maximum number of CPUs of the system. The following example shows how the iteration space of a loop could be divided into blocks depending on the number of CPUs in a system:

#define MAX_CPUS 10
int x[100];

// Placeholder for the per-element work; assumed to be defined elsewhere
void do_computation(int value);

int main() {

   int iters_per_cpu = 100 / MAX_CPUS;

   // The following taskloop will split the iteration space by
   // generating tasks, each of which will process "iters_per_cpu" iterations
   #pragma oss taskloop grainsize(iters_per_cpu)
   for (int i = 0; i < 100; ++i) {
      do_computation(x[i]);
   }
}

The collapse clause allows collapsing two or more nested loops into a single, simplified loop. The expression in this clause must specify how many nested loop levels are collapsed, and it must evaluate to a strictly positive integer value. The following example shows how a taskloop construct may collapse two nested loops with a reduction to generate a single combined loop:

int x[100];

int main() {

   int sum = 0;

   // Both for loops will be collapsed into a single one
   #pragma oss taskloop collapse(2) reduction(+ : sum)
   for (int i = 0; i < 10; ++i) {
      for (int j = 0; j < 10; ++j) {
         sum += x[(i*10)+j];
      }
   }
}