3. Language Directives

Warning

If you find any missing or outdated information regarding directives, clauses, or other elements, we would greatly appreciate your feedback. Please contact us at: star + at + bsc + . + es.

This chapter describes the OmpSs-2 language and all the elements needed to understand how an OmpSs-2 application executes and behaves on a parallel architecture. OmpSs-2 provides a simple path for users already familiar with the OpenMP programming model to write (or port) their programs to OmpSs-2.

This description is organized around the list of OmpSs-2 directives. Each of the following sections gives a short description of a directive, its specific syntax, and a list of its related clauses with their valid parameters and a short description of each. In addition, each section includes a simple example showing how to use the directive in an OmpSs-2 program.

As with OpenMP in C and C++, OmpSs-2 directives are specified using the #pragma mechanism provided by the base language. In Fortran, they are specified using special comments identified by a unique sentinel. The sentinel used in OmpSs-2 is oss. Compilers should ignore OmpSs-2 directives if support is disabled or not provided.

C/C++ format:

#pragma oss directive-name [clause[ [,] clause] ... ] new-line

Fortran format:

sentinel directive-name [clause[ [,] clause]...]

The sentinel depends on whether the Fortran source uses fixed or free form:

The sentinels for fixed form can be: !$oss, c$oss or *$oss. Sentinels must start in column 1. Continued directive lines must have a character other than a space or a zero in column 6.
The sentinel for free form must be !$oss. This sentinel can appear in any column as long as it is preceded only by spaces. Continued directive lines must have an ampersand (&).

3.1. Task construct

The programmer can specify that a block of code is a task using the task construct. This construct can appear inside any code block of the program and marks the statement that follows it as a task.

The syntax of the task construct is the following:

#pragma oss task [clauses]
structured-block

The valid clauses for the task construct are:

private(<list>)

firstprivate(<list>)

shared(<list>)

depend(<type>: <memory-reference-list>)

<depend-type>(<memory-reference-list>)

reduction(<operator>:<memory-reference-list>)

priority(<expression>)

cost(<expression>)

if(<scalar-expression>)

final(<scalar-expression>)

wait

onready(<statement>)

label(<string>)

create(<scalar-expression>)

The private, firstprivate and shared clauses allow specifying the data sharing attribute of the variables referenced in the construct. A description of these clauses can be found in the Data sharing attributes section.

The depend clause allows inferring additional task scheduling restrictions from the parameters it defines. These restrictions are known as dependences. The syntax of the depend clause includes a dependence type, followed by a colon and its associated list of items. The valid dependence types are defined in the Dependence model section of the previous chapter. In addition to this syntax, OmpSs-2 allows specifying this information using the dependence type as the name of the clause. Thus, the following code:

#pragma oss task depend(in: a,b,c) depend(out: d)

Is equivalent to:

#pragma oss task in(a,b,c) out(d)
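The following is a minimal sketch of how a dependence orders two tasks. It uses the hypothetical helpers produce and consume, written only for this illustration. The producer uses the depend syntax and the consumer uses the equivalent short form, so the consumer cannot start until the producer has written data. Because compilers without OmpSs-2 support ignore the pragmas, the function also runs correctly as sequential code.

```c
/* Hypothetical helpers used only for this sketch. */
static void produce(int *value) { *value = 21; }
static void consume(const int *value, int *result) { *result = *value * 2; }

int run_pipeline(void)
{
  int data = 0;
  int result = 0;

  // Producer: declares that it writes data (depend syntax)
  #pragma oss task depend(out: data)
  produce(&data);

  // Consumer: reads data and writes result (short form); it is ordered
  // after the producer because of the out/in dependence on data
  #pragma oss task in(data) out(result)
  consume(&data, &result);

  // Wait for both tasks before reading result
  #pragma oss taskwait
  return result;
}
```

Calling run_pipeline() from main returns 42 regardless of how the runtime schedules the two tasks.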

The reduction clause defines the task as a participant in a reduction operation. The first occurrence of a participating task marks the beginning of the reduction scope. The scope is implicitly ended by a taskwait or by a dependence over the memory-reference-item. Further information on task reductions in OmpSs-2 can be found in the following Master Thesis: https://upcommons.upc.edu/handle/2117/129246.
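As an illustration, the sketch below assumes an array split into blocks of a caller-chosen size. The function name sum_array and its block parameter are hypothetical. Each block is summed by a task that participates in a + reduction on sum, and the taskwait closes the reduction scope before the result is read.

```c
int sum_array(const int *v, int n, int block)
{
  int sum = 0;

  for (int start = 0; start < n; start += block) {
    int end = (start + block < n) ? start + block : n;

    // Each task participates in the + reduction on sum; the runtime
    // combines the per-task contributions
    #pragma oss task reduction(+: sum) firstprivate(start, end)
    for (int i = start; i < end; ++i)
      sum += v[i];
  }

  // The taskwait ends the reduction scope, so sum holds the total here
  #pragma oss taskwait
  return sum;
}
```

The block size only affects how many tasks are created. It does not affect the result.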

The priority clause indicates a priority hint for the task. Greater numbers indicate higher priority, and lower numbers indicate lower priority. By default, tasks have a priority of 0. The expression of the priority clause is evaluated as a signed integer. This way, strictly positive priorities indicate higher priority than the default, and negative priorities indicate lower than default priority.
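A short sketch of priority hints follows. The function and parameter names are hypothetical. One task is hinted above the default priority of 0 and another below it. Since priorities are only hints, the program must not depend on the execution order, and the final values are the same under any schedule.

```c
void schedule_with_priorities(int *urgent, int *background)
{
  // Hint: prefer running this task before default-priority (0) tasks
  #pragma oss task inout(*urgent) priority(10)
  *urgent += 1;

  // Hint: prefer running this task after higher-priority tasks
  #pragma oss task inout(*background) priority(-5)
  *background += 1;

  #pragma oss taskwait
}
```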

For the if clause, if its expression evaluates to true, the execution of the newly created task can be deferred. Otherwise, the current task must suspend its execution until the newly created task has completed its execution.
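One common use of the if clause is to skip deferral for work that is too small to benefit from it. This is a sketch under that assumption, and DEFER_THRESHOLD is a hypothetical cutoff rather than a recommended value. For small arrays the creating task suspends until the new task has finished. For large arrays the task may be deferred.

```c
// Hypothetical cutoff below which deferring the task is not worth it
#define DEFER_THRESHOLD 1024

void scale(float *v, int n, float factor)
{
  // if(false): the task is created, but the creating task suspends
  // until it completes. if(true): the task may be deferred.
  #pragma oss task if(n >= DEFER_THRESHOLD)
  for (int i = 0; i < n; ++i)
    v[i] *= factor;

  // Ensure the (possibly deferred) task has completed before returning
  #pragma oss taskwait
}
```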

For the final clause, if its expression evaluates to true, the newly created task will be a final task, and all the task-generating code encountered when executing its dynamic extent will also generate final tasks. In addition, when executing within a final task, all encountered task-generating code executes its tasks immediately after creating them, as if they were simple routine calls. Finally, tasks created within a final task can use the data environment of their parent task.
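The recursive Fibonacci sketch below shows the typical use of final as a granularity cutoff. FINAL_CUTOFF is a hypothetical value chosen only for illustration. Once a subproblem is small enough, its task becomes final, so the whole subtree below it runs as plain function calls instead of creating more tasks. The out clauses make a and b shared, so the assignments inside the tasks are visible after the taskwait.

```c
// Hypothetical cutoff: subproblems of this size or smaller become final tasks
#define FINAL_CUTOFF 15

long fib(int n)
{
  if (n < 2)
    return n;

  long a = 0;
  long b = 0;

  // When n <= FINAL_CUTOFF these tasks are final, and every task
  // created inside them runs immediately, like a regular function call
  #pragma oss task out(a) final(n <= FINAL_CUTOFF)
  a = fib(n - 1);

  #pragma oss task out(b) final(n <= FINAL_CUTOFF)
  b = fib(n - 2);

  // Wait for both partial results before combining them
  #pragma oss taskwait
  return a + b;
}
```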

Tasks including the wait clause perform a taskwait-like operation immediately after exiting from their code. Since this operation is performed outside the scope of the task's code, it happens once the task has abandoned its stack. For this reason, the wait clause is restricted to tasks that, upon exiting, do not have any subtask accessing their local variables. Otherwise, a regular taskwait must be used instead.
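The following sketch shows a task that may use the wait clause. The names fill_pair, first, second and sum are hypothetical. Its subtasks write only to memory owned by the caller, through pointer values copied at creation (firstprivate), and never to locals of the parent task. The parent therefore does not complete until both subtasks have finished, and the consumer task then reads both results.

```c
void fill_pair(int *first, int *second, int *sum)
{
  #pragma oss task wait out(*first, *second)
  {
    // Subtasks copy the pointer values and write caller-owned memory,
    // so they never touch the parent task's stack after it exits
    #pragma oss task out(*first) firstprivate(first)
    *first = 1;

    #pragma oss task out(*second) firstprivate(second)
    *second = 2;
  }

  // Reads the values produced by the subtasks above
  #pragma oss task in(*first, *second) out(*sum)
  *sum = *first + *second;

  #pragma oss taskwait
}
```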

The onready clause allows defining an action in the form of a statement (e.g., a call to a function) that will be executed once the task becomes ready. This is explained in more detail in section Task Onready clause.

The label clause defines a string literal that can be used by any performance or debugger tool to identify the task with a more human-readable format. The string literal must be wrapped in double quotes. For instance, a task that initializes an array could be labeled as label("init array").
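A minimal sketch of the init array example mentioned above follows. The function name init_array is hypothetical. The label does not change the task's semantics. It only gives tools a readable name for the task.

```c
void init_array(double *v, int n)
{
  // Tools can display this task as "init array"
  #pragma oss task label("init array")
  for (int i = 0; i < n; ++i)
    v[i] = 0.0;

  #pragma oss taskwait
}
```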

The cost clause serves as a hint on the computational weight of the task. It is an optional clause used only for research purposes; by default, using it has no effect on the program or its execution.

The following C code shows an example of creating tasks using the task construct:

void do_computation(float value);

float x = 0.0;
float y = 0.0;
float z = 0.0;

int main() {
  #pragma oss task
  do_computation(x);

  #pragma oss task
  {
    do_computation(y);
    do_computation(z);
  }
  #pragma oss taskwait

  return 0;
}

Each time the control flow reaches a #pragma oss task construct, a new task instance is created. Since there is no synchronization between the two tasks, their execution order is not guaranteed. Moreover, upon reaching the #pragma oss taskwait construct, the previously created tasks may not have been executed yet by the OmpSs-2 run-time system. Once the taskwait returns, possibly after blocking for a while, both tasks are guaranteed to have deeply completed.

The task construct is extended to allow the annotation of function declarations or definitions in addition to structured-blocks. When a function is annotated with the task construct, each invocation of that function becomes a task creation point. The following C code is an example of how task functions are used:

extern void do_computation(float a);

#pragma oss task
extern void do_computation_task(float a);

float x = 0.0;
int main() {
  do_computation_task(x); //this will create a task
  do_computation(x); //regular function call

  #pragma oss taskwait
}

The invocation of do_computation_task inside the main function creates an instance of a task. Note that OmpSs-2 does not guarantee that the task has already been executed after returning from the regular function call do_computation(x).

Note that only the execution of the function itself is part of the task, not the evaluation of the task arguments. Another restriction is that the task is not allowed to have a return value; that is, the return type must be void.
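Because a task function must return void, one way to obtain a result is to write it through an output pointer covered by an out dependence. This is a sketch under that assumption. The names square_task and sum_of_squares are hypothetical.

```c
// Task function: returns void and writes its result through *result
#pragma oss task out(*result)
void square_task(int x, int *result)
{
  *result = x * x;
}

int sum_of_squares(int a, int b)
{
  int sa = 0;
  int sb = 0;

  // Each call is a task creation point; the arguments are evaluated
  // by the caller when the task is created, not inside the task
  square_task(a, &sa);
  square_task(b, &sb);

  // Wait for both tasks before reading their results
  #pragma oss taskwait
  return sa + sb;
}
```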

Warning

The for clause from the task directive is no longer part of OmpSs-2.

3.1.1. Task Onready clause

The onready clause allows defining an action in the form of a statement (e.g., a call to a function) that will be executed once the task becomes ready. The run-time system will execute the statement only once, at any moment after the task satisfies all its data dependencies and before the task runs its body. The onready action cannot assume that it is running within a task context; it should not reach any task scheduling point. Moreover, the action should be lightweight and should not perform blocking operations.

The onready action can register external events to the ready task to delay its execution until all the events are fulfilled. As an example, the callback could execute an asynchronous TAMPI operation, such as TAMPI_Iwait. Such a call would delay the task’s execution until the corresponding MPI communications are completed. In that way, the data dependencies allow tasks to define local dependencies with other tasks on the same process, whereas the onready clause allows defining remote dependencies with other processes. See Task external events for more information about the management of task external events.

The data sharing and dependency rules for the variables used in the onready action are the same as those that apply to the task body.

The following example shows a task that safely increments a variable twice, once in the onready action and once in the task body:

#include <stdio.h>

void function(int *a)
{
  // This is the first increment because it runs as the onready action
  *a += 1;
}

int main()
{
  int a = 0;

  #pragma oss task inout(a) onready(function(&a))
  {
    // At this point, the onready action has already been executed
    ++a;
  }

  #pragma oss taskwait

  fprintf(stdout, "a: %d\n", a); // Should print "a: 2"
  return 0;
}

The following example asynchronously receives (through TAMPI services) and processes remote data using a single task with an onready action:

#include <mpi.h>
#include <TAMPI.h>
#include <stdio.h>

void process(int value);

// rank: the rank of the process that sends the data
void function(int *a, int rank)
{
  MPI_Request request;
  MPI_Irecv(a, 1, MPI_INT, rank, 0, MPI_COMM_WORLD, &request);

  // Asynchronously delay the execution of the task until the communication
  // has completed. This service will register an external event if the
  // receive has not completed immediately
  TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
}

int main()
{
  // Initialize MPI...

  int a = 0;
  int rank = ...;

  #pragma oss task inout(a) onready(function(&a, rank))
  {
    fprintf(stdout, "received data: %d\n", a);
    process(a);
  }

  #pragma oss taskwait

  // Finalize MPI...
  return 0;
}

Note that this is just an example of how onready can be used to delay the execution of a task by registering external events. Performing heavy operations like MPI_Irecv in an onready action is not recommended because the onready action does not run in the context of a task.

3.1.2. Task Create clause (experimental)

Important

The create clause is experimental and may change or be removed in the future without any notice.

The optional create clause can inhibit the creation of a task and just execute the body directly. It is typically used to override the behavior of the final clause for a specific task.

The clause expects a conditional argument, create(cond), which is evaluated when the task is about to be created. If the condition evaluates to true, the task is created, even if it is final. Otherwise, if it evaluates to false, the task is never created and its body is executed directly.

The following create clause:

#pragma oss task create(cond)
do_work(size);

Is equivalent to this code:

if (cond) {
        #pragma oss task create(true)
        do_work(size);
} else {
        do_work(size);
}

An example of the create clause to override the final effect for some specific tasks is depicted in the following diagram:

The tasks in grey won’t be created due to the final clause, but the ones in green, which use the create(true) clause, will always be created.

A task without the create clause will follow the normal creation process, following the rules imposed by the final clause if used.

The condition used in the create clause must not have side effects, otherwise the behavior is undefined. For example, create(size++ > 20) may or may not increase the value of the variable size.
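A minimal sketch of how to keep the create condition free of side effects: perform the side effect first, then pass a plain value to the clause. The names run_work and do_work are hypothetical (this do_work differs from the one above). When the task is not created, the body runs directly and its inout dependence has no effect, as explained in the next paragraph.

```c
/* Hypothetical work function used only for this sketch. */
static void do_work(int size, int *out) { *out += size; }

void run_work(int size, int *out)
{
  // Unsafe: create(size++ > 20) may or may not increment size.
  // Safe: apply the side effect first, then use a side-effect-free condition.
  int create_task = (size++ > 20);

  // If create_task is false, no task is created and do_work runs directly
  #pragma oss task create(create_task) inout(*out)
  do_work(size, out);

  #pragma oss taskwait
}
```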

It is important to note that when the task is not created and the body runs as-is, the dependencies or data sharing clauses of that task won’t have any effect (as there won’t be any task). It is the programmer’s responsibility to ensure that the program is still correct.

3.2. Taskloop construct

Programmers may specify that a for loop be taskified with the taskloop construct. The taskloop construct specifies that the iteration space of its subsequent for loop should be split into blocks. The various workers of the runtime system will, to the best of their ability, coordinate to concurrently execute these blocks. Programmers, as with the task construct, can leverage several clauses to separate, coordinate, and synchronize the iterations of the loop.

The syntax of the taskloop construct is as follows:

#pragma oss taskloop [clauses]
for-loop-block

Listed next are the clauses that are valid (and equivalent) for both the task and taskloop constructs:

private(<list>)

firstprivate(<list>)

shared(<list>)

depend(<type>: <memory-reference-list>)

<depend-type>(<memory-reference-list>)

reduction(<operator>:<memory-reference-list>)

cost(<expression>)

Furthermore, the taskloop construct uniquely accepts the following clauses:

grainsize(<expression>)
collapse(<expression>)

For a detailed description of the private, firstprivate, shared, depend, <depend-type>, reduction, and cost clauses, check section Task construct, as the features, requirements, and restrictions are equivalent for both constructs.

The grainsize clause allows choosing the granularity of the blocks of iterations into which the iteration space of the loop is split. The expression in this clause must evaluate to a strictly positive integer value. If this clause is not used, the default granularity of a taskloop depends on the runtime implementation. For instance, in the case of NODES, the default granularity divides the iteration space among the maximum number of CPUs of the system. The following example shows how the iteration space of a loop could be divided into blocks depending on the number of CPUs in a system:

#define MAX_CPUS 10
int x[100];

void do_computation(int value);

int main() {

   int iters_per_cpu = 100 / MAX_CPUS;

   // The following taskloop splits the iteration space into
   // tasks that each process "iters_per_cpu" iterations
   #pragma oss taskloop grainsize(iters_per_cpu)
   for (int i = 0; i < 100; ++i) {
      do_computation(x[i]);
   }
}

The collapse clause allows collapsing two or more nested loops into a single combined loop. The expression in this clause specifies how many nested loops are collapsed, and it must evaluate to a strictly positive integer value. The following example shows how a taskloop construct may collapse two nested loops with a reduction to generate a single combined loop:

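#include <stdio.h>

int x[100];

int main() {

   int sum = 0;

   // Both for loops will be collapsed into a single one
   #pragma oss taskloop collapse(2) reduction(+ : sum)
   for (int i = 0; i < 10; ++i) {
      for (int j = 0; j < 10; ++j) {
         sum += x[(i*10)+j];
      }
   }

   // The taskwait ends the reduction scope, so sum holds the final result
   #pragma oss taskwait
   printf("sum = %d\n", sum);
   return 0;
}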

3.3. Taskwait construct

Apart from implicit synchronization (task data dependences), OmpSs-2 also offers a mechanism that allows users to synchronize task execution explicitly. The taskwait construct is a standalone directive (with no associated code block) that specifies a wait on the deep completion of all descendant tasks, including non-direct descendants.

The syntax of the taskwait construct is the following:

#pragma oss taskwait [clauses]

The valid clauses for the taskwait construct are the following:

on(list-of-variables): Waits only for the subset of descendant tasks that declared a dependence on any of the variables in the list.

The on clause allows waiting only for the tasks that produce some data, in the same way as an inout dependence would. It suspends the current task until all previous tasks with any dependence on the expression have completed. The following example illustrates its use:

#include <stdio.h>

int compute1(void);
int compute2(void);

int main()
{
  int result1, result2;

  #pragma oss task out(result1)
  result1 = compute1();

  #pragma oss task out(result2)
  result2 = compute2();

  #pragma oss taskwait on(result1)
  printf("result1 = %d\n", result1);

  #pragma oss taskwait on(result2)
  printf("result2 = %d\n", result2);

  return 0;
}

3.4. Release directive

The release directive asserts that a task will no longer perform accesses that conflict with the contents of the associated depend clause. The dependences listed in the release directive must be a subset of those in the depend clause of the task, and the released data will no longer be referenced for the remaining lifetime of the task and its future subtasks. The release directive does not have an associated structured block.

The syntax of the release directive is as follows:

#pragma oss release [clauses]

The valid clauses for the release directive are:

depend(<type>: <memory-reference-list>)

<depend-type>(<memory-reference-list>)

The following C code shows an example of partial release of task dependences using the release directive:

#define SIZE 4096

float x[SIZE];
float y[SIZE];

int main() {

  #pragma oss task depend(out:x,y)
  {
     for (int i=0; i<SIZE; i++) x[i] = 0.0;
     #pragma oss release depend(out:x)

     for (int i=0; i<SIZE; i++) y[i] = 0.0;
  }
}

Warning

At this moment, the run-time system only supports releasing a dependency of the same dependency type that was specified at the task depend clause.

3.5. Atomic construct

Warning

The following information applies to clang. Mercurium does support atomic, but without clauses.

The atomic construct ensures that a specific storage location is accessed atomically, rather than exposing it to the possibility of multiple, simultaneous reading and writing threads that may result in indeterminate values:

#pragma oss atomic
statement

The construct allows the clauses read, write, update, and capture to define the semantics for which the directive enforces atomicity. If no clause is present, the behavior is as if the update clause were specified.

read results in an atomic read of the location designated by x. The statement has the following form:
```
v = x;
```
write results in an atomic write of the location designated by x. The statement has the following form:
```
x = expr;
```
update results in an atomic update of the location designated by x using the designated operator or intrinsic. Only the read and write of the location designated by x are performed mutually atomically. The statement has one of the following forms:
```
x++;
x--;
++x;
--x;
x binop= expr;
x = x binop expr;
x = expr binop x;
```
capture results in an atomic captured update: an atomic update of the location designated by x using the designated operator or intrinsic that also captures the original or final value of the location designated by x with respect to the atomic update. The captured value is written to the location designated by v. Only the read and write of the location designated by x are performed mutually atomically. The statement has one of the following forms:
```
v = expr-stmt
{ v = x; expr-stmt }
{ expr-stmt v = x; }
```
where expr-stmt is either an atomic write or update statement.

The atomic construct also allows memory order clauses: relaxed, acquire, release, acq_rel, seq_cst. If no memory order clause is specified, the default memory ordering is relaxed.
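The sketch below shows the clause-less form, which behaves as update, protecting a counter shared by several tasks. It also shows the capture form "v = x++". The function names count_positive and next_ticket are hypothetical. Per the warning above, the capture clause assumes the clang-based compiler, since Mercurium supports atomic only without clauses.

```c
int count_positive(const int *v, int n, int block)
{
  int count = 0;

  for (int start = 0; start < n; start += block) {
    int end = (start + block < n) ? start + block : n;

    // Several tasks update the shared count concurrently
    #pragma oss task shared(count) firstprivate(start, end)
    for (int i = start; i < end; ++i) {
      if (v[i] > 0) {
        // No clause: behaves as update (x++ form)
        #pragma oss atomic
        count++;
      }
    }
  }

  #pragma oss taskwait
  return count;
}

int next_ticket(int *counter)
{
  int ticket;

  // Capture form "v = x++": atomically read the old value and increment
  #pragma oss atomic capture
  ticket = (*counter)++;

  return ticket;
}
```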

3.6. Critical construct

The critical construct allows programmers to specify regions of code that will be executed in mutual exclusion. The associated region is executed by a single thread at a time; other threads wait at the beginning of the critical section until no thread is executing it.

The syntax of the critical construct is the following:

#pragma oss critical
structured-block

The syntax also allows named criticals with the following syntax:

#pragma oss critical(<name>)
structured-block

Named criticals prevent concurrency between threads with respect to all critical regions with the same name. Unnamed criticals prevent concurrency between threads with respect to all unnamed critical regions.

The critical construct has no related clauses. The beginning and ending of a critical section may be task scheduling points.
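As a sketch, consider a histogram built by several tasks. The names histogram, NBINS and hist_merge are hypothetical, and the values are assumed to be non-negative. Each task first counts into a private array without synchronization. It then merges into the shared bins inside a named critical section, so only one thread updates the bins at a time.

```c
#define NBINS 4  /* hypothetical number of bins */

// Adds the count of each value v[i] % NBINS (v[i] >= 0 assumed) into bins
void histogram(const int *v, int n, int block, int *bins)
{
  for (int start = 0; start < n; start += block) {
    int end = (start + block < n) ? start + block : n;

    #pragma oss task firstprivate(start, end)
    {
      // Private histogram: no synchronization needed here
      int local[NBINS] = {0};
      for (int i = start; i < end; ++i)
        local[v[i] % NBINS]++;

      // Merge into the shared bins in mutual exclusion
      #pragma oss critical(hist_merge)
      {
        for (int b = 0; b < NBINS; ++b)
          bins[b] += local[b];
      }
    }
  }

  #pragma oss taskwait
}
```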

3.7. Assert directive

The assert directive is a declarative directive that checks whether any runtime configuration option is enabled or has a specific value. The directive expects a single string with comma-separated conditions. Each condition is composed of the option name, a comparison operator (==, !=, >, >=, < or <=), and the value to compare against.

All conditions are checked before the program starts, and if any condition fails, the program aborts with an error showing the incorrect option. The configuration options are runtime-specific, so each runtime system may define its own valid configuration options. Since this directive is declarative, it can be written in any of the program's source files. However, the directive must appear at file scope, as shown in the examples below.

The syntax of the assert directive is:

#pragma oss assert("option1==40,option2<10,option3!=stringvalue")

int main() {
  // ...
}

Check the OmpSs-2 User Guide to see the runtime configuration options that can be asserted. For instance, we can assert that a program is running using the regions dependency system and that the thread stack size is greater than 4MB with the following code:

#pragma oss assert("version.dependencies==regions,misc.stack_size>4M")

int main() {
  // ...
}

3.8. Restrictions

The point of exit of a structured block cannot be a branch out of it. The following C code shows an example where any of the exit calls may lead to undefined behavior:

int main() {

  for (int i = 0; i < 10; i++) {
    if (i > 5) exit(1);

    #pragma oss task
    {
      // computation
      exit(1);
    }
  }
  #pragma oss taskwait
}
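One way to avoid branching out of a task is to record the failure and leave only after the taskwait. This sketch assumes the program only needs a failure status. The helpers compute and run_all are hypothetical, and exit can then be called by the caller, outside any task.

```c
// Hypothetical per-iteration computation: returns nonzero on failure
static int compute(int i) { return (i > 5) ? 1 : 0; }

int run_all(int n)
{
  int failed = 0;

  for (int i = 0; i < n; i++) {
    #pragma oss task shared(failed) firstprivate(i)
    {
      // Record the failure instead of calling exit() inside the task
      if (compute(i) != 0) {
        #pragma oss atomic
        failed |= 1;
      }
    }
  }

  // All tasks have finished here, so returning is safe
  #pragma oss taskwait
  return failed;  // the caller may call exit(failed) outside any task
}
```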