3. Language Directives

This chapter describes the OmpSs-2 language, that is, all the elements needed to understand how an OmpSs-2 application executes and behaves on a parallel architecture. OmpSs-2 provides a simple path for users already familiar with the OpenMP programming model to write (or port) their programs to OmpSs-2.

This description is organized around the list of OmpSs-2 directives. Each of the following sections gives a short description of the directive, its syntax, and its list of clauses (including the valid parameters for each clause and a short description of them). In addition, each section ends with a simple example showing how the directive can be used in a valid OmpSs-2 program.

As in OpenMP, OmpSs-2 directives in C and C++ are specified using the #pragma mechanism (provided by the base language), and in Fortran they are specified using special comments that are identified by a unique sentinel. The sentinel used in OmpSs-2 is oss. Compilers will typically ignore OmpSs-2 directives if support is disabled or not provided.

C/C++ format:

#pragma oss directive-name [clause[ [,] clause] ... ] new-line

Fortran format:

sentinel directive-name [clause[ [,] clause]...]

The details depend on whether the source uses Fortran fixed or free form:

  • The sentinels for fixed form can be: !$oss, c$oss or *$oss. Sentinels must start in column 1. A continued directive line must have a character other than a space or a zero in column 6.

  • The sentinel for free form must be !$oss. This sentinel can appear in any column as long as it is preceded only by spaces. A continued directive line must have an ampersand (&).

3.1. Task construct

The programmer can specify a task using the task construct. This construct can appear inside any code block of the program and marks the statement that follows it as a task.

The syntax of the task construct is the following:

#pragma oss task [clauses]
structured-block

The valid clauses for the task construct are:

  • private(<list>)

  • firstprivate(<list>)

  • shared(<list>)

  • depend(<type>: <memory-reference-list>)

  • <depend-type>(<memory-reference-list>)

  • reduction(<operator>:<memory-reference-list>)

  • priority(<expression>)

  • cost(<expression>)

  • if(<scalar-expression>)

  • final(<scalar-expression>)

  • wait

  • onready(<statement>)

  • label(<string>)

  • create(<scalar-expression>)

The private, firstprivate and shared clauses specify the data sharing attribute of the variables referenced in the construct. A description of these clauses can be found in the Data sharing attributes section.
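As a minimal sketch of how the three clauses interact, the following C function creates one task that combines them. The function sharing_demo and its variables are hypothetical names chosen for illustration; they are not part of OmpSs-2.

```c
/* Hypothetical helper illustrating the data sharing clauses. */
int sharing_demo(void)
{
  int seed = 1;    /* firstprivate: copied into the task at creation */
  int result = 0;  /* shared: the task writes the original variable */
  int scratch;     /* private: the task gets its own uninitialized copy */

  #pragma oss task firstprivate(seed) private(scratch) shared(result)
  {
    scratch = seed * 10;  /* uses the value captured at creation (1) */
    result = scratch;     /* visible to the parent after the taskwait */
  }

  seed = 2;  /* does not affect the copy captured by the task */

  #pragma oss taskwait
  return result;
}
```

Because seed is firstprivate, the later assignment in the parent does not change what the task computes, so the function returns 10 regardless of when the task runs.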

The depend clause allows the run-time system to infer additional task scheduling restrictions from the parameters it defines. These restrictions are known as dependences. The syntax of the depend clause includes a dependence type, followed by a colon and its associated list items. The valid dependence types are defined in the section Dependence model in the previous chapter. In addition to this syntax, OmpSs-2 allows specifying this information using the dependence type as the name of the clause. Thus, the following code:

#pragma oss task depend(in: a,b,c) depend(out: d)

Is equivalent to this one:

#pragma oss task in(a,b,c) out(d)
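To make the two spellings more concrete, here is a small sketch of a three-stage chain that mixes both forms. The function pipeline_demo and its variables are hypothetical; the sketch assumes that variables named in dependence clauses are shared with the creating task, as in the other examples of this chapter.

```c
/* Hypothetical pipeline: produce a, derive b from a, accumulate into c. */
int pipeline_demo(void)
{
  int a = 0, b = 0, c = 0;

  #pragma oss task out(a)         /* same meaning as depend(out: a) */
  a = 3;

  #pragma oss task in(a) out(b)   /* ordered after the task that writes a */
  b = a * 2;

  #pragma oss task depend(in: a, b) depend(inout: c)
  c += a + b;                     /* ordered after both previous tasks */

  #pragma oss taskwait
  return c;
}
```

The dependences force the three tasks to run in program order, so the function returns 9 even though no explicit synchronization appears between them.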

The reduction clause defines the task as a participant in a reduction operation. The first occurrence of a participating task marks the beginning of the reduction scope. The scope is implicitly ended by a taskwait or a dependence over the memory-reference-item. More information about task reductions in OmpSs-2 can be found in the following Master Thesis: https://upcommons.upc.edu/handle/2117/129246.
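The following sketch shows one way a task reduction could be written. The function sum_with_tasks is a hypothetical name, and creating one task per element is only for illustration; a real program would normally give each task a larger chunk of work.

```c
#include <stddef.h>

/* Hypothetical helper: adds up n integers using one task per element. */
long sum_with_tasks(const int *values, size_t n)
{
  long sum = 0;

  for (size_t i = 0; i < n; i++) {
    /* Every task created here participates in the same reduction. */
    #pragma oss task reduction(+: sum) firstprivate(i)
    sum += values[i];
  }

  /* The taskwait ends the reduction scope; sum is complete afterwards. */
  #pragma oss taskwait
  return sum;
}
```

Reading sum before the taskwait (or before a task that declares a dependence on it) would not be safe, since the reduction scope is still open at that point.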

The priority clause indicates a priority hint for the task. Greater numbers indicate higher priority, and lower numbers indicate lower priority. By default, tasks have priority 0. The expression of the priority is evaluated as a signed integer. This way, strictly positive priorities indicate higher priority than the default, and negative priorities indicate lower than default priority.
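As an illustration, the sketch below assigns a different priority to each task in a loop. All names (NUM_BLOCKS, processed, process_with_priorities) are hypothetical. Since the priority is only a hint, the code is written so that its result does not depend on the order in which the tasks actually run.

```c
#define NUM_BLOCKS 4

static int processed[NUM_BLOCKS];

/* Hypothetical helper: one task per block, later blocks hinted as more urgent. */
int process_with_priorities(void)
{
  for (int i = 0; i < NUM_BLOCKS; i++) {
    /* i - 1 yields -1 (below default), 0 (default), 1 and 2 (above default). */
    #pragma oss task priority(i - 1) firstprivate(i)
    processed[i] = i + 1;
  }

  #pragma oss taskwait

  int total = 0;
  for (int i = 0; i < NUM_BLOCKS; i++) total += processed[i];
  return total;
}
```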

If the expression of the if clause evaluates to true, the execution of the newly created task can be deferred; otherwise, the current task must suspend its execution until the newly created task has completed its execution.

If the expression of the final clause evaluates to true, the newly created task will be a final task and all the task-generating code encountered when executing its dynamic extent will also generate final tasks. In addition, when executing within a final task, all the encountered task-generating code will execute these tasks immediately after their creation, as if they were simple routine calls. Finally, tasks created within a final task can use the data environment of their parent task.
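A common use of the if and final clauses is to limit task creation overhead in recursive code. The sketch below is a hypothetical Fibonacci function; FINAL_CUTOFF is an illustrative threshold, not a recommended value.

```c
/* Hypothetical threshold below which deferred tasks are not worthwhile. */
#define FINAL_CUTOFF 10

long fib(int n)
{
  if (n < 2) return n;

  long a = 0, b = 0;

  /* Below the cutoff the task is final: every task created in its
     dynamic extent runs immediately, like a plain routine call. */
  #pragma oss task shared(a) firstprivate(n) final(n < FINAL_CUTOFF)
  a = fib(n - 1);

  /* When the if expression is false, the parent waits for this task
     to complete before continuing. */
  #pragma oss task shared(b) firstprivate(n) if(n >= FINAL_CUTOFF)
  b = fib(n - 2);

  #pragma oss taskwait
  return a + b;
}
```

Both clauses only change how the tasks are executed, not what they compute, so the function returns the same value for any cutoff.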

Tasks with the wait clause will perform a taskwait-like operation immediately after exiting from the task code. Since it is performed outside the scope of the code of the task, this happens once the task has abandoned the stack. For this reason, its use is restricted to tasks that upon exiting do not have any subtask accessing its local variables. Otherwise, the regular taskwait shall be used instead.
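The restriction on local variables can be illustrated with the following sketch, in which the subtasks only access a global array. The names N, grid and fill_grid are hypothetical.

```c
#define N 8

static int grid[N];  /* global, so it outlives the stack frame of the task */

/* Hypothetical helper: fills the array from two subtasks. */
int fill_grid(void)
{
  #pragma oss task wait
  {
    /* The subtasks touch only the global array, never locals of this task. */
    #pragma oss task
    for (int i = 0; i < N / 2; i++) grid[i] = 1;

    #pragma oss task
    for (int i = N / 2; i < N; i++) grid[i] = 2;

    /* No explicit taskwait here: the wait clause waits for both
       subtasks after this body has returned. */
  }

  #pragma oss taskwait

  int total = 0;
  for (int i = 0; i < N; i++) total += grid[i];
  return total;
}
```

If the subtasks needed a variable declared inside the outer task body, a regular taskwait at the end of that body would be required instead of the wait clause.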

The onready clause allows defining an action in the form of a statement (e.g., a call to a function) that will be executed once the task becomes ready. This is explained in more detail in section Task Onready clause.

The label clause defines a string literal that can be used by any performance or debugger tool to identify the task with a more human-readable format. The string literal must be wrapped in double quotes. For instance, a task that initializes an array could be labeled as label("init array").

The following C code shows an example of creating tasks using the task construct:

void do_computation(float a);

float x = 0.0;
float y = 0.0;
float z = 0.0;

int main() {
  #pragma oss task
  do_computation(x);

  #pragma oss task
  {
    do_computation(y);
    do_computation(z);
  }
  #pragma oss taskwait

  return 0;
}

When the control flow reaches a #pragma oss task construct, a new task instance is created. The execution order of the tasks is not guaranteed. Moreover, when the program reaches the #pragma oss taskwait, the previously created tasks may not have been executed yet by the OmpSs-2 run-time system. After potentially being blocked in the taskwait construct for a while, the program is guaranteed that both tasks have deeply completed.

The task construct is extended to allow the annotation of function declarations or definitions in addition to structured-blocks. When a function is annotated with the task construct each invocation of that function becomes a task creation point. The following C code is an example of how task functions are used:

extern void do_computation(float a);

#pragma oss task
extern void do_computation_task(float a);

float x = 0.0;
int main() {
  do_computation_task(x); //this will create a task
  do_computation(x); //regular function call

  #pragma oss taskwait
}

The invocation of do_computation_task inside the main function creates an instance of a task. Note that OmpSs-2 does not guarantee that the task has already been executed after returning from the regular function call do_computation(x).

Note that only the execution of the function itself is part of the task, not the evaluation of the task arguments. Another restriction is that the task is not allowed to have any return value; that is, the return type must be void.
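Because a task function must return void, results are typically passed back through a pointer parameter. The following sketch shows this pattern; square_task and square_demo are hypothetical names, and the dependence clauses on the declaration are assumed to refer to the function parameters.

```c
/* Hypothetical task function: the result is returned through a pointer
   because a task function must return void. */
#pragma oss task in(*input) out(*output)
void square_task(const int *input, int *output);

void square_task(const int *input, int *output)
{
  *output = (*input) * (*input);
}

int square_demo(void)
{
  int value = 7;
  int squared = 0;

  /* The arguments &value and &squared are evaluated here, by the caller;
     only the body of square_task runs as a task. */
  square_task(&value, &squared);

  /* squared must not be read before the task has completed. */
  #pragma oss taskwait
  return squared;
}
```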

Warning

The for clause from the task directive is no longer part of OmpSs-2.

3.1.1. Task Onready clause

The onready clause allows defining an action in the form of a statement (e.g., a call to a function) that will be executed once the task becomes ready. The run-time system will execute the statement only once, at any moment after the task satisfies all its data dependencies and before the task runs its body. The onready action cannot assume that it is running within a task context; it should not reach any task scheduling point. Moreover, the action is recommended to be lightweight and should not perform blocking operations.

The onready action can register external events to the ready task to delay its execution until all the events are fulfilled. As an example, the callback could execute an asynchronous TAMPI operation, such as TAMPI_Iwait. Such a call would delay the task’s execution until the corresponding MPI communications are completed. In that way, the data dependencies allow tasks to define local dependencies with other tasks on the same process, whereas the onready clause allows defining remote dependencies with other processes. See Task external events for more information about the management of task external events.

The data sharing and dependency rules for the variables used in the onready action are the same that apply to the task body.

We show an example below where a task safely increases the value of a variable by two, once from the onready action and once from the task body:

#include <stdio.h>

void function(int *a)
{
  // This is the first increase because it's the onready action
  *a += 1;
}

int main()
{
  int a = 0;

  #pragma oss task inout(a) onready(function(&a))
  {
    // At this point, the onready action was already executed
    ++a;
  }

  #pragma oss taskwait

  fprintf(stdout, "a: %d\n", a); // Should print "a: 2"
}

Below is an example where we asynchronously receive (through TAMPI services) and process remote data using a single task with an onready action:

// Assumption: rank identifies the process that sends the data
void function(int *a, int rank)
{
  MPI_Request request;
  MPI_Irecv(a, 1, MPI_INT, rank, 0, MPI_COMM_WORLD, &request);

  // Asynchronously delay the execution of the task until the communication
  // has completed. This service will register an external event if the
  // receive has not completed immediately
  TAMPI_Iwait(&request, MPI_STATUS_IGNORE);
}

int main()
{
  // Initialize MPI...

  int a = 0;
  int rank = ...;

  #pragma oss task inout(a) onready(function(&a, rank))
  {
    fprintf(stdout, "received data: %d\n", a);
    process(a);
  }

  #pragma oss taskwait

  // Finalize MPI...
}

Please note this is just an example of how onready can be used to delay the execution of a task by registering external events. Performing heavy operations like MPI_Irecv in an onready action is not recommended because the onready action does not run in the context of a task.

3.1.2. Task Create clause (experimental)

Important

The create clause is experimental and may change or be removed in the future without any notice.

The optional create clause can inhibit the creation of a task and just execute the body directly. It is typically used to override the behavior of the final clause for a specific task.

The clause expects a conditional argument create(cond), which is evaluated when the task is to be created. If the condition evaluates to true, the task is created, even if the task is final. Otherwise, if it evaluates to false, the task is never created.

The following create clause:

#pragma oss task create(cond)
do_work(size);

Is equivalent to this code:

if (cond) {
        #pragma oss task create(true)
        do_work(size);
} else {
        do_work(size);
}

An example of the create clause to override the final effect for some specific tasks is depicted in the following diagram:

[Figure: ../_images/create-true.png]

The tasks in grey won’t be created due to the final clause, but the ones in green, which have the create(true) clause, will always be created.

A task without the create clause will follow the normal creation process, following the rules imposed by the final clause if used.

The condition used in the create clause must not have side-effects, otherwise the behavior is undefined. For example, create(size++ > 20) may or may not increase the value of the variable size.

It is important to note that when the task is not created and the body runs as-is, the dependencies or data sharing clauses of that task won’t have any effect (as there won’t be any task). It is the programmer's responsibility to ensure that the program is still correct.
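The sketch below shows one way to write code that stays correct whether or not the task is created. The function range_sum and the threshold CREATE_THRESHOLD are hypothetical, and since the create clause is experimental, the example should be read as an illustration of the rules above rather than as established practice.

```c
/* Hypothetical threshold: smaller ranges are computed without a task. */
#define CREATE_THRESHOLD 4

/* Sums the integers in the half-open range [lo, hi). */
long range_sum(int lo, int hi)
{
  if (hi - lo <= 0) return 0;
  if (hi - lo == 1) return lo;

  int mid = lo + (hi - lo) / 2;
  long left = 0, right = 0;

  /* The condition has no side effects. If it is false, the statement
     simply runs inline and the data sharing clauses have no effect. */
  #pragma oss task shared(left) firstprivate(lo, mid) create(hi - lo > CREATE_THRESHOLD)
  left = range_sum(lo, mid);

  #pragma oss task shared(right) firstprivate(mid, hi) create(hi - lo > CREATE_THRESHOLD)
  right = range_sum(mid, hi);

  #pragma oss taskwait
  return left + right;
}
```

The body of each task only writes a variable that the parent reads after the taskwait, so running it inline or as a task produces the same result.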

3.2. Taskwait construct

Apart from implicit synchronization (task data dependences), OmpSs-2 also offers a mechanism that allows users to synchronize task execution. The taskwait construct is a stand-alone directive (with no code block associated) and specifies a wait on the deep completion of all descendant tasks, including the non-direct children tasks.

The syntax of the taskwait construct is the following:

#pragma oss taskwait [clauses]
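The effect of deep completion on non-direct children can be sketched as follows. The names leaf_done and nested_demo are hypothetical.

```c
static int leaf_done;  /* written by a grandchild of the waiting task */

/* Hypothetical helper: a task that creates a nested task. */
int nested_demo(void)
{
  leaf_done = 0;

  #pragma oss task
  {
    /* This inner task is not a direct child of the task that executes
       the taskwait below. */
    #pragma oss task
    leaf_done = 1;
  }

  /* Waits for the deep completion of the outer task, which includes
     the inner task as well. */
  #pragma oss taskwait
  return leaf_done;
}
```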

The valid clauses for the taskwait construct are the following:

  • on(list-of-variables): Specifies to wait only for the subset of descendant tasks that declared a dependency on any of the variables in the list.

The on clause allows waiting only on the tasks that produce some data, in the same way as the inout clause. It suspends the current task until all previous tasks with any dependency on the expression are completed. The following example illustrates its use:

#include <stdio.h>

int compute1(void);
int compute2(void);

int main()
{
  int result1, result2;

  #pragma oss task out(result1)
  result1 = compute1();

  #pragma oss task out(result2)
  result2 = compute2();

  #pragma oss taskwait on(result1)
  printf("result1 = %d\n", result1);

  #pragma oss taskwait on(result2)
  printf("result2 = %d\n", result2);

  return 0;
}

3.3. Release directive

The release directive asserts that a task will no longer perform accesses that conflict with the contents of the associated depend clause. The contents of the depend clause must be a subset of that of the task construct that is no longer referenced in the rest of the lifetime of the current task and its future subtasks. The release directive has no associated structured block.

The syntax of the release directive is the following:

#pragma oss release [clauses]

The valid clauses for the release directive are:

  • depend(<type>: <memory-reference-list>)

  • <depend-type>(<memory-reference-list>)

The following C code shows an example of partial release of the task dependences using the release directive:

#define SIZE 4096

float x[SIZE];
float y[SIZE];

int main() {

  #pragma oss task depend(out:x,y)
  {
     for (int i=0; i<SIZE; i++) x[i] = 0.0;
     #pragma oss release depend(out:x)

     for (int i=0; i<SIZE; i++) y[i] = 0.0;
  }
}

Warning

At this moment, the run-time system only supports releasing a dependency of the same dependency type that was specified at the task depend clause.

3.4. Atomic construct

Warning

The following information applies to clang. Mercurium does support atomic, but without clauses.

The atomic construct ensures that a specific storage location is accessed atomically, rather than exposing it to the possibility of multiple, simultaneous reading and writing threads that may result in indeterminate values:

#pragma oss atomic
statement

The construct allows the clauses read, write, update and capture to define the semantics for which a directive enforces atomicity. If no clause is present, the behavior is as if the update clause is specified.

  • read results in an atomic read of the location designated by x. The statement has the following form:

    v = x;
    
  • write results in an atomic write of the location designated by x. The statement has the following form:

    x = expr;
    
  • update results in an atomic update of the location designated by x using the designated operator or intrinsic. Only the read and write of the location designated by x are performed mutually atomically. The statement has the following form:

    x++;
    x--;
    ++x;
    --x;
    x binop= expr;
    x = x binop expr;
    x = expr binop x;
    
  • capture is an atomic capture update. That is, an atomic update to the location designated by x using the designated operator or intrinsic while also capturing the original or final value of the location designated by x with respect to the atomic update. The original or final value of the location designated by x is written in the location designated by v. Only the read and write of the location designated by x are performed mutually atomically. The statement has the following form:

    v = expr-stmt
    { v = x; expr-stmt }
    { expr-stmt v = x; }
    

    where expr-stmt is either an atomic write or update statement.

The atomic construct also allows memory order clauses: relaxed, acquire, release, acq_rel, seq_cst. If no memory order clause is specified, the default memory ordering is relaxed.
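The following sketch combines the default (update) form with the capture clause inside concurrent tasks. All names (NUM_TASKS, hits, next_id, ids, atomic_demo) are hypothetical, and the capture clause assumes a compiler that supports atomic clauses, as noted in the warning above.

```c
#define NUM_TASKS 16

static int hits;            /* incremented by every task */
static int next_id;         /* handed out with an atomic capture */
static int ids[NUM_TASKS];  /* marks which identifiers were handed out */

/* Hypothetical helper: returns the number of distinct identifiers,
   or -1 if an increment of hits was lost. */
int atomic_demo(void)
{
  hits = 0;
  next_id = 0;
  for (int i = 0; i < NUM_TASKS; i++) ids[i] = 0;

  for (int t = 0; t < NUM_TASKS; t++) {
    #pragma oss task
    {
      int my_id;

      /* No clause: behaves as if update was specified. */
      #pragma oss atomic
      hits++;

      /* capture: read the old value and increment it in one atomic step. */
      #pragma oss atomic capture
      { my_id = next_id; next_id++; }

      ids[my_id] = 1;  /* each task owns a distinct slot */
    }
  }

  #pragma oss taskwait

  int distinct = 0;
  for (int i = 0; i < NUM_TASKS; i++) distinct += ids[i];
  return (hits == NUM_TASKS) ? distinct : -1;
}
```

Without the capture, two tasks could read the same value of next_id and end up sharing a slot.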

3.5. Critical construct

The critical construct allows programmers to specify regions of code that will be executed in mutual exclusion. The associated region will be executed by a single thread at a time; other threads will wait at the beginning of the critical section until no thread is executing it.

The syntax of the critical construct is the following:

#pragma oss critical
structured-block

The syntax also allows named criticals with the following syntax:

#pragma oss critical(<name>)
structured-block

Named criticals prevent concurrency between threads with respect to all critical regions with the same name. Unnamed criticals prevent concurrency between threads with respect to all unnamed critical regions.

The critical construct has no related clauses. The beginning and ending of a critical section may be task scheduling points.
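A critical region is useful when several variables must be updated together, which a single atomic statement cannot express. The sketch below keeps a maximum value and its index consistent; the names samples, best_value, best_index, find_max_index and the critical name best are all hypothetical.

```c
#define NUM_SAMPLES 6

static const int samples[NUM_SAMPLES] = {4, 9, 2, 7, 9, 1};
static int best_value;
static int best_index;

/* Hypothetical helper: index of the first occurrence of the maximum. */
int find_max_index(void)
{
  best_value = -1;
  best_index = -1;

  for (int i = 0; i < NUM_SAMPLES; i++) {
    #pragma oss task firstprivate(i)
    {
      /* Both variables are updated inside the same named critical
         region, so no task can observe one updated without the other. */
      #pragma oss critical(best)
      {
        if (samples[i] > best_value ||
            (samples[i] == best_value && i < best_index)) {
          best_value = samples[i];
          best_index = i;
        }
      }
    }
  }

  #pragma oss taskwait
  return best_index;
}
```

The tie-break on the index makes the result independent of the order in which the tasks enter the critical region.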

3.6. Assert directive

The assert directive is a declarative directive that checks whether a runtime configuration option is enabled or has a specific value. The directive expects a single string with comma-separated conditions. Each condition is composed of the option name, a comparison operator (==, !=, >, >=, < and <=), and the value to compare.

All conditions are checked before starting the program, and if any fails, the program aborts showing an error with the incorrect option. The configuration options are runtime-specific, so each runtime system may define its valid configuration options. Since this directive is declarative, it can be written in any part of the program’s source code files.

The syntax of the assert directive is:

#pragma oss assert("option1==40,option2<10,option3!=stringvalue")

int main() {
  // ...
}

Check the OmpSs-2 User Guide to see the runtime configuration options that can be asserted. For instance, we can assert that a program is running using the regions dependency system and that the thread stack size is greater than 4MB with the following code:

#pragma oss assert("version.dependencies==regions,misc.stack_size>4M")

int main() {
  // ...
}

3.7. Restrictions

The point of exit of a structured block cannot be a branch out of it. The following C code shows an example where any of the exit calls may lead to undefined behavior:

#include <stdlib.h>

int main() {

  for (int i = 0; i < 10; i++) {
    if (i > 5) exit(1);

    #pragma oss task
    {
      // computation
      exit(1);
    }
  }
  #pragma oss taskwait
}
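One possible way to avoid this situation is to record the error and let every task finish normally, leaving the decision to terminate to the code that follows the taskwait. The sketch below illustrates the idea; the function run_checked, its parameter bad_step and the flag failed are hypothetical.

```c
static int failed;  /* set by any task that detects an error */

/* Hypothetical helper: returns 1 if the task for step bad_step ran. */
int run_checked(int bad_step)
{
  failed = 0;

  for (int i = 0; i < 10; i++) {
    /* Stop creating tasks instead of calling exit while tasks may
       still be running. */
    if (i > 5) break;

    #pragma oss task firstprivate(i)
    {
      /* Record the failure and fall through to the end of the block. */
      if (i == bad_step) {
        #pragma oss atomic
        failed |= 1;
      }
    }
  }

  /* All tasks have completed here, so the caller can terminate safely. */
  #pragma oss taskwait
  return failed;
}
```

A caller could then invoke exit with the returned status once the function has returned, at which point no task is left running.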