3. Language description¶
This section describes the OmpSs language, that is, all the elements necessary to understand how an application programmed in OmpSs executes and/or behaves on a parallel architecture. OmpSs provides a simple path for users already familiar with the OpenMP programming model to write (or port) their programs to OmpSs.
This description is guided by the list of OmpSs directive constructs. Each of the following sections contains a short description of a directive, its specific syntax and the list of its clauses (including the valid parameters for each clause and a short description of them). In addition, each section ends with a simple example showing how the directive can be used in a valid OmpSs program.
As is the case of OpenMP in C and C++, OmpSs directives are specified using the #pragma mechanism (provided by the base language), and in Fortran they are specified using special comments that are identified by a unique sentinel. The sentinel used in OmpSs (as in OpenMP) is omp. Compilers will typically ignore OmpSs directives if support is disabled or not provided.
3.1. Task construct¶
The programmer can specify a task using the task construct. This construct can appear inside any code block of the program, and it marks the following statement as a task.
The syntax of the task construct is the following:
#pragma omp task [clauses]
structured-block
The valid clauses for the task construct are:
- private(<list>)
- firstprivate(<list>)
- shared(<list>)
- depend(<type>: <memory-reference-list>)
- <depend-type>(<memory-reference-list>)
- reduction(<operator>: <memory-reference-list>)
- priority(<value>)
- if(<scalar-expression>)
- final(<scalar-expression>)
- label(<string>)
- tied
The private and firstprivate clauses declare one or more list items to be private to the task (i.e., the task receives a new list item). All internal references to the original list item are replaced by references to this new list item.
List items privatized using the private clause are uninitialized when the execution of the task begins. List items privatized using the firstprivate clause are initialized with the value of the corresponding original item at task creation.
The shared clause declares one or more list items to be shared by the task (i.e., the task receives a reference to the original list item). The programmer must ensure that shared storage does not reach the end of its lifetime before the tasks referencing this storage have finished.
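As an illustration of these data-sharing clauses, the following sketch (function and variable names are ours, not part of OmpSs) combines firstprivate and shared in a single task. When OmpSs support is disabled the pragmas are ignored and the code runs sequentially:

```c
#include <assert.h>

int counter = 0; /* shared storage that must outlive the task */

/* Illustrative helper: 'v' is captured by value at task creation
   (firstprivate), while 'counter' is accessed through a reference
   to the original variable (shared). */
void add_to_counter(int v) {
    #pragma omp task firstprivate(v) shared(counter)
    counter += v;
    #pragma omp taskwait /* ensure the task finished before counter is read */
}
```

The taskwait before returning guarantees the shared update is visible to the caller, which also satisfies the lifetime requirement stated above.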
The depend clause allows additional task scheduling restrictions to be inferred from the parameters it defines. These restrictions are known as dependences. The syntax of the depend clause includes a dependence type, followed by a colon and its associated list items. The valid dependence types are defined in section “Dependence flow” of the previous chapter. In addition to this syntax, OmpSs allows the dependence type itself to be used as the name of the clause. Then, the following code:
#pragma omp task depend(in: a,b,c) depend(out: d)
is equivalent to this one:
#pragma omp task in(a,b,c) out(d)
The reduction clause declares the task as a participant in a reduction operation. The first occurrence of a participating task defines the beginning of the scope for the reduction. The scope is implicitly ended by a taskwait or by a dependence over the memory-reference-list item.
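A minimal sketch of a task reduction using the predefined + operator (the function name is ours): each created task participates in the reduction over sum, and the taskwait ends the reduction scope, after which the combined value is available. With the pragmas ignored the loop simply accumulates sequentially:

```c
#include <assert.h>

/* Each created task participates in a '+' reduction over 'sum';
   the taskwait implicitly ends the reduction scope. */
int sum_array(const int *v, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        #pragma omp task reduction(+ : sum) firstprivate(i)
        sum += v[i];
    }
    #pragma omp taskwait /* end of the reduction scope */
    return sum;
}
```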
If the expression of the if clause evaluates to true, the execution of the newly created task can be deferred; otherwise the current task must suspend its execution until the newly created task has completed its execution.
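A common use of the if clause is to avoid deferring tiny tasks; in this sketch (THRESHOLD and the function are our own illustrative names) small inputs make the task execute immediately, since the creating task suspends until it completes:

```c
#include <assert.h>

#define THRESHOLD 1024 /* illustrative cutoff, not defined by OmpSs */

/* The task is deferred only when n is large enough to amortize the
   task-creation overhead; for smaller n the creating task suspends
   until the new task completes, so it behaves like a plain call. */
void scale(float *data, int n, float factor) {
    #pragma omp task if(n > THRESHOLD) shared(data) firstprivate(n, factor)
    for (int i = 0; i < n; ++i)
        data[i] *= factor;
    #pragma omp taskwait
}
```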
If the expression of the final clause evaluates to true, the newly created task will be a final task, and all the task-generating code encountered when executing its dynamic extent will also generate final tasks. In addition, when executing within a final task, all the encountered task-generating code will execute these tasks immediately after their creation, as if they were simple routine calls. Finally, tasks created within a final task can use the data environment of their parent task.
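The classic use of the final clause is a recursion cutoff, sketched below (CUTOFF is our own illustrative value): below the cutoff the created tasks are final, so all tasks generated in their dynamic extent execute immediately, like plain recursive calls, avoiding task-creation overhead deep in the recursion:

```c
#include <assert.h>

#define CUTOFF 8 /* illustrative depth cutoff, not defined by OmpSs */

/* Tasks created with final(true) make the whole subtree below them
   execute as immediate, call-like tasks. */
int fib(int n) {
    if (n < 2) return n;
    int x, y;
    #pragma omp task shared(x) final(n <= CUTOFF)
    x = fib(n - 1);
    #pragma omp task shared(y) final(n <= CUTOFF)
    y = fib(n - 2);
    #pragma omp taskwait
    return x + y;
}
```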
The tied clause defines a new task scheduling restriction for the newly created task: once a thread begins the execution of the task, the task becomes tied to that thread. If the task suspends its execution at any task scheduling point, only the same thread (i.e., the thread to which the task is tied) may resume its execution.
The label clause defines a string literal that can be used by any performance or debugging tool to identify the task in a more human-readable format.
The following C code shows an example of creating tasks using the task construct:
float x = 0.0;
float y = 0.0;
float z = 0.0;

int main() {
  #pragma omp task
  do_computation(x);

  #pragma omp task
  {
    do_computation(y);
    do_computation(z);
  }
  return 0;
}
When the control flow reaches the #pragma omp task construct, a new task instance is created; however, when the program reaches return 0, the previously created tasks may not have been executed yet by the OmpSs run-time.
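If the program needs the tasks to have finished at a given point, a taskwait (described later in this chapter) can be inserted. A sketch of the same pattern, using our own flag variable to observe completion:

```c
#include <assert.h>

int done = 0; /* our own flag, used only to observe task completion */

void do_computation_flag(void) { done = 1; }

/* Same pattern as above, but the taskwait guarantees the task has
   completed before the function returns. */
int run(void) {
    #pragma omp task
    do_computation_flag();
    #pragma omp taskwait /* wait for all direct descendant tasks */
    return done;
}
```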
The task construct is extended to allow the annotation of function declarations or definitions, in addition to structured blocks. When a function is annotated with the task construct, each invocation of that function becomes a task creation point. The following C code is an example of how task functions are used:
extern void do_computation(float a);

#pragma omp task
extern void do_computation_task(float a);

float x = 0.0;

int main() {
  do_computation(x);      // regular function call
  do_computation_task(x); // this will create a task
  return 0;
}
The invocation of do_computation_task inside the main function creates an instance of a task. As in the example above, we cannot guarantee that the task has been executed before the execution of main finishes.
Note that only the execution of the function itself is part of the task, not the evaluation of the task arguments. Another restriction is that the task is not allowed to have any return value; that is, the return type must be void.
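Because a task function cannot return a value, results are typically communicated through a pointer parameter annotated with an out dependence. A sketch with our own names:

```c
#include <assert.h>

/* The result is produced through the 'result' parameter; the out
   dependence lets later tasks, or a taskwait, synchronize on it. */
#pragma omp task out(*result)
void add_task(int a, int b, int *result) {
    *result = a + b;
}

int compute_sum(int a, int b) {
    int r;
    add_task(a, b, &r);
    #pragma omp taskwait /* r is only valid after synchronization */
    return r;
}
```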
3.2. Target construct¶
To support heterogeneity a new construct is introduced: the target construct. The intent of the target construct is to specify that a given element can be run on a set of devices. The target construct can be applied to either a task construct or a function definition. In the future we will allow it to work on worksharing constructs.
The syntax of the target construct is the following:
#pragma omp target [clauses]
task-construct | function-definition | function-header
The valid clauses for the target construct are the following:
- device(target-device) - It allows specifying on which devices the construct should be targeted. If no device clause is specified then the SMP device is assumed. Currently we also support the CUDA device, which allows the execution of native CUDA kernels in GPGPUs.
- copy_in(list-of-variables) - It specifies that a set of shared data may need to be transferred to the device before the associated code can be executed.
- copy_out(list-of-variables) - It specifies that a set of shared data may need to be transferred from the device after the associated code is executed.
- copy_inout(list-of-variables) - This clause is a combination of copy_in and copy_out.
- copy_deps - It specifies that if the attached construct has any dependence clauses then they will also have copy semantics (i.e., in will also be considered copy_in, out will also be considered copy_out and inout copy_inout).
- implements(function-name) - It specifies that the code is an alternate implementation, for the target devices, of the function named in the clause. This alternate implementation can be used instead of the original if the runtime considers it appropriate.
- shmem(size_t) - It specifies the amount of shared memory the runtime will allocate for CUDA and OpenCL kernels. Shared memory should be handled by the user in the kernel code.
- ndrange(<syntax>) - It specifies the thread hierarchy used to execute the kernel on the device. It may be expressed by means of two different syntaxes:
  - ndrange(n, G1,…, Gn, L1,…, Ln) - The ‘n’ parameter determines the number of dimensions (i.e., 1, 2 or 3), and ‘Gx’ and ‘Lx’ are sequences of scalars determining the global and local sizes respectively. There will be as many ‘G’ and ‘L’ values as dimensions (e.g., ndrange(2, 1024, 1024, 128, 128) will create a thread hierarchy of 1024x1024 elements, grouped in blocks of 128x128 elements).
  - ndrange(n, G[n], L[n]) - The ‘n’ parameter determines the number of dimensions (i.e., 1, 2 or 3), and the ‘G’ and ‘L’ vectors contain as many elements as dimensions (e.g., ndrange(2, Global, Local), where ‘Global’ is an int[] = {1024, 1024} and ‘Local’ is an int[] = {128, 128}, will create a thread hierarchy of 1024x1024 elements, grouped in blocks of 128x128 elements).
Additionally, both SMP and CUDA tasks annotated with the target construct are eligible for execution in a cluster environment in an experimental implementation. Please contact us if you are interested in using it.
Restrictions:
- At most one device clause may appear in the target construct.
- At most one shmem clause may appear in the target construct.
- At most one implements clause may appear in the target construct.
- At most one ndrange clause may appear in the target construct.
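As an illustration, the sketch below (function and names are ours) marks a task function for the SMP device and gives its dependences copy semantics via copy_deps. With the pragmas ignored by a plain C compiler it runs as a regular function:

```c
#include <assert.h>

/* The target construct selects the SMP device; copy_deps gives the
   in/out dependences copy_in/copy_out semantics, so the runtime would
   transfer the array sections to and from the device if needed. */
#pragma omp target device(smp) copy_deps
#pragma omp task in([n] v) out([n] r)
void vec_double(int n, const int *v, int *r) {
    for (int i = 0; i < n; ++i)
        r[i] = 2 * v[i];
}
```

A CUDA alternative could additionally be registered with an implements(vec_double) clause plus an ndrange clause, letting the runtime choose between the two implementations.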
3.3. Loop construct¶
When a task encounters a loop construct it starts creating tasks for each of the chunks in which the loop’s iteration space is divided. Programmers can choose among different schedule policies in order to divide the iteration space.
The syntax of the loop construct is the following:
#pragma omp for [clauses]
loop-block
The valid clauses for the loop construct are the following:
- schedule(schedule-policy[, chunk-size]) - It specifies one of the valid partition policies and, optionally, the chunk-size used to divide the iteration space. Valid schedule policies are the following:
  - dynamic - the loop is divided among the team’s threads in tasks of chunk-size granularity. Tasks are assigned as threads request them; once a thread finishes the execution of one of these tasks it will request another. The default chunk-size is 1.
  - guided - the loop is divided as the executing threads request chunks. The chunk size is proportional to the number of unassigned iterations, so chunks start bigger at the beginning and become smaller as the loop execution progresses. The chunk size will never be smaller than the chunk-size parameter (except for the last chunk).
  - static - the loop is divided into chunks of size chunk-size. Chunks are assigned to the team’s threads in a round-robin fashion. If no chunk-size is provided, the iteration space is divided into number-of-threads chunks of the same size (or approximately the same size if number-of-threads does not divide number-of-iterations).
- nowait - When this option is specified the encountering task can immediately proceed with the following code without waiting for all the tasks created to execute each of the loop’s chunks.
The following C code shows an example of loop distribution:
float x[10];

int main() {
  #pragma omp for schedule(static)
  for (int i = 0; i < 10; i++) {
    do_computation(x[i]);
  }
  return 0;
}
3.4. Taskwait construct¶
Apart from implicit synchronization (task dependences), OmpSs also offers mechanisms which allow users to synchronize task execution. The taskwait construct is a stand-alone directive (with no associated code block) that specifies a wait on the completion of all direct descendant tasks.
The syntax of the taskwait construct is the following:
#pragma omp taskwait [clauses]
The valid clauses for the taskwait construct are the following:
- on(list-of-variables) - It specifies to wait only for a subset (not all) of the direct descendant tasks. A taskwait with an on clause only waits for those tasks referring to any of the variables appearing in the list of variables.
The on clause allows waiting only on the tasks that produce some data, in the same way as the in clause. It suspends the current task until all previous tasks with an out dependence over the expression have completed. The following example illustrates its use:
#include <stdio.h>

int compute1(void);
int compute2(void);

int main() {
  int result1, result2;

  #pragma omp task out(result1)
  result1 = compute1();

  #pragma omp task out(result2)
  result2 = compute2();

  #pragma omp taskwait on(result1)
  printf("result1 = %d\n", result1);

  #pragma omp taskwait on(result2)
  printf("result2 = %d\n", result2);

  return 0;
}
3.5. Taskyield directive¶
The taskyield directive specifies that the current task can be suspended and the runtime scheduler is allowed to schedule a different task. The taskyield directive explicitly includes a task scheduling point.
The syntax of the taskyield directive is the following:
#pragma omp taskyield
The taskyield directive has no related clauses.
In the following example we can see how to use the taskyield directive:
void compute(void) {
  int a = 0, b = 0;

  #pragma omp task shared(a)
  { a++; }

  #pragma omp taskyield

  #pragma omp task shared(b)
  { b++; }

  #pragma omp taskwait
}

int main() {
  #pragma omp task
  compute();
  #pragma omp taskwait
  return 0;
}
When encountering the taskyield directive, the runtime system may decide between continuing to execute the task compute (i.e., the current task) or beginning the execution of the a++ task (if it has not been executed yet).
3.6. Atomic construct¶
The atomic construct ensures that the following expression is executed atomically. The runtime system will use native machine mechanisms to guarantee atomic execution. If there is no native mechanism that guarantees atomicity (e.g., a function call), a regular critical section will be used to implement the atomic construct.
The syntax of the atomic construct is the following:
#pragma omp atomic
structured-block
The atomic construct has no related clauses.
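A minimal sketch (the function name is ours): concurrent tasks increment a shared counter, and the atomic construct protects each update. With the pragmas ignored the loop runs sequentially:

```c
#include <assert.h>

/* Each task atomically increments the shared counter; without the
   atomic construct concurrent updates could be lost. */
int count_events(int n) {
    int counter = 0;
    for (int i = 0; i < n; ++i) {
        #pragma omp task shared(counter)
        {
            #pragma omp atomic
            counter++;
        }
    }
    #pragma omp taskwait
    return counter;
}
```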
3.7. Critical construct¶
The critical construct allows programmers to specify regions of code that will be executed in mutual exclusion. The associated region will be executed by a single thread at a time; other threads will wait at the beginning of the critical section until no thread in the team is executing it.
The syntax of the critical construct is the following:
#pragma omp critical
structured-block
The critical construct has no related clauses.
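A sketch of mutual exclusion with our own names: tasks push values onto a shared stack, and the critical construct serializes the read-modify-write of the top index so concurrent tasks cannot corrupt it. With the pragmas ignored the code runs sequentially:

```c
#include <assert.h>

#define STACK_MAX 128 /* illustrative capacity */

static int stack[STACK_MAX];
static int top = 0;

/* The critical section serializes the update of 'top' and the store,
   which together form a non-atomic read-modify-write. */
void push_value(int value) {
    #pragma omp task firstprivate(value)
    {
        #pragma omp critical
        {
            if (top < STACK_MAX)
                stack[top++] = value;
        }
    }
    #pragma omp taskwait /* ensure the push completed */
}
```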
3.8. Declare reduction construct¶
The user can define their own reduction-identifier using the declare reduction directive. After declaring the UDR, the reduction-identifier can be used in a reduction clause. The syntax of this directive is the following:
#pragma omp declare reduction(reduction-identifier : type-list : combiner-expr) [initializer(init-expr)]
where:
- reduction-identifier is the identifier of the reduction which is being declared
- type-list is a list of types
- combiner-expr is the expression that specifies how we have to combine the partial results. We can use two predefined identifiers in this expression: omp_out and omp_in. The omp_out identifier is used to represent the result of the combination whereas the omp_in identifier is used to represent the input data.
- init-expr is the expression that specifies how we have to initialize the private copies. We can use also two predefined identifiers in this expression: omp_priv and omp_orig. The omp_priv identifier is used to represent a private copy whereas the omp_orig identifier is used to represent the original variable that is being involved in a reduction.
In the following example we can see how to declare a UDR and how to use it:
struct C {
  int x;
};

void reducer(struct C *out, struct C *in) {
  out->x += in->x;
}

#pragma omp declare reduction(my_UDR : struct C : reducer(&omp_out, &omp_in)) initializer(omp_priv = {0})

int main() {
  struct C res = { 0 };
  struct C v[N];
  init(&v);

  for (int i = 0; i < N; ++i) {
    #pragma omp task reduction(my_UDR : res) in(v) firstprivate(i)
    {
      res.x += v[i].x;
    }
  }
  #pragma omp taskwait
}