OmpSs is an effort to integrate features from the StarSs programming model developed at BSC into a single programming model. In particular, our objective is to extend OpenMP with new directives to support asynchronous parallelism and heterogeneity (devices like GPUs and FPGAs). However, it can also be understood as a set of new directives that extend other accelerator-based APIs such as CUDA or OpenCL. Our OmpSs environment is built on top of our Mercurium compiler and Nanos++ runtime system.
Asynchronous parallelism is enabled in OmpSs by the use of data dependencies between the different tasks of the program. To support heterogeneity, a new construct is introduced: the target construct.
In OmpSs the task construct also allows the annotation of function declarations or definitions, in addition to structured blocks. When a function is annotated with the task construct, each invocation of that function becomes a task creation point. Note that only the execution of the function itself is part of the task, not the evaluation of the task arguments (which are firstprivatized). Another restriction is that the task cannot have a return value; that is, the return type must be void.
The task construct allows expressing data dependences among tasks using the in, out and inout clauses (standing for input, output and input/output respectively).
They specify, for each task in the program, what data the task waits for and what data it signals as ready.
Note that whether the task really uses that data in the specified way is the programmer's responsibility.
Each time a new task is created, its in and out dependences are matched against those of existing tasks.
If a dependency, either RaW, WaW or WaR, is found, the task becomes a successor of the corresponding tasks.
This process creates a task dependency graph at runtime.
Tasks are scheduled for execution as soon as all their predecessors in the graph have finished (which does not mean they are executed immediately), or at creation if they have no predecessors.
The task construct is extended with the concurrent clause. The concurrent clause is a special version of the inout clause where dependences are computed with respect to in, out and inout clauses, but not with respect to other concurrent clauses. Because it relaxes the synchronization between tasks, the programmer must ensure that either the tasks can be executed concurrently or that additional synchronization is used.
The use of the data dependency clauses allows tasks from multiple loop iterations to execute at the same time. All dependency clauses accept extended lvalues beyond those of C/C++. Two extensions are allowed: array sections, which refer to multiple elements of an array (or pointed-to data) in a single expression, and shaping expressions, which recast pointers to recover the size of dimensions that may have been lost across function calls.
The taskwait construct is also extended with the on clause. This clause allows waiting only on the tasks that produce some data, in the same way as the in clause: it suspends the current task until all previous tasks with an out dependence over the expression have completed.
The goal of the target construct is to specify that a given element can be run on a set of devices. The target construct can be applied either to a task construct, meaning that the task can be executed on a device, or to a function definition, meaning that the function has to be present in the device code.
The OmpSs Programming Model infrastructure is currently running on:
- Intel 32/64-bit platforms, including support for CUDA on NVIDIA GPUs.
- Intel MIC in native and offload modes.
- IBM Power8 platforms, including support for CUDA on NVIDIA GPUs.
- ARM 32/64-bit platforms, including support for OpenCL on Mali GPUs (32-bit version only).