Execution model

The most notable difference between OmpSs-2 and OpenMP is the absence of the parallel construct, which OpenMP requires to specify where a parallel region starts and ends. This construct is needed in OpenMP because it follows the fork-join execution model, in which users must state explicitly when parallelism begins and ends. OmpSs-2 instead uses the model pioneered by StarSs, where parallelism is created implicitly when the application starts. Parallel resources can be seen as a pool of threads (hence the name: thread-pool execution model) that the underlying run-time system uses during the execution.

Warning

The user has no control over this pool of threads, so standard OpenMP routines such as omp_get_num_threads() and its variants are not available.

At the beginning of the execution, the OmpSs-2 run-time system creates an initial pool of worker threads (i.e., there is no master thread in OmpSs-2). The main function is wrapped in an implicit task, called the main task, which is added to the queue of ready tasks. One of the worker threads then takes that task from the queue and starts executing it. Meanwhile, the rest of the threads wait for other ready tasks, which may be instantiated by the main task or by other running tasks.

Multiple threads execute tasks that are defined implicitly or explicitly by annotating programs with OmpSs-2 directives. The OmpSs-2 programming model is intended to support programs that will execute correctly both as parallel and as sequential programs (if the OmpSs-2 language is ignored).

OmpSs-2 allows the expression of parallelism through tasks. Tasks are independent pieces of code that can be executed by parallel resources at run-time. Whenever the program flow reaches a section of code that has been declared as a task, instead of executing the task’s body code, the program will create an instance of the task and will delegate its execution to the OmpSs-2 run-time system. The OmpSs-2 run-time system will eventually execute the task on a parallel resource. The execution of these explicitly generated tasks is assigned to one of the worker threads in the pool of threads, subject to the thread’s availability to execute work. Thus, the execution of the new task could be immediate, or deferred until later according to task scheduling constraints and thread availability. Threads are allowed to suspend the current task region at a task scheduling point in order to execute a different task.

Any directive that defines a task or a series of tasks can also appear within a task definition. This allows the definition of multiple levels of parallelism. Defining multiple levels of parallelism can lead to better application performance, since the underlying OmpSs-2 run-time system can exploit factors like data or temporal locality between tasks. Supporting multi-level parallelism is also required to implement recursive algorithms.

Synchronizing the tasks of an application is usually required to produce a correct execution, since some tasks depend on data computed by other tasks. The OmpSs-2 programming model offers two ways of expressing this: data dependencies, and explicit directives to set synchronization points.

Regular OmpSs-2 applications are linked as whole OmpSs-2 applications; however, the run-time system also offers a library mode. In library mode, there is no implicit task for the main function. Instead, the user code defines functions that contain regular OmpSs-2 code (i.e., tasks, …) and offloads them to the run-time system through an API. In this case, the code is not linked as an OmpSs-2 application, but as a regular application linked against the run-time library.