OmpSs tutorial at ISCA 2013

Hybrid and Heterogeneous Parallel Programming with MPI/OmpSs for Exascale Systems

24 June 2013, Tel Aviv, Israel


Due to its asynchronous nature and look-ahead capabilities, MPI/OmpSs is a promising programming-model approach for future exascale systems, with the potential to exploit unprecedented amounts of parallelism while coping with memory latency, network latency and load imbalance. Many large-scale applications are already seeing very positive results from their ports to MPI/OmpSs (see the EU projects Mont-Blanc and TEXT).

We will first cover the basic concepts of the programming model. OmpSs can be seen as an extension of the OpenMP model; unlike OpenMP, however, task dependencies are determined at runtime from the directionality of data arguments. The OmpSs runtime supports asynchronous execution of tasks on heterogeneous systems such as SMPs, GPUs and clusters thereof. The integration of OmpSs with MPI facilitates the migration of existing MPI applications and automatically improves their performance by overlapping computation with communication between tasks on remote nodes.

The tutorial will also cover the constellation of development and performance tools available for the MPI/OmpSs programming model: the methodology for determining OmpSs tasks, the Tareador tool, and the Paraver performance analysis tool. It will also include practical sessions on application development and analysis on single many-core nodes, heterogeneous environments with GPUs, and cluster environments with MPI/OmpSs.


  1. Introduction to the StarSs family of programming models (15 minutes)
    • Exascale computing
      • Multi- and many-core programming model challenges
    • The StarSs family of task-based programming models
    • StarSs vs. OpenMP and other alternatives
    • General features of StarSs
      • Flat global address and name space
      • Task dependency graph generation
      • Exploiting critical path traversal to increase parallelism
  2. OmpSs basic features (45 minutes)
    • Outlined/inlined tasks
    • Defining task dependencies
    • Synchronization
    • Defining array regions
    • Concurrent/commutative constructs
    • Task nesting
  3. Development methodology and infrastructure (30 minutes)
    • Defining tasks
    • Finding tasks with Tareador
    • Examples
    • Mercurium compiler
    • Nanos++ runtime system
  4. Overlapping computation and communication using tasks (30 minutes)
    • Motivation for a hybrid model
    • Advantages of MPI/OmpSs
    • Overlapping computation and communication
    • Examples
  5. Hands-on (60 minutes)
    • Compiling and running OmpSs applications
    • Tracing OmpSs applications
    • Single node examples (matmul, cholesky, heat-diffusion, multisort)
    • Trace generation and analysis with Paraver
  6. Support for heterogeneous platforms (60 minutes)
    • OmpSs and GPUs
    • Data transfers
    • Data cache management
    • Leveraging CUDA and OpenCL kernels
  7. Hands-on (60 minutes)
    • Heterogeneous examples (matmul, cholesky, heat-diffusion, multisort)
    • Tracing and analysis
  8. Hands-on (60 minutes)
    • MPI/OmpSs examples (matmul, cholesky, heat-diffusion, multisort)
    • Tracing and analysis