For the last several months we have been working on NANOS++, the replacement for our old OpenMP nanos4 runtime. The design objectives driving the development are:
- Extensibility
- Heterogeneity support
- Multiple programming model support
Our aim has been to make it easy to develop different parts of the runtime, so researchers have a platform that allows them to try different mechanisms. So far, several parts of the runtime are quite extensible: the scheduling policy, the throttling policy, the barrier implementations, the slicer implementations, the instrumentation layer and the architectural level. This extensibility does not come for free. The runtime overheads are slightly increased, but they should be low enough for results to be meaningful, except for extremely fine-grained applications.
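To give a rough idea of what an extensible scheduling policy can look like, here is a minimal C++ sketch. The names `SchedulePolicy`, `FifoPolicy`, `LifoPolicy` and `drain` are hypothetical and made up for this illustration; they are not the NANOS++ interfaces. The point is only that the runtime core talks to an abstract interface, so a new policy can be dropped in without touching the core.

```cpp
#include <deque>
#include <vector>

// Hypothetical plugin interface; illustrative names, not the NANOS++ API.
class SchedulePolicy {
public:
    virtual ~SchedulePolicy() = default;
    virtual void enqueue(int task_id) = 0;  // a task has become ready
    virtual bool next(int& task_id) = 0;    // pick a task; false if none is ready
};

// Runs ready tasks in the order they arrived.
class FifoPolicy : public SchedulePolicy {
    std::deque<int> queue_;
public:
    void enqueue(int task_id) override { queue_.push_back(task_id); }
    bool next(int& task_id) override {
        if (queue_.empty()) return false;
        task_id = queue_.front();
        queue_.pop_front();
        return true;
    }
};

// Runs the most recently readied task first.
class LifoPolicy : public SchedulePolicy {
    std::vector<int> stack_;
public:
    void enqueue(int task_id) override { stack_.push_back(task_id); }
    bool next(int& task_id) override {
        if (stack_.empty()) return false;
        task_id = stack_.back();
        stack_.pop_back();
        return true;
    }
};

// Stand-in for the runtime core: it only sees the abstract interface.
std::vector<int> drain(SchedulePolicy& policy, const std::vector<int>& ready) {
    for (int id : ready) policy.enqueue(id);
    std::vector<int> order;
    int id = 0;
    while (policy.next(id)) order.push_back(id);
    return order;
}
```

Passing a `FifoPolicy` or a `LifoPolicy` to `drain` changes the order in which the same ready tasks are picked, while `drain` itself stays unchanged. The virtual call on every scheduling decision is also a small example of the kind of overhead that extensibility brings.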
The execution model of the runtime is asynchronous task parallelism. Tasks can be spawned and then synchronized with one another based on point-to-point dependencies. This execution model allows us to implement several programming models on top of it. Currently we support the StarSs model, partially the OpenMP model, and the Chapel language model. This model also simplifies support for heterogeneity, since a task is easy to spawn on an accelerator.
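The dependency mechanism can be illustrated with a small C++ sketch. Everything here (`Task`, `TaskGraph`, `spawn`, `depend`, `run`, `run_diamond_demo`) is a hypothetical name invented for this example and not the NANOS++ API. To keep it short and deterministic, ready tasks are executed one at a time on the calling thread; a real runtime would hand them to worker threads or accelerators instead. The sketch also assumes the dependency graph has no cycles and that `run` is called only once.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical task record; not the NANOS++ type.
struct Task {
    std::function<void()> body;
    std::vector<std::size_t> successors;  // tasks that wait for this one
    std::size_t pending = 0;              // predecessors not yet finished
};

class TaskGraph {
    std::vector<Task> tasks_;
public:
    // Register a task and return its id.
    std::size_t spawn(std::function<void()> body) {
        Task t;
        t.body = std::move(body);
        tasks_.push_back(std::move(t));
        return tasks_.size() - 1;
    }

    // Point-to-point dependency: `after` may not start until `before` ends.
    void depend(std::size_t before, std::size_t after) {
        tasks_[before].successors.push_back(after);
        ++tasks_[after].pending;
    }

    // Execute all tasks, releasing successors as their dependencies complete.
    // Returns the task ids in execution order.
    std::vector<std::size_t> run() {
        std::deque<std::size_t> ready;
        for (std::size_t i = 0; i < tasks_.size(); ++i)
            if (tasks_[i].pending == 0) ready.push_back(i);

        std::vector<std::size_t> order;
        while (!ready.empty()) {
            std::size_t id = ready.front();
            ready.pop_front();
            tasks_[id].body();
            order.push_back(id);
            for (std::size_t s : tasks_[id].successors)
                if (--tasks_[s].pending == 0) ready.push_back(s);
        }
        return order;
    }
};

// Diamond-shaped graph: a feeds b and c, and d waits for both.
int run_diamond_demo() {
    int x = 0, y = 0, z = 0, sum = 0;
    TaskGraph g;
    std::size_t a = g.spawn([&] { x = 1; });
    std::size_t b = g.spawn([&] { y = x + 1; });
    std::size_t c = g.spawn([&] { z = x * 10; });
    std::size_t d = g.spawn([&] { sum = y + z; });
    g.depend(a, b);
    g.depend(a, c);
    g.depend(b, d);
    g.depend(c, d);
    g.run();
    return sum;  // 2 + 10
}
```

Because each task only waits for the specific tasks it depends on, `b` and `c` become ready as soon as `a` finishes and could run in parallel in a multithreaded runtime, with no global barrier involved.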
One of our current focuses is to bring our OpenMP support to the level that we had with the previous runtime (note that you also need our Mercurium compiler for this). Another is the management of data transfers for heterogeneous platforms such as GPUs, or other environments where explicit data management may make sense (such as NUMA architectures). The current git version already has some minimal support for CUDA GPUs, with the runtime performing all the data transfers.
Of course, the runtime also includes a lot of internal development for supporting features such as instrumentation, debugging, synchronization mechanisms, etc.
We’ll probably do more posts on specific aspects of the runtime in the future.