# Heat Benchmark

## Description

The Heat simulation uses an iterative Gauss-Seidel method to solve the heat equation, a parabolic partial differential equation that describes the distribution of heat (or variation in temperature) in a given region over time. The heat equation is of fundamental importance in a wide range of scientific fields. In mathematics, it is the parabolic partial differential equation par excellence. In statistics, it is related to the study of Brownian motion. Moreover, the diffusion equation is a generic version of the heat equation, and it is related to the study of chemical diffusion processes.
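To give an idea of the numerical scheme, the snippet below sketches a single Gauss-Seidel sweep over a 2-D grid. It is only an illustration, not the benchmark's source code: the grid layout, the function name, and the variable names (`grid`, `n`, `residual`) are assumptions made for the example.

```
/*
 * Minimal sketch of one Gauss-Seidel sweep over an (n+2) x (n+2) grid whose
 * outermost rows and columns hold fixed boundary values. Each interior point
 * becomes the average of its four neighbours; updated values are reused within
 * the same sweep, which is what distinguishes Gauss-Seidel from Jacobi.
 */
#include <stddef.h>

static double gauss_seidel_sweep(double *grid, size_t n)
{
    const size_t stride = n + 2;
    double residual = 0.0;

    for (size_t i = 1; i <= n; ++i) {
        for (size_t j = 1; j <= n; ++j) {
            double prev = grid[i * stride + j];
            double val = 0.25 * (grid[(i - 1) * stride + j]   /* already updated */
                               + grid[(i + 1) * stride + j]
                               + grid[i * stride + (j - 1)]   /* already updated */
                               + grid[i * stride + (j + 1)]);
            grid[i * stride + j] = val;
            residual += (val - prev) * (val - prev);
        }
    }
    return residual; /* can be compared against a threshold to stop the iteration */
}
```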
## Requirements

The requirements of this application are shown in the following lists. The main requirements are:

* **GNU Compiler Collection**.
* **OmpSs-2**: OmpSs-2 is the second generation of the OmpSs programming model. It is a task-based programming model that originated from the ideas of the OpenMP and StarSs programming models. The specification and user guide are available at https://pm.bsc.es/ompss-2-docs/spec/ and https://pm.bsc.es/ompss-2-docs/user-guide/, respectively. OmpSs-2 requires both the Mercurium and Nanos6 tools. Mercurium is a source-to-source compiler which provides the necessary support for transforming the high-level directives into a parallelized version of the application. The Nanos6 runtime system library provides the services to manage all the parallelism in the application (e.g., task creation, synchronization, scheduling, etc.). Both can be downloaded from https://github.com/bsc-pm.
* **MPI**: This application requires an MPI library supporting the multi-threading mode. It mainly targets the MPICH implementation; however, it should work with other libraries by adding the needed implementation-specific flags to the Makefile.

In addition, there are optional tools which enable the building of other application versions:

* **Task-Aware MPI (TAMPI)**: The Task-Aware MPI library provides the interoperability mechanism between MPI and OmpSs-2. All TAMPI releases and information are available at https://github.com/bsc-pm/tampi.
* **GPI-2**: The GPI-2 library implements the GASPI distributed programming model. All releases and information are available at http://www.gpi-site.com.

## Versions

The heat application has several versions which are compiled into different binaries by executing the `make` command. They are:

* **01.heat_seq.bin**: Sequential version of this benchmark.
* **02.heat_ompss.bin**: Parallel version with OmpSs-2. It divides the 2-D matrix into different 2-D blocks of consecutive elements, and a task is created to compute each block. Tasks declare fine-grained dependencies on the target block and its adjacent blocks.
* **03.heat_mpi.bin**: Parallel version using MPI. It divides the 2-D matrix horizontally, and each MPI process is responsible for computing its assigned rows.
* **04.heat_mpi_ompss_forkjoin.bin**: Parallel version using MPI + OmpSs-2. It uses a fork-join parallelization strategy, where computation phases are parallelized with tasks and communication phases are sequential.
* **05.heat_mpi_ompss_tasks.bin**: Parallel version using MPI + OmpSs-2 tasks. Both computation and communication are parallelized with tasks. However, communication tasks are serialized by declaring a dependency on a sentinel variable. This prevents deadlocks between processes, since communication tasks perform blocking MPI calls.
* **06.heat_mpi_ompss_tasks_interop.bin**: Parallel version using MPI + OmpSs-2 tasks + the TAMPI library. This version disables the artificial dependencies on the sentinel variable, so communication tasks can run fully in parallel. The TAMPI library manages the blocking MPI calls so that they do not block the underlying execution resources.
* **07.heat_mpi_ompss_tasks_interop_async.bin**: Parallel version using MPI + OmpSs-2 tasks + the TAMPI library. This version also disables the artificial dependencies on the sentinel variable, so communication tasks can run fully in parallel. These tasks do not call the blocking MPI procedures (MPI_Recv and MPI_Send) but their non-blocking counterparts (MPI_Irecv and MPI_Isend). The resulting MPI requests are bound to the completion of the communication task by calling the non-blocking TAMPI_Iwaitall function (or TAMPI_Iwait), and then the task can finish its execution. Once the bound requests complete, the communication task automatically completes, also releasing its data dependencies. A sketch of this pattern is shown after this list.
* **08.heat_mpi_nbuffer.bin**: Parallel version using MPI, but more elaborate than **03.heat_mpi.bin**. It exchanges the block boundaries as soon as possible. In addition, it tries to overlap computation and communication phases by using non-blocking MPI calls.
* **09.heat_ompss_residual.bin**: The same as **02.heat_ompss.bin** but stopping the execution once a residual threshold has been reached. It implements a simple yet effective mechanism to avoid closing the parallelism at each timestep.
* **10.heat_gaspi.bin**: Parallel version using GASPI. It divides the 2-D matrix horizontally, and each process is responsible for computing its assigned rows.
* **11.heat_gaspi_nbuffer.bin**: Parallel version using GASPI, but more elaborate than **10.heat_gaspi.bin**. It exchanges the block boundaries as soon as possible. In addition, it tries to overlap computation and communication phases.
* **12.heat_gaspi_ompss_forkjoin.bin**: Parallel version using GASPI + OmpSs-2. It uses a fork-join parallelization strategy, where computation phases are parallelized with tasks and communication phases are sequential.
* **13.heat_gaspi_ompss_tasks.bin**: Parallel version using GASPI + OmpSs-2 tasks. Both computation and communication are parallelized with tasks. However, communication tasks are serialized by declaring a dependency on a sentinel variable. This prevents deadlocks between processes, since communication tasks perform blocking GASPI calls.
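To illustrate the difference between the sentinel-serialized communication tasks (version 05) and the TAMPI-based ones (versions 06 and 07), the snippet below sketches a halo-receive task in both styles. It is only an illustration under assumptions, not code taken from the benchmark: it relies on the OmpSs-2 `#pragma oss task` directives and the `TAMPI.h` header as documented by their respective projects, while the function and variable names (`upper_halo`, `bsy`, `rank_up`, `sentinel`) are hypothetical, and initialization details (such as the MPI threading level required by TAMPI) are omitted.

```
#include <mpi.h>
#include <TAMPI.h>

/*
 * Version 05 style: the receive task declares an inout dependency on a
 * sentinel variable, so communication tasks run one after another and the
 * blocking MPI_Recv calls of different processes cannot deadlock each other.
 */
void recv_upper_halo_sentinel(double *upper_halo, int bsy, int rank_up, int *sentinel)
{
    #pragma oss task out(upper_halo[0;bsy]) inout(*sentinel)
    MPI_Recv(upper_halo, bsy, MPI_DOUBLE, rank_up, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

/*
 * Version 07 style: no sentinel dependency. The task posts a non-blocking
 * receive and binds the request to its own completion with TAMPI_Iwaitall,
 * so it can finish immediately; its data dependencies are released only once
 * the request has completed.
 */
void recv_upper_halo_tampi(double *upper_halo, int bsy, int rank_up)
{
    #pragma oss task out(upper_halo[0;bsy])
    {
        MPI_Request request;
        MPI_Irecv(upper_halo, bsy, MPI_DOUBLE, rank_up, 0, MPI_COMM_WORLD, &request);
        TAMPI_Iwaitall(1, &request, MPI_STATUSES_IGNORE);
    }
}
```

In version 06, the blocking MPI_Recv is issued inside the task without the sentinel dependency, and the TAMPI library prevents the call from blocking the underlying execution resource.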
The simplest way to compile this package is:

1. Stay in the Heat root directory to recursively build all the versions. The MPI + OmpSs-2 tasks + TAMPI versions are compiled only if the environment variable `TAMPI_HOME` is set to the installation directory of the Task-Aware MPI (TAMPI) library. Similarly, the GASPI versions are compiled only if the environment variable `GASPI_HOME` is set to the installation directory of the GPI-2 library.

2. Type `make` to compile the selected benchmark version(s). Optionally, you can use a different block size in each dimension (`BSX` and `BSY` for the vertical and horizontal dimensions, respectively) when building the benchmark (1024 by default). Type `make BSX=MY_BLOCK_SIZE_X BSY=MY_BLOCK_SIZE_Y` in order to change these values. If you want the same value in both dimensions, type `make BSX=MY_BLOCK_SIZE`.

3. In addition, you can type `make check` to check the correctness of the built versions. By default, the pure MPI version runs with 4 processes, and the hybrid versions run with 2 MPI processes and 2 hardware threads per process. You can change these parameters when executing `scripts/run-tests.sh` (see the available options by passing `-h`).

## Execution instructions

The binaries accept several options. The most relevant ones are the size of the matrix in each dimension, set with `-s`, and the number of timesteps, set with `-t`. More options can be seen by passing `-h`. An example of execution could be:

```
$ mpiexec -n 4 -bind-to hwthread:16 ./05.heat_mpi_ompss_tasks.bin -t 150 -s 8192
```

in which the application performs 150 timesteps with 4 MPI processes, each using 16 hardware threads (managed by the OmpSs-2 runtime). The size of the matrix in each dimension is 8192 (8192^2 elements in total), which means that each process computes 2048 x 8192 elements, i.e., 16 blocks per process with the default 1024 x 1024 block size.
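For reference, the decomposition arithmetic of the example above can be reproduced with a few lines of C. This helper is purely illustrative and is not part of the benchmark.

```
#include <stdio.h>

/* Reproduces the decomposition arithmetic of the example above:
 * 4 ranks, -s 8192, and the default 1024 x 1024 block size. */
int main(void)
{
    const long size = 8192;            /* -s: matrix size per dimension      */
    const long nranks = 4;             /* mpiexec -n 4                       */
    const long bsx = 1024, bsy = 1024; /* default BSX/BSY block sizes        */

    long rows_per_rank = size / nranks;                 /* 8192 / 4 = 2048   */
    long elems_per_rank = rows_per_rank * size;         /* 2048 * 8192       */
    long blocks_per_rank = (rows_per_rank / bsx) * (size / bsy); /* 2 * 8 = 16 */

    printf("%ld rows, %ld elements, %ld blocks per process\n",
           rows_per_rank, elems_per_rank, blocks_per_rank);
    return 0;
}
```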