# Heat Benchmark

## Description

The Heat simulation uses an iterative Gauss-Seidel method to solve the heat equation, which is a parabolic partial differential equation that describes the distribution of heat (or variation in temperature) in a given region over time. The heat equation is of fundamental importance in a wide range of scientific fields. In mathematics, it is the parabolic partial differential equation par excellence. In statistics, it is related to the study of Brownian motion. Also, the diffusion equation is a generic version of the heat equation, and it is related to the study of chemical diffusion processes.

## Requirements

The requirements of this application are shown in the following lists. The main requirements are:

* **GNU Compiler Collection**.
* **OmpSs-2**: OmpSs-2 is the second generation of the OmpSs programming model. It is a task-based programming model that originates from the ideas of the OpenMP and StarSs programming models. The specification and user guide are available at https://pm.bsc.es/ompss-2-docs/spec/ and https://pm.bsc.es/ompss-2-docs/user-guide/, respectively. OmpSs-2 requires both the Mercurium and Nanos6 tools. Mercurium is a source-to-source compiler which provides the necessary support for transforming the high-level directives into a parallelized version of the application. The Nanos6 runtime system library provides the services to manage all the parallelism in the application (e.g. task creation, synchronization, scheduling, etc.). Both can be downloaded from https://github.com/bsc-pm.
* **MPI**: This application requires an MPI library supporting the multi-threading mode. It mainly targets the MPICH implementation; however, it should work with other libraries by adding the needed implementation-specific flags in the Makefile.

In addition, there are optional tools which enable the building of other application versions:

* **Task-Aware MPI (TAMPI)**: The Task-Aware MPI library provides the interoperability mechanism for MPI and OmpSs-2. Please contact the library authors to get access to the TAMPI library.

## Versions

The heat application has several versions which are compiled into different binaries by executing the `make` command. They are:

* **01.heat_seq.bin**: Sequential version of this benchmark.
* **02.heat_ompss.bin**: Parallel version using OmpSs-2. It divides the 2-D matrix into different 2-D blocks of consecutive elements, and a task is created to compute each block. Tasks declare fine-grained dependencies on the target block and its adjacent blocks (a sketch of this scheme appears after this list).
* **03.heat_mpi.bin**: Parallel version using MPI. It divides the 2-D matrix horizontally, and each MPI process is responsible for computing its assigned rows.
* **04.heat_mpi_ompss_forkjoin.bin**: Parallel version using MPI + OmpSs-2. It uses a fork-join parallelization strategy, where computation phases are parallelized with tasks and communication phases are sequential.
* **05.heat_mpi_ompss_tasks.bin**: Parallel version using MPI + OmpSs-2 tasks. Both computation and communication are parallelized with tasks. However, communication tasks are serialized by declaring a dependency on a sentinel variable. This prevents deadlocks between processes, since communication tasks perform blocking MPI calls (see the communication sketch after this list).
* **06.heat_mpi_ompss_tasks_interop.bin**: Parallel version using MPI + OmpSs-2 tasks + the Interoperability library. This version drops the artificial dependency on the sentinel variable, so communication tasks can run fully in parallel. The Interoperability library is in charge of intercepting the blocking MPI calls to avoid blocking the underlying hardware thread.
* **07.heat_mpi_ompss_tasks_interop_async.bin**: Parallel version using MPI + OmpSs-2 tasks + the Interoperability library. This version also drops the artificial dependency on the sentinel variable, so communication tasks can run fully in parallel. These tasks do not call the blocking MPI procedures (MPI_Recv and MPI_Send) but their non-blocking counterparts (MPI_Irecv and MPI_Isend). The resulting MPI requests are linked to the release of the communication task's dependencies by calling the non-blocking TAMPI_Iwaitall function (or TAMPI_Iwait), after which the task can finish its execution. Once the linked requests complete, the dependencies of the task are released by the Interoperability library services.
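To make the blocked scheme of `02.heat_ompss.bin` concrete, below is a minimal sketch of one Gauss-Seidel sweep with one task per block, assuming an (N+2) x (N+2) grid whose outer ring holds the fixed boundary values. The sizes and the exact dependency clauses are illustrative and not copied from the benchmark sources; with Mercurium, such a file would be compiled with `mcc --ompss-2`.

```c
#include <stddef.h>

#define N   8192   /* interior points per dimension (option -s) */
#define BSX 1024   /* block height (make BSX=...) */
#define BSY 1024   /* block width  (make BSY=...) */

/* One Gauss-Seidel sweep: one task per BSX x BSY block. The inout
 * clause covers the target block, and the in clauses cover the rows
 * and columns owned by the adjacent blocks. */
void gauss_seidel_step(double M[N + 2][N + 2])
{
    for (size_t bx = 1; bx <= N; bx += BSX) {
        for (size_t by = 1; by <= N; by += BSY) {
            #pragma oss task inout(M[bx;BSX][by;BSY]) \
                             in(M[bx - 1;1][by;BSY])  \
                             in(M[bx + BSX;1][by;BSY])\
                             in(M[bx;BSX][by - 1;1])  \
                             in(M[bx;BSX][by + BSY;1])
            for (size_t i = bx; i < bx + BSX; ++i)
                for (size_t j = by; j < by + BSY; ++j)
                    /* each point averages its four neighbours, reusing
                     * values already updated in the current sweep */
                    M[i][j] = 0.25 * (M[i - 1][j] + M[i + 1][j]
                                    + M[i][j - 1] + M[i][j + 1]);
        }
    }
    /* wait for all block tasks before the next timestep */
    #pragma oss taskwait
}
```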
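The difference between the communication strategies of versions 05 and 07 can likewise be sketched in a few lines. The snippet below assumes a single neighbour exchange; the buffer and variable names (`send_row`, `recv_halo`, `rank_up`, `sentinel`) are hypothetical, and the header name `TAMPI.h` is assumed from the TAMPI distribution.

```c
#include <mpi.h>
#include <TAMPI.h>

/* Version 05 style: blocking calls inside tasks, serialized through a
 * sentinel dependency so the tasks of one process cannot deadlock with
 * the matching tasks of a neighbour process. */
void exchange_blocking(double *send_row, double *recv_halo, int count,
                       int rank_up, char *sentinel)
{
    #pragma oss task in(send_row[0;count]) inout(*sentinel)
    MPI_Send(send_row, count, MPI_DOUBLE, rank_up, 0, MPI_COMM_WORLD);

    #pragma oss task out(recv_halo[0;count]) inout(*sentinel)
    MPI_Recv(recv_halo, count, MPI_DOUBLE, rank_up, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
}

/* Version 07 style: non-blocking calls plus TAMPI. TAMPI_Iwaitall binds
 * the release of the task's dependencies to the completion of the MPI
 * requests, so the task body returns without ever blocking. */
void exchange_tampi(double *send_row, double *recv_halo, int count,
                    int rank_up)
{
    #pragma oss task in(send_row[0;count]) out(recv_halo[0;count])
    {
        MPI_Request reqs[2];
        MPI_Isend(send_row, count, MPI_DOUBLE, rank_up, 0,
                  MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(recv_halo, count, MPI_DOUBLE, rank_up, 0,
                  MPI_COMM_WORLD, &reqs[1]);
        TAMPI_Iwaitall(2, reqs, MPI_STATUSES_IGNORE);
    }
}
```

Version 06 sits between the two: its tasks look like `exchange_blocking` without the sentinel, relying on the Interoperability library to intercept the blocking calls so the underlying hardware thread is not stalled.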
## Compilation instructions

The simplest way to compile this package is:

1. Stay in the Heat root directory to recursively build all the versions. The Heat MPI + OmpSs-2 tasks + Interoperability library versions are compiled only if the environment variable `MPI_INTEROP_HOME` is set to the installation directory of the Task-Aware MPI (TAMPI) library.
2. Type `make` to compile the selected benchmark version(s). Optionally, you can use a different block size in each dimension (`BSX` and `BSY` for the vertical and horizontal dimensions, respectively) when building the benchmark (1024 by default). Type `make BSX=MY_BLOCK_SIZE_X BSY=MY_BLOCK_SIZE_Y` in order to change these values. If you want the same value in both dimensions, simply type `make BSX=MY_BLOCK_SIZE`. A complete build session is sketched at the end of this document.
3. In addition, you can type `make check` to check the correctness of the built versions. By default, the pure MPI version runs with 4 processes, and the hybrid versions run with 2 MPI processes and 2 hardware threads per process. You can change these parameters when executing `scripts/run-tests.sh` (see the available options by passing `-h`).

## Execution instructions

The binaries accept several options. The most relevant ones are the size of the matrix in each dimension (`-s`) and the number of timesteps (`-t`). More options can be seen by passing the `-h` option. An example execution could be:

```
$ mpiexec -n 4 -bind-to hwthread:16 ./05.heat_mpi_ompss_tasks.bin -t 150 -s 8192
```

in which the application performs 150 timesteps using 4 MPI processes, each with 16 hardware threads (used by the OmpSs-2 runtime). The size of the matrix in each dimension is 8192 (8192^2 elements in total), which means that each process computes 2048 rows of 8192 elements (16 blocks per process with the default 1024x1024 block size).
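As an end-to-end example combining the compilation and execution steps above (the TAMPI installation path shown is hypothetical):

```
$ export MPI_INTEROP_HOME=/opt/tampi   # hypothetical path; needed for versions 06 and 07
$ make BSX=512 BSY=512                 # 512x512 blocks instead of the default 1024x1024
$ make check                           # 4 MPI ranks (pure MPI), 2 ranks x 2 threads (hybrid)
$ mpiexec -n 4 -bind-to hwthread:16 ./05.heat_mpi_ompss_tasks.bin -t 150 -s 8192
```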