# Heat Benchmark
## Description
The Heat simulation uses an iterative Gauss-Seidel method to solve the heat equation,
which is a parabolic partial differential equation that describes the distribution of
heat (or variation in temperature) in a given region over time.
The heat equation is of fundamental importance in a wide range of scientific fields. In
mathematics, it is the parabolic partial differential equation par excellence. In statistics,
it is related to the study of Brownian motion. Moreover, the diffusion equation, a more general
version of the heat equation, arises in the study of chemical diffusion processes.
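The iterative Gauss-Seidel scheme updates every interior grid point in place, reusing the
neighbor values already updated earlier in the same sweep. Assuming the usual 5-point
finite-difference stencil (a simplification; the benchmark's exact discretization may differ),
the per-point update can be written as:
```
u^{(k+1)}_{i,j} = \frac{1}{4} \left( u^{(k+1)}_{i-1,j} + u^{(k+1)}_{i,j-1}
                                   + u^{(k)}_{i+1,j}   + u^{(k)}_{i,j+1} \right)
```
where the (k+1) superscripts on the left and upper neighbors denote values that were already
computed during the current sweep.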
## Requirements
The requirements of this application are listed below. The main requirements are:
* **GNU Compiler Collection**.
* **OmpSs-2**: OmpSs-2 is the second generation of the OmpSs programming model. It is a task-based
programming model that originates from the ideas of the OpenMP and StarSs programming models. The
specification and user-guide are available at https://pm.bsc.es/ompss-2-docs/spec/ and
https://pm.bsc.es/ompss-2-docs/user-guide/, respectively. OmpSs-2 requires both Mercurium and
Nanos6 tools. Mercurium is a source-to-source compiler which provides the necessary support for
transforming the high-level directives into a parallelized version of the application. The Nanos6
runtime system library provides the services to manage all the parallelism in the application (e.g. task
creation, synchronization, scheduling, etc). Both can be downloaded from https://github.com/bsc-pm.
* **MPI**: This application requires an MPI library supporting the multi-threading mode. It mainly targets
the MPICH implementation; however, it should also work with other MPI libraries by adding the necessary
implementation-specific flags to the Makefile.
In addition, there are optional tools that enable building other versions of the application:
* **Task-Aware MPI (TAMPI)**: The Task-Aware MPI library provides the interoperability mechanism for MPI
and OmpSs-2. All TAMPI releases and information are available at https://github.com/bsc-pm/tampi.
* **GPI-2**: The GPI-2 library implements the GASPI distributed programming model. This example requires
GPI-2 to be compiled with MPI support. All releases and information are available at http://www.gpi-site.com.
## Versions
The heat application has several versions, which are compiled into different
binaries by executing the `make` command. They are:
* **01.heat_seq.bin**: Sequential version of this benchmark.
* **02.heat_ompss.bin**: Parallel version with OmpSs-2. It divides the 2-D matrix into 2-D blocks of consecutive elements, and
a task is created to compute each block. Tasks declare fine-grained dependencies on the target block and its adjacent blocks
(see the task-decomposition sketch after this list).
* **03.heat_mpi.bin**: Parallel version using MPI. It divides the 2-D matrix horizontally, and each MPI process is responsible for
computing its assigned rows.
* **04.heat_mpi_ompss_forkjoin.bin**: Parallel version using MPI + OmpSs-2. It uses a fork-join parallelization strategy, where computation phases
are parallelized with tasks and communication phases are sequential.
* **05.heat_mpi_ompss_tasks.bin**: Parallel version using MPI + OmpSs-2 tasks. Both computation and communication are parallelized with tasks.
However, communication tasks are serialized by declaring a dependency on a sentinel variable (see the sentinel sketch after this list).
This prevents deadlocks between processes, since communication tasks perform blocking MPI calls.
* **06.heat_mpi_ompss_tasks_interop.bin**: Parallel version using MPI + OmpSs-2 tasks + the TAMPI library. This version removes the artificial
dependencies on the sentinel variable, so communication tasks can run fully in parallel. The TAMPI library is in charge of managing the
blocking MPI calls so that they do not block the underlying execution resources.
* **07.heat_mpi_ompss_tasks_interop_async.bin**: Parallel version using MPI + OmpSs-2 tasks + the TAMPI library. This version also removes the
artificial dependencies on the sentinel variable, so communication tasks can run fully in parallel. These tasks do not call blocking
MPI procedures (MPI_Recv & MPI_Send) but their non-blocking counterparts (MPI_Irecv & MPI_Isend). The resulting MPI requests are bound
to the completion of the communication task by calling the non-blocking TAMPI_Iwaitall function (or TAMPI_Iwait), after which the task
can finish its execution. Once the bound requests complete, the communication task automatically becomes fully complete, also releasing its
data dependencies (see the TAMPI sketch after this list).
* **08.heat_mpi_nbuffer.bin**: Parallel version using MPI, but it is a more elaborate version than **03.heat_mpi.bin**. It exchanges the
block boundaries as soon as possible. In addition, it tries to overlap computation and communication phases by using non-blocking MPI
calls.
* **09.heat_ompss_residual.bin**: The same version as **02.heat_ompss.bin** but stopping the execution once a residual threshold has been
reached. It implements a simple yet effective mechanism to avoid closing the parallelism at every timestep.
* **10.heat_gaspi.bin**: Parallel version using GASPI. It divides the 2-D matrix horizontally, and each process is responsible for computing
its assigned rows.
* **11.heat_gaspi_nbuffer.bin**: Parallel version using GASPI, but it is a more elaborate version than **10.heat_gaspi.bin**. It exchanges the
block boundaries as soon as possible. In addition, it tries to overlap computation and communication phases.
* **12.heat_gaspi_ompss_forkjoin.bin**: Parallel version using GASPI + OmpSs-2. It uses a fork-join parallelization strategy, where computation
phases are parallelized with tasks and communication phases are sequential.
* **13.heat_gaspi_ompss_tasks.bin**: Parallel version using GASPI + OmpSs-2 tasks. Both computation and communication are parallelized with
tasks. However, communication tasks are serialized by declaring a dependency on a sentinel variable. This prevents deadlocks between
processes, since communication tasks perform blocking GASPI calls.
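To make the task-based decomposition of **02.heat_ompss.bin** more concrete, the following minimal
sketch shows how one Gauss-Seidel sweep could create a task per block with dependencies on the block
itself and its four neighbors. It is only an illustration: `NB`, `BS`, `gaussSeidelSweep`, and
`solveBlock` are hypothetical names, and the benchmark's actual data layout and boundary handling may differ.
```
#define NB 8     /* blocks per dimension (hypothetical value) */
#define BS 1024  /* block size, matching the default BSX/BSY  */

/* Hypothetical per-block kernel: updates one BS x BS block in place. */
void solveBlock(double M[NB + 2][NB + 2][BS][BS], int bi, int bj);

/* One Gauss-Seidel sweep over the interior blocks (the matrix is assumed to be
 * padded with one halo block on each side). Each task writes its own block and
 * reads the four adjacent blocks, so the Nanos6 runtime can run independent
 * blocks in parallel and pipeline the diagonal wavefront across the matrix. */
void gaussSeidelSweep(double M[NB + 2][NB + 2][BS][BS])
{
    for (int bi = 1; bi <= NB; ++bi) {
        for (int bj = 1; bj <= NB; ++bj) {
            #pragma oss task inout(M[bi][bj]) \
                in(M[bi - 1][bj], M[bi + 1][bj], M[bi][bj - 1], M[bi][bj + 1])
            solveBlock(M, bi, bj);
        }
    }
}
```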
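The sentinel-based serialization used by **05.heat_mpi_ompss_tasks.bin** (and its GASPI counterpart)
can be sketched as follows. The variable and function names (`serial`, `exchangeBoundaries`, `up`,
`down`) are hypothetical; only the pattern of a shared `inout` dependency around blocking calls
reflects the description above.
```
#include <mpi.h>

int serial;  /* sentinel: carries no data, only a dependency */

/* Sketch: every communication task declares an inout dependency on the same
 * sentinel, so the tasks run one at a time and in creation order. This
 * prevents two processes from deadlocking on blocking MPI calls issued from
 * concurrently running tasks. */
void exchangeBoundaries(double *upperRow, double *lowerRow, int n,
                        int up, int down)
{
    #pragma oss task inout(serial) out(upperRow[0;n])
    MPI_Recv(upperRow, n, MPI_DOUBLE, up, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    #pragma oss task inout(serial) in(lowerRow[0;n])
    MPI_Send(lowerRow, n, MPI_DOUBLE, down, 0, MPI_COMM_WORLD);
}
```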
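For contrast, the TAMPI-based pattern of **07.heat_mpi_ompss_tasks_interop_async.bin** can be
sketched as below: the task posts non-blocking MPI operations and binds the resulting requests to
its own completion with TAMPI_Iwaitall, so no sentinel dependency is needed and communication tasks
run fully in parallel. Buffer and rank names are again hypothetical.
```
#include <mpi.h>
#include <TAMPI.h>

/* Sketch: the task body returns right after posting the operations, and the
 * task only completes (releasing its data dependencies) once the bound MPI
 * requests have finished. */
void exchangeBoundariesAsync(double *upperRow, double *lowerRow, int n,
                             int up, int down)
{
    #pragma oss task out(upperRow[0;n]) in(lowerRow[0;n])
    {
        MPI_Request reqs[2];
        MPI_Irecv(upperRow, n, MPI_DOUBLE, up, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(lowerRow, n, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &reqs[1]);
        TAMPI_Iwaitall(2, reqs, MPI_STATUSES_IGNORE);
    }
}
```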
## Building instructions
The simplest way to compile this package is:
1. Stay in the Heat root directory to recursively build all the versions.
The Heat MPI + OmpSs-2 tasks + TAMPI library versions are compiled
only if the environment variable `TAMPI_HOME` is set to the
Task-Aware MPI (TAMPI) library's installation directory. Similarly,
GASPI versions are only compiled if the environment variable
`GASPI_HOME` is set to the GPI-2 library's installation directory.
2. Type `make` to compile the selected benchmark version(s).
Optionally, you can use a different block size in each dimension
(BSX and BSY for the vertical and horizontal dimensions, respectively)
when building the benchmark (the default is 1024). Type
`make BSX=MY_BLOCK_SIZE_X BSY=MY_BLOCK_SIZE_Y` to change these
values. If you want the same value in both dimensions, type
`make BSX=MY_BLOCK_SIZE`.
3. In addition, you can type `make check` to check the correctness
of the built versions. By default, the pure MPI version runs with
4 processes and the hybrid versions run with 2 MPI processes and 2
hardware threads for each process. You can change these parameters
when executing `scripts/run-tests.sh` (see the available options
passing `-h`).
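Putting these steps together, a possible build and check sequence that also enables the TAMPI and
GASPI versions could look like the following (the installation paths are placeholders):
```
$ export TAMPI_HOME=/path/to/tampi-install
$ export GASPI_HOME=/path/to/gpi2-install
$ make BSX=1024 BSY=1024
$ make check
```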
## Execution instructions
The binaries accept several options. The most relevant ones are the size of
the matrix in each dimension, set with `-s`, and the number of timesteps, set
with `-t`. More options can be seen by passing the `-h` option. An example
execution could be:
```
$ mpiexec -n 4 -bind-to hwthread:16 ./05.heat_mpi_ompss_tasks.bin -t 150 -s 8192
```
in which the application will perform 150 timesteps using 4 MPI processes, each
with 16 hardware threads (used by the OmpSs-2 runtime). The size of the matrix
in each dimension will be 8192 (8192^2 elements in total), which means that each
process will compute 2048 rows of 8192 elements (i.e., 16 blocks per process with
the default 1024x1024 block size).