05-IFSKER
=========

The IFSKer code is a representative kernel extracted from the OpenIFS application, an easy-to-use version of the IFS (Integrated Forecasting System) application. OpenIFS provides the same forecast capability as IFS but without the data assimilation system. The original application is parallelized using the Message Passing Interface (MPI).

.. highlight:: fortran

This exercise is split into two phases. In the first one, the student has to analyse the performance of the pure MPI parallelization approach. This code is included in the ``mpi`` directory and it is already prepared with a configuration file and the corresponding Makefile. To carry out the study, we first need to get familiar with certain parts of the code. It is highly recommended to pay special attention to the iterative loop of the simulation (pseudo-code)::

    do nstep = 1, nstop
       ! Verbose outputs

       ! 3D variables:
       if (nstep < 3) then
          ...
       else
          ...
       endif

       ! Calling dummy physics (GP):
       do ...
          call physics
       enddo

       ! Prepare for direct transforms (FFTs), 4 stages:
       !   unblock, pack, comms, unpack

       ! Compute the pseudo FFTs

       ! Return to GP space, 4 stages:
       !   pack, comms, unpack, block
    enddo ! nstep

    call MPI_Barrier(MPI_COMM_WORLD, ierr)

After this initial performance analysis, in a second phase, we can start planning the proof of concept to improve the program's behaviour. Given the contents of this tutorial, the only restrictions imposed when solving the exercise are 1) to use a second level of parallelization based on tasks; and 2) to consider one of the two techniques presented during the lecture (i.e., TAMPI or DLB). Additionally, the student may also propose new extensions that further improve the execution of the code.

The deliverables of this practice will consist of:

* The modified source code files, configuration and build files (if changed).
* A report describing the work done: results of the initial analysis, discussion of the results, presentation of the proof of concept, comments on the changes made to the code, impact of these changes on the performance, and a conclusions section.

If you choose to follow the TAMPI approach, you can try the parallelization with OpenMP or with OmpSs-2. The subfolders `tampi-openmp` and `tampi-ompss2` hold the necessary auxiliary files and configuration scripts. The objective is to apply the ideas of TAMPI that you have learned in the previous exercises: taskifying computations and communications while avoiding unnecessary serializations and, finally, leveraging the TAMPI library (a minimal sketch of this pattern is shown below). Although you are probably not very familiar with OmpSs-2, this programming model will facilitate the taskification of the code thanks to some of its distinguishing features, such as linear region dependencies and multi-dependencies!

You will also find a directory prepared to use DLB with IFSKER, named `dlb-openmp`. It contains the DLB configure.sh file, the Makefile and the initial MPI implementation, ready to be modified. You will need to remember what you did with LULESH and place the DLB API calls needed to make it work. Remember that it is important to have multiple parallel regions when using DLB, so you are not aiming for just one big parallel region! A sketch of this structure is also given below.
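As a hint for the TAMPI path, here is a minimal, non-prescriptive sketch of the pattern: each communication (or a slice of it) becomes a task with the proper data dependencies, and the blocking MPI calls inside the tasks become task-aware once MPI is initialized with the ``MPI_TASK_MULTIPLE`` threading level provided by TAMPI. The subroutine, buffers and message sizes below are purely illustrative, and the exact Fortran header or module exposing the TAMPI symbols depends on your installation::

    ! Illustrative fragment, not the solution. It assumes MPI was initialized
    ! with call MPI_Init_thread(MPI_TASK_MULTIPLE, provided, ierr) using the
    ! TAMPI Fortran interface, and that the tasks below are created inside a
    ! parallel/single region.
    subroutine taskified_exchange(sendbuf, recvbuf, n, peer, comm)
       use mpi
       implicit none
       integer, intent(in) :: n, peer, comm
       double precision, intent(in)  :: sendbuf(n)
       double precision, intent(out) :: recvbuf(n)
       integer :: ierr, stat(MPI_STATUS_SIZE)

       ! With TAMPI's blocking mode, MPI_Send/MPI_Recv suspend the calling
       ! task instead of blocking the thread, so other ready tasks (e.g. the
       ! dummy physics) can run in the meantime.
       !$omp task depend(in: sendbuf) shared(sendbuf) firstprivate(n, peer, comm) private(ierr)
       call MPI_Send(sendbuf, n, MPI_DOUBLE_PRECISION, peer, 0, comm, ierr)
       !$omp end task

       !$omp task depend(out: recvbuf) shared(recvbuf) firstprivate(n, peer, comm) private(ierr, stat)
       call MPI_Recv(recvbuf, n, MPI_DOUBLE_PRECISION, peer, 0, comm, stat, ierr)
       !$omp end task
    end subroutine taskified_exchange

For the DLB path, the sketch below only illustrates the kind of structure you are after: the loop bounds, the ``physics``/``pseudo_fft``/``pack_and_exchange`` helpers are hypothetical names, and the exact Fortran bindings of the DLB calls should be checked against the interfaces shipped with your DLB installation (e.g. ``dlbf.h``). The key point is to keep several parallel regions per time step, so that DLB can resize the thread team between them, borrowing CPUs that other ranks lend while they wait in MPI::

    ! Illustrative fragment, not the solution. It assumes the DLB Fortran
    ! interfaces expose the same entry points as the C API
    ! (DLB_Init / DLB_Borrow / DLB_Lend / DLB_Finalize).
    do nstep = 1, nstop

       err = DLB_Borrow()        ! try to pick up CPUs lent by other ranks
       !$omp parallel do
       do i = 1, npoints
          call physics(i)        ! dummy physics in GP space (hypothetical helper)
       enddo
       !$omp end parallel do

       err = DLB_Lend()          ! give our idle CPUs away before communicating
       call pack_and_exchange()  ! pack + comms + unpack (hypothetical helper)

       err = DLB_Borrow()
       !$omp parallel do
       do i = 1, nffts
          call pseudo_fft(i)     ! pseudo FFTs (hypothetical helper)
       enddo
       !$omp end parallel do

    enddo ! nstep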
Note that each subfolder contains the original pure MPI version `ifsker.mpi.f90`. In order to compile your hybrid version, you will have to copy that file and rename it according to the corresponding target of the Makefile. You will also have to uncomment that target in the Makefile so that your program actually gets built. For instance, in `tampi-openmp`, you will have to copy the original file to `ifsker.mpi.omp.tampi.f90` and then uncomment the corresponding target in the `PROGRAMS` variable of the Makefile. Additionally, each subfolder contains job scripts for both the pure MPI version and your hybrid version.

.. include:: tampi-ompss2/README.rst