05-IFSKER
=========

The IFSKer code is a representative kernel extracted from the OpenIFS application, an easy-to-use version of the IFS (Integrated Forecasting System) application. OpenIFS provides the same forecast capability as IFS but without the data assimilation system. The original application is parallelized using the Message Passing Interface (MPI).

.. highlight:: fortran

This exercise is split into two phases. In the first one, the student has to analyse the performance of the pure MPI parallelization approach. This code is included in the ``mpi`` directory and it is already prepared with a configuration file and the corresponding Makefile. To carry out the study, we first need to get familiar with certain parts of the code. It is highly recommended to pay special attention to the iterative loop of the simulation (pseudo-code)::

    do nstep = 1, nstop
       ! Verbose outputs

       ! 3D variables:
       if (nstep < 3) then
          ...
       else
          ...
       endif

       ! Calling dummy physics (GP):
       do ...
          call physics
       enddo

       ! Prepare for direct transforms (FFTs), 4 stages:
       !   unblock, pack, comms, unpack

       ! Compute the pseudo FFTs

       ! Return to GP space, 4 stages:
       !   pack, comms, unpack, block
    enddo ! nstep

    call MPI_Barrier(MPI_COMM_WORLD, ierr)

After this initial performance analysis, in a second phase, we can start planning the proof of concept to improve the program's behaviour. Given the contents of this tutorial, the only restrictions imposed when solving the exercise are 1) to use a second level of parallelization based on tasks; and 2) to consider one of the two techniques presented during the lecture (i.e., TAMPI or DLB). Additionally, the student may also propose new extensions that further improve the execution of the code.

The deliverables of this practice will consist of:

* The modified source code files, configuration and build files (if changed).
* A report describing the work done: results of the initial analysis, discussion of the results, presentation of the proof of concept, comments on the changes made to the code, impact of these changes on the performance, and a conclusions section.

If you choose to follow the TAMPI approach, you can try the parallelization with OpenMP or with OmpSs-2. The subfolders `tampi-openmp` and `tampi-ompss2` hold the necessary auxiliary files and configuration scripts. The objective is to apply the ideas of TAMPI that you have learned in the previous exercises: taskifying computations and communications while avoiding unnecessary serializations and, finally, leveraging the TAMPI library (a minimal sketch of this pattern is shown below). Although you are probably not very familiar with OmpSs-2, this programming model will facilitate the taskification of the code thanks to some of its distinguishing features, such as linear region dependencies and multi-dependencies!

You will also find a directory prepared to use DLB with IFSKER, named `dlb-openmp`. It contains the DLB configure.sh file, the Makefile and the initial MPI implementation, ready to be modified. You will need to remember what you did with LULESH and place the DLB API calls needed to make it work. Remember that it is important to have multiple parallel regions when using DLB, so you are not aiming for just one big parallel region! A sketch of this structure is also given below.
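As a hint for the TAMPI path, here is a minimal, non-prescriptive sketch of the pattern: each communication (or a slice of it) becomes a task with the proper data dependencies, and the blocking MPI calls inside the tasks become task-aware once MPI is initialized with the ``MPI_TASK_MULTIPLE`` threading level provided by TAMPI. The subroutine, buffers and message sizes below are purely illustrative, and the exact Fortran header or module exposing the TAMPI symbols depends on your installation::

    ! Illustrative fragment, not the solution. It assumes MPI was initialized
    ! with call MPI_Init_thread(MPI_TASK_MULTIPLE, provided, ierr) using the
    ! TAMPI Fortran interface, and that the tasks below are created inside a
    ! parallel/single region.
    subroutine taskified_exchange(sendbuf, recvbuf, n, peer, comm)
       use mpi
       implicit none
       integer, intent(in) :: n, peer, comm
       double precision, intent(in)  :: sendbuf(n)
       double precision, intent(out) :: recvbuf(n)
       integer :: ierr, stat(MPI_STATUS_SIZE)

       ! With TAMPI's blocking mode, MPI_Send/MPI_Recv suspend the calling
       ! task instead of blocking the thread, so other ready tasks (e.g. the
       ! dummy physics) can run in the meantime.
       !$omp task depend(in: sendbuf) shared(sendbuf) firstprivate(n, peer, comm) private(ierr)
       call MPI_Send(sendbuf, n, MPI_DOUBLE_PRECISION, peer, 0, comm, ierr)
       !$omp end task

       !$omp task depend(out: recvbuf) shared(recvbuf) firstprivate(n, peer, comm) private(ierr, stat)
       call MPI_Recv(recvbuf, n, MPI_DOUBLE_PRECISION, peer, 0, comm, stat, ierr)
       !$omp end task
    end subroutine taskified_exchange

For the DLB path, the sketch below only illustrates the kind of structure you are after: the loop bounds, the ``physics``/``pseudo_fft``/``pack_and_exchange`` helpers are hypothetical names, and the exact Fortran bindings of the DLB calls should be checked against the interfaces shipped with your DLB installation (e.g. ``dlbf.h``). The key point is to keep several parallel regions per time step, so that DLB can resize the thread team between them, borrowing CPUs that other ranks lend while they wait in MPI::

    ! Illustrative fragment, not the solution. It assumes the DLB Fortran
    ! interfaces expose the same entry points as the C API
    ! (DLB_Init / DLB_Borrow / DLB_Lend / DLB_Finalize).
    do nstep = 1, nstop

       err = DLB_Borrow()        ! try to pick up CPUs lent by other ranks
       !$omp parallel do
       do i = 1, npoints
          call physics(i)        ! dummy physics in GP space (hypothetical helper)
       enddo
       !$omp end parallel do

       err = DLB_Lend()          ! give our idle CPUs away before communicating
       call pack_and_exchange()  ! pack + comms + unpack (hypothetical helper)

       err = DLB_Borrow()
       !$omp parallel do
       do i = 1, nffts
          call pseudo_fft(i)     ! pseudo FFTs (hypothetical helper)
       enddo
       !$omp end parallel do

    enddo ! nstep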
Note that each subfolder contains the original pure MPI version `ifsker.mpi.f90`. In order to compile your hybrid version, you will have to copy that file and rename it according to the corresponding target of the Makefile. You will also have to uncomment that target in the Makefile so that your program actually gets built. For instance, in `tampi-openmp`, you will have to copy the original file to `ifsker.mpi.omp.tampi.f90` and then uncomment the corresponding target in the `PROGRAMS` variable of the Makefile. Additionally, each subfolder contains job scripts for both the pure MPI version and your hybrid version.

.. include:: tampi-ompss2/README.rst