# 2.1. Matrix multiplication

This benchmark performs the matrix multiplication C = AB, where A has size NxM, B has size MxP, and the resulting matrix C has size NxP. The kernel uses a blocked version of the matrix multiplication algorithm: three nested loops iterate over blocks following the matrix multiplication index pattern, and the loop body performs the inner (block-level) matrix multiplication.

The parallelization approach uses the `task` directive to express parallelism and the `taskwait` directive to guarantee the completion of previously created tasks:

```
for (long k = 0; k < NB_K; k++) {
    for (long i = 0; i < NB_I; i++) {
        #pragma oss task label(matmul block)
        for (long j = 0; j < NB_J; j++) {
            double (* restrict A_block)[TS] = A[i][k];
            double (* restrict B_block)[TS] = B[k][j];
            double (* restrict C_block)[TS] = C[i][j];
            for (long ii = 0; ii < TS; ii++) {
                for (long jj = 0; jj < TS; jj++) {
                    for (long kk = 0; kk < TS; kk++) {
                        C_block[ii][jj] += A_block[ii][kk] * B_block[kk][jj];
                    }
                }
            }
        }
    }
    #pragma oss taskwait
}
```

The algorithm creates one task per `i` iteration, containing the whole `j` loop (i.e., the `i` loop body). After all the `i` iterations, the code uses a `taskwait` directive, which guarantees the completion of all previously instantiated tasks before continuing with the program execution. The algorithm therefore opens and closes the parallelism once per `k` iteration.

## 2.1.1. Goals of this exercise

- The code is completely annotated: you DON’T need to modify it. You only need to compile and execute it.
- Review source code and check the different directives and their clauses. Try to understand what they mean.
- Pay special attention to data-sharing attributes, including those that do not appear (i.e., the implicit ones).
- Check several runtime options when executing the program (versions, schedulers, etc.).
- Check scalability. Execute the program using different numbers of CPUs and compute the speed-up.
- Change program arguments that may have an impact on task granularity (block size, tile size, etc.).
- Change program arguments that may have an impact on the number of tasks (matrix sizes and/or block/tile sizes).
- Get different Paraver traces using different runtime options or program arguments and compare them.
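For the scalability goal above, the speed-up on P cores is T(1) / T(P). The sketch below is one way to run such a sweep; it assumes a Linux system where `taskset` controls CPU binding and that the benchmark reports its elapsed time (adapt both to your environment — the binary name, arguments, and timings shown are illustrative only).

```shell
#!/bin/sh
# Example sweep (commented out; ./matmul arguments are illustrative):
#   for p in 1 2 4 8 16; do
#       taskset -c 0-$((p - 1)) ./matmul 4096 4096 4096 256
#   done

# Speed-up on P cores is T(1) / T(P), where T(1) is the single-core time.
compute_speedup() {
    # $1 = single-core time, $2 = P-core time (both in seconds)
    awk -v t1="$1" -v tp="$2" 'BEGIN { printf "%.2f\n", t1 / tp }'
}

# Hypothetical timings, for illustration only:
compute_speedup 80.0 10.5   # prints 7.62
```

Ideal speed-up equals the core count; values well below it usually point at too few tasks or too coarse a granularity, which the program arguments below let you adjust.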

## 2.1.2. Execution instructions

You should run the program as:

```
./matmul N M P TS
```

where:

- `N` is the number of rows of the matrix A.
- `M` is the number of columns of the matrix A and the number of rows of the matrix B.
- `P` is the number of columns of the matrix B.
- The matrix multiplication operation will be applied in blocks that contain `TSxTS` elements.