# 2.1. Matrix multiplication¶

This benchmark runs the matrix multiplication operation C = AB, where A has size NxM, B has size MxP, and the resulting matrix C has size NxP. The kernel uses a blocked version of the matrix multiplication algorithm: three nested loops follow the usual matrix multiplication index pattern over blocks, and the loop body performs the inner (block-level) multiplication.

The parallelization approach uses the `task` directive to express parallelism and the `taskwait` directive to guarantee the completion of previously created tasks:

```
for (long k = 0; k < NB_K; k++) {
    for (long i = 0; i < NB_I; i++) {
        #pragma oss task label("matmul block")
        for (long j = 0; j < NB_J; j++) {
            double (* restrict A_block)[TS] = A[i][k];
            double (* restrict B_block)[TS] = B[k][j];
            double (* restrict C_block)[TS] = C[i][j];

            for (long ii = 0; ii < TS; ii++) {
                for (long jj = 0; jj < TS; jj++) {
                    for (long kk = 0; kk < TS; kk++) {
                        C_block[ii][jj] += A_block[ii][kk] * B_block[kk][jj];
                    }
                }
            }
        }
    }
    #pragma oss taskwait
}
```

The algorithm creates one task per ‘i’ iteration, containing the whole ‘j’ loop (i.e., the ‘i’ loop body). After all the ‘i’ iterations, the code uses a `taskwait` directive, which guarantees the completion of all previously instantiated tasks before continuing with the program execution. The algorithm therefore opens and closes the parallelism once per ‘k’ iteration.

## 2.1.1. Goals of this exercise¶

• Code is completely annotated: you DON’T need to modify it. You only need to compile and execute.

• Review the source code and check the different directives and their clauses. Try to understand what they mean.

• Pay special attention to data-sharing attributes, including those that do not appear (i.e., the implicit ones).

• Check several runtime options when executing the program (versions, schedulers, etc.).

• Check scalability. Execute the program using different numbers of CPUs and compute the speed-up.

• Change program arguments that may have an impact on task granularity (block size, tile size, etc.).

• Change program arguments that may have an impact on the number of tasks (matrix sizes and/or block/tile sizes).

• Get different Paraver traces using different runtime options or program arguments and compare them.
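When experimenting with the granularity- and task-count-related arguments above, a back-of-the-envelope calculation helps predict what a given run will produce. This Python sketch (the function and variable names are my own, not part of the benchmark) counts the tasks created and the floating-point operations each one performs, assuming `TS` divides the matrix sizes evenly:

```python
def task_stats(N, M, P, TS):
    """Task count and per-task work for the blocked matmul pattern."""
    nb_i = N // TS          # row blocks of A and C
    nb_j = P // TS          # column blocks of B and C
    nb_k = M // TS          # inner-dimension blocks
    tasks = nb_k * nb_i     # one task per (k, i) pair
    # Each task multiplies nb_j block pairs; each TSxTS block
    # multiplication costs 2 * TS^3 flops (multiply + add).
    flops_per_task = nb_j * 2 * TS**3
    return tasks, flops_per_task

print(task_stats(4096, 4096, 4096, 256))
```

Note the trade-off: doubling `TS` cuts the task count by 4x while making each task 4x heavier, so the total work is unchanged but scheduling overhead and load balance change.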

## 2.1.2. Execution instructions¶

You should run the program as:

```
./matmul N M P TS
```

where:

• `N` is the number of rows of the matrix A.

• `M` is the number of columns of the matrix A and the number of rows of the matrix B.

• `P` is the number of columns of the matrix B.

• `TS` is the block (tile) size: the matrix multiplication is applied in blocks of `TSxTS` elements.