# 2.5. Matrix multiplication (with weak dependences)¶

This benchmark runs a matrix multiplication operation C = AB, where A has size NxM, B has size MxP, and the resulting matrix C has size NxP. The kernel uses a blocking version of the matrix multiplication algorithm with three nested loops chasing the matrix multiplication index pattern; the loop body implements the inner/block matrix multiplication:

```for (long k = 0; k < NB_K; k++) {
for (long i = 0; i < NB_I; i++) {
#pragma oss task label(matmul block)
for (long j = 0; j < NB_J; j++) {
double (* restrict A_block)[TS] = A[i][k];
double (* restrict B_block)[TS] = B[k][j];
double (* restrict C_block)[TS] = C[i][j];

for (long ii = 0; ii < TS; ii++) {
for (long jj = 0; jj < TS; jj++) {
for (long kk = 0; kk < TS; kk++) {
C_block[ii][jj] += A_block[ii][kk] * B_block[kk][jj];
}
}
}
}
}
#pragma oss taskwait
}
```

The parallelization creates a task containing the ‘j’ loop (i.e. the ‘i’ loop body). After all the ‘i’ iterations the code executes a taskwait directive which guarantees the execution of all the previous instantiated tasks before continuing. The algorithm opens and closes the parallelism as many times as ‘k’ iterations.

## 2.5.1. Goals of this exercise¶

• Code is completely annotated: you DON’T need to modify it. You only need to compile and execute.
• Review source code and check the different directives and their clauses. Try to understand what they mean.
• Pay special attention to data-sharing attributes, including those that do not appear (i.e., the implicit ones).
• Check several runtime options when executing the program (versions, schedulers, etc.).
• Check scalability. Execute the program using different numbers of cpus and compute the speed-up.
• Change program arguments that may have an impact on task granularity (block size, tile size, etc.).
• Change program arguments that may have an impact on the number of tasks (matrix sizes and/or block/tile sizes).
• Get different paraver traces using different runtime options or program arguments and compare them.

## 2.5.2. Execution instructions¶

You should run the program as:

```./matmul N M P TS
```

where:

• `N` is the number of rows of the matrix A.
• `M` is the number of columns of the matrix A and the number of rows of the matrix B.
• `P` is the number of columns of the matrix B.
• The matrix multiplication operation will be applied in blocks that contain `TSxTS` elements.