
# DO NOT MAKE VISIBLE: Compiler optimizations make sequential version faster than any parallel version

# Matrix multiplication benchmark

## Description
This benchmark runs a matrix multiplication operation C = AB, where A has size
N✕M, B has size M✕P, and the resulting matrix C has size N✕P.
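For reference, here is a minimal C++ sketch of the naive C = AB computation, where each C(i, j) is the sum over k of A(i, k) · B(k, j). It makes two assumptions that the benchmark does not state. First, matrices are stored row-major in `std::vector<double>`, so element (r, c) of an R✕S matrix is at index `r * S + c`. Second, `matmul_reference` is a hypothetical name, and this code is not taken from *01.matmul_seq*.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical naive reference: returns C = A * B.
// Row-major storage: A is N x M, B is M x P, C is N x P.
std::vector<double> matmul_reference(const std::vector<double>& A,
                                     const std::vector<double>& B,
                                     std::size_t N, std::size_t M, std::size_t P) {
    assert(A.size() == N * M && B.size() == M * P);
    std::vector<double> C(N * P, 0.0);
    // i-k-j loop order: the innermost loop walks rows of B and C contiguously.
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t k = 0; k < M; ++k) {
            const double a = A[i * M + k];  // reused for the whole j loop
            for (std::size_t j = 0; j < P; ++j)
                C[i * P + j] += a * B[k * P + j];
        }
    return C;
}
```

The i-k-j ordering computes the same sums as the textbook i-j-k ordering. It is used here only because it keeps memory accesses contiguous in row-major storage.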

## Versions
We have several implementations of this benchmark:

* **01.matmul_seq:** sequential version of this benchmark.

* **02.matmul_task+dep:** parallel version that uses the *task* construct for
parallel execution and fine-grained dependences to synchronize the
operations. It is derived from the sequential version, which was modified with
blocking so that each task operates on a submatrix.

* **03.matmul_task+reduction.cpp:** This version is similar to
*02.matmul_task+dep.cpp*, but it uses the *reduction* clause instead of the
*inout* clause. As a result, the tasks that accumulate into the same block of C
can run concurrently instead of one after another.
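The blocked, dependence-driven scheme can be sketched as follows. This is an illustration under assumptions, not the benchmark's code. It uses standard OpenMP `depend` syntax, and the benchmark's own clause spelling (the README refers to *inout* and *reduction* clauses) may differ. The names `block_kernel` and `matmul_blocked_tasks` are hypothetical, storage is assumed row-major, and the first element of each C block stands in for the whole block in the dependence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: C[i0:i1, j0:j1] += A[i0:i1, k0:k1] * B[k0:k1, j0:j1].
// Row-major storage; A is N x M, B is M x P, C is N x P.
static void block_kernel(const double* A, const double* B, double* C,
                         int M, int P,
                         int i0, int i1, int j0, int j1, int k0, int k1) {
    for (int i = i0; i < i1; ++i)
        for (int k = k0; k < k1; ++k) {
            const double a = A[i * M + k];
            for (int j = j0; j < j1; ++j)
                C[i * P + j] += a * B[k * P + j];
        }
}

// Hypothetical blocked C = A * B where each (i, j, k) block product is a task.
// Tasks updating the same C block (same i, j, different k) share an inout
// dependence on that block's first element, so they run in creation order;
// tasks on different C blocks may run in parallel.
std::vector<double> matmul_blocked_tasks(const std::vector<double>& A,
                                         const std::vector<double>& B,
                                         int N, int M, int P, int BS) {
    assert(BS > 0);
    assert(A.size() == static_cast<std::size_t>(N) * M);
    assert(B.size() == static_cast<std::size_t>(M) * P);
    std::vector<double> C(static_cast<std::size_t>(N) * P, 0.0);
    const double* a = A.data();
    const double* b = B.data();
    double* c = C.data();

    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < N; i += BS)
        for (int j = 0; j < P; j += BS)
            for (int k = 0; k < M; k += BS) {
                // std::min clips the last block when a size is not a multiple of BS.
                #pragma omp task firstprivate(i, j, k) depend(inout: c[i * P + j])
                block_kernel(a, b, c, M, P,
                             i, std::min(i + BS, N),
                             j, std::min(j + BS, P),
                             k, std::min(k + BS, M));
            }
    // All tasks are complete at the implicit barrier ending the parallel region.
    return C;
}
```

A and B are only read, so the sketch omits `in` dependences on them. If the code is compiled without OpenMP support, the pragmas are ignored and the same loops run sequentially.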

## Execution instructions

    ./matmul N M P BLOCK_SIZE

where:

1. N is the number of rows of the matrix A
2. M is the number of columns of the matrix A and the number of rows of the
   matrix B
3. P is the number of columns of the matrix B
4. BLOCK_SIZE is the block dimension: the matrix multiplication is applied in
   blocks of BLOCK_SIZE✕BLOCK_SIZE elements
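The following sketch shows one way these four arguments could be parsed and turned into a block count per dimension. `MatmulArgs`, `parse_args`, and `blocks_along` are hypothetical names that do not come from the benchmark. The README also does not say whether the benchmark accepts sizes that are not multiples of BLOCK_SIZE, so the rounding-up in `blocks_along` is an assumption. The code requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>

// Hypothetical holder for "./matmul N M P BLOCK_SIZE".
struct MatmulArgs {
    long N, M, P, block_size;
};

// Returns std::nullopt if the argument count is wrong or any value
// is not a positive base-10 integer.
std::optional<MatmulArgs> parse_args(int argc, const char* const argv[]) {
    if (argc != 5) return std::nullopt;
    long values[4];
    for (int t = 0; t < 4; ++t) {
        char* end = nullptr;
        values[t] = std::strtol(argv[t + 1], &end, 10);
        // Reject empty input, trailing characters, and non-positive values.
        if (end == argv[t + 1] || *end != '\0' || values[t] <= 0) return std::nullopt;
    }
    return MatmulArgs{values[0], values[1], values[2], values[3]};
}

// Blocks along a dimension of length n, counting a partial last block.
long blocks_along(long n, long bs) {
    assert(bs > 0);
    return (n + bs - 1) / bs;
}
```

For example, with N = 1000 and BLOCK_SIZE = 128, `blocks_along` gives 8 row blocks of A. The last of these blocks holds 1000 - 7·128 = 104 rows.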

## References
* https://en.wikipedia.org/wiki/Matrix_multiplication
* https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm