# DO NOT MAKE VISIBLE: Compiler optimizations make the sequential version faster than any parallel version
# Matrix multiplication benchmark
## Description
This benchmark runs a matrix multiplication operation C = AB, where A has size
N✕M, B has size M✕P, and the resulting matrix C has size N✕P.
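
As a minimal illustration of the operation described above, the following sketch computes C = AB with a plain triple loop. It is not taken from the benchmark sources: the function name `matmul_seq`, the row-major flat-vector layout, and the use of `double` elements are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the benchmark's code): computes C = A * B for
// row-major matrices stored in flat vectors.
// A is N x M, B is M x P, and the returned C is N x P.
std::vector<double> matmul_seq(const std::vector<double>& A,
                               const std::vector<double>& B,
                               std::size_t N, std::size_t M, std::size_t P) {
    assert(A.size() == N * M);
    assert(B.size() == M * P);
    std::vector<double> C(N * P, 0.0);
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t k = 0; k < M; ++k) {
            const double a = A[i * M + k];
            // The innermost loop walks a row of B and a row of C contiguously.
            for (std::size_t j = 0; j < P; ++j) {
                C[i * P + j] += a * B[k * P + j];
            }
        }
    }
    return C;
}
```

The i-k-j loop order is one possible choice; it keeps the innermost accesses contiguous in memory for a row-major layout.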
## Versions
We have several implementations of this benchmark:
* **01.matmul_seq:** sequential version of this benchmark.
* **02.matmul_task+dep:** parallel version that uses the *task* construct for
parallel execution and fine-grained dependences to synchronize across
operations. The sequential version has been modified with blocking so that
each task operates on a submatrix.
* **03.matmul_task+reduction.cpp:** similar to *02.matmul_task+dep.cpp*, but it
uses the *reduction* clause instead of the *inout* clause so that tasks
accumulating into the same block of C can run concurrently.
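
The blocking and dependence scheme of the task-based version can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark's implementation: it uses OpenMP tasking syntax (the actual sources may use a different tasking model), a hypothetical function name `matmul_task_dep`, row-major storage, and the first element of each block of C as the dependence handle for the whole block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical sketch (not the benchmark's code): blocked C += A * B with one
// task per (row block, column block, inner block) triple.
// OpenMP syntax is an assumption; if OpenMP is not enabled, the pragmas are
// ignored and the loops simply run sequentially.
void matmul_task_dep(const double* A, const double* B, double* C,
                     std::size_t N, std::size_t M, std::size_t P,
                     std::size_t BS) {
    assert(BS > 0);
    #pragma omp parallel
    #pragma omp single
    {
        for (std::size_t i0 = 0; i0 < N; i0 += BS) {
            for (std::size_t j0 = 0; j0 < P; j0 += BS) {
                for (std::size_t k0 = 0; k0 < M; k0 += BS) {
                    // Tasks that update the same block of C are serialized by
                    // the inout dependence; tasks on different blocks of C
                    // are free to run in parallel.
                    #pragma omp task depend(inout: C[i0 * P + j0]) \
                                     firstprivate(i0, j0, k0)
                    {
                        // Clamp block bounds so partial edge blocks work.
                        const std::size_t i1 = std::min(i0 + BS, N);
                        const std::size_t j1 = std::min(j0 + BS, P);
                        const std::size_t k1 = std::min(k0 + BS, M);
                        for (std::size_t i = i0; i < i1; ++i)
                            for (std::size_t k = k0; k < k1; ++k)
                                for (std::size_t j = j0; j < j1; ++j)
                                    C[i * P + j] += A[i * M + k] * B[k * P + j];
                    }
                }
            }
        }
    } // Implicit barrier: all tasks have finished when the region ends.
}
```

With the *inout* dependence, the tasks along the inner (k) dimension of a given block of C execute one after another. A reduction-based variant relaxes exactly that ordering, at the cost of combining partial results.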
## Execution instructions
    ./matmul N M P BLOCK_SIZE
where:
1. N is the number of rows of the matrix A
2. M is the number of columns of the matrix A and the number of rows of the
matrix B
3. P is the number of columns of the matrix B
4. The matrix multiplication operation will be applied in blocks that contain
BLOCK_SIZE✕BLOCK_SIZE elements
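
To make the argument order above concrete, here is a hedged sketch of how the four command-line values could be read and validated. The names `MatmulArgs`, `parse_positive`, and `parse_args` are hypothetical and do not come from the benchmark sources; rejecting zero or non-numeric values is likewise an assumption about what a sensible driver would do.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Hypothetical container for the four arguments, in command-line order.
struct MatmulArgs {
    std::size_t N;          // rows of A
    std::size_t M;          // columns of A and rows of B
    std::size_t P;          // columns of B
    std::size_t block_size; // edge length of each block
};

// Parses a strictly positive decimal integer; returns false on bad input.
bool parse_positive(const char* s, std::size_t& out) {
    if (s == nullptr || !std::isdigit(static_cast<unsigned char>(s[0]))) {
        return false; // rejects empty strings, signs, and leading spaces
    }
    char* end = nullptr;
    errno = 0;
    const unsigned long long v = std::strtoull(s, &end, 10);
    if (errno == ERANGE || *end != '\0' || v == 0) {
        return false; // overflow, trailing characters, or zero
    }
    out = static_cast<std::size_t>(v);
    return true;
}

// Expects argv to hold: program name, N, M, P, BLOCK_SIZE.
bool parse_args(int argc, const char* const* argv, MatmulArgs& args) {
    assert(argv != nullptr);
    if (argc != 5) {
        return false;
    }
    return parse_positive(argv[1], args.N) &&
           parse_positive(argv[2], args.M) &&
           parse_positive(argv[3], args.P) &&
           parse_positive(argv[4], args.block_size);
}
```

Under these assumptions, an invocation such as `./matmul 512 256 128 64` would yield N = 512, M = 256, P = 128, and a block size of 64.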
## References
* https://en.wikipedia.org/wiki/Matrix_multiplication
* https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm