3.2. Axpy (dep)

This benchmark runs several AXPY operations. AXPY is a Level 1 operation in the Basic Linear Algebra Subprograms (BLAS) package, and is a common operation in computations with vector processors. AXPY is a combination of scalar multiplication and vector addition.
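
As a point of reference, the operation itself can be written as a plain serial loop. The following is a minimal sketch, not part of the benchmark sources; the name axpy_reference and the use of std::vector are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference: y <- alpha * x + y, element by element.
void axpy_reference(const std::vector<double> &x, std::vector<double> &y, double alpha) {
    assert(x.size() == y.size());  // both vectors must have the same length
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];      // scalar multiplication followed by vector addition
    }
}
```

For example, with x = (1, 2, 3), y = (10, 20, 30) and alpha = 2, the call leaves y = (12, 24, 36).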

We want to parallelize the code using the task directive but, as in the examples section, we want to use fine-grained dependences (instead of taskwaits) to synchronize across SAXPY operations. At some point we will still need to synchronize and wait for all the instantiated tasks. Ideally, we should focus our effort on the following two functions:

#pragma oss task
void axpy_block(double *x, double *y, double alpha, long N) {
     for (long i=0; i < N; ++i) {
             y[i] += alpha * x[i];
     }
}

void axpy(double *x, double *y, double alpha, long N, long TS) {
     for (long i=0; i < N; i+=TS) {
             axpy_block(x+i, y+i, alpha, std::min(TS, N-i));
     }
}

Note that we have initially selected the declaration of the axpy_block function as a candidate to become a task. We could also have annotated the call to that function, as in the following code:

void axpy(double *x, double *y, double alpha, long N, long TS) {
     for (long i=0; i < N; i+=TS) {
             #pragma oss task
             axpy_block(x+i, y+i, alpha, std::min(TS, N-i));
     }
}
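
One possible way to express the fine-grained dependences is sketched below. It is only a sketch under stated assumptions, not the reference solution: it assumes that each block task reads N elements of x and updates N elements of y, and it uses the OmpSs-2 array-section form [start;length] in the in and inout clauses. The run_axpy driver is a hypothetical helper that does not appear in the benchmark. A compiler without OmpSs-2 support ignores the pragmas and runs the code serially.

```cpp
#include <algorithm>
#include <cassert>

// Assumed annotation: the task reads x[0..N-1] and updates y[0..N-1].
#pragma oss task in(x[0;N]) inout(y[0;N])
void axpy_block(double *x, double *y, double alpha, long N) {
    for (long i = 0; i < N; ++i) {
        y[i] += alpha * x[i];
    }
}

void axpy(double *x, double *y, double alpha, long N, long TS) {
    assert(TS > 0);  // a non-positive block size would never advance the loop
    for (long i = 0; i < N; i += TS) {
        // Each call becomes a task; blocks of different iterations that touch
        // the same region of y are ordered by the inout dependence.
        axpy_block(x + i, y + i, alpha, std::min(TS, N - i));
    }
}

// Hypothetical driver: repeats the operation and then waits for all tasks.
void run_axpy(double *x, double *y, double alpha, long N, long TS, long iterations) {
    for (long it = 0; it < iterations; ++it) {
        axpy(x, y, alpha, N, TS);
    }
    // Wait for every instantiated task before the caller inspects y.
    #pragma oss taskwait
}
```

With this annotation, tasks that work on different blocks are independent of each other, while successive operations on the same block are serialized through y, so no taskwait is needed between iterations.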

3.2.1. Goals of this exercise

  • Try to understand what each part of the code does. Complete the annotation of the existing tasks.
  • When creating tasks, pay special attention to data-sharing attributes.
  • Include synchronization directives when required (e.g., you should not start the verification until all the computation has been executed).
  • Check several runtime options when executing the program (versions, schedulers, etc.).
  • Check scalability. Execute the program using different numbers of CPUs and compute the speed-up.
  • Change the program arguments that may affect task granularity (block size, tile size, etc.).
  • Change the program arguments that may affect the number of tasks (matrix sizes and/or block/tile sizes).
  • Get different Paraver traces using different runtime options or program arguments and compare them.
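
The speed-up mentioned in the scalability goal is commonly computed as the ratio between a baseline time and the time measured on p CPUs. The helpers below are a minimal sketch of that arithmetic; the function names are hypothetical, and the timings passed to them must come from your own measurements.

```cpp
#include <cassert>

// Speed-up: baseline (e.g. 1-CPU) time divided by the time measured on p CPUs.
double speedup(double t_base, double t_parallel) {
    assert(t_base > 0.0 && t_parallel > 0.0);
    return t_base / t_parallel;
}

// Parallel efficiency: speed-up divided by the number of CPUs used.
double efficiency(double t_base, double t_parallel, int cpus) {
    assert(cpus > 0);
    return speedup(t_base, t_parallel) / cpus;
}
```

For instance, a run that takes 8 seconds on one CPU and 2 seconds on eight CPUs has a speed-up of 4 and an efficiency of 0.5.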

3.2.2. Execution instructions

You should run the program as:

./saxpy SIZE BLOCK_SIZE ITERATIONS

where:

  • SIZE is the number of elements of the vectors used in the SAXPY operation.
  • BLOCK_SIZE is the number of elements in each block to which the SAXPY operation is applied.
  • ITERATIONS is the number of times the SAXPY operation is executed.
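
These three arguments together determine how many tasks are created. The sketch below estimates that number, assuming that each iteration calls axpy once and that axpy creates one task per block, as in the loop shown earlier; the helper names are hypothetical.

```cpp
#include <cassert>

// Blocks (and therefore tasks) per AXPY call: ceil(size / block_size).
long tasks_per_axpy(long size, long block_size) {
    assert(size >= 0 && block_size > 0);
    return (size + block_size - 1) / block_size;
}

// Total tasks, assuming one axpy call per iteration.
long total_tasks(long size, long block_size, long iterations) {
    assert(iterations >= 0);
    return tasks_per_axpy(size, block_size) * iterations;
}
```

For example, SIZE = 1000 with BLOCK_SIZE = 256 gives 4 tasks per operation (the last block holds the remaining 232 elements), and 40 tasks over 10 iterations. A smaller BLOCK_SIZE yields more, finer-grained tasks.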