Statically scheduling tasks to different task instances


The syntax described on this page is under development and may change in the future.

The onto clause accepts two expressions: the first identifies the type of FPGA accelerator, and the second selects the instance of that type that will handle the spawned task. The first expression must be a constant integer that uniquely identifies the task type in the application. Its lower 32 bits (31 down to 0) define the task type and can take any arbitrary value; when the onto clause is not specified, Mercurium generates a hash of the function name. The following 8 bits (39 down to 32) specify the architecture where the task has to be executed. Currently, the architecture bits are 0x1 for FPGA and 0x2 for SMP. The second expression is evaluated during task creation and defines the instance that will execute the task. It must evaluate to an integer between 0 and the number of instances specified for the child task minus 1 (for example, 0 to 4 with num_instances(5)). Otherwise, the behavior is undefined.
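
As an illustration of the bit layout described above, the following minimal C sketch packs and unpacks the first onto expression. The names make_onto_type, onto_arch, onto_task_type, ARCH_FPGA and ARCH_SMP are hypothetical and are not part of the OmpSs@FPGA toolchain; only the bit positions and the architecture codes come from the text.

```c
#include <stdint.h>

/* Architecture codes as stated in the text (bits 39 down to 32). */
#define ARCH_FPGA 0x1u
#define ARCH_SMP  0x2u

/* Hypothetical helper: places the architecture in bits 39..32 and the
 * task type in bits 31..0 of a 64-bit value. */
static uint64_t make_onto_type(uint8_t arch, uint32_t task_type) {
    return ((uint64_t)arch << 32) | (uint64_t)task_type;
}

/* Hypothetical helper: extracts the 8 architecture bits. */
static uint8_t onto_arch(uint64_t value) {
    return (uint8_t)((value >> 32) & 0xFFu);
}

/* Hypothetical helper: extracts the lower 32 bits (task type). */
static uint32_t onto_task_type(uint64_t value) {
    return (uint32_t)(value & 0xFFFFFFFFu);
}
```

With these helpers, make_onto_type(ARCH_FPGA, 0x10) yields 0x100000010, which decodes to architecture 0x1 (FPGA) and task type 0x10. Since the clause requires a constant integer, the packed value would normally be written directly as a literal in the pragma.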

The following C example shows how to statically schedule the matmulBlock tasks across the different instances.

unsigned char instance_num_glob;

#pragma omp target device(fpga) onto(0x100000010, instance_num_glob) num_instances(5)
#pragma omp task in([BSIZE*BSIZE]a, [BSIZE*BSIZE]b) inout([BSIZE*BSIZE]c)
void matmulBlock(const elem_t *a, const elem_t *b, elem_t *c);

#pragma omp target device(fpga) copy_in([msize*msize]a, [msize*msize]b) copy_inout([msize*msize]c)
#pragma omp task
void matmulFPGA(const elem_t *a, const elem_t *b, elem_t *c, const unsigned int msize) {
  const unsigned int b2size = BSIZE*BSIZE;
  instance_num_glob = 0;
  for (unsigned int i = 0; i < msize/BSIZE; i++) {
    for (unsigned int k = 0; k < msize/BSIZE; k++) {
      unsigned int const ai = k*b2size + i*BSIZE*msize;
      for (unsigned int j = 0; j < msize/BSIZE; j++) {
        unsigned int const bi = j*b2size + k*BSIZE*msize;
        unsigned int const ci = j*b2size + i*BSIZE*msize;
        matmulBlock(a + ai, b + bi, c + ci);
        // Round-robin over the 5 instances: valid indices are 0 to 4
        instance_num_glob = (instance_num_glob + 1) >= 5 ? 0 : (instance_num_glob + 1);
      }
    }
  }
  #pragma omp taskwait
}
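
The instance selection used in the example can be isolated as a small round-robin helper. The sketch below is only an illustration: next_instance and is_valid_instance are hypothetical names, and the code assumes num_instances is greater than 0. It shows one way to keep the second onto expression inside the range 0 to num_instances - 1 required above.

```c
/* Hypothetical helper: returns the instance index that follows `current`
 * in round-robin order, wrapping back to 0 after num_instances - 1.
 * Assumes num_instances > 0. */
static unsigned char next_instance(unsigned char current,
                                   unsigned char num_instances) {
    return (unsigned char)((current + 1u) % num_instances);
}

/* Hypothetical helper: checks that an index lies in [0, num_instances - 1],
 * the only range for which the behavior of the onto clause is defined. */
static int is_valid_instance(unsigned int instance,
                             unsigned int num_instances) {
    return instance < num_instances;
}
```

With num_instances(5), starting from 0 and calling next_instance repeatedly produces 0, 1, 2, 3, 4, 0, and so on, so every spawned task is mapped to a valid instance.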


The current limitations are:
  • The second onto expression (which defines the instance) is only evaluated when spawning tasks within another FPGA task.

  • When the second onto expression references a variable, it must be either a task argument or a global variable.

  • In C sources, global variables are automatically privatized for each FPGA task accelerator.

  • In C++ sources, global variables must be moved to intermediate HLS C++ code with the #pragma omp target device(fpga,smp) directive on top of variable declarations. See the example below. The pragma will only move the variable declaration to the FPGA intermediate code, effectively privatizing the variable for each FPGA task accelerator.

#pragma omp target device(fpga,smp)
unsigned char instance_num_glob;