2. Develop OmpSs@FPGA programs

Most of the required information to develop an OmpSs@FPGA application should be in the general OmpSs documentation (https://pm.bsc.es/ompss-docs/book/index.html). Note that, there may be some unsupported/not-working OmpSs features and/or syntax when using FPGA tasks. If you have some problem or realize any bug, do not hesitate to contact us or open an issue.

To create an FPGA task you need to add the target directive before the task directive. For example:

const unsigned int LEN = 8;

#pragma omp target device(fpga)
#pragma omp task out([LEN]dst, const char val)
void memset(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}

2.1. Limitations

There are some limitations when developing an OmpSs@FPGA application:
  • Only C/C++ are supported, not Fortran.
  • Only function declarations can be annotated as FPGA tasks.
  • Avoid using global variables which are not listed in the dependences/copies. They can be used through function arguments.
  • The macros cannot be used within #pragmas as explained in OmpSs user guide (https://pm.bsc.es/ftp/ompss/doc/user-guide/faq-macros.html).
  • The HLS source code generated by Mercurim for each FPGA task will not contain the includes in the original source file but the ones finished in .fpga.h or .fpga .
  • The FPGA task code cannot perform general system calls, and only some Nanos++ APIs are supported.
  • The usage of size_t, signed long int or unsigned long int is not recommended inside the FPGA accelerator code. They may have different widths in the host and in the FPGA.

2.2. Specific differences in clauses and directives in Ompss@FPGA VS OmpSs

Despite OmpSs@FPGA mostly follows the OmpSs behaviour, there are specific clauses or directives that are not yet implemented or whose implementation slightly differs from the Ompss specification:
  • taskyield and atomic directives are not supported.
  • critical directive is supported as OmpSs specifies. Specifically: it implements a global (all accelerators) mutual exclusion section.

2.3. Clauses of target directive

The following sections list the clauses that can be used in the target directive.

2.3.1. num_instances

Defines the number of instances to place in the FPGA bitstream of a task. Usage example:

const unsigned int LEN = 8;

#pragma omp target device(fpga) num_instances(3)
#pragma omp task out([LEN]dst)
void memset(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}

2.3.2. onto

The information in this clause is used at runtime to send the tasks to the corresponding FPGA accelerator. This means that a FPGA task has the onto(0) it can only run in accelerators that are of type 0. The value provided in this clause will overwrite the value automatically generated by Merciurim (a hash based on the source file and function name) to match the tasks. Usage example:

const unsigned int LEN = 8;

#pragma omp target device(fpga) onto(100)
#pragma omp task out([LEN]dst)
void memset_char(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}

#pragma omp target device(fpga) onto(101)
#pragma omp task out([LEN]dst)
void memset_float(float * dst, const float val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}

2.3.3. localmem

Defines the memory regions that the FPGA task wrapper must catch in BRAMs. This creates a local copy of the parameter in the FPGA task accelerator which can be accessed faster than dispatching memory accesses. The data is copied from the FPGA addressable memory into the FPGA task accelerator before launching the task execution. If the parameter is not labeled with the const modifier, the wrapper includes support for writing back the local copy into memory after the task execution. Both input and output data movements, may be dynamically disabled by the runtime based on its knowledge about task copies and predecessor/successor tasks. Usage example:

const unsigned int LEN = 8;

#pragma omp target device(fpga) localmem([LEN]dst)
#pragma omp task out([LEN]dst)
void memset(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}

2.3.4. localmem_copies

Promote the task copies like they were annotated into the localmem clause. This clause is enabled by default, unless the localmem clause is present.

2.3.5. no_localmem_copies

Do not promote the task copies into the localmem clause.

2.3.6. period

Defines the task period in microseconds. The usage of this clause makes the task a recurrent task that is executed (at most) every period microseconds. Usage example where a task is executed every second:

const unsigned int LEN = 8;

#pragma omp target device(fpga) period(1000000)
#pragma omp task
void memset(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}

2.3.7. num_repetitions

Defines the number of repetitions that a recurrent task has to be executed before it becomes finished. The usage of this clause makes the task a recurrent task that is executed N times. Usage example where the task body is executed 100 times:

const unsigned int LEN = 8;

#pragma omp target device(fpga) num_repetitions(100)
#pragma omp task
void memset(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}

2.4. Calls to Nanos++ API

The list of Nanos++ APIs and their details can be found in the following section. Note that not all Nanos++ APIs can be called within FPGA tasks and others only are supported within them.