1. Develop OmpSs@FPGA programs

Most of the information required to develop an OmpSs@FPGA application is in the general OmpSs documentation (https://pm.bsc.es/ompss-docs/book/index.html). Note that some OmpSs features or syntax may be unsupported or may not work when using FPGA tasks. If you run into a problem or find a bug, do not hesitate to contact us or open an issue.

To create an FPGA task you need to add the target directive before the task directive. For example:

const unsigned int LEN = 8;

#pragma omp target device(fpga)
#pragma omp task out([LEN]dst)
void memset(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}
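
The host side of such a task can be sketched as follows. This is a minimal sketch, not taken from the OmpSs@FPGA sources: the names fill_chars and fill_and_count are hypothetical (fill_chars is used instead of memset to avoid clashing with the C library function). It assumes the usual OmpSs behaviour in which calling an annotated function creates a task, so the host waits with a taskwait before reading the output buffer. A C compiler without OmpSs support is expected to ignore the pragmas and run the code serially.

```c
const unsigned int LEN = 8;

/* FPGA task, same shape as the example above (hypothetical name). */
#pragma omp target device(fpga)
#pragma omp task out([LEN]dst)
void fill_chars(char *dst, const char val) {
  for (unsigned int i = 0; i < LEN; ++i) {
    dst[i] = val;
  }
}

/* Host-side helper (hypothetical): spawns the task, waits for it,
 * and returns how many elements hold the requested value. */
unsigned int fill_and_count(const char val) {
  char buf[LEN];

  fill_chars(buf, val);   /* with OmpSs, this call creates a task */
  #pragma omp taskwait    /* wait before the host reads buf */

  unsigned int matches = 0;
  for (unsigned int i = 0; i < LEN; ++i) {
    if (buf[i] == val) {
      ++matches;
    }
  }
  return matches;
}
```

After the taskwait, every element of the buffer should hold the requested value, so fill_and_count returns LEN.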

1.1. Limitations

There are some limitations when developing an OmpSs@FPGA application:
  • Only C/C++ are supported, not Fortran.
  • Only function declarations can be annotated as FPGA tasks.
  • Avoid using global variables that are not listed in the dependences/copies. Pass them to the task through function arguments instead.
  • The HLS source code generated by Mercurium for each FPGA task will not contain the includes of the original source file, except those whose names end in .fpga.hpp or .fpga.
  • The FPGA task code cannot perform general system calls, and only some Nanos++ APIs are supported.
  • Using size_t, signed long int or unsigned long int inside the FPGA accelerator code is not recommended, because their widths may differ between the host and the FPGA.
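
Two of these limitations can be illustrated with a short sketch. The names scale_vector, scale_and_sum and scale_factor are hypothetical and only serve the illustration. The task uses fixed-width types from stdint.h instead of size_t or long, and the global scale_factor reaches the task as a function argument instead of being read inside the task body. A C compiler without OmpSs support is expected to ignore the pragmas and run the code serially.

```c
#include <stdint.h>

const unsigned int LEN = 8;

/* Global needed by the task: passed as an argument,
 * not read inside the task body (hypothetical name). */
int32_t scale_factor = 3;

/* Fixed-width types keep the same size on the host and in the FPGA. */
#pragma omp target device(fpga)
#pragma omp task in([LEN]src) out([LEN]dst)
void scale_vector(const int32_t *src, int32_t *dst, const int32_t factor) {
  for (uint32_t i = 0; i < LEN; ++i) {
    dst[i] = src[i] * factor;
  }
}

/* Host-side helper (hypothetical): scales 1..LEN and sums the result. */
int64_t scale_and_sum(void) {
  int32_t src[LEN];
  int32_t dst[LEN];
  for (uint32_t i = 0; i < LEN; ++i) {
    src[i] = (int32_t)(i + 1);
  }

  scale_vector(src, dst, scale_factor);  /* global passed by value */
  #pragma omp taskwait                   /* wait before reading dst */

  int64_t sum = 0;
  for (uint32_t i = 0; i < LEN; ++i) {
    sum += dst[i];
  }
  return sum;
}
```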

1.2. Clauses of target directive

The following sections list the clauses that can be used in the target directive.

1.2.1. num_instances

Defines the number of instances of the task to place in the FPGA bitstream. Usage example:

const unsigned int LEN = 8;

#pragma omp target device(fpga) num_instances(3)
#pragma omp task out([LEN]dst)
void memset(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}
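
Several instances only pay off when the host creates several independent tasks. The sketch below assumes that tasks writing to disjoint buffers have no dependences between them, so the runtime is free to spread them over the available instances; how it actually schedules them is not specified here. The names fill_chars, fill_all and NUM_BUFFERS are hypothetical. A C compiler without OmpSs support is expected to ignore the pragmas and run the code serially.

```c
const unsigned int LEN = 8;

#define NUM_BUFFERS 3  /* hypothetical: one buffer per instance */

#pragma omp target device(fpga) num_instances(3)
#pragma omp task out([LEN]dst)
void fill_chars(char *dst, const char val) {
  for (unsigned int i = 0; i < LEN; ++i) {
    dst[i] = val;
  }
}

/* Host-side helper (hypothetical): creates one task per buffer and
 * returns the number of elements that hold the expected value. */
unsigned int fill_all(void) {
  char bufs[NUM_BUFFERS][LEN];

  /* Each call writes a different buffer, so the tasks are independent. */
  for (unsigned int b = 0; b < NUM_BUFFERS; ++b) {
    fill_chars(bufs[b], (char)('a' + b));
  }
  #pragma omp taskwait  /* wait for all tasks before reading the buffers */

  unsigned int correct = 0;
  for (unsigned int b = 0; b < NUM_BUFFERS; ++b) {
    for (unsigned int i = 0; i < LEN; ++i) {
      if (bufs[b][i] == (char)('a' + b)) {
        ++correct;
      }
    }
  }
  return correct;
}
```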

1.2.2. onto

The information in this clause is used at runtime to send tasks to the corresponding FPGA accelerator. For example, an FPGA task annotated with onto(0) can only run on accelerators of type 0. The value provided in this clause overrides the value automatically generated by Mercurium (a hash based on the source file and function name) to match the tasks. Usage example:

const unsigned int LEN = 8;

#pragma omp target device(fpga) onto(100)
#pragma omp task out([LEN]dst)
void memset_char(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}

#pragma omp target device(fpga) onto(101)
#pragma omp task out([LEN]dst)
void memset_float(float * dst, const float val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}

1.3. Calls to Nanos++ API

The Nanos++ APIs that can be called within an FPGA task are:
  • nanos_err_t nanos_instrument_burst_begin(nanos_event_key_t event, nanos_event_value_t value)
  • nanos_err_t nanos_instrument_burst_end(nanos_event_key_t event, nanos_event_value_t value)
  • nanos_err_t nanos_instrument_point_event(nanos_event_key_t event, nanos_event_value_t value)
  • unsigned long long int nanos_fpga_current_wd()
  • nanos_err_t nanos_fpga_wg_wait_completion( unsigned long long int uwg, unsigned char avoid_flush )
  • void nanos_fpga_create_wd_async( const unsigned long long int type, const unsigned char numArgs, const unsigned long long int * args, const unsigned char numDeps, const unsigned long long int * deps, const unsigned char * depsFlags, const unsigned char numCopies, const nanos_fpga_copyinfo_t * copies )
  • unsigned int nanos_get_periodic_task_repetition_num()
  • void nanos_cancel_periodic_task()

The list of Nanos++ APIs and their details can be found here: