2. Develop OmpSs-2@FPGA programs

Most of the information required to develop an OmpSs-2@FPGA application is available in the general OmpSs-2 documentation (https://pm.bsc.es/ompss-docs/book/index.html). Note that some OmpSs-2 features or syntax may be unsupported or may not work when using FPGA tasks. If you run into a problem or find a bug, do not hesitate to contact us or open an issue.

To create an FPGA task, add the device(fpga) clause to the task directive. For example:

const unsigned int LEN = 8;

#pragma oss task device(fpga) out([LEN]dst)
void memset(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}
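The following is a minimal sketch of how such a task might be invoked from host code. It assumes the usual OmpSs-2 model, in which calling an annotated function creates a task and a taskwait directive blocks until the created tasks finish. The names fill_buffer and run_fill are hypothetical and only serve the illustration.

```c
const unsigned int LEN = 8;

/* FPGA task: fills LEN bytes of dst with val. */
#pragma oss task device(fpga) out([LEN]dst)
void fill_buffer(char *dst, const char val) {
  for (unsigned int i = 0; i < LEN; ++i) {
    dst[i] = val;
  }
}

/* Hypothetical host helper: creates one task and waits for it. */
void run_fill(char *buf, char val) {
  fill_buffer(buf, val);  /* the call creates a task instance */
  #pragma oss taskwait    /* wait before the host reads buf */
}
```

A compiler without OmpSs-2 support ignores the pragmas, so the same code also runs sequentially, which can be handy for checking the functional behaviour on the host.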

2.1. Limitations

There are some limitations when developing an OmpSs-2@FPGA application:
  • Only C/C++ are supported, not Fortran.
  • Only function declarations can be annotated as FPGA tasks.
  • The HLS source code generated by Clang for each FPGA task will not contain the includes of the original source file, except those whose names end in “.fpga.h”.
  • The FPGA task code cannot perform general system calls, and only some Nanos6 APIs are supported.
  • Using size_t, signed long int or unsigned long int inside the FPGA accelerator code is not recommended, because these types may have different widths on the host and on the FPGA.
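Two of the points above can be sketched in a few lines: the annotation is attached to a function declaration, and the task code avoids size_t and the long types. This is only an illustration. The function name widen and the constant N are hypothetical, and the choice of unsigned int and unsigned long long is an assumption that these types have the same width on both sides.

```c
const unsigned int N = 4;

/* The task annotation is attached to the function declaration. */
#pragma oss task device(fpga) in([N]src) out([N]dst)
void widen(const unsigned int *src, unsigned long long *dst);

/* Definition: the index is an unsigned int and the results are
 * unsigned long long, so neither size_t nor unsigned long appears
 * in the accelerator code. */
void widen(const unsigned int *src, unsigned long long *dst) {
  for (unsigned int i = 0; i < N; ++i) {
    /* Cast first so the multiplication is done in the wider type. */
    dst[i] = (unsigned long long)src[i] * src[i];
  }
}
```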

2.2. Specific differences in clauses and directives between OmpSs-2@FPGA and OmpSs-2

Although OmpSs-2@FPGA mostly follows the OmpSs-2 behaviour, some clauses and directives are not yet implemented, or their implementation differs slightly from the OmpSs-2 specification:
  • The taskyield and atomic directives are not supported.
  • The critical directive is supported as OmpSs-2 specifies: it implements a mutual exclusion section that is global across all accelerators.

2.3. Clauses of task directive

The following sections list the clauses that can be used in the task directive.

2.3.1. num_instances

Defines the number of instances of the task accelerator to place in the FPGA bitstream. Usage example:

const unsigned int LEN = 8;

#pragma oss task device(fpga) out([LEN]dst) num_instances(3)
void memset(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}
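Having several instances only pays off when several tasks of that type are ready at the same time. The sketch below shows one way host code might create such tasks, one per non-overlapping chunk of a buffer. The names fill_chunk and fill_all are hypothetical, and how the runtime actually distributes the tasks over the three instances is not described here.

```c
const unsigned int LEN = 8;

/* FPGA task with three accelerator instances in the bitstream. */
#pragma oss task device(fpga) out([LEN]dst) num_instances(3)
void fill_chunk(char *dst, const char val) {
  for (unsigned int i = 0; i < LEN; ++i) {
    dst[i] = val;
  }
}

/* Hypothetical host helper: one task per LEN-sized chunk. The chunks
 * do not overlap, so the tasks have no dependencies between them. */
void fill_all(char *buf, unsigned int nchunks, char val) {
  for (unsigned int c = 0; c < nchunks; ++c) {
    fill_chunk(buf + c * LEN, val);
  }
  #pragma oss taskwait  /* wait for all chunks before returning */
}
```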

2.3.2. affinity

The information in this clause is used at runtime to send each task to the corresponding FPGA accelerator. For example, if an FPGA task has affinity(0), it will run in accelerator 0 of that type. This clause is useful for managing task scheduling from the user code when there is more than one accelerator of the same type (num_instances > 1).

const unsigned int LEN = 8;

#pragma oss task device(fpga) out([LEN]dst) num_instances(4) affinity(af)
void memset_char(char * dst, const char val, int af) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}

#pragma oss task device(fpga) out([size]dst)
void memset_task_creator(char * dst, int size, const char val) {
  // Distribute the chunks round-robin over the 4 memset_char instances
  for (int i=0; i<size/LEN; ++i) {
    memset_char(dst + i*LEN, val, i%4);
  }
}

2.3.3. copy_in/out

Defines the memory regions that the FPGA task wrapper must cache in BRAMs/URAMs. This creates a local copy of the parameter in the FPGA task accelerator, which can be accessed faster than going to memory for each access. The data is copied from the FPGA addressable memory into the FPGA task accelerator before launching the task execution. Depending on the type of clause (copy_in, copy_out, copy_inout), the wrapper includes support for reading the local copy from memory, writing it back to memory, or both. Both input and output data movements may be dynamically disabled by the runtime based on its knowledge about task copies and predecessor/successor tasks. Usage example:

const unsigned int LEN = 8;

#pragma oss task device(fpga) out([LEN]dst) copy_out([LEN]dst)
void memset(char * dst, const char val) {
  for (unsigned int i=0; i<LEN; ++i) {
    dst[i] = val;
  }
}
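As a further illustration, the sketch below combines copy_in and copy_out in one task: the input array is loaded into the local copy before execution and the output array is written back afterwards. The function name add_offset is hypothetical, and the pairing of each copy clause with a matching dependency clause is a choice made for this example.

```c
const unsigned int LEN = 8;

/* src is cached locally before the task runs (copy_in);
 * dst is written back to memory afterwards (copy_out). */
#pragma oss task device(fpga) in([LEN]src) out([LEN]dst) \
    copy_in([LEN]src) copy_out([LEN]dst)
void add_offset(const int *src, int *dst, const int offset) {
  for (unsigned int i = 0; i < LEN; ++i) {
    dst[i] = src[i] + offset;
  }
}
```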

2.3.4. copy_deps

Promotes the task dependencies as if they had also been annotated in the corresponding copy clauses.
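A minimal sketch of the clause in use follows. It assumes that, with copy_deps, the in dependency behaves as if copy_in had been written for the same region and the inout dependency as if copy_inout had been written. The function name accumulate is hypothetical.

```c
const unsigned int LEN = 8;

/* With copy_deps, no explicit copy_in/copy_inout clauses are written;
 * the dependencies are assumed to be reused as copies. */
#pragma oss task device(fpga) in([LEN]src) inout([LEN]acc) copy_deps
void accumulate(const int *src, int *acc) {
  for (unsigned int i = 0; i < LEN; ++i) {
    acc[i] += src[i];
  }
}
```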

2.4. Calls to Nanos6 API

The list of Nanos6 APIs and their details can be found in the following section. Note that not all Nanos6 APIs can be called within FPGA tasks, and some are only supported within them.