2. Develop OmpSs@FPGA programs¶
Most of the required information to develop an OmpSs@FPGA application should be in the general OmpSs documentation (https://pm.bsc.es/ompss-docs/book/index.html). Note that, there may be some unsupported/not-working OmpSs features and/or syntax when using FPGA tasks. If you have some problem or realize any bug, do not hesitate to contact us or open an issue.
To create an FPGA task you need to add the target
directive before the task
directive.
For example:
const unsigned int LEN = 8;
#pragma omp target device(fpga)
#pragma omp task out([LEN]dst, const char val)
void memset(char * dst, const char val) {
for (unsigned int i=0; i<LEN; ++i) {
dst[i] = val;
}
}
2.1. Limitations¶
- There are some limitations when developing an OmpSs@FPGA application:
- Only C/C++ are supported, not Fortran.
- Only function declarations can be annotated as FPGA tasks.
- Avoid using global variables which are not listed in the dependences/copies. They can be used through function arguments.
- The macros cannot be used within #pragmas as explained in OmpSs user guide (https://pm.bsc.es/ftp/ompss/doc/user-guide/faq-macros.html).
- The HLS source code generated by Mercurim for each FPGA task will not contain the includes in the original source file but the ones finished in .fpga.h or .fpga .
- The FPGA task code cannot perform general system calls, and only some Nanos++ APIs are supported.
- The usage of
size_t
,signed long int
orunsigned long int
is not recommended inside the FPGA accelerator code. They may have different widths in the host and in the FPGA.
2.2. Clauses of target directive¶
The following sections list the clauses that can be used in the target
directive.
2.2.1. num_instances¶
Defines the number of instances to place in the FPGA bitstream of a task. Usage example:
const unsigned int LEN = 8;
#pragma omp target device(fpga) num_instances(3)
#pragma omp task out([LEN]dst)
void memset(char * dst, const char val) {
for (unsigned int i=0; i<LEN; ++i) {
dst[i] = val;
}
}
2.2.2. onto¶
The information in this clause is used at runtime to send the tasks to the
corresponding FPGA accelerator. This means that a FPGA task has the onto(0)
it
can only run in accelerators that are of type 0. The value provided in this clause
will overwrite the value automatically generated by Merciurim (a hash based on the
source file and function name) to match the tasks. Usage example:
const unsigned int LEN = 8;
#pragma omp target device(fpga) onto(100)
#pragma omp task out([LEN]dst)
void memset_char(char * dst, const char val) {
for (unsigned int i=0; i<LEN; ++i) {
dst[i] = val;
}
}
#pragma omp target device(fpga) onto(101)
#pragma omp task out([LEN]dst)
void memset_float(float * dst, const float val) {
for (unsigned int i=0; i<LEN; ++i) {
dst[i] = val;
}
}
2.2.3. localmem¶
Defines the memory regions that the FPGA task wrapper must catch in BRAMs.
This creates a local copy of the parameter in the FPGA task accelerator which can be
accessed faster than dispatching memory accesses.
The data is copied from the FPGA addressable memory into the FPGA task accelerator
before launching the task execution.
If the parameter is not labeled with the const
modifier, the wrapper includes support
for writing back the local copy into memory after the task execution.
Both input and output data movements, may be dynamically disabled by the runtime based
on its knowledge about task copies and predecessor/successor tasks.
Usage example:
const unsigned int LEN = 8;
#pragma omp target device(fpga) localmem([LEN]dst)
#pragma omp task out([LEN]dst)
void memset(char * dst, const char val) {
for (unsigned int i=0; i<LEN; ++i) {
dst[i] = val;
}
}
2.2.4. localmem_copies¶
Promote the task copies like they were annotated into the localmem
clause.
This clause is enabled by default, unless the localmem
clause is present.
2.2.5. no_localmem_copies¶
Do not promote the task copies into the localmem
clause.
2.2.6. period¶
Defines the task period in microseconds. The usage of this clause makes the task a recurrent task that is executed (at most) every period microseconds. Usage example where a task is executed every second:
const unsigned int LEN = 8;
#pragma omp target device(fpga) period(1000000)
#pragma omp task
void memset(char * dst, const char val) {
for (unsigned int i=0; i<LEN; ++i) {
dst[i] = val;
}
}
2.2.7. num_repetitions¶
Defines the number of repetitions that a recurrent task has to be executed before it becomes finished. The usage of this clause makes the task a recurrent task that is executed N times. Usage example where the task body is executed 100 times:
const unsigned int LEN = 8;
#pragma omp target device(fpga) num_repetitions(100)
#pragma omp task
void memset(char * dst, const char val) {
for (unsigned int i=0; i<LEN; ++i) {
dst[i] = val;
}
}
2.3. Calls to Nanos++ API¶
The list of Nanos++ APIs and their details can be found in the following section. Note that not all Nanos++ APIs can be called within FPGA tasks and others only are supported within them.