Programming Models @ BSC

Boosting parallel computing research since 1989


OmpSs@FPGA is an extension of the OmpSs programming model that makes it easy to offload tasks to FPGA devices. It uses the task and target directives to mark the portions of application code that will become accelerators in the FPGA. The supported languages are C and C++.

Source Code Example

The following code shows an implementation of the dot product using an FPGA task.

const int BSIZE = 64;

#pragma omp target device(fpga)
#pragma omp task in([BSIZE]v1, [BSIZE]v2) inout([1]result)
void dotProduct(const float *v1, const float *v2, float *result) {
  // Accumulate in a local float so the partial sum is not truncated
  float resultLocal = result[0];
  for (int i = 0; i < BSIZE; ++i) {
    resultLocal += v1[i]*v2[i];
  }
  result[0] = resultLocal;
}

int main() {
  const int vecSize = 256;
  float v1[vecSize];
  float v2[vecSize];

  // Initialize the vectors
  // ...

  float result = 0;
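  // Create one FPGA task per block of BSIZE elements
  for (int i = 0; i < vecSize; i += BSIZE) {
    dotProduct(v1+i, v2+i, &result);
  }
  // Wait for all tasks to finish before reading the result
  #pragma omp taskwait

  // Somehow use the result value
  // ...
}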
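
To make the blocking in the example explicit, the following is a minimal host-only sketch, not part of the OmpSs@FPGA toolchain. It drops the pragmas and runs the same per-block accumulation sequentially on the CPU. The names dotProductBlock and blockedDot are hypothetical and chosen only for illustration, and the sketch assumes the vector length is a multiple of BSIZE. In the OmpSs version, the inout([1]result) clause makes each task depend on the previous one through result, so the blocks should accumulate in the same order as in this loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int BSIZE = 64;  // same block size as in the FPGA example

// Host-only stand-in for the FPGA task (hypothetical name):
// same signature and body, but without the OmpSs pragmas.
void dotProductBlock(const float *v1, const float *v2, float *result) {
  float resultLocal = result[0];
  for (int i = 0; i < BSIZE; ++i) {
    resultLocal += v1[i] * v2[i];
  }
  result[0] = resultLocal;
}

// Splits the vectors into BSIZE-element blocks and makes one call per block.
// Assumption: both vectors have the same length, a multiple of BSIZE.
float blockedDot(const std::vector<float> &v1, const std::vector<float> &v2) {
  assert(v1.size() == v2.size());
  assert(v1.size() % BSIZE == 0);
  float result = 0.0f;
  for (std::size_t i = 0; i < v1.size(); i += BSIZE) {
    dotProductBlock(v1.data() + i, v2.data() + i, &result);
  }
  return result;
}
```

With 256-element vectors, as in the example, the loop makes four calls, each adding the contribution of 64 elements to result. A sequential version like this can serve as a reference value when checking the output of the accelerated application.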


The OmpSs compiler (Mercurium) isolates the tasks that target the fpga device into separate files. The AIT tool (formerly autoVivado) uses these files to generate the FPGA bitstream. AIT interfaces with the FPGA vendor tools to produce a bitstream with a generic interface that allows the OmpSs runtime (Nanos++) to interact with the FPGA task accelerators. The components involved in generating the application binary and the FPGA bitstream are shown in the figure below.

FPGA compilation flow


An OmpSs@FPGA application runs in the same way as a regular OmpSs application. The runtime library (Nanos++) and some extra support libraries (xTasks, etc.) must be available on the machine that executes it. After that, running the application is as simple as running any other binary.

HW Instrumentation

The FPGA task accelerators can be generated with instrumentation support to trace the execution of OmpSs tasks inside the FPGA device. The information gathered inside the FPGA is merged with the host information by the Extrae tool, which generates Paraver traces. The final result is similar to the following trace, taken from an execution of a Cholesky factorization:

Paraver trace with FPGA devices

Supported Boards

So far, we have tested the toolchain and successfully accelerated several OmpSs applications on the following boards:

  • Zynq 7000 Family
    • Zedboard
    • Zynq 702
    • Zynq 706
    • Digilent Zybo Zynq 7000
  • Virtex 7 Family
    • Alpha Data ADM-PCIE-7V3
  • Zynq Ultrascale+ Family
    • SECO Axiom Board
    • Zynq U+ ZCU102
  • Alveo Family
    • U200
    • U250


If you are interested in OmpSs@FPGA and want more information, do not hesitate to contact us or send an e-mail: ompss-fpga-support [at]

Toolchain Source Code

The toolchain source code is available on GitHub.