2.4.1. Nanos6 FPGA Architecture API

The following sections list and summarize the Nanos6 FPGA architecture API.

Memory Management

nanos6_fpga_malloc

Allocates memory in the FPGA address space and returns a pointer valid for the FPGA tasks. The returned pointer cannot be dereferenced in the host code.

Arguments:
  • size: Size in bytes to allocate.

  • fpga_addr: Output parameter that receives the allocated address in the FPGA address space, as a 64-bit integer.

Return value:
  • NANOS6_FPGA_SUCCESS on success, NANOS6_FPGA_ERROR on error.

typedef enum {
    NANOS6_FPGA_SUCCESS,
    NANOS6_FPGA_ERROR
} nanos6_fpga_stat_t;

nanos6_fpga_stat_t nanos6_fpga_malloc(uint64_t size, uint64_t* fpga_addr);

nanos6_fpga_free

nanos6_fpga_stat_t nanos6_fpga_free(uint64_t fpga_addr);

nanos6_fpga_memcpy

typedef enum {
    NANOS6_FPGA_DEV_TO_HOST,
    NANOS6_FPGA_HOST_TO_DEV
} nanos6_fpga_copy_t;

nanos6_fpga_stat_t nanos6_fpga_memcpy(
    void* usr_ptr,
    uint64_t fpga_addr,
    uint64_t size,
    nanos6_fpga_copy_t copy_type);
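The three calls above are typically combined as allocate, copy in, copy out, free. The following is a minimal host-side sketch of that sequence, not the runtime's implementation. The type and function declarations are repeated from this section; their bodies are a host-only mock (an assumption made so the sketch runs without an FPGA), and roundtrip is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Types as declared in this section. */
typedef enum { NANOS6_FPGA_SUCCESS, NANOS6_FPGA_ERROR } nanos6_fpga_stat_t;
typedef enum { NANOS6_FPGA_DEV_TO_HOST, NANOS6_FPGA_HOST_TO_DEV } nanos6_fpga_copy_t;

/* HOST-ONLY MOCK (assumption): device memory is simulated with the host heap.
 * With the real runtime, the returned address must never be dereferenced
 * in host code. */
nanos6_fpga_stat_t nanos6_fpga_malloc(uint64_t size, uint64_t *fpga_addr) {
    void *p = malloc((size_t)size);
    if (p == NULL) return NANOS6_FPGA_ERROR;
    *fpga_addr = (uint64_t)(uintptr_t)p;
    return NANOS6_FPGA_SUCCESS;
}

nanos6_fpga_stat_t nanos6_fpga_free(uint64_t fpga_addr) {
    free((void *)(uintptr_t)fpga_addr);
    return NANOS6_FPGA_SUCCESS;
}

nanos6_fpga_stat_t nanos6_fpga_memcpy(void *usr_ptr, uint64_t fpga_addr,
                                      uint64_t size, nanos6_fpga_copy_t copy_type) {
    void *dev = (void *)(uintptr_t)fpga_addr;
    if (copy_type == NANOS6_FPGA_HOST_TO_DEV)
        memcpy(dev, usr_ptr, (size_t)size);
    else
        memcpy(usr_ptr, dev, (size_t)size);
    return NANOS6_FPGA_SUCCESS;
}

/* Hypothetical helper: copies n ints to FPGA memory and back into out.
 * Returns 0 on success, -1 if any API call reports an error. */
int roundtrip(const int *in, int *out, uint64_t n) {
    uint64_t dev = 0;
    uint64_t bytes = n * sizeof(int);
    int rc = 0;

    if (nanos6_fpga_malloc(bytes, &dev) != NANOS6_FPGA_SUCCESS)
        return -1;
    if (nanos6_fpga_memcpy((void *)in, dev, bytes, NANOS6_FPGA_HOST_TO_DEV) != NANOS6_FPGA_SUCCESS ||
        nanos6_fpga_memcpy(out, dev, bytes, NANOS6_FPGA_DEV_TO_HOST) != NANOS6_FPGA_SUCCESS)
        rc = -1;
    /* Release the device buffer even if a copy failed. */
    if (nanos6_fpga_free(dev) != NANOS6_FPGA_SUCCESS)
        rc = -1;
    return rc;
}
```

Checking every return value against NANOS6_FPGA_SUCCESS keeps a failed allocation from propagating an invalid address into later copies.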

Data copies

These Nanos6 API calls can only be used inside an FPGA task. They allow copies to be performed through a single port that can be wider than the data type being copied.

If any of the data copy API calls are used, the fompss-fpga-memory-port-width option is mandatory.

Data accessed through these functions must be aligned to the port width; otherwise, the behaviour is undefined.

The data size should also be a multiple of the port width. If this cannot be guaranteed, the fompss-fpga-check-limits-memory-port option is needed so that no out-of-bounds data is accessed; otherwise, the behaviour is undefined.
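The two constraints above (aligned start address, size a multiple of the port width) can be checked with simple integer arithmetic. The helpers below are a hypothetical sketch: the names is_port_aligned and round_up_to_port are not part of Nanos6, and port_bytes is assumed to be the port width expressed in bytes and to be non-zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: non-zero if addr is a multiple of the port width.
 * port_bytes is the port width in bytes (assumed non-zero). */
int is_port_aligned(uint64_t addr, uint64_t port_bytes) {
    return addr % port_bytes == 0;
}

/* Hypothetical helper: smallest multiple of port_bytes that is >= bytes.
 * Useful for padding a buffer so a wide-port copy stays in bounds. */
uint64_t round_up_to_port(uint64_t bytes, uint64_t port_bytes) {
    return (bytes + port_bytes - 1) / port_bytes * port_bytes;
}
```

For example, with a 64-byte port, a 100-byte array would be padded to round_up_to_port(100, 64), which is 128 bytes.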

nanos6_fpga_memcpy_wideport_in

nanos6_fpga_stat_t nanos6_fpga_memcpy_wideport_in(void* dst, const unsigned long long int addr, const unsigned int num_elems);

Arguments:

  • dst: Pointer to the destination (local) data. It can be any data type.

  • addr: FPGA memory address space where the data is stored.

  • num_elems: Number of elements of the array type to be copied.

nanos6_fpga_memcpy_wideport_out

nanos6_fpga_stat_t nanos6_fpga_memcpy_wideport_out(void* dst, const unsigned long long int addr, const unsigned int num_elems);

Arguments:

  • dst: Despite its name, pointer to the source (local) data. It can be any data type.

  • addr: FPGA memory address space where the data is written.

  • num_elems: Number of elements of the array type to be copied.
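A common pattern is to copy a block into local storage, compute on it, and write it back. The sketch below illustrates that pattern under stated assumptions: the two wide-port functions are replaced by a host-only mock in which FPGA memory is a byte array, addr is an offset into it, and elements are 4-byte floats. How the real calls determine the element size is not modelled. scale_block and BLOCK are hypothetical names, and in an OmpSs-2 program the function would be declared as an FPGA task.

```c
#include <assert.h>
#include <string.h>

typedef enum { NANOS6_FPGA_SUCCESS, NANOS6_FPGA_ERROR } nanos6_fpga_stat_t;

/* HOST-ONLY MOCK (assumption): simulated FPGA memory, addressed by offset. */
static unsigned char mock_fpga_mem[1024];

nanos6_fpga_stat_t nanos6_fpga_memcpy_wideport_in(void *dst,
        const unsigned long long int addr, const unsigned int num_elems) {
    /* Mock assumes float elements and rejects out-of-range accesses. */
    if (addr + num_elems * sizeof(float) > sizeof mock_fpga_mem)
        return NANOS6_FPGA_ERROR;
    memcpy(dst, mock_fpga_mem + addr, num_elems * sizeof(float));
    return NANOS6_FPGA_SUCCESS;
}

nanos6_fpga_stat_t nanos6_fpga_memcpy_wideport_out(void *dst,
        const unsigned long long int addr, const unsigned int num_elems) {
    /* Here dst is the local SOURCE buffer, as described in this section. */
    if (addr + num_elems * sizeof(float) > sizeof mock_fpga_mem)
        return NANOS6_FPGA_ERROR;
    memcpy(mock_fpga_mem + addr, dst, num_elems * sizeof(float));
    return NANOS6_FPGA_SUCCESS;
}

#define BLOCK 16 /* 16 floats = 64 bytes, a multiple of a 64-byte port */

/* Hypothetical task body: scales one block of floats in FPGA memory. */
nanos6_fpga_stat_t scale_block(unsigned long long addr, float factor) {
    float local[BLOCK];

    if (nanos6_fpga_memcpy_wideport_in(local, addr, BLOCK) != NANOS6_FPGA_SUCCESS)
        return NANOS6_FPGA_ERROR;
    for (int i = 0; i < BLOCK; i++)
        local[i] *= factor;
    return nanos6_fpga_memcpy_wideport_out(local, addr, BLOCK);
}
```

The block size is chosen so that both the start offset and the copied size are multiples of the assumed 64-byte port width.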

OMPIF cluster API

OMPIF is an API that allows direct FPGA-to-FPGA communication.

The OMPIF API resembles the MPI API, with a few assumptions and simplifications:

  • Data types are not used, raw data and its size in bytes is used instead.

  • A single implicit communicator that includes all FPGAs in the cluster is assumed in collectives.

  • For send/receive, dependencies can be added for task synchronization.

  • The size of a message is limited by the 16-bit sequence number of the OMPIF header and the OMPIF packet size. The packet size is currently set to 8960 bytes, so the limit is 65536*8960=587202560 bytes (560MB).
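The message size limit follows directly from the two numbers in the last item. The sketch below reproduces that arithmetic; the constant and function names are hypothetical, and the packet count assumes a message is split into full 8960-byte packets plus one partial packet if needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the values quoted in the text. */
enum {
    OMPIF_PACKET_BYTES = 8960,  /* current packet size */
    OMPIF_SEQ_NUMBERS  = 65536  /* 2^16 values of the 16-bit sequence number */
};

/* Upper bound on the message size, in bytes. */
uint64_t ompif_max_message_bytes(void) {
    return (uint64_t)OMPIF_SEQ_NUMBERS * OMPIF_PACKET_BYTES;
}

/* Number of packets needed for a message (assumption: ceiling division). */
uint64_t ompif_packets_needed(uint64_t bytes) {
    return (bytes + OMPIF_PACKET_BYTES - 1) / OMPIF_PACKET_BYTES;
}
```

Dividing the bound by 1024*1024 gives exactly 560, which is where the 560MB figure comes from.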

API calls are defined as follows:

void OMPIF_Send(const void *data, unsigned int size, int destination, unsigned char tag, unsigned char numDeps, const unsigned long long int deps[]);

Arguments:

  • data: Pointer to the local data. It must be a multiple of 64 bytes.

  • size: Size in bytes. It must be less than 560MB.

  • destination: Rank of the destination.

  • tag: Message tag. It must be less than 256.

  • numDeps: Size of the deps array.

  • deps: Dependence array.

void OMPIF_Recv(void *data, unsigned int size, int source, unsigned char tag, unsigned char numDeps, const unsigned long long int deps[]);

It takes the same arguments as OMPIF_Send, with two differences: data is a pointer to the buffer where the received message is stored (it must also be a multiple of 64 bytes), and source is the rank of the sender.
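Because OMPIF_Send returns void, argument constraints have to be checked by the caller. The following sketch wraps the call with such checks. It is hypothetical: OMPIF_Send is replaced by a host-only mock that only records the call, checked_send and demo_buf are made-up names, and the 64-byte rule is read here as a requirement on the buffer address, which is an assumption. No dependencies are passed (numDeps is 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HOST-ONLY MOCK (assumption): records the call instead of sending. */
static unsigned int mock_sent_bytes = 0;
static int mock_sent_to = -1;

void OMPIF_Send(const void *data, unsigned int size, int destination,
                unsigned char tag, unsigned char numDeps,
                const unsigned long long int deps[]) {
    (void)data; (void)tag; (void)numDeps; (void)deps;
    mock_sent_bytes = size;
    mock_sent_to = destination;
}

#define OMPIF_MAX_BYTES 587202560u /* 65536 * 8960, from the limit above */

/* 64-byte aligned demo buffer (C11 _Alignas). */
static _Alignas(64) unsigned char demo_buf[128];

/* Hypothetical wrapper: returns 0 if the message was handed to OMPIF_Send,
 * -1 if an argument violates the documented constraints. */
int checked_send(const void *data, unsigned int size, int destination,
                 unsigned char tag) {
    if ((uintptr_t)data % 64 != 0) /* assumed reading of the 64-byte rule */
        return -1;
    if (size >= OMPIF_MAX_BYTES)   /* size must be less than 560MB */
        return -1;
    OMPIF_Send(data, size, destination, tag, 0, NULL);
    return 0;
}
```

The tag needs no explicit check because its unsigned char type already restricts it to values below 256.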

void OMPIF_Allgather(void* data, unsigned int size);

Arguments:

  • data: Pointer with the local data to send and where the received data is stored.

  • size: Size in bytes to send and to receive from each rank. It must be less than 560MB.

void OMPIF_Bcast(void* data, unsigned int size, int root);

Arguments:

  • data: For the root, pointer to the local data to send. For the other ranks, pointer to the buffer where the received data is stored. It must be a multiple of 64 bytes.

  • size: Size in bytes to send for the root, and to receive for the other ranks. It must be less than 560MB.

  • root: Rank of the root.

Return value: The rank of the calling FPGA.

Return value: The number of FPGAs in the cluster.