4.2. Running OMPIF applications

Multi-node, multi-FPGA applications developed with the OMPIF cluster API require a special setup that is not needed in regular OmpSs@FPGA applications.

4.2.1. Load cluster configuration scripts

Cluster configuration is done with the ompif.py script, which is available in the ompss-2 module (for instance, ompss-2/x86_64/git). Use the --help flag to get general information about the tool or about each of its commands:

ompif.py --help
ompif.py create_cluster_file --help

4.2.2. Application execution

This section covers the cluster environment setup and the application execution. It is assumed that vivado and dma-ctl are available in the PATH; both are provided by the ompss-2 environment module.
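For example, on a system managed with environment modules, the tools could be made available as follows. This is only a sketch: the exact module name may differ between installations.

module load ompss-2/x86_64/git
which vivado dma-ctl ompif.py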

Creating the cluster description file

A JSON file describing the cluster is needed so that the cluster can be configured automatically. For each FPGA, it contains the FPGA index inside the node, the node name and the absolute path to the bitstream:

[
    { "fpga": FPGA_INDEX, "node": NODE_NAME, "bitstream": BITSTREAM_PATH }
]

The following example configures a cluster using 4 FPGAs in 2 different nodes (2 FPGAs per node):

[
    { "node": "fpgan01", "fpga": 0, "bitstream" : "/absolute/path/to/bitstream.bit" },
    { "node": "fpgan01", "fpga": 1, "bitstream" : "/absolute/path/to/bitstream.bit" },
    { "node": "fpgan02", "fpga": 0, "bitstream" : "/absolute/path/to/bitstream.bit" },
    { "node": "fpgan02", "fpga": 1, "bitstream" : "/absolute/path/to/bitstream.bit" }
]

Information regarding the FPGAs can be found in /etc/motd on each of the FPGA nodes.

The ompif.py management script includes a command to generate the cluster file based on the nodes allocated to the current job and the number of FPGAs requested:

ompif.py create_cluster_file number_of_fpgas bitstream_file

For instance, to create the cluster file for 6 FPGAs using the bitstream.bit file:

ompif.py create_cluster_file 6 bitstream.bit

This reads the list of nodes from the $SLURM_JOB_NODELIST environment variable set by Slurm. A node list can also be specified explicitly with the --nodelist flag.
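As a sketch, assuming --nodelist accepts a comma-separated list of hostnames (check ompif.py create_cluster_file --help for the exact format), an explicit node list could be passed like this:

ompif.py create_cluster_file 4 bitstream.bit --nodelist fpgan01,fpgan02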

Configuring the FPGA cluster

Once the cluster file is created, the cluster can be configured using the create_cluster command from the ompif.py tool.

ompif.py create_cluster cluster.json

The script will automatically launch the servers and connect to them. For each node, a log file named ${NODENAME}_servitor.log is created in the current working directory; the path can be changed with the --log_prefix flag.

For each node the script also creates an xtasks_devs_$(hostname).sh file containing the XTASKS_PCI_DEV and XDMA_QDMA_DEV environment variables. By default it is created in the current working directory, but the location can be changed with the --xtasks_cluster_prefix flag. The use of this file is explained in Start xtasks servers.

Finally, the script creates the xtasks.cluster file needed by the application, also in the current working directory. This file must be in the directory from which the application is launched, or its path must be set in the XTASKS_CLUSTER_FILE environment variable. Otherwise, the application assumes it is executing in single-node mode and does not connect to the remote servers.
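For example, if the application is launched from a different directory than the one where the cluster was configured, the application can be pointed to the generated file through the environment variable. The path below is only a placeholder:

export XTASKS_CLUSTER_FILE=/path/to/configuration/dir/xtasks.cluster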

Start xtasks servers

A remote server that listens for FPGA tasks needs to be started on each of the remote nodes:

ompif.py launch_xtasks_server cluster.json

Log files named ${NODENAME}_xtasks.log are created in the current directory; the path can be changed with the --log_prefix flag. The script launches the server in the current working directory, so the xtasks.cluster file must be present in that same directory (for the moment, its path cannot be set with an environment variable). By default, all the xtasks_devs_$(hostname).sh files must be in the current directory as well, although another path can be set with the --xtasks_devs_prefix flag. Once the servers are running, the cluster application can be run as usual.
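Putting the steps together, a typical session might look like the following. This is only a sketch: it assumes that create_cluster_file writes its output to cluster.json and that the application binary is called ./my_ompif_app, both of which are placeholders.

# Generate the cluster description from the current job allocation
ompif.py create_cluster_file 4 bitstream.bit
# Configure the FPGAs and launch the per-node servers
ompif.py create_cluster cluster.json
# Start the xtasks servers on the remote nodes
ompif.py launch_xtasks_server cluster.json
# Run the OMPIF application from the same directory
./my_ompif_app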

Debugging

There are many debug registers that can be read through QDMA, including the number of received messages, the number of corrupted messages, the number of send/receive tasks, etc. More details can be found in the POM AXI-Lite interface memory map.
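As an illustration only, assuming the Xilinx QDMA dma-ctl utility is used to access the registers, a read could look like the following. The device name, BAR number and register offset are placeholders; the actual offsets must be taken from the POM AXI-Lite interface memory map, and dma-ctl --help shows the exact syntax for the installed driver version.

# Placeholder device (qdma01000), BAR (2) and offset (0x0)
dma-ctl qdma01000 reg read bar 2 0x0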