6.6. Meep cluster installation

The Meep cluster, also known as makinote, is an FPGA cluster composed of 12 nodes with 8 FPGAs each for a total of 96 FPGAs. It also contains 4 nodes without FPGAs used for compilation and synthesis.

The OmpSs-2@FPGA releases are automatically installed in the Meep cluster. They are available through a module file for each target architecture.

6.6.1. General remarks

  • OmpSs@FPGA tools are installed in the /home/genu/pmtest/opt/bsc/ directory.

  • OmpSs modules need to be enabled manually.

  • This cluster uses BSC HPC accounts. Usernames look like bsc0xxxxx.

  • While an update is being installed, the installation is not available to users.

  • Usually, the installation takes about 30 minutes.

  • After the installation, an informative email will be sent.

6.6.2. Node specifications

Full node specifications are available at the support knowledge center: https://www.bsc.es/supportkc/docs/MEEP/overview

There are 12 FPGA nodes, 4 synthesis nodes and a login node. Synthesis and login nodes do not have FPGAs.

6.6.3. Logging into Meep

The login node, fpgalogin1.bsc.es, is accessible from the BSC internal network. To access it from an external network, the VPN must be used.

ssh bscxxxxx@fpgalogin1.bsc.es

6.6.4. Module structure

The default environment does not provide the modules needed for building OmpSs@FPGA applications. A suitable environment can be set up with:

source ~pmtest/tools/ompss_fpga_init.sh

This enables the OmpSs modules, as well as reasonably recent versions of python, cmake and clang.

Note

The loaded python 3.11 is needed by ait, but it will break gdb and possibly other system applications.

The OmpSs-2 modules are:

  • ompss-2/x86_64/*[release version]*

This will automatically load the default Vivado version, although an arbitrary version can be loaded before OmpSs:

module load vivado/2023.2 ompss-2/x86_64/git

To list all available modules in the system run:

module avail

6.6.5. Build applications

To generate an application binary and bitstream, refer to Compile OmpSs-2@FPGA programs, as the steps described there are general enough.

Note that the appropriate modules need to be loaded. See Module structure.

To allocate a job in the synthesis nodes, the gpp partition needs to be used. To enable remote X11 graphics, the --x11 option needs to be specified.

For instance, to start an interactive session with graphics:

salloc -c 10 --mem=64G -t 4:00:00 -p gpp --x11

The job will be using 10 cores and 64GB of memory.

For a batch job with no graphics:

sbatch -c 10 --mem=64G -t 4:00:00 -p gpp build_script.sh
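As an illustration, the snippet below writes a minimal build_script.sh whose #SBATCH directives mirror the options of the sbatch command above. It is only a sketch: the make bitstream line is a hypothetical placeholder for the project's own build steps, and it is assumed that the module setup described in Module structure also works inside a batch job.

```shell
# Write a minimal build_script.sh. The #SBATCH lines mirror the options of the
# sbatch command above, so the script can also be submitted as plain
# "sbatch build_script.sh".
cat > build_script.sh <<'EOF'
#!/bin/bash
#SBATCH -c 10
#SBATCH --mem=64G
#SBATCH -t 4:00:00
#SBATCH -p gpp

# Enable the OmpSs modules and load the toolchain (see Module structure).
source ~pmtest/tools/ompss_fpga_init.sh
module load vivado/2023.2 ompss-2/x86_64/git

# Hypothetical build command: replace with the project's own build steps.
make bitstream
EOF
```

Options given on the sbatch command line take precedence over the #SBATCH directives in the script.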

6.6.6. Running applications

This section describes how to allocate resources and set up the environment to run an OmpSs@FPGA application. To execute the application itself, refer to Running OmpSs-2@FPGA Programs.

Get access to an installed FPGA

To run OmpSs@FPGA applications, a job needs to be allocated in the FPGA nodes. These nodes form the main partition, so no partition needs to be specified.

For instance, an interactive job looks like this:

salloc -c 112 --mem=128G -t 4:00:00 --constraint=dmaqdma

This will allocate a full node in qdma mode, which is needed for running OmpSs@FPGA applications. Since the full node is allocated, all 8 of its FPGAs are available to the user.

Information about the FPGAs is stored in the /etc/motd file. This file specifies the board serial number, USB port, PCIe slot, and network ports used by each FPGA.

For instance:

+------------------------------------------------------------------------------------------------------------------------------------+
|    +------+        +------+         +------+         +------+         +------+        +------+         +------+        +------+    |
|    |swp26 |        |swp25 |         |swp24 |         |swp23 |         |swp22 |        |swp21 |         |swp20 |        |swp19 |    |
|    +------+        +------+         +------+         +------+         +------+        +------+         +------+        +------+    |
+-------^---------------^----------------^----------------^----------------^---------------^----------------^---------------^--------+
        |               |                |                |                |               |                |               |
+-------|--------+------|--------+-------|--------+-------|--------+-------|--------+------|--------+-------|-------+-------|--------+
|       v        |      v        |       v        |       v        |       v        |      v        |       v       |       v        |
|    +------+    |   +------+    |    +------+    |    +------+    |    +------+    |   +------+    |    +------+   |    +------+    |
|    |      |    |   |      |    |    |      |    |    |      |    |    |      |    |   |      |    |    |      |   |    |      |    |
|    |QSFP0 |    |   |QSFP0 |    |    |QSFP0 |    |    |QSFP0 |    |    |QSFP0 |    |   |QSFP0 |    |    |QSFP0 |   |    |QSFP0 |    |
|    |      |    |   |      |    |    |      |    |    |      |    |    |      |    |   |      |    |    |      |   |    |      |    |
|    +------+    |   +------+    |    +------+    |    +------+    |    +------+    |   +------+    |    +------+   |    +------+    |
|                |               |                |                |                |               |               |                |
|    +------+    |   +------+    |    +------+    |    +------+    |    +------+    |   +------+    |    +------+   |    +------+    |
|    |      |    |   |      |    |    |      |    |    |      |    |    |      |    |   |      |    |    |      |   |    |      |    |
|    |QSFP1 | <----> |QSFP1 |    |    |QSFP1 |  <----> |QSFP1 |    |    |QSFP1 | <----> |QSFP1 |    |    |QSFP1 | <----> |QSFP1 |    |
|    |      |    |   |      |    |    |      |    |    |      |    |    |      |    |   |      |    |    |      |   |    |      |    |
|    +------+    |   +------+    |    +------+    |    +------+    |    +------+    |   +------+    |    +------+   |    +------+    |
|   onic180s0f0  |  onic179s0f0  |   onic204s0f0  |   onic205s0f0  |   onic26s0f0   |  onic25s0f0   |   onic51s0f0  |   onic52s0f0   |
|                |               |                |                |                |               |               |                |
|      +-+       |     +-+       |      +-+       |      +-+       |      +-+       |      +-+      |      +-+      |      +-+       |
|      | |       |     | |       |      | |       |      | |       |      | |       |      | |      |      | |      |      | |       |
|      +-+       |     +-+       |      +-+       |      +-+       |      +-+       |      +-+      |      +-+      |      +-+       |
|   USB-UART-    |  USB-UART-    |   USB-UART-    |   USB-UART-    |  USB-UART-     |  USB-UART-    |   USB-UART-   |   USB-UART-    |
| XFL1ND323BSU   |XFL1Y1BX0JYT   | XFL1E3102VRH   | XFL12GU0UBJA   | XFL1UZW5U0MR   |XFL1G5IYME1R   | XFL12IUWGVDB  |  XFL1D2QP00YZ  |
+----------------+---------------+----------------+----------------+----------------+---------------+---------------+----------------+

+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| FPGA Card  | Chassis | FPGA Serial  | PCIe Bus | USBPort | ttyUSBx               | QSFP0  | QSFP1      | QDMA onic   | onic IP     |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f01 | 3       | XFL1D2QP00YZ | 34:00.0  | 1       | USB-UART-XFL1D2QP00YZ | Switch | fpgan08f02 | onic52s0f0  | 10.0.1.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f02 | 4       | XFL12IUWGVDB | 33:00.0  | 2       | USB-UART-XFL12IUWGVDB | Switch | fpgan08f01 | onic51s0f0  | 10.0.2.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f03 | 5       | XFL1G5IYME1R | 19:00.0  | 3       | USB-UART-XFL1G5IYME1R | Switch | fpgan08f04 | onic25s0f0  | 10.0.3.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f04 | 6       | XFL1UZW5U0MR | 1a:00.0  | 4       | USB-UART-XFL1UZW5U0MR | Switch | fpgan08f03 | onic26s0f0  | 10.0.4.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f05 | 7       | XFL12GU0UBJA | cd:00.0  | 5       | USB-UART-XFL12GU0UBJA | Switch | fpgan08f06 | onic205s0f0 | 10.0.5.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f06 | 8       | XFL1E3102VRH | cc:00.0  | 6       | USB-UART-XFL1E3102VRH | Switch | fpgan08f05 | onic204s0f0 | 10.0.6.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f07 | 9       | XFL1Y1BX0JYT | b3:00.0  | 7       | USB-UART-XFL1Y1BX0JYT | Switch | fpgan08f08 | onic179s0f0 | 10.0.7.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f08 | 10      | XFL1ND323BSU | b4:00.0  | 8       | USB-UART-XFL1ND323BSU | Switch | fpgan08f07 | onic180s0f0 | 10.0.8.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
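The table can be reduced to the fields most often needed with a small filter. The following is a sketch only: fpga_summary is a hypothetical helper, and it assumes that /etc/motd keeps the column layout shown above (card name in the first column, serial in the third, PCIe bus in the fourth, QDMA onic interface in the ninth).

```shell
# Hypothetical helper: read /etc/motd-style text on stdin and print one line
# per FPGA with its card name, serial number, PCIe address and onic interface.
fpga_summary() {
    awk -F'|' '
        # Only table rows whose first column is a card name such as fpgan08f01
        $2 ~ /^ *fpgan[0-9]+f[0-9]+ *$/ {
            for (i = 2; i <= NF; i++) gsub(/ /, "", $i)   # strip column padding
            print $2, $4, $5, $10
        }
    '
}

# Two rows copied from the table above, used here as sample input.
sample_motd='| FPGA Card  | Chassis | FPGA Serial  | PCIe Bus | USBPort | ttyUSBx               | QSFP0  | QSFP1      | QDMA onic   | onic IP     |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f01 | 3       | XFL1D2QP00YZ | 34:00.0  | 1       | USB-UART-XFL1D2QP00YZ | Switch | fpgan08f02 | onic52s0f0  | 10.0.1.1/24 |
| fpgan08f02 | 4       | XFL12IUWGVDB | 33:00.0  | 2       | USB-UART-XFL12IUWGVDB | Switch | fpgan08f01 | onic51s0f0  | 10.0.2.1/24 |'

printf '%s\n' "$sample_motd" | fpga_summary
```

On an allocated FPGA node the same filter would be applied as fpga_summary < /etc/motd.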

Loading bitstreams

The FPGA bitstream needs to be loaded before the application can run. The load_bitstream utility is provided in order to simplify the FPGA configuration.

load_bitstream bitstream.bit [index] ...

The utility takes a second parameter that indicates which of the FPGAs to program. More than one index can be specified, in which case all the specified FPGAs are programmed with the given bitstream.

To find out which FPGA indices have been allocated, run load_bitstream with the help (-h) option. The output should be similar to this:

Usage load_bitstream bitstream.bit [index]
Available devices:
index: jtag serial pcie
0: XFL1D2QP00YZ 34:00.0
1: XFL12IUWGVDB 33:00.0
2: XFL1G5IYME1R 19:00.0
3: XFL1UZW5U0MR 1a:00.0
4: XFL12GU0UBJA cd:00.0
5: XFL1E3102VRH cc:00.0
6: XFL1Y1BX0JYT b3:00.0
7: XFL1ND323BSU b4:00.0
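When scripting, the index of a particular board can be looked up from this listing by serial number or PCIe address. The sketch below assumes the listing format shown above; fpga_index is a hypothetical helper, not part of the installed tools.

```shell
# Hypothetical helper: print the load_bitstream index of the FPGA whose JTAG
# serial or PCIe address equals $1. The device listing is read from stdin.
# Returns non-zero if no device matches.
fpga_index() {
    awk -v key="$1" '
        # Device lines look like "5: XFL1E3102VRH cc:00.0"
        $1 ~ /^[0-9]+:$/ && ($2 == key || $3 == key) {
            sub(/:$/, "", $1)   # drop the trailing colon from the index
            print $1
            found = 1
            exit
        }
        END { exit !found }
    '
}

# Part of the listing above, used here as sample input.
sample_listing='Usage load_bitstream bitstream.bit [index]
Available devices:
index: jtag serial pcie
0: XFL1D2QP00YZ 34:00.0
1: XFL12IUWGVDB 33:00.0
5: XFL1E3102VRH cc:00.0'

printf '%s\n' "$sample_listing" | fpga_index cc:00.0   # prints 5
```

On an FPGA node the listing would come from load_bitstream -h instead of the sample text (adding 2>&1 in case the help is written to stderr), and the resulting index can then be passed to load_bitstream.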

Set up qdma queues

Note

This step is performed by the load_bitstream script, which creates a single bidirectional memory-mapped queue. Setting up queues manually is only needed if a different configuration is required.

For DMA transfers to be performed between system main memory and FPGA memory, qdma queues have to be set up by the user prior to any execution.

The dma-ctl tool is used for this. For instance, to create and start a memory-mapped qdma queue with index 1, run:

dma-ctl qdmab3000 q add idx 1 mode mm dir bi
dma-ctl qdmab3000 q start idx 1 mode mm dir bi

The OmpSs runtime system expects an mm queue at index 1, which can be created with the commands listed above.

In the same fashion, these queues can also be removed:

dma-ctl qdmab3000 q stop idx 1 mode mm dir bi
dma-ctl qdmab3000 q del idx 1 mode mm dir bi

For more information, see:

dma-ctl --help
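The queue setup commands can be generated for any of the FPGAs in the node. This sketch assumes that the qdma device name is "qdma" followed by the PCIe address with the ":" and "." separators removed, which is what the qdmab3000 example (PCIe b3:00.0) suggests. Both functions are hypothetical helpers, and the commands are only printed, not executed.

```shell
# Assumption: the qdma device name is "qdma" followed by the PCIe address
# (bus:device.function) with ":" and "." removed, e.g. b3:00.0 -> qdmab3000.
qdma_dev() {
    printf 'qdma%s\n' "$(printf '%s' "$1" | tr -d ':.')"
}

# Print (without running them) the dma-ctl commands that create and start the
# bidirectional memory-mapped queue at index 1 for the given PCIe address.
qdma_queue_cmds() {
    dev=$(qdma_dev "$1")
    for action in add start; do
        printf 'dma-ctl %s q %s idx 1 mode mm dir bi\n' "$dev" "$action"
    done
}

qdma_queue_cmds b3:00.0
# dma-ctl qdmab3000 q add idx 1 mode mm dir bi
# dma-ctl qdmab3000 q start idx 1 mode mm dir bi
```

Printing the commands first makes it easy to check the device name before piping the output to sh.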

Get current bitstream info

In order to get information about the bitstream currently loaded into the FPGA, the tool read_bitinfo is installed in the system.

read_bitinfo

Note that an active Slurm reservation is needed in order to query the FPGA.

This call should return something similar to the following sample output for an OMPIF test application:

Bitinfo of FPGA 0000:cc:00.0:
Bitinfo version:    13
Bitstream user-id:  0x479B8510
AIT version:        7.7.2
Wrapper version     13
Number of acc:      5
Board base frequency (MHz)  100.000000
Interleaving not enabled

Features:
[ ] Instrumentation
[ ] Hardware counter
[x] Performance interconnect
[ ] Simplified interconnection
[x] POM AXI-Lite
[x] POM task creation
[ ] POM dependencies
[ ] POM lock
[x] POM spawn queues
[ ] Power monitor (CMS)
[ ] Thermal monitor (sysmon)
[x] OMPIF

Managed rstn addr 0x10000
Cmd In addr 0xC000 len 128
Cmd Out addr 0xE000 len 128
Spawn In addr 0x8000 len 1024
Spawn Out addr 0xA000 len 1024
Hardware counter not enabled
POM AXI-Lite addr 0x4000
Power monitor (CMS) not enabled
Thermal monitor (sysmon) not enabled

xtasks accelerator config:
type        count   freq(KHz)   description
8381065717  1       100000      send_receive_test
8454279320  1       100000      allgather_test_task
7899490654  1       100000      broadcast_test_task
4294967299  1       100000      ompif_message_sender
4294967300  1       100000      ompif_message_receiver

ait command line:
ait --name=ompif_test --board=alveo_u55c -c=100 --enable_pom_axilite --interconnect_opt=performance --wrapper_version 13

Hardware runtime VLNV:
bsc:ompss:picos_ompss_manager:7.3
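Before launching an application, a script may want to verify that the loaded bitstream was built with a given feature. The following sketch checks the "[x]" markers of the Features list, assuming the output format shown above; bitinfo_has_feature is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the feature named in $1 is marked "[x]" in
# read_bitinfo-style output read from stdin.
bitinfo_has_feature() {
    awk -v want="[x] $1" '
        { sub(/[ \t]+$/, "") }      # ignore trailing whitespace
        $0 == want { ok = 1 }       # whole-line match, so "POM" alone does not match
        END { exit !ok }
    '
}

# Part of the Features list above, used here as sample input.
sample_bitinfo='Features:
[ ] Instrumentation
[x] Performance interconnect
[x] POM AXI-Lite
[x] OMPIF'

if printf '%s\n' "$sample_bitinfo" | bitinfo_has_feature OMPIF; then
    echo "OMPIF enabled"
fi
```

Inside a job allocation the check would read from the tool directly, as in read_bitinfo | bitinfo_has_feature OMPIF.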

Running cluster applications

See Running OMPIF applications.