6.6. Meep cluster installation¶
The Meep cluster, also known as makinote, is an FPGA cluster composed of 12 nodes with 8 FPGAs each for a total of 96 FPGAs. It also contains 4 nodes without FPGAs used for compilation and synthesis.
The OmpSs-2@FPGA releases are automatically installed in the Meep cluster. They are available through a module file for each target architecture.
6.6.1. General remarks¶
OmpSs@FPGA tools are installed in the /home/genu/pmtest/opt/bsc/ directory.
OmpSs modules need to be enabled manually.
This cluster uses BSC HPC accounts. User names have the form bsc0xxxxx.
During updates, the installation is not available to users.
Usually, the installation takes about 30 minutes.
After the installation, an informative email will be sent.
6.6.2. Node specifications¶
Full node specifications are available at the support knowledge center: https://www.bsc.es/supportkc/docs/MEEP/overview
CPU: Intel Xeon Gold 6330 with 28 cores @ 2.0GHz
Main memory: 256 GB (16 RDIMM x 16GB DDR4 @ 3200 MHz)
8 Xilinx Alveo U55C FPGAs
There are 12 FPGA nodes, 4 synthesis nodes and a login node. Synthesis and login nodes do not have FPGAs.
6.6.3. Logging into Meep¶
The login node is accessible from the BSC internal network.
Access from an external network requires the VPN.
The login node address is fpgalogin1.bsc.es:
ssh bscxxxxx@fpgalogin1.bsc.es
6.6.4. Module structure¶
The default environment does not provide the modules needed to build OmpSs@FPGA applications. A suitable environment can be set up with:
source ~pmtest/tools/ompss_fpga_init.sh
This enables the OmpSs modules, as well as reasonably recent versions of python, cmake and clang.
Note
The loaded python 3.11 is needed by ait, but it breaks gdb and possibly other system applications.
The OmpSs-2 modules are:
ompss-2/x86_64/*[release version]*
This will automatically load the default Vivado version, although an arbitrary version can be loaded before OmpSs:
module load vivado/2023.2 ompss-2/x86_64/git
To list all available modules in the system run:
module avail
6.6.5. Build applications¶
To generate an application binary and bitstream, refer to Compile OmpSs-2@FPGA programs; the steps described there are general enough to apply here.
Note that the appropriate modules need to be loaded. See Module structure.
To allocate a job in the synthesis nodes, the gpp partition needs to be used.
To enable remote X11 graphics, the --x11 option needs to be specified.
For instance, to start an interactive session with graphics:
salloc -c 10 --mem=64G -t 4:00:00 -p gpp --x11
The job will use 10 cores and 64 GB of memory, with a time limit of 4 hours.
For a batch job with no graphics:
sbatch -c 10 --mem=64G -t 4:00:00 -p gpp build_script.sh
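As a rough illustration of the batch case, the snippet below writes one possible build_script.sh with the same resource request embedded as #SBATCH directives. The module name is the one used in Module structure; the make bitstream line is only a placeholder for the project's real build command and is not something the installation provides.

```shell
# Write a hypothetical build_script.sh for the gpp (synthesis) partition.
# The quoted 'EOF' keeps the shell from expanding anything inside the script.
cat > build_script.sh <<'EOF'
#!/bin/bash
#SBATCH -c 10
#SBATCH --mem=64G
#SBATCH -t 4:00:00
#SBATCH -p gpp

# Set up the OmpSs@FPGA environment (see Module structure).
source ~pmtest/tools/ompss_fpga_init.sh
module load ompss-2/x86_64/git

# Placeholder (assumption): replace with the project's actual build command.
make bitstream
EOF
```

With the resources declared inside the script, it can be submitted with just sbatch build_script.sh.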
6.6.6. Running applications¶
This section describes how to allocate resources and set up the environment to run an OmpSs@FPGA application. To execute the application itself, refer to Running OmpSs-2@FPGA Programs.
Get access to an installed FPGA¶
To run OmpSs@FPGA applications, a job needs to be allocated in the FPGA nodes. These nodes form the main partition, so no partition needs to be specified.
For instance, an interactive job looks like this:
salloc -c 112 --mem=128G -t 4:00:00 --constraint=dmaqdma
This allocates a full node in qdma mode, which is needed for running OmpSs@FPGA applications. All 8 FPGAs of the node are then available to the user.
Information about the FPGAs is stored in the /etc/motd file.
This file specifies the board serial number, USB port, PCIe slot, and the network ports used by each FPGA.
For instance:
+------------------------------------------------------------------------------------------------------------------------------------+
|    +------+        +------+         +------+         +------+         +------+        +------+         +------+        +------+    |
|    |swp26 |        |swp25 |         |swp24 |         |swp23 |         |swp22 |        |swp21 |         |swp20 |        |swp19 |    |
|    +------+        +------+         +------+         +------+         +------+        +------+         +------+        +------+    |
+-------^---------------^----------------^----------------^----------------^---------------^----------------^---------------^--------+
        |               |                |                |                |               |                |               |
+-------|--------+------|--------+-------|--------+-------|--------+-------|--------+------|--------+-------|-------+-------|--------+
|       v        |      v        |       v        |       v        |       v        |      v        |       v       |       v        |
|    +------+    |   +------+    |    +------+    |    +------+    |    +------+    |   +------+    |    +------+   |    +------+    |
|    |      |    |   |      |    |    |      |    |    |      |    |    |      |    |   |      |    |    |      |   |    |      |    |
|    |QSFP0 |    |   |QSFP0 |    |    |QSFP0 |    |    |QSFP0 |    |    |QSFP0 |    |   |QSFP0 |    |    |QSFP0 |   |    |QSFP0 |    |
|    |      |    |   |      |    |    |      |    |    |      |    |    |      |    |   |      |    |    |      |   |    |      |    |
|    +------+    |   +------+    |    +------+    |    +------+    |    +------+    |   +------+    |    +------+   |    +------+    |
|                |               |                |                |                |               |               |                |
|    +------+    |   +------+    |    +------+    |    +------+    |    +------+    |   +------+    |    +------+   |    +------+    |
|    |      |    |   |      |    |    |      |    |    |      |    |    |      |    |   |      |    |    |      |   |    |      |    |
|    |QSFP1 | <----> |QSFP1 |    |    |QSFP1 | <---->  |QSFP1 |    |    |QSFP1 | <----> |QSFP1 |    |    |QSFP1 | <----> |QSFP1 |    |
|    |      |    |   |      |    |    |      |    |    |      |    |    |      |    |   |      |    |    |      |   |    |      |    |
|    +------+    |   +------+    |    +------+    |    +------+    |    +------+    |   +------+    |    +------+   |    +------+    |
|  onic180s0f0   |  onic179s0f0  |  onic204s0f0   |  onic205s0f0   |   onic26s0f0   |  onic25s0f0   |   onic51s0f0  |   onic52s0f0   |
|                |               |                |                |                |               |               |                |
|      +-+       |     +-+       |      +-+       |      +-+       |      +-+       |     +-+       |      +-+      |      +-+       |
|      | |       |     | |       |      | |       |      | |       |      | |       |     | |       |      | |      |      | |       |
|      +-+       |     +-+       |      +-+       |      +-+       |      +-+       |     +-+       |      +-+      |      +-+       |
|   USB-UART-    |  USB-UART-    |   USB-UART-    |   USB-UART-    |   USB-UART-    |  USB-UART-    |   USB-UART-   |   USB-UART-    |
|  XFL1ND323BSU  | XFL1Y1BX0JYT  |  XFL1E3102VRH  |  XFL12GU0UBJA  |  XFL1UZW5U0MR  | XFL1G5IYME1R  |  XFL12IUWGVDB |  XFL1D2QP00YZ  |
+----------------+---------------+----------------+----------------+----------------+---------------+---------------+----------------+
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| FPGA Card  | Chassis | FPGA Serial  | PCIe Bus | USBPort | ttyUSBx               | QSFP0  | QSFP1      | QDMA onic   | onic IP     |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f01 | 3       | XFL1D2QP00YZ | 34:00.0  | 1       | USB-UART-XFL1D2QP00YZ | Switch | fpgan08f02 | onic52s0f0  | 10.0.1.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f02 | 4       | XFL12IUWGVDB | 33:00.0  | 2       | USB-UART-XFL12IUWGVDB | Switch | fpgan08f01 | onic51s0f0  | 10.0.2.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f03 | 5       | XFL1G5IYME1R | 19:00.0  | 3       | USB-UART-XFL1G5IYME1R | Switch | fpgan08f04 | onic25s0f0  | 10.0.3.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f04 | 6       | XFL1UZW5U0MR | 1a:00.0  | 4       | USB-UART-XFL1UZW5U0MR | Switch | fpgan08f03 | onic26s0f0  | 10.0.4.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f05 | 7       | XFL12GU0UBJA | cd:00.0  | 5       | USB-UART-XFL12GU0UBJA | Switch | fpgan08f06 | onic205s0f0 | 10.0.5.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f06 | 8       | XFL1E3102VRH | cc:00.0  | 6       | USB-UART-XFL1E3102VRH | Switch | fpgan08f05 | onic204s0f0 | 10.0.6.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f07 | 9       | XFL1Y1BX0JYT | b3:00.0  | 7       | USB-UART-XFL1Y1BX0JYT | Switch | fpgan08f08 | onic179s0f0 | 10.0.7.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
| fpgan08f08 | 10      | XFL1ND323BSU | b4:00.0  | 8       | USB-UART-XFL1ND323BSU | Switch | fpgan08f07 | onic180s0f0 | 10.0.8.1/24 |
+------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
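When a single value from this table is needed in a script, it can be extracted with awk. The helper below is a minimal sketch, not a tool shipped with the installation: motd_fpga_field is a hypothetical name, and it assumes /etc/motd on the allocated node contains a table laid out like the sample above, with columns separated by | characters.

```shell
# Hypothetical helper: print one column of the /etc/motd table for a given card.
#   $1: FPGA card name (first column), e.g. fpgan08f07
#   $2: column number, counting from 1 (1 = FPGA Card, 4 = PCIe Bus, ...)
# The table is read from standard input.
motd_fpga_field() {
    awk -F'|' -v card="$1" -v col="$2" '
        {
            # Field 1 is the empty text before the leading "|",
            # so table column N is awk field N+1.
            name = $2
            gsub(/ /, "", name)
            if (name == card) {
                value = $(col + 1)
                gsub(/^ +| +$/, "", value)   # trim the padding spaces
                print value
            }
        }'
}

# Example (on an allocated FPGA node):
#   motd_fpga_field fpgan08f07 4 < /etc/motd
```

Border and diagram lines never match a card name, so they are skipped without special handling.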
Loading bitstreams¶
The FPGA bitstream needs to be loaded before the application can run.
The load_bitstream utility is provided to simplify FPGA configuration.
load_bitstream bitstream.bit [index] ...
The utility takes a second parameter that indicates which of the FPGAs to program. More than one index can be specified, in which case all the specified FPGAs are programmed with the given bitstream.
To find out which FPGA indices have been allocated, run load_bitstream with the help (-h) option. The output should be similar to this:
Usage load_bitstream bitstream.bit [index]
Available devices:
index: jtag serial pcie
0: XFL1D2QP00YZ 34:00.0
1: XFL12IUWGVDB 33:00.0
2: XFL1G5IYME1R 19:00.0
3: XFL1UZW5U0MR 1a:00.0
4: XFL12GU0UBJA cd:00.0
5: XFL1E3102VRH cc:00.0
6: XFL1Y1BX0JYT b3:00.0
7: XFL1ND323BSU b4:00.0
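To program every allocated FPGA with the same bitstream, the index list can be taken from this help output instead of being typed by hand. The following is a sketch under assumptions: list_fpga_indices is a hypothetical helper, and it relies on device lines having the form shown above (an index followed by a colon as the first word).

```shell
# Hypothetical helper: print the FPGA indices found in the output of
# "load_bitstream -h", one per line. Reads the help text from standard input.
list_fpga_indices() {
    # Keep only lines whose first word is a number followed by ":",
    # e.g. "0: XFL1D2QP00YZ 34:00.0", and print that number without the colon.
    awk '$1 ~ /^[0-9]+:$/ { sub(/:$/, "", $1); print $1 }'
}

# Example (on an allocated FPGA node). 2>&1 is used because it is not
# documented here whether the help text goes to stdout or stderr:
#   load_bitstream bitstream.bit $(load_bitstream -h 2>&1 | list_fpga_indices)
```

The header line starting with index: is ignored because its first word is not numeric.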
Set up qdma queues¶
Note
This step is already performed by the load_bitstream script, which creates a single bidirectional memory-mapped queue.
Manual setup is only needed if a different configuration is required.
For DMA transfers to be performed between system main memory and the FPGA memory, qdma queues have to be set up by the user prior to any execution.
The dma-ctl tool is used for this.
For instance, to create and start a memory-mapped qdma queue with index 1, run:
dma-ctl qdmab3000 q add idx 1 mode mm dir bi
dma-ctl qdmab3000 q start idx 1 mode mm dir bi
The OmpSs runtime system expects an mm queue at index 1, which can be created with the commands listed above.
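The device name qdmab3000 in these commands appears to be built from the PCIe address of the FPGA (bus b3, device 00, function 0, listed as b3:00.0 in /etc/motd). Assuming that naming holds for the other boards, a small helper can derive the name for any of them. qdma_dev_name is a hypothetical function written for this illustration, not part of dma-ctl.

```shell
# Hypothetical helper: build the qdma device name used by dma-ctl from a
# PCIe address such as b3:00.0 or 0000:b3:00.0.
# Assumption: the name is "qdma" followed by the bus, device and function digits.
qdma_dev_name() {
    # Drop an optional 0000: domain prefix, then remove the ":" and "." separators.
    printf 'qdma%s\n' "$(printf '%s' "$1" | sed -e 's/^0000://' -e 's/[:.]//g')"
}

# Example (on an allocated FPGA node), for the board at PCIe address cc:00.0:
#   dev=$(qdma_dev_name cc:00.0)
#   dma-ctl "$dev" q add idx 1 mode mm dir bi
#   dma-ctl "$dev" q start idx 1 mode mm dir bi
```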
In the same fashion, these queues can also be removed:
dma-ctl qdmab3000 q stop idx 1 mode mm dir bi
dma-ctl qdmab3000 q del idx 1 mode mm dir bi
For more information, see
dma-ctl --help
Get current bitstream info¶
To get information about the bitstream currently loaded into the FPGA, use the read_bitinfo tool installed in the system:
read_bitinfo
Note that an active Slurm reservation is needed in order to query the FPGA.
This call should return something similar to the following sample output for an OMPIF test application:
Bitinfo of FPGA 0000:cc:00.0:
Bitinfo version: 13
Bitstream user-id: 0x479B8510
AIT version: 7.7.2
Wrapper version 13
Number of acc: 5
Board base frequency (MHz) 100.000000
Interleaving not enabled
Features:
[ ] Instrumentation
[ ] Hardware counter
[x] Performance interconnect
[ ] Simplified interconnection
[x] POM AXI-Lite
[x] POM task creation
[ ] POM dependencies
[ ] POM lock
[x] POM spawn queues
[ ] Power monitor (CMS)
[ ] Thermal monitor (sysmon)
[x] OMPIF
Managed rstn addr 0x10000
Cmd In addr 0xC000 len 128
Cmd Out addr 0xE000 len 128
Spawn In addr 0x8000 len 1024
Spawn Out addr 0xA000 len 1024
Hardware counter not enabled
POM AXI-Lite addr 0x4000
Power monitor (CMS) not enabled
Thermal monitor (sysmon) not enabled
xtasks accelerator config:
type count freq(KHz) description
8381065717 1 100000 send_receive_test
8454279320 1 100000 allgather_test_task
7899490654 1 100000 broadcast_test_task
4294967299 1 100000 ompif_message_sender
4294967300 1 100000 ompif_message_receiver
ait command line:
ait --name=ompif_test --board=alveo_u55c -c=100 --enable_pom_axilite --interconnect_opt=performance --wrapper_version 13
Hardware runtime VLNV:
bsc:ompss:picos_ompss_manager:7.3
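A script can use this output to check that the loaded bitstream has a required feature before launching an application. The helper below is a sketch that assumes the feature list keeps the [x] / [ ] format of the sample above; bitinfo_has_feature is a hypothetical name, not part of the installed tools.

```shell
# Hypothetical helper: succeed if the named feature is marked "[x]" in the
# read_bitinfo output supplied on standard input.
#   $1: feature name exactly as printed, e.g. "OMPIF" or "POM AXI-Lite"
bitinfo_has_feature() {
    # Trim surrounding whitespace, then look for an exact, literal line match
    # (-F avoids treating brackets or parentheses in the name as a pattern).
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | grep -Fqx -- "[x] $1"
}

# Example (inside an active allocation):
#   if read_bitinfo | bitinfo_has_feature "OMPIF"; then
#       echo "OMPIF is enabled in the loaded bitstream"
#   fi
```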