6.4. crdbmaster cluster installation¶
The OmpSs-2@FPGA releases are automatically installed in the crdbmaster cluster. They are available through a module file for each target architecture. This document describes how to load and use the modules to compile an example application. Once the modules are loaded, the workflow in the crdbmaster cluster should be the same as in the Docker images.
6.4.1. General remarks¶
- The OmpSs@FPGA toolchain is installed in a version folder under the /opt/bsc/ directory.
- Third-party libraries required to run some programs are installed in the corresponding folder under the /opt/lib/ directory.
- The rest of the software (Xilinx toolchain, slurm, modules, etc.) is installed under the /tools/ directory.
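For instance, the installed toolchain releases can be listed directly on the login node (the exact version folders depend on what is currently installed):
ls /opt/bsc/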
6.4.2. Node specifications¶
- CPU: Intel Xeon E3-1220
- Main memory: 32GB DDR3-1600
- FPGAs:
- Xilinx Kria KV260
- Xilinx Zynq Ultrascale+ ZCU102
- Xilinx Zynq 7000 ZC702
6.4.3. System overview¶
The current setup consists of an x86 login node and several SoC boards directly connected to it. Serial lines and JTAG are connected to the login node, allowing node management as well as debugging and programming.
6.4.4. Logging into the system¶
The crdbmaster login node is accessible via ssh at crbmaster.bsc.es
6.4.5. Module structure¶
The ompss-2 modules are:
ompss-2/arm64/[release version]
Loading one of these modules will automatically load the default Vivado version, although an arbitrary Vivado version can be loaded before the ompss-2 module:
module load vivado/2023.2 ompss-2/arm64/git
To list all available modules in the system run:
module avail
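To check which modules are currently loaded in the session (for example, after loading Vivado and ompss-2), run:
module list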
6.4.6. Build applications¶
To generate an application binary and bitstream, refer to Compile OmpSs-2@FPGA programs, as the steps there are general enough.
Note that the appropriate modules need to be loaded. See Module structure.
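As a minimal sketch of a session on the login node, assuming a Makefile-based example application (the matmul directory and its make target are placeholders; the exact build commands are those described in Compile OmsSs-2@FPGA programs):
module load ompss-2/arm64/git
cd matmul        # hypothetical example directory
make             # builds the binary and bitstream as defined by the example Makefile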
6.4.7. Running applications¶
Get access to an installed FPGA¶
The crdbmaster cluster uses SLURM to manage access to compute resources. Therefore, to use the resources of an FPGA, an allocation in one of the partitions has to be made.
There are 2 partitions in the cluster:
- arm64: KV260 and ZCU102 boards
- arm32: ZC702 board
In order to make an allocation, you must run srun:
srun -p [partition]
For instance:
srun -p arm32 --pty bash
Or allocate a specific board with:
srun -p arm64 --nodelist=zcu102 --pty bash
These commands will allocate an FPGA and run an interactive bash shell with the required tools and file permissions already set by SLURM. To get information about the active SLURM jobs, run:
squeue
The output should look similar to this:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1312 arm32 bash afilguer R 17:14 1 zynq702
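When the interactive session is no longer needed, exiting the shell releases the allocation; alternatively, the job can be cancelled using its JOBID (1312 in the sample output above):
scancel 1312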
Loading bitstreams¶
The FPGA bitstream needs to be loaded before the application can run.
Xilinx provides the fpgautil utility to simplify bitstream loading.
fpgautil -b bitstream.bin
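For example, assuming the bitstream and binary produced for the matmul sample are available in the current directory (both file names are placeholders), loading and running from within the allocation could look like:
fpgautil -b matmul.bin
./matmul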
Get current bitstream info¶
In order to get information about the bitstream currently loaded into the FPGA, the read_bitinfo tool is installed in the system.
read_bitinfo
Note that an active SLURM allocation is needed in order to query the FPGA.
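For example, the query can be issued from inside an interactive allocation obtained as described above:
srun -p arm32 --pty bash
read_bitinfo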
This call should return something similar to the sample output for a matrix multiplication application:
Bitinfo version: 13
Bitstream user-id: 0x9D8E280
AIT version: 7.7.2
Wrapper version 13
Number of acc: 3
Board base frequency (MHz) 125.000000
Interleaving not enabled
Features:
[ ] Instrumentation
[ ] Hardware counter
[x] Performance interconnect
[ ] Simplified interconnection
[ ] POM AXI-Lite
[x] POM task creation
[x] POM dependencies
[ ] POM lock
[x] POM spawn queues
[ ] Power monitor (CMS)
[ ] Thermal monitor (sysmon)
[ ] OMPIF
Managed rstn addr 0x8000A000
Cmd In addr 0x80006000 len 256
Cmd Out addr 0x80008000 len 256
Spawn In addr 0x80002000 len 1024
Spawn Out addr 0x80004000 len 1024
Hardware counter not enabled
POM AXI-Lite not enabled
Power monitor (CMS) not enabled
Thermal monitor (sysmon) not enabled
xtasks accelerator config:
type count freq(KHz) description
5839957875 1 100000 matmulFPGA
7602000973 2 100000 matmulBlock
ait command line:
ait --name=matmul --board=zynq702 -c=100 --interconnect_opt=performance --interconnect_regslice=all --wrapper_version 13
Hardware runtime VLNV:
bsc:ompss:picos_ompss_manager:7.3