.. index::
    single: meep

Meep cluster installation
=========================

.. _installation_meep:

The Meep cluster, also known as *makinote*, is an FPGA cluster composed of 12 nodes with 8 FPGAs each, for a total of 96 FPGAs.
It also contains 4 nodes without FPGAs that are used for compilation and synthesis.

The OmpSs-2\@FPGA releases are automatically installed in the Meep cluster.
They are available through a module file for each target architecture.

General remarks
---------------

* OmpSs\@FPGA tools are installed in the ``/home/genu/pmtest/opt/bsc/`` directory.
* OmpSs modules need to be enabled manually.
* This cluster uses BSC HPC accounts. User names look like bsc0xxxxx.
* During updates, the installation is not available to users.
* The installation usually takes about 30 minutes.
* After the installation, an informative email is sent.

Node specifications
-------------------

Full node specifications are available at the support knowledge center:
https://www.bsc.es/supportkc/docs/MEEP/overview

* CPU: Intel Xeon Gold 6330 with 28 cores @ 2.0GHz

  * https://ark.intel.com/content/www/us/en/ark/products/212458/intel-xeon-gold-6330-processor-42m-cache-2-00-ghz.html

* Main memory: 256 GB (16 RDIMM x 16GB DDR4 @ 3200 MHz)
* 8 Xilinx Alveo U55C FPGAs

There are 12 FPGA nodes, 4 synthesis nodes and a login node.
Synthesis and login nodes do not have FPGAs.

.. _logging_in_meep:

Logging into Meep
-----------------

The login node, ``fpgalogin1.bsc.es``, is accessible from the BSC internal network.
To access it from an external network, the VPN must be used.

.. code-block:: text

    ssh bscxxxxx@fpgalogin1.bsc.es

.. _meep-modules:

Module structure
----------------

The default environment does not provide the modules needed to build OmpSs\@FPGA applications.
A suitable environment can be set up with:

.. code-block:: text

    source ~pmtest/tools/ompss_fpga_init.sh

This enables the OmpSs modules, as well as reasonably recent versions of python, cmake and clang.

.. note::
    The loaded python 3.11 is needed by ait, but it breaks gdb and possibly other system applications.

The OmpSs-2 modules are:

* ``ompss-2/x86_64/*[release version]*``

Loading an OmpSs-2 module automatically loads the default Vivado version, although a specific version can be loaded before OmpSs:

.. code-block:: text

    module load vivado/2023.2 ompss-2/x86_64/git

To list all available modules in the system, run:

.. code-block:: text

    module avail

Build applications
------------------

To generate an application binary and bitstream, refer to :ref:`compile-ompss2atfpga-programs`, as the steps are general enough.
Note that the appropriate modules need to be loaded. See :ref:`meep-modules`.

To allocate a job in the synthesis nodes, the ``gpp`` partition needs to be used.
To enable remote x11 graphics, the ``--x11`` flag needs to be specified.

For instance, to start an interactive session with graphics:

.. code-block:: text

    salloc -c 10 --mem=64G -t 4:00:00 -p gpp --x11

The job will use 10 cores and 64GB of memory.

For a batch job with no graphics:

.. code-block:: text

    sbatch -c 10 --mem=64G -t 4:00:00 -p gpp build_script.sh

Running applications
--------------------

This section describes how to allocate resources and set up the environment to run an OmpSs\@FPGA application.
To execute the application itself, refer to :ref:`running-ompss2atfpga-programs`.

Get access to an installed FPGA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To run OmpSs\@FPGA applications, a job needs to be allocated in the FPGA nodes.
These nodes form the *main* partition, so no partition needs to be specified.

For instance, an interactive job looks like this:

.. code-block:: text

    salloc -c 56 --mem=128G -t 4:00:00 --constraint=dmadma

This will allocate a full node in *qdma* mode, which is needed for running OmpSs\@FPGA applications.
The full node is allocated, so all 8 FPGAs are available to the user.

Information about the FPGAs is stored in the ``/etc/motd`` file.
This file specifies the board serial number, USB port, PCIe slot, and the network ports used by each FPGA.
For instance:

.. code-block:: text

    +---------------------------------------------------------------------------------------------------------------------------------------+
    |    +------+         +------+         +------+         +------+         +------+         +------+         +------+         +------+    |
    |    |swp26 |         |swp25 |         |swp24 |         |swp23 |         |swp22 |         |swp21 |         |swp20 |         |swp19 |    |
    |    +------+         +------+         +------+         +------+         +------+         +------+         +------+         +------+    |
    +-------^----------------^----------------^----------------^----------------^----------------^----------------^----------------^--------+
            |                |                |                |                |                |                |                |
    +-------|--------+-------|--------+-------|--------+-------|--------+-------|--------+-------|--------+-------|--------+-------|--------+
    |       v        |       v        |       v        |       v        |       v        |       v        |       v        |       v        |
    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |
    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |
    |    |QSFP0 |    |    |QSFP0 |    |    |QSFP0 |    |    |QSFP0 |    |    |QSFP0 |    |    |QSFP0 |    |    |QSFP0 |    |    |QSFP0 |    |
    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |
    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |
    |                |                |                |                |                |                |                |                |
    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |
    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |
    |    |QSFP1 |  <----> |QSFP1 |    |    |QSFP1 |  <----> |QSFP1 |    |    |QSFP1 |  <----> |QSFP1 |    |    |QSFP1 |  <----> |QSFP1 |    |
    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |    |      |    |
    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |    +------+    |
    |  onic180s0f0   |  onic179s0f0   |  onic204s0f0   |  onic205s0f0   |   onic26s0f0   |   onic25s0f0   |   onic51s0f0   |   onic52s0f0   |
    |                |                |                |                |                |                |                |                |
    |      +-+       |      +-+       |      +-+       |      +-+       |      +-+       |      +-+       |      +-+       |      +-+       |
    |      | |       |      | |       |      | |       |      | |       |      | |       |      | |       |      | |       |      | |       |
    |      +-+       |      +-+       |      +-+       |      +-+       |      +-+       |      +-+       |      +-+       |      +-+       |
    |   USB-UART-    |   USB-UART-    |   USB-UART-    |   USB-UART-    |   USB-UART-    |   USB-UART-    |   USB-UART-    |   USB-UART-    |
    |  XFL1ND323BSU  |  XFL1Y1BX0JYT  |  XFL1E3102VRH  |  XFL12GU0UBJA  |  XFL1UZW5U0MR  |  XFL1G5IYME1R  |  XFL12IUWGVDB  |  XFL1D2QP00YZ  |
    +----------------+----------------+----------------+----------------+----------------+----------------+----------------+----------------+

    +------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
    | FPGA Card  | Chassis | FPGA Serial  | PCIe Bus | USBPort | ttyUSBx               | QSFP0  | QSFP1      | QDMA onic   | onic IP     |
    +------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
    | fpgan08f01 | 3       | XFL1D2QP00YZ | 34:00.0  | 1       | USB-UART-XFL1D2QP00YZ | Switch | fpgan08f02 | onic52s0f0  | 10.0.1.1/24 |
    +------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
    | fpgan08f02 | 4       | XFL12IUWGVDB | 33:00.0  | 2       | USB-UART-XFL12IUWGVDB | Switch | fpgan08f01 | onic51s0f0  | 10.0.2.1/24 |
    +------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
    | fpgan08f03 | 5       | XFL1G5IYME1R | 19:00.0  | 3       | USB-UART-XFL1G5IYME1R | Switch | fpgan08f04 | onic25s0f0  | 10.0.3.1/24 |
    +------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
    | fpgan08f04 | 6       | XFL1UZW5U0MR | 1a:00.0  | 4       | USB-UART-XFL1UZW5U0MR | Switch | fpgan08f03 | onic26s0f0  | 10.0.4.1/24 |
    +------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
    | fpgan08f05 | 7       | XFL12GU0UBJA | cd:00.0  | 5       | USB-UART-XFL12GU0UBJA | Switch | fpgan08f06 | onic205s0f0 | 10.0.5.1/24 |
    +------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
    | fpgan08f06 | 8       | XFL1E3102VRH | cc:00.0  | 6       | USB-UART-XFL1E3102VRH | Switch | fpgan08f05 | onic204s0f0 | 10.0.6.1/24 |
    +------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
    | fpgan08f07 | 9       | XFL1Y1BX0JYT | b3:00.0  | 7       | USB-UART-XFL1Y1BX0JYT | Switch | fpgan08f08 | onic179s0f0 | 10.0.7.1/24 |
    +------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+
    | fpgan08f08 | 10      | XFL1ND323BSU | b4:00.0  | 8       | USB-UART-XFL1ND323BSU | Switch | fpgan08f07 | onic180s0f0 | 10.0.8.1/24 |
    +------------+---------+--------------+----------+---------+-----------------------+--------+------------+-------------+-------------+

Loading bitstreams
^^^^^^^^^^^^^^^^^^

The FPGA bitstream needs to be loaded before the application can run.
The ``load_bitstream`` utility is provided to simplify the FPGA configuration.

.. code-block:: text

    load_bitstream bitstream.bit [index] ...

The second parameter indicates which of the FPGAs to program.
More than one index can be specified, in which case all the specified FPGAs are programmed with the given bitstream.

To find out which FPGA indices have been allocated, run ``load_bitstream`` with the help (``-h``) option.
The output should be similar to this:

.. code-block:: text

    Usage load_bitstream bitstream.bit [index]
    Available devices:
    index:  jtag serial     pcie
    0:      XFL1D2QP00YZ    34:00.0
    1:      XFL12IUWGVDB    33:00.0
    2:      XFL1G5IYME1R    19:00.0
    3:      XFL1UZW5U0MR    1a:00.0
    4:      XFL12GU0UBJA    cd:00.0
    5:      XFL1E3102VRH    cc:00.0
    6:      XFL1Y1BX0JYT    b3:00.0
    7:      XFL1ND323BSU    b4:00.0

Set up qdma queues
^^^^^^^^^^^^^^^^^^

.. note::
    This step is performed by the ``load_bitstream`` script, which creates a single bidirectional memory mapped queue.
    Setting up queues manually is only needed if a different configuration is required.

For DMA transfers between system main memory and FPGA memory, qdma queues have to be set up by the user *prior to any execution*.
The ``dma-ctl`` tool is used for this.
For instance, to create and start a memory mapped qdma queue with index 1, run:

.. code-block:: text

    dma-ctl qdmab3000 q add idx 1 mode mm dir bi
    dma-ctl qdmab3000 q start idx 1 mode mm dir bi

The OmpSs runtime system expects an mm queue at index 1, which can be created with the commands listed above.

In the same fashion, these queues can also be removed:

.. code-block:: text

    dma-ctl qdmab3000 q stop idx 1 mode mm dir bi
    dma-ctl qdmab3000 q del idx 1 mode mm dir bi

For more information, see:

.. code-block:: text

    dma-ctl --help

Get current bitstream info
^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``read_bitinfo`` tool reports information about the bitstream currently loaded into the FPGA.
It is available when an ``ompss-2`` environment module is loaded.

.. code-block:: text

    read_bitinfo

Note that an active slurm reservation is needed in order to query the FPGA.

The call should return something similar to this sample output for an OMPIF test application:

.. code-block:: text

    Bitinfo of FPGA 0000:cc:00.0:
    Bitinfo version: 13
    Bitstream user-id: 0x479B8510
    AIT version: 7.7.2
    Wrapper version 13
    Number of acc: 5
    Board base frequency (MHz) 100.000000
    Interleaving not enabled
    Features:
    [ ] Instrumentation
    [ ] Hardware counter
    [x] Performance interconnect
    [ ] Simplified interconnection
    [x] POM AXI-Lite
    [x] POM task creation
    [ ] POM dependencies
    [ ] POM lock
    [x] POM spawn queues
    [ ] Power monitor (CMS)
    [ ] Thermal monitor (sysmon)
    [x] OMPIF
    Managed rstn addr 0x10000
    Cmd In addr 0xC000 len 128
    Cmd Out addr 0xE000 len 128
    Spawn In addr 0x8000 len 1024
    Spawn Out addr 0xA000 len 1024
    Hardware counter not enabled
    POM AXI-Lite addr 0x4000
    Power monitor (CMS) not enabled
    Thermal monitor (sysmon) not enabled
    xtasks accelerator config:
    type         count  freq(KHz)  description
    8381065717   1      100000     send_receive_test
    8454279320   1      100000     allgather_test_task
    7899490654   1      100000     broadcast_test_task
    4294967299   1      100000     ompif_message_sender
    4294967300   1      100000     ompif_message_receiver
    ait command line:
    ait --name=ompif_test --board=alveo_u55c -c=100 --enable_pom_axilite --interconnect_opt=performance --wrapper_version 13
    Hardware runtime VLNV:
    bsc:ompss:picos_ompss_manager:7.3

Running cluster applications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See :ref:`run-OMPIF-applications`.
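
The device name ``qdmab3000`` used in the queue commands of the qdma queue section above appears to match the FPGA at PCIe address ``b3:00.0`` (index 6 in the ``load_bitstream -h`` listing).
The following sketch assumes that qdma device names are built as ``qdma<bus><device><function>`` from the PCIe address.
The helper names ``qdma_dev_name`` and ``print_queue_setup`` are hypothetical and not part of the installation.
The script only prints the ``dma-ctl`` commands for a given FPGA, so they can be reviewed before being run inside an allocation.

.. code-block:: bash

    #!/usr/bin/env bash

    # Hypothetical helper: derive a qdma device name from a PCIe address.
    # Assumption: names follow qdma<bus><device><function>,
    # for example b3:00.0 -> qdmab3000.
    qdma_dev_name() {
        local addr="$1"
        # Drop an optional PCI domain prefix such as "0000:".
        case "$addr" in
            *:*:*) addr="${addr#*:}" ;;
        esac
        local bus="${addr%%:*}"     # e.g. "b3"
        local devfn="${addr#*:}"    # e.g. "00.0"
        local dev="${devfn%%.*}"    # e.g. "00"
        local fn="${devfn#*.}"      # e.g. "0"
        printf 'qdma%s%s%s\n' "$bus" "$dev" "$fn"
    }

    # Print (do not execute) the commands that create and start the
    # bidirectional memory mapped queue at index 1 for one FPGA.
    print_queue_setup() {
        local dev
        dev="$(qdma_dev_name "$1")"
        echo "dma-ctl $dev q add idx 1 mode mm dir bi"
        echo "dma-ctl $dev q start idx 1 mode mm dir bi"
    }

    print_queue_setup b3:00.0

For ``b3:00.0`` the printed commands are the same two ``dma-ctl`` lines shown in the queue setup example.
If the device names on a node differ from this assumed pattern, the names reported by the system take precedence.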