.. index::
   single: xaloc

.. _xaloc-installation:

Xaloc cluster installation
==========================

The OmpSs-2\@FPGA releases are automatically installed in the Xaloc cluster.
They are available through a module file for each target architecture.
This document describes how to load and use the modules to compile an example
application. Once the modules are loaded, the workflow in the Xaloc cluster
should be the same as in the Docker images.

General remarks
---------------

* The OmpSs\@FPGA toolchain is installed in a version folder under the
  ``/opt/bsc/`` directory.
* Third-party libraries required to run some programs are installed in the
  corresponding folder under the ``/opt/lib/`` directory.
* The rest of the software (Xilinx toolchain, slurm, modules, etc.) is
  installed under the ``/tools/`` directory.

Node specifications
-------------------

* CPU: Dual Intel Xeon X5680

  * https://ark.intel.com/content/www/us/en/ark/products/47916/intel-xeon-processor-x5680-12m-cache-3-33-ghz-6-40-gts-intel-qpi.html

* Main memory: 72GB DDR3-1333
* FPGA: Xilinx Versal VCK5000

  * https://www.amd.com/en/products/adaptive-socs-and-fpgas/evaluation-boards/vck5000.html

.. _xaloc-login:

Logging into xaloc
------------------

Xaloc is accessible from HCA at ``ssh.hca.bsc.es``. Alternatively, it can be
reached through port ``8410`` of HCA; the ssh connection is then redirected to
the actual host:

.. code:: bash

   ssh -p 8410 ssh.hca.bsc.es

This can also be automated by adding a ``xaloc`` host entry to the ssh config:

.. code:: bash

   Host xaloc
      HostName ssh.hca.bsc.es
      Port 8410

.. _xaloc-modules:

Module structure
----------------

The ompss-2 modules are:

* ``ompss-2/x86_64/*[release version]*``

This will automatically load the default Vivado version, although an arbitrary
version can be loaded before ompss:

.. code-block:: text

   module load vivado/2023.2 ompss-2/x86_64/git

To list all available modules in the system run:

.. code-block:: text

   module avail

Build applications
------------------

To generate an application binary and bitstream, refer to
:ref:`compile-ompss2atfpga-programs`, as the steps are general enough.
Note that the appropriate modules need to be loaded. See :ref:`xaloc-modules`.

Running applications
--------------------

.. warning::
   Although the Versal board is installed and can be allocated via slurm,
   there is no toolchain support yet.

Get access to an installed FPGA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Xaloc cluster uses SLURM to manage access to computation resources.
Therefore, to be able to use the resources of an FPGA, an allocation in one of
the partitions has to be made. There is one partition in the cluster:

* ``fpga``: a Versal VCK5000 board

The easiest way to allocate an FPGA is to run bash through ``srun`` with the
``--gres`` option:

.. code-block:: text

   srun --gres=fpga:BOARD:N --pty bash

Where ``BOARD`` is the FPGA board to allocate, in this case ``versal``, and
``N`` is the number of FPGAs to allocate, in this case 1. For instance, the
command:

.. code-block:: text

   srun --gres=fpga:versal:1 --pty bash

will allocate the FPGA and run an interactive bash with the required tools and
file permissions already set by slurm.

To get information about the active slurm jobs, run:

.. code-block:: text

   squeue

The output should look similar to this:

.. code-block:: text

   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    1312      fpga     bash afilguer  R      17:14      1 xaloc
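For non-interactive use, the same resource request can be wrapped in a batch
script and submitted with ``sbatch``. The sketch below is a minimal example and
assumes that batch submission is enabled on the cluster; the job name and the
payload command are placeholders:

.. code:: bash

   #!/bin/bash
   #SBATCH --job-name=fpga-test
   #SBATCH --partition=fpga
   #SBATCH --gres=fpga:versal:1

   # Placeholder payload; see the following sections for loading a
   # bitstream and querying the FPGA inside the allocation
   hostname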
Loading bitstreams
^^^^^^^^^^^^^^^^^^

The FPGA bitstream needs to be loaded before the application can run.
The ``load_bitstream`` utility is provided in order to simplify the FPGA
configuration.

.. code:: bash

   load_bitstream bitstream.bit [index]

The utility accepts a second, optional parameter that indicates which of the
allocated FPGAs to program. By default, all the allocated FPGAs are programmed
with the bitstream.

To know which FPGA indices have been allocated, run ``load_bitstream`` with the
help (``-h``) option. The output should be similar to this:

.. code-block:: text

   Usage load_bitstream bitstream.bit [index]
   Available devices:
   index:  jtag           pcie          usb
   0:      XXXXXXXXXXXXX  0000:02:00.0  002:002

Set up qdma queues
^^^^^^^^^^^^^^^^^^

.. note::
   This step is performed by the ``load_bitstream`` script, which creates a
   single bidirectional memory-mapped queue. It is only needed if a different
   configuration is required.

For DMA transfers to be performed between system main memory and the FPGA
memory, qdma queues have to be set up by the user *prior to any execution*.
The ``dmactl`` tool is used for this purpose.

For instance, in order to create and start a memory-mapped qdma queue with
index 1, run:

.. code-block:: text

   dmactl qdma02000 q add idx 1 mode mm dir bi
   dmactl qdma02000 q start idx 1 mode mm dir bi

The OmpSs runtime system expects a memory-mapped queue at index 1, which can be
created with the commands listed above.

In the same fashion, these queues can also be removed:

.. code-block:: text

   dmactl qdma02000 q stop idx 1 mode mm dir bi
   dmactl qdma02000 q del idx 1 mode mm dir bi

For more information, see

.. code:: bash

   dmactl --help

Get current bitstream info
^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to get information about the bitstream currently loaded into the FPGA,
the tool ``read_bitinfo`` is installed in the system.

.. code:: bash

   read_bitinfo

Note that an active slurm reservation is needed in order to query the FPGA.

This call should return something similar to the sample output below for a
matrix multiplication application:

.. code-block:: text

   Reading bitinfo of FPGA 0000:b3:00.0
   Bitstream info version: 11
   Number of acc: 8
   AIT version: 7.1.0
   Wrapper version 13
   Board base frequency (Hz) 156250000
   Interleaving stride 32768
   Features:
   [ ] Instrumentation
   [ ] Hardware counter
   [x] Performance interconnect
   [ ] Simplified interconnection
   [ ] POM AXI-Lite
   [x] POM task creation
   [x] POM dependencies
   [ ] POM lock
   [x] POM spawn queues
   [ ] Power monitor (CMS) enabled
   [ ] Thermal monitor (sysmon) enabled
   Cmd In addr 0x2000 len 128
   Cmd Out addr 0x4000 len 128
   Spawn In addr 0x6000 len 1024
   Spawn Out addr 0x8000 len 1024
   Managed rstn addr 0xA000
   Hardware counter addr 0x0
   POM AXI-Lite addr 0x0
   Power monitor (CMS) addr 0x0
   Thermal monitor (sysmon) addr 0x0
   xtasks accelerator config:
   type        count  freq(KHz)  description
   5839957875      1     300000  matmulFPGA
   7602000973      7     300000  matmulBlock
   ait command line:
   ait --name=matmul --board=alveo_u200 -c=300 --memory_interleaving_stride=32K --simplify_interconnection --interconnect_opt=performance --interconnect_regslice=all --floorplanning_constr=all --slr_slices=all --placement_file=u200_placement_7x256.json --wrapper_version 13
   Hardware runtime VLNV: bsc:ompss:picosompssmanager:7.3
   bitinfo note: ''
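Putting the previous steps together, a typical interactive session might look
like the following sketch. The bitstream name is a placeholder, and the explicit
``dmactl`` calls are only needed when the default queue created by
``load_bitstream`` is not suitable:

.. code:: bash

   # From within an allocation obtained with:
   #   srun --gres=fpga:versal:1 --pty bash

   # Program all allocated FPGAs with the application bitstream (placeholder name)
   load_bitstream my_app.bit

   # (Optional) recreate the memory-mapped qdma queue at index 1 by hand
   dmactl qdma02000 q add idx 1 mode mm dir bi
   dmactl qdma02000 q start idx 1 mode mm dir bi

   # Check which bitstream is currently loaded
   read_bitinfo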
Remote debugging
^^^^^^^^^^^^^^^^

Although it is possible to interact with Vivado's Hardware Manager through
ssh-based X forwarding, Vivado's GUI might not be very responsive over remote
connections. To avoid this limitation, a local Hardware Manager instance can be
connected to the targets hosted on Xaloc, completely avoiding X forwarding, as
follows.

#. On Xaloc, when allocating an FPGA with slurm, a Vivado HW server is
   automatically launched for each FPGA:

   * FPGA 0 uses port 3120

#. On the local machine, forward connections to the HW server port of the
   allocated FPGA (3120 for FPGA 0) to Xaloc by doing
   ``ssh -L 3120:xaloc:3120 [USER]@ssh.hca.bsc.es -p 8410``.
#. Finally, from the local machine, connect to Xaloc's hardware server:

   * Open Vivado's Hardware Manager.
   * Launch the "Open target" wizard.
   * Establish a connection to the local HW server, which will be just a
     bridge to the remote instance.
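As a convenience, the port forwarding from step 2 can be added to the ``xaloc``
entry of the ssh config shown in :ref:`xaloc-login`, so the tunnel is
established automatically on every connection. This is a sketch that assumes
the HW server port of FPGA 0 (3120):

.. code:: bash

   Host xaloc
      HostName ssh.hca.bsc.es
      Port 8410
      # Forward the local port to the HW server of FPGA 0 on Xaloc
      LocalForward 3120 xaloc:3120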