6.3. Quar cluster installation

La Quar is a small town and municipality located in the comarca of Berguedà, in Catalonia.

It’s also an Intel machine containing a Xilinx Alveo U200 accelerator card.

The OmpSs@FPGA releases are automatically installed in the Quar cluster. They are available through a module file for each target architecture. This document describes how to load and use the modules to compile an example application. Once the modules are loaded, the workflow in the Quar cluster should be the same as in the Docker images.

6.3.1. General remarks

  • All software is installed in a version folder under the /opt/bsc directory.

  • During updates, the installation is not available to users.

  • The installation usually takes about 20 minutes.

  • After the installation, an informative email will be sent.

6.3.2. Logging into quar

Quar is accessible from HCA (ssh.hca.bsc.es). Alternatively, it can be accessed through port 4819 on HCA, and the ssh connection will be redirected to the actual host:

ssh -p 4819 ssh.hca.bsc.es

This can also be automated by adding a quar host entry to the ssh config:

Host quar
    HostName ssh.hca.bsc.es
    Port 4819
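
With this entry in place, logging in is simply:

ssh quar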

6.3.3. Module structure

The ompss modules are:

  • ompss/x86_fpga/[release version]

It requires having a Vivado module loaded:

module load vivado ompss/x86_fpga/git

To list all available modules in the system, run:

module avail
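
A specific release can be loaded instead of the git version. The version string below is only illustrative; check the output of module avail for the releases actually installed:

module load vivado ompss/x86_fpga/3.8.0   # illustrative version string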

6.3.4. Build applications

To generate an application binary and bitstream, refer to Compile OmpSs@FPGA programs, as the steps are general enough.

Note that the appropriate modules need to be loaded. See Module structure.
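
As a minimal sketch of what such an application looks like, the example below declares an OmpSs@FPGA task; the function name, array size and computation are illustrative, and the exact compiler invocation is described in Compile OmpSs@FPGA programs:

/* Minimal OmpSs@FPGA task sketch; names and sizes are illustrative */
#include <stdio.h>

#define N 1024

/* Offload the task to the FPGA; copy_deps copies its inputs and outputs */
#pragma omp target device(fpga) copy_deps
#pragma omp task in([N]in) out([N]out)
void scale_task(const int *in, int *out) {
    for (int i = 0; i < N; i++)
        out[i] = 2 * in[i];
}

int main() {
    static int a[N], b[N];
    for (int i = 0; i < N; i++)
        a[i] = i;

    scale_task(a, b);      /* spawns the FPGA task */
    #pragma omp taskwait   /* wait for its completion */

    printf("b[10] = %d\n", b[10]);
    return 0;
}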

6.3.5. Running applications

6.3.5.1. Get access to an installed fpga

The Quar cluster uses SLURM to manage access to computation resources. Therefore, to be able to use the resources of an FPGA, an allocation in one of the partitions has to be made.

There is one partition in the cluster:

  • fpga: Alveo U200 board

In order to make an allocation, you must run salloc:

salloc -p [partition]

For instance:

salloc -p fpga

Then get the node that has been allocated for you:

squeue

The output should look similar to this:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 1312 fpga          bash afilguer  R      17:14      1 quar
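
The FPGA-related commands described in the following subsections have to be run inside this allocation. If the allocation does not already place the shell on the compute node, an interactive shell on it can usually be obtained with srun (whether this step is needed depends on the cluster’s SLURM configuration):

srun --pty bash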

6.3.5.2. Loading bitstreams

The FPGA bitstream needs to be loaded before the application can run. The load_bitstream utility is provided to simplify the FPGA configuration.

load_bitstream bitstream.bit

6.3.5.3. Set up qdma queues

Note

This step is already performed by the load_bitstream script, which creates a single bidirectional memory-mapped queue. Manual setup is only needed if a different configuration is required.

For DMA transfers to be performed between the system main memory and the FPGA memory, qdma queues have to be set up by the user prior to any execution.

The dmactl tool is used for this. For instance, to create and start a memory-mapped qdma queue with index 1, run:

dmactl qdma02000 q add idx 1 mode mm dir bi
dmactl qdma02000 q start idx 1 mode mm dir bi

The OmpSs runtime system expects an mm queue at index 1, which can be created with the commands listed above.

In the same fashion, these queues can also be removed:

dmactl qdma02000 q stop idx 1 mode mm dir bi
dmactl qdma02000 q del idx 1 mode mm dir bi

For more information, see

dmactl --help

6.3.5.4. Get current bitstream info

In order to get information about the bitstream currently loaded into the FPGA, the read_bitinfo tool is installed in the system.

read_bitinfo

Note that an active SLURM allocation is needed in order to query the FPGA.

This call should return something similar to the sample output for a matrix multiplication application:

Bitstream info version: 6
Number of acc:  5
Base freq:      300 MHz
AIT version:    3.8
Wrapper version 10
Features:
0x1c4
[ ] Instrumentation
[ ] DMA engine
[x] Performance interconnect
[x] Hardware Runtime
[x] Extended HW runtime
[x] SOM
[ ] Picos
Interconnect level: basic
xtasks accelerator config:
type    #ins    name    freq
0000000006708694863     001     matmulFPGA                      300
0000000004353056269     004     matmulBlock                     300

ait command line:
ait.pyc --disable_utilization_check --name=matmul --board=alveo_u200 -c=300 --hwruntime=som --interconnection_opt=performance --wrapper_version=10

Hardware runtime VLNV:
bsc:ompss:smartompssmanager:3.2

6.3.5.5. Debugging with HW server

Although it is possible to interact with Vivado’s Hardware Manager through ssh-based X forwarding, Vivado’s GUI might not be very responsive over remote connections. To avoid this limitation, one might connect a local Hardware Manager instance to targets hosted on Quar, completely avoiding X forwarding, as follows.

  1. On Quar, launch Vivado’s HW server by running exec hw_server -d on Vivado’s TCL console.

  2. On the local machine, assuming that Quar’s HW server runs on port 3121, let all connections to port 3121 be forwarded to quar by doing ssh -L 3121:quar:3121 [USER]@ssh.hca.bsc.es -p 4819.

  3. Finally, from the local machine, connect to Quar’s hardware server:

    • Open Vivado’s Hardware Manager.

    • Launch the “Open target” wizard.

    • Establish a connection to the local HW server, which will be just a bridge to the remote instance.
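
As an alternative to the wizard, the same connection can be made from the local Hardware Manager’s TCL console; a minimal sketch, assuming the default port 3121 and the ssh tunnel from step 2:

open_hw_manager
connect_hw_server -url localhost:3121
open_hw_target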