3.2.2. Accelerator placement options

This section documents how to constrain accelerators to a particular super logic region (SLR) of a device.

There are three flags that control accelerator placement:

  • Constraints: --floorplanning_constr
  • Slices: --slr_slices
  • Configuration file: --placement_file

On an Alveo U200, which has three SLRs, external interfaces are placed as follows:

../_images/u200_layout.png

Interface layout for Alveo U200

By default, Vivado places all user accelerators wherever it sees fit. Sometimes it spreads an accelerator kernel across two SLRs, which usually hurts timing. Users can constrain each accelerator to a single SLR to prevent it from being scattered across several. For instance, a user can specify a placement such as the following:

../_images/u200_placement_diagram.png

Placed instance diagram

Additionally, users can insert register slices at the SLR crossings to further help timing, at the cost of additional FPGA resources. Constraints and register slices are controlled by separate settings. For example, enabling register slices for the previous design results in the following layout:

../_images/u200_placement_slices_diagram.png

Placed instance diagram with register slices

User flags

Constraints

Constraints affecting different sets of IPs can be enabled individually by setting the --floorplanning_constr=<constraint level> flag. It takes one of four values: [none], acc, static, all.

These are specified as follows:

[none]

Nothing is constrained to a particular region. This is the default behavior.

This is selected by omitting the --floorplanning_constr flag.

acc

Each accelerator kernel is constrained to a single SLR.

static

Static logic is constrained to a particular region: each static logic IP is constrained to its relevant SLR. For instance, the PCI IP is constrained to the SLR that contains its I/O pins, which is SLR1 on the U200.

all

Enables both acc and static.

Slices

Register slices can be automatically placed at SLR crossings to improve timing. The --slr_slices flag controls this. It takes one of four values: [none], acc, static, all.

[none]

No register slices are created for SLR crossings. This is the default behavior.

This is selected by omitting the --slr_slices flag.

acc

Register slices for SLR crossings are created for accelerator-related interfaces:

  • Accelerator - hw runtime
  • Accelerator - DDR interconnect

static

Register slices are created for static logic (DDR MIGs, PCI, communication infrastructure, etc.).

all

Enables both acc and static.
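The two level flags and the placement file can be combined on one command line. The sketch below shows one way a build script might assemble those arguments; the helper name placement_args and the file name placement.json are hypothetical, and the "=" form for --slr_slices and --placement_file is an assumption modeled on the --floorplanning_constr=<constraint level> syntax shown above. The default level is expressed by leaving the flag out, as described for [none].

```python
# Levels listed in this section; "none" stands for omitting the flag entirely.
LEVELS = ("none", "acc", "static", "all")


def placement_args(constr="none", slices="none", placement_file=None):
    """Build the placement-related command-line arguments (hypothetical helper)."""
    for name, level in (("constr", constr), ("slices", slices)):
        if level not in LEVELS:
            raise ValueError(f"{name}: expected one of {LEVELS}, got {level!r}")
    args = []
    if constr != "none":
        args.append(f"--floorplanning_constr={constr}")
    if slices != "none":
        # Assumption: --slr_slices accepts the same "=value" form.
        args.append(f"--slr_slices={slices}")
    if placement_file is not None:
        # Assumption: --placement_file accepts the same "=value" form.
        args.append(f"--placement_file={placement_file}")
    return args


print(placement_args(constr="acc", slices="all", placement_file="placement.json"))
```

The returned list can be appended to whatever command invokes the tool; the command itself is not assumed here.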

Configuration file

The configuration file is a JSON file that determines the placement of each accelerator instance. It is specified using the --placement_file option. It should contain a dictionary keyed by accelerator type. Each accelerator type must map to a list of SLR numbers, one per instance, indicating where that instance is placed. For instance:

{
    "calculate_forces_BLOCK" : [0, 0, 1, 2, 2],
    "solve_nbody_task": [1],
    "update_particles_BLOCK": [1]
}

This constrains two of the five calculate_forces_BLOCK accelerators to SLR0, one to SLR1, and the remaining two to SLR2. Also, solve_nbody_task and update_particles_BLOCK are placed in SLR1.
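A placement file can be sanity-checked before it is handed to the tool. The following is a minimal sketch, not part of the tool itself: the function check_placement is hypothetical, and the SLR count of 3 with numbering from 0 is taken from the U200 description above. It verifies that every accelerator type maps to a non-empty list of valid SLR numbers and reports how many instances land in each SLR.

```python
import json
from collections import Counter

NUM_SLRS = 3  # Alveo U200, per this section; adjust for other devices


def check_placement(config, num_slrs=NUM_SLRS):
    """Validate a placement dictionary and return the instance count per SLR."""
    if not isinstance(config, dict):
        raise ValueError("top level must be a dictionary of accelerator types")
    per_slr = Counter()
    for acc_type, slrs in config.items():
        if not isinstance(slrs, list) or not slrs:
            raise ValueError(f"{acc_type}: expected a non-empty list of SLR numbers")
        for instance, slr in enumerate(slrs):
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(slr, bool) or not isinstance(slr, int) or not 0 <= slr < num_slrs:
                raise ValueError(f"{acc_type}[{instance}]: invalid SLR {slr!r}")
            per_slr[slr] += 1
    return dict(sorted(per_slr.items()))


example = json.loads("""
{
    "calculate_forces_BLOCK": [0, 0, 1, 2, 2],
    "solve_nbody_task": [1],
    "update_particles_BLOCK": [1]
}
""")
print(check_placement(example))  # prints {0: 2, 1: 3, 2: 2}
```

The per-SLR totals give a quick view of how evenly the accelerators are spread, which can help when deciding whether one region is being overloaded.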