Implement --smp-threads-per-core option
Currently, Nanos++ can only bind each thread to a single CPU (hwthread).
On SMT processors, we want to limit the number of threads per core (which can already be done with --binding-stride) while still making full use of the core by binding each thread to a subset of the hwthreads inside that core.
The new option --smp-threads-per-core reduces the number of threads created per core and binds each of those threads to multiple hwthreads.
Example (2 cores SMT-4):
$ lstopo
...
L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
  PU L#0 (P#0)
  PU L#1 (P#1)
  PU L#2 (P#2)
  PU L#3 (P#3)
L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
  PU L#4 (P#4)
  PU L#5 (P#5)
  PU L#6 (P#6)
  PU L#7 (P#7)
...
$ NX_ARGS="--smp-threads-per-core=1 --verbose" ./app
...
[0]Binding thread 0 to cpus [0-3]
[1]Binding thread 1 to cpus [4-7]
...
$ NX_ARGS="--smp-threads-per-core=2 --verbose" ./app
...
[0]Binding thread 0 to cpus [0,1]
[1]Binding thread 1 to cpus [2,3]
[2]Binding thread 2 to cpus [4,5]
[3]Binding thread 3 to cpus [6,7]
...
$ NX_ARGS="--verbose" ./app
...
[0]Binding thread 0 to cpus [0]
[1]Binding thread 1 to cpus [1]
[2]Binding thread 2 to cpus [2]
[3]Binding thread 3 to cpus [3]
[4]Binding thread 4 to cpus [4]
[5]Binding thread 5 to cpus [5]
[6]Binding thread 6 to cpus [6]
[7]Binding thread 7 to cpus [7]
...
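
The bindings in the example above are consistent with splitting each core's hwthreads into equal, contiguous groups, one group per thread. The following Python sketch illustrates that partitioning only; it is not the Nanos++ implementation. The function name bind_threads and the rejection of values that do not evenly divide the SMT width are assumptions made for this illustration, since the behaviour for such values is not described here.

```python
def bind_threads(cores, threads_per_core):
    """Return one list of hwthread ids per created thread.

    cores: list of lists, each inner list holding the hwthread ids of one core.
    threads_per_core: number of threads to create on each core.
    Hypothetical helper for illustration; not part of Nanos++.
    """
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    bindings = []
    for hwthreads in cores:
        # Assumption: only values that evenly divide the SMT width are accepted.
        if len(hwthreads) % threads_per_core != 0:
            raise ValueError("threads_per_core must divide the hwthreads per core")
        width = len(hwthreads) // threads_per_core
        # Each thread gets a contiguous slice of the core's hwthreads.
        for i in range(threads_per_core):
            bindings.append(hwthreads[i * width:(i + 1) * width])
    return bindings


# Topology from the example: 2 cores, SMT-4.
cores = [[0, 1, 2, 3], [4, 5, 6, 7]]
for tid, cpus in enumerate(bind_threads(cores, 2)):
    print(f"Binding thread {tid} to cpus {cpus}")
```

In this model, running without the option on the SMT-4 machine corresponds to threads_per_core equal to 4, which gives the one-thread-per-hwthread binding shown in the last run.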