Mount the hut nix store for SLURM jobs

Rodrigo Arias Mallo requested to merge slurm-shared-nix-store into master

Until we transition to a global nix store #42 or fix the overlay problems #41, this is an intermediate solution that allows us to run parallel jobs without copying derivations to the compute nodes.

The trick relies on the private mount namespace that systemd creates for the slurm daemon, in which /nix/store is replaced by a read-only mount of the hut store exported via NFS.
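As a rough illustration of the mechanism, here is a minimal NixOS sketch. It is not necessarily the configuration in this merge request. It assumes hut exports its /nix/store over NFS, that the slurm daemon runs as the slurmd systemd unit, and that /mnt/hut-nix-store is a free mount point; that path and the hut:/nix/store device string are hypothetical names chosen for the example.

```nix
{ ... }:
{
  # Assumption: hut exports /nix/store via NFS. The mount point name
  # /mnt/hut-nix-store is hypothetical.
  fileSystems."/mnt/hut-nix-store" = {
    device = "hut:/nix/store";
    fsType = "nfs";
    options = [ "ro" ];
  };

  systemd.services.slurmd.serviceConfig = {
    # Run slurmd in its own mount namespace, so the bind mount below is
    # visible only to slurmd and the processes it spawns.
    PrivateMounts = true;
    # Inside that namespace, show the hut store read-only at /nix/store.
    BindReadOnlyPaths = [ "/mnt/hut-nix-store:/nix/store" ];
  };
}
```

Because the bind mount lives only in the namespace of the slurmd unit, the rest of the compute node, including the nix daemon, keeps seeing the local store.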

There are some drawbacks:

  • The local binaries in /run/current-system/sw/bin are not available, as the overlay FS doesn't work. But at least it allows us to run some jobs in the meantime.

  • The nix build/shell/develop commands behave as if executed outside the slurm mount namespace, as they contact the daemon for build operations, and the daemon only sees the local store. Nothing they build will appear inside the slurm namespace. The environment must be entered from the hut node first, and then the srun command must be launched from there, with all dependencies present in the hut store.

It seems to be immune to the overlay FS "caching" problem, where an ls of a path that was missing and later becomes readable still fails:

hut% nix eval nixpkgs#cowsay.outPath
"/nix/store/k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0"
hut% ls -d /nix/store/k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0
ls: cannot access '/nix/store/k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0': No such file or directory
hut% srun ls -d /nix/store/k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0
/run/current-system/sw/bin/ls: cannot access '/nix/store/k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0': No such file or directory
srun: error: owl1: task 0: Exited with exit code 2
hut% nix shell nixpkgs#cowsay
hut% cowsay hi from hut
 _____________
< hi from hut >
 -------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
hut% srun cowsay hi from owl
 _____________
< hi from owl >
 -------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
hut% srun ls -d /nix/store/k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0
/nix/store/k3xhwh54bf9z6xxsdz32lhq4h7c5fimj-cowsay-3.7.0
