Fig. 5. Slurm Job Flow for Composable Nodes

3.7 Slurm Job Results

To test the queues, three simple scripts were created:
  1. request 0 GPUs from the normal queue
  2. request 2 GPUs from the 2gpu queue
  3. request 4 GPUs from the 4gpu queue

Each script counted the number of GPUs available (using lspci) and waited 30 seconds before completing. The pertinent part of the 4gpu script is shown below.

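#SBATCH --partition=4gpu
# ME and SLEEPTIME (30 seconds) are set elsewhere in the script (not shown)
GPUS=$(lspci | grep -i nvidia | wc -l)   # count PCI devices that mention "nvidia"
echo "My name is $ME and I have $GPUS GPUs"
echo "Sleeping for $SLEEPTIME"
sleep "$SLEEPTIME"
echo done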
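
The check performed by each script can be sketched as a pair of shell functions. This is a minimal illustration, not the scripts used in the test: the function names count_gpus and expected_gpus are hypothetical, and count_gpus reads lspci-style text from standard input so that it can be tried on a machine without GPUs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not from the original scripts.

# Count lines mentioning "nvidia" (case-insensitive) in lspci-style text on stdin.
count_gpus() {
    grep -ci nvidia || true   # grep exits 1 when the count is 0 but still prints 0
}

# Map a partition name to the GPU count requested by the corresponding test script.
expected_gpus() {
    case "$1" in
        normal) echo 0 ;;
        2gpu)   echo 2 ;;
        4gpu)   echo 4 ;;
        *)      echo "unknown partition: $1" >&2; return 1 ;;
    esac
}
```

On a compute node, the output of `lspci | count_gpus` could then be compared with `expected_gpus 4gpu` to confirm that the node was composed with the requested number of GPUs.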

The correct number of GPUs was reported for each script, as indicated in Table 1 above. While the test was running, the sinfo command was run to show the state of the queues (output compressed and abbreviated):

normal* up inf 2 drain~ kraken-a,leviathan-a
2gpu    up inf 2 drain~ kraken-a-2gpu,leviathan-a-2gpu
4gpu    up inf 1 alloc# kraken-a-4gpu
4gpu    up inf 1 drain~ leviathan-a-4gpu

Note that all the other nodes are in the drain state (not available) and are powered down, as indicated by the "~" suffix. The "#" next to "alloc" indicates that the node is allocated and is powering up.
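
The state suffixes in the sinfo output above can be decoded mechanically. The following is a minimal sketch, assuming the compressed six-column layout shown above (partition, availability, time limit, node count, state, node list); the function name describe_states is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: annotate compressed sinfo lines (read from stdin) with the
# power status implied by the state suffix. Assumes the six-column layout shown
# above: partition, availability, time limit, node count, state, node list.
describe_states() {
    while read -r partition _avail _limit _count state nodes; do
        [ -n "$state" ] || continue          # skip blank or short lines
        case "$state" in
            *'~') power="powered down" ;;
            *'#') power="powering up" ;;
            *)    power="no power-state suffix" ;;
        esac
        base=${state%[~#]}                   # strip the suffix character, if any
        partition=${partition%\*}            # strip the default-partition marker
        printf '%s %s %s (%s)\n' "$partition" "$nodes" "$base" "$power"
    done
}
```

Piping the abbreviated output above into describe_states yields one line per node group, for example "4gpu kraken-a-4gpu alloc (powering up)".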

