Fig. 5. Slurm Job Flow for Composable Nodes
3.7 Slurm Job Results
To test the queues, three simple scripts were created:
- request 0 GPUs from the normal queue
- request 2 GPUs from the 2gpu queue
- request 4 GPUs from the 4gpu queue
Each script counted the number of GPUs available (using lspci) and waited 30 seconds before completing. The pertinent part of the 4gpu script is shown below.
    #SBATCH --partition=4gpu
    SLEEPTIME=30
    ME=$(hostname)
    GPUS=$(lspci|grep -i nvidia|wc -l)
    echo "My name is $ME and I have $GPUS GPUs"
    echo Sleeping for $SLEEPTIME
    sleep $SLEEPTIME
    echo done
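One caveat with this style of count: grepping lspci output for "nvidia" matches every PCI function whose description mentions NVIDIA, which on some cards may include an audio function alongside the GPU itself. The following is a minimal sketch of a stricter count, assuming the GPUs are listed under the usual lspci class names "VGA compatible controller" or "3D controller". The function name count_nvidia_gpus is hypothetical and is not part of the scripts used in this test.

```shell
#!/bin/bash
# Hypothetical helper (not from the original scripts): count NVIDIA GPU
# functions in lspci-style text read from stdin.
# Assumption: GPUs are reported with the class "VGA compatible controller"
# or "3D controller"; other NVIDIA functions (e.g. audio) are ignored.
count_nvidia_gpus() {
    # First keep only NVIDIA lines, then count those with a GPU class name.
    grep -i 'nvidia' | grep -Eic '(VGA compatible controller|3D controller)'
}

# Example use inside a job script:
#   GPUS=$(lspci | count_nvidia_gpus)
```

Reading from stdin rather than calling lspci inside the function keeps the helper easy to check against saved lspci output from a node.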
The correct number of GPUs was reported for each script, as indicated in Table 1 above. While the test was running, an sinfo command was run to show the state of the queues (output compressed and abbreviated):
    PARTITION AVAIL TIMELIMIT NODES STATE  NODELIST
    normal*   up    inf       2     drain~ kraken-a,leviathan-a
    2gpu      up    inf       2     drain~ kraken-a-2gpu,leviathan-a-2gpu
    4gpu      up    inf       1     alloc# kraken-a-4gpu
    4gpu      up    inf       1     drain~ leviathan-a-4gpu
Note that all the other nodes are in the drain state (not available), and the "~" suffix indicates that they are powered down. The "#" next to "alloc" indicates that the node is allocated and is powering up.
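The one-character suffixes on the STATE column are easy to overlook, so a small filter can spell them out. The sketch below assumes the column order of the sinfo output shown above (STATE in the fifth field, NODELIST in the sixth) and handles only three suffixes: "~" (powered down), "#" (powering up or being configured) and "*" (not responding). The function name annotate_sinfo_states is hypothetical, and Slurm defines further suffixes that this sketch does not cover.

```shell
#!/bin/bash
# Hypothetical helper: annotate each row of sinfo-style output (read from
# stdin) with the meaning of the suffix on the STATE column.
# Assumption: default-like column order, STATE = field 5, NODELIST = field 6.
annotate_sinfo_states() {
    awk 'NR > 1 {                       # skip the header row
        state  = $5
        suffix = substr(state, length(state), 1)
        if (suffix == "~")      note = "powered down"
        else if (suffix == "#") note = "powering up or being configured"
        else if (suffix == "*") note = "not responding"
        else                    note = "no suffix handled by this sketch"
        print $6 ": " state " (" note ")"
    }'
}

# Example use on a live system:
#   sinfo | annotate_sinfo_states
```

For the 4gpu rows above, this would report kraken-a-4gpu as powering up and leviathan-a-4gpu as powered down.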