Fig. 5. Slurm Job Flow for Composable Nodes

3.7 Slurm Job Results

In order to test the queues, three simple scripts were created:
  1. slurm-test.sh - request 0 GPUs from the normal queue
  2. slurm-test-gpu2.sh - request 2 GPUs from the 2gpu queue
  3. slurm-test-gpu4.sh - request 4 GPUs from the 4gpu queue

Each script counts the number of GPUs available (using lspci) and waits 30 seconds before completing. The pertinent part of slurm-test-gpu4.sh is shown below.

#SBATCH --partition=4gpu
SLEEPTIME=30
ME=$(hostname)
# Count the NVIDIA devices visible on the PCI bus
GPUS=$(lspci | grep -i nvidia | wc -l)
echo "My name is $ME and I have $GPUS GPUs"
echo "Sleeping for $SLEEPTIME"
sleep $SLEEPTIME
echo "done"
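
The scripts were presumably submitted with sbatch, the standard way to run a batch script with #SBATCH directives; a typical sequence using the script names above would be:

$ sbatch slurm-test.sh
$ sbatch slurm-test-gpu2.sh
$ sbatch slurm-test-gpu4.sh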

The correct number of GPUs was reported for each script, as indicated in Table 1 above. While the jobs were running, an sinfo command was run to show the state of the queues (output compressed and abbreviated):

PARTITION AVAIL TIMELIMIT NODES STATE  NODELIST
normal*   up    inf       2     drain~ kraken-a,leviathan-a
2gpu      up    inf       2     drain~ kraken-a-2gpu,leviathan-a-2gpu
4gpu      up    inf       1     alloc# kraken-a-4gpu
4gpu      up    inf       1     drain~ leviathan-a-4gpu

Notice that all the other nodes are in the drain state (not available); the "~" suffix indicates a node is powered down (power saving), and the "#" next to "alloc" indicates the node is allocated and in the power-up state.
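
For reference, a compact listing like the one above can be produced with an sinfo output format string. The following is a sketch (the field widths are an arbitrary choice; %P, %a, %l, %D, %t, and %N are standard sinfo format characters for partition, availability, time limit, node count, state, and node list):

$ sinfo -o "%10P %6a %10l %6D %8t %N"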
