[Beowulf] When is compute-node load-average "high" in the HPC context? Setting correct thresholds on a warning script.

Reuti reuti at staff.uni-marburg.de
Wed Sep 1 04:47:29 EDT 2010


On 01.09.2010 at 09:34, Christopher Samuel wrote:

> On 01/09/10 01:58, Reuti wrote:
> 
>> With recent kernels also (kernel) processes in D state
>> count as running.
> 
> I wouldn't say recent, that goes back as far as I can
> remember.
> 
> For instance I've seen RHEL3 (2.4.x - sort of) NFS servers
> with load averages in the 80's where they were run with a lot
> of nfsd's that were blocked waiting for I/O due to ext3.

My impression was always (there is a similar setting, load_threshold, in OGE) that it is meant to limit the number of jobs on a big SMP machine when you oversubscribe on purpose, since not all parallel jobs really use all the CPU power over their lifetime (maybe such a machine was even operated without any NFS). Allowing e.g. 72 slots for jobs on a 60-core machine might then get the most out of it, with a load near 100%.
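The oversubscription check described above can be sketched roughly as follows. This is only an illustration in Python, not how OGE implements its load_threshold. The function names and the threshold value of 1.0 are assumptions made for the example, and "normalized load" here simply means the load average divided by the number of cores.

```python
def normalized_load(load_avg: float, cores: int) -> float:
    """Load average per core; 1.0 means roughly one runnable task per core."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def accepts_more_jobs(load_avg: float, cores: int,
                      used_slots: int, total_slots: int,
                      threshold: float = 1.0) -> bool:
    """Hypothetical admission check for an intentionally oversubscribed node.

    A new job is accepted only while a slot is free AND the per-core load
    is still below the threshold (the threshold value is an assumption).
    """
    has_free_slot = used_slots < total_slots
    below_threshold = normalized_load(load_avg, cores) < threshold
    return has_free_slot and below_threshold


# 60-core machine configured with 72 slots, 70 of them in use.
idle_enough = accepts_more_jobs(load_avg=55.0, cores=60,
                                used_slots=70, total_slots=72)
too_busy = accepts_more_jobs(load_avg=63.0, cores=60,
                             used_slots=70, total_slots=72)
print(idle_enough, too_busy)
```

With these numbers the first call still admits a job, because the jobs in the 70 occupied slots are not all busy, while the second call refuses one even though two slots are free.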

Well, now that newer CPUs have 12 cores and are assembled into 24- or 48-core machines, such a setting would become useful again. Maybe the load sensor should take into account only the load of the scheduled jobs.
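A load sensor restricted to the scheduled jobs could look something like the sketch below. It is a rough illustration only: job_load and its parameters are hypothetical names, how the scheduler would supply the job PIDs is left open, and the result is an instantaneous count of tasks rather than a decaying average like the kernel's load average. Processes in D state (e.g. blocked nfsd's) can be included or left out.

```python
from pathlib import Path


def task_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so the line is split after the LAST ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def job_load(job_pids, count_blocked=False, proc_root="/proc"):
    """Count job processes that are running (R) and optionally blocked (D).

    job_pids is assumed to come from the scheduler; processes that do not
    belong to a scheduled job are ignored simply by not being listed.
    """
    wanted = {"R", "D"} if count_blocked else {"R"}
    load = 0
    for pid in job_pids:
        try:
            line = Path(proc_root, str(pid), "stat").read_text()
        except OSError:
            continue  # process exited between listing and reading
        if task_state(line) in wanted:
            load += 1
    return load
```

Note that this counts processes, not individual threads; a threaded job would need its entries under /proc/<pid>/task inspected as well.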

-- Reuti


> cheers!
> Chris
> -- 
> Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computational Initiative
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>         http://www.vlsci.unimelb.edu.au/
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


