[Beowulf] User resource limits
prentice at ias.edu
Mon Jun 9 11:41:29 EDT 2008
This is slightly off topic, since it's not a Beowulf-specific
problem, but it is HPC-related:
I have several fat servers with 4 cores and 32 GB of RAM, for jobs that
aren't very parallel and need large amounts of RAM. They are not
clustered in any way. At the moment, users ssh into these systems to run
large jobs. Eventually, I will have these nodes managed by a queuing system.
The problem: Every couple of days, one of these systems becomes
unresponsive due to OOM errors. If we wait long enough, the offending
job will complete, and everything will return to normal. Since these are
multi-user shared resources, I don't have the luxury of waiting for the
systems to clear themselves up, and I often have to hit the power button.
I would like to impose hard CPU and memory limits on users, limits that
the users themselves can't change or override. What is the best
way to do this? The only mechanisms I know of are environment variables
or shell commands run as the user (ulimit, for example).
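As a rough illustration of the per-process limits that ulimit sets, the
sketch below uses Python's standard resource module on Linux to launch a
command with CPU-time and address-space limits whose soft and hard values
are equal. The helper name run_limited and the example values are
hypothetical, not taken from these systems. A process without root
privileges can lower its hard limit but cannot raise it again. Nothing
here stops a user from starting jobs outside such a wrapper, which is
exactly the gap described above.

```python
import resource
import subprocess
import sys


def _clamp(requested, which):
    """Never ask for more than the hard limit we already run under."""
    _, hard = resource.getrlimit(which)
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


def run_limited(cmd, cpu_seconds, mem_bytes):
    """Run cmd with CPU-time and address-space limits (hypothetical helper).

    Assumes Linux, where RLIMIT_AS is enforced by the kernel.
    """
    cpu = _clamp(cpu_seconds, resource.RLIMIT_CPU)
    mem = _clamp(mem_bytes, resource.RLIMIT_AS)

    def apply_limits():
        # Runs in the child between fork and exec. Setting soft == hard
        # means an unprivileged child cannot raise either value later.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu, cpu))
        resource.setrlimit(resource.RLIMIT_AS, (mem, mem))

    return subprocess.run(
        cmd, preexec_fn=apply_limits, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Example values only: 60 s of CPU time and 4 GiB of address space.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_AS))"
    result = run_limited([sys.executable, "-c", probe], 60, 4 * 1024**3)
    print(result.stdout.strip())
```

A job that exceeds the address-space limit sees its allocations fail
instead of pushing the whole machine into OOM, and one that exceeds the
CPU-time hard limit is killed by the kernel.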