Scyld Nodes Freezing w/ SMP (fwd)
nfaerber at penguincomputing.com
Fri Nov 14 12:42:57 EST 2003
> The problem:
> The model runs fine when run on 4 processors (1 on each compute node).
> However, when I use the SMP capabilities of the machine and try to run on,
> say, 8 processors (using both CPUs on each compute node), everything will
> run fine for a while. Then, at a non-consistent time, a node will
> invariably freeze up. The cluster loses its connection to the
> node and I cannot communicate with it using any of the cluster tools -
> sometimes it will automatically reboot, but usually it requires me to go
> perform a hard reset on the node.
> However, I have found that in most cases if I run 2 jobs in parallel (i.e.
> 2 4-cpu processes, each using only 1 CPU on each node) things seem to work
> fine. Nodes may still freeze from time to time but not nearly as often.
Do you have experience running this software on other clusters with SMP?
I have seen a software package that did not perform well (or properly)
on SMP systems. We have a customer that could only run one process per
system. This limitation was known to the software vendor and the
customer. It may no longer be the case for that piece of software if it
has matured since then.
Nate Faerber, Engineer
Tel: 415-358-2666 Fax: 415-358-2646 Toll Free: 888-PENGUIN