Scyld Nodes Freezing w/ SMP (fwd)

Nate Faerber nfaerber at penguincomputing.com
Fri Nov 14 12:42:57 EST 2003


> The problem:
> The model runs fine when run on 4 processors (1 on each compute node).
> However, when I use the SMP capabilities of the machine and try to run on,
> say, 8 processors (using both CPUs on each compute node), everything will
> run fine for a while.  Then, at a non-consistent time, a node will
> invariably freeze up.  The cluster loses its connection to the
> node and I cannot communicate with it using any of the cluster tools -
> sometimes it will automatically reboot, but usually it requires me to go
> perform a hard reset on the node.
> 
> However, I have found that in most cases if I run 2 jobs in parallel (i.e.
> 2 4-cpu processes, each using only 1 CPU on each node) things seem to work
> fine.  Nodes may still freeze from time to time but not nearly as often.

Do you have experience running this software on other clusters with SMP?

I have seen a software package that did not perform well (or properly)
on SMP systems.  We have a customer who could only run one process per
system.  This limitation was known to both the software vendor and the
customer.  It may no longer be the case for that piece of software if it
has matured since then.

-- 
Nate Faerber, Engineer
Tel: 415-358-2666   Fax: 415-358-2646   Toll Free: 888-PENGUIN
PENGUIN COMPUTING
www.penguincomputing.com

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


