cluster node freezes while running namd 2.5/2.5b1
richardlbj at yahoo.com
Sat Oct 18 23:56:37 EDT 2003
I have been trying to figure this out for the past two
months with no luck.
I have an 8-node PC cluster that consists of 16 athlon
mp2200+, msi k7d master-l mb, intel i82557/i82558
10/100 on-board lan, 500mb kingston ddr266 pc2100
unbuffered, 3com superstack III baseline 24 port
switch.
The cluster was built using oscar2.1/redhat7.3 w/ the
kernel update 2.4.20-20. The namd versions used include
2.5b1 and the latest 2.5, both linux binary distributions
and source code builds. The simulation tested is apoa1.
namd/apoa1 only runs w/o problems on a single cluster
node, either with one or two cpus. Every time it runs
on two or more nodes, either using one or two cpus
from each node, namd/apoa1 stops somewhere in the
middle of the run. One of the nodes freezes and does not
respond to ping, ssh or the directly attached
keyboard. Most of the time there were no error
messages. A few times I received an apic error or a
socket receive failure. I tried plugging a ps/2 mouse
into the nodes, as some people suggested as a workaround
for a motherboard bug, but it did not help.
I don't know how to proceed from here. Any suggestions
would be appreciated.
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf