cluster node freezes while running namd 2.5/2.5b1

Joe Landman landman at scalableinformatics.com
Mon Oct 20 11:17:24 EDT 2003


Hi Richard:

  Are your Intel network drivers up to date?  Check the Intel site.  
If the same node freezes every time, try taking it out of the cluster 
and seeing whether that improves the situation.  If it does, swap the 
node you took out with one that is still in the cluster, and see 
whether the problem returns.  This will help you determine whether the 
problem is node-based or system-based.
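
The swap test above can be tracked with a small script. What follows is 
only a minimal sketch in Python, assuming a hand-kept record of runs and 
hypothetical node names (node1, node2, ...); the function name and the 
classification rule are illustrative, not a definitive diagnostic. It 
reports "node" when a single node accounts for every freeze and at least 
one multi-node run without it completed, "system" when different nodes 
froze, and "inconclusive" otherwise.

```python
from collections import Counter


def classify_freezes(runs):
    """Classify a freeze history as 'node', 'system', or 'inconclusive'.

    runs: list of (nodes_in_run, frozen_node) pairs, where frozen_node is
    None when the run completed without any node freezing.
    """
    # Tally which node froze in each failed run.
    frozen_counts = Counter(frozen for _, frozen in runs if frozen is not None)
    if not frozen_counts:
        return "inconclusive"  # nothing froze, so nothing to attribute

    if len(frozen_counts) > 1:
        return "system"  # different nodes froze: not a single bad node

    suspect = next(iter(frozen_counts))
    # Multi-node runs that left the suspect out. Each of these completed,
    # otherwise a second node would have appeared in frozen_counts.
    runs_without_suspect = [
        nodes for nodes, _ in runs if suspect not in nodes and len(nodes) > 1
    ]
    return "node" if runs_without_suspect else "inconclusive"


# Hypothetical record: node3 froze twice; a run without it completed.
runs = [
    (["node1", "node2", "node3"], "node3"),
    (["node1", "node2"], None),
    (["node2", "node3"], "node3"),
]
print(classify_freezes(runs))
```

Only multi-node runs are counted as evidence, because single-node runs 
are already known to complete on this cluster.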

Joe

Richard Brown wrote:

>I have been trying to figure this out for the past two
>months with no luck.
>
>I have an 8-node PC cluster that consists of 16 athlon
>mp2200+, msi k7d master-l mb, intel i82557/i82558
>10/100 on-board lan, 500mb kingston ddr266 pc2100
>unbuffered, 3com superstack III baseline 24 port
>10/100 switch.
>
>The cluster was built using oscar2.1/redhat7.3 w/ the
>kernel update 2.4.20-20. The namd versions used include
>2.5b1 and the latest 2.5, both as linux binary
>distributions and as source code builds. The simulation
>tested is the apoa1 benchmark example.
>
>namd/apoa1 only runs w/o problems on a single cluster
>node, either with one or two cpus. Every time it runs
>on two or more nodes, either using one or two cpus
>from each node, namd/apoa1 stops somewhere in the
>middle of the run. One of the nodes freezes and does not
>respond to ping, ssh or the directly attached
>keyboard. Most of the time there were no error
>messages. A few times I received an apic error or a socket
>receive failure. I tried plugging a ps/2 mouse into
>the nodes, as some people suggested as a workaround for a
>motherboard bug, but it did not help.
>
>I don't know how to proceed from here. Any suggestions
>would be appreciated.
>
>Thanks,
>Richard
>
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>  
>

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615





