[Beowulf] SMP Nodes [still] Freezing w/ Scyld (follow-up, long)
wade.hampton at nsc1.net
Wed Jan 7 16:36:23 EST 2004
Timothy R. Whitcomb wrote:
>No, I'm not using myrinet. It's a Scyld 28cz4 with 100BaseT
>interconnects, dual-processor Tyan AMD boards.
I too missed your first post, but this thread reminded me of
problems I had 6 or so months ago. We had some node
hangs with Scyld on our cluster, though we were using
Tyan dual Xeon boards, not AMD. We also had
28cz4. We were using the 100T interface (not the GB).
A reboot of the node fixed the problem.
We've had problems with this cluster recently, but it is
now using the GB interfaces and I have not seen this problem.
(Note: getting GB working in this config was a PITA.)
Is the node locking up or is it just losing networking? Can you attach
monitors to the nodes and see if it is still running when it "locks up"?
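If attaching a monitor is not practical, a rough alternative is to have the
node write a timestamp to local storage every few seconds and read the log
back after the reboot. The sketch below is only an illustration, not something
from our setup: the function names, the log path, and the 10-second interval
are all made up, and it assumes the node has a local disk that survives a
reboot (a node running purely from a ramdisk would lose the log).

```python
import os
import time


def write_heartbeat(path, now=None):
    """Append a Unix timestamp to a local log and force it to disk."""
    stamp = time.time() if now is None else now
    with open(path, "a") as log:
        log.write(f"{stamp:.0f}\n")
        log.flush()
        os.fsync(log.fileno())  # make sure the line survives a hard hang


def last_heartbeat(path):
    """Return the most recent timestamp in the log, or None if it is empty."""
    last = None
    with open(path) as log:
        for line in log:
            line = line.strip()
            if line:
                last = float(line)
    return last


def classify(last_beat, network_lost_at, interval=10):
    """Guess what happened, given when the head node lost contact.

    If heartbeats kept arriving well after contact was lost, the node
    was still running and only the network went away.
    """
    if last_beat is None:
        return "unknown"
    if last_beat > network_lost_at + 2 * interval:
        return "network only"
    return "node hung"


# Hypothetical usage on the node (path and interval are assumptions):
#     while True:
#         write_heartbeat("/var/log/heartbeat.log")
#         time.sleep(10)
```

After the reboot, compare the last logged timestamp with the time the head
node lost contact: heartbeats that continue well past that point suggest the
node itself stayed up.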
My head node (same Tyan dual MB as my nodes, hyperthreading enabled)
would sometimes lose the eepro100 100T connection (outside world),
but the e1000 (cluster) interface kept working. However, I would still
have to reboot the head node. This would happen every few months
initially, but then happened almost every day.
I had to install a PCI NIC and disable the onboard 100T interface.
It worked fine for months after that (until I lost my RAID -- another