Dual-Athlon Cluster Problems

Craig Tierney ctierney at hpti.com
Mon Jan 27 13:09:36 EST 2003


On Sun, Jan 26, 2003 at 10:13:09PM -0800, Ben Ransom wrote:
...stuff deleted...
> 
> It is still curious to me that we can run other codes on Dolphin SCI,
> showing 100% CPU utilization (full power/heat), on a ring away from a
> suspect SCI card.  This implies the reliability is code dependent, as others
> have alluded.  I suppose this may be due to the amount of message
> passing?  Hopefully the problem will disappear once we get our full Dolphin
> set in working order.

Are you using ScaMPI, or are you using MPICH over SCI?  ScaMPI is fast,
but it implements MPI differently than MPICH does.  If the code runs correctly
with MPICH in other places, you might try using MPICH on your cluster as
well.  That might fix the problem if it is message-passing related, or at
least provide more data to help debug the real problem.

Craig

-- 
Craig Tierney (ctierney at hpti.com)