Thermal Problems

Mitchel Kagawa mitchel at navships.com
Wed Jul 23 22:15:31 EDT 2003


Here are a few pictures of the culprit.  Any suggestions on how to fix it
other than buying a whole new case would be appreciated.
http://neptune.navships.com/images/oscarnode-front.jpg
http://neptune.navships.com/images/oscarnode-side.jpg
http://neptune.navships.com/images/oscarnode-back.jpg

You can also see how many nodes I'm down... it should read 65 nodes
(64 + 1 head node):
http://neptune.navships.com/ganglia

Mitchel Kagawa
Systems Administrator
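
For anyone wanting to track which fans have dropped well below the rated
5400 RPM mentioned in the quoted message below, here is a minimal sketch in
Python. It assumes the per-node fan speeds have already been collected by
whatever monitoring is in place; the function name, the node and fan labels,
the 80% cutoff, and the sample readings are all hypothetical and only
illustrate the check.

```python
NOMINAL_RPM = 5400  # rated fan speed quoted in the thread


def flag_slow_fans(readings, nominal_rpm=NOMINAL_RPM, min_fraction=0.8):
    """Return a sorted list of (node, fan, rpm) for fans running too slowly.

    readings maps node name -> {fan label: measured RPM}.
    min_fraction is an assumed cutoff, not a vendor specification.
    """
    threshold = nominal_rpm * min_fraction
    slow = []
    for node, fans in readings.items():
        for fan, rpm in fans.items():
            if rpm < threshold:
                slow.append((node, fan, rpm))
    return sorted(slow)


if __name__ == "__main__":
    # Hypothetical sample data: one healthy node and one with a failing fan.
    sample = {
        "node01": {"cpu0": 5400, "cpu1": 1000},
        "node02": {"cpu0": 6000, "cpu1": 5350},
    }
    for node, fan, rpm in flag_slow_fans(sample):
        print(f"{node} {fan}: {rpm} RPM")
```

A fan spinning faster than nominal (such as the 6000 RPM seen with the lid
on) is not flagged by this check; it only catches fans that have slowed down.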

----- Original Message -----
From: "Robert G. Brown" <rgb at phy.duke.edu>
To: "Mitchel Kagawa" <mitchel at navships.com>
Cc: <beowulf at beowulf.org>
Sent: Wednesday, July 23, 2003 10:14 AM
Subject: Re: Thermal Problems


> On Wed, 23 Jul 2003, Mitchel Kagawa wrote:
>
> > I run a small 64-node cluster, each node with dual AMD MP2200's in a
> > 1U chassis.  I am having problems with some of the nodes overheating
> > and shutting down.  We are using Dynatron 1U CPU fans, which are
> > supposed to spin at 5400 RPM, but I notice that a lot (25%) of the
> > fans tend to freeze up or blow their bearings and spin at only
> > 1000 RPM, which causes the CPU to overheat.  After careful inspection
> > I noticed that the heatsink and fan sit very close to the lid of the
> > case.  How much clearance is needed between the lid and the fan that
> > blows down onto the short copper heatsink?  When I put the lid on the
> > case it is almost as if the fan is working in a vacuum, because it
> > actually speeds up an additional 600-700 RPM to over 6000 RPM... like
> > there is no air resistance.  Could this be why the fans are crapping
> > out?  I was thinking that a 60x60x10mm CPU fan that has air intakes
> > on the side of the fan might work better, but I have not seen any...
> > have you?
> >
> > Also, the vendor suggested that we separate the 1U cases because he
> > believes that there is heat transfer between the nodes when they are
> > stacked right on top of each other.  I thought that if one node is
> > running at 50C and another node is running at 50C, the two won't
> > generate a combined heat load of more than 50C, right?
>
> AMD's really hate to run hot, and duals in 1U require some fairly
> careful engineering to run cool enough, stably.  Who is your vendor?
> Did they do the node design or did you?  If they did, you should be able
> to ask them to just plain fix it -- replace the fans or if necessary
> reengineer the whole case -- to make the problem go away.
>
> Issues like fan clearance and stacking and overall airflow through the
> case are indeed important.  Sometimes things like using round instead of
> ribbon cables (which can turn sideways and interrupt airflow) make a
> big difference.  Keeping the room's ambient air "cold" (as opposed to
> "comfortable") helps.  There is likely some heat transfer vertically
> between the 1U cases, but if you go to the length of separating them you
> might as well have used 2U cases in the first place.
>
> From your description, it does sound like you have some bad fans.
> Whether they are bad (as in a bad design, poor vendor), or bad (as in
> installed "incorrectly" in a case/mobo with inadequate clearance causing
> them to fail), or bad (as in you just happened to get some fans from a
> bad production batch but replacements would probably work fine) it is
> very hard to say, and I don't envy you the debugging process of finding
> out which.  We've been down the route of replacing all of the fans once
> ourselves so it can certainly happen...
>
>    rgb
>
> >
> >
> > Mitchel Kagawa
> > Systems Admin.
> >
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
> >
>
> Robert G. Brown                        http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>




