Thermal Problems

Robert G. Brown rgb at phy.duke.edu
Wed Jul 23 16:14:40 EDT 2003


On Wed, 23 Jul 2003, Mitchel Kagawa wrote:

> I run a small 64-node cluster, each node with dual AMD MP2200s in a 1U chassis.
> I am having problems with some of the nodes overheating and shutting down.
> We are using Dynatron 1U CPU fans which are supposed to spin at 5400 rpm, but
> I notice that a lot (25%) of the fans tend to freeze up or blow the bearings
> and spin at only 1000 rpm, which causes the CPU to overheat.  After careful
> inspection I noticed that the heatsink and fan sit very close to the lid of
> the case.  I was wondering how much clearance is needed between the lid and
> the fan that blows down onto the short copper heatsink?  When I put the lid
> on the case it is almost as if the fan is working in a vacuum because it
> actually speeds up an additional 600-700 rpm to over 6000 rpm... like there
> is no air resistance.  Could this be why the fans are crapping out?  I was
> thinking that a 60x60x10mm CPU fan that has air intakes on the side of the
> fan might work better but I have not seen any... have you?
> 
> Also the vendor suggested that we separate the 1U cases because he believes
> that there is heat transfer between the nodes when they are stacked right on
> top of each other.  I thought that if one node is running at 50C and another
> node is running at 50C it won't generate a combined heat load of more than 50C,
> right?

AMDs really hate to run hot, and duals in 1U require some fairly
careful engineering to run cool enough, stably.  Who is your vendor?
Did they do the node design or did you?  If they did, you should be able
to ask them to just plain fix it -- replace the fans or if necessary
re-engineer the whole case -- to make the problem go away.

Issues like fan clearance, stacking, and overall airflow through the
case are indeed important.  Sometimes something as simple as using round
cables instead of flat ribbon cables (which can turn sideways and
interrupt airflow) makes a big difference.  Keeping the room's ambient
air "cold" (as opposed to "comfortable") helps.  There is likely some
heat transfer vertically between the 1U cases, but if you go to the
length of separating them you might as well have used 2U cases in the
first place.
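On the 50C + 50C question: temperatures don't add, but heat loads (watts) do, and what sets how hot a node runs is how fast its airflow carries that heat away.  A rough way to see this is the steady-state energy balance for air moving through a case, delta_T = P / (rho * c_p * V_dot).  The Python sketch below is illustrative only: it assumes textbook sea-level air properties, ignores heatsink and die details entirely, and the 250 W node power, 20 C intake, and CFM values are hypothetical numbers, not measurements of anyone's hardware.

```python
# Rough steady-state estimate of how much a node's exhaust air warms up.
# Assumed constants: textbook values for dry air near sea level, ~20 C.
AIR_DENSITY = 1.2                # kg/m^3
AIR_SPECIFIC_HEAT = 1005.0       # J/(kg*K)
M3_PER_S_PER_CFM = 0.000471947   # one cubic foot per minute, in m^3/s


def air_temp_rise(power_watts: float, airflow_cfm: float) -> float:
    """Temperature rise (K) of air carrying power_watts at airflow_cfm.

    From the energy balance P = rho * c_p * V_dot * delta_T.
    """
    if airflow_cfm <= 0:
        raise ValueError("airflow must be positive")
    flow_m3_per_s = airflow_cfm * M3_PER_S_PER_CFM
    return power_watts / (AIR_DENSITY * AIR_SPECIFIC_HEAT * flow_m3_per_s)


if __name__ == "__main__":
    ambient_c = 20.0      # hypothetical machine-room intake temperature
    node_power_w = 250.0  # hypothetical dual-CPU 1U node
    for cfm in (10.0, 20.0, 40.0):
        rise = air_temp_rise(node_power_w, cfm)
        print(f"{cfm:4.0f} CFM: air leaves ~{rise:4.1f} C above intake "
              f"({ambient_c + rise:4.1f} C)")
    # Heat loads add even though temperatures do not: two such nodes
    # put 2 * node_power_w watts into the room's cooling load.
    print(f"room load from two nodes: {2 * node_power_w:.0f} W")
```

In this model halving the airflow doubles the rise above intake (at the hypothetical 250 W and 20 CFM it comes out to roughly 22 C), and every degree taken off the room's ambient temperature comes straight off the exhaust temperature, which is one way to see why a "cold" room helps.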

From your description, it does sound like you have some bad fans.
Whether they are bad (as in a bad design or poor vendor), bad (as in
installed "incorrectly" in a case/mobo with inadequate clearance, causing
them to fail), or bad (as in you just happened to get some fans from a
bad production batch, but replacements would probably work fine), it is
very hard to say, and I don't envy you the debugging process of finding
out which.  We've been down the route of replacing all of the fans once
ourselves, so it can certainly happen...
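One low-tech way to start that debugging is to poll every node's fan tachometers and flag any fan that has dropped well below its rated speed (5400 rpm rated, about 1000 rpm when failing, per the original report).  The Python sketch below assumes the readings have already been collected per node as lm-sensors `sensors` text output (how you gather them, e.g. over ssh, is left out), that fan lines look like `fan1:  5400 RPM  (min = ...)`, and that 4000 rpm is a sensible cutoff; the node names, sample readings, and threshold are all hypothetical.

```python
import re

# Matches lm-sensors-style fan lines, e.g.
#   "fan1:       5400 RPM  (min = 3000 RPM, div = 2)"
# The label is everything before the first colon at the start of a line.
FAN_LINE = re.compile(r"^(?P<label>[^:\n]+):\s+(?P<rpm>\d+)\s+RPM", re.MULTILINE)


def slow_fans(sensor_text: str, min_rpm: int) -> list[tuple[str, int]]:
    """Return (label, rpm) for every fan reading below min_rpm."""
    slow = []
    for match in FAN_LINE.finditer(sensor_text):
        rpm = int(match.group("rpm"))
        if rpm < min_rpm:
            slow.append((match.group("label").strip(), rpm))
    return slow


def scan_cluster(readings: dict[str, str], min_rpm: int) -> dict[str, list[tuple[str, int]]]:
    """Map each node name to its slow fans, omitting nodes with none."""
    report = {}
    for node in sorted(readings):
        bad = slow_fans(readings[node], min_rpm)
        if bad:
            report[node] = bad
    return report


if __name__ == "__main__":
    # Hypothetical captured `sensors` output for two nodes.
    readings = {
        "node01": "fan1:        5400 RPM  (min = 3000 RPM, div = 2)\n"
                  "fan2:        5350 RPM  (min = 3000 RPM, div = 2)\n",
        "node02": "fan1:        1000 RPM  (min = 3000 RPM, div = 2)\n"
                  "fan2:        6100 RPM  (min = 3000 RPM, div = 2)\n",
    }
    for node, fans in scan_cluster(readings, min_rpm=4000).items():
        print(node, fans)
```

Run periodically, a report like this at least tells you whether the failures cluster in particular nodes, cases, or fan batches, which is the question you need answered before deciding whether it's design, installation, or a bad lot.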

   rgb

> 
> 
> Mitchel Kagawa
> Systems Admin.
> 
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


