mitchel at navships.com
Wed Jul 23 22:15:31 EDT 2003
Here are a few pictures of the culprit. Any suggestions on how to fix it
other than buying a whole new case would be appreciated.
You can also see how many I'm down... it should read 65 nodes (64 + 1 head node).
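One way to find failing fans across the cluster before they take nodes down is to poll each node's fan readings and flag any that fall far below the rated speed. The quoted message below cites a 5400 rpm rating and failing fans at about 1000 RPM. This is a minimal sketch, assuming the readings can already be collected from each node as text resembling lm_sensors output. The 60% threshold and the `slow_fans`/`summarize` helper names are hypothetical.

```python
import re

# Assumptions: fans are rated at 5400 rpm (the figure quoted for the
# Dynatron 1U fans) and anything under 60% of that is worth a look.
RATED_RPM = 5400
MIN_FRACTION = 0.6

# Assumed input: text lines resembling lm_sensors output, e.g.
#   "fan1:   5400 RPM  (min = 3000 RPM, div = 2)"
FAN_LINE = re.compile(r"^(fan\d+):\s+(\d+)\s+RPM", re.IGNORECASE)


def slow_fans(sensor_text, rated=RATED_RPM, min_fraction=MIN_FRACTION):
    """Return (fan_name, rpm) pairs running below min_fraction * rated."""
    threshold = rated * min_fraction
    slow = []
    for line in sensor_text.splitlines():
        match = FAN_LINE.match(line.strip())
        if match:
            rpm = int(match.group(2))
            if rpm < threshold:
                slow.append((match.group(1), rpm))
    return slow


def summarize(readings):
    """Map node name -> list of slow fans, keeping only nodes with problems.

    readings: dict of node name -> raw sensor text collected from that node
    (how the text is gathered, e.g. an ssh loop, is left out here).
    """
    report = {}
    for node, text in readings.items():
        fans = slow_fans(text)
        if fans:
            report[node] = fans
    return report


if __name__ == "__main__":
    # Made-up readings for two hypothetical nodes.
    sample = {
        "node01": "fan1:   5400 RPM  (min = 3000 RPM)\nfan2:   5350 RPM  (min = 3000 RPM)",
        "node02": "fan1:   1000 RPM  (min = 3000 RPM)\nfan2:   5420 RPM  (min = 3000 RPM)",
    }
    print(summarize(sample))
```

With the made-up sample, only `node02` is reported, with `fan1` at 1000 RPM. This check only catches slow fans. A fan that speeds up past its rating (as described below when the lid goes on) passes silently.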
----- Original Message -----
From: "Robert G. Brown" <rgb at phy.duke.edu>
To: "Mitchel Kagawa" <mitchel at navships.com>
Cc: <beowulf at beowulf.org>
Sent: Wednesday, July 23, 2003 10:14 AM
Subject: Re: Thermal Problems
> On Wed, 23 Jul 2003, Mitchel Kagawa wrote:
> > I run a small 64-node cluster, each node with dual AMD MP2200s in a 1U case.
> > I am having problems with some of the nodes overheating and shutting down.
> > We are using Dynatron 1U CPU fans, which are supposed to spin at 5400 rpm.
> > I notice that a lot (25%) of the fans tend to freeze up or blow the
> > and spin at only 1000 RPM, which causes the CPU to overheat. After
> > inspection I noticed that the heatsink and fan sit very close to the lid of
> > the case. I was wondering how much clearance is needed between the lid and
> > the fan that blows down onto the short copper heatsink. When I put the lid
> > on the case it is almost as if the fan is working in a vacuum, because it
> > actually speeds up an additional 600-700 rpm to over 6000 rpm... like there
> > is no air resistance. Could this be why the fans are crapping out? I
> > thinking that a 60x60x10mm CPU fan that has air intakes on the side of the
> > fan might work better, but I have not seen any... have you?
> > Also, the vendor suggested that we separate the 1U cases because he
> > that there is heat transfer between the nodes when they are stacked right on
> > top of each other. I thought that if one node is running at 50c and another
> > node is running at 50c, they won't generate a combined heat load of more than 50c,
> > right?
> AMD's really hate to run hot, and duals in 1U require some fairly
> careful engineering to run cool enough, stably. Who is your vendor?
> Did they do the node design or did you? If they did, you should be able
> to ask them to just plain fix it -- replace the fans or if necessary
> reengineer the whole case -- to make the problem go away.
> Issues like fan clearance and stacking and overall airflow through the
> case are indeed important. Sometimes things like using round instead of
> ribbon cables (which can turn sideways and interrupt airflow) make a
> big difference. Keeping the room's ambient air "cold" (as opposed to
> "comfortable") helps. There is likely some heat transfer vertically
> between the 1U cases, but if you go to the length of separating them you
> might as well have used 2U cases in the first place.
> From your description, it does sound like you have some bad fans.
> Whether they are bad (as in a bad design, poor vendor), or bad (as in
> installed "incorrectly" in a case/mobo with inadequate clearance causing
> them to fail), or bad (as in you just happened to get some fans from a
> bad production batch but replacements would probably work fine) it is
> very hard to say, and I don't envy you the debugging process of finding
> out which. We've been down the route of replacing all of the fans once
> ourselves so it can certainly happen...
> > Mitchel Kagawa
> > Systems Admin.
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
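On the 50c-plus-50c question in the quoted message, temperatures do not add, but heat loads (in watts) do. A rough steady-state estimate of how much the cooling air warms up is ΔT = P / (ρ · V̇ · c_p). This sketch assumes approximate room-temperature air properties (ρ ≈ 1.2 kg/m³, c_p ≈ 1005 J/(kg·K)) and uses hypothetical per-node figures of 200 W and 20 CFM, not measurements from this cluster. It models only the air path and ignores conduction through stacked lids.

```python
# Rough steady-state estimate of cooling-air temperature rise.
# Air properties are approximate room-temperature values (assumption).
AIR_DENSITY = 1.2           # kg/m^3
AIR_CP = 1005.0             # J/(kg*K), specific heat of air
CFM_TO_M3S = 0.000471947    # 1 cubic foot per minute in m^3/s


def air_temp_rise(power_w, airflow_cfm):
    """Kelvin rise of air that carries away power_w watts at airflow_cfm."""
    mass_flow = AIR_DENSITY * airflow_cfm * CFM_TO_M3S  # kg/s
    return power_w / (mass_flow * AIR_CP)


def exhaust_temp(inlet_c, power_w, airflow_cfm):
    """Exhaust air temperature (C) for a given inlet temperature."""
    return inlet_c + air_temp_rise(power_w, airflow_cfm)


# Hypothetical per-node figures, not measurements from this cluster.
NODE_POWER_W = 200.0
NODE_AIRFLOW_CFM = 20.0
AMBIENT_C = 20.0

# Two nodes each fed room air: same exhaust temperature; temperatures do not add.
node_a_out = exhaust_temp(AMBIENT_C, NODE_POWER_W, NODE_AIRFLOW_CFM)
node_b_out = exhaust_temp(AMBIENT_C, NODE_POWER_W, NODE_AIRFLOW_CFM)

# Heat loads do add: the room's cooling has to remove both nodes' output.
total_heat_w = 2 * NODE_POWER_W

# If node B instead breathes node A's exhaust, it starts from a warmer inlet.
node_b_preheated_out = exhaust_temp(node_a_out, NODE_POWER_W, NODE_AIRFLOW_CFM)

print(f"per-node air rise: {node_a_out - AMBIENT_C:.1f} K")
print(f"exhaust, both on room air: {node_a_out:.1f} C / {node_b_out:.1f} C")
print(f"exhaust, B on A's exhaust: {node_b_preheated_out:.1f} C")
print(f"heat the room must remove: {total_heat_w:.0f} W")
```

With these made-up inputs each node's air warms by about 17.6 K, so two nodes on room air both exhaust at about 37.6 C, not double that. The room still has to absorb 400 W. A node fed its neighbor's exhaust ends up near 55.1 C, which is one way stacking can matter.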
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf