Cool it off

Robert G. Brown rgb at phy.duke.edu
Fri Nov 10 10:30:11 EST 2000


On Thu, 9 Nov 2000, Timm Murray wrote:

> In discussing our plans to build a Beowulf, my high school's computer club
> became concerned about heat.  I suggested a system where you stack the nodes
> in columns of three or four computers.  Every three or four rows, you have a
> "cooling row."  The cases in the cooling row have holes in their sides, no
> actual computer equipment, and contain only a few power supplies, fans, and
> some scrap metal (old hard drives, etc.) for use as heatsinks.  The fans
> would be arranged so that the airflow is directed from one end to the other,
> thus ensuring a smooth flow.  Holes in the cases allow the air to go between
> the cases.
> 
> Has anybody tried this before?  Comments?

Why not just put them on a heavy-duty steel shelving unit and leave an
empty shelf and mount a fan at one end?  Or put a "sandwich" shelf every
row (say, wood/styrofoam/wood or sheetmetal/styrofoam/sheetmetal)?

The problem with your proposed solution is a) that it will cost a lot;
and b) it won't do what you think it will do.  In particular, putting
lots of junk metal in a closed environment with limited airflow will
actually trap heat.

If I understand your concern, you are worried that the heat generated by
the bottom nodes rises into the second tier (so those nodes run a bit
hotter), which in turn dumps some of its heat into the third tier, and
so on, so that by the time you get to the fourth or fifth row the nodes
are running dangerously hot.

I think you overestimate how much of the overall cooling comes from heat
leaking out of the top of, e.g., a minitower case.  The real issue
here is: if you put styrofoam (or other) insulation all the way around
the case sides (without, of course, obstructing the airflow through the
case in any way) would it overheat?  Even if you have e.g. Compaq/Alpha
XP1000's (which generate far more heat than most Intel boxes) I think
that the answer would generally be no.  This means that the fan alone
provides adequate cooling for most cases.  This is especially true of
full tower or minitower cases, which are so big they support good air
circulation patterns within the cases.  However, it is still true for
e.g. rackmount cases.

Rackmount server farms or beowulfs would be impossible to build if
simply stacking ten or twenty units on top of one another led to
overheating at the top.  It is entirely possible that those cases have a
dead air space at the top and bottom to provide some thermal insulation
between layers but we're talking perhaps a centimeter or two of trapped
space.

To check, build a row, place a blanket or a piece of foam sheathing
(like the stuff they put under house siding) on top, and leave it
running for a few hours.  If your particular candidate nodes DO get
significantly warm on top, then perhaps you need something to retard the
bleed out the top, but that something could be ANYTHING that simply
insulates the next row up from the one(s) below.  Alternatively, just
leaving an airspace between
rows (putting them on shelves) and blowing air over the whole stack with
a common house fan will keep cool air moving over the tops of the cases
and will both help keep them cool and deflect the heat before it can
enter the row(s) above.

Your REAL problem with heat is going to be getting rid of it in the
first place.  Every node draws between 70 and 250 watts (depending on
who makes it and what's inside).  I use 100 watts as a rule of thumb for
estimating cooling requirements.  Since you're talking about getting to
the fourth or fifth row, I'm going to assume that your beowulf is going
to contain at least e.g. 8x4 or 8x5 nodes -- 32 or more.  
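The arithmetic above can be sketched in a few lines of Python (the
per-node figure is the rule-of-thumb assumption from the text, not a
measurement of your actual hardware):

```python
# Rough cluster power estimate using the 100 W/node rule of thumb.
# Real nodes draw anywhere from ~70 to ~250 W; measure yours if you can.
WATTS_PER_NODE = 100   # assumed average draw per node
nodes = 32             # e.g. an 8x4 stack

total_watts = nodes * WATTS_PER_NODE
print(f"{nodes} nodes ~ {total_watts} W of power draw (and of heat)")
# 32 nodes ~ 3200 W
```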

This presents you with two infrastructure problems.  The first is that
you need to get power to them.  >>Most<< 120 VAC circuits are likely to
be 20 Amps or less.  This is because to carry more one needs wire
thicker than 12 gauge, which is a pain to work with.  A watt is an (rms)
volt-ampere, so a 100 watt load draws a bit under 1 ampere at 120 VAC.
Allowing a safety margin, you can plug in at most 15-20 nodes on a
single circuit.  Unless you are
fortunate enough to have a computer lab space in your school with more
than one circuit, you will likely need to have a second and possibly
even a third circuit added to the room.
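A quick sketch of the circuit budget, assuming a 20 A breaker derated to
80% for a continuous load (common electrical-code practice, but confirm
against your local code):

```python
# How many ~100 W nodes fit on one 120 VAC branch circuit?
volts = 120.0
breaker_amps = 20.0
derate = 0.80                 # assumed continuous-load safety margin
watts_per_node = 100.0        # rule-of-thumb node draw

usable_watts = volts * breaker_amps * derate       # 1920 W available
nodes_per_circuit = int(usable_watts // watts_per_node)
print(f"~{nodes_per_circuit} nodes per {breaker_amps:.0f} A circuit")
# ~19 nodes per 20 A circuit
```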

The second is that energy is conserved.  All the energy consumed by the
nodes is (ultimately) released into the room as heat.  Heat that isn't
removed contributes to an increase in temperature until the thermal
gradient that is established suffices to carry heat away as fast as it
is being added in.  If you insulated the >>whole<< case of a running
system (e.g. wrapped the entire case in foil and blankets to prevent all
air flow and most heat loss), then the heat released inside the case,
with no means of getting out, would literally turn the case into an oven.
A fire could easily ensue.  The walls of many cases are sufficiently
insulating that >>if there was no cooling fan<< they'd overheat all by
themselves in a room temperature environment, although probably not to
the point of catching on fire.

You've got to get rid of all that heat.  The cases can remove their OWN
heat into the room air well enough with their fans, PROVIDED that you
keep the room air cool.  The room itself is just like a bigger "case"
for all of those nodes.  The only way to keep it cool (remove the heat)
is to provide a flow of cool air in and warm air out.  This is
usually accomplished by air conditioning.

So, how much air conditioning is needed to keep the room air cool so
that the case fans can keep the cases cool?  Think of your 32 nodes as a
3200 watt space heater (they are!).  You'd need to run a pretty good
sized air conditioner to keep a room at 20C while running a 3200 watt
space heater, right?  Air conditioning capacity is frequently rated in
>>tons<< where a "ton" of air conditioning is one that can remove enough
heat in 24 hours to melt a ton of ice at 0C (its melting point).  After
a few lines of arduous unit conversions you will find that this is a
rate of heat removal of roughly 3500 watts, distributed over a day.
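If you'd rather not do those conversions by hand, here is the whole
calculation, using the latent heat of fusion of ice (334 kJ/kg) and one
short ton (2000 lb, about 907 kg):

```python
# One "ton" of air conditioning: enough cooling to melt one short ton
# of ice at 0 C over 24 hours.  Convert that to a steady rate in watts.
LATENT_HEAT_FUSION = 334_000.0   # J/kg, heat needed to melt ice at 0 C
TON_KG = 907.18                  # one short ton (2000 lb) in kilograms
SECONDS_PER_DAY = 24 * 3600

energy_joules = TON_KG * LATENT_HEAT_FUSION      # ~3.03e8 J total
watts = energy_joules / SECONDS_PER_DAY
print(f"1 ton of AC ~ {watts:.0f} W of heat removal")
# 1 ton of AC ~ 3507 W
```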

Since you likely also need a monitor, some lights, some extra fans, and
a network switch, you need at >>least<< a ton of air conditioning to keep a
smallish room cool while operating 32 nodes.  This much air conditioning
would also keep a good sized house cool on a hot summer day.  If the
room itself is large and has other sources of ventilation and cooling,
one might get by with a bit less.

We actually lost air conditioning in our server/beowulf room here a
couple of days ago.  The fans remained on, blowing in air at roughly 20C
(room temperature) from the rest of the building but the chiller was
kicked off.  The heat generated by our systems raised the temperature of
the room to over 30C (86F) and rising in a matter of minutes; we had to
shut everything we could down on an emergency basis until we got the AC
back on.  Note that if the room air reaches e.g. 38-40C (100F) then the
cooling fans inside the cases >>cannot<< cool any of the internal
components of the cases to >>less<< than this temperature.  Neither can
heat sinks or anything else -- heat flows only from hot to cold!  The
silicon components would rapidly heat up at least as much as the room
air and quite possibly would burn out or, in the worst case, start a
fire.  We have certainly lost units before during AC failures and once
when room painters put their plastic drop cloths over system units in
such a way that they obstructed the cooling fans (the idiots!).

If you get enough power in (carried on >>thick<< wires, which lose less
energy to waste heat) and keep the room air cool enough and arrange the
nodes so their cooling fans are unobstructed and there is an inch or so
of airspace between nodes horizontally (consider a staggered pyramid
tower for stability) then you probably won't need to do anything else to
keep the nodes cool.  Consider installing some sort of thermal alarm or
kill mechanism, though, to avoid meltdown in the event the room AC goes
off for any reason.
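As a sketch of such a kill mechanism, here is a minimal Python watchdog.
The sensor path, the millidegree units, and the threshold are all
assumptions; substitute whatever your hardware actually exposes
(lm_sensors, a motherboard probe, an IPMI query, etc.) and test it
before trusting it:

```python
# Minimal thermal watchdog sketch: poll a temperature sensor and take
# emergency action if it exceeds a limit.  Path and limit are assumed.
import time

THERMAL_FILE = "/sys/class/thermal/thermal_zone0/temp"  # millidegrees C
LIMIT_C = 60.0          # hypothetical shutdown threshold

def read_temp_c(path=THERMAL_FILE):
    """Read a sensor file containing millidegrees C; return degrees C."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0

def watchdog(poll_seconds=30):
    """Loop forever; bail out (or shut down) on over-temperature."""
    while True:
        if read_temp_c() > LIMIT_C:
            # Replace with a real action: os.system("shutdown -h now"),
            # paging someone, or cutting power via a controllable PDU.
            print("Over temperature -- shutting down!")
            break
        time.sleep(poll_seconds)
```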

I hope this helps.  The support infrastructure required by a beowulf is
nontrivial, and is often neglected by those putting together a design.
It's good that you're thinking about heat and power and all that now --
that way you can either arrange to get no more nodes than your local
infrastructure can now support or to get more infrastructure as
required.

BTW, SOME of this is already discussed in my online beowulf book.  The
chapter really needs to be even more detailed, but you can check it out
now if you like.  Visit www.phy.duke.edu/brahma (and look for the
links to the book).

If your club writes up a summary of your beowulf when it is designed and
built and tested, I'd be happy to include it as a section in this book.
Your experience can then serve as a guide to other clubs at other
schools.

   rgb

> 
> ------------
> A bad random number generator: 1, 1, 1, 1, 1, 4.33e+67, 1, 1, 1...
> 
> 
> _______________________________________________
> Beowulf mailing list
> Beowulf at beowulf.org
> http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu



