Robert G. Brown
rgb at phy.duke.edu
Fri Apr 11 14:11:01 EDT 2003
On Fri, 11 Apr 2003, jbassett wrote:
> Does anyone know of if it is possible to buy a rackmount cluster with an
> integrated cooling system? It seems against the philosophy of Beowulf to look
> for low cost computing solutions, and then find that you need to make a
> substantial investment just to cool the room. I had an Athlon system shut down
This is in some ways impossible, if I understand what you mean. Or from
another point of view, it is already standard. Let's understand
refrigeration and thermodynamics a bit:
All the energy used to run your systems and do computations turns into
heat (1st law).
One cannot make heat "go away"; it either naturally flows from hot
places to cooler places, or one can move it forcibly from a hot place
to a hotter place. It costs energy which makes still MORE heat to move
it forcibly around (2nd law).
Now view the CPUs as little heaters -- 50W to 100W apiece (as hot as
most incandescent light bulbs) and confined inside a 1U or 2U case. Add
on another 50W plus for the motherboard, memory, disk, network, and the
switching power supply itself inside the case. Even the "refrigeration"
devices already standard in the case (case fans intended to speed the
heat on its way) add heat to the case exhaust in the process.
Cases are already designed to move heat from the hot spots inside out
into the ambient air as efficiently as possible (within the quality of
engineering and layout of any particular case with any particular
motherboard). There are even cooling devices designed for e.g. CPU
cooling that are active electronic refrigerators (Peltier coolers) and
not just fan+heat sink conduction+convection coolers.
The problem is out in the room. Once you remove the heat from the
cases, with or without an actual case refrigerator at work (which in
general will exhaust MORE heat into the room than a case cooled by a fan
alone) the heat still HAS to get out of the room. If the room has lots
of nodes making heat, nice thick walls, ceilings, floors, and lots of
dead air (as do most uncooled cluster rooms, it seems), it won't get out
quickly enough on its own, so it will start to build up. This makes the
room get hot -- temperature being a measure of the "heat" (random
kinetic energy) in the room's air.
Now, a passive cooler fan can only cool the CPU if the ambient air is
cooler than the CPU. It can move air through more quickly, but
basically heat is flowing from hot to cold. As the room air temperature
goes up, so does the CPU temperature as the fan is less successful in
helping to remove its power-generated heat. An active cooler is in no
fundamentally better shape. Yes, it will maintain a temperature
gradient, and keep the CPU actually cooler than ambient air, but as
ambient air goes up in temperature so will the CPU temperature AND the
ambient air will get still hotter as a result of the extra energy the
cooler itself consumes (which in turn goes up as the ambient air
temperature increases in a vicious cycle). It also heats the other
components in the case more while keeping the CPU a bit cooler, so other
things may fail at a higher rate unless you remove all that heat.
ONE WAY OR ANOTHER you will HAVE to remove the heat from the room JUST
AS FAST as all the systems and other heat sources (including electric
lights and human bodies) produce it to keep the room's temperature
constant. If you live in a cold climate or have some handy "cold
reservoir" that can absorb the heat from your cluster indefinitely
without getting warmer itself, maybe you can metaphorically open a
window and stick in a fan and blow the hot air out into the snow,
replacing it with nice cool air from outside. If you live in Durham NC
in the summer, the air outside the building is a lot HOTTER than you'd
like the cluster room to be, so you have to do work to actively move the
heat from your nice cool cluster room to the much hotter out of doors,
moving it "uphill" so to speak.
This work WILL be done by a refrigeration unit -- an air conditioner --
as that's what they are and what they do. You can even estimate fairly
accurately how much air conditioning you'll require to keep up with the
rate at which the cluster produces heat, using 3500 Watts per "ton" of
A/C (and remembering to provide a lot more capacity than you think
you'll need, maybe twice as much). You can install an "off-the-shelf"
air conditioning solution if one is possible and makes sense for your
cluster room, or you can (likely better) have a pro come and install a
proper climate control system.
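
To make that estimate concrete, here is a minimal sketch in Python. The
helper name ac_tons_needed and the example cluster (100 nodes at 150 W
each) are made up for illustration; only the 3500 W per ton figure and
the "maybe twice as much" margin come from the discussion above.

```python
import math

WATTS_PER_TON = 3500.0  # rule of thumb from the text (one ton is about 3517 W)


def ac_tons_needed(nodes, watts_per_node, other_watts=0.0, safety_factor=2.0):
    """Estimate A/C capacity in tons for a given heat load.

    Hypothetical helper: assumes every watt drawn ends up as heat in the
    room, and safety_factor=2.0 follows the "maybe twice as much" advice.
    """
    heat_load_watts = nodes * watts_per_node + other_watts
    return safety_factor * heat_load_watts / WATTS_PER_TON


# Assumed example cluster: 100 nodes drawing 150 W each, nothing else in the room.
tons = ac_tons_needed(nodes=100, watts_per_node=150)
print(f"about {tons:.1f} tons of A/C; round up to {math.ceil(tons)}")
```

Lights, people, and switches can be added through other_watts. Treat the
result as a starting point for a conversation with an HVAC pro, not as a
specification.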
You'd have to do this for EITHER a "big iron" supercomputer OR a beowulf
-- in both cases they make lots of heat, in both cases you MUST remove
that heat as fast as it is made and dump it outside to maintain ambient
air temperatures in the 60's (ideally). Beowulfish clusters are cheap
to build, they are relatively cheap and scalable to operate in most
environments, but there are most definitely infrastructure costs and
requirements -- adequate power and ac and networking in the physical
space, and the actual cost of power to run and cool the nodes. The
former can usually be "amortized" over many years so that it adds a few
tens of dollars per year to the cost of operating the nodes themselves.
The latter is unavoidable -- roughly $1/watt/year for heating and
This is another "killer" surprise for cluster builders -- a 100 node
cluster of 100 Watt nodes might cost $75,000 in direct hardware costs,
AND $25,000 in renovation costs for new power and AC (amortized over ten
years and 100 nodes -- maybe $30 per node per year "payback", including
the cost of the money), AND $10,000 a year for power and A/C. It's
still cheap, really, compared to big iron -- just not as cheap as you
might have thought looking at hardware costs alone.
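
The arithmetic behind those numbers can be sketched in a few lines of
Python. The function name cluster_yearly_costs is hypothetical, and the
sketch ignores interest ("the cost of the money"), so its renovation
figure comes out somewhat below the $30 per node quoted above.

```python
def cluster_yearly_costs(nodes, watts_per_node, renovation,
                         amortize_years=10, dollars_per_watt_year=1.0):
    """Rough yearly infrastructure costs in dollars (hypothetical helper).

    Renovation is spread evenly over amortize_years and all nodes, with
    no interest. Power and A/C use the roughly $1/watt/year rule of thumb.
    """
    renovation_per_node = renovation / (amortize_years * nodes)
    power_per_node = watts_per_node * dollars_per_watt_year
    return {
        "renovation_per_node": renovation_per_node,
        "power_per_node": power_per_node,
        "power_total": power_per_node * nodes,
    }


# The example from the text: 100 nodes, 100 W each, $25,000 in renovations.
costs = cluster_yearly_costs(nodes=100, watts_per_node=100, renovation=25_000)
for name, dollars in costs.items():
    print(f"{name}: ${dollars:,.0f} per year")
```

Set against $750 of hardware per node, the power and A/C bill overtakes
the purchase price after several years of operation, which is why it
belongs in the budget from the start.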
A serious, thoughtful approach to infrastructure is the best way to
keep from having problems with overheating. The best fans or Peltier
coolers in the world aren't going to do much if ambient air in the
cluster room is in the 80's or 90's, and without AC a cluster room can
get well into the 100's and beyond in a remarkably short period of time.
If you have 50 kW or so being given off in an office-sized space with
insulating walls and no AC, you'll be able to bake brownies by leaving
cups of batter out on top of your racks, at least until something melts,
shorts, starts a fire, and burns down the whole thing.
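
For a feel of how short "remarkably short" can be, here is a crude
upper-bound sketch in Python. It assumes a 60 cubic meter room (my
number, not from the post), assumes that no heat escapes, and assumes
that only the air absorbs it; the function name air_heating_rate is
likewise made up.

```python
AIR_DENSITY = 1.2           # kg/m^3, approximate value near room temperature
AIR_SPECIFIC_HEAT = 1005.0  # J/(kg*K), approximate value at constant pressure


def air_heating_rate(power_watts, room_volume_m3):
    """Rate of air temperature rise in K/s for a sealed room.

    Upper bound only: ignores heat leaking through walls and heat soaked
    up by the walls, racks, and the hardware itself.
    """
    air_mass_kg = AIR_DENSITY * room_volume_m3
    return power_watts / (air_mass_kg * AIR_SPECIFIC_HEAT)


# Assumed office-sized room of 60 m^3 with the 50 kW load from the text.
rate = air_heating_rate(power_watts=50_000, room_volume_m3=60.0)
print(f"{rate * 60:.0f} K per minute (air only, no losses)")
```

A real room warms far more slowly than this because everything solid in
it has to heat up too, but the exercise shows how little thermal
"storage" the air alone provides.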
As far as the rest of your remarks on case design are concerned, they
may be well-justified but there are a lot of cases out there and you
should look at more than one. It isn't horribly easy to design airflow
inside a 1U space filled with big block-like components, and some do a
better job of it than others. Even with a good case design, something
like using a flat ribbon cable IDE/floppy connector instead of a round
cable can defeat your purpose, in SOME units, by virtue of the ribbon
accidentally blocking part of the airflow!
I "like" 2U cases a bit better than 1U's for that reason, but there are
some people that make very lovely 1U cases that seem to be quite robust
and reliable -- as long as you keep ambient air in the 60's or at worst
low 70's at the fan intake.
> on me due to overheat, so I look at the cases and I think- why aren't people
> looking to use airflow in a more efficient manner. I know the ambient air temp
> isn't this high. I may be in left field, but it seems like the flow inside a
> case is so turbulent that the mean air velocity is not carrying the warm air
> away from the cpu as quickly as it could.
> Joseph Bassett
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu