[Beowulf] Q: Cooling units? Raised floors? General machine room stuff..

Robert G. Brown rgb at phy.duke.edu
Mon Jul 5 09:47:12 EDT 2004


On Fri, 2 Jul 2004, Suvendra Nath Dutta wrote:

> With good advice like this, you have to give recommendations! What 
> brand work bench do you use? This is quite important for me as we have 
> no service agreement. I fought hard to have a "temporary" ramp put in 
> up the steps to our cluster room (with a raised floor). But without a 
> good work bench it would be useless.

A "leftover table from the physics department labs" brand bench.
Basically a wooden table that fits neatly in an open space between
racks, with power and network, room for a monitor and keyboard, a
compartmentalized drawer unit for small screws and tools, and some
baskets on nearby shelving for frequently used or scavenged parts (disk
drives, NICs, video cards, cabling).  We also use an old shelf unit to
store e.g. motherboard boxes, cables, and larger items.

What you need is room to lay out at least one, ideally two, systems
open and plugged into power and network, with video you can switch
between them.  If you are really lucky and can keep a cheap PC there
to act as an interface to the nodes, the cluster, and your colleagues
(and to play your music on while wearing your thick, warm,
sound-blocking headphones), a small/cheap KVM to switch between the 2-3
systems can help.

A nice work chair that can raise or lower is also good.  It needs to
come up to a height that makes working at the bench easy.  For tools I
recommend an electric screwdriver with LOTS of bits (including e.g.
Allen and hex-bolt bits), sundry pliers and wire cutters, magnetized
ratcheting screwdriver(s) and bits, a hammer (don't ask:-), a decent
multimeter (and/or an oscilloscope if you have one and have a clue how
to use it), a Kill A Watt or two, and a flashlight.  I haven't put one
down there yet, but it occurs to me that a good electrical fire
extinguisher wouldn't be a bad idea...;-)

The idea is that the BIGGEST expense of dealing with downed nodes is
human time, not the cost of replacement parts.  You want to make getting
nodes out of the rack, testing them and determining the failed part(s),
replacing and retesting said nodes/parts, and reracking them as smooth
and rapid a process as possible.  If a tool helps (even a fairly
expensive tool), it will almost certainly pay for itself over time if
it saves you even minutes per node on average.

Compare the cost of downtime.  How much does it cost the cluster's users
to have a node down?  Not a whole lot per hour, perhaps, but if a node
stays down for days it mounts up.  Getting it back within hours (less
than a day, anyway) keeps this "cost" in cluster productivity small.

The same argument, BTW, justifies getting service contracts on all the
nodes or buying from tier 1 or 2 vendors.  It typically costs around 10%
of the purchase price, but it can save you a lot of YOUR TIME and
minimize node downtime.  Right now I am burning hours and hours of my
time repairing nodes that don't have a contract and weren't built all
that well, as it turns out.  We don't buy from that vendor any more, but
we still have to use the nodes for a "standard lifetime" of 3 years...

   rgb

> 
> Suvendra.
> 
> On Jul 2, 2004, at 10:03 AM, Robert G. Brown wrote:
> 
> > On Fri, 2 Jul 2004, Jim Lux wrote:
> >
> >>>   Finally, what other suggestions do people have for equipment 
> >>> needed?
> >>> On my list we've got the UPS, the AC, thermal sensors and 
> >>> killswitches,
> >>> APC Masterswitch units, power lines, network lines, web/video camera 
> >>> and
> >>> an alarm.  Anything else people find useful?
> >>
> >> Coat rack to hang some jackets on.
> >
> > Work bench.  A nice one, with rechargable screwdriver and bit set,
> > various hand tools, good lighting, a network drop or three, a small KVM
> > and flatpanel video/keyboard, screw/part organized storage, a rack to
> > hang spare cables on -- you get the idea.  Even if you get top shelf
> > service agreements on everything, you WILL be working down there
> > prepping nodes to go in or out, replacing failed drives on out of
> > warranty systems, and so forth.
> >
> > Comfortable swivel/rolling work chairs to match the bench, and if you
> > are really into comfort, put a cheap workstation on your bench KVM
> > (useful in and of itself) and add a nice set of headphones on a long
> > cable to keep your ears warm, exclude room noise (likely considerable),
> > and let you listen to that 20 GB ogg collection we know that you've got
> > squirrelled away somewhere while working...;-)
> >
> > A phone.  One with a really loud ringer or even a blinking light 
> > ringer,
> > if your AC is as loud as ours.
> >
> > He says as he prepares to descend into the bowels of the physics
> > department to work in OUR cluster/server room... sigh.  Comfort is key.
> >
> >> When thinking about UPSes, etc.  consider partitioning your system so 
> >> that
> >> not everything dies together.  Your monitoring and head node 
> >> computers might
> >> want a longer duration than the compute nodes.  (that web cam's not 
> >> going to
> >> do much for you if the power is shut off...)
> >>
> >> A temperature/humidity recorder is nice to have.  It could be as 
> >> simple as
> >> an off the shelf weatherstation widget and a logging program.  It 
> >> lets you
> >> better manage your HVAC.
> >
> > Regarding AC units, there are some very lovely 10 ton and 15 ton units.
> > Since a major component of your cost will be space renovation, and 
> > since
> > this will become MORE costly later as it will require downtime and dust
> > on top of the mere dollars, you'll really want to TRY to engineer the
> > space now for its eventual future peak capacity.  You'd also much 
> > rather
> > have too much AC than too little.
> >
> > You might want two 5 ton units instead of one 10, though -- that way 
> > you
> > have a bit of redundancy should one fail while you are still at current
> > levels.  I agree with Jim, though -- talk to your HVAC contractor, they
> > should be able to give you good advice here.
> >
> >    rgb
> >
> > -- 
> > Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> > Duke University Dept. of Physics, Box 90305
> > Durham, N.C. 27708-0305
> > Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> >
> >
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit 
> > http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


