[Beowulf] Re: HVAC and room cooling...

Jim Lux James.P.Lux at jpl.nasa.gov
Mon Feb 2 19:56:33 EST 2004

At 04:27 PM 2/2/2004 -0500, Eckhoff.Peter at epamail.epa.gov wrote:

>Problem 2:  What do you do when the AC stops?  Maintenance and the 
>occasional AC system oops can be devastating to a cluster in a small room.
>Solution 2a:  We are tied directly into a security system.  When a
>sensor in the room reaches a temperature level, "Security" responds
>dependent upon the level detected.
>Solution 2b:  We installed a backup automated telephone dialer.  Not
>that we don't trust "Security", but we wanted a backup to let us know what was
>going on.
>    When the temperature reaches a certain level, the phone dials us with
>    an automated message:
>    " This is the Sensaphone 1108.  The time is 1:36 AM and ...
>    [ ed.  your CPUs are about to fry... Have a nice night!!!"  ;-)  ]

You need to seriously consider a "failsafe" totally automated shutdown (as 
in chop the power when the temperature in the room gets to, say, 40C). 
Security might be busy: maybe there was a big problem with the chiller 
plant catching fire or the boiler exploding. If they're directing fire 
engine traffic, the last thing they're going to be thinking about is going 
over to your machine room and shutting down your hardware.

The autodialer is nice, but, what if you're out of town when the balloon 
goes up?

A simple temperature sensor with a contact closure wired into the "shunt 
trip" on your power distribution will work quite nicely as a "kill it 
before it melts" switch. Sure, the file system will be corrupted, and so 
forth, but at least you'll have functioning hardware to rebuild it on.

Automated monitoring and TCP sockets are nice for management in the 
day-to-day situation, ideal for answering questions like "Should we get 
another fan?" or "Maybe Rack #3 needs to be moved closer to the vent." But 
what if there's a DDoS attack on someone near you, and netops decides to 
shut down the router? What if all those Windows desktops run amok, sending 
mass emails to each other or trying to remotely manage each other's IIS, 
bringing the network to a grinding halt?
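
For the day-to-day side only, here is a minimal Python sketch of the kind of 
trend question software monitoring is good at. Everything in it is an 
assumption made for illustration: the function names, the rack labels, and 
the 30C and 35C thresholds are hypothetical. Only the 40C figure comes from 
the message above, and at that point the separate hardware trip, not this 
code, is expected to have acted already.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Assumed thresholds in degrees C. WARN_C and ALARM_C are illustrative
# placeholders; HARD_TRIP_C mirrors the "say, 40C" figure in the message.
WARN_C = 30.0
ALARM_C = 35.0
HARD_TRIP_C = 40.0  # enforced by the independent hardware trip, not here


def classify(temp_c: float) -> str:
    """Map one room or rack temperature to a monitoring level."""
    if temp_c >= HARD_TRIP_C:
        # Informational only: software must not be the last line of defense.
        return "trip"
    if temp_c >= ALARM_C:
        return "alarm"
    if temp_c >= WARN_C:
        return "warn"
    return "ok"


def hottest_rack(readings: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the rack with the highest mean temperature, and that mean."""
    averages = {rack: mean(temps) for rack, temps in readings.items()}
    rack = max(averages, key=averages.get)
    return rack, averages[rack]


if __name__ == "__main__":
    # Hypothetical readings, one list of samples per rack.
    sample = {
        "rack1": [24.0, 26.0],
        "rack2": [27.5, 28.5],
        "rack3": [31.0, 33.0],
    }
    rack, avg = hottest_rack(sample)
    print(rack, avg, classify(avg))
```

A report like this can suggest which rack wants another fan or a spot nearer 
the vent, but it depends on the hosts and the network being up, which is 
exactly what cannot be assumed in an emergency.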

The upshot is: Do not trust computers to save your computers in the 
ultimate extreme.  Have a totally separate, bulletproof system.  It's 
cheap, it's reliable, all that stuff.

James Lux, P.E.
Spacecraft Telecommunications Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875

