CPUs for a Beowulf
Robert G. Brown
rgb at phy.duke.edu
Wed Sep 10 14:21:38 EDT 2003
On Wed, 10 Sep 2003, Greg Lindahl wrote:
> > > My first non-small HPC cluster was 64 nodes, and it took 28 hours of
> > > labor to set up, of which only 8 hours was my time
> > And the first small one?
> > (I basically wanted to scare people who are just building the very
> > first cluster
> You shouldn't use me as an example, then: that was my first HPC
> cluster, but I had set up several groups of systems before which had
> related configs, and worked on clusters up to ~ 130 systems in size
> while working in industry.
> It all depends on what your experience is. If you haven't set up
> any kind of cluster before, you're asking for trouble.
Ya, which returns nicely to the original point. There are two or three
elements of cluster engineering that can bite you if you've never done
large scale cluster (or LAN) design or management before. Hardware,
installation and maintenance, and infrastructure might be easy (but
probably incomplete) categories.
COTS is good, COTS is the point, but reliability becomes increasingly
important as the number of systems scales up and somewhere in there it
is no longer wise to get CHEAP COTS systems, as in lowest-bid vanilla
boxes with no provisions for service. Hardware maintenance can eat you
alive if you end up with a large batch of flaky hardware, even if you do
have a service deal of some sort, as every failure still costs you all
sorts of time.
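A back-of-the-envelope sketch of why this bites as the node count grows. The per-node annual failure rates below (2% for decent hardware, 10% for a flaky batch) are illustrative assumptions, not measurements, and the model treats node failures as independent, which a bad batch of hardware may well violate.

```python
def expected_failures(nodes: int, annual_failure_rate: float) -> float:
    """Expected number of node failures per year (simple linear model)."""
    return nodes * annual_failure_rate


def prob_any_failure(nodes: int, annual_failure_rate: float) -> float:
    """Probability that at least one node fails in a year,
    assuming independent failures (an assumption, see above)."""
    return 1.0 - (1.0 - annual_failure_rate) ** nodes


if __name__ == "__main__":
    # Hypothetical failure rates: 2% ("good" COTS) and 10% ("cheap" COTS).
    for afr in (0.02, 0.10):
        for n in (8, 64, 512):
            print(
                f"AFR {afr:.0%}, {n:4d} nodes: "
                f"{expected_failures(n, afr):6.1f} failures/yr expected, "
                f"P(at least one) = {prob_any_failure(n, afr):.2f}"
            )
```

Multiply the expected failure count by the hours each failure costs you (diagnosis, pulling the box, shipping, reinstalling) to get a rough yearly maintenance burden.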
Clusters eat electricity and excrete it as heat. Cluster nodes need
safe homes and like to talk. An eight-node toy cluster can often be set
up "anywhere" and just plugged in to a handy wall receptacle (although
even 8 nodes can make a room pretty hot without enough A/C). A larger
cluster often requires planning and remodeling, even engineering, of the
space the nodes are to live in. This can EASILY cost more than the
hardware you put into that space!
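A rough sketch of the power and cooling arithmetic behind this. The 150 W per-node draw, the 120 V / 15 A circuit, and the 80% continuous-load derating are assumptions for illustration only; measure your own nodes and check your local electrical code. The conversion factors (about 3.412 BTU/hr per watt, 12,000 BTU/hr per ton of cooling) are standard.

```python
import math

BTU_PER_HOUR_PER_WATT = 3.412      # 1 W of load is about 3.412 BTU/hr of heat
BTU_PER_HOUR_PER_TON = 12000.0     # 1 "ton" of air conditioning


def heat_load(nodes: int, watts_per_node: float):
    """Return (total watts, BTU/hr, tons of cooling) for a cluster."""
    watts = nodes * watts_per_node
    btu_per_hour = watts * BTU_PER_HOUR_PER_WATT
    tons = btu_per_hour / BTU_PER_HOUR_PER_TON
    return watts, btu_per_hour, tons


def circuits_needed(nodes: int, watts_per_node: float,
                    volts: float = 120.0, amps: float = 15.0,
                    derate: float = 0.8) -> int:
    """Number of branch circuits needed, assuming each circuit is
    loaded to at most `derate` of its rating (assumed 80%)."""
    usable_watts = volts * amps * derate
    nodes_per_circuit = int(usable_watts // watts_per_node)
    if nodes_per_circuit < 1:
        raise ValueError("a single node exceeds the usable circuit capacity")
    return math.ceil(nodes / nodes_per_circuit)


if __name__ == "__main__":
    for n in (8, 64):
        watts, btu, tons = heat_load(n, 150.0)  # 150 W/node is assumed
        print(f"{n:3d} nodes: {watts:.0f} W, {btu:.0f} BTU/hr, "
              f"{tons:.2f} tons of A/C, {circuits_needed(n, 150.0)} circuit(s)")
```

Under these assumptions an eight-node cluster fits on one wall circuit, while 64 nodes need several dedicated circuits and a few tons of cooling, which is where the remodeling costs come from.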
It is easy to use bad methodology to install and maintain a cluster.
Not as easy as it once was -- it is really pretty easy to use GOOD
methodology these days, as the (awesome) tools are there and are even
tolerably well-documented -- but if your experience with Linux is
installing it off of CDs onto a few systems one at a time, you've got
some learning to do. If you're really a Unix novice and don't have a
decent grasp of TCP/IP, ethernet (at least), scalable administrative
tools and services such as but not limited to NIS, NFS, and a whole lot
more, you've got a LOT of learning to do.
You, Greg, are very much in the expert category -- no, I'd have to go
beyond that to "trans-professional superguru" category -- with a vast
experience in all of the above and then some. Besides that, you're
pretty smart;-) You should be wearing a special medal or insignia -- the
Royal Order of the 'Wulf or the like:-)
Others (including me if it came to it) sometimes are very good at
figuring out what went wrong -- after the fact -- so it helps for
them/us to proceed cautiously. Even something like a cheap, non-PXE
fast ethernet NIC vs a more expensive, PXE-supporting NIC may not seem
important (and hey, going COTS cheap saves you MONEY, right?), until you
find yourself carrying a floppy around to 64 systems to install them one
at a time, or have a two-post rack collapse because it isn't adequately
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu