CPUs for a Beowulf

Robert G. Brown rgb at phy.duke.edu
Mon Sep 8 12:44:42 EDT 2003


On Mon, 8 Sep 2003, Robert Kane wrote:

> Good morning,
> 
>   If anyone doesn't mind, may I ask a few questions. When given a
> specific application for which a cluster is being built it should be
> relatively simple to look at the requirements of the problem and the
> available hardware, and then determine which hardware solution is best
> for the problem. However, if the cluster is being built as a general
> purpose cluster for research, things become a bit more difficult, as (as
> far as I can tell) there is no one answer. But, if anyone has any
> insight into the following problems it would be greatly appreciated.
> 
> 1. Single versus Dual CPUs?
> 
>   Both of these choices have their pros and cons and are each best
> suited for different types of problems. Given that the cluster will be
> used for a variety of problems, is there one which would be a better
> choice? Is there a particular configuration for which the majority of
> problems will run better? Is there a solution that on average provides
> more performance per dollar?

You're going to get a lot of "it depends" answers because it does.
However:

  a) Historically dual packaging is marginally cheaper, per CPU, than
single packaging.  The difference is pretty much the
duplicated parts -- two cases instead of one, two power supplies, two
hard disks.  So if your programs are expected to be purely CPU bound,
you'll get a bit more CPU for your dollar in dual packaging.

  b) OTOH if your task(s) are likely to be memory bound, dual packagings
typically oversubscribe the memory bus.  That is, if both CPUs are trying
to read/write to memory as fast as they can, one will often be blocked
waiting for the other.

  c) This latter problem projects down to other resources as well.  For
example two CPUs might end up sharing a single NIC, or two NICs might
end up sharing a bus.  Anywhere you have a memory/speed hierarchy that
is shared by the CPUs on a dual but that has its own private resource on
a single, you can end up with one CPU/task blocking while waiting for
the other to free the resource.

In many cases the problems described in b and c are minimal or
relatively rare.  In others they can significantly degrade performance.
I think that's the most that can be said without a deeper knowledge of
the problem space and the rest of your architecture, e.g. the network,
the memory we're talking about, the CPUs, the overall architecture.
With both 32 and 64 bit architectures and with a variety of FSB and
memory and CPU speeds these days, something that might be a problem with
one architecture might be perfectly ok with a different one, so just
"single" and "dual" aren't enough data to answer your question, if they
ever were.
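
To put a rough shape on points b and c, here is a toy model in Python.
It assumes the shared memory bus is split evenly among the CPUs on a
node and that a CPU slows down in proportion to the bandwidth it is
denied; real memory systems are messier than that.  The function name
and every bandwidth figure below are made up for illustration and are
not measurements of any hardware.

```python
def per_cpu_efficiency(bus_bandwidth, demand_per_cpu, cpus_per_node):
    """Fraction of full speed each CPU reaches when every CPU on the
    node streams memory at once (toy model: the bus is shared evenly)."""
    if demand_per_cpu <= 0:
        return 1.0  # purely CPU bound: the bus is never the bottleneck
    total_demand = demand_per_cpu * cpus_per_node
    return min(1.0, bus_bandwidth / total_demand)


# Hypothetical numbers, for illustration only (not measurements).
bus = 2.0  # GB/s the shared memory bus can deliver
for demand in (0.5, 1.0, 2.0):  # GB/s each CPU wants when running flat out
    single = per_cpu_efficiency(bus, demand, 1)
    dual = per_cpu_efficiency(bus, demand, 2)
    print(f"demand {demand} GB/s per CPU: single {single:.2f}, dual {dual:.2f}")
```

In this model the dual only loses ground once the combined demand of
both CPUs exceeds what the bus can supply, which is why lightly
memory-bound codes may never notice the difference.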

You'll probably have to do some sort of cost benefit analysis with SOME
sort of idea of the boundaries of your problem space to proceed.  For
example, you might select duals knowing they won't scale perfectly for
certain problems, because they'll outperform (dollarwise) on the others.
Or you might go with singles to get the best scaling on the former, at
the expense of the latter.  These days the marginal cost difference
isn't so great that either path is likely to be a huge win or loss, and
it may be other considerations such as CPU density in a rackmount that
become more important than performance per se.
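
A back-of-the-envelope version of that cost benefit analysis might look
like the sketch below.  It assumes you can guess what fraction of the
total work is memory bound and how much a dual slows down on that
fraction.  The prices, the 0.6 efficiency figure, and the function name
are all hypothetical placeholders; plug in your own quotes and
measurements.

```python
def throughput_per_dollar(cost_per_cpu, mem_bound_fraction, mem_bound_efficiency):
    """Work done per unit time per dollar for one CPU.

    mem_bound_fraction   -- share of the total work that is memory bound
    mem_bound_efficiency -- speed on that share relative to an
                            uncontended CPU (1.0 means no slowdown)
    """
    cpu_bound_fraction = 1.0 - mem_bound_fraction
    # Time to finish one unit of work: the memory-bound part is stretched.
    time_per_unit = cpu_bound_fraction + mem_bound_fraction / mem_bound_efficiency
    return (1.0 / time_per_unit) / cost_per_cpu


# Hypothetical prices: a $1000 single node vs. an $1800 dual node.
single_cost_per_cpu = 1000.0
dual_cost_per_cpu = 1800.0 / 2

for fraction in (0.0, 0.1, 0.25, 0.5):
    single = throughput_per_dollar(single_cost_per_cpu, fraction, 1.0)
    dual = throughput_per_dollar(dual_cost_per_cpu, fraction, 0.6)
    better = "dual" if dual > single else "single"
    print(f"memory-bound share {fraction:.2f}: {better} wins")
```

With these invented numbers the duals win only while less than about
one sixth of the work is memory bound.  Different prices or a different
slowdown move that crossover, which is the whole point of doing the
arithmetic for your own mix.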

> 2. CPU Type
> 
>   Intel and AMD's new 64-bit processors are finally beginning to become
> more common it appears. And from what I've seen the benchmarks are
> rather impressive. However, there seems to be a significant price
> increase going from previous generation chips (ie Xeon) to the new
> 64-bit chips. In general is the increased performance worth the money
> invested, or would a larger number of slower chips be effective
> cost/performance wise? Apart from the increased electrical, A/C costs
> of course.

This is plain old impossible to answer without a knowledge of the
problem space.  Never one to let that stop me, let me pronounce that at
the moment AMD's 64 bit chips are probably worth considering, and
Intel's are not.  Yet.  And I could be behind the times on the yet, as
well.

Note that I say considering.  I honestly think that the only way to
answer a question like this is to get loaners of two or three candidate
systems and run benchmarks.  There are also additional costs outside of
the raw hardware to consider, in particular a certain degree of weakness
in mainline linux distribution support for the 64 bit systems.  You're a
bit closer to the beta level there and might well end up "paying" a bit
extra in screwing-around-with-crap costs administratively to get the
full benefit of the CPUs for a while yet.
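
If you do get loaners, a minimal timing harness along these lines can
be run unchanged on each candidate.  This is only a sketch: the
function name is mine, and the workload shown is a trivial stand-in
that you would replace with your real application and realistic inputs,
since only your own code tells you anything about your own problem
space.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, repeats=5):
    """Run the command `repeats` times; return wall-clock seconds per run."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the benchmark itself fails, so a broken
        # run is never mistaken for a fast one.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # Stand-in workload; replace with your real application and inputs.
    workload = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    samples = time_command(workload, repeats=3)
    print(f"min {min(samples):.3f}s  median {statistics.median(samples):.3f}s")
```

Run it on an otherwise idle machine, and on a dual run one copy per CPU
at the same time as well as a single copy alone, so that any contention
for memory or other shared resources shows up in the numbers.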

Again, though, the cost differentials are getting so low for AMD that
their 64 bit systems don't look TOO horrible run as 32 bit systems, and
the OS situation can only improve.  There is also little doubt that 64
bit support will in fact come to pass, and that matters -- I know some
folks that got eaten alive by Alphas when they bought them for their
speed and had to deal with it when their OS support more or less
evaporated, leaving them struggling and burning admin FTE on simple
issues like scalable installation and mandatory security upgrades of
important packages.

Administrative costs can easily be THE dominant cost in a public cluster
of the sort you describe, with benefits that are difficult to quantify.
This tends to bias one towards conservative solutions more or less
guaranteed to minimize human management time, which in turn biases one
towards older "guaranteed to work" hardware, straight off the shelf
commercial linux distros (or a supported cluster package like Scyld at
moderate expense in dollars but presumed savings in human time),
hardware from a vendor that does on site service as part of the up-front
cost, and things like PXE-capable NICs, kickstart, yum.  64 bit systems
would, I think, require a bit more human effort and skill in the
administrative chain to make work.

    rgb

> 
> 
> Thank you for any information concerning these issues, whether
> information be answers or links to good resources,
> 
> Robert Kane
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





