[OT] Maximum performance on single processor ?

Robert G. Brown rgb at phy.duke.edu
Fri Jun 20 07:06:25 EDT 2003

On Fri, 20 Jun 2003, Joel Jaeggli wrote:

> > Our budget is roughly $20,000, and a Linux/Beowulf platform
> > would be ideal.
> actually, by definition, if your code can't be parallelized then a cluster
> of off-the-shelf hardware probably isn't appropriate.
> > The current top Intel x86 or AMD processors are too slow, and we
> > do not have access to other architectures.
> Then you're screwed.

Let me be a TEENY bit more optimistic than Joel.

The one glimmer of hope I can hold out is that if your code is
single-threaded but highly vectorized, you might look at some exotic
solutions (and some less exotic ones) involving a vector co-processor,
either integrated with the main CPU or as a programmable add-on.
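To make "vectorized" concrete: a vector unit pays off when the iterations of a loop are independent of each other, and does nothing for a loop in which each step needs the previous result. A minimal sketch in C, assuming illustrative helper names (`saxpy_loop` and `linear_recurrence` are hypothetical, not from any particular library):

```c
#include <stddef.h>

/* Vectorizable: each iteration is independent of the others, so a SIMD
   unit can process several elements per instruction.
   (Hypothetical helper, not the BLAS saxpy routine.) */
void saxpy_loop(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Not vectorizable as written: x[i] depends on x[i - 1] (a loop-carried
   dependency), so the iterations must run strictly one after another. */
void linear_recurrence(size_t n, float a, float b, float *x)
{
    for (size_t i = 1; i < n; i++)
        x[i] = a * x[i - 1] + b;
}
```

For example, with `x = {1, 2, 3, 4}`, `y = {10, 20, 30, 40}` and `a = 2`, `saxpy_loop` leaves `y = {12, 24, 36, 48}`; starting from `{1, 0, 0, 0}` with `a = 2` and `b = 1`, `linear_recurrence` produces `{1, 3, 7, 15}`. A vectorizing compiler can often handle the first loop on its own; the second cannot be vectorized as written, whatever hardware you buy.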

Some of the vector processors have extremely high floating point rates
for the limited set of instructions and data manipulations they support.
A few slightly off-the-beaten-path solutions you might look into:

   a) PowerPC.  Apple is actually pushing them for clusters these days
(I just attended a talk where they were touted), and the high-end ones
have both an L3 cache layer and an embedded AltiVec vector unit.
Although the CPU itself runs at around 1 GHz and is hence "slow" in
terms of clock cycles, the vector unit has an extremely high peak FLOPS
rate for the right kind of code.  I know nothing of how difficult it is
to program, but Apple supports it with compilers and so forth.

   b) http://arrakis.ncsa.uiuc.edu/ps2/cluster.php.  OK, yes, it's sort
of a toy, sort of a joke, but there is at least one Sony PlayStation 2
cluster out there.  Why, you ask?  It runs Linux (the "Emotion Engine"
core is a MIPS CPU) and it has two attached vector DSPs (the vector
units, or VUs).  Programming it so that it uses the VUs is a PITA, and
it may not be what you need, but you want exotica, and this is it.

   c) NEC has a strange one it does:


I know next to nothing about it, but maybe you can cut a deal with them
to try it out.  It may or may not have Linux support yet (but they might
give you a couple if you were to offer to do the port :-).  Again, these
are 8 GFLOPS vector processors.

   d) Vector units are available in a lot of contexts for Linux systems,
few of them (alas) straightforward: DSPs, high-end video cards.  In
most cases these units have staggering peak FLOPS (more than 10 GFLOPS
in some cases, IIRC), but in all cases programming and optimizing them
to achieve a reasonable fraction of peak in real-world throughput will
be, um, a "challenge" unless your problem is shaped just right.

But it might be, so give these a look.  Otherwise, as Joel says, you're
pretty much screwed, especially with a budget of only $20K.  This
assumes that you've already looked at Intel Itanium and AMD Opteron
solutions, and already know about the SSE2 instructions available on
those CPUs.  In many cases throughput is ultimately limited by memory
bandwidth, and these units are designed with fast, wide memory.  Make
sure that you understand your application very well indeed and that it
really isn't parallelizable.  If it is vectorizable, you may have hope.
If it is neither vectorizable NOR parallelizable, then the best you will
do anywhere is the fastest CPU du jour... however inadequate it may be.
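As a rough sketch of what using SSE2 by hand looks like, here is a double-precision "a times x plus y" loop written with the SSE2 intrinsics from `<emmintrin.h>`, falling back to plain C when the compiler is not targeting SSE2 (the `__SSE2__` macro is how GCC signals that). The function name `daxpy_sse2` is hypothetical; this is an illustration, not a tuned BLAS routine.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics (x86 only) */
#endif

/* y[i] += a * x[i] for 0 <= i < n.  Hypothetical helper, not the BLAS
   daxpy routine.  With SSE2 each instruction handles two doubles; on
   other targets only the scalar loop is compiled. */
void daxpy_sse2(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;
#ifdef __SSE2__
    __m128d va = _mm_set1_pd(a);             /* a broadcast into both lanes */
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);    /* unaligned loads: no alignment requirement */
        __m128d vy = _mm_loadu_pd(y + i);
        vy = _mm_add_pd(vy, _mm_mul_pd(va, vx));
        _mm_storeu_pd(y + i, vy);
    }
#endif
    for (; i < n; i++)                       /* leftover element, or every element without SSE2 */
        y[i] += a * x[i];
}
```

This loop also shows the memory bandwidth limit at work: it does 2 flops (one multiply, one add) per element while moving 24 bytes (load `x[i]`, load `y[i]`, store `y[i]`), so on arrays too large for cache it can run no faster than about B/12 GFLOPS, where B is the sustained memory bandwidth in GB/s, no matter how wide the vector unit is.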


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

Beowulf mailing list, Beowulf at beowulf.org