Enterprise Beowulf and Beowulf on a Chip

Mark Hahn hahn at physics.mcmaster.ca
Wed Jun 18 23:03:09 EDT 2003

> In my own search for finding out whether Beowulf makes sense for
> enterprise computing (specifically on-line transaction processing
> (OLTP)),

OLTP has significant throughput-type parallelism, so it can be done
by clusters quite nicely.  you might want a low-latency interconnect
for lock-manager-type synchronization, but you wouldn't necessarily
need much bandwidth.

> This work emphasizes "large memory stall times" as being
> a primary culprit limiting OLTP performance.  The solution

right: OLTP is computationally trivial, so the time goes to waiting on
memory.  you can either use CMP-type parallelism (where stalls still
waste the CPU for the duration, but CPUs are cheap) or SMT (which they
reject for a practical reason, namely that CMP is a lot easier to throw
together).
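
a rough sketch of the "CPUs are cheap" argument, assuming independent
transactions and a fixed fraction of each core's time lost to memory
stalls.  the function name and the 50% stall fraction are made up for
illustration; they are not taken from the paper.

```python
def useful_throughput(cores: int, stall_fraction: float,
                      per_core_rate: float = 1.0) -> float:
    """Aggregate useful work per unit time for independent cores.

    Assumption (not from the paper): every core is stalled for the same
    fixed fraction of its time, and cores do not slow each other down.
    """
    if cores < 0 or not 0.0 <= stall_fraction <= 1.0:
        raise ValueError("cores must be >= 0 and stall_fraction in [0, 1]")
    # A stalled core does nothing useful, so only (1 - stall_fraction)
    # of its peak rate counts.
    return cores * per_core_rate * (1.0 - stall_fraction)


if __name__ == "__main__":
    # Hypothetical 50% stall fraction, chosen only to show the scaling.
    for cores in (1, 2, 4, 8):
        print(cores, useful_throughput(cores, 0.5))
```

in this toy model, eight cores that each stall half the time do the
same useful work as four cores that never stall, which is the CMP bet:
accept the stalls and add cores.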

> multiprocessing (CMP).  This makes me wonder if OLTP
> can benefit from simply lots of processors (with fast interconnect)
> to utilize more L1, L2 cache simultaneously (a might bit larger
> than a chip I might add!!)...

it's not clear how much good cache does for OLTP - 
if you look at figure 6, you see that the miss rate decreases 
between P4 and P8 - that is, when you have eight 1M caches, you
start to approach the working set.  

in figure 8, you see that misses are still something like 33% of 
execution time, though, so even 12M of cache is not terribly effective.
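
to put that 33% in perspective: if misses account for about a third of
execution time, then removing every one of them would bound the speedup
at roughly 1.5x (Amdahl's law).  a minimal sketch, assuming the stall
share is the only thing that changes; `max_speedup` is a hypothetical
helper, not something from the paper.

```python
def max_speedup(stall_share: float) -> float:
    """Upper bound on speedup if the stalled share of time went to zero.

    Amdahl's law with the non-stall part left unchanged:
    speedup = 1 / (1 - stall_share).
    """
    if not 0.0 <= stall_share < 1.0:
        raise ValueError("stall_share must be in [0, 1)")
    return 1.0 / (1.0 - stall_share)


if __name__ == "__main__":
    # 0.33 is the approximate miss share of execution time quoted above.
    print(round(max_speedup(0.33), 2))
```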

>  If this approach makes sense for OLTP doesn't a Beowulf
> make sense for OLTP work now?

clusters are great for OLTP, but Beowulf is usually considered a fairly
specific kind of cluster, one tuned in ways that are not much of an
advantage for OLTP.

> If Beowulf makes sense on the macro level does it make
> sense in the micro-level or perhaps in the fractal sense of
> a self similar architecture (exploiting even more hierarchy)?

I do not believe there is any problem obtaining whatever OLTP performance
you need.  do you need 700K tpmC?  I really don't think so.  making OLTP 
cheap (even at modest performance) is an entirely different topic.

