[Beowulf] Re: [Rocks-Discuss]Intel compiler specifically tuned for SPEC2k (and other benchmarks?)

Wed Feb 11 11:44:10 EST 2004

On Tue, 10 Feb 2004, Dr. Yong Duan wrote:

> 
> On Tue, 10 Feb 2004, Andrew Wang wrote:
> 
> > Again, no matter how pretty the benchmark results
> > look, in the end we still need to run on the real
> > system. So, what's the point of having benchmarks?
> > 
> > Andrew.
> > 
> 
> A guideline, I guess. A lot of CPUs (including some rather expensive
> ones, often called HPC CPUs) perform at less than half the speed of
> consumer-grade CPUs. You'd definitely avoid those, for instance.
> Also, you can look at the performance in each area and figure out the
> relative performance to expect for your own code. In the end, the most
> reliable benchmark is always one run on your own code, of course.

A short article this morning, as I'm debugging code and somewhat busy.

Before discussing benchmarks in general, one needs to make certain
distinctions.  There are really two kinds of benchmarks.  Maybe even
three.  Hell, more, but I'm talking broad categories.  Let's try these
three:

  * microbenchmarks
  * comparative benchmarks
  * application benchmarks

Microbenchmarks measure very specific, highly targeted areas of system
functionality.  By their very nature they are "simple", not complex --
often the pseudocode is as simple as

 start_timer();
 loop lotsatimes{
   do_something_simple = dumb*operation;
 }
 stop_timer();
 compute_speed();
 print_result();

(To compute "how fast a multiply occurs").  Simple can also describe
atomicity -- benchmarking "a single operation" where the operation might
be complex but is a standard unitary building block of complex code.

Microbenchmarks are not only undeniably useful, they are essential to
anyone who takes systems/cluster/programming engineering seriously.
Examples of microbenchmark suites that are in more or less common use
are:

  lmbench (very full-featured suite; one infamous user: Linus Torvalds:-)
  stream  (very commonly cited on the list)
  cpu_rate (not so common -- wraps e.g. stream and other tests so
            variations with vector size can be explored)
  rand_rate (almost unknown, but it DOES benchmark all the gsl rands:-)
  netpipes (measure network speeds)
  netperf  (ditto, but alas no longer maintained)

I (and many others) USE these tools (I wrote two of them SO I could use
them) to study systems that we are thinking of buying and using for a
cluster, to study the kernel and see if the latest change made some
critical operation faster or slower, to figure out if the NIC/switch
combo we are using is why PVM code is moving like molasses.  They are
LESS commonly poached by vendors, fortunately - Larry McVoy has lmbench
bristling with anti-vendor-cooking requirements at the license level.
The benchmarks are simple, but because one needs a lot of them to get an
accurate picture of overall performance they tend to be too complex for
typical mindless consumers...

Comparative benchmarks are what I think you're really referring to.
They aren't completely useless, but they do often become pissing
contests (such as the top 500 list) and there are famous stories of Evil
by corporations seeking to cook up good results on one or another
(sometimes at the expense of overall system balance and performance!).

Most of the Evil in these benchmarks arises because people end up using
them as a naive basis for purchase decisions.  "Ooo, that system has a
linpork of 4 Gigacowflops so it must be better than that one which only
gets 2.7 Gcf, so I'll buy 250 of them for my next cluster and be able to
brag about my 1 Teracowflop supercomputer and make the top third of the
top 500 list, which will impress my granting agencies and tenure board,
who are just as ignorant as I am about meaningful measures of systems
performance..."  Never mind that your application is totally
non-linpack-like, that the bus performance on the systems you got sucks,
and that the 2.7 Gcf systems you rejected cost half as much as the 4 Gcf
systems you got, so you could have had 500 at 2.7 Gcf for a net of 1.35
Tcf and balanced memory and bus performance (and run your application
faster per dollar) if you'd bothered to do a cost-benefit analysis (CBA).

The bleed of dollars attracts the vendor sharks, who can often rattle
off the aggregate SPECmarks and so forth for their most expensive boxes.
However, comparative benchmarks CAN actually be useful: if one of the
tests in the SPEC suite happens to correspond to your application, if
you bother to read all the component results in the SPECmarks, if you
bother to check the compiler, flags, and system architecture used in
some detail to see whether the results appear cooked (hand tuned or
optimized, or based on a compiler that is lovely but very expensive and
has to be factored into your CBA).

Finally, there are application benchmarks.  These tend to be "atomic"
but at a very high level (an application is generally very complex).
These are also subject to the Evil of comparative benchmarks (in fact
some comparative benchmark suites, especially in the WinX world, are
a collection of application benchmarks).  They also have some evil of
their own when the application in question is commercial and not open
source -- you have effectively no control over how it was built and
tuned for your architecture, for example, and may not even have
meaningful version information.

However, they are also undeniably useful, especially when the
application being benchmarked is YOUR application and under your
complete control.

So the answer to your question appears to be:

  * Microbenchmarks berry berry good.  Useful.  Essential.  Fundamental.
  * Comparative benchmarks sometimes good.  Sometimes a force for Evil.
  * Application benchmarks ideal if the application is yours (or very
    similar) and under your control.

Pissing contests in general are not useful, and even a useful higher
level benchmark divorced from an associated CBA is like shopping in a
store that has no price tags -- a thing of use only to those so rich
that they don't have to ask.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf