[Beowulf] Re: [Rocks-Discuss] Intel compiler specifically tuned for SPEC2k (and other benchmarks?)

Robert G. Brown rgb at phy.duke.edu
Wed Feb 11 17:13:32 EST 2004


On Wed, 11 Feb 2004, Lombard, David N wrote:

> > They also have some evil of
> > their own when the application in question is commercial and not open
> > source -- you have effectively no control over how it was built and
> > tuned for your architecture, for example, and may not even have
> > meaningful version information.
> 
> Let's be fair here. An ISV application is not the definition of evil.

I did not mean to imply that they were wholly evil or even evil in
intent.

> Clearly, "you have effectively no control over how an application was
> built and tuned for your architecture" has no direct correspondence to
> performance.

I would have to respectfully and vehemently disagree.  It has all sorts
of direct correspondences.  Let us make a short tally of the ways that a
closed source, binary-only application used as a benchmark can mislead
me with regard to the performance of a system.

  * I don't control the compiler choice.  Your compiler and mine might
produce very different performance even if your application "resembles"
mine (as far as I can tell, given that I cannot read the source).

  * I don't control the libraries.  Your application is (probably)
statically linked in various places and might even use private libraries
that are hand-optimized.  My application would likely be linked
dynamically with completely different libraries.  Your libraries might
be out of date.  My libraries might be out of date.

  * I don't have any way of knowing whether your "canned" (say) Monte
Carlo benchmark is relevant to my Monte Carlo application.  Maybe your
code is structured to be strictly vectorized and local, but mine
requires random site access.  Yours might be CPU bound.  Mine might be
memory bound.  Since I can't see the source, I'll never know.  (A toy
sketch of this contrast follows the list below.)

  * I have to pay money for the application to use as a benchmark before
I even look at hardware.  If I'm an honest soul, I probably have to buy
a separate license for every platform I plan to test even before I buy
the test platform OR run afoul of the Dumb Mutha Copyright Act (a.k.a.
the "Intellectual Straightjacket Act").  Or maybe I can rely on vendor
reports of the results.  This adds costs to the engineering process.

  * Even leaving aside the additional costs, there is the issue of
whether the application I'm using is tuned for the hardware I'm running
on.  Strict i386 code will not run as fast as strict i586 code, which
will not run as fast as i686 code, which will not run optimally on an
Athlon, which in turn will not run optimally on an Opteron.  Yet the
Opteron will likely RUN i386 code; I just won't know whether the result
is at all relevant to how the Opteron runs Opteron code.  (These effects
are not necessarily small.)

  * And if I thought about it hard, I could likely come up with a few
more negatives...such as the entire raft of reasons that closed source
software is a Bad Thing to encourage on general principles, the very
principles built right into the original Beowulf mission statement
(which IIRC has a very clear open source requirement, for engineering
reasons).
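
To make the random-site point (and, while I'm at it, the tuning point)
concrete, here is a toy sketch in C.  It is purely illustrative -- not
drawn from any real ISV code, and the array size, seed, and compile
lines are numbers I made up for the example.  It sums the same array
once sequentially and once by hopping to pseudo-random sites, which is
roughly the difference between a nicely vectorized, local sweep and a
random-site-update Monte Carlo.  A closed binary gives you no way of
knowing which of these it is doing inside.

  /* access.c -- toy illustration only, not a real benchmark.  Both
   * loops touch the array the same number of times; only the access
   * pattern differs.  The random() call itself adds overhead to the
   * second loop, so treat the numbers as illustrative.  Compile with
   * and without tuning flags (e.g. gcc -O2 vs gcc -O2 -march=i686) to
   * see the compiler/tuning effect stacked on top.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N (1 << 24)   /* 16M doubles, ~128 MB: well out of cache */

  int main(void)
  {
    double *a = malloc(N * sizeof(double));
    double sum = 0.0;
    long i;
    clock_t t0, t1;

    if (a == NULL) return 1;
    for (i = 0; i < N; i++) a[i] = 1.0;

    t0 = clock();
    for (i = 0; i < N; i++) sum += a[i];          /* sequential sweep */
    t1 = clock();
    printf("sequential:  %.2f CPU sec (sum %.0f)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, sum);

    sum = 0.0;
    srandom(12345);
    t0 = clock();
    for (i = 0; i < N; i++) sum += a[random() % N];  /* random sites */
    t1 = clock();
    printf("random site: %.2f CPU sec (sum %.0f)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, sum);

    free(a);
    return 0;
  }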

The point being that while closed source commercial applications don't
necessarily make "evil" benchmarks -- there need be no intent at all to
hide or alter the performance characteristics of a given architecture --
they add a number of sources of noise to an already arcane and uncertain
process.  They are less reliable, more likely to mislead you (quite
possibly through nobody's fault or intention), and less likely to
accurately predict the performance of the architecture on your
application suite.  And they are ultimately black boxes that you have to
pay people to use.

I personally am a strong proponent (in case you can't tell:-) of open
source (ideally GPL) software and tools, ESPECIALLY for benchmarking.  I
even tried to talk Larry McVoy into GPL-ing lmbench back when it had a
fairly curmudgeonly license, even though the source itself was open
enough.

Note, BTW, that all of the observations above are irrelevant if the
application being used as a benchmark is the application you intend to
use in the form you intend to use it, purchased or not.  As in:

> > However, they are also undeniably useful.  Especially when the
> > application being benchmarked is YOUR application and under your
> > complete control.
> 
> Regardless of ownership or control, they're especially useful when
> you're looking at an application being used in the way you intend on
> using it. Many industrial users buy systems to run a specific list of
> ISV applications.  In this instance, the application benchmark can be
> the most valid benchmark, as it can model the system in the way it will
> be used -- and that's the most important issue.

Sure.  Absolutely.  I'd even say that your application(s) is(are) ALWAYS
the best benchmark for many or even most purposes, with the minor caveat
that microbenchmarks have a slightly different purpose and are best used
for the job they were designed to do.  I doubt that Linus runs a
scripted set of userspace Gnome applications to test the performance of
kernel subsystems...

> I'm not disagreeing with your message.  I too try to make sure that
> people use the right benchmarks for the right purpose; I've seen way too
> many people jump to absurd conclusions based on a single data point or
> completely unrelated information.  I'm just trying to sharpen your
> message by pointing out some too broad brush strokes...
> 
> Well, maybe I don't put as much faith in micro benchmarks unless in the
> hands of a skilled interpreter, such as yourself.  My preference is for
> whatever benchmarks most closely describe your use of the system.

Microbenchmarks are not intended to be predictors of performance in
macro-applications, although a suite of results such as lmbench can give
an expert a surprisingly accurate idea of what to expect there.  They
are more to help you understand systems performance in certain atomic
operations that are important components of many applications.  A
networking benchmark can easily reveal problems with your network, for
example, that help you understand why this application which ran just
peachy keen at one scale as a "benchmark" suddenly turns into a pig at
another scale.  A good CPU/memory benchmark can do the same thing wrt
the memory subsystem.
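
For what it's worth, here is the sort of "atomic" measurement I mean, as
a bare-bones sketch -- it is emphatically NOT lmbench or any other real
microbenchmark suite, and the buffer size and repetition count are
arbitrary numbers picked for the illustration.  It times nothing but
sustained memory copy bandwidth through libc's memcpy():

  /* bw.c -- bare-bones sketch of a single atomic measurement:
   * sustained memory copy bandwidth.  Sizes and reps are arbitrary.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>

  #define SIZE (32 * 1024 * 1024)  /* 32 MB, big enough to defeat cache */
  #define REPS 16

  int main(void)
  {
    char *src = malloc(SIZE);
    char *dst = malloc(SIZE);
    clock_t t0, t1;
    double secs;
    int r;

    if (src == NULL || dst == NULL) return 1;
    memset(src, 1, SIZE);
    memset(dst, 0, SIZE);     /* touch dst so its pages really exist */

    t0 = clock();
    for (r = 0; r < REPS; r++)
      memcpy(dst, src, SIZE);
    t1 = clock();

    secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    printf("memcpy bandwidth: roughly %.1f MB/sec\n",
           (double)REPS * SIZE / (1024.0 * 1024.0) / secs);
    free(src);
    free(dst);
    return 0;
  }

A number like that won't tell you how fast your application will run,
but when it comes out a factor of three lower on one node than on its
supposedly identical neighbor, you have learned something worth knowing.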

This is yet another major problem with a naive application benchmark or
comparative benchmark (and even with microbenchmarks) -- they are OFTEN
run at a single scale or with a single set of parameters.  On system A,
that scale might be one that lets the application remain L2-local.  On
system B it might not be.  You might then conclude that B is much
slower.  At the scale you actually intend to run, both might be L2-local
or both might be working out of main memory.  B might have a faster
processor or a better overall balance of performance, and might actually
be faster at that scale.
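
To illustrate, purely as a sketch (the sizes and repetition counts are
arbitrary choices of mine): run the same summation loop over working
sets from 16 KB up to 64 MB and watch the per-element cost.  On most
machines it jumps when the vector stops fitting in L2, and if your
one-size benchmark happens to sit on one side of that cliff on system A
and the other side on system B, your comparison is telling you about
cache sizes, not about which box runs YOUR job faster.

  /* sweep.c -- toy sketch of scale dependence: time one summation
   * loop over working sets from 16 KB to 64 MB.  The repetition count
   * is scaled so each size does roughly the same total work.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  int main(void)
  {
    size_t size, n, i;
    long reps, r;
    double *a, sum, secs;
    clock_t t0, t1;

    for (size = 16 * 1024; size <= 64 * 1024 * 1024; size *= 4) {
      n = size / sizeof(double);
      a = malloc(size);
      if (a == NULL) return 1;
      for (i = 0; i < n; i++) a[i] = 1.0;

      reps = (512L * 1024 * 1024) / size;  /* ~constant total work */
      sum = 0.0;
      t0 = clock();
      for (r = 0; r < reps; r++)
        for (i = 0; i < n; i++) sum += a[i];
      t1 = clock();
      secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
      printf("%8lu KB: %7.2f ns/element (sum %.0f)\n",
             (unsigned long)(size / 1024),
             1.0e9 * secs / ((double)reps * (double)n), sum);
      free(a);
    }
    return 0;
  }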

I don't put much faith in benchmarks, period.  With the exception of
your application(s), of course.  Faith isn't the point -- they are just
rulers, stopwatches, measuring tools.  Some of them measure "leagues per
candle", or "furlongs per semester" and aren't terribly useful.  Others
are just what you need to make sense of a system.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


