ATHLON vs XEON: number crunching

Jakob Oestergaard jakob at unthought.net
Fri Jun 21 05:26:35 EDT 2002


On Wed, Jun 19, 2002 at 04:59:45PM -0400, Ivan Oleynik wrote:
> A week ago I posted a question asking people about their experiences with
> both Athlon's and Xeon's in terms of their performances in number
> crunching.
> 
> Now I made my own tests and found very bizarre results. I only compared
> serial performance by running my representative code on each platform. The
> code was compiled with PGI compiler, no optimization options (by default
> it is -O1, local optimization), I used flags -tp athlon and -tp piv (xeon)
> for platform specific compilations.

As others suggested, you may be seeing memory throughput problems.

Better-optimized code may reduce these memory problems (fewer register
spills -> fewer reads and writes to cache -> better cache utilization
-> less memory activity).
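
A toy model may help show why memory throughput can hide differences
in clock speed. This is only a sketch: it assumes run time is simply
the larger of the compute time and the time spent moving data, and
every number and name in it (run_time, FLOPS, BYTES, BANDWIDTH, the
clock speeds) is hypothetical, not a measurement of any real CPU.

```python
def run_time(flops, bytes_moved, clock_hz, flops_per_cycle, bandwidth):
    """Toy model: the slower of compute and memory traffic sets the run time."""
    compute = flops / (clock_hz * flops_per_cycle)  # seconds spent computing
    memory = bytes_moved / bandwidth                # seconds spent moving data
    return max(compute, memory)


# Hypothetical workload and machine numbers, chosen only for illustration.
FLOPS = 1e9        # floating point operations in the run
BYTES = 8e9        # bytes moved to and from main memory
BANDWIDTH = 1e9    # bytes per second, same memory system for every CPU

for mhz in (1200, 1600, 2000):
    t = run_time(FLOPS, BYTES, mhz * 1e6, 1.0, BANDWIDTH)
    print(f"{mhz} MHz: {t:.2f} s")
```

With these made-up numbers all three clock speeds give the same 8.00 s,
because the memory term dominates. Shrinking BYTES (which is what better
optimization would do in this model) lets the clock speed show up in
the result again.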

Now, in production you would probably never run code that was not
optimized. So don't test with code that's not optimized.

Specify the best optimization options. Experiment a little - perhaps one
set yields better performance on one CPU than the other. Then use the
best set of options for each processor type.
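
One way to run such an experiment is a small timing harness. The
sketch below assumes each flag set has already been built into its own
binary. The names best_time and fastest_build, and the placeholder
commands in the demo, are hypothetical; substitute the real binaries.
It takes the fastest of a few runs to reduce noise from other activity
on the machine.

```python
import subprocess
import sys
import time


def best_time(cmd, repeats=3):
    """Run cmd `repeats` times; return the fastest wall-clock time in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return min(times)


def fastest_build(builds, repeats=3):
    """builds maps a label (e.g. a flag set) to the command running that binary."""
    timings = {label: best_time(cmd, repeats) for label, cmd in builds.items()}
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    # Placeholder commands standing in for binaries built with different flags.
    builds = {
        "flag set A": [sys.executable, "-c", "sum(range(10**5))"],
        "flag set B": [sys.executable, "-c", "sum(range(10**6))"],
    }
    label, timings = fastest_build(builds)
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.3f} s")
    print("fastest:", label)
```

Run the harness separately on each CPU type, since the winning flag
set may differ between them.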

This of course means that your performance will also be limited by how
well the compiler can optimize for the different CPUs. But this is just
how it is in the real world. Tough, but we have to live with it.

Is there any chance you can re-run the benchmarks with better
optimization enabled? That would be really interesting to a lot of us
here on the list.

> The code itself contains a lot of FFTs, vector & matrix algebra, it
> includes BLAS and LAPACK sources and minimal IO. All the systems have 2 GB
> RAM and the code consumes only 50 Mb.

Any chance you can try using ATLAS in place of the BLAS sources?

ATLAS tunes itself to the machine it is built on, so you would need to
compile one ATLAS for the Intel CPUs and one for the AMD ones.

> What I found is that Xeon 2.2 GHz is 1.5 times faster! than any athlons I
> tested. But the most strange thing is all these athlons: MP 1200, MP 1900,
> MP 2100 give approximately the same timing within 5%. This is completely
> above my comprehension.
> 
> Did someone encounter such a strange pattern and what can be a source of
> this behavior?

I've seen seven 120 MHz Power2 CPUs (running parallel code scaling
better than 75% of theoretical performance) outperformed by a single 80
MHz vector CPU. The real world sucks ;)

Seriously though, performance tuning and comparison is just not simple.

-- 
................................................................
:   jakob at unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............: