Intel and GNU C++ compilers

José M. Pérez Sánchez iosephus at
Tue Oct 14 10:19:11 EDT 2003

On Mon, Oct 13, 2003 at 06:22:17PM +0200, Joachim Worringen wrote:
> thanks for the information, but you really should (also) use the latest gcc 
> (3.3x) for such a comparison. It will be interesting to see how it performs 
> relative to the latest icc on the one hand, and to the old gcc on the other 
> hand.
> And some information on the application (or libraries used) would be helpful, 
> too. Like: is it memory-bound or compute-bound, etc.
>  Joachim

I installed gcc-3.3.2 from the Debian testing distribution. Here is
the full report, including gcc-3.3.2:

*** gcc version 2.95.4 ***
flags                bin-size   elapsed-time
-----                --------   ------------
none                 9.5 KB     311 sec
"-O3"                8.7 KB     192 sec
"-O3 -ffast-math"    8.7 KB     165 sec

*** gcc version 3.3.2 ***
flags                                     bin-size   elapsed-time
-----                                     --------   ------------
none                                      9.1 KB     245 sec
"-O3"                                     8.8 KB     161 sec
"-O2"                                     8.7 KB     157 sec
"-O2 -ffast-math -fomit-frame-pointer"    8.5 KB     127 sec
"-O2 -ffast-math"                         8.5 KB     125 sec
"-O2 -ffast-math -march=pentium4"         8.5 KB     120 sec
"-O2 -ffast-math -march=pentium4 -msse2"  8.5 KB     120 sec
"-O3 -ffast-math -march=pentium4 -msse2"  8.5 KB     120 sec

*** icc version 7.1 ***
flags                                    bin-size   elapsed-time
-----                                    --------   ------------
none                                     597 KB     100 sec
"-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0"   563 KB      89 sec

For this test, we actually wrote a version of the program with many parameters
hardcoded, to make it as compute-bound as possible. We aimed at
evaluating how well the different compilers take advantage of the Xeon.

I will repeat the tests with the full version, which uses more
memory, perhaps about 80 MB per process, although that will ultimately
depend on how big we make the files we use to split the calculations.

The main calculation is the phase of a particle. We use an
implementation of the Mersenne Twister algorithm and have to compute
sqrt(-2*log(x)/x) and sin(C*x/y) (x and y are not positions; they
correspond to other variables in the program). C is a constant
hardcoded in the code, as in sin(9.7438473847*x/y).
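
To make the shape of this kernel concrete, here is a minimal sketch in
C++. It is not the actual program: the function names, the way the
Mersenne Twister output is mapped onto x and y, and the accumulation
loop are all assumptions made for illustration. Only the two
expressions and the constant come from the description above.
std::mt19937 from the standard <random> header stands in for whatever
Mersenne Twister implementation the real code uses.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Constant taken from the example expression in the text.
constexpr double kC = 9.7438473847;

// sqrt(-2*log(x)/x). Real-valued only for 0 < x <= 1, which is assumed here.
double radial_factor(double x) {
    assert(x > 0.0 && x <= 1.0);
    return std::sqrt(-2.0 * std::log(x) / x);
}

// sin(C*x/y), with C hardcoded as in the text. y must be nonzero.
double phase_term(double x, double y) {
    return std::sin(kC * x / y);
}

// Hypothetical driver: draws x and y from the generator and accumulates
// the product of the two expressions. How the real program obtains x and y
// and combines the results is not stated in the text.
double run_kernel(unsigned seed, int n) {
    std::mt19937 gen(seed);
    const double scale = 4294967296.0;  // 2^32, one more than the largest output
    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        // Map each 32-bit output to (0, 1] so log(x) and x/y stay finite.
        double x = (gen() + 1.0) / scale;
        double y = (gen() + 1.0) / scale;
        acc += radial_factor(x) * phase_term(x, y);
    }
    return acc;
}
```

Calling run_kernel(42u, 1000000) from a small main() gives a loop that
is dominated by log, sqrt, sin and a division, which is the kind of
compute-bound work the timings above are about.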

I measured how long it took to compute sqrt(-2*log(x)/x): about
412 processor cycles (I used rdtscll()).
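
A measurement of this kind can be sketched as follows. This is not the
code used for the number above. It swaps the rdtscll() macro for the
__rdtsc() compiler intrinsic from <x86intrin.h> (available with GCC
and Clang on x86), and falls back to a nanosecond clock on other
platforms, in which case the result is in nanoseconds rather than
cycles. The names read_ticks and ticks_per_call, the starting value of
x and the loop structure are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() on GCC and Clang
#else
#include <chrono>
#endif

// Read a monotonically increasing tick counter: the time-stamp counter
// on x86, a steady clock in nanoseconds elsewhere.
inline std::uint64_t read_ticks() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    using clock = std::chrono::steady_clock;
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            clock::now().time_since_epoch()).count());
#endif
}

// Average ticks per evaluation of sqrt(-2*log(x)/x) over n iterations.
// The figure includes loop overhead, so it is an upper estimate.
double ticks_per_call(int n) {
    assert(n > 0);
    volatile double sink = 0.0;  // keeps the compiler from dropping the work
    double x = 0.25;             // arbitrary starting point in (0, 1)
    const std::uint64_t start = read_ticks();
    for (int i = 0; i < n; ++i) {
        sink = std::sqrt(-2.0 * std::log(x) / x);
        x += 1e-9;  // vary the input so the expression is not hoisted
    }
    const std::uint64_t stop = read_ticks();
    (void)sink;
    return static_cast<double>(stop - start) / n;
}
```

Averaging over many iterations, for example ticks_per_call(1000000),
smooths out the cost of reading the counter itself. The result will
still vary with the compiler flags and math library in use, which is
exactly what the tables above compare.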

I will submit other results as soon as I get them, probably using another
computing algorithm which runs considerably faster.


Jose M. Perez.

Beowulf mailing list, Beowulf at