Intel and GNU C++ compilers

Mark Hahn hahn at
Mon Oct 13 12:04:47 EDT 2003

> *** gcc version 2.95.4 ***

that's god-awful ancient.

> none                 9.5 KB     311 sec
> "-O3"                8.7 KB     192 sec
> "-O3 -ffast-math"    8.7 KB     165 sec

-fomit-frame-pointer usually helps, sometimes noticeably,
since x86 is so short of registers.  -O3 is often not
better than -O2 or -Os, mainly because of interactions
between unrolling, Intel's microscopic L1 caches, and the
difficulty of scheduling onto a tiny register set...
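
a quick way to check this on your own box is to time the same kernel
under each flag set.  the sketch below is only an illustration: the
kernel, the function names and the file name bench.cpp are all made up
here, not taken from the code being benchmarked above.  it sticks to
C++98 so that even an old g++ should accept it.

```cpp
// Build the same source several ways and compare the timings, e.g.:
//   g++ -O2 bench.cpp
//   g++ -O3 bench.cpp
//   g++ -Os bench.cpp
//   g++ -O2 -fomit-frame-pointer bench.cpp
// (bench.cpp is a hypothetical file name.)
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical kernel: repeat y[i] += a * x[i], then sum y so the
// compiler cannot discard the work.
double saxpy_checksum(std::size_t n, int repeats) {
    std::vector<double> x(n, 1.0), y(n, 0.0);
    const double a = 0.5;
    for (int r = 0; r < repeats; ++r)
        for (std::size_t i = 0; i < n; ++i)
            y[i] += a * x[i];
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += y[i];
    return sum;
}

// Returns CPU seconds spent in one call of the kernel and stores the
// kernel's checksum through the pointer.
double time_kernel(std::size_t n, int repeats, double* checksum) {
    assert(checksum != 0);
    std::clock_t t0 = std::clock();
    *checksum = saxpy_checksum(n, repeats);
    std::clock_t t1 = std::clock();
    return double(t1 - t0) / CLOCKS_PER_SEC;
}
```

call time_kernel() from main() with an n small enough to fit in L1 and
again with one that doesn't; whether -O3 beats -O2 can differ between
the two cases, so measure rather than assume.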

I'd be surprised if 3.3 or 3.4 (pre-release) didn't perform
noticeably better.

> flags                                    bin-size   elapsed-time
> -----                                    --------   ------------
> none                                     597 KB     100 sec
> "-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0"   563 KB      89 sec

isn't -tpp7 redundant if you have -xW?

> the flags -tpp7 and -xW in 'icc' activate Pentium4 and SSE2 extensions
> respectively, I guess that using a newer 'gcc', capable of '-march=pentium4'
> and SSE2 extensions would improve 'gcc' results.

yes.  '-march=pentium4 -mfpmath=sse' seems to do it.  gcc doesn't have
an auto-vectorizer yet, unfortunately.
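
for a concrete case, here is the sort of loop those flags affect.  this
is a hypothetical example (the function name and the file name dot.cpp
are invented for illustration).  with -mfpmath=sse the arithmetic is
done in SSE registers instead of on the x87 stack; without a vectorizer
that means scalar SSE instructions, one element per iteration.

```cpp
// Possible build line (dot.cpp is a hypothetical file name):
//   g++ -O2 -march=pentium4 -mfpmath=sse -c dot.cpp
// On a 64-bit host you would likely need to add -m32, since
// -march=pentium4 selects a 32-bit-only CPU.
#include <cstddef>

// Plain single-precision dot product: a simple, dependence-free loop
// body of the kind an auto-vectorizer could in principle turn into
// packed SSE operations.
float dot(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

results can differ slightly between x87 and SSE builds, because x87
keeps intermediates at higher precision, so compare outputs as well as
timings.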
