Intel and GNU C++ compilers
Mark Hahn
hahn at physics.mcmaster.ca
Mon Oct 13 12:04:47 EDT 2003
> *** gcc version 2.95.4 ***
that's god-awful ancient.
> none 9.5 KB 311 sec
> "-O3" 8.7 KB 192 sec
> "-O3 -ffast-math" 8.7 KB 165 sec
-fomit-frame-pointer usually helps, sometimes noticeably,
since x86 is so short of registers. -O3 is often not
better than -O2 or -Os, mainly because of interactions
between unrolling, Intel's microscopic L1's, and the
difficulty of scheduling onto a tiny reg set...
I'd be surprised if gcc 3.3 or 3.4 (pre-release) didn't perform
noticeably better.
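
the only way to settle -O2 vs -O3 vs -Os is to time your own code. a
minimal sketch of a timing harness, assuming a C++11 compiler: the names
saxpy_sum and time_kernel and the kernel itself are made up for
illustration, not taken from the code benchmarked above.

```cpp
// Build the same file with different flag sets and compare timings, e.g.:
//   g++ -O2 -fomit-frame-pointer bench.cpp
//   g++ -O3 -fomit-frame-pointer bench.cpp
//   g++ -Os bench.cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel: a simple floating-point loop over two vectors.
// Assumes x and y have the same length.
double saxpy_sum(const std::vector<double>& x,
                 const std::vector<double>& y, double a) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        sum += a * x[i] + y[i];
    return sum;
}

// Runs the kernel `reps` times on vectors of length n and returns the
// elapsed wall-clock time in seconds. The accumulated result is written
// to result_out so the compiler cannot discard the work.
double time_kernel(std::size_t n, int reps, double& result_out) {
    std::vector<double> x(n, 1.0), y(n, 2.0);
    auto t0 = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (int r = 0; r < reps; ++r)
        acc += saxpy_sum(x, y, 0.5);
    auto t1 = std::chrono::steady_clock::now();
    result_out = acc;
    return std::chrono::duration<double>(t1 - t0).count();
}
```

call time_kernel from main with a problem size big enough to run for a
few seconds, and repeat each measurement; which flag set wins depends on
the code and the machine.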
> flags bin-size elapsed-time
> ----- -------- ------------
> none 597 KB 100 sec
> "-O2 -tpp7 -xW -D_IOSTREAM_OP_LOCKS=0" 563 KB 89 sec
isn't -tpp7 redundant if you have -xW?
> the flags -tpp7 and -xW in 'icc' activate Pentium4 and SSE2 extensions
> respectively, I guess that using a newer 'gcc', capable of '-march=pentium4'
> and SSE2 extensions would improve 'gcc' results.
yes. '-march=pentium4 -mfpmath=sse' seems to do it. gcc doesn't have
an auto-vectorizer yet, unfortunately.
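
to check that the flags actually took effect, you can look at gcc's
predefined macros. a small sketch, assuming (as far as I know) that gcc
defines __SSE2__ when SSE2 instructions are enabled and __SSE2_MATH__
when double-precision math is done in SSE registers rather than x87; the
function name fp_codegen_summary is made up for illustration.

```cpp
// Compile with and without the flags and compare the reported string:
//   g++ -O2 -march=pentium4 -mfpmath=sse check.cpp
//   g++ -O2 check.cpp
#include <string>

// Reports which floating-point code generation mode the compiler was
// asked for, judged only from its predefined macros.
std::string fp_codegen_summary() {
#if defined(__SSE2_MATH__)
    return "SSE2 scalar math";
#elif defined(__SSE2__)
    return "SSE2 available, x87 math";
#else
    return "no SSE2";
#endif
}
```

print the result from main. 'g++ -dM -E -x c++ /dev/null' with the same
flags lists every predefined macro if you want the full picture.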
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf