<div dir="ltr">Hi Bill,<br><br>&nbsp;&nbsp;&nbsp;&nbsp; I&#39;m sorry. I composed the mail in a proper format, but it&#39;s not showing up the way I laid it out.<br><br>To clarify: I tested three compilers on the AMD machine only. On the Intel machine I used only Intel ifort.<br><br>Also, a single run has two results, one from the OUTPUT file and one from the time command (though not for every run; I missed taking some results with the time command).<br>

<br>I hope this helps,<br><br>Thanks,<br>Sangamesh<br><br><div class="gmail_quote">On Thu, Sep 18, 2008 at 11:59 AM, Bill Broadley <span dir="ltr">&lt;<a href="mailto:bill@cse.ucdavis.edu">bill@cse.ucdavis.edu</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>

<br>

I tried to understand your post, but failed. &nbsp;Can you post a link, publish a Google spreadsheet, or format it differently?<br>

<br>

You tried 3 compilers on both machines? &nbsp;Which times are for which CPU/compiler combos? &nbsp;I tried to match up the columns and rows, but sometimes there were 3 columns, and sometimes 4. &nbsp;None of them line up nicely under CPU or compiler headings.<br>


<br>

I (like many other folks) read email as ASCII text, so a table should look like:<br>

<br>

Serial run:<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Compiler A &nbsp; Compiler B &nbsp; Compiler C<br>

=====================================================<br>

Intel 2.3 GHz &nbsp; &nbsp; 30 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;29 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 31<br>

AMD 2.3 GHz &nbsp; &nbsp; &nbsp; 28 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;32 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 32<br>

<br>

Note that I used spaces and not tabs so it appears clear to everyone regardless of their mail client, ASCII/text, HTML, tab settings, etc.<br>
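A minimal sketch of how such a space-padded table could be generated, assuming Python is at hand. The function name <code>format_table</code> and the column widths are hypothetical choices for illustration, and the numbers are the placeholder values from the example above, not measurements:<br>
<pre>
```python
def format_table(headers, rows, label_width=16, col_width=13):
    """Return a plain-text table padded with spaces only (no tabs)."""
    lines = []
    # Header row: an empty label cell followed by left-justified column names.
    header = " " * label_width + "".join(h.ljust(col_width) for h in headers)
    lines.append(header.rstrip())
    lines.append("=" * (label_width + col_width * len(headers)))
    # Data rows: every cell is padded to the same fixed width as its header.
    for label, values in rows:
        cells = "".join(str(v).ljust(col_width) for v in values)
        lines.append((label.ljust(label_width) + cells).rstrip())
    return "\n".join(lines)


# Placeholder values taken from the example table, not real timings.
table = format_table(
    ["Compiler A", "Compiler B", "Compiler C"],
    [("Intel 2.3 GHz", [30, 29, 31]), ("AMD 2.3 GHz", [28, 32, 32])],
)
print(table)
```
</pre>
Because every cell is padded to a fixed width with spaces, the columns stay aligned in any client that uses a monospaced font.<br>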

<br>

I&#39;ve been testing these machines quite a bit lately and have been quite impressed with the Barcelona memory systems, for instance:<br>

<br>

<a href="http://cse.ucdavis.edu/bill/fat-node-numa3.png" target="_blank">http://cse.ucdavis.edu/bill/fat-node-numa3.png</a><br>

<br>

<br>

Sangamesh B wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div class="Wj3C7c">

The scientific application used is DL-Poly 2.17.<br>

<br>

Tested with Pathscale and Intel compilers on an AMD Opteron quad core. The time<br>

figures given were taken from the DL-Poly OUTPUT file. I also used the time<br>

command. Here are the results:<br>

<br>

<br>

<pre>
                 AMD-2.3GHz (32 GB RAM)                           INTEL-2.33GHz (32 GB RAM)
                 GNU gfortran    Pathscale       Intel 10 ifort   Intel 10 ifort

1. Serial
OUTPUT file      147.719 sec     158.158 sec     135.729 sec      73.952 sec
Time command     2m27.791s       2m38.268s       -                1m13.972s

2. Parallel, 4 core
OUTPUT file      39.798 sec      44.717 sec      36.962 sec       32.317 sec
Time command     0m41.527s       0m46.571s       -                0m36.218s

3. Parallel, 8 core
OUTPUT file      26.880 sec      33.746 sec      27.979 sec       30.371 sec
Time cmd: 0m30.171s
</pre>

<br>
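The parallel scaling implied by these figures can be worked out as speedup = serial time / parallel time and efficiency = speedup / cores. A minimal Python sketch, assuming the four OUTPUT-file columns are, in order, gfortran, Pathscale and ifort on the AMD machine and ifort on the Intel machine; the names <code>times</code>, <code>speedup</code> and <code>efficiency</code> are hypothetical:<br>
<pre>
```python
# OUTPUT-file times in seconds, copied from the results above.
# The column-to-compiler mapping is an assumption, see the lead-in.
times = {
    "AMD / gfortran": {1: 147.719, 4: 39.798, 8: 26.880},
    "AMD / Pathscale": {1: 158.158, 4: 44.717, 8: 33.746},
    "AMD / ifort": {1: 135.729, 4: 36.962, 8: 27.979},
    "Intel / ifort": {1: 73.952, 4: 32.317, 8: 30.371},
}


def speedup(run_times, cores):
    """Serial time divided by the time on the given number of cores."""
    return run_times[1] / run_times[cores]


def efficiency(run_times, cores):
    """Speedup per core; 1.0 would be perfect linear scaling."""
    return speedup(run_times, cores) / cores


for name, run_times in times.items():
    for cores in (4, 8):
        print(f"{name:16s} {cores} cores: "
              f"speedup {speedup(run_times, cores):.2f}, "
              f"efficiency {efficiency(run_times, cores):.2f}")
```
</pre>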

<br>

The optimization flags used:<br>

<br>

Intel ifort 10: &nbsp; &nbsp; &nbsp; &nbsp;-O3 &nbsp;-axW &nbsp;-funroll-loops &nbsp;(I don&#39;t remember the exact<br>

flag; it was something similar to loop unrolling)<br>

<br>

Pathscale: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-O3 &nbsp;-OPT:Ofast &nbsp; -ffast-math &nbsp; &nbsp; &nbsp;-fno-math-errno<br>

<br>

GNU gfortran &nbsp; &nbsp; &nbsp;-O3 &nbsp; -ffast-math -funroll-all-loops &nbsp;-ftree-vectorize<br>

<br>

<br>

I&#39;ll try using this next: <a href="http://directory.fsf.org/project/time/" target="_blank">http://directory.fsf.org/project/time/</a><br>

<br>

Thanks,<br>

Sangamesh<br>

<br>

<br>

On Thu, Sep 18, 2008 at 6:07 AM, Vincent Diepeveen &lt;<a href="mailto:diep@xs4all.nl" target="_blank">diep@xs4all.nl</a>&gt; wrote:<br>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

How does all this change when you use a PGO optimized executable on both<br>

sides?<br>

<br>

Vincent<br>

<br>

<br>

On Sep 18, 2008, at 2:34 AM, Eric Thibodeau wrote:<br>

<br>

&nbsp;Vincent Diepeveen wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


Nah,<br>

<br>

I guess he&#39;s referring to the compiler sometimes using single precision floating<br>

point to get something done instead of double precision, and to it sometimes<br>

keeping values in registers.<br>

<br>

That isn&#39;t necessarily a problem, but if I remember well, floating point state<br>

could get wiped out when switching to SSE2.<br>

<br>

Sometimes you lose your FPU register set in that case.<br>

<br>

The main problem is that there are so many dangerous optimizations possible<br>

to speed up test sets, because floating point in itself is really slow to do<br>

in hardware, seen from a hardware viewpoint.<br>

<br>

Yet in general the last generations of Intel compilers have improved a lot<br>

in that respect.<br>

<br>

</blockquote>

Well, running the same code here is the result discrepancy I got:<br>

FLOPS:<br>

 &nbsp;my code has to do: 7,975,847,125,000 floating-point operations (~8 Tflop) ...takes 15 minutes on<br>

8*2-core Opteron with 32 Gigs-o-RAM (thank you OpenMP ;)<br>

<br>

The running times (ran it a _few_ times...but not the statistical minimum<br>

of 30):<br>

 &nbsp;ICC -&gt; runtime == 689.249 &nbsp;; summed error == 1651.78<br>

 &nbsp;GCC -&gt; runtime == 1134.404 ; summed error == 0.883501<br>

<br>

Compiler Flags:<br>

 &nbsp;icc -xW -openmp -O3 vqOpenMP.c -o vqOpenMP<br>

 &nbsp;gcc -lm -fopenmp -O3 -march=native vqOpenMP.c -o vqOpenMP_GCC<br>

<br>

No trickery, no smoke and mirrors ;) Just a _huge_ kick ASS k-Means<br>

parallelized with OpenMP (thank gawd, otherwise it takes hours to run) and a<br>

rather big database of 1.4 Gigs<br>

<br>

... So this is what I meant by floating point errors. Yes, the runtime was<br>

almost halved by ICC (and this is on an *opteron* based system, Tyan VX50).<br>

Running time wasn&#39;t actually what I was looking at, though, but rather<br>

precision skew, and that&#39;s where I fell off my chair.<br>
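One possible source of this kind of skew (an assumption, not something established in this thread) is that floating-point addition is not associative, so a compiler that reorders operations in a relaxed fp mode can change the result. A minimal Python sketch with made-up values, unrelated to the k-Means code:<br>
<pre>
```python
import math

# Made-up values chosen so that rounding is visible in double precision.
values = [1e16, 1.0, -1e16]

# Left to right: 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost.
left_to_right = (values[0] + values[1]) + values[2]

# Reassociated: the two large terms cancel first, so the 1.0 survives.
reassociated = (values[0] + values[2]) + values[1]

# math.fsum tracks the rounding error and returns the correctly rounded sum.
exact = math.fsum(values)

print(left_to_right, reassociated, exact)
```
</pre>
The same three numbers sum to 0.0 in one order and 1.0 in the other; accumulated over a large data set, differences of this kind can add up to a visible summed error.<br>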

<br>

For the ones itching for a little more specs:<br>

<br>

eric@einstein ~ $ icc -V<br>

Intel(R) C Compiler for applications running on Intel(R) 64, Version 10.1<br>

 &nbsp; Build 20080602<br>

Copyright (C) 1985-2008 Intel Corporation. &nbsp;All rights reserved.<br>

FOR NON-COMMERCIAL USE ONLY<br>

<br>

eric@einstein ~ $ gcc -v<br>

Using built-in specs.<br>

Target: x86_64-pc-linux-gnu<br>

Configured with:<br>

/dev/shm/portage/sys-devel/gcc-4.3.1-r1/work/gcc-4.3.1/configure<br>

--prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.3.1<br>

--includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include<br>

--datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1<br>

--mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/man<br>

--infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.3.1/info<br>

--with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.1/include/g++-v4<br>

--host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec<br>

--enable-nls --without-included-gettext --with-system-zlib<br>

--disable-checking --disable-werror --enable-secureplt --enable-multilib<br>

--enable-libmudflap --disable-libssp --enable-cld --disable-libgcj<br>

--enable-languages=c,c++,treelang,fortran --enable-shared<br>

--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu<br>

--with-bugurl=<a href="http://bugs.gentoo.org/" target="_blank">http://bugs.gentoo.org/</a> --with-pkgversion=&#39;Gentoo 4.3.1-r1<br>

p1.1&#39;<br>

Thread model: posix<br>

gcc version 4.3.1 (Gentoo 4.3.1-r1 p1.1)<br>

<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Vincent<br>

<br>

On Sep 17, 2008, at 10:25 PM, Greg Lindahl wrote:<br>

<br>

&nbsp;On Wed, Sep 17, 2008 at 03:43:36PM -0400, Eric Thibodeau wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

&nbsp;Also, note that I&#39;ve had issues with icc<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

generating really fast but inaccurate code (fp model is not IEEE *by<br>

default*, I am sure _everyone_ knows this and I am stating the obvious<br>

here).<br>

<br>

</blockquote>

All modern, high-performance compilers default that way. It&#39;s certainly<br>

the case that sometimes it goes more horribly wrong than necessary, but<br>

I wouldn&#39;t ding icc for this default. Compare results with IEEE mode.<br>

<br>

-- greg<br>

<br>

<br>

</blockquote></blockquote>

<br>

</blockquote>

_______________________________________________<br>

Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a><br>

To change your subscription (digest mode or unsubscribe) visit<br>

<a href="http://www.beowulf.org/mailman/listinfo/beowulf" target="_blank">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>

<br>

</blockquote>

<br>

<br></div></div>

------------------------------------------------------------------------<div class="Ih2E3d"><br>

<br>


</div></blockquote>

<br>

</blockquote></div><br></div>