<div>Joshua,</div>

<div>Great, thanks. That was clear, and the takeaway is that I should pay attention to the number of memory channels per core (which may be less than 1.0), in addition to the number of cores and the RAM per core.</div>

<div>&nbsp;</div>

<div>What is the &quot;ncpu&quot; column in Table 1 (for example)? Does the 4 refer to 4 cores, with the 1 and 2 cases not using all the cores on the motherboard? Or is &quot;ncpu&quot; an application parameter? I read it as &quot;number of CPUs&quot;. I noted that the heart simulation didn&#39;t have an ncpu column, which is why I thought you had multiple nodes going.

</div>

<div>&nbsp;</div>

<div>Thanks very much, </div>

<div>Peter</div>

<div>&nbsp;</div>

<div>P.S. and then where does the billiard cue go?<br><br>&nbsp;</div>

<div><span class="gmail_quote">On 3/8/07, <b class="gmail_sendername">Joshua Baker-LePain</b> &lt;<a href="mailto:jlb17@duke.edu">jlb17@duke.edu</a>&gt; wrote:</span>

<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">On Thu, 8 Mar 2007 at 11:33am, Peter St. John wrote<br><br>&gt; Those benchmarks are quite interesting and I wonder if I interpret them at

<br>&gt; all correctly.<br>&gt; It would seem that the Intel outperforms its advantage in clock speed (1/6th<br>&gt; faster, but ballpark 1/3 better performance?) so the question would be<br>&gt; performance gain per dollar cost (which is fine); however, for that heart

<br>&gt; simulation towards the end, it looks like the AMD scales up with increasing<br>&gt; nodecount enormously better, and with several nodes actually outperforms the<br>&gt; faster Intel.<br>&gt; Should I guess at relatively poor performance of the networking on the

<br>&gt; motherboard used with the Intel chip or does that have anything to do with<br>&gt; the CPU itself?<br><br>Each benchmark was run on a single system with 4 CPUs (or, rather, 4 cores<br>in 2 sockets) -- there was no network involved.&nbsp;&nbsp;The difference (IMO) lies

<br>in the memory subsystems of the 2 architectures.<br><br>Opterons have 1 memory controller per socket (on the CPU, shared by the 2<br>cores) attached to a dedicated bank of memory via a Hypertransport link<br>(referred to from here on as HT).&nbsp;&nbsp;That socket is connected to the other

CPU socket (and its HT-connected memory bank) by HT. Xeons (still) have a single memory controller hub with which the CPUs communicate via the front side bus (FSB).&nbsp;&nbsp;That single hub has 2 channels to memory.

<br><br>So, yes, clock-for-clock (and for my usage) Xeon 51xxs are faster than<br>Opterons.&nbsp;&nbsp;But, if your code hits memory *really hard* (which that heart<br>model does), then the multiple paths to memory available to the Opterons

<br>allow them to scale better.<br><br>This situation has existed for a long time on the Intel side.&nbsp;&nbsp;For P4<br>based Xeons it was crippling.&nbsp;&nbsp;The new Core based Xeons, however, don&#39;t<br>suffer nearly as badly (due to their big cache, maybe?).&nbsp;&nbsp;

E.g. the thermal<br>simulations in that same file are pretty memory intensive themselves, and<br>P4 based Xeons scaled *horribly* on them.&nbsp;&nbsp;But the 51xx Xeons still scale<br>very well on them (which surprised me).<br><br>

--<br>Joshua Baker-LePain<br>Department of Biomedical Engineering<br>Duke University<br></blockquote></div><br>
