[Beowulf] bizarre scaling behavior on a Nehalem

Rahul Nabar rpnabar at gmail.com
Mon Aug 10 18:28:59 EDT 2009

On Mon, Aug 10, 2009 at 12:48 PM, Bruno Coutinho <coutinho at dcc.ufmg.br> wrote:
> This is often caused by cache competition or memory bandwidth saturation.
> If it were cache competition, going from 4 to 6 threads would make it worse.
> As the code became faster with DDR3-1600 and much slower with Xeon 5400,
> this code is memory bandwidth bound.
> Tweaking CPU affinity to avoid threads jumping among cores will not
> help much, as the big bottleneck is memory bandwidth.
> For this code, CPU affinity will only help on NUMA machines, by keeping
> memory accesses in local memory.
> If the machine has enough bandwidth to feed the cores, it will scale.

Exactly! But I thought this was the big advance with the Nehalem: that
it removed the CPU<->Cache<->RAM bottleneck. So if the code scaled on
the AMD Barcelona, shouldn't it continue to scale on the Nehalem?
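As an aside, here is a minimal sketch of the kind of per-thread pinning Bruno
mentions, using Linux's pthread_setaffinity_np(). The core numbering is only an
assumption and would have to match the actual socket/NUMA layout of the box
(check /proc/cpuinfo or numactl -H); this is just an illustration, not how
VASP/MPI actually places its ranks.

/* Sketch: pin each pthread to one core so its pages stay on the local
 * NUMA node.  Core IDs below are an assumption -- they must match the
 * real socket layout of the machine. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        fprintf(stderr, "could not pin to core %d\n", core);
}

static void *worker(void *arg)
{
    pin_to_core(*(int *)arg);  /* pin before touching any large arrays */
    /* ... memory-bound work would go here ... */
    return NULL;
}

int main(void)
{
    int cores[4] = {0, 1, 2, 3};   /* hypothetical core mapping */
    pthread_t tid[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&tid[i], NULL, worker, &cores[i]);
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    return 0;
}

With an MPI code it is usually easier to let the launcher or a taskset/numactl
wrapper do the pinning, but the idea is the same: keep each process on one core
so its pages stay local.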

I'm posting a copy of my scaling plot here if it helps.


To remove as many confounding factors as possible, this particular
Nehalem plot was produced with the following settings:

Hyperthreading OFF
24GB memory, i.e. 6 banks of 4GB, which is the optimum memory configuration

Even if we attribute the bizarre performance of the 4-core case to the
Turbo effect, what is most confusing is how the 8-core data point could
be so much slower than the corresponding 8-core point on an old AMD
Barcelona.
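
A rough way to test the bandwidth-saturation theory on this box would be an
OpenMP triad loop along the lines below, run with OMP_NUM_THREADS=1,2,4,6,8
(just a sketch: the array size is a guess, and the real STREAM benchmark would
be the proper tool). If the GB/s figure stops growing somewhere between 4 and 8
threads, that points at the memory system rather than at VASP.

/* Rough bandwidth probe in the spirit of the STREAM triad.  Arrays are
 * sized (by assumption) to be far larger than the combined caches. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 26)   /* 64M doubles per array, ~512 MB each */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));

    /* First-touch init in parallel so pages land on the threads' NUMA nodes */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; c[i] = 0.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        c[i] = a[i] + 3.0 * b[i];          /* triad: 2 loads + 1 store */
    double t1 = omp_get_wtime();

    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("%d threads: %.2f GB/s\n", omp_get_max_threads(), gbytes / (t1 - t0));

    free(a); free(b); free(c);
    return 0;
}

(Compile with something like gcc -O2 -fopenmp; the parallel first-touch init
matters, otherwise all the pages end up on one NUMA node.)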

Something's wrong here that I just do not understand. BTW, any other
VASP users here? Anybody have any Nehalem experience?
