[Beowulf] bizarre scaling behavior on a Nehalem

Rahul Nabar rpnabar at gmail.com
Mon Aug 10 12:41:06 EDT 2009

A while ago Tiago Marques provided some benchmarking info in a thread
( http://www.beowulf.org/archive/2009-May/025739.html ), and some
recent tests that I have been running got me interested in this
snippet again:

>One of the codes, VASP, is very bandwidth limited and loves to run on a
>number of cores that is a multiple of 3. The 5400s are also very bandwidth
>limited - both memory and FSB - which means they sometimes don't scale well
>above 6 cores. They are very fast per core, as someone mentioned, when
>compared to AMD cores.

>These are the times I get from a benchmark I usually run in VASP:
>VASP on Core i7:
>        - 1 core  = 162.453s, 162.778s (no HT)
>        - 2 cores = 100s, 102s (no HT)
>        - 3 cores = 77.835s, 78.195s (no HT)
>        - 4 cores = 87.63s, 87.322s (no HT)
>        - 6 cores = *76.56s, 76.4s*
>        - 6 cores DDR3-1600 CAS9 = 69.654s, 68.816s, 67.7s
>HT doesn't add much, but DDR3-1600 does. Still, ~78s is very fast for a
>quad-core, since our dual 5400s can only do *91s* at best, even with tweaks
>like CPU affinity - which brings it down from 95s by distributing 3 threads
>per socket instead of 4/2, or instead of having 4 of them constantly jumping
>from socket to socket.

If I read this right, for VASP the Nehalems scale well only up to 3
cores, and putting a 4th core on the job actually increases the
runtime (roughly a 2.1x speedup on 3 cores versus 1.9x on 4, going by
the numbers above). This seemed pretty bizarre to me at first sight,
but it is close to what I am getting as well. Has anyone else seen
similar scaling? (I am trying the CPU affinity flags now to see
whether they make a difference; a rough sketch of what that pinning
amounts to is below.)
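
In case it helps to be concrete about "affinity flags": the little
program below (my own illustration, not the actual launcher command;
the core number and the binary to exec are just placeholders) pins a
process to one core with sched_setaffinity() on Linux and then execs
the real workload - the same mechanism that taskset and the MPI
launcher's binding options use underneath.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Pin the calling process to one core, then exec the real program.
   Usage (placeholder): ./pin 3 /path/to/real/binary ...            */
int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s <core> <program> [args...]\n", argv[0]);
        return 1;
    }
    int core = atoi(argv[1]);

    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(core, &mask);

    /* pid 0 = the calling process */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    fprintf(stderr, "pid %d pinned to core %d\n", (int)getpid(), core);

    execvp(argv[2], &argv[2]);   /* only returns on error */
    perror("execvp");
    return 1;
}

From the shell the same pinning can be done with taskset -c; what I am
actually testing are the binding options of our MPI launcher.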

How would you explain this? In the past I have seen these codes scale
well to considerably higher core counts.
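
One thing I plan to check, to separate the code from the hardware, is
where the memory bandwidth of a node actually saturates as threads are
added. A rough STREAM-style triad along the lines below (the array
size and repetition count are just guesses for our nodes; compile with
something like gcc -O2 -fopenmp and vary OMP_NUM_THREADS) should
flatten out in GB/s at roughly the thread count where a
bandwidth-bound code like VASP stops scaling.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N    (50 * 1000 * 1000)   /* ~400 MB per array, to defeat the caches */
#define REPS 10

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    if (!a || !b || !c) { fprintf(stderr, "malloc failed\n"); return 1; }

    /* First-touch initialization in parallel so pages end up near the
       threads that use them (matters on NUMA boxes like dual-socket
       Nehalem). */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    for (int r = 0; r < REPS; r++) {
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];      /* triad: two loads, one store */
    }
    double t1 = omp_get_wtime();

    /* Roughly 3 arrays * 8 bytes of traffic per element per repetition. */
    double gbytes = 3.0 * 8.0 * (double)N * REPS / 1e9;
    printf("threads=%d  time=%.3fs  ~%.2f GB/s\n",
           omp_get_max_threads(), t1 - t0, gbytes / (t1 - t0));

    free(a); free(b); free(c);
    return 0;
}

If the GB/s figure stops improving beyond 3 threads, the slowdown at 4
cores is probably just bandwidth/cache contention rather than anything
VASP-specific.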
