[OT] Maximum performance on single processor ?

Joseph Landman landman at scalableinformatics.com
Sat Jun 21 09:37:35 EDT 2003

On Fri, 2003-06-20 at 13:43, Marc Baaden wrote:

> landman at scalableinformatics.com said:
> >>  As it does not appear that you have profiled the code (just a guess),
> Half-right half-wrong guess. I have very rapidly profiled the code, just
> confirming what I already guessed, that the bottleneck (> 80% of the time)
> is the energy calculation subroutine, which calculates pairwise interactions
> between atoms in the system.

Excellent.  I am glad I was half wrong :).

> There are indeed potential speedup factors in this code, like
> - introduction of a neighbour list
> - parallelization
> .. or maybe the use of a specialized board (MD-GRAPE) for just these 
> calculations.

One of the fastest speedup methods I have used for someone to accelerate
single processor performance of their code was to look at the pair
interactions, which required a square-root operation, and recast that
from force (which required the square-root operation) to potential
energy, which did not.  That gave me a factor of 5 in that loop which
had been dominating the calculation.  If you can live with 2 potential
energy evaluations at a fixed interval, you might be able to get away
from the square root.  You also want to rewrite the innards to get away
from division, even to the point of pre-calculating it and playing games

Most x86 do not have a hardware accelerated 1/x, or 1/sqrt(x).  Power4
ABI, MIPS n64 ABI, and the PA-RISC ABI do.  The compilers are even
"smart enough" in the case of the SGI compilers, or reasonable enough to
help recast your calculations this way.

Most of the pair-wise interactions I have encountered in research codes
has been a high precision pair potential, or a high precision sum over
all contributions from all interacting orbitals.  If you are dealing wit
MD (sounds like you are), you might look at whether or not you need as
high a precision.  

Other codes I have worked on use cut-off radii which limit the number of
neighbors (and the search through the system for neighbors with which to
compute interactions).  Square-roots tend to pop-up here as well, in the
distance calculation.  It is (usually) trivial (and still meaningful) to
avoid distance, and do a distance squared, so you don't have to pay for
the expensive square root.

> Unfortunately I currently do lack the competence for parallelization or other
> optimizations (probably trying to use SSE2 code would be even better), and
> part of the more specialized methods seems like a complicated and particularly
> short-lived business.

Understood.  You might explore with various groups whether or not they
could help you.  You might be able to work with a local computer
science, or physics/chemistry department to locate an exceptional
student who would gain from the experience of working with a company. 
You certainly could find a company to at least assess the situation for
you and let you know whether or not it is feasable.  If it were, I would
suspect the changes would cost less than 20k$.


> hahn at physics.mcmaster.ca said:
> >>  an athlon MP 2600+)
> >> a quite poor architecture for today; it was competitive a year ago.

I disagree with the characterization of "quite poor".  On certain
classes of code, it is significantly better than its direct competition,
though on other classes it is significantly worse.


> So - if I interpret your and other people's statements right - depending on
> the code, memory might be as important/critical as CPU power ?
> Is there a way to check on what the code depends most ?

Yes.  Using a combination of profiling techniques, and exploitation of
CPU profiling counters, you should be able to track the most expensive
portion of the calculation, and why it is most expensive.  See the
Oprofile bit on http://freshmeat.net
(http://freshmeat.net/projects/oprofile/?topic_id=45) and Troy Baer's
excellent lperfex (http://freshmeat.net/projects/lperfex/?topic_id=861).

Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
  web: http://scalableinformatics.com
phone: +1 734 612 4615

Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

More information about the Beowulf mailing list