Intel performance with loops vs. vectors

David Lombard david.lombard at
Fri Aug 11 14:36:01 EDT 2000

"Lechner, David" wrote:
> Has anyone done any direct comparison of performance of modern Intel or AMD
> processors running floating point operations as both vector operations and
> as simple loops?  Does it make a difference any more?  Some of the results
> posted to this list seem to imply that loop calculations now run as fast as
> vector op.s on today's microprocessors.  We are considering some tests but
> would appreciate any insight or comment people would be willing to provide.

I'm not quite sure what your question is.

A "vector operation" implies the hardware has some sort of vector
instructions.  Intel has MMX and KNI instructions that provide specific
operations on very short vectors, e.g., 4x32-bit.  But neither Intel
nor AMD has a general-purpose vector capability such as that found on
Cray or NEC systems.
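
To make "very short vectors" concrete, here is a minimal sketch in C
of the same element-wise add written as a simple loop and with the
4x32-bit KNI (SSE) intrinsics.  The function names add_scalar and
add_short_vector are made up for illustration, and the intrinsic path
assumes a compiler that defines __SSE__ and provides <xmmintrin.h>;
elsewhere the second function falls back to the plain loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE ("KNI") intrinsics; assumed available */
#endif

/* Simple loop: c[i] = a[i] + b[i], one element per iteration. */
void add_scalar(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Same operation, four 32-bit floats per instruction where possible. */
void add_short_vector(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
#endif
    /* Remainder elements (or everything, if SSE is not available). */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Both functions produce the same result; whether the second is actually
faster depends on the compiler, the data alignment, and how much of the
time goes to memory traffic rather than arithmetic.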

By "vector operations", do you mean, for example, calling a BLAS
operation vs. simply coding the loop directly?  If you do, then the
first point is that you've badly abused the terminology.  At any rate,
there could be an advantage to using BLAS or other library routines if
those routines are carefully tuned, either by guiding the compiler, as
ATLAS does, or by writing the function in assembly language (as we do
for our MSC.Nastran product), or both.
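
As a rough illustration of "guiding the compiler", here is a sketch in
C of a dot product coded directly and again with the loop unrolled
into four independent accumulators.  This is not ATLAS or MSC.Nastran
code; dot_simple and dot_unrolled are hypothetical names, and the
unroll factor of four is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stddef.h>

/* The loop coded directly: one running sum. */
double dot_simple(const double *x, const double *y, size_t n)
{
    assert(x && y);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Unrolled by four with separate partial sums, so consecutive
 * multiply-adds do not all depend on the same accumulator. */
double dot_unrolled(const double *x, const double *y, size_t n)
{
    assert(x && y);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    /* Remainder elements. */
    for (; i < n; i++)
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}
```

Note that the unrolled version adds the terms in a different order, so
for general floating-point data the two results can differ in the last
bits.  A tuned library makes this kind of trade-off for you, which is
one reason timing the library call against your own loop is worth it.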

Another possible interpretation is: are today's Intel and AMD processors
as fast as vector systems?  For that, one must apply the standard
answer, "it depends".  If you have heavy integer and scalar operations
or other "poor" vector situations, then Intel can beat both Cray and
NEC.  If you have something that vectorizes well, you could hit 97% of
theoretical peak on a T90 (as we do on a matrix-matrix multiply), and
Intel is very much slower.

David N. Lombard

Beowulf mailing list
Beowulf at
