Niclas Andersson nican at nsc.liu.se
Thu Apr 5 09:45:01 EDT 2001

> don't get me wrong - AMD makes good stuff.  but just because whiz-bang
> processor comes out doesn't mean you can do whiz-bang stuff right away.
> and if you're running an App that is *performance* bound, then it's
> *your* job to figure out how to make it run as fast as possible on the
> hardware you've got, and optimize it accordingly.  never trust a
> compiler to do it right.  you can easily get 50-500% improvements in
> your programs if you know how to do this.  the same thing applies to
> partitioning the communication latencies in parallel apps.  but as 
> always, this is application-specific and YMMV.

Yes, I definitely agree on that. The possible performance improvements from
optimization are getting larger and larger for every new architecture. However,
before I buy a new computer, and from a technology-watch point of view, I want
to see that it runs faster generally speaking (the
between-thumb-and-finger benchmark). That covers the OS, standard apps,
middleware, etc.: bits and pieces that are already available and that I cannot
or will not manually optimize for every architecture. The computer/compiler
should make a good, general effort on performance for these.  I cannot justify
the high cost of a new P4 with only
possible future performance. Therefore I tend to use, as a first test,
well-known benchmarks. After that it becomes application-specific and often
requires more time and effort to optimize and benchmark. That amount of time is
difficult to get before a purchase is made unless you are talking about a
large-scale cluster.

I was pleasantly surprised by the good NAS performance. A benchmark suite
never tells the whole story, but I have always regarded the NAS benchmark
suite as a good collection of well-written benchmarks that often reveals
strengths and weaknesses and gives a hint of how other codes will perform,
even in other application fields. Apparently, I have to revise my view here.

Unfortunately, I no longer have access to a P4. It was just a test to see how
much effort I would have to spend to get good performance out of a few
applications: whether I could get any boost from just choosing a P4, or
whether I had to work on the code and optimize it. I will wait for some SSE2
support in compilers and then restart the evaluation. In the meantime I'll
stick to Athlons.
They work well in our clusters and provide good performance to our users.

www.anandtech.com has an article today on "Socket-A Chipset Comparison" that
can shed some light on the performance of DDR SDRAM.


