
  • Server CPUs Are Best For HPC (particularly the Opteron) ?
    In some cases they probably are best. If you look at SPEC numbers you will see that processors like the Pentium D hold their own against their larger siblings. While the SPEC benchmarks are an important yardstick, real application benchmarks often give another data point with which to compare processors. The GROMACS molecular dynamics package is known to push processors very hard and is therefore a good test of overall number crunching capability.

    In my tests, a more expensive Opteron 270 (2 GHz) was on average 22% slower than a Pentium D 940 (3.2 GHz) when running the GROMACS single processor benchmarks.

  • More Sockets Are Better ?
    Cramming cores and CPUs onto motherboards sounds like a good idea. A dual socket motherboard can now support four cores (and soon eight cores). In some cases this is a good idea; in others I am not so sure. There is much to understand about how four cores share memory and still reach optimum performance. In addition, the more cores on a motherboard, the more eggs you put in one basket. A failed power supply or motherboard now takes out four cores.

    The recently introduced Intel Caretta motherboard (Model S3000PT) was designed to address these issues. It supports the Intel Pentium 4/Intel Pentium D processor (Presler) and provides four DIMM slots (DDR2 533/667 with ECC, 2-way interleaved, unbuffered), integrated 2-port SATA 3.0 Gb/s with RAID 0 & 1, an ATI ES1000 (16MB), dual gigabit Ethernet LAN, and a 5.95 inch x 13 inch form factor. Interestingly, the form factor is about one half the size of an Extended ATX (12"x13") motherboard. These dimensions allow a standard rack-mount ATX enclosure to hold two Caretta motherboards, which gives higher density, less memory contention, and a lower impact from component failure.

    A standard cluster node can then hold two separate Caretta motherboards, each with its own memory and power supply. The Caretta is only available through integrators. Contact them; they know about it.

  • Gigabit Ethernet Is Too Slow ?
    For some applications gigabit Ethernet is too slow, particularly if you are trying to service four cores on one motherboard. Most people are not aware, however, that properly tuned gigabit Ethernet can be very effective for some applications.

    Using NetPIPE, systems were able to achieve a maximum throughput of 111 MBytes/sec and a single-byte TCP latency of 36 μs using one of the on-board Ethernet ports.

  • Gigabit Ethernet Will Not Scale ?
    As part of the testing, I wanted to see whether gigabit Ethernet could keep up with the processors and what effect the dual cores had on performance. A full accounting of the numbers is in the White Paper. Some of the conclusions are quite interesting:

    The NAS benchmark was run on four, eight, and 16 cores. As would be expected, some codes (Integer Sort) did not scale well over gigabit Ethernet. In the case of the LU benchmark, however, 8 processors using 16 cores delivered a speed-up of 11.7 times for a total of 6.34 GFLOPS.

    For the GROMACS benchmark, the 8-way scaling produced a 6.5 times speed-up (7.57 GFLOPS) and was able to achieve a 9.3 times speed-up (10.84 GFLOPS) using 16 cores.

  • Servers Are the Price-To-Performance Leaders ?
    Leveraging commodity hardware with the proper cluster software can provide quite astounding price-to-performance. Again, price-to-performance should always be cast in terms of an application. For example:

    If the price ratios for Pentium D 940 and Opteron 270 systems are combined with the GROMACS performance data, then the price-to-performance of the Opteron solution is almost double that of the Pentium D solution -- which means you spend almost double the money to get the same performance!
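
The scaling and price-to-performance figures above reduce to a few lines of arithmetic. The sketch below is only an illustration, not the method used in the White Paper. It assumes that "8-way" means 8 cores, that 1 MByte is 10**6 bytes, and that "22% slower" means the Opteron delivers 78% of the Pentium D throughput. The price ratio of 1.5 is a hypothetical placeholder, since the actual price ratios are not given here; substitute your own quotes.

```python
"""Arithmetic behind the scaling and price-to-performance figures (a sketch)."""

# 1 Gbit/s expressed in MBytes/s, assuming 1 MByte = 10**6 bytes.
GIGE_LINE_RATE_MBYTES = 1000 / 8

# (benchmark, cores, speed-up, aggregate GFLOPS) as quoted in the article.
# Assumption: "8-way" GROMACS scaling means 8 cores.
RESULTS = [
    ("NAS LU", 16, 11.7, 6.34),
    ("GROMACS", 8, 6.5, 7.57),
    ("GROMACS", 16, 9.3, 10.84),
]


def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / cores


def implied_single_core_gflops(total_gflops, speedup):
    """Single-core rate implied by an aggregate rate and its speed-up."""
    return total_gflops / speedup


def cost_for_equal_performance(price_ratio, perf_ratio):
    """Relative spend on system A to match system B's performance.

    price_ratio = price_A / price_B, perf_ratio = perf_A / perf_B.
    """
    return price_ratio / perf_ratio


if __name__ == "__main__":
    wire = 111 / GIGE_LINE_RATE_MBYTES
    print(f"NetPIPE throughput as a fraction of line rate: {wire:.1%}")

    for name, cores, speedup, gflops in RESULTS:
        eff = parallel_efficiency(speedup, cores)
        base = implied_single_core_gflops(gflops, speedup)
        print(f"{name:8s} {cores:2d} cores: efficiency {eff:.1%}, "
              f"implied single-core rate {base:.2f} GFLOPS")

    # HYPOTHETICAL price ratio (Opteron system / Pentium D system).
    hypothetical_price_ratio = 1.5
    # Assumption: "22% slower" is read as 78% of the Pentium D throughput.
    opteron_perf_ratio = 0.78
    spend = cost_for_equal_performance(hypothetical_price_ratio, opteron_perf_ratio)
    print(f"Relative spend for equal GROMACS performance: {spend:.2f}x")
```

Dividing speed-up by core count makes the gigabit Ethernet results easier to compare across benchmarks, and dividing the price ratio by the performance ratio casts price-to-performance in terms of one specific application, as suggested above.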

Next Steps

If I had more time, I would have tested many more applications and worked on improving the current numbers, but the performance picture quickly became clear. If you are interested in getting more bang for your buck, then test your assumptions and take a look at all the hardware options, even the ones you dismissed in the past. Here are a few steps that can help you get started:
  1. Download the White Paper and read it carefully.
  2. Consult experienced integrators, like Appro, about your applications and needs. They are qualified to take commodity technology and turn it into a real industrial-strength cluster.
  3. Consult an experienced software partner, like Basement Supercomputing. They understand how to get the most out of your cluster and will be there when it is time to upgrade or make changes.
  4. Finally, never assume! If possible, test as many assumptions as you can and do not be afraid to rethink your position.
Clustering began in the mid 1990s with commodity-off-the-shelf (COTS) hardware. It was a good idea then and it appears to be a good idea now.

Sidebar: Testing Methodology

Tests were conducted using eight dual core Intel Pentium D (model 940) Presler servers operating at 3.2 GHz. Each server used a Nobhill motherboard (Intel Model SE7230NH1), which is functionally equivalent to the Caretta motherboard but larger in size. Each node had 8GB of DDR2 RAM and two gigabit Ethernet ports (only one of which was used for the testing). An SMC 8508T Ethernet switch was used to connect the servers. Ethernet drivers were set to provide the best performance for a given test. In addition, where appropriate, the MPI tests were run with the "sysv" flag so that processor cores on the same node communicate through memory. Contact the author for details. The software environment was as follows:

Base OS: Linux Fedora Core 4 (kernel 2.6.11)
Cluster Distribution: Basement Supercomputing Baseline Suite
Compilers: gcc and gfortran version 4.0.2
MPI: LAM/MPI version 7.1.1


  • Appro International: Appro is a leading developer of high-performance, density-managed servers, storage subsystems, and high-end workstations for the high-performance and enterprise computing markets.
  • Benchmark and Author Contact Information: Raw benchmark data are available here. Douglas Eadline, PhD can be reached at deadline ( at ) basement-supercomputing ( period ) com
  • GROMACS: GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.
  • The Basement Supercomputing Baseline Cluster Suite is a collection of highly integrated and optimized Linux cluster software.
  • NAS Parallel Benchmark: These tests are a small set of programs designed to help evaluate the performance of parallel supercomputers. The benchmarks, which are derived from computational fluid dynamics (CFD) applications, consist of five kernels and three pseudo-applications.

Creative Commons License
©2005-2019 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.