Remember Commodity Clusters?

Published on Tuesday, 15 August 2006 21:00
Written by Douglas Eadline

Of Pentium D, Ethernet, and those assumptions we all make

Recently, I did some benchmarking using Intel Pentium D processors and gigabit Ethernet. The data are pretty impressive. If I were a non-technical person, I would probably say, "Pentium D kicks ass," but you know, I like numbers and have a professional reputation to uphold. Therefore, in a professional sense I can say, "Pentium D really kicks ass." To prove my point, this article presents some of the highlights from a recent white paper I prepared for Appro International called Achieving High Performance at Low Cost: The Dual Core Commodity Cluster Advantage. For a more complete description of the tests and results (including the benchmark numbers), you will probably want to download the white paper.

Back In The Day

Back when clusters started stirring up trouble in the High Performance Computing (HPC) world, there were those who said things like, "there is no way commodity hardware can stand up against real iron," or "you cannot build a real supercomputer from PC parts." We all know how that turned out.

Today's cluster nodes typically have dual cores sitting in dual sockets connected by a low latency/high throughput network. In market terms, this is data center/server-level hardware -- the good stuff (and expensive). At the lower end of the spectrum is desktop hardware, which one would assume is not really up to snuff as far as HPC goes. You certainly cannot build a real supercomputer from this type of hardware; you would probably have to use gigabit Ethernet, for heaven's sake! Sounds like an assumption to me. Some numbers are needed.

In the past, fellow monkey Jeff Layton and I have written about very low cost commodity computing, where $2500 can get you 14.5 GFLOPS running HPL (the Top500 benchmark). These results can easily be improved upon today, as the tests were performed in 2004. Indeed, the introduction of low cost dual-core processors, combined with some innovative motherboards, makes the commodity proposition a very real alternative to high end server hardware. Enough talk, let's get to the results because they tell the real story.
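As a rough sanity check, the 2004 numbers above work out to a strikingly low price per GFLOPS (the corresponding figures for the new Pentium D cluster are in the white paper):

```shell
# Price/performance of the 2004 commodity build cited above:
# $2500 for 14.5 GFLOPS of measured HPL performance.
awk 'BEGIN { cost = 2500; gflops = 14.5
             printf "%.0f dollars per HPL GFLOPS\n", cost / gflops }'
```

That is, roughly $172 per sustained HPL GFLOPS, a number that was hard to touch with traditional big iron at the time.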

Pentium D You Say?

For the tests, I used the recently introduced 3.2 GHz Pentium D (Presler) processor from Intel (which will eventually be replaced by the Xeon 3000 line). The Presler series is a dual-core processor manufactured using the latest 65 nm process and is currently available at speeds up to 3.40 GHz. More importantly for HPC users, each Presler has 4 MB of on-chip cache, which it divides evenly between the two cores (2 MB each). These caches are fed by an 800 MHz front-side bus (FSB) and DDR2 memory. We used eight of these processors to create a 16-core cluster.
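For back-of-the-envelope purposes, the theoretical peak of such a cluster is easy to work out, assuming the commonly quoted two double-precision floating-point operations per clock for this processor family (measured, sustained numbers are in the white paper and will of course be lower):

```shell
# Theoretical peak of the 8-node, 16-core test cluster, assuming
# 2 DP flops per clock per core (the usual figure for these cores).
awk 'BEGIN { nodes = 8; cores = 2; ghz = 3.2; flops_per_clock = 2
             printf "%.1f GFLOPS peak\n", nodes * cores * ghz * flops_per_clock }'
```

Keep in mind that peak is a ceiling, not a promise; the interesting question is how close commodity parts get to it on real benchmarks.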

Dangerous Assumptions

As a way to introduce the results, let's look at some of the assumptions currently floating around the HPC market, but first, the standard advisory. As with all things cluster, performance depends on your application. If your application(s) do not behave like the benchmarks, then you may want to do your own testing. In my testing, I used the NAS Parallel Benchmark suite and the GROMACS molecular dynamics package. You may also want to look at Parallel Molecular Dynamics: Gromacs by Erik Lindahl.

The cluster consisted of eight Pentium D 940 (3.2 GHz) processors (16 cores total), one per motherboard, connected with an SMC 8 port gigabit Ethernet switch. (See the Testing Methodology Sidebar at the end of the article for more information.) Based on my testing, the following assumptions may be worth checking:


Next Steps

If I had more time, I would have tested many more applications and worked on improving the current numbers, but the performance picture quickly became clear. If you are interested in getting more bang for your buck, then test your assumptions and take a look at all the hardware options, even the ones you dismissed in the past. Here are a few steps that can help you get started:
  1. Download the White Paper and read it carefully.
  2. Consult experienced integrators, like Appro, about your applications and needs. They are qualified to take commodity technology and turn it into a real industrial strength cluster.
  3. Consult an experienced software partner, like Basement Supercomputing, who understand how to get the most out of your cluster and will be there when it is time to upgrade or make changes.
  4. Finally, never assume! If possible, test as many assumptions as you can, and do not be afraid to rethink your position.
Clustering began in the mid-1990s with commodity-off-the-shelf (COTS) hardware. It was a good idea then, and it appears to be a good idea now.

Sidebar: Testing Methodology

Tests were conducted using eight dual-core Intel Pentium D (model 940) Presler servers operating at 3.2 GHz. Each server used a Nobhill motherboard (Intel model SE7230NH1), which is functionally equivalent to the Caretta motherboard but larger in size. Each node had 8 GB of DDR2 RAM and two gigabit Ethernet ports (only one of which was used for the testing). An SMC 8508T Ethernet switch connected the servers. Ethernet drivers were set to provide the best performance for a given test. In addition, where appropriate, the MPI tests were run with the "sysv" flag so that cores on the same processor communicate through shared memory. Contact the author for details. The software environment was as follows:

Base OS: Linux Fedora Core 4 (kernel 2.6.11)
Cluster Distribution: Basement Supercomputing Baseline Suite
Compilers: gcc and gfortran version 4.0.2
MPI: LAM/MPI version 7.1.1
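As a sketch of what one benchmark run looked like under this environment: in LAM/MPI, the "sysv" setting mentioned above is selected as an SSI RPI module on the mpirun command line, which routes traffic between ranks on the same node through System V shared memory and uses TCP between nodes. The host file and binary name below are placeholders, not the actual files used:

```shell
# Hypothetical run sketch for LAM/MPI 7.1 (hostfile and binary names
# are placeholders; the actual test scripts are not reproduced here).
lamboot hostfile                        # start the LAM runtime on the eight nodes
mpirun -np 16 -ssi rpi sysv ./cg.B.16   # e.g. NPB CG kernel, class B, 16 ranks
lamhalt                                 # shut the LAM runtime down
```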

References

  • Appro International: Appro is a leading developer of high-performance, density-managed servers, storage subsystems, and high-end workstations for the high-performance and enterprise computing markets.
  • Benchmark and Author Contact Information: Raw benchmark data are available here. Douglas Eadline, PhD can be reached at deadline ( at ) basement-supercomputing ( period ) com
  • GROMACS: GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins and lipids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.
  • The Basement Supercomputing Baseline Cluster Suite is a collection of highly integrated and optimized Linux cluster software.
  • NAS Parallel Benchmark: These tests are a small set of programs designed to help evaluate the performance of parallel supercomputers. The benchmarks, which are derived from computational fluid dynamics (CFD) applications, consist of five kernels and three pseudo-applications.
