This installment of the Best of the Beowulf Mailing List discusses issues about
Serial ATA (SATA) drives, I/O benchmarks, cluster benchmarks, and booting from
solid state USB. You can consult the archives for the actual conversations.
SATA or SCSI drives - Multiple Read/Write Speeds
There was an interesting discussion on the beowulf mailing list that
started on December 8, 2003 with a posting from Robin Laing who asked
about SATA drives (Serial ATA) versus IDE drives (also called Parallel
ATA or PATA drives) versus SCSI drives. In particular, he wanted to
know which one was better for multiple drive read and write operations.
While the resulting discussion wasn't about clusters per se, disk
I/O (Input/Output) performance can have a great impact on many
cluster applications. Bill Broadley responded that there was some
bad information and biases floating around about drive performance
(e.g. IDE versus SCSI) and strongly suggested benchmarking your own
code or some disk benchmark, such as Bonnie++ or Postmark (Postmark seems to have gone away), that is
close to your application. He pointed out that there are many factors
that can be adjusted to affect I/O performance. The discussion then
broke into two parts. The first part discussed opinions and test
results of PATA and SCSI drives and controllers.
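
Bill's advice to measure your own workload can be tried with something as small as the sketch below. It is not Bonnie++ or Postmark, only a rough sequential write and read timer; the function name `sequential_throughput`, the 64 MB file size, and the 1 MB block size are assumptions chosen for illustration. A file this small will largely be read back from the page cache, so for meaningful read numbers the file should be much larger than memory.

```python
import os
import tempfile
import time


def sequential_throughput(path, total_bytes=64 * 1024 * 1024, block_size=1024 * 1024):
    """Return (write_MBps, read_MBps) for one sequential pass over a scratch file.

    Hypothetical helper for illustration; sizes are arbitrary defaults.
    """
    block = os.urandom(block_size)
    n_blocks = total_bytes // block_size

    # Timed sequential write, including the flush to the device.
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push data out of the page cache before stopping the clock
    write_s = time.perf_counter() - start

    # Timed sequential read of the same file.
    start = time.perf_counter()
    read_bytes = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            read_bytes += len(chunk)
    read_s = time.perf_counter() - start

    mb = 1024 * 1024
    return (n_blocks * block_size / mb / write_s, read_bytes / mb / read_s)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        w, r = sequential_throughput(os.path.join(d, "scratch.bin"))
        print(f"write {w:.1f} MB/s, read {r:.1f} MB/s")
```

Pointing `path` at the file system and drive you actually intend to use, with block sizes that resemble your application's, is what makes the numbers relevant.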
The ever-present Mark Hahn followed up with some general rules of thumb
for different types of drives and pointed out that Jeff Garzik is
writing all new SATA drivers for the 2.6 kernel that should greatly
improve performance of SATA drives under Linux. Robert Brown asked Mark
his general opinion about SATA. Mark answered Robert's questions and
pointed out that his next server will have SATA controllers in it
(hint, hint). David Lombard followed up that he personally liked SCSI
drives because in his experience he could get much higher I/O rates
regardless of the CPU load (PATA drives involve more CPU usage than
SCSI drives). David mentioned that he had seen an I/O rate of
290 MB/second on x86 systems using SCSI drives. Bill Broadley followed
up with a large number of questions and offered that he has seen PATA
drive arrays reach speeds of 300-400 MB/second even when the CPU load
was fairly high. David and Bill discussed some technical details,
including the fact that these tests were done using multiple
controllers and running RAID-0 (striping). Bill finally finished with
the comment that he saw greater I/O rates using XFS as opposed to ext3.
To reinforce Bill's comments, there was a recent posting on the
Linux-IDE-Arrays mailing list from Dan Yocum with some test results
for a SATA drive array that uses three 3ware 8506-8 SATA RAID cards.
The cards were configured with hardware RAID-5, then striped with
software RAID-0 across the three cards. Using Bonnie++ on a 125 GB
file with 64 KB chunks, Dan was able to achieve about 230 MB/sec for
block writes and 520 MB/sec block reads.
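
To make the striping idea concrete, here is a minimal sketch of how RAID-0 maps a logical byte offset onto one of several devices. The function name `raid0_locate` is hypothetical, and treating the stripe chunk as 64 KB across three devices is an assumption for illustration; the 64 KB in Dan's post is the Bonnie++ chunk size, and the actual stripe size of his array is not given.

```python
def raid0_locate(offset, n_devices, chunk_size):
    """Map a logical byte offset to (device index, byte offset on that device).

    Chunks are dealt round-robin across devices, which is why large
    sequential transfers keep every device (or controller) busy at once.
    """
    chunk_index, within = divmod(offset, chunk_size)
    stripe, device = divmod(chunk_index, n_devices)
    return device, stripe * chunk_size + within


if __name__ == "__main__":
    chunk = 64 * 1024  # assumed stripe chunk size
    for logical in (0, chunk, 3 * chunk, 4 * chunk + 10):
        print(logical, raid0_locate(logical, n_devices=3, chunk_size=chunk))
```

With three devices, consecutive chunks land on devices 0, 1, 2, 0, 1, and so on, so a logical offset of four chunks plus 10 bytes falls on device 1 at byte 65546.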
The second branch of the discussion dealt with some observations about
the Linux kernel in relation to I/O performance. Robin Laing responded
to the initial discussion by stating that his application used one or
two files that were much larger than memory while his code was
running. He noticed that his machine 'stutters' for a few seconds
every time there is a disk access. Mark Hahn responded that he thought
the 'stutter' was not a drive problem but rather a memory
management problem within the kernel. He offered the observation that
Linux seemed to over-cache and can get to a point where it is constantly
running scavenging scans (looking for cached data that it is not likely
to need, dumping it, and then re-caching data). Robert Brown followed this with some comments,
which Mark had sent him off-line, that too much memory seems to confuse
the caching system of some kernels. Robert also mentioned that when
he used a 1.8 GHz P4 system as a server he also saw some "mini-delays."
When he took the exact same drives and put them in a 400 MHz Celeron
system he got better performance.
This discussion was very useful in pointing out the need to test the
entire system, from the exact hard drives, to the RAID configuration
to be used (if one is to be used), to the exact kernel and kernel
configuration, to determine the I/O performance. Using multiple
controllers (whether SCSI or PATA or SATA), RAID-0, and XFS seems
to provide the best I/O performance. However, you need to pay
attention to the kernel and kernel configuration to extract the
best performance possible.
New Cluster Benchmark
Bill Broadley posted to the Beowulf mailing list on November 17, 2003
about a better benchmark for clusters than the Top500 benchmark. This
post grew out of a discussion about Virginia Tech's fantastic Top500
result on their new Apple G5 cluster. Bill was interested in the
performance of larger clusters since they are starting to dominate
the Top500 benchmark. In particular he thought the big difficulty
for larger clusters was scaling, which is usually an interconnect
issue. Jakob Oestergaard responded that he thought the Top500 was a
fine benchmark for what it was. But it's definitely not a benchmark
that measures the true power of a cluster for one's particular
application. He thought that developing a series of benchmarks to
quantify a cluster's performance would render the benchmarks useless
(he was also the first to use the famous paraphrase, "There are
lies, damn lies, and statistics..."). Robert Brown joined in
stating that the one true benchmark was one's application. Robert
brought up the point that he thought microbenchmarks (a
microbenchmark tests only a single small aspect of a single system
or a cluster) were more appropriate for benchmarking machines. He
suggested something like Larry McVoy's lmbench benchmark suite.
Moreover, he thought Larry's insistence that lmbench results could
only be published if all of the results are published was a very
good idea (chip companies are notorious for only publishing
certain benchmark results that make their chips look good). He then
stated that in his opinion he would like to see a full suite of
microbenchmarks to test core functions "that are building blocks of
real programs." These would include some microbenchmarks to test
clusters. He finally finished with a typical Brownian comment
that the Top500 benchmark was really intended to measure the size
of one's, umm, cluster and nothing else.
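
As a rough illustration of what a microbenchmark looks like, the sketch below times one narrow operation, copying a buffer in memory, and reports the best of several runs. It is not lmbench and does not reproduce any of its tests; the function name `copy_bandwidth`, the 32 MB buffer, and the repeat count are assumptions, and Python overhead means the figure is only a loose lower bound on what the hardware can do.

```python
import time


def copy_bandwidth(n_bytes=32 * 1024 * 1024, repeats=5):
    """Return the best observed memory-copy rate in MB/s (10**6 bytes per second).

    Hypothetical helper: takes the fastest of several runs to reduce
    the effect of scheduling noise and cold caches.
    """
    src = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full copy of the buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        del dst
    return n_bytes / best / 1e6


if __name__ == "__main__":
    print(f"memory copy: {copy_bandwidth():.0f} MB/s")
```

A full suite in the spirit Robert describes would collect many such narrow measurements (latency, bandwidth, context switches, network round trips) and report all of them together rather than a favorable subset.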
John Hearns pointed out that the old Paralogic website has a link (since
moved) to a set of tools called
the Beowulf Performance Suite (BPS). Robert Brown followed up
that Doug Eadline (then editor of ClusterWorld Magazine) had done a
good job putting together BPS and perhaps in the future a gathering
of cluster experts could extend it and define a good and useful
series of cluster benchmarks. Doug replied that BPS is called a
Performance Suite, not a Benchmark Suite, because it
should be used to generate a baseline against which to measure
changes (good or bad) to the cluster. Felix Rauch also
chimed in with some very good comments about measuring network
performance in clusters. Robert Brown really liked Felix's comments
and went on to talk about a network microbenchmark that would
watch the performance of the system and switch algorithms at the
appropriate time to improve performance.
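
The baseline idea Doug describes can be sketched in a few lines: record a set of measurements once, then flag any metric that later moves by more than some tolerance. This is not how BPS itself works; the function name `compare_to_baseline`, the metric names, and the 5% tolerance are all assumptions made for illustration.

```python
def compare_to_baseline(baseline, current, tolerance=0.05):
    """Return {metric: relative change} for metrics that moved more than `tolerance`.

    A positive value means the current number is higher than the baseline,
    a negative value means it is lower. Metrics missing from `current`
    or with a zero baseline are skipped.
    """
    changes = {}
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue
        rel = (current[name] - base) / base
        if abs(rel) > tolerance:
            changes[name] = rel
    return changes


if __name__ == "__main__":
    # Hypothetical numbers, not measurements from any real cluster.
    baseline = {"net_bandwidth_MBps": 100.0, "disk_write_MBps": 200.0}
    current = {"net_bandwidth_MBps": 102.0, "disk_write_MBps": 150.0}
    print(compare_to_baseline(baseline, current))
```

Run after a kernel upgrade, driver change, or hardware swap, a comparison like this shows whether the change helped or hurt, which is a different goal from ranking one cluster against another.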
The Top500 benchmark is a simple benchmark with a long history. It
has provided useful information about the general trend in high
performance computing, that is, the increasing dominance of
clusters. However, using it to say my cluster is faster than
yours is a bit like using the heights of basketball players to
indicate how good they are. The heights are not an accurate indication
of how good a team is, and the Top500 is not a measure of how useful
a cluster is (although it is fun to play with). The discussion was
very useful in providing good suggestions about how benchmarking
for clusters should proceed.