[Beowulf] RE: SATA or SCSI drives - Multiple Read/write speeds.

Andrew Latham lathama at yahoo.com
Tue Dec 9 22:16:44 EST 2003


Amateur thought, but give it a read.

Would the advances in compressed filesystems like cramfs allow you to access
the 18 GB of info on 6 GB of RAM? I do not know what the file type is, and I
am assuming that it is not flat text (XML or other). If, however, you were
working on a dataset in XML at about 18 GB, would a compressed filesystem on
6 GB of RAM be fast?
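A toy sketch of the idea, with Python's zlib standing in for whatever
compressor the filesystem would actually use (the sample data and the 3:1
break-even figure are just illustrative; real XML and a real filesystem
compressor will behave differently):

```python
import zlib

# Hypothetical stand-in for an 18 GB XML dataset: repetitive, markup-heavy
# text tends to compress well, which is the whole premise of the question.
record = b"<row><id>%d</id><value>some measurement text</value></row>\n"
data = b"".join(record % i for i in range(10_000))

compressed = zlib.compress(data, level=6)
ratio = len(data) / len(compressed)

print(f"original:   {len(data)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.1f}:1")

# 18 GB into 6 GB of RAM needs roughly 3:1 or better; if real data
# compressed that well, the remaining question is decompression speed.
```

Of course this says nothing about random-access or decompression overhead,
which is where a real compressed filesystem would win or lose.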

Andrew Latham
Wanna Be Employed :-)



--- Bill Broadley <bill at cse.ucdavis.edu> wrote:
> On Tue, Dec 09, 2003 at 04:37:32PM -0800, Lombard, David N wrote:
> > From: Bill Broadley [mailto:bill at cse.ucdavis.edu]
> > > 
> > > On Tue, Dec 09, 2003 at 07:03:12AM -0800, Lombard, David N wrote:
> > > > Very big pro:  You can get much higher *sustained* bandwidth levels,
> > > > regardless of CPU load.  ATA/PATA requires CPU involvement, and
> > > > bandwidth tanks under moderate CPU load.
> > > 
> > > I've heard this before, I've yet to see it.  To what do you attribute
> > > this advantage?  DMA scatter gather?  Higher bitrate at the read head?
> > 
> > Non-involvement of the CPU with direct disk activities (i.e., the bits
> > handled by the SCSI controller)
> 
> Er, the way I understand it is with PATA, SCSI, or SATA the driver
> basically says Read or write these block(s) at this ADDR and raise
> an interrupt when done.  Any corrections?
> 
> > plus *way* faster CPU to handle the
> > high-level RAID processing
> 
> I'm a big fan of software RAID, although it's not a SATA vs SCSI issue.
> 
> > v. the pokey processors found on most RAID
> > cards. 
> 
> Agreed.
> 
> > With multiple controllers on separate busses, I don't funnel all
> > my I/O through one bus.  Note again, I only discuss maximal disk
> > bandwidth, which means RAID-0.
> 
> Right, sorry I missed the mention.
> 
> > Direct measurement with both standard testers and applications.
> > Sustained means a dataset substantially larger than memory to avoid
> > cache effects.
> 
> Seems that it's fairly common to manage 300 MB/sec +/- 50 MB/sec from
> 1-2 PCI cards.  I've done similar with 3 U160 channels on an older
> dual P4.  The URL I posted shows the same for SATA.
> 
> > > > The highest SCSI bandwidth rates I've seen first hand are 290 MB/S
> > > > for IA32 and 380 MB/S for IPF. Both had two controllers on
> > > > independent PCI-X busses, 6 disks for IA32 and 12 for IPF in a
> > > > s/w RAID-0 config.
> >     ==========
> > > Was this RAID-5?  In Hardware?  In Software?  Which controllers?
> 
> > See underlining immediately above.
> 
> Sorry.
> 
> > > Do you have any reason to believe you wouldn't see similar with the
> > > same number of SATA drives on 2 independent PCI-X busses?
> > 
> > I have no info on SATA, thus the question later on.
> 
> Ah, well the URL shows a single card managing 250 MB/sec which decays
> to 180 MB/sec on the slower tracks.  Filesystems, PCI busses, and memory
> systems seem to become a factor here.  I've not seen much more than
> 330 MB/sec (my case) up to 400 MB/sec (various random sources).  Even
> my switch from ext3 to XFS helped substantially.  With ext3 I was getting
> 265-280 MB/sec, with XFS my highest sustained sequential bandwidth was
> around 330 MB/sec.
> 
> Presumably the mentioned RAIDCore card could perform even better with
> RAID-0 than RAID-5.
> 
> > > I've seen 250 MB/sec from a relatively vanilla single controller
> > setup.
> > 
> > What file size vs. memory?
> 
> 18 GB of file I/O with 6 GB RAM on a dual P4 1.8 GHz
> 
> > and what CPU load *not* associated with
> > actually driving the I/O?
> 
> None, just a benchmark, but it showed 50-80% CPU usage for a single CPU,
> and this was SCSI.  I've yet to see any PC-based I/O system shove this
> much data around without significant CPU usage.
> 
> > Direct measurement with both standard testers and applications.
> > Sustained means a dataset substantially larger than memory to avoid
> > cache effects.
> 
> Of course, I use a factor of 4 minimum to minimize cache effects.
> 
> > You repeated my comment, "fairly high rates of cpu usage" -- high cpu
> > usage _just_to_drive_the_I/O_ meaning it's unavailable for the
> > application.  Also, are you quoting a burst number, that can benefit
> > from caching, or a sustained number, where the cache was exhausted long
> > ago?
> 
> Well, the cost of adding an additional CPU to a fileserver is usually
> fairly minimal compared to the cost of owning a few TB of disk.  My
> system was configured to look like a quad P4-1.8 (because of hyperthreading)
> and one CPU would be around 60-80% busy depending on the FS and which stage
> of the benchmark was running.  I was careful to avoid cache effects.
> 
> I do have a quad CPU opteron I could use as a test bed as well.
> 
> > The high cpu load hurts scientific/engineering apps that want to access
> > lots of data on disk, and burst rates are meaningless.
> 
> Agreed.
> 
> > In addition, I've
> > repeatedly heard that same thing from sysadmins setting up NFS servers
> > -- the ATA/PATA disks have too great a *negative* impact on NFS server
> > performance -- here the burst rates should have been more significant,
> > but the CPU load got in the way.
> 
> An interesting comment, and one I've not noticed personally; can
> anyone offer a benchmark or application?  Was this mostly sequential?
> Mostly random?  I'd be happy to run some benchmarks over NFS.
> 
> I'd love to quantify an honest-to-god advantage in one direction or
> another, preferably collected from some kind of reproducible workload
> so that the numerous variables can be pruned down to the ones
> with the largest effect on performance or CPU load.
> 
> -- 
> Bill Broadley
> Information Architect
> Computational Science and Engineering
> UC Davis
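P.S. The sustained-read measurement Bill describes seems simple enough to
script; here is a rough Python sketch (the file name and sizes are
placeholders, and only a file several times larger than RAM gives an honest
number):

```python
import os
import time

def sustained_read_mb_s(path, block_size=1 << 20):
    # Sequentially read the whole file and return MB/sec.  Only meaningful
    # if the file is several times larger than RAM (Bill uses a factor of
    # 4 minimum), so the number reflects the disks, not the page cache.
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / elapsed

# Tiny demo file so the sketch runs anywhere; a real test would use a
# file around 4x physical RAM on the filesystem under test.
with open("demo.bin", "wb") as f:
    f.write(os.urandom(4 * 1024 * 1024))
rate = sustained_read_mb_s("demo.bin")
print(f"{rate:.0f} MB/sec (cache-warm here, so wildly inflated)")
os.remove("demo.bin")
```

Run against a file 4x RAM on the target filesystem, while watching CPU
usage separately, and you would have a start on the reproducible workload
Bill is asking for.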
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


