[Beowulf] Software Raid
landman at scalableinformatics.com
Tue Dec 13 23:12:46 EST 2005
Michael T. Prinkey wrote:
> Honestly, I am wondering if the Software/Hardware RAID argument has
> devolved to the state of SCSI versus ATA or (heaven forfend) emacs versus
> vi. 8)
Possibly, though I am getting the sense that one of the participants is
missing some of the points being made, possibly due to a language issue.
> My experiences with hardware raid have been consistently lackluster over
> 7 years and several generations of hardware. My experiences with software
> RAID servers (specifically built for the task) have been largely positive.
> When I read comments extolling the virtues of hardware RAID solutions, I
> find myself constantly wondering if I could be missing something after
> so many years and many dozens of deployed units.
Hardware raid has a specific domain of utility, as does software raid.
There is some overlap. I am still not sure if I can use software RAID1
to do automatic (i.e. no fingers touching the keyboard) rebuilds on a
failed mirrored boot/root drive. I would like to.
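
As a rough illustration of the detection half of a hands-off rebuild, the
sketch below scans /proc/mdstat-style text for arrays with a missing member.
It does not answer the question of whether the rebuild itself can be fully
automatic. The function name and the sample text are made up for this
example, and the parser assumes the common "[2/1] [U_]" status layout.

```python
import re

def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status field shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status lines look like "... blocks [2/1] [U_]"; "_" marks a missing disk.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current is not None:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded

# Illustrative sample only, not output captured from a real machine.
SAMPLE_MDSTAT = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      78043648 blocks [2/1] [U_]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live system the text would come from reading /proc/mdstat. What to do
once a degraded array is found (for example, re-adding a replacement
partition) is left out on purpose.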
Under very heavy load (heavy memory usage, heavy CPU usage), a deadlock
may be possible in the raid-to-file-system path, or in contention between
buffer manipulation and memory space. I say "may" because I have not seen
this under heavy load, but it is a possible "corner case". I do see
regular old file system access, say with ext3, losing performance rather
badly due to its journaling issues.
I haven't looked at ext3 code recently, so I cannot comment on
whether or not the journaling code is a point of serialization. Under
heavy load, you are far more likely to be running into ext3 limits than
the SW raid system. Of course you could use a better file system, in
which case you are more likely to stress the SW raid.
We use xfs on RAID0 for local disk performance on compute nodes. With
a little tuning, we are hitting a fairly nice (sustained) 140 MB/s
across two SCSI disks on large block reads for some applications that
need this (per node), using SW raid. We are hitting about 120 MB/s on
SATA drives for a similar IO system.
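
For anyone wanting a comparable large-block read number on their own node,
here is a minimal sketch of a sequential read timer. It is not the tool
behind the figures above. The function name, the 1 MiB block size, and the
scratch file are assumptions for illustration. A small scratch file will be
served from the page cache, so a meaningful test needs a file larger than
RAM on the array being measured.

```python
import os
import tempfile
import time

def sequential_read_mb_per_s(path, block_size=1024 * 1024):
    """Read a file front to back in large blocks and return MiB per second."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return total / (1024 * 1024) / elapsed

# Usage on a throwaway 4 MiB scratch file (far too small for a real benchmark).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * 1024 * 1024))
    scratch = tmp.name
rate = sequential_read_mb_per_s(scratch)
os.remove(scratch)
print(f"{rate:.1f} MiB/s")
```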
> To provide numbers, I am really only concerned if the raid array can
> saturate the gigabit line feeding it. On-server performance is pretty
> useless as no work is ever done directly on the RAID servers. For reading
We have customers running atop/iftop all the time. It's nice to see your
NFS server pushing 300+ MB/s through the switches to the clients beating
on it. The problem we are running into is interrupts on the cards.
Most of the kernels are compiled with NAPI off, so without rebuilding
kernels there is no way to tell whether NAPI would mitigate a real live
interrupt tsunami from heavy NFS IO. For a number of reasons, we would
like to avoid rebuilding kernels (ask RGB why it is a bad idea).
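
One way to put a number on the interrupt load without touching the kernel
is to diff two snapshots of /proc/interrupts. The sketch below assumes the
usual layout (a header row of CPU names, then one row per IRQ with per-CPU
counts followed by a description). The function names and the sample
snapshots are invented for illustration.

```python
def parse_interrupts(text):
    """Map IRQ label -> (total count across CPUs, description)."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals

def interrupt_rates(before_text, after_text, seconds):
    """Interrupts per second for each IRQ between two snapshots."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    return {
        irq: (after[irq][0] - before.get(irq, (0, ""))[0]) / seconds
        for irq in after
    }

# Made-up snapshots taken 2 seconds apart; eth0 sits on IRQ 16 here.
BEFORE = """           CPU0       CPU1
  0:     100000          0    IO-APIC-edge  timer
 16:       1000        500   IO-APIC-level  eth0
"""
AFTER = """           CPU0       CPU1
  0:     102000          0    IO-APIC-edge  timer
 16:      21000      10500   IO-APIC-level  eth0
"""

rates = interrupt_rates(BEFORE, AFTER, seconds=2)
print(rates["16"])
```

On a real server the two strings would be read from /proc/interrupts a few
seconds apart while the NFS load is running.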
> data, the server could certainly saturate gigabit...Bonnie on the NFS
> mount gave roughly 85 MB/sec for software RAID5. When we deployed an
> 8-drive RAID5 array using hardware RAID on the SATA 3ware card,
> performance was on the order of 15 MB/sec. We had initially deployed
> these raid servers using the hardware RAID5 setup, but we had several user
> complaints about poor storage performance. So we retooled with software
> RAID and the rest is history.
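
To put the quoted numbers against the wire, here is a small arithmetic
sketch. It assumes decimal megabytes (1 Gbit/s = 125 MB/s) and ignores
protocol overhead, so the fractions are optimistic. The helper names are
made up for this example.

```python
import math

def line_rate_mb_per_s(link_gbit=1.0):
    """Raw link rate in MB/s, ignoring Ethernet/IP/NFS overhead."""
    return link_gbit * 1000 / 8

def fraction_of_link(mb_per_s, link_gbit=1.0):
    """Share of one link's raw rate that a given throughput represents."""
    return mb_per_s / line_rate_mb_per_s(link_gbit)

def links_needed(mb_per_s, link_gbit=1.0):
    """Minimum number of links needed to carry a given throughput."""
    return math.ceil(mb_per_s / line_rate_mb_per_s(link_gbit))

print(fraction_of_link(85))   # software RAID5 figure quoted above
print(fraction_of_link(15))   # hardware RAID5 figure quoted above
print(links_needed(300))      # an NFS server pushing 300 MB/s
```

Under these assumptions, 85 MB/s is about two thirds of a single gigabit
link and 15 MB/s about an eighth, while 300 MB/s cannot fit on fewer than
three gigabit links.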
Use what works. We use both. HW raid where we must have
no-fingers-on-keyboard hotswap. SW raid for other things (local drive
performance). Note though that 3ware and others don't generally perform
well out of the box without some tweaking. After a little tuning
(blockdev and other bits), they can scream. If your server is
overloaded with interrupts from lousy network cards (grr), you probably
don't want to add more context-switching sources (SW RAID).
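
The post does not say which blockdev settings were changed. One commonly
adjusted knob is read-ahead, which blockdev expresses in 512-byte sectors.
The sketch below only builds the command lines and does not run them. The
device name and the 8 MiB value are placeholders, not recommendations.

```python
SECTOR_BYTES = 512  # blockdev --setra / --getra count in 512-byte sectors

def kib_to_sectors(kib):
    """Convert a read-ahead size in KiB to 512-byte sectors."""
    return kib * 1024 // SECTOR_BYTES

def getra_command(device):
    """Command line that reports the current read-ahead of a block device."""
    return ["blockdev", "--getra", device]

def setra_command(device, readahead_kib):
    """Command line that sets read-ahead on a block device (needs root)."""
    return ["blockdev", "--setra", str(kib_to_sectors(readahead_kib)), device]

# Hypothetical device and value, for illustration only.
print(" ".join(getra_command("/dev/sda")))
print(" ".join(setra_command("/dev/sda", 8192)))
```

Running the second command changes live system behavior, so measure before
and after with the actual workload rather than trusting any single value.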
It's a design choice. Both are good, both have domains of applicability.
Anyone suggesting otherwise might not be talking about the same thing
we are discussing here. File system bugs are nasty, and no block device
is going to save you from them, whether it is a software or a hardware one.
> Clearly, YMMV.
> On Tue, 13 Dec 2005, Joe Landman wrote:
>>Vincent Diepeveen wrote:
>>>>The remaining advantage of hardware is still hot-swapping
>>>>failed drives without having to shutdown the server.
>>>Those same nerds of above, they do not take into account that if
>>>something complex like a raid array gets suddenly handled in
>>>software instead of hardware, that even the tiniest
>>>undiscovered bug in a file system, will impact you.
>>As the raid device is being created at the block device level, and the
>>file system resides above this, a file system bug will be just as
>>detrimental to a hardware raid system as it would a software raid system.
>>Of course, you could have meant a bug in the software raid block device
>>driver. Yes such things do exist. So do bugs in the hardware raid
>>controllers. In *neither* case do you want to touch the buggy code.
>>Best case is completely innocuous behavior. Worst case, well, let's not
>>get into that.
>>Bugs can and do occur in any software. Whether burned into firmware,
>>written as VHDL/Verilog that creates the ASICs or FPGAs on the hardware
>>raid, or in the software raid block device.
>>>And be sure that there is bugs. So doing a hardware XOR (or whatever) in
>>>RAM of the raid controller instead of in the software, is a huge advantage.
>>RAID is *far more* than doing hardware XOR. Most XOR implementations
>>tend to be bug free given how atomic this operation is. The code around
>>it however occasionally has bugs. Firmware and software code.
>>>It reduces complexity of what software has to do, so it reduces the
>>>chance that a bug will occur in the OS somewhere, causing you to lose
>>>all your files.
>>No. Absolutely not. Software raid simply does in software what the
>>hardware may do in part in hardware. Any bug, anywhere in this process
>>(in either HW or SW raid) and you can have problems. Problems and bugs
>>are not just the province of SW raid.
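
For readers who have not seen it, the XOR step discussed in the quoted
exchange can be sketched in a few lines. This is a toy on in-memory byte
strings, not how any particular controller or the Linux md driver
implements it. The function name and the block contents are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Three data blocks in one stripe, plus the parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] failed: XOR the survivors with parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

The XOR itself is this small. Stripe layout, write ordering, and failure
handling live in the code around it, whether that code is firmware or a
software block driver.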
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615