large filesystem & fileserver architecture issues.

Michael T. Prinkey mprinkey at aeolusresearch.com
Tue Aug 5 12:34:03 EDT 2003


On 5 Aug 2003, Nicholas Henke wrote:
> Definitely does -- can you recommend hardware for the IDE RAID, or list
> what you guys have used?
> 
> Nic
> 

I started building these arrays when 20 GB was a big drive and hardware
IDE RAID controllers were very expensive, so old habits die hard.  Most of
my experience has been with software RAID in Linux.  We use Promise
Ultra66/100/133 controller cards, Maxtor 80-200 GB 5400-RPM drives, and
Intel-chipset motherboards.

I use the Promise cards, again, because they were what was available and
supported in Linux in the late 90s.  They are limited to two IDE channels
per card, but I have used three cards in addition to the on-board IDE in
large arrays before.  Some people buy the IDE RAID cards that have 4 or 8
IDE channels and then use software RAID instead.  The conventional wisdom
is that you should only put one drive on each IDE channel to maximize
performance.  I have built arrays with one drive per channel and with two
drives per channel, and I find that this is not really true for ATA100 and
faster controllers.  Two of these drives cannot saturate a 100 or 133 MB/s
channel.
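
As a quick sanity check of that claim, hdparm will report raw per-drive
read throughput.  This is only a sketch, and the device names are examples
that depend on which controller and master/slave position each drive sits
on.

  # -T measures cached reads, -t measures buffered reads straight off
  # the drive; a single drive of this class comes in well under the
  # channel's 100/133 MB/s, so two per channel is not the bottleneck.
  hdparm -tT /dev/hde
  hdparm -tT /dev/hdf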

Typically, we put eight drives in an array.  I have been using a 4U rack
enclosure that has 8 exposed 5.25" bays.  This works well because mounting
the drives in 5.25" bays gives a nice air gap for cooling.  Stacking 3 or
more drives tightly together heats the middle ones up quite a bit.  I also
usually use 5400-RPM drives to keep the heat production down.

I only use Intel-chipset motherboards, normally just a single-CPU P4.  One
of the boards with one or two onboard gigabit controllers would be a nice
choice.  1 GB of RAM is more than enough, but do use ECC.  Also, if you
use the newest kernels, the onboard IDE controllers are fast enough to be
used in the array.  For an 8-drive array, I will normally use one Promise
add-in card and the two on-board channels.
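
For reference, here is roughly what the /etc/raidtab looks like for an
eight-drive set built that way.  This is only a sketch: RAID-5 is an
assumption (use whatever level you actually want), and the device names
are examples that depend on where the system disk lives and how the drives
are jumpered across the on-board and Promise channels.

  raiddev /dev/md0
      raid-level              5
      nr-raid-disks           8
      nr-spare-disks          0
      persistent-superblock   1
      parity-algorithm        left-symmetric
      chunk-size              64
      # two drives per on-board channel...
      device    /dev/hda
      raid-disk 0
      device    /dev/hdb
      raid-disk 1
      device    /dev/hdc
      raid-disk 2
      device    /dev/hdd
      raid-disk 3
      # ...and two per channel on the Promise add-in card
      device    /dev/hde
      raid-disk 4
      device    /dev/hdf
      raid-disk 5
      device    /dev/hdg
      raid-disk 6
      device    /dev/hdh
      raid-disk 7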

Important Miscellany:

  - Power Supply.  Don't skimp.  400W+ from a good vendor.

  - IDE cables <=24" long.  I tried to use the 36" IDE cables once and it
nearly drove me nuts with drive corruption and random errors.  The 24"  
ones work very well and usually give you enough length to route to 8
drives in an enclosure.  Once Serial ATA gets cheaper, this will no longer 
be an issue.

  - UPS.  In general, you can NEVER allow a power failure to take down the
RAID server.  There is at least a 50% chance of low-level drive corruption
on an 8-drive array if it loses power.  (Don't ask about the time the
cleaning crew unplugged the array from the UPS!)  We use a smart UPS and
UPS monitoring software (upsmon) to unmount the array and raidstop it if
the power goes out for more than 30 seconds; a sketch of that shutdown
script follows this list.  I am also tempted to not even connect the power
switch on the front panel.  Resetting a crashed system is OK, but powering
it off doesn't give the hard drives a chance to flush their buffers to
disk.  With 8+ spinning drives, there is a good chance at least one of
them will be corrupted.

  - Bonnie and burn-in.  There are many problems that can crop up when you
build the array: IRQ issues, etc.  It is paramount that you thoroughly
abuse the array with something like bonnie to make sure that everything is
working.  I typically run mkraid (which starts the array syncing), mke2fs
the RAID device, and then mount the filesystem and run bonnie on it, all
while the array is still syncing; the command sequence is sketched after
this list.  This is pretty hard on the whole system, and if there is a
problem, you will notice quickly.  Once it is done resyncing, I usually
run bonnie overnight to burn it in and verify that performance is
reasonable.

  - Fixing things.  If you do have a power failure and the RAID doesn't
come back up, it is usually due to a hard drive problem.  The only way to
fix it is to run a low-level utility (Maxtor Powermax) on the drive.
Maybe someone knows how to do something similar within Linux.  If so, I
would love to hear about it.
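
For the UPS item above, the script the UPS monitor calls is nothing fancy.
This is only a sketch of the kind of thing upsmon can be pointed at; the
mount point and md device are examples.

  #!/bin/sh
  # Run by the UPS monitor once the outage has lasted ~30 seconds:
  # flush and unmount the filesystem, stop the md array cleanly,
  # then power down before the batteries give out.
  sync
  umount /export
  raidstop /dev/md0
  shutdown -h now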
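
And for the burn-in item, the sequence is roughly the following.  Again,
only a sketch: the md device, mount point, and bonnie test size are
examples, and the test size should be larger than RAM so the page cache
can't mask problems.

  mkraid /dev/md0            # builds the array; the resync starts here
  cat /proc/mdstat           # check that the resync is running
  mke2fs /dev/md0            # make the filesystem on the md device
  mount /dev/md0 /export     # mount it while the resync is still going
  bonnie -d /export -s 2048  # abuse it with bonnie (2 GB test file)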

Again, our approach is not necessarily exhaustively researched.  This is
just "what we do."  So, take it for what it's worth.

Best,

Mike Prinkey
Aeolus Research, Inc.
