[Beowulf] Surviving a double disk failure

Marian Marinov mm at yuhu.biz
Mon Apr 13 03:56:28 EDT 2009

On Friday 10 April 2009 23:15:54 David Mathog wrote:
> Billy Crook <billycrook at gmail.com> wrote:
> > As a very,
> > very, general rule, you might put no more than 8TB in a raid5, and no
> > more than 16TB in a raid6, including what's used for parity, and
> > assuming magnetic, enterprise/raid drives.  YMMV; test all new drives,
> > keep good backups, etc...
> Thankfully I don't have to do this myself, not having data anywhere near
> that size to cope with, but it seems to me that backing up a nearly full
> 16TB RAID is likely to be a painful, expensive exercise.
> Going with tape first...
> The fastest tape drives that I know of are Ultrium 4's at 120 MB/s.  In
> theory that could copy 1GB every 8.3 seconds, 1TB every 8300 seconds
> (AKA 138 minutes, or a bit over 2 hours), and for that 16 TB data set,
> something over 32 hours.  Except that there is no tape with that
> capacity; the max listed is still 800 GB, so it would take 20 tapes.  And
> really obtaining a sustained 120MB/s from the RAID to the tape is likely
> extremely challenging.  In any case, it looks like this calls for a tape
> robot of some sort, with many drives in it.  Not cheap.  On the plus
> side, transporting a box of 20 tape cartridges to "far away" is not
> particularly difficult, and they are fairly impervious to abuse during
> shipment.
> The other obvious option is to replicate the RAID.  Now if the duplicate
> RAID is on site, connected by a 1000baseT network, one could obtain a
> very similar transfer rate - and a full backup would take just as long
> as for the single tape drive (neglecting rewind and cartridge change
> times).  This at the expense of still losing all the data in some sort
> of sitewide disaster.  I can imagine implementing (and suspect somebody
> already has) a specialized disk->disk connect, such that one
> would plug RAID A into RAID B, and all N disks in A could copy
> themselves in parallel onto all N disks in B at full speed.  Assuming
> 1TB disks and a sustained 75MB/sec read from A and write to B, the whole
> copy would be done in about 222 minutes.  Not exactly the blink of an
> eye, but a heck of a lot better than 32 hours.   Placing the backup RAID
> physically offsite would improve the odds of the data surviving, but
> reduce the bandwidth available, and moving the copied RAID physically
> offsite after each backup is a recipe for short disk lives.
> Since all of the obvious options are so slow, I expect most sites are
> doing incremental backups.  Which is fine, until the day comes when one
> has to restore the entire data array from two years' worth of
> incremental backups.  Or maybe folks carry the tape incremental backups
> to the offsite backup RAID and apply them there?
> Is there an easier/faster/cheaper way to do all of this?
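
The throughput arithmetic in the quoted message can be checked with a short Python sketch. It assumes decimal units (1 TB = 1,000,000 MB), perfectly sustained rates (which the message itself calls optimistic), and 125 MB/s as the raw 1000baseT line rate rather than a measured figure. Under those assumptions a single Ultrium 4 drive needs roughly 37 hours for 16 TB, somewhat more than the "over 32 hours" quoted, while the parallel disk->disk copy matches the 222-minute figure.

```python
# Back-of-envelope full-backup times for the figures in the quoted message.
# ASSUMPTIONS: decimal units (1 TB = 1,000,000 MB), perfectly sustained
# throughput, and 125 MB/s as the raw 1000baseT line rate.
import math


def hours_to_copy(size_tb: float, rate_mb_s: float) -> float:
    """Hours to move size_tb terabytes at a sustained rate_mb_s MB/s."""
    return size_tb * 1_000_000 / rate_mb_s / 3600


def tapes_needed(size_tb: float, tape_gb: float) -> int:
    """Cartridges needed to hold size_tb terabytes, uncompressed."""
    return math.ceil(size_tb * 1000 / tape_gb)


if __name__ == "__main__":
    data_tb = 16       # the "nearly full 16TB RAID"
    lto4_mb_s = 120    # Ultrium 4 rate quoted above
    lto4_gb = 800      # largest cartridge quoted above
    gige_mb_s = 125    # 1000baseT line rate; real transfers will be slower
    disk_tb = 1        # size of each disk in the parallel disk->disk copy
    disk_mb_s = 75     # sustained per-disk rate assumed above

    print(f"one LTO-4 drive:    {hours_to_copy(data_tb, lto4_mb_s):.1f} h, "
          f"{tapes_needed(data_tb, lto4_gb)} tapes")
    print(f"1000baseT replica:  {hours_to_copy(data_tb, gige_mb_s):.1f} h")
    # All N disks copy at once, so the total time is that of a single disk.
    print(f"parallel disk copy: {hours_to_copy(disk_tb, disk_mb_s) * 60:.0f} min")
```

Run as-is, it reports 37.0 h on 20 tapes for the single drive, 35.6 h for the network replica at line rate, and 222 min for the parallel copy.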

I had a client where we set up 2 servers in 2 different physical locations with 
a good interconnect between them (1Gbit/s).

Both servers had an identical hardware setup (RAID5 with 8x 1TB disks, 1 hot 
spare, and 2 NICs: one dedicated to backup and one for system usage).

What I did was to set up a DRBD device between the two machines, so that if there 
was a power outage or a disaster at the first location, they had another server 20km 
away serving their data (MySQL, PostgreSQL and files).

This setup is used both for backup (DR) and for failover.
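
A rough sizing of this setup, as a Python sketch. The post does not say whether the hot spare is one of the eight disks or a ninth, so the sketch assumes a ninth disk and an 8-disk RAID5; it also assumes, hypothetically, that the dedicated backup NIC sustains about 110 MB/s of its 1 Gbit/s. Under those assumptions the first full DRBD sync moves 7 TB and takes roughly 18 hours.

```python
# Rough sizing for the two-site DRBD mirror described above.
# ASSUMPTIONS (not stated in the post): the hot spare is a ninth disk
# outside the 8-disk RAID5, sizes use decimal units, and the dedicated
# backup NIC sustains about 110 MB/s of its 1 Gbit/s line rate.


def raid5_usable_tb(disks: int, disk_tb: float) -> float:
    """RAID5 gives up one disk's worth of capacity to parity."""
    if disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (disks - 1) * disk_tb


def full_sync_hours(size_tb: float, link_mb_s: float) -> float:
    """Hours to copy size_tb terabytes over a link sustaining link_mb_s MB/s."""
    return size_tb * 1_000_000 / link_mb_s / 3600


if __name__ == "__main__":
    usable = raid5_usable_tb(disks=8, disk_tb=1.0)
    print(f"usable RAID5 capacity: {usable:.0f} TB")
    print(f"initial full sync at 110 MB/s: {full_sync_hours(usable, 110):.1f} h")
```

That full copy is only needed once, or when a mirror has to be rebuilt from scratch: afterwards DRBD replicates each write to the peer as it happens, and after a short disconnect it resyncs only the blocks it has marked out of sync, so the link carries the write rate rather than the whole array.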

Marian Marinov
Head of System Operations at Siteground.com

Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing

