[Beowulf] Re: Cooling vs HW replacement

Greg Lindahl lindahl at pathscale.com
Fri Jan 21 17:15:06 EST 2005

On Fri, Jan 21, 2005 at 03:10:31PM -0500, Robert G. Brown wrote:

> Has anyone observed that a megahour is 114 years?  Has anyone observed
> that this is so ludicrous a figure as to be totally meaningless?  Show
> me a single disk on the planet that will run, under load, for a mere two
> decades and I'll bow down before it and start sacrificing chickens.
> Humans don't live a megahour MTBF.  Disks damn sure don't.

That's not what MTBF means.

A device has 3 phases in its life: infant mortality, middle age, and
old age. If you plot the failure rate over time, it looks like a bathtub:

F R \                                     /
a a  \                                   /
i t   \                                 /
l e    \_______________________________/
    infant        middle-age           old-age

The MTBF comes from the failure rate in middle age (it is the reciprocal
of that rate). It does not say when old age starts. The MTBF is usually
much longer than the time until old age starts, because most disks
survive to old age.
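To make the bathtub concrete, here is a minimal Python sketch that models
the failure rate as three flat segments and computes what fraction of
devices are still working when old age begins. Every number in PHASES
(phase boundaries and rates) is a made-up assumption for illustration,
not data from any disk vendor. Only the middle-age rate is chosen to
match a one-megahour MTBF.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760; leap years ignored

# Hypothetical bathtub curve as piecewise-constant failure rates.
# Each entry: (phase name, start hour, end hour, failures per device-hour).
# These values are illustrative assumptions, not measured disk data.
PHASES = [
    ("infant", 0, 2_000, 1e-5),                                  # early failures
    ("middle-age", 2_000, 5 * HOURS_PER_YEAR, 1e-6),             # flat bottom: MTBF = 1e6 h
    ("old-age", 5 * HOURS_PER_YEAR, 8 * HOURS_PER_YEAR, 5e-5),   # wear-out
]


def survival(t_hours: float) -> float:
    """Fraction of devices still working at time t_hours.

    Survival is exp(-cumulative hazard). The model only covers the
    phases listed above; past the last phase no further failures are counted.
    """
    cumulative_hazard = 0.0
    for _name, start, end, rate in PHASES:
        if t_hours <= start:
            break
        cumulative_hazard += rate * (min(t_hours, end) - start)
    return math.exp(-cumulative_hazard)


middle_age_rate = PHASES[1][3]
old_age_start = PHASES[2][1]
print(f"MTBF implied by middle-age rate: {1 / middle_age_rate:,.0f} hours")
print(f"Still working when old age starts: {survival(old_age_start):.1%}")
```

With these assumed numbers roughly 94% of devices reach old age, even
though the quoted MTBF works out to 114 years.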

And yes, a megahour is the right scale for MTBF: that just means that
in middle age roughly 1 in 1400 disks dies per month (a month is about
730 hours, and 1,000,000 / 730 is about 1370). If middle age lasts
3 years (about 26,000 hours), then about 2.6% of disks will fail in
middle age.
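The arithmetic behind these figures can be checked with a few lines of
Python. The sketch assumes a 365-day year and a month of one twelfth of
a year, and treats the middle-age failure rate as constant. It computes
the 3-year fraction two ways: the simple rate-times-time estimate and
the exponential form that a constant failure rate implies.

```python
import math

MTBF_HOURS = 1_000_000                 # "a megahour"
HOURS_PER_YEAR = 24 * 365              # 8760; leap years ignored
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730

# The quoted claim: a megahour expressed in years.
mtbf_years = MTBF_HOURS / HOURS_PER_YEAR

# With a constant middle-age failure rate of 1/MTBF per hour, one failure
# per month is expected in a population of this many disks.
disks_per_failure_per_month = MTBF_HOURS / HOURS_PER_MONTH

# Fraction of disks failing during a 3-year middle age.
middle_age_hours = 3 * HOURS_PER_YEAR
linear_fraction = middle_age_hours / MTBF_HOURS                      # rate x time
exponential_fraction = 1 - math.exp(-middle_age_hours / MTBF_HOURS)  # constant-rate model

print(f"1 megahour = {mtbf_years:.0f} years")
print(f"about 1 in {disks_per_failure_per_month:.0f} disks fails per month")
print(f"3-year middle age: {linear_fraction:.1%} (linear), "
      f"{exponential_fraction:.1%} (exponential)")
```

The script gives about 1 in 1370 per month, which rounds to the 1 in
1400 above, and both estimates of the 3-year fraction come out near 2.6%.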

-- greg

Beowulf mailing list, Beowulf at beowulf.org
