[Beowulf] Memory errors poll

Greg Lindahl lindahl at pbm.com
Mon Mar 30 20:48:26 EDT 2009


On Mon, Mar 30, 2009 at 01:11:20AM -0400, Mark Hahn wrote:
>> Could those of you running ECC memory give me an updated figure on the
>> number of errors detected/corrected per day per system?
>
> we replace dimms which show > 1000 corrected ECCs per day
> (or any overflows, for which counts are inaccurate, or any uncorrectable 
> errors.)

These systems are a couple of generations old, right?

I think I have Linux set up to record single-bit errors, and the rate
I get is basically zero on, uh, 5 terabytes of modern RAM, at sea
level.
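
For anyone who wants to collect the same kind of numbers, here is a rough sketch of one way to do it. It assumes the kernel's EDAC driver is loaded and exposes per-memory-controller counters under /sys/devices/system/edac/mc; it is not necessarily how the systems discussed here are configured. The needs_replacement helper and its parameter names are hypothetical, and only encode the ">1000 corrected per day, or any uncorrectable error" rule quoted above. Overflows are not covered.

```python
from pathlib import Path

# Standard Linux EDAC sysfs location (assumes an EDAC driver is loaded).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} read from EDAC sysfs."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # no readable counters for this controller; skip it
        counts[mc.name] = (ce, ue)
    return counts


def needs_replacement(ce_before, ce_after, ue_after, days, ce_per_day_limit=1000):
    """Hypothetical policy check based on the rule quoted in this thread.

    Flags memory if any uncorrectable error was seen, or if corrected
    errors accumulated faster than ce_per_day_limit per day. The counters
    reset on reboot or driver reload, so both samples must come from the
    same uptime.
    """
    if ue_after > 0:
        return True
    return (ce_after - ce_before) / days > ce_per_day_limit


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

Sampling the counters once a day from cron and diffing against the previous sample would give a per-day rate per system. These counts are per memory controller; mapping an error to an individual DIMM needs the finer-grained EDAC entries, whose layout varies by kernel version.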

When I installed some new memory I had a few systems with modest
numbers of single-bit upsets, and the vendor was happy to swap dimms
until the problem went away. I think he also does that during his
factory burn-in.

-- greg


