[Beowulf] Re: ECC Memory and Job Failures (Huw Lynes)

Prentice Bisbal prentice at ias.edu
Fri Apr 24 08:55:55 EDT 2009

Gerry Creager wrote:
> David Mathog wrote:
>> Huw Lynes <lynesh at cardiff.ac.uk> wrote:
>>> http://blog.revolution-computing.com/2009/04/blame-it-on-cosmic-rays.html
>>> Apparently someone ran a large cluster job with both ECC and non-ECC
>>> RAM. They consistently got the wrong answer when forgoing ECC.
>> There were not very many details given.  I would not rule out the
>> possibility that the non-ECC memory was slightly faulty, and that the
>> observed errors had nothing to do with gamma rays at all.  A better test
>> would have been to use the same ECC memory for both tests, and to turn
>> ECC memory correction on and off in the BIOS.
> Where's Jim Lux?  I'm sure he has an opinion on this, too...

Opinion? I think he could write a book on this topic!

Last time this issue came up, he included links to several papers on
this topic published by Boeing. As you go up in the atmosphere, the
[prevalence|probability|concentration] of cosmic rays goes up
significantly. Boeing has done a lot of research on this topic, since it
can affect the operation of their [products|weapons].
