[Beowulf] Re: ECC Memory and Job Failures (Huw Lynes)

David Mathog mathog at caltech.edu
Thu Apr 23 15:22:53 EDT 2009

Huw Lynes <lynesh at cardiff.ac.uk> wrote:

> http://blog.revolution-computing.com/2009/04/blame-it-on-cosmic-rays.html
> Apparently someone ran a large cluster job with both ECC and non-ECC
> RAM. They consistently got the wrong answer when forgoing ECC.

Few details were given.  I would not rule out the possibility that the
non-ECC memory was slightly faulty, and that the observed errors had
nothing to do with cosmic rays at all.  A better test would have been to
use the same ECC memory for both runs, and to turn ECC correction on and
off in the BIOS.


David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
