[Beowulf] Fwd: H8DMR-82 ECC error
lindahl at pbm.com
Wed Aug 17 16:59:58 EDT 2011
> Memtest was ok, I ran 9 cycles without any problems.
You should be using the HPL implementation of the Linpack benchmark
for testing memory. It exercises all of the memory and all of the
cores, and is what most HPC vendors seem to use for node burn-in.
There's even a bootable DVD with a kernel with enhanced EDAC that was
mentioned here a while back.
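
For HPL to touch most of the memory, the problem size N has to be chosen so the N x N matrix of doubles nearly fills RAM. A rough sketch of that sizing follows. The 80% fill fraction is a common rule of thumb rather than anything from this thread, and the block size of 128 and the 16 GiB node are hypothetical values picked for illustration.

```python
import math

def hpl_problem_size(mem_bytes, fill_fraction=0.8, block_size=128):
    """Estimate an HPL problem size N for a node with mem_bytes of RAM.

    HPL factors an N x N matrix of 8-byte doubles, so N*N*8 should be
    close to the memory we want exercised. fill_fraction and block_size
    are assumptions; tune them for the actual node.
    """
    max_elements = fill_fraction * mem_bytes / 8   # doubles that fit
    n = int(math.sqrt(max_elements))               # largest square matrix
    return (n // block_size) * block_size          # round down to a multiple of NB

# Hypothetical node with 16 GiB of RAM.
n = hpl_problem_size(16 * 2**30)
print("Suggested N:", n)
```

Leaving some headroom for the OS matters: if N is too large the node swaps, and the run then exercises the disk instead of the DIMMs.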
> Hardware Error
> CPU0 Machine Check Exception 4 Bank 2 b200200000000863
> TSC 108dd369444
> Processor 2:40f13 Time 1311847912 Socket 0 APIC 0
> MC2-Status: Uncorrected error, report: yes MiscV: invalid
> CPU context corrupt: yes UECC Error
> Bus Unit Error: prefetch/ECC error in data read from NB: local node originated
> Transaction type: prefetch (mem access), no timeout, cache level L3/generic.
> Participating Processors: local node originated (SRC)
And I take it that the location information given here (socket 0, bank
2) isn't useful?
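
For reference, the status word in that report can be picked apart by hand. The sketch below decodes the flag bits of b200200000000863. The bit positions are my reading of the AMD machine-check status layout (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 to 57, CECC and UECC in bits 46 and 45), so treat them as an assumption and check them against the BKDG for the actual CPU family. The function and table names are made up for this example.

```python
# Assumed bit positions in the machine-check status register.
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # reporting enabled ("report: yes")
    59: "MISCV",  # MISC register valid
    58: "ADDRV",  # ADDR register valid
    57: "PCC",    # processor context corrupt
    46: "CECC",   # correctable ECC error
    45: "UECC",   # uncorrectable ECC error
}

def decode_mc_status(status):
    """Return a dict of flag name -> bool, plus the low 16-bit error code."""
    decoded = {name: bool((status >> bit) & 1)
               for bit, name in STATUS_FLAGS.items()}
    decoded["error_code"] = status & 0xFFFF
    return decoded

decoded = decode_mc_status(0xB200200000000863)
for name, value in decoded.items():
    if name == "error_code":
        print(f"{name}: {value:#06x}")
    else:
        print(f"{name}: {value}")
```

If that layout is right, UC, EN, PCC and UECC are set, which matches the decoded text above, and ADDRV is clear, which would mean no address was captured with the error.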