[Beowulf] RAM ECC errors
henning.fehrmann at aei.mpg.de
Mon Feb 22 09:04:14 EST 2010
We have started monitoring the rate of correctable errors (CE) in RAM.
We have also observed a few uncorrectable errors (UE). The corresponding kernel
module, 'edac_core', can trigger a kernel panic when such an event occurs,
which makes sense as a way to avoid corrupted results.
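
For reference, a minimal sketch of how the error counters can be polled. It assumes the EDAC driver exposes per-controller `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc/mc*/`; the function name `read_edac_counts` and its `edac_root` parameter are illustrative only and not part of any existing tool.

```python
import glob
import os


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}} for each mc* directory.

    edac_root is an assumption about where the EDAC sysfs tree lives;
    adjust it if your kernel exposes the counters elsewhere.
    """
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(edac_root, "mc*"))):
        entry = {}
        for key, fname in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                with open(os.path.join(mc_dir, fname)) as f:
                    entry[key] = int(f.read().strip())
            except (OSError, ValueError):
                # Counter missing or unreadable on this controller.
                entry[key] = None
        counts[os.path.basename(mc_dir)] = entry
    return counts


if __name__ == "__main__":
    for name, c in read_edac_counts().items():
        print(name, "CE:", c["ce"], "UE:", c["ue"])
```

Run periodically (for example from cron), the difference between two successive readings gives the CE rate per memory controller.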
Is there a way to get some useful information before the kernel panics?
In particular, we are looking for the process list, to find out which
user was running what before the uncorrectable errors occurred.
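
One possible workaround, sketched here under stated assumptions, is to snapshot the process list periodically so that the last snapshot written before a panic approximates what was running. This does not hook into the panic itself. The helper names `snapshot_processes` and `write_snapshot` are hypothetical, and the output path should point at storage that survives the crash (a remote filesystem, for instance).

```python
import os
import pwd
import time


def snapshot_processes(proc_root="/proc"):
    """Return a sorted list of (pid, user, command line) read from proc_root."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        pid_dir = os.path.join(proc_root, entry)
        try:
            uid = os.stat(pid_dir).st_uid
            with open(os.path.join(pid_dir, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited while we were reading it
        cmd = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        try:
            user = pwd.getpwuid(uid).pw_name
        except KeyError:
            user = str(uid)  # no passwd entry; fall back to the numeric uid
        rows.append((int(entry), user, cmd))
    return sorted(rows)


def write_snapshot(path, rows):
    """Write rows to path via a temp file, fsync, and an atomic rename."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
        for pid, user, cmd in rows:
            f.write(f"{pid}\t{user}\t{cmd}\n")
        f.flush()
        os.fsync(f.fileno())  # push data to disk before the rename
    os.replace(tmp, path)
```

Calling `write_snapshot(path, snapshot_processes())` every few seconds from a small loop or a cron job keeps a recent user-to-command mapping on disk; the snapshot interval bounds how stale the list can be when a UE occurs.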