[Beowulf] Barcelona hardware error: how to detect
Greg Lindahl
lindahl at pbm.com
Thu Jun 5 14:30:20 EDT 2008
On Thu, Jun 05, 2008 at 10:09:58PM +0400, Mikhail Kuzminsky wrote:
> This was interesting for me also, because I
> have no information on how this hardware problem shows up in
> "real life".
I have 4 chips with the bug, in 2 servers. I see about 1 lockup per
month with my workload, which doesn't include any VMs. (VMs are
reputed to trigger the bug quickly.) I found a webpage with the
details, and indeed this is what I see:
| The system may experience a machine check event reporting an L3
| protocol error has occurred. In this case, the MC4 status register
| (MSR 0000_0410) will be equal to B2000000_000B0C0F or
| BA000000_000B0C0F. The MC4 address register (MSR 0000_0412) will be
| equal to 26h.
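
For anyone who wants to check logged values against that signature, here is a
minimal sketch in Python. The function and constant names are my own, not from
the webpage. The read_msr helper assumes a Linux box with the msr driver loaded
(/dev/cpu/N/msr) and root access. Whether the registers still hold these values
after a lockup and reset is not something the text above says, so in practice
the values may have to come from a machine check log instead.

```python
import os
import struct

# MSR numbers and values taken from the quoted text above.
MC4_STATUS_MSR = 0x0000_0410
MC4_ADDR_MSR = 0x0000_0412
L3_ERROR_STATUS = {0xB2000000_000B0C0F, 0xBA000000_000B0C0F}
L3_ERROR_ADDR = 0x26


def matches_l3_protocol_error(status, addr):
    """Return True if the MC4 status/address pair matches the quoted signature."""
    return status in L3_ERROR_STATUS and addr == L3_ERROR_ADDR


def read_msr(cpu, msr):
    """Read one 64-bit MSR on the given CPU.

    Assumption: Linux msr driver is loaded and the caller is root.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device returns 8 bytes at a file offset equal to the MSR number.
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)
```

Usage would be something like
matches_l3_protocol_error(read_msr(0, MC4_STATUS_MSR), read_msr(0, MC4_ADDR_MSR)),
or just pass in the two values copied from a log.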
-- greg
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf