[Beowulf] Barcelona hardware error: how to detect

Greg Lindahl lindahl at pbm.com
Thu Jun 5 14:30:20 EDT 2008


On Thu, Jun 05, 2008 at 10:09:58PM +0400, Mikhail Kuzminsky wrote:

> This was interesting for me also, because I
> have no information on how this hardware problem may manifest itself in
> "real life".

I have 4 chips with the bug, in 2 servers. I see about 1 lockup per
month with my workload, which doesn't include any VMs. (VMs are
reputed to trigger the bug quickly.) I found a webpage with the
details, and indeed this is what I see:

| The system may experience a machine check event reporting an L3
| protocol error has occurred. In this case, the MC4 status register
| (MSR 0000_0410) will be equal to B2000000_000B0C0F or
| BA000000_000B0C0F. The MC4 address register (MSR 0000_0412) will be
| equal to 26h.
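Here is a minimal sketch, in Python, of checking a Linux box for this signature. It assumes the Linux `msr` driver is loaded (`modprobe msr`), which exposes `/dev/cpu/N/msr` so that an 8-byte read at offset R returns MSR R. It also assumes root access. The function names are hypothetical helpers, not part of any existing tool. Under the standard x86 machine-check layout, bank i has CTL at 0x400 + 4*i, followed by STATUS, ADDR and MISC. That puts MC4_STATUS at 0x411 and MC4_CTL at 0x410. The sketch therefore reads 0x411 and treats the "0000_0410" in the quoted text as a probable typo. Check this against AMD's own documentation before relying on it.

```python
import os
import struct

# Assumed standard MCA layout: bank 4 CTL=0x410, STATUS=0x411, ADDR=0x412.
MC4_STATUS = 0x411
MC4_ADDR = 0x412

# Signature values taken from the erratum text quoted above.
ERRATUM_STATUS_VALUES = {0xB2000000000B0C0F, 0xBA000000000B0C0F}
ERRATUM_ADDR_VALUE = 0x26


def read_msr(cpu: int, reg: int) -> int:
    """Read one 64-bit MSR through the Linux msr driver (requires root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The MSR number is used as the file offset; each MSR is 8 bytes.
        data = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    return struct.unpack("<Q", data)[0]


def matches_l3_protocol_erratum(status: int, addr: int) -> bool:
    """True if the MC4 STATUS/ADDR pair matches the quoted L3 protocol error."""
    return status in ERRATUM_STATUS_VALUES and addr == ERRATUM_ADDR_VALUE


def scan_cpus() -> list[int]:
    """Return the CPU numbers whose MC4 registers currently show the signature."""
    cpus = sorted(int(e) for e in os.listdir("/dev/cpu") if e.isdigit())
    return [
        cpu
        for cpu in cpus
        if matches_l3_protocol_erratum(read_msr(cpu, MC4_STATUS),
                                       read_msr(cpu, MC4_ADDR))
    ]
```

Run `scan_cpus()` as root after a suspected event. A live read may find nothing if the kernel has already logged and cleared the bank. In that case, compare the STATUS and ADDR values recorded in the kernel's machine-check log against the same two constants.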

-- greg



_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
