[Beowulf] Anyone having IPMI problems on Intel S3200 series motherboards?

Perry E. Metzger perry at piermont.com
Wed Apr 15 17:16:15 EDT 2009


Greg Lindahl <lindahl at pbm.com> writes:
> On Wed, Apr 15, 2009 at 04:51:57PM -0400, Perry E. Metzger wrote:
>
>> Unfortunately, every once in a while, the IPMI BMCs on my test systems
>> simply stop talking to the network. This isn't overly tragic since I can
>> have a process go over to such a board when it detects that pings have
>> stopped working and use a local IPMI command to cold reset the BMC, but
>> it is still really Not The Right Thing.
>
> Hey, you're lucky that you have a way to reset the BMC without power
> cycling the box. It is not unusual for IPMI implementations to be much
> buggier.

It is in the IPMI spec -- you can request a cold reset (a hard reset of
the BMC itself), provided the BMC is responding at all -- and luckily in
this case it still responds locally.

Usage in ipmitool (which I've largely abandoned in favor of freeipmi for
the moment, since freeipmi seems less buggy):

# ipmitool bmc reset cold
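
For anyone wanting to automate the workaround described above, here is a
rough sketch of the watchdog step, not the script actually in use. It
assumes the check runs on the node itself as root, that the BMC's address
is pingable from there, and that ping is the Linux iputils version (-c
count, -W timeout in seconds). The function names, the threshold of
consecutive failures, and the injectable ping/run hooks are all
illustrative; the only command taken from this thread is
"ipmitool bmc reset cold".

```python
import subprocess


def bmc_responds(host, ping=None):
    """Return True if a single ping to the BMC's network address succeeds."""
    if ping is None:
        # Assumption: Linux iputils ping (-c = count, -W = timeout in seconds).
        def ping(h):
            status = subprocess.call(
                ["ping", "-c", "1", "-W", "2", h],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            return status == 0
    return ping(host)


def cold_reset_bmc(run=subprocess.call):
    """Issue the local cold reset quoted in this thread; return its exit status."""
    return run(["ipmitool", "bmc", "reset", "cold"])


def check_once(host, failures, threshold=3, ping=None, run=subprocess.call):
    """One watchdog step.

    Takes the current count of consecutive ping failures and returns the
    new count. When the count reaches `threshold` (a hypothetical tuning
    knob), the BMC is cold reset and the count starts over at zero.
    """
    if bmc_responds(host, ping):
        return 0
    failures += 1
    if failures >= threshold:
        cold_reset_bmc(run)
        return 0
    return failures
```

A caller would keep the failure count between runs (for example in a loop
with a sleep, or in a small state file under cron) and pass it back in on
each call. If the check runs on a separate monitoring host instead, the
`run` hook could wrap the same command in ssh to the affected node.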


>> Also, I suspect every once in a great while I'll get a simultaneous
>> OS and IPMI BMC failure and shoe leather will be needed to reset the
>> box, which I don't like.
>
> Belt and suspenders -- that's what remote-controlled power strips are
> for. It doesn't sound like you'd see a double-failure very often.

Unless the BMCs turn out to be a lot less reliable than expected, my
double failure rate looks low enough that I'm not going to bother. Having to
reset a box here and there isn't that big a deal. I just wish it was not
even an issue...

Perry
-- 
Perry E. Metzger		perry at piermont.com
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
