[Beowulf] Best Practices SOL vs Cyclades ACS
mm at yuhu.biz
Sat Oct 10 01:33:23 EDT 2009
On Saturday 10 October 2009 08:09:45 Mark Hahn wrote:
> > We have more than 400 machines. Every month there is one machine that we
> > cannot reboot using IPMI, or on which SOL is not working.
> we have something like 2500 nodes, mostly HP dl145g2's, and have a
> BMC-wedge probably 6-12 times/year. can I ask what brand/model has such
> flakey IPMI? if you run "ipmi mc reset" on the node, does it resolve the
> problem? I wonder whether flakiness might also correspond to some config or
> usage pattern. (ours dhcp from a local server - actually all the traffic
> is local.)
They are all Dell machines used for shared hosting.
The problems usually appear during a DoS/DDoS attack or under very high system
resource usage (for example, a load over 100 on a machine with 4 cores).
Our problem is that in such situations IPMI is sometimes unreliable: we can
neither connect to the serial console nor reboot the machine.
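
For anyone wanting to try the BMC reset suggested above, here is a rough sketch of how the check could be scripted. It is not something we run in production. It assumes ipmitool is installed, that the BMC speaks IPMI-over-LAN via the lanplus interface, and that the password sits in a file; the host name, user name and file path are hypothetical placeholders. The probe asks the BMC for "mc info" out of band, and the reset command is meant to be run in band on the node itself, which of course only helps if the node can still be reached.

```python
import subprocess


def remote_probe_cmd(host, user, password_file):
    """Build the argv for an out-of-band 'mc info' query over IPMI-over-LAN."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
            "-f", password_file, "mc", "info"]


def local_reset_cmd(kind="cold"):
    """Build the argv for an in-band BMC reset, to be run on the node itself."""
    if kind not in ("cold", "warm"):
        raise ValueError("kind must be 'cold' or 'warm'")
    return ["ipmitool", "mc", "reset", kind]


def bmc_responds(host, user, password_file, run=subprocess.run, timeout=15):
    """Return True if the BMC answers 'mc info' before the timeout expires.

    'run' is injectable so the logic can be exercised without real hardware.
    """
    try:
        result = run(remote_probe_cmd(host, user, password_file),
                     capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # A hung BMC or a missing ipmitool binary both count as "no answer".
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical values; replace with your own BMC address and credentials.
    if not bmc_responds("bmc-node01.example", "admin", "/root/.ipmipass"):
        print("BMC not answering; on the node, try:",
              " ".join(local_reset_cmd("cold")))
```

A cold reset restarts the BMC firmware without power-cycling the host, so it is a reasonable first step before sending someone to the rack.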