[Beowulf] Best Practices SOL vs Cyclades ACS

Marian Marinov mm at yuhu.biz
Sat Oct 10 01:33:23 EDT 2009


On Saturday 10 October 2009 08:09:45 Mark Hahn wrote:
> > We have more than 400 machines. Every month there is one machine that we
> > cannot reboot using IPMI, or on which SOL is not working.
>
> we have something like 2500 nodes, mostly HP dl145g2's, and have a
> BMC-wedge probably 6-12 times/year.  can I ask what brand/model has such
> flakey IPMI? if you run "ipmi mc reset" on the node, does it resolve the
> problem? I wonder whether flakiness might also correspond to some config or
> usage pattern.  (ours dhcp from a local server - actually all the traffic
> is local.)

These are only Dell machines used for shared hosting. 

Usually these problems appear during a DoS/DDoS attack or under very high
system resource usage (for example, a load over 100 on a machine with 4 cores).

Our problem is that in such situations IPMI is sometimes unreliable: you can
neither connect to the serial console nor reboot the machine.
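
A minimal sketch of how such unresponsive BMCs might be detected from a
management host, assuming ipmitool is installed there and the BMCs accept
IPMI-over-LAN (lanplus) connections. The host names, the user name and the
helper functions are hypothetical; the password is taken from the
IPMI_PASSWORD environment variable through ipmitool's -E option.

```python
import subprocess


def ipmi_cmd(host, user, *args):
    """Build an ipmitool command line for a remote BMC (lanplus interface)."""
    # -E tells ipmitool to read the password from IPMI_PASSWORD.
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]


def bmc_responds(host, user, timeout=10, run=subprocess.run):
    """Return True if the BMC answers a power-status query within `timeout` seconds."""
    try:
        result = run(ipmi_cmd(host, user, "chassis", "power", "status"),
                     capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or ipmitool could not be started at all.
        return False
    return result.returncode == 0


def find_wedged(hosts, user, **kwargs):
    """Return the hosts whose BMC did not answer the status query."""
    return [h for h in hosts if not bmc_responds(h, user, **kwargs)]


if __name__ == "__main__":
    # Hypothetical BMC host names and user; replace with real ones.
    for host in find_wedged(["bmc-node01", "bmc-node02"], "admin"):
        print(host, "did not answer; it may need a BMC reset or a manual power cycle")
```

Running this periodically only reports which BMCs have stopped answering; it
does not help once a BMC is already unreachable over the network, which is
the situation described above.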

-- 
Best regards,
Marian Marinov
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


