[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?

Mark Hahn hahn at mcmaster.ca
Mon Apr 6 18:50:51 EDT 2009

> An arrangement like this just muddies the situation even further. If I
> had a software problem, do I call cluster, or the 3rd-party hired to
> install the software?

if you had a problem that could be clearly attributed to whoever
sold you the package, you should call them.  they sold it, and if 
what they sold included support, they're on the hook.  of course, 
they might claim you broke it, but that's always an option for a vendor
wanting to avoid support.

for us, we currently run HP's RHEL version (XC) on our HP clusters;
it includes Platform LSF.  when we have problems, we open a ticket
with HP - whether they use their own in-house expertise or punt
to Platform is up to them.  same as for hardware, really.

> I think you mean "buy a shrink-wrapped cluster from a well-respected,
> cluster-specific vendor that has proven in-house cluster expertise"

no, I don't - what's required is a paid-for support contract
and a vendor who takes it seriously.  if I bought a cluster with 
support, I'd go straight to the legal dept if I thought the vendor
wasn't living up to the contract...

all vendors are, by nature and necessity, interested in avoiding 
support costs.  there are always hoops to jump through - mainly,
I think, to filter out the randoms.  that is, enough friction to 
cause you to think about reading the docs first, but also enough
layers of support to keep their heaviest tech from explaining vi ;)

regards, mark hahn.