[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?
csamuel at vpac.org
Mon Apr 6 08:46:29 EDT 2009
----- "Rahul Nabar" <rpnabar at gmail.com> wrote:
> I contacted Dell. Responses range from the clueless to the absurd. First,
> they convinced us it was Fedora. So I shifted to CentOS. They still
> claim CentOS is "unvalidated" but I refuse to spend a fortune to move
> over to RHEL like they want me to.
Not that this helps, but you have my sympathy as I've
been dealing with the same stuff from IBM over a storage
server they sold us.
Turns out I can make 7-12 drives in their external
enclosures fail in short order (seconds to minutes
between failures) by telling the software RAID to
do a check, thus:
# request a consistency check on every md array (run as root)
for i in /sys/block/md*; do
    echo check > "$i/md/sync_action"
done
Even though we could reproduce it on 64-bit Debian
and 32-bit CentOS, they wouldn't escalate the issue
until we could reproduce it on RHEL5 - which we did.
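For anyone wanting to watch what a check like the one above does to their arrays, here is a rough sketch of a status reporter. It assumes the standard Linux md sysfs attributes (sync_action, mismatch_cnt, degraded) under /sys/block/mdN/md/. The function name md_check_status and its optional root-directory argument are made up for illustration (the argument is only there so it can be pointed at a test directory); it is not the script we actually used.

```shell
#!/bin/sh
# Hypothetical helper: print one status line per md array.
# $1 is an optional sysfs root (an assumption for testing); defaults to /sys/block.
md_check_status() {
    root="${1:-/sys/block}"
    for dev in "$root"/md*; do
        # Skip an unmatched glob and anything that is not an md array.
        [ -e "$dev/md/sync_action" ] || continue
        name=$(basename "$dev")
        action=$(cat "$dev/md/sync_action")
        # These attributes may be missing on some array types; show "?" then.
        mismatches=$(cat "$dev/md/mismatch_cnt" 2>/dev/null || echo "?")
        degraded=$(cat "$dev/md/degraded" 2>/dev/null || echo "?")
        echo "$name action=$action mismatch_cnt=$mismatches degraded=$degraded"
    done
}
```

Calling md_check_status with no argument after kicking off a check, and again every so often, should show the action go from "check" back to "idle" and whether the degraded count changes along the way.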
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency