[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?
landman at scalableinformatics.com
Mon Apr 6 09:11:50 EDT 2009
Chris Samuel wrote:
> ----- "Rahul Nabar" <rpnabar at gmail.com> wrote:
>> I contacted Dell. Responses range from the clueless to the absurd. First,
>> they convinced us it was Fedora. So I shifted to CentOS. They still
>> claim CentOS is "unvalidated" but I refuse to spend a fortune to move
>> over to RHEL like they want me to.
> Not that this helps, but you have my sympathy as I've
> been dealing with the same stuff from IBM over a storage
> server they sold us.
> Turns out I can make 7-12 drives in their external
> enclosures fail in short order (seconds to minutes
> between failures) by telling the software RAID to
> do a check, thus:
> for i in md; do
>   echo check > /sys/block/$i/md/sync_action
> done
Are these softirq CPU hangs?
could you tell me what
> Even though we could reproduce it on 64-bit Debian
> and 32-bit CentOS they wouldn't escalate the issue
> until we could reproduce it on RHEL5 - which we did
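For anyone who wants to try the same reproduction, here is a minimal sketch of the check quoted above, generalized to every md array on the box. The function name start_md_checks and its optional directory argument are my own assumptions for illustration, not anything from Chris's setup; the only part taken from the thread is writing "check" to the array's sync_action file.

```shell
#!/bin/sh
# Sketch: ask each Linux software RAID (md) array to start a consistency check.
# start_md_checks and its optional argument are hypothetical names; the
# argument exists only so the loop can be pointed at a scratch directory
# instead of the real /sys/block.
start_md_checks() {
    sysfs_block="${1:-/sys/block}"
    for action in "$sysfs_block"/md*/md/sync_action; do
        # Skip when the glob matched nothing or the file is not writable
        # (for example, when not running as root).
        [ -w "$action" ] || continue
        echo check > "$action"
        echo "started check via $action"
    done
}
```

Run as root, calling start_md_checks with no argument walks the real /sys/block; progress of each check then shows up in /proc/mdstat.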
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615