[Beowulf] Need some advise: Sun storage' management server hangs repeatedly

Sangamesh B forum.san at gmail.com
Tue Jan 12 02:19:47 EST 2010


Hi HPC experts,

     I seek your advise/suggestion to resolve a storage(NAS) server'
repeated hanging problem.

     We've a 23 nodes Rocks-5.1 HPC cluster. The Sun storage of capacity 12
TB is connected to a management server Sun Fire X4150 installed with RHEL
5.3 and this server is connected to a Gigabit switch which provides cluster
private network. The home directories on the cluster are NFS mounted from
storage partitions across all nodes including the master.

   This server gets hanged repeatedly. As an initial troubleshooting we
installed Ganglia, to check network utilization. But its normal. We're not
getting how to troubleshoot it and resolve the problem. Can anybode help us
resolve this issue?

Thanks,
Sangamesh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.clustermonkey.net/pipermail/beowulf/attachments/20100112/3854d372/attachment-0001.html>
-------------- next part --------------
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


More information about the Beowulf mailing list