[Beowulf] Transient NFS Problems in New Cluster
jlforrest at berkeley.edu
Tue Feb 2 17:00:37 EST 2010
I have a new cluster running CentOS 5.3.
The cluster uses a Sun 7310 storage server
that provides NFS service over a private
1Gb/s ethernet with 9K jumbo frames to the compute nodes.
We've noticed that a number of the compute
nodes sometimes generate the
automount: umount_autofs_indirect: ask umount returned busy /home
message. When this happens the program running on the
node dies. This has happened between 10 and 20 times.
We're not sure what is happening on a node when this
occurs. Most of the time everything is fine and
the home directories are automounted without problem.
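To pin down how often and when the message appears, something like the following could be run on each node. This is only a sketch: it assumes automount logs through syslog in the traditional "Mon DD HH:MM:SS host prog: msg" format, the log path shown is an assumption, and count_busy_per_day is a made-up helper name.

```shell
#!/bin/sh
# Count automount "returned busy" messages per day in a syslog file.
# Assumes the traditional syslog line format, where the first two
# whitespace-separated fields are the month and the day of the month.
count_busy_per_day() {
    logfile="$1"
    grep 'umount_autofs_indirect: ask umount returned busy' "$logfile" |
        awk '{ print $1, $2 }' | sort | uniq -c
}

# Hypothetical usage on one node (the log path is an assumption):
# count_busy_per_day /var/log/messages
```

Each output line is a count followed by a date, which makes it easier to line the failures up against job start and end times.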
I've googled for this problem and I see that other people
have run into it too, but I haven't found a resolution,
especially not for RHEL5.
The auto.master line for this mount is
/home /etc/auto.home --timeout=1200
The network interface configuration is
eth0      Link encap:Ethernet  HWaddr 00:30:48:B9:F6:52
          inet addr:10.1.255.233  Bcast:10.1.255.255  Mask:255.255.0.0
          inet6 addr: fe80::230:48ff:feb9:f652/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:32999308 errors:0 dropped:0 overruns:0 frame:0
          TX packets:27468315 errors:0 dropped:0 overruns:0 carrier:0
          RX bytes:24225053296 (22.5 GiB)  TX bytes:73313582546 (68.2 GiB)
          Interrupt:74 Base address:0x2000
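Since the interface reports MTU:9000, one check that could be made is whether full-size frames actually cross the switch to the storage server without fragmentation. The sketch below is not something that has been run for this report. It assumes the Linux iputils ping (for the -M do option), the helper names are made up, and NFS_SERVER is a placeholder for the storage server's address. The payload size is the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header, so 8972 bytes for a 9000-byte MTU.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one frame for a given MTU:
# MTU minus 20 bytes (IPv4 header without options) minus 8 bytes (ICMP header).
icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Send three pings of that size with fragmentation prohibited (-M do).
# $1 = MTU, $2 = target host.
check_jumbo() {
    ping -M do -c 3 -s "$(icmp_payload "$1")" "$2"
}

# Hypothetical usage; NFS_SERVER is a placeholder, not a real hostname:
# check_jumbo 9000 "$NFS_SERVER"
```

If these pings fail while default-size pings succeed, some device on the path is probably not passing 9K frames.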
Any advice on what to do?
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley