upgrading rh73 on an xCAT cluster
ctibirna at giref.ulaval.ca
Wed Sep 24 14:51:52 EDT 2003
Yesterday I upgraded the rh73 rpms and the kernel (for the first time in 7
months... I know, I know). Since then, I have had two nasty issues:
The first: the update installed a new openssh (3.1.p1-14), and sshd
authentication through pam is now annoyingly slow. All ssh connections (both
from outside to the master and from any node to any node inside) _do_
succeed, but much more slowly. I also see this in /var/log/messages:
Sep 24 13:16:04 n15 sshd(pam_unix): authentication failure; logname=\
uid=0 euid=0 tty=NODEVssh ruser= rhost=n01 user=root
Sep 24 13:16:04 n15 sshd(pam_unix): session opened for user root by\
Both messages are for the same ssh connection attempt and the attempt
succeeds, as I said. The only visible effect to the user is the slowness (the
first failure is followed by a programmed delay in pam).
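
To see how often this failure-then-success pair occurs on each node, the logs
could be scanned for it. This is only a minimal Python sketch, assuming each
message sits on a single line of /var/log/messages in the format quoted above.
The function name find_failed_then_opened is hypothetical, and the optional
[pid] field after "sshd(pam_unix)" is an assumption about what some syslog
setups add.

```python
import re

# Matches syslog lines such as:
#   Sep 24 13:16:04 n15 sshd(pam_unix): authentication failure; ...
# The "[pid]" part is optional (assumption: some setups log it).
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"sshd\(pam_unix\)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)
FAIL_USER_RE = re.compile(r"\buser=(\S+)")
OPEN_USER_RE = re.compile(r"session opened for user (\S+)")


def find_failed_then_opened(lines):
    """Return (timestamp, host, user) for every pam_unix authentication
    failure that is directly followed, on the same host and with the same
    timestamp, by a 'session opened' message for the same user."""
    hits = []
    pending = None  # key of the most recent failure not yet matched
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # not an sshd(pam_unix) line
        stamp, host, msg = m.group("stamp", "host", "msg")
        if msg.startswith("authentication failure"):
            u = FAIL_USER_RE.search(msg)
            pending = (stamp, host, u.group(1)) if u else None
            continue
        o = OPEN_USER_RE.search(msg)
        if o and pending == (stamp, host, o.group(1)):
            hits.append(pending)
        pending = None
    return hits
```

Run against each node's log (for example with
find_failed_then_opened(open("/var/log/messages"))), this would show whether
every successful login is preceded by a failed attempt or only some of them.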
I looked around the 'net a bit; people have already complained a lot about
this problem, but I found no solution.
The second: I also updated the kernel to 2.4.20-20.7 (redhat rpm).
Afterwards, my (and other users') SGE qmake jobs just get stuck in the middle
(i.e. they run correctly for a while, then suddenly just sit there and do
nothing for a long time, without having completed). I suspect some sort of
NFS lockup problem, as the master node (NFS server) now sees much higher
loads (6.0-8.0) than before the kernel update (2.0-3.0).
/var/log/messages says nothing useful.
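
Since the logs are silent, one way to check whether the stalls line up with
the load spikes on the master is to record the load average at regular
intervals and compare the timestamps with the moments the qmake jobs hang.
What follows is a rough Python sketch, not a diagnosis: it assumes a Linux
/proc/loadavg whose first three fields are the 1-, 5- and 15-minute load
averages. The function names and the 3.0 threshold (taken from the upper end
of the pre-update range) are hypothetical.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg and return the 1-, 5- and
    15-minute load averages as a tuple of floats."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def sample_once(path="/proc/loadavg"):
    """Read the current load averages from the local machine."""
    with open(path) as f:
        return parse_loadavg(f.read())


def high_load_samples(samples, threshold=3.0):
    """Given (label, loadavg_text) pairs, return the labels whose
    1-minute load average exceeds the threshold (assumed value)."""
    return [label for label, text in samples
            if parse_loadavg(text)[0] > threshold]
```

Calling sample_once() from a loop or from cron on the master, and storing the
result next to a timestamp, would give something to correlate with the times
at which the jobs stop making progress.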
Has anybody already updated a rh73 cluster equipped with SGE and using ssh
internally? Observed these problems? Found solutions?
Thanks a lot.
Cristian Tibirna (1-418-) 656-2131 / 4340
Laval University - Quebec, CAN ... http://www.giref.ulaval.ca/~ctibirna
Research professional at GIREF ... ctibirna at giref.ulaval.ca
Beowulf mailing list, Beowulf at beowulf.org