[Beowulf] running out of rsh ports
Dan Stromberg
strombrg at dcs.nac.uci.edu
Wed May 3 16:07:02 EDT 2006
On Wed, 2006-05-03 at 15:21 -0400, Joe Landman wrote:
> David Simas wrote:
>
> > Except that it probably won't help with the problem, which I'm
> > guessing is caused by a given host attempting more than 1024
> > RSH connections to a given server in less than TCP TIME WAIT
> > seconds (minutes, whatever). If the original correspondent
>
> Actually it handles exactly these cases. The FANOUT variable lets you
> indicate the appropriate parallelism for rsh. I believe pdsh is in use
> on the big clusters ( > 1024 nodes at the national labs )
Nod. I was pleased to learn of pdsh. FWIW, loop doesn't try to run all
n commands at once either; its degree of parallelism is controlled with
a command-line option.
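
For anyone who wants to see what a bounded fanout looks like, here is a minimal sketch in Python. It is not how pdsh or loop is implemented; the function name run_with_fanout, the default fanout of 32, and the pluggable runner argument are all made up for illustration, and the default runner assumes an rsh-compatible client named "rsh" is on the PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_with_fanout(hosts, command, fanout=32, runner=None):
    """Run `command` on every host, with at most `fanout` running at once.

    Returns a dict mapping each host to the value returned by `runner`
    (the remote client's exit status for the default runner).
    """
    if runner is None:
        # Assumption: an rsh-compatible client called "rsh" is on PATH.
        def runner(host, cmd):
            return subprocess.run(["rsh", host, cmd]).returncode

    # The pool size is the fanout: no more than `fanout` runner calls
    # are ever in flight simultaneously; the rest wait in the queue.
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        return {host: future.result() for host, future in futures.items()}
```

Something like run_with_fanout(hosts, "uptime", fanout=16) would then keep at most 16 rsh processes alive at a time. Note that this only caps simultaneous connections; ports lingering in TIME_WAIT after each one closes still count against the reserved range.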
> > doesn't want to use SSH for RSH, which would fix things
>
> True, and you can use ssh with pdsh. Or rsh. With no syntax change to
> the end user.
>
> > SSH isn't restricted to low-numbered ports, he could try to
> > re-implement his application in MPI.
>
> The basic question a few of us have is exactly what is Bruce and team
> doing that is causing them to run out of ports. Once we see this, we
> can stop guessing and make better/targeted suggestions.
Yup, and strace/truss/whatever is your friend for that:
http://dcs.nac.uci.edu/~strombrg/debugging-with-syscall-tracers.html
...though based on the message, I'm guessing they are trying to run too
many rsh sessions in parallel, and hence running out of reserved ports.
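
To put rough numbers on that guess, here is a back-of-envelope sketch in Python. Every figure in it is an assumption rather than something measured on the cluster in question: a pool of about 512 reserved client ports (the classic rresvport() range of roughly 512-1023), two ports per rsh session (one for the command channel, one for the stderr channel), and a 60-second TIME_WAIT. The function names are made up for illustration.

```python
def max_concurrent_sessions(port_pool=512, ports_per_session=2):
    """Upper bound on rsh sessions one client host can hold open at once."""
    return port_pool // ports_per_session


def max_sustained_rate(port_pool=512, ports_per_session=2, time_wait_s=60.0):
    """Upper bound on rsh launches per second that can be sustained if every
    port stays unusable for `time_wait_s` seconds after its session ends."""
    return port_pool / (ports_per_session * time_wait_s)


if __name__ == "__main__":
    print("max concurrent sessions:", max_concurrent_sessions())
    print("max sustained launches/sec: %.2f" % max_sustained_rate())
```

Under those assumptions, a few hundred simultaneous rsh sessions, or a handful of short-lived ones per second, would be enough to exhaust the range, which is why a trace of what the application actually does would settle the question.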