timm at fnal.gov
Fri Oct 5 09:32:10 EDT 2001
The issue is not only the CPU load on the server, if I understand
it correctly, but also the network load. 150 nodes doing
a simultaneous yp lookup is enough to cause timeouts.
From what I have seen (running 2.2.x kernels and ypbind-1.7-8),
a yp lookup also happens whenever nsswitch.conf lists both
files and nis, even if the user is root and is found in
files. Has anyone else seen this?
Steven C. Timm (630) 840-8525 timm at fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations
On Thu, 4 Oct 2001, Tim Carlson wrote:
> On Thu, 4 Oct 2001, Donald Becker wrote:
> > > If you were running
> > > 1000 small jobs in a couple of minutes I could imagine having problems
> > > authenticating against any non-local mechanism.
> > Hmmm, a reasonable goal is running a small cluster-wide job every
> > second. I suspect the NIS delays alone take longer than one second with
> > just a few nodes.
> So I ran the following test on one of our small clusters.
> 6 client NIS nodes with one NIS master (front end node) and no NIS slave
> servers. Dual 800 MHz Pentium IIIs connected to a Fast Ethernet switch.
> Forgive my sloppy C shell programming :)
> The "script" is basically 100 rsh calls, plus some NIS work to
> look up the ownership of a file.
> I am doing an ls -l on /tmp, which contains only 3 or 4 files, but I own two
> of them, so NIS is consulted for file ownership. I took NFS delays out by
> going to /tmp.
> #!/bin/csh -f
> # Run 100 sequential rsh calls against the node named in the first argument
> set i=0
> while ($i < 100)
>     rsh $1 ls -l /tmp > /dev/null
>     @ i++
> end
> [tim@frontend-0 tim]$ time ./script compute-0-0
> real 0m12.704s
> user 0m0.520s
> sys 0m0.440s
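Each `ls -l` above turns the owner UID of every file in /tmp into a user name, and that lookup is what reaches NIS for UIDs missing from the local /etc/passwd. A rough Python equivalent, assuming a Linux host and the standard `pwd` module (which resolves names through the same name-service switch as `ls`), might look like this; `owners` is a hypothetical name, and unlike `ls -l` it skips the group lookup:

```python
import os
import pwd


def owners(directory):
    """Return {filename: owner name} roughly the way `ls -l` resolves owners.

    pwd.getpwuid() goes through the C library's name-service switch, so with
    "passwd: files nis" a UID not found in /etc/passwd is looked up via NIS.
    If no source knows the UID, fall back to the numeric UID, as ls does.
    """
    result = {}
    for name in sorted(os.listdir(directory)):
        uid = os.lstat(os.path.join(directory, name)).st_uid
        try:
            result[name] = pwd.getpwuid(uid).pw_name
        except KeyError:
            result[name] = str(uid)
    return result


if __name__ == "__main__":
    print(owners("/tmp"))
```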
> So if the job takes zero time and connecting to a machine takes zero time,
> then the NIS overhead is at most about 1/8 of a second per call. I ran this
> half a dozen times and the run varied between 10 and 13 seconds.
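The arithmetic behind that bound, as a tiny sketch (`per_call_upper_bound` is only an illustrative name): charging all of the wall-clock time to the 100 lookups gives 12.704 s / 100 ≈ 0.127 s, about 1/8 s, and the 10 to 13 second spread corresponds to roughly 0.10 to 0.13 s per call.

```python
def per_call_upper_bound(total_seconds, calls):
    """Upper bound on per-call overhead if the job and the rsh connection
    took zero time: all of the wall-clock time is charged to the lookups."""
    return total_seconds / calls


# The single-node run: 100 rsh calls in 12.704 s of wall-clock time.
bound = per_call_upper_bound(12.704, 100)
print(f"{bound:.3f} s per call, about 1/{round(1 / bound)} s")
# prints 0.127 s per call, about 1/8 s
```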
> Now I point this script at 6 nodes at the same time (or at least as fast
> as I can type a return in 6 xterms), and the mean time per run is about 31
> seconds. That puts my potential NIS delay at a maximum of about 1/3 of a
> second per call. But I have also launched 600 jobs (6 nodes x 100 calls)
> in 31 seconds.
> Two examples from the larger test:
> [tim@frontend-0 tim]$ date; time ./script compute-0-0
> Thu Oct 4 21:06:53 PDT 2001
> real 0m30.905s
> user 0m0.600s
> sys 0m0.540s
> [tim@frontend-0 tim]$ date; time ./script compute-0-2
> Thu Oct 4 21:06:52 PDT 2001
> real 0m30.075s
> user 0m0.530s
> sys 0m0.710s
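The same bound applied to the concurrent runs, assuming 31 s of wall-clock time per node as quoted above (the helper name `concurrent_summary` is hypothetical):

```python
def concurrent_summary(nodes, calls_per_node, wall_seconds):
    """Summarize one copy of the script per node, all started together.

    Returns (total_jobs, jobs_per_second, per_call_upper_bound); the bound
    again charges every second of a run's wall-clock time to the lookups.
    """
    total_jobs = nodes * calls_per_node
    return total_jobs, total_jobs / wall_seconds, wall_seconds / calls_per_node


jobs, rate, bound = concurrent_summary(nodes=6, calls_per_node=100, wall_seconds=31)
print(f"{jobs} jobs, {rate:.1f} jobs/s, <= {bound:.2f} s per call")
# prints 600 jobs, 19.4 jobs/s, <= 0.31 s per call
```

So each call may take up to about 0.31 s, yet the cluster as a whole sustains roughly 19 remote jobs per second.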
> Output of "ps -ax | grep ypserv" on the master node, before and after:
> 639 ? S 73:08 ypserv
> 639 ? S 73:10 ypserv
> So ypserv used 2 seconds of CPU time.
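The 2-second figure comes from the TIME column, which `ps` reports in whole seconds, so the difference is only accurate to about a second. A small sketch of the conversion; `cputime_seconds` is a hypothetical helper, and the per-job figure assumes that all of the increase came from the 600-job run:

```python
def cputime_seconds(field):
    """Convert a ps TIME field ("MM:SS" as above, or "[DD-]HH:MM:SS") to seconds."""
    days = 0
    if "-" in field:
        d, field = field.split("-", 1)
        days = int(d)
    seconds = 0
    for part in field.split(":"):
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


before, after = "73:08", "73:10"  # TIME column for ypserv (PID 639)
used = cputime_seconds(after) - cputime_seconds(before)
# Assumption: every second of the increase is charged to the 600 jobs.
print(f"ypserv used {used} s of CPU, about {used / 600 * 1000:.1f} ms per remote job")
# prints ypserv used 2 s of CPU, about 3.3 ms per remote job
```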
> My first version of the script ran "touch /tmp/testfile" and produced
> similar results. My /etc/nsswitch.conf files list "files nis", and the only
> entry in /etc/passwd on the compute nodes is root.
> I am willing to be enlightened as to how my test is flawed. I'll run
> different tests if asked. Is my test too trivial?
> Tim Carlson
> Voice: (509) 376-0300
> Email: Tim.Carlson at pnl.gov
> EMSL UNIX System Support
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf