Gigabit performance issues and NFS
Doug Farley
d.l.farley at larc.nasa.gov
Mon Jun 2 10:17:53 EDT 2003
Jakob,
For the switch I have an HP ProCurve 2708; from the NetPIPE results I'd say
it's a cut-through switch, though I may not have applied enough load to it
to find out, and I have seen no data either way to determine otherwise.
While the SGI writes to my Linux NFS server I have 4 nfsd's running on the
Linux box, with all 4 in either an R or D state. I've included copies of
the top results below for reference. Regarding the use of async instead of
sync in the exports: the speed was 10-14MB/s, again with 2 nfsd's in R
state and 2 in D state as before, a load average of ~4, each nfsd consuming
~3% CPU, and each of the SGI's bio3d's consuming about 6%. I hope some of
these numbers (while possibly excessive) help the group shed some light on
my problem.
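(For anyone who wants to repeat the thread-state check: this is roughly
what I watched on the Linux box while the SGI was writing; the bracket in
the grep pattern is just the usual trick to keep grep out of its own
results.)

   # list nfsd threads with their states (R = running, D = uninterruptible I/O wait)
   ps ax -o pid,stat,comm | grep '[n]fsd'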
Thank you all again,
Doug Farley
For SGI Writing to Linux:
ping before:
----Linux-g.localdomain PING Statistics----
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max = 0.219/0.286/0.427 ms
----Linux-g.localdomain PING Statistics----
22889 packets transmitted, 22888 packets received, 0.0% packet loss
round-trip min/avg/max = 0.202/0.476/2.307 ms
2072.5 packets/sec sent, 2072.5 packets/sec received
--- SGI-g.localdomain ping statistics ---
13 packets transmitted, 13 received, 0% loss, time 11993ms
rtt min/avg/max/mdev = 0.108/0.178/0.230/0.036 ms
/ping before
ping during:
----Linux-g.localdomain PING Statistics----
6 packets transmitted, 6 packets received, 0.0% packet loss
round-trip min/avg/max = 0.240/0.333/0.380 ms
----Linux-g.localdomain PING Statistics----
33140 packets transmitted, 33140 packets received, 0.0% packet loss
round-trip min/avg/max = 0.149/0.264/2.670 ms
3687.1 packets/sec sent, 3687.1 packets/sec received
--- SGI-g.localdomain ping statistics ---
9 packets transmitted, 9 received, 0% loss, time 7999ms
rtt min/avg/max/mdev = 0.130/0.230/0.316/0.050 ms
--- SGI-g.localdomain ping statistics ---
25450 packets transmitted, 25449 received, 0% loss, time 11371ms
rtt min/avg/max/mdev = 0.076/0.243/2.221/0.059 ms, ipg/ewma 0.446/0.249 ms
/ping during
SGI:
top header before:
----
IRIX64 SGI 6.5 IP35 load averages: 0.02 0.00 0.00
57 processes: 56 sleeping, 1 running
8 CPUs: 99.4% idle, 0.0% usr, 0.4% ker, 0.0% wait, 0.0% xbrk, 0.2% intr
Memory: 8192M max, 7313M avail, 7264M free, 4096M swap, 4096M free swap
/top header before
top header during:
IRIX64 SGI 6.5 IP35 load averages: 1.28 0.48 0.18
61 processes: 59 sleeping, 2 running
8 CPUs: 83.6% idle, 0.1% usr, 7.3% ker, 6.2% wait, 0.0% xbrk, 2.9% intr
Memory: 8192M max, 7455M avail, 5703M free, 4096M swap, 4096M free swap
PID PGRP USERNAME PRI SIZE RES STATE TIME WCPU% CPU% COMMAND
511570519 511570519 dfarley 20 416K 304K sleep 0:29 22.2 30.16 ln
511571530 511571530 root 21 0K 0K sleep 0:08 6.5 6.63 bio3d
8975393 8975393 root 21 0K 0K sleep 0:07 6.1 6.53 bio3d
511549378 511549378 root 21 0K 0K run/1 12:24 6.0 6.15 bio3d
511563810 511563810 root 21 0K 0K sleep 0:08 6.5 5.74 bio3d
511543776 511541950 root 20 0K 0K sleep 49:18 2.2 2.88 nfsd
511568500 511568500 dfarley 20 2208K 1536K run/3 0:00 0.2 0.24 top
511567378 511549203 dfarley 20 4208K 3104K sleep 0:00 0.0 0.02 sshd
8928 8928 root 20 1808K 1088K sleep 1:02 0.0 0.01 prngd
410 410 root 20 2512K 2512K sleep 1:34 0.0 0.01 ntpd
381 0 root 20 2816K 2064K sleep 0:32 0.0 0.00 ipmon
/top header during
/SGI
Linux Box:
vmstat before:
procs                      memory    swap          io     system      cpu
 r  b  w   swpd   free   buff  cache  si  so  bi  bo  in  cs us sy id
 0  0  0  18980  18012 516412 418384   0   0  22  40  44  41  0  1 47
/vmstat before
vmstat during:
procs                      memory    swap          io     system      cpu
 r  b  w   swpd   free   buff  cache  si  so  bi  bo  in  cs us sy id
 0  0  5  18980   8552 518828 445648   0   0  22  43  45  44  0  1 47
/vmstat during
top header before:
up 9 days, 17:03, 2 users, load average: 0.33, 0.06, 0.00
72 processes: 70 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 0.0% user, 1.0% system, 0.0% nice, 99.0% idle
Mem: 1031204K av, 1015316K used, 15888K free, 0K shrd, 532128K buff
Swap: 2048276K av, 18980K used, 2029296K free 422944K cached
/top header before
top header during:
up 9 days, 17:27, 2 users, load average: 2.60, 0.76, 0.26
70 processes: 66 sleeping, 4 running, 0 zombie, 0 stopped
CPU states: 0.0% user, 25.6% system, 0.0% nice, 74.4% idle
Mem: 1031204K av, 1022200K used, 9004K free, 0K shrd, 517376K buff
Swap: 2048276K av, 18980K used, 2029296K free 446512K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
14732 root 15 0 0 0 0 RW 4.2 0.0 1:21 nfsd
14725 root 15 0 0 0 0 DW 3.8 0.0 0:55 nfsd
14726 root 14 0 0 0 0 RW 3.8 0.0 2:08 nfsd
14729 root 14 0 0 0 0 DW 3.6 0.0 2:52 nfsd
5 root 9 0 0 0 0 SW 0.6 0.0 1:12 kswapd
6 root 9 0 0 0 0 SW 0.6 0.0 32:40 kscand
14730 root 9 0 0 0 0 SW 0.2 0.0 1:23 nfsd
1 root 9 0 156 128 96 S 0.0 0.0 0:04 init
2 root 9 0 0 0 0 SW 0.0 0.0 0:00 keventd
3 root 9 0 0 0 0 SW 0.0 0.0 0:00 kapmd
/top header during
/Linux Box
==============================
Doug Farley
Data Analysis and Imaging Branch
Systems Engineering Competency
NASA Langley Research Center
< D.L.FARLEY at LaRC.NASA.GOV >
< Phone +1 757 864-8141 >
At 12:15 PM 6/2/2003 +0200, Jakob Oestergaard wrote:
>On Thu, May 29, 2003 at 09:35:54AM -0400, Doug Farley wrote:
> > Fellow Wulfers,
> >
> > I know this isn't 100% 'wulf related, although it is part of my 'wulf's
> > setup, but this is the best forum, where everyone has a lot of good
> > experience.
>
>NFS is 'wulf related, whether we like it or not :)
>
> >
> > Well, here's the deal: I have a nice 2TB Linux file server with an Intel
> > e1000-based NIC in it, and I have an SGI O3 (master node) that is dumping
> > to it with a Tigon-series gigabit card. I've tuned both, and my ttcp and
> > NetPIPE performance averages ~80-95MB/s, which is more than reasonable for
> > me. Both the fibre channel on my SGI and the RAID (3ware) on my Linux box
> > can write at 40MB/s sustained; read is a little faster for both, maybe
> > ~50MB/s sustained. I can get ftp/http transfers between the two to go at
> > 39-40MB/s, which again I'm reasonably happy with. BUT, the part that is
> > killing me is NFS and scp. Both crawl in at around 8-11MB/s with no other
> > devices on the network.
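(For reference, the raw-TCP figures above came from plain ttcp runs,
roughly like the following; the hostname is hypothetical.)

   # on the Linux box (receiver)
   ttcp -r -s
   # on the SGI (transmitter)
   ttcp -t -s linux-g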
>
>11MB/sec with scp is quite good - considering everything is encrypted
>and what not...
>
>With NFS that's pretty poor though, I'd agree.
>
> > Any exports from the SGI I've exported with the 32bitclients flag, I've
> > pumped my rsize/wsize windows up to 32K, and I've forced NFS v3 on both
> > Linux and IRIX. After spending a week scouring the web I've found nothing
> > that has worked, and SGI support thinks it's a Linux NFS problem, which
> > could be, but I'd like to get the opinion of this crowd in hopes of some
> > light!
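(The mounts themselves are nothing exotic; the Linux-side forced-v3,
32K-window mount looks roughly like this, with the server name and paths
hypothetical, and the SGI-side /etc/fstab entry is the moral equivalent
using vers=3 in place of nfsvers=3.)

   mount -t nfs -o nfsvers=3,rsize=32768,wsize=32768 sgi-g:/export/scratch /mnt/scratch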
>
>What do top and vmstat on your NFS server tell you?
>
>How many nfsd threads are busy (in R or D state) during the writes?
>
>The default number of nfsd threads is 8, which may be a little low. I
>run 16 threads here, on a somewhat smaller NFS server (also with a Gbit
>NIC). If you only see one or two nfsd threads in R or D state,
>anywhere near the top of your "top", then this should not be the
>problem.
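(Bumping the thread count on the Linux server is a one-liner; the 16 below
is Jakob's figure, not a tested value, and the init-script variable name
varies by distribution.)

   # one-off, takes effect immediately
   rpc.nfsd 16
   # or persistently, in the Red Hat-style init script for nfs
   RPCNFSDCOUNT=16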
>
>Try specifying the "async" parameter for the given mount in your exports
>file on the NFS server. Just to see if this helps. There are some
>considerations you need to make here - if the client does a sync() and
>you use the async option on the server, you are not guaranteed that the
>data has reached the disk platters by the time the client sync() call
>returns. This may or may not matter for you.
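(Concretely, the async test amounts to an exports line roughly like the
following on the Linux server, with the path and client name hypothetical,
followed by a re-export.)

   # /etc/exports
   /export/data   sgi-g(rw,async)

   exportfs -ra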
>
>What does vmstat say during such a big write? Is the CPU idle or busy?
>Is it spending all its time in the kernel?
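(One note for anyone repeating the vmstat check: the first line vmstat
prints is an average since boot, so interval sampling gives a truer picture
of what happens during the transfer.)

   vmstat 1 10    # one-second samples, ten of them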
>
>How's the ping between the machines when doing the write, and when the
>network is more idle? You may have a switch in between that does
>store-and-forward instead of cut-through when the network gets loaded.
>Latency hurts NFS.
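(The flood-ping figures earlier in this message came from something like
the following, run as root; Linux syntax shown, the IRIX equivalent is
similar.)

   ping -f -c 20000 linux-g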
>
>--
>................................................................
>: jakob at unthought.net : And I see the elder races, :
>:.........................: putrid forms of man :
>: Jakob Østergaard : See him rise and claim the earth, :
>: OZ9ABN : his downfall is at hand. :
>:.........................:............{Konkhra}...............:
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf