Small GigE Switches

From Cluster Documentation Project

I started by posting results for Open-MX over GigE on my Limulus cluster, using Netpipe MPI/TCP (2.4) for most of the tests. Because Open-MX required jumbo frames, I ended up testing with them, and I noticed that jumbo frames actually reduced the throughput! I'm still in the process of collecting data, so I cannot draw any definite conclusions. I just noticed that the latest version of Open-MX (0.9.1) can run over the standard frame size (1500). More results coming soon.

Kernel: 2.6.23
CPU: Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz
Interconnect: Intel 82572EI Gigabit Ethernet PCIe 1X
Switches: SMC 8505T (5 port), SMC 8508T (8 port), SMC GS16 (16 port), and a crossover cable (Note: I tested a 5 port 3Com and it would only negotiate 100BT, so it went back. I also tested an 8 port ProCurve and it worked similarly to the SMC switch; more tests are needed.)
MPI: LAM, MPICH-MX (Open MX 0.6)

Summary Data

The following data are for the 8505T and 8508T switches and a crossover cable. Netpipe MPI results using MPICH-MX and LAM (TCP) are shown. In addition, Netpipe TCP results are given for comparison.

Signature Graph
Latency Graph
Throughput Graph


Interesting Comparisons Frame Size

Comparing some specific data gives surprising results. First, if you compare LAM using a 1500 byte MTU (frame), LAM using a 9000 byte MTU, and MPICH-MX using a 9000 byte MTU, you find that the best throughput comes from the smaller frame size! This result is the opposite of what you would expect: bigger frames should give better throughput!

LAM/1500, LAM/9000, MPICH-MX/9000
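
For reference, the "bigger frames, better throughput" expectation can be put into rough numbers. The sketch below is not part of the original tests. It assumes plain TCP over IPv4 with no header options (40 bytes of headers inside the MTU) and 38 bytes of per-frame Ethernet overhead on the wire (preamble, header, FCS, and inter-frame gap). The function name is made up for illustration.

```shell
#!/bin/sh
# Theoretical best-case TCP payload rate on Gigabit Ethernet for a given MTU.
# Assumptions (not measured data): 20-byte IPv4 + 20-byte TCP headers with no
# options; 8 preamble + 14 header + 4 FCS + 12 inter-frame gap bytes per frame.
max_goodput_mbps() {
    mtu=$1
    # payload bytes / bytes on the wire, scaled to a 1000 Mbit/s link
    echo $(( (mtu - 40) * 1000 / (mtu + 38) ))
}

for mtu in 1500 3000 9000; do
    echo "MTU $mtu: $(max_goodput_mbps "$mtu") Mbit/s"
done
```

Under these assumptions the ceiling only moves from about 949 Mbit/s at 1500 to about 991 Mbit/s at 9000, so framing overhead alone can explain a gain of a few percent at most, and certainly not a loss.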

The above result suggested another experiment. In the following graph, LAM MPI was run over a range of MTU (frame) sizes. As the frame gets bigger, the throughput gets lower! Something is not right here!

LAM run over range of MTU size
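
A sweep like this is easy to script. What follows is only a sketch of how it might be done, not the procedure actually used for the graph: the interface name, the MTU list, and the benchmark wrapper (run_netpipe.sh) are all placeholders. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of an MTU sweep. Everything here is illustrative: the interface name,
# the MTU list, and the benchmark command are placeholders to adapt.
# With DRY_RUN=1 (the default) the commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: mtu_sweep IFACE "MTU LIST" BENCHMARK [ARGS...]
# The benchmark command is called once per MTU with the MTU appended as its
# last argument (handy for naming output files).
mtu_sweep() {
    iface=$1; mtus=$2; shift 2
    for mtu in $mtus; do
        run ip link set dev "$iface" mtu "$mtu" || return 1
        run "$@" "$mtu" || return 1
    done
}

# Example (prints the commands; set DRY_RUN=0 and run as root to apply them):
#   mtu_sweep eth2 "1500 3000 4500 6000 9000" ./run_netpipe.sh
```

Keep in mind that the MTU has to be changed on both nodes (and be supported by the switch) for each step, so a real run would need the same ip link command issued on the remote end as well.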

Perhaps it is the switch. The following results show the difference between the 8505T, the 8508T, and a crossover cable. These results would indicate that the switches are not working well with large frames (although they are advertised to work with large frames).

Switch Effect on MPICH-MX

Next a better switch was used, an SMC GS16. The same tests were run and the variation with MTU (frame) size for LAM was recorded. Note that jumbo frames still reduce throughput, except at 3000! This switch is of better quality than the small switches, so maybe it is not a switch issue.

LAM run over range of MTU size

Mystery Is Solved

Turn off Flow Control! I turned off Flow Control using Ethtool and the jumbo packets at the high end got much better, but the variability got much worse (see results below). Also, the kernel is now 2.6.26.2 (using Fedora 8 now). More tests are needed, but we see that flow control was the problem.

Netpipe TCP run over range of MTU size with Flow Control Off

There are still some issues to resolve and more tests are needed. (coming soon)

Using Ethtool

Here is the Ethtool sequence I used to turn off Flow Control (check the man page for Ethtool for a full description of options).

# ethtool -a eth2
Pause parameters for eth2:
Autonegotiate:  on
RX:             on
TX:             on

# ethtool -A eth2 autoneg off rx off tx off
# ethtool -a eth2
Pause parameters for eth2:
Autonegotiate:  off
RX:             off
TX:             off
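
Since ethtool settings generally do not survive a reboot or a driver reload, it may be worth checking the pause state before each benchmark run. The helper below is a minimal sketch that parses the `ethtool -a` output format shown above; the function name is made up, and the parsing assumes the "RX:" and "TX:" lines look like the ones in the transcript.

```shell
#!/bin/sh
# Reads `ethtool -a` output on stdin and succeeds only if both RX and TX
# pause are reported as "off". The helper name is made up for illustration.
pause_is_off() {
    rx=""
    tx=""
    while read -r key value rest; do
        case $key in
            RX:) rx=$value ;;
            TX:) tx=$value ;;
        esac
    done
    [ "$rx" = off ] && [ "$tx" = off ]
}

# Example (needs root and a real interface; eth2 as in the transcript above):
#   ethtool -a eth2 | pause_is_off || ethtool -A eth2 autoneg off rx off tx off
```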


I also set the InterruptThrottleRate to "dynamic". See ~/Documentation/networking/e1000.txt in the kernel source directory for an explanation of this. From the e1000.txt file:

For situations where low latency is vital such as cluster or grid computing, the algorithm can reduce latency even more when InterruptThrottleRate is set to mode 1. In this mode, which operates the same as mode 3, the InterruptThrottleRate will be increased stepwise to 70000 for traffic in class "Lowest latency".

The sequence below sets the InterruptThrottleRate (which is rx-usecs to Ethtool) to 1 (dynamic):

# ethtool -c eth2
Coalesce parameters for eth2:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 3
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
# ethtool -C eth2 rx-usecs 1
# ethtool -c eth2
Coalesce parameters for eth2:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 1
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
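
Rather than reading through the whole listing each time, a single field can be pulled out of the `ethtool -c` output. This is just a convenience sketch with a made-up helper name; it assumes the "name: value" line format shown in the listings above.

```shell
#!/bin/sh
# Reads `ethtool -c` output on stdin and prints the value of one field.
# usage: coalesce_value FIELD   (helper name is made up for illustration)
coalesce_value() {
    want="$1:"
    while read -r key value rest; do
        if [ "$key" = "$want" ]; then
            echo "$value"
            return 0
        fi
    done
    # field not found in the input
    return 1
}

# Example (eth2 as in the transcript above):
#   [ "$(ethtool -c eth2 | coalesce_value rx-usecs)" = 1 ] || echo "rx-usecs is not 1"
```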