How to Benchmark TCP/IP Ethernet Performance
There are two ways to look at designing HPC clusters. On the positive side, there is a plethora of hardware and software options. On the negative side, there is a plethora of hardware and software options! The cluster designer has a special burden because unlike putting together one or two servers for the office, a cluster multiplies your decisions by N, where N is the number of nodes. A wrong decision can have two negative consequences. First, a fix for the problem will probably require more money and time. And, second, before the problem is fixed you may only be getting a fraction of the performance possible from your cluster. Clusters have a way of amplifying bad decisions.
All is not lost, however. Choosing the right stuff is not that difficult. It takes some investigation, some testing, and asking the right questions. As an example, let's look at a scenario where you would like to retro-fit an older cluster with Gigabit Ethernet (GigE) or even build a new cluster using GigE. We are not going to do an exhaustive review of all GigE adapters, but rather demonstrate how one might benchmark a specific adapter. GigE adapters may also be integrated on motherboards in which case it is a good idea to test these as well. The common assumption that an on-board GigE adapter will run better than a separate adapter may not necessarily be true. Ultimately your application(s) should determine the hardware, but before getting too specific, we can develop a set of general protocols to help narrow down the hardware and software choices.
Before you bet your career on an assumption, why not do some testing? Indeed, let's probe a 32-bit adapter to see just how it measures up. Turning to the Internet again, we find that the Netgear GA302T seems to be a good choice. It is based on a Broadcom chipset, works at both 33 and 66 MHz, and can be purchased for about $32 (including shipping!). Note: Since this article was written, the GA302T has gone out of production, which is all the more reason to use this article as a guide for testing Ethernet adapters!
Our testing checklist includes the following:
When all the above is complete, the testing can begin. Once the two adapters are up and running, using Netpipe is quite easy (see Sidebar Two: Running Netpipe). Netpipe can test TCP, MPI, and PVM performance, and newer versions can also test many popular high-performance interfaces. For the purposes of this column, we will measure TCP performance. It is best to perform the tests as root, as you will be taking the interface up and down quite a bit.
As part of our testing we would like to answer the following questions.
To answer these questions we will place the adapters in both the 32-bit/33 MHz and 64-bit/66 MHz slots. We will also vary the MTU size.
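Before measuring, it helps to know the ceilings involved. The following sketch is not part of the original tests. It uses nominal PCI clock rates and standard Ethernet, IPv4, and TCP header sizes to estimate the theoretical peak of each slot and the best-case TCP payload rate for a given MTU. It assumes no TCP options and no bus contention, and the function names are purely illustrative.

```python
# Back-of-envelope ceilings for the slots and MTU sizes under test.
# Assumptions: nominal PCI clocks, one transfer per clock, no bus contention,
# IPv4 and TCP headers without options (20 + 20 bytes).

GIGE_LINE_RATE_MBPS = 1000.0


def pci_peak_mbps(width_bits, clock_mhz):
    """Theoretical peak PCI bus bandwidth in Mbit/s."""
    return width_bits * clock_mhz


def tcp_payload_ceiling_mbps(mtu, line_rate_mbps=GIGE_LINE_RATE_MBPS):
    """Best-case TCP payload rate over Ethernet for a given MTU."""
    payload = mtu - 40                 # strip IPv4 (20) and TCP (20) headers
    on_wire = mtu + 14 + 4 + 8 + 12    # Ethernet header, FCS, preamble, inter-frame gap
    return line_rate_mbps * payload / on_wire


if __name__ == "__main__":
    for width, clock in ((32, 33), (64, 66)):
        peak = pci_peak_mbps(width, clock)
        print(f"{width}-bit/{clock} MHz PCI peak: {peak:.0f} Mbit/s")
    for mtu in (1500, 3000):
        ceiling = tcp_payload_ceiling_mbps(mtu)
        print(f"MTU {mtu}: at most {ceiling:.1f} Mbit/s of TCP payload")
```

By this estimate a 32-bit/33 MHz slot peaks at 1056 Mbit/s, barely above the GigE line rate and shared with every other device on the bus, which is one reason the slot type is worth testing rather than assuming.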
The Netpipe authors also suggest plotting a "Network Signature Graph" (throughput vs. time). This result is shown in Figure Two.
Netpipe is a great tool for testing the network performance of your cluster designs. Of course, there are many other factors we did not consider, such as motherboards, chipset implementation, adding an MPI or PVM layer, and introducing a switch. Fortunately, the effect of all these variables can be easily measured with Netpipe. Finally, you will notice that this type of information is not normally part of the product literature. Without proper testing, design decisions based on product data sheets and glossies are at best a guess and at worst a costly mistake. In future articles, we will examine other issues that influence cluster price and performance.
|Sidebar One: Configuring An Interface|
Adding a second Ethernet interface for testing purposes is not difficult. The most important thing is to use up-to-date kernels and drivers. The driver should be compiled as a module so that it can be easily added to or removed from the kernel. Assuming you have two nodes that can communicate through a network, the following steps, performed on each node, should allow you to easily bring up the test interface.
Enter the following command to load the Tigon 3 module (the module name may vary for the adapter under test).
# insmod tg3
The module should load successfully. Check the end of the dmesg output (dmesg | tail). If you are using the tg3 module, you should see two lines similar to the following. Other adapter modules will produce a different message, but still list the Ethernet port. You may also want to check whether the driver allows for any tunable parameters, such as interrupt mitigation settings, which may affect latency.
eth1: Tigon3 [partno(AC91002A1) rev 0105 PHY(5701)] (PCI:33MHz:32-bit) \ 10/100/1000BaseT Ethernet 00:09:5b:22:cd:bc
The dmesg output tells us, among other things, that the card is assigned to eth1 and that it sits in a 33 MHz, 32-bit PCI slot.
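When the adapter is moved between slots, it is easy to lose track of which configuration produced which result. As a minimal sketch, the slot details can be pulled out of the driver message automatically. The pattern below is written against the tg3 sample line shown above and is an assumption about its format. Other drivers word their messages differently, and the function name is hypothetical.

```python
import re

# Matches the interface name and the bus description printed by the tg3
# driver, e.g. "eth1: Tigon3 [...] (PCI:33MHz:32-bit)". The pattern is
# specific to the sample line in this sidebar.
TG3_BUS = re.compile(
    r"(?P<iface>eth\d+): .*\(PCI:(?P<mhz>\d+)MHz:(?P<bits>\d+)-bit\)"
)


def parse_tg3_line(line):
    """Return interface name, PCI clock and bus width, or None if no match."""
    match = TG3_BUS.search(line)
    if match is None:
        return None
    return {
        "iface": match.group("iface"),
        "clock_mhz": int(match.group("mhz")),
        "width_bits": int(match.group("bits")),
    }


if __name__ == "__main__":
    sample = (
        "eth1: Tigon3 [partno(AC91002A1) rev 0105 PHY(5701)] "
        "(PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:09:5b:22:cd:bc"
    )
    print(parse_tg3_line(sample))
```

Recording the parsed values alongside each Netpipe output file makes it harder to mislabel a run later.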
Now that the driver is loaded and recognizes the card, we need to bring up the interface. Because we will be playing with a parameter (the MTU size, i.e. the Ethernet packet size), we will use ifconfig to assign the IP address (in this case 192.168.1.2) and start the interface.
# ifconfig eth1 inet 192.168.1.2 netmask 255.255.255.0 broadcast 192.168.1.255 mtu 1500
If this command was successful, you should be able to issue an ifconfig eth1 command and get something similar to the following.
eth1      Link encap:Ethernet  HWaddr 00:09:5B:60:18:E5
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:869
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:18
Once this process is performed on both nodes, using different IP addresses of course, you should be able to ping between the nodes. In our case, we used 192.168.1.1 and 192.168.1.2 as the IP addresses, so we know the interface is communicating if we issue the following:
# ping 192.168.1.1
from the node whose interface we assigned as 192.168.1.2. Once the interfaces are up we can begin testing.
To vary any of the parameters (such as the IP address or MTU size), simply take the interface down with ifconfig (for example, ifconfig eth1 down) and then bring it back up with the new settings.
The MTU for both adapters must be the same.
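Because the same down-and-up sequence has to be repeated on both nodes for every MTU value, it can help to generate the commands rather than retype them. The sketch below only builds and prints the command lines and does not run them. The helper name is hypothetical, and the ifconfig arguments simply mirror the command used earlier in this sidebar.

```python
import ipaddress
import shlex


def mtu_change_commands(iface, ip, mtu, netmask="255.255.255.0"):
    """Return the two ifconfig commands that restart `iface` with a new MTU."""
    # Derive the broadcast address from the IP address and netmask.
    network = ipaddress.ip_interface(f"{ip}/{netmask}").network
    broadcast = str(network.broadcast_address)
    return [
        ["ifconfig", iface, "down"],
        ["ifconfig", iface, "inet", ip, "netmask", netmask,
         "broadcast", broadcast, "mtu", str(mtu)],
    ]


if __name__ == "__main__":
    # Print the commands for each node; run them by hand as root.
    for node_ip in ("192.168.1.1", "192.168.1.2"):
        print(f"# on the node with address {node_ip}:")
        for command in mtu_change_commands("eth1", node_ip, 3000):
            print(" ".join(shlex.quote(part) for part in command))
```

Printing rather than executing keeps the sketch safe to run anywhere, and generating both nodes' commands from one MTU value guards against the two ends ending up with different settings.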
|Sidebar Two: Running Netpipe|
From one of the test nodes, open a window and log in to the other node. Start Netpipe in receive mode (-r) by entering:
NPtcp -r
Open a second window on the first node and enter:
NPtcp -t -h 192.168.1.2 -P -o NP_output_file
using the IP address of the receiving machine (-h option). The -P option tells Netpipe to print to the screen, the -o option produces an output file for plotting, and the -t option tells Netpipe to run in transmit mode. There are other options for Netpipe, but these will provide a basic test of the interface. Once it is running you should see something similar to the following:
Latency: 0.000035
Now starting main loop
  0:       1 bytes 7241 times -->    0.23 Mbps in 0.000033 sec
  1:       2 bytes 7511 times -->    0.46 Mbps in 0.000033 sec
  2:       3 bytes 7473 times -->    0.68 Mbps in 0.000034 sec
  3:       4 bytes 4956 times -->    0.91 Mbps in 0.000033 sec
  4:       6 bytes 5601 times -->    1.37 Mbps in 0.000033 sec
  5:       8 bytes 3734 times -->    1.81 Mbps in 0.000034 sec
  6:      12 bytes 4642 times -->    2.69 Mbps in 0.000034 sec
(continues)
The default Netpipe test is self-limiting: the block size is increased from a single byte (by various non-standard increments) until the transmission time exceeds 1 second.
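If the screen output is captured to a file, it can be turned into numbers with a few lines of code. The following is a minimal sketch that assumes the line layout shown in the sample above. The names `Sample` and `parse_netpipe_screen` are made up for illustration and are not part of Netpipe.

```python
import re
import sys
from collections import namedtuple

Sample = namedtuple("Sample", "index block_bytes repeats mbps seconds")

# One result line, e.g. "  3:  4 bytes 4956 times -->  0.91 Mbps in 0.000033 sec"
RESULT_LINE = re.compile(
    r"^\s*(\d+):\s+(\d+)\s+bytes\s+(\d+)\s+times\s+-->"
    r"\s+([\d.]+)\s+Mbps\s+in\s+([\d.]+)\s+sec"
)


def parse_netpipe_screen(text):
    """Return one Sample per result line; all other lines are ignored."""
    samples = []
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            samples.append(Sample(
                int(match.group(1)), int(match.group(2)), int(match.group(3)),
                float(match.group(4)), float(match.group(5)),
            ))
    return samples


if __name__ == "__main__":
    # Reads captured screen output from standard input.
    samples = parse_netpipe_screen(sys.stdin.read())
    if samples:
        best = max(samples, key=lambda s: s.mbps)
        print(f"{len(samples)} samples, peak {best.mbps} Mbps "
              f"at {best.block_bytes} bytes")
    else:
        print("no Netpipe result lines found")
```

The header lines ("Latency:" and "Now starting main loop") do not match the pattern and are skipped, so a complete capture can be fed in unchanged.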
|Sidebar Three: Plotting Results|
The Netpipe output file can be easily plotted with Gnuplot. The plotting file for Figure One, a standard plot of Throughput vs. Blocksize, is as follows.
# gnuplot file for plotting Netpipe data
#
set title "Netpipe TCP - Throughput vs. Blocksize \n Netgear GA302T Adapter"
set xlabel "Blocksize (Bytes)"
set ylabel "Throughput (Mbits/s)"
set logscale x
set key bottom right
#Uncomment to produce a png file
#set terminal png picsize 1200 896
#set output "netpipe.throughput_vs_blocksize.png"
# Uncomment these to produce an eps file
#set terminal postscript monochrome "Helvetica" 10
#set pointsize .6
#set output "netpipe.throughput_vs_blocksize.eps"
#set size 0.6,0.6
plot \
"NP.1500.33-1" using 4:2 title "1500 MTU 33 MHz PCI" w linespoints, \
"NP.3000.33-1" using 4:2 title "3000 MTU 33 MHz PCI" w linespoints, \
"NP.1500.66-1" using 4:2 title "1500 MTU 66 MHz PCI" w linespoints, \
"NP.3000.66-1" using 4:2 title "3000 MTU 66 MHz PCI" w linespoints
# wait so we can view it! (comment out when making files)
pause -1
The output files produced by Netpipe are listed as part of the plot line (i.e., NP.1500.33-1, etc.). You can easily edit the Gnuplot file to view your test results. To plot data, simply enter:
gnuplot filename.gp
where filename.gp is the name of the Gnuplot file similar to the one shown above. You can also generate other views of the data. See the Netpipe/Gnuplot documentation for more information.
To plot the "Netpipe Signature" graph shown in Figure Two, you can use this gnuplot file:
set title "Netpipe Data - Signature Graph (Throughput vs. Time)\n Netgear GA302T Adapter"
set xlabel "Time"
set ylabel "Throughput (Mb/s)"
set logscale x
#Uncomment to produce a png file
#set terminal png picsize 1200 896
#set output "netpipe.network_signature_graph.png"
# Uncomment these to produce an eps file
#set terminal postscript monochrome "Helvetica" 10
#set pointsize .6
#set output "netpipe.network_signature_graph.eps"
#set size 0.6,0.6
set key bottom right
plot \
"NP.1500.33-1" using 1:2 title "1500 MTU 33 MHz PCI" w linespoints, \
"NP.3000.33-1" using 1:2 title "3000 MTU 33 MHz PCI" w linespoints, \
"NP.1500.66-1" using 1:2 title "1500 MTU 66 MHz PCI" w linespoints, \
"NP.3000.66-1" using 1:2 title "3000 MTU 66 MHz PCI" w linespoints
# wait so we can view it! (comment out when making files)
pause -1
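Plots are good for spotting trends, but a couple of summary numbers per run make the four configurations easier to compare. The sketch below reads Netpipe output files and reports the peak throughput and the smallest block size that reaches half of that peak. It relies only on the column layout implied by the Gnuplot files above (column 1 is time, column 2 is throughput, column 4 is block size in bytes). Nothing else about the file format is assumed, and the function names are hypothetical.

```python
import sys


def read_rows(lines):
    """Parse output lines into (time, mbps, block_bytes) tuples.

    Column positions follow the Gnuplot files above: 1 = time,
    2 = throughput, 4 = block size. Short or comment lines are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        rows.append((float(fields[0]), float(fields[1]), int(float(fields[3]))))
    return rows


def summarize(rows):
    """Return peak throughput and the first block size reaching half of it."""
    if not rows:
        raise ValueError("no data rows found")
    _, peak_mbps, peak_bytes = max(rows, key=lambda row: row[1])
    # Rows are in file order, i.e. increasing block size.
    half_bytes = next(b for _, mbps, b in rows if mbps >= peak_mbps / 2)
    return {
        "peak_mbps": peak_mbps,
        "peak_block_bytes": peak_bytes,
        "half_peak_block_bytes": half_bytes,
    }


if __name__ == "__main__":
    # Usage: pass one or more Netpipe output files, e.g. NP.1500.33-1
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, summarize(read_rows(handle)))
```

The half-peak block size is a convenient single number for comparing how quickly each configuration ramps up. It is a choice made for this sketch, not something Netpipe itself reports.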
|Sidebar Four: Resources|
This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.