[Beowulf] How Would You Test Infiniband in New Cluster?
bill at cse.ucdavis.edu
Tue Nov 17 17:46:43 EST 2009
Jon Forrest wrote:
> Bill Broadley wrote:
>> My first suggested sanity test would be to measure latency and bandwidth to
>> verify you are getting IB numbers. So 80-100MB/sec and 30-60us for a small
>> message would imply GigE. 6-8 times the bandwidth certainly would imply SDR or
>> better. Latency varies quite a bit among implementations; I'd try to get
>> within 30-40% of advertised latency numbers.
> For those of us who aren't familiar with IB utilities,
> could you give some examples of the commands you'd use
> to do this?
Here are two that I use: relay (a simple ring benchmark) and mpi_nxnlatbw
(an all-pairs latency/bandwidth test).
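The relay.c source isn't reproduced here. As a rough sketch of the same idea
(not the original code; it assumes 4-byte int elements and a fixed hop count
of 16384), a minimal ring relay looks something like:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal ring-relay sketch: pass a message of <size> ints around a
   ring of ranks and report the time per hop.  An approximation of the
   idea, not the original relay.c. */
int main(int argc, char **argv)
{
    int rank, nprocs, size = 1;
    const int hops = 16384;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (argc > 1)
        size = atoi(argv[1]);

    int *buf = calloc((size_t)size, sizeof(int));
    int next = (rank + 1) % nprocs;
    int prev = (rank + nprocs - 1) % nprocs;
    int iters = hops / nprocs;       /* each iteration = nprocs hops */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {             /* rank 0 starts the token */
            MPI_Send(buf, size, MPI_INT, next, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_INT, prev, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {                     /* everyone else forwards it */
            MPI_Recv(buf, size, MPI_INT, prev, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_INT, next, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - t0;

    if (rank == 0) {
        double per_hop = elapsed / (iters * nprocs);
        printf("size=%7d, %d hops, %d nodes in %.2f sec (%7.2f us/hop) "
               "%7.0f KB/sec\n", size, iters * nprocs, nprocs, elapsed,
               per_hop * 1e6, size * sizeof(int) / per_hop / 1024.0);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}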
So to compile, assuming a sane environment:
mpicc -O3 relay.c -o relay
The command to run an MPI program varies by MPI implementation and batch
queue environment (especially with tight integration). It should be
something close to:
mpirun -np <number of nodes> -machinefile <list of nodes> ./relay 1
mpirun -np <number of nodes> -machinefile <list of nodes> ./relay 1024
mpirun -np <number of nodes> -machinefile <list of nodes> ./relay 8192
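The machinefile is just a list of hostnames, one per line, something like
(hostnames made up):

node01
node02

With that, -np 2 puts one rank on each of the two listed nodes.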
You should see something like:
size= 1, 16384 hops, 2 nodes in 0.75 sec ( 45.97 us/hop) 85 KB/sec
size= 1024, 16384 hops, 2 nodes in 2.00 sec (121.94 us/hop) 32803 KB/sec
size= 8192, 16384 hops, 2 nodes in 6.21 sec (379.05 us/hop) 84421 KB/sec
So basically on a tiny packet about 45us of latency (normal for GigE), and on a
large packet 84MB/sec or so (also normal for GigE).
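(To check the arithmetic: the KB/sec column looks like message size in 4-byte
elements divided by the per-hop time, e.g. 8192 * 4 bytes / 379.05us = 84421
KB/sec. That's inferred from the output above; the units aren't labeled.)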
I'd start with 2 nodes, then if you are happy try it with all nodes.
Now for infiniband you should see something like:
size= 1, 16384 hops, 2 nodes in 0.03 sec ( 1.72 us/hop) 2274 KB/sec
size= 1024, 16384 hops, 2 nodes in 0.16 sec ( 9.92 us/hop) 403324 KB/sec
size= 8192, 16384 hops, 2 nodes in 0.50 sec ( 30.34 us/hop) 1054606 KB/sec
Note the latency is some 25 times lower and the bandwidth some 10+ times
higher. Note the hostnames are different; don't run multiple copies on the
same node unless you intend to. Running 4 copies on a 4 cpu node doesn't test
the network at all (ranks on the same node talk through shared memory).
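If you want to verify where your ranks actually landed, a trivial MPI program
(a sketch, using only standard MPI calls) will print the rank-to-host mapping:

#include <mpi.h>
#include <stdio.h>

/* Print which host each MPI rank is running on. */
int main(int argc, char **argv)
{
    int rank, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &len);
    printf("rank %d runs on %s\n", rank, name);
    MPI_Finalize();
    return 0;
}

If two ranks report the same hostname you're benchmarking shared memory, not
the fabric.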
So once you get what you expect I'd suggest something a bit more
comprehensive. Something like:
mpirun -np <number of nodes> -machinefile <list of nodes> ./mpi_nxnlatbw
I'd expect some differences in latency and bandwidth between nodes, but not any
big ones. Something like:
[0<->1] 1.85us 1398.825264 (MillionBytes/sec)
[0<->2] 1.75us 1300.812337 (MillionBytes/sec)
[0<->3] 1.76us 1396.205242 (MillionBytes/sec)
[0<->4] 1.68us 1398.647324 (MillionBytes/sec)
[1<->0] 1.82us 1375.550155 (MillionBytes/sec)
[1<->2] 1.69us 1397.936020 (MillionBytes/sec)
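If you don't have mpi_nxnlatbw handy, a rough stand-in (my sketch of the idea,
not the actual tool: ping-pong between every pair of ranks, small messages for
latency, 1MB messages for bandwidth) looks like:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define LAT_REPS 1000        /* ping-pongs per latency measurement */
#define BW_REPS  10          /* large-message sends per bandwidth test */
#define LAT_BYTES 4
#define BW_BYTES (1 << 20)   /* 1 MB */

int main(int argc, char **argv)
{
    int rank, np;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    char *buf = calloc(BW_BYTES, 1);

    for (int i = 0; i < np; i++) {
        for (int j = 0; j < np; j++) {
            if (i == j)
                continue;
            MPI_Barrier(MPI_COMM_WORLD);   /* only pair i,j measures */
            if (rank == i) {
                /* latency: time round trips of a tiny message, halve it */
                double t0 = MPI_Wtime();
                for (int r = 0; r < LAT_REPS; r++) {
                    MPI_Send(buf, LAT_BYTES, MPI_BYTE, j, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, LAT_BYTES, MPI_BYTE, j, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                }
                double lat = (MPI_Wtime() - t0) / (2.0 * LAT_REPS) * 1e6;
                /* bandwidth: stream 1MB messages, tiny ack per message */
                t0 = MPI_Wtime();
                for (int r = 0; r < BW_REPS; r++) {
                    MPI_Send(buf, BW_BYTES, MPI_BYTE, j, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, 1, MPI_BYTE, j, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                }
                double bw = (double)BW_REPS * BW_BYTES /
                            (MPI_Wtime() - t0) / 1e6;
                printf("[%d<->%d] %.2fus %f (MillionBytes/sec)\n",
                       i, j, lat, bw);
            } else if (rank == j) {
                for (int r = 0; r < LAT_REPS; r++) {
                    MPI_Recv(buf, LAT_BYTES, MPI_BYTE, i, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, LAT_BYTES, MPI_BYTE, i, 0, MPI_COMM_WORLD);
                }
                for (int r = 0; r < BW_REPS; r++) {
                    MPI_Recv(buf, BW_BYTES, MPI_BYTE, i, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, 1, MPI_BYTE, i, 0, MPI_COMM_WORLD);
                }
            }
        }
    }
    free(buf);
    MPI_Finalize();
    return 0;
}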
Once those numbers are consistent and where you expect them (both latency and
bandwidth) I'd follow up with a production code that produces a known answer
and is likely to provide much wider MPI coverage.