interconnect latency, dissected.
johnt at quadrics.com
Wed Jul 2 06:01:43 EDT 2003
I agree with Joachim et al. on the merit of the paper - it raises some
important issues relating to the overall efficacy of MPI in certain
usage scenarios.
In relation to InfiniBand, there has been some work at Ohio State
comparing it with Myrinet and QsNet. That work, however, only discusses
MPI, whereas the UPC group at Berkeley discusses lower-level APIs that
better suit some algorithms, as well as being the target of specific
compiler environments.
On the Berkeley paper specifically, my only concern is that there is no
mention of the influence of the PCI bridge implementation, notwithstanding
its specification. For instance, the system at ORNL is based on the
AlphaServer ES40; a similar system here gives an 8-byte latency as follows:
prun -N2 mping 0 8
1 pinged 0: 0 bytes 7.76 uSec 0.00 MB/s
1 pinged 0: 1 bytes 8.11 uSec 0.12 MB/s
1 pinged 0: 2 bytes 8.06 uSec 0.25 MB/s
1 pinged 0: 4 bytes 8.35 uSec 0.48 MB/s
1 pinged 0: 8 bytes 8.20 uSec 0.98 MB/s
1 pinged 0: 524288 bytes 2469.61 uSec 212.30 MB/s
1 pinged 0: 1048576 bytes 4955.28 uSec 211.61 MB/s
This is similar to the latency and bandwidth achieved in the authors'
benchmark, whereas the same code on the same Quadrics hardware running
on a Xeon (GC-LE chipset) platform gives:
prun -N2 mping 0 8
1 pinged 0: 0 bytes 4.31 uSec 0.00 MB/s
1 pinged 0: 1 bytes 4.40 uSec 0.23 MB/s
1 pinged 0: 2 bytes 4.40 uSec 0.45 MB/s
1 pinged 0: 4 bytes 4.39 uSec 0.91 MB/s
1 pinged 0: 8 bytes 4.38 uSec 1.83 MB/s
1 pinged 0: 524288 bytes 1632.61 uSec 321.13 MB/s
1 pinged 0: 1048576 bytes 3252.28 uSec 322.41 MB/s
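The mping figures above come from a ping-pong exchange; the same quantity
can be approximated with a plain MPI ping-pong. A minimal sketch follows -
hypothetical code, not the Quadrics mping source; the iteration count and
message size are arbitrary, and it assumes exactly two ranks:

/* Minimal MPI ping-pong latency sketch (run with two ranks).
 * Reports half the average round-trip time, the usual convention
 * for one-way latency. Hypothetical; not the Quadrics mping source. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, i, iters = 1000, size = 8;
    char buf[8];
    double t0, t1;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 0, sizeof(buf));

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();
    if (rank == 0)
        printf("%d bytes: %.2f uSec one-way\n",
               size, (t1 - t0) / iters / 2.0 * 1e6);
    MPI_Finalize();
    return 0;
}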
It may also be the case that the Myrinet performance could be improved
(the paper states PCI 32/66) by benchmarking on a more recent PCI bridge.
These newer performance measurements may lead to different conclusions
w.r.t. latency, although there is still the issue of two-sided (MPI)
semantics versus one-sided operations.
For completeness, here is the shmem_put performance on a newer bridge:
prun -N2 sping -f put -b 1000 0 8
1: 4 bytes 1.60 uSec 2.50 MB/s
1: 8 bytes 1.60 uSec 5.00 MB/s
1: 16 bytes 1.58 uSec 10.11 MB/s
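The sping numbers use the one-sided shmem_put rather than a two-sided
exchange. A minimal sketch of issuing back-to-back puts through a
T3E-style SHMEM API follows - hypothetical code, not the Quadrics sping
source; the header path and initialization call vary by implementation,
and the figure it prints is the average cost of a pipelined put rather
than a strict one-way latency:

/* One-sided 8-byte put sketch using a T3E-style SHMEM API.
 * Hypothetical; not the Quadrics sping source. Run with two PEs. */
#include <mpp/shmem.h>   /* header path varies by implementation */
#include <stdio.h>
#include <sys/time.h>

static long dst;         /* static => symmetric address on every PE */

int main(void)
{
    int me, i, iters = 1000;
    long src = 42;
    struct timeval t0, t1;
    double usec;

    start_pes(0);                    /* classic SHMEM initialization */
    me = shmem_my_pe();

    shmem_barrier_all();
    gettimeofday(&t0, NULL);
    if (me == 1) {
        for (i = 0; i < iters; i++)
            shmem_long_put(&dst, &src, 1, 0);  /* 8-byte put to PE 0 */
        shmem_quiet();               /* wait for remote completion */
    }
    gettimeofday(&t1, NULL);
    shmem_barrier_all();

    if (me == 1) {
        usec = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        printf("8 bytes: %.2f uSec per put (pipelined average)\n",
               usec / iters);
    }
    return 0;
}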
> -----Original Message-----
> From: Joachim Worringen [mailto:joachim at ccrl-nece.de]
> Sent: 01 July 2003 09:03
> To: Beowulf mailinglist
> Subject: Re: interconnect latency, dissected.
> James Cownie:
> > Mark Hahn wrote:
> > > does anyone have references handy for recent work on interconnect
> > > latency?
> > Try http://www.cs.berkeley.edu/~bonachea/upc/netperf.pdf
> > It doesn't have InfiniBand, but does have Quadrics, Myrinet 2000,
> > GigE and IBM.
> Nice paper showing interesting properties. But some metrics seem a
> little bit dubious to me: in 5.2, they seem to see an advantage if the
> "overlap potential" is higher (when they compare Quadrics and Myrinet)
> - which usually just results in higher MPI latencies, as this potential
> (on small messages) cannot be exploited [see the sketch after the
> quoted message]. Even with overlapping multiple communication
> operations, the faster interconnect remains faster. This is especially
> true for small-message latency.
> Among the contemporary (cluster) interconnects, SCI is missing next to
> InfiniBand. It would have been interesting to see the results for SCI,
> as it has a very different communication model than most of the other
> interconnects (most resembling the T3E one).
> Joachim Worringen - NEC C&C research lab St.Augustin
> fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de
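On the overlap point in the quoted message: overlapping multiple
operations with nonblocking MPI looks roughly like the sketch below
(standard MPI calls; the message count and sizes are arbitrary). At
8-byte sizes the per-message software overhead dominates, which is why
the "overlap potential" is hard to exploit there.

/* Sketch: overlapping several small nonblocking sends with standard
 * MPI. With 8-byte messages the per-message overhead dominates, so
 * issuing them concurrently saves little. Run with two ranks. */
#include <mpi.h>
#include <string.h>

#define NMSG 4

int main(int argc, char **argv)
{
    int rank, i;
    char buf[NMSG][8];
    MPI_Request req[NMSG];
    MPI_Status st[NMSG];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 0, sizeof(buf));

    if (rank == 0) {
        for (i = 0; i < NMSG; i++)   /* issue all sends, then wait once */
            MPI_Isend(buf[i], 8, MPI_BYTE, 1, i, MPI_COMM_WORLD, &req[i]);
        MPI_Waitall(NMSG, req, st);
    } else if (rank == 1) {
        for (i = 0; i < NMSG; i++)
            MPI_Irecv(buf[i], 8, MPI_BYTE, 0, i, MPI_COMM_WORLD, &req[i]);
        MPI_Waitall(NMSG, req, st);
    }
    MPI_Finalize();
    return 0;
}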
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf