From the "I'll take eight" department
The Linux cluster world is moving towards InfiniBand for many reasons:
bandwidth, latency, message rate, N/2, price/performance, and other factors
that affect performance and price. But the focus is usually on larger systems,
often more than 64 nodes and up to several thousand. The
reasons for moving to InfiniBand are still valid for smaller clusters,
particularly performance, but the economics are not. Basically InfiniBand has been
just too expensive for smaller systems and usually has not made sense from a
price/performance perspective. But that has just changed...
The Rise of InfiniBand
InfiniBand has made a remarkable rise in performance since inception. Just a
few years ago, Single Data Rate (SDR) InfiniBand was the standard. SDR has a
10Gbit/s signaling rate and about an 8Gbit/s data rate
(recall that GigE is 1Gbit/s for signaling and data). Coupled with this high
bandwidth were much lower latency and CPU overhead.
That performance attracted cluster
people to InfiniBand like moths to a flame.
The very first InfiniBand products were pricey. Shortly thereafter, the price started to drop to the point where
you could get SDR InfiniBand for less than $1,500 a node (including the HCA or IB card,
cable, and switch port costs). Sometimes you could get it for less than $1,000
a node. In short order it became a popular interconnect for clusters.
Not long after SDR was out, Double-Data Rate (DDR) InfiniBand came out. DDR
InfiniBand has a 20Gbit/s signaling rate and about a 16Gbit/s data rate.
Basically you had twice the bandwidth of SDR. In conjunction with the bandwidth
increase was a drop in latency. Initially DDR was priced just a bit above SDR,
but quickly DDR was priced the same as SDR. So now you could get twice the
bandwidth and lower latency compared to SDR for less than $1,200 a node.
Consequently, SDR all but disappeared.
Recently Mellanox announced that Quad-Data Rate (QDR) InfiniBand silicon
for HCAs is available and that silicon for QDR switches will be available
soon. QDR InfiniBand has a signaling rate of 40Gbit/s and a data rate of about
32Gbit/s. You should start to see QDR HCAs and switches for purchase in late
Q3 or Q4 of this year.
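The gap between the signaling and data rates quoted above is what you would expect from 8b/10b line encoding, where every 8 bits of data travel as 10 bits on the wire. As a rough sketch (the function and dictionary names are only illustrative, and the rates are the 4X link figures from the text), the arithmetic looks like this:

```python
# Sketch: usable data rate of a 4X InfiniBand link under 8b/10b encoding.
# Names are illustrative; signaling rates are the ones quoted in the article.

SIGNALING_GBITS = {"SDR": 10, "DDR": 20, "QDR": 40}


def data_rate_gbits(signaling_gbits):
    """8b/10b encoding carries 8 data bits in every 10 signal bits."""
    return signaling_gbits * 8 / 10


def data_rate_mbytes(signaling_gbits):
    """Same rate in MB/s, taking 1 MB as 10**6 bytes (an assumption)."""
    return data_rate_gbits(signaling_gbits) * 1000 / 8


if __name__ == "__main__":
    for name, rate in SIGNALING_GBITS.items():
        print(f"{name}: {data_rate_gbits(rate):.0f} Gbit/s data, "
              f"about {data_rate_mbytes(rate):.0f} MB/s")
```

Treat the MB/s figures as ceilings for the link itself; bandwidth measured by an application will come in below them.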
Overall, InfiniBand provides performance benefits to many applications,
including those that use MPI and those found in traditional data centers,
such as Oracle, VMware, and financial applications. The ever-growing demand for compute capability in those
applications drives the growth of InfiniBand.
A Quick Network Comparison
As you are probably aware, the network can
have a big impact on code performance, particularly if you are running parallel
codes that use MPI (or God help you - PVM). Table One below lists some common publicly reported
interconnect characteristics for GigE, low-latency GigE, 10GigE, SDR InfiniBand
(two flavors), and DDR InfiniBand (two flavors).
Table One - Common Network Characteristics
| Network | Latency (microseconds) | Bandwidth (MBps) | N/2 (bytes) |
|---|---|---|---|
| GigE | ~29-120 | ~125 | ~8,000 |
| Low Latency GigE: GAMMA | ~9.5 (MPI) | ~125 | ~7,600 |
| 10 GigE: Chelsio (Copper) | 9.6 | ~862 | ~100,000+ |
| InfiniBand: Mellanox InfiniHost SDR (PCI-X) | 4.1 | 760 | 512 |
| InfiniBand: Mellanox InfiniHost III EX SDR | 2.6 | 938 | 480 |
| InfiniBand: Mellanox InfiniHost III EX DDR | 2.25 | 1502 | 480 |
| InfiniBand: Mellanox ConnectX DDR PCIe Gen2 | 1 | 1880 | 256 |
I don't want to cover the details of these characteristics in this article
(here's an
article that might help despite its age). You can see from the table that even SDR
InfiniBand has much lower latency and a much smaller N/2 than GigE, low-latency
GigE, or even 10GigE, with bandwidth in the same range as 10GigE.
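To get a feel for what the numbers in Table One mean for a single message, here is a minimal sketch of the usual first-order model, time = latency + size / bandwidth. The function names are made up for this illustration, the GigE entry uses the low end of its latency range, and real networks are messier than this; in particular, the measured N/2 values in the table come from benchmarks and will not fall out of such a simple model.

```python
# Sketch: first-order transfer-time model, time = latency + size / bandwidth.
# Figures are taken from Table One; for GigE the low end of the latency range
# is used, which is an assumption made for this illustration.

NETWORKS = {
    "GigE": (29.0, 125.0),
    "InfiniHost III EX SDR": (2.6, 938.0),
    "ConnectX DDR": (1.0, 1880.0),
}  # name -> (latency in microseconds, bandwidth in MB/s)


def transfer_time_us(size_bytes, latency_us, bandwidth_mb_per_s):
    """Estimated one-way time in microseconds for a message of size_bytes.

    With 1 MB taken as 10**6 bytes, bytes divided by MB/s gives microseconds.
    """
    return latency_us + size_bytes / bandwidth_mb_per_s


def effective_bandwidth_mb_per_s(size_bytes, latency_us, bandwidth_mb_per_s):
    """Bandwidth actually seen by a message of size_bytes, in MB/s."""
    return size_bytes / transfer_time_us(size_bytes, latency_us, bandwidth_mb_per_s)


if __name__ == "__main__":
    for size in (8, 1024, 1_000_000):
        for name, (latency, bandwidth) in NETWORKS.items():
            t = transfer_time_us(size, latency, bandwidth)
            print(f"{size:>9} bytes on {name}: {t:10.2f} us")
```

For small messages the latency term dominates, which is why a low-latency interconnect helps MPI codes that exchange many short messages even when they never come close to filling the pipe.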
The Rise of SDR InfiniBand
IB is expensive for smaller clusters because the HCAs are fairly expensive
and, most of the time, the smallest switch you could buy had 24 ports. So if you
only had, let's say, 4 to 8 nodes, then the per-node
cost for the switch was just too high (a factor of 3 to 6 compared to 24 nodes).
But on the application performance side, smaller clusters could use InfiniBand,
particularly as the number of cores per node increases. The smaller clusters
don't necessarily need the huge bandwidth that DDR InfiniBand offers and many
times don't need the extremely low latency of DDR InfiniBand. The bandwidth and
latency of SDR InfiniBand will greatly help the applications. But InfiniBand
has always been considered too expensive. Until now.
Mellanox and Colfax International have teamed up
to bring back SDR but at a price point that makes it extremely attractive for
small clusters. At this point you're saying "Shut up and tell me the
prices!" As I tell my children, "Just relax," but I usually end
up with something thrown in my general direction. Since I don't want anyone to
throw things at me, let's go over the prices. BTW - the website with all
of the prices is here.
Note: The HCA listed in Table Two does not seem to have recent public benchmark data available. Therefore, actual performance may differ from that shown in Table One.
Table Two - SDR InfiniBand Pricing from Colfax
| Product | Price ($) without shipping | Colfax Product Description/Part Number |
|---|---|---|
| SDR HCA NIC PCI-Express x4 | $125 | MHES14-XTC InfiniHost III Lx, Single Port 4X InfiniBand / PCI-Express x4, Low Profile HCA Card, Memory Free, RoHS (R5) Compliant, (Tiger) |
| 8-port 4X SDR switch | $750 | Flextronics ODM model F-X430066, 8 Port 4X SDR InfiniBand switch |
| 24-port 4X 1U SDR InfiniBand switch (Unmanaged) | $2,400 | Flextronics ODM, 4X SDR InfiniBand switch model F-X430060, 24-port 4X SDR w/ Media Adapter Support, one power supply |
| 0.5 meter SDR cable | $35 | MCC4L30-00A 4x microGiGaCN latch, 30 AWG, 0.5 meter |
| 1 meter SDR cable | $39 | MCC4L30-001 4x microGiGaCN latch, 30 AWG, 1 meter |
| 2 meter SDR cable | $46 | MCC4L30-002 4x microGiGaCN latch, 30 AWG, 2 meters |
| 3 meter SDR cable | $52 | MCC4L30-003 4x microGiGaCN latch, 30 AWG, 3 meters |
| 4 meter SDR cable | $58 | MCC4L28-004 4x microGiGaCN latch, 28 AWG, 4 meters |
| 5 meter SDR cable | $65 | MCC4L28-005 4x microGiGaCN latch, 28 AWG, 5 meters |
| 6 meter SDR cable | $86 | MCC4L24-006 4x microGiGaCN latch, 24 AWG, 6 meters |
| 7 meter SDR cable | $93 | MCC4L24-007 4x microGiGaCN latch, 24 AWG, 7 meters |
| 8 meter SDR cable | $99 | MCC4L24-008 4x microGiGaCN latch, 24 AWG, 8 meters |
So let's do a little math. Table Three below has the InfiniBand prices for 8 nodes.
Table Three - 8 nodes with SDR InfiniBand
| Item | Price ($) without shipping |
|---|---|
| HCAs (8 of them) | $1,000 |
| 8-port SDR switch | $750 |
| 1 meter CX-4 cables (8 of them) | $280 |
| Total | $2,030 |
| Price Per Node | $253.75 |
So if you buy SDR InfiniBand for 8 nodes you will pay less than $255 a node
(without shipping, of course)!
Let's do the same thing for a 24-node SDR cluster.
Table Four - 24 nodes with SDR InfiniBand
| Item | Price ($) without shipping |
|---|---|
| HCAs (24 of them) | $3,000 |
| 24-port SDR switch | $2,400 |
| 1 meter CX-4 cables (24 of them) | $840 |
| Total | $6,240 |
| Price Per Node | $260.00 |
The per-node price is slightly higher than for 8 nodes because of the switch cost
($100 a port versus about $94 a port). I don't know about you, but I think this is
a fantastic price, and it is moving down in the
general direction of GigE! (Well, not quite, but it's getting there!)
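If you want to redo the math for your own node count or cable length, the totals in Tables Three and Four can be reproduced with a few lines of Python. This is only a sketch: the function name and defaults are made up for the illustration, and it assumes one switch with enough ports for every node. Note that the cable lines in both tables work out to $35 per cable, which matches the 0.5 meter entry in Table Two; pass `cable_price=39` if you want the 1 meter price listed there.

```python
# Sketch: per-node interconnect cost from HCA, switch, and cable prices.
# Prices are the ones used in Tables Three and Four; names are illustrative.

HCA_PRICE = 125
CABLE_PRICE = 35  # per-cable price implied by the $280 and $840 cable lines


def interconnect_cost(nodes, switch_price, hca_price=HCA_PRICE,
                      cable_price=CABLE_PRICE):
    """Return (total, per_node) cost in dollars, without shipping."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    total = nodes * hca_price + switch_price + nodes * cable_price
    return total, total / nodes


if __name__ == "__main__":
    for nodes, switch in ((8, 750), (24, 2400)):
        total, per_node = interconnect_cost(nodes, switch)
        print(f"{nodes} nodes: total ${total:,}, ${per_node:.2f} per node")
```

With the 1 meter cable price the 8-node case comes to $2,062, or $257.75 a node, so the conclusion does not change much either way.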
How do I Get Me Some of That?
Ordering SDR InfiniBand at these prices is easy. Colfax International has
set up a webpage that allows you to order online! Just
go to the page and place your order. If you need large quantities or special
arrangements, please send an email to sales (you know what to put here) colfaxdirect.com.
Please Note: Neither ClusterMonkey nor any of its authors has any
financial interest in Colfax International. We just like cheap hardware.
To Infinity and Beyond!
I hate to end on a Buzz Lightyear quote, but it seems somewhat appropriate.
For smaller clusters you usually had to rely on GigE as the
interconnect. Now you can afford to add SDR InfiniBand to these systems
without it being too expensive. So this means we now get a big boost in
performance on these smaller systems (including the one in my basement! Woo!
Hoo!).
Now we can truly begin to think outside the box, or more like outside the server room.
We can start thinking about adding a parallel file system to these smaller
clusters or even think about exporting NFS over native IB protocols from the master
node. Also don't forget that you can run TCP over IB. (See the OpenFabrics Alliance for the complete software stack.) Even with SDR InfiniBand you will get much faster TCP
performance over IB than over GigE. So you can start thinking about applications or
places where GigE limits performance (anyone want to play multi-player games
using IPoIB?).
Jeff Layton is having way too much fun writing this article, proving that it's
hard to keep a good geek down. When he's not creating havoc in his household,
he can be found hanging out at the Fry's coffee shop (never during working hours)
and admiring the shiny new CPUs that come in, and cringing when someone buys
Microsoft Vista.