InfiniBand for the Masses
Written by Jeff Layton   
Sunday, 27 January 2008

From the "I'll take eight" department

The Linux cluster world is moving towards InfiniBand for many reasons: bandwidth, latency, message rate, N/2, price/performance, and other factors that affect performance and price. But the focus is usually on larger systems, often more than 64 nodes and up to several thousand. The performance reasons for moving to InfiniBand are just as valid for smaller clusters, but the economics are not. Basically, InfiniBand has been just too expensive for smaller systems and usually has not made sense from a price/performance perspective. But that has just changed...

The Rise of InfiniBand

InfiniBand has made a remarkable rise in performance since its inception. Just a few years ago, Single Data Rate (SDR) InfiniBand was the standard. SDR has a 10Gbit/s signaling rate and about an 8Gbit/s data rate (recall that GigE is 1Gbit/s for signaling and data). Coupled with this high bandwidth were much lower latency and CPU overhead. That performance drew cluster people to InfiniBand like moths to a flame.

The very first InfiniBand products were pricey. Shortly thereafter, the price started to drop to the point where you could get SDR InfiniBand for less than $1,500 a node (including the HCA or IB card, cable, and switch port costs). Sometimes you could get it for less than $1,000 a node. In short order it became a popular interconnect for clusters.

Not long after SDR was out, Double Data Rate (DDR) InfiniBand came out. DDR InfiniBand has a 20Gbit/s signaling rate and about a 16Gbit/s data rate. Basically you had twice the bandwidth of SDR. Along with the bandwidth increase came a drop in latency. Initially DDR was priced just a bit above SDR, but DDR was quickly priced the same as SDR. So now you could get twice the bandwidth and lower latency compared to SDR for less than $1,200 a node. Consequently, SDR all but disappeared.

Recently Mellanox announced that Quad Data Rate (QDR) InfiniBand silicon for HCAs is available and that silicon for QDR switches will be available soon. QDR InfiniBand has a signaling rate of 40Gbit/s and a data rate of about 32Gbit/s. You should start to see QDR HCAs and switches for purchase in late Q3 or Q4 of this year.
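If you are wondering where the gap between the signaling rate and the data rate comes from: SDR, DDR, and QDR links use 8b/10b encoding, so every 10 bits on the wire carry 8 bits of data. The sketch below is a back-of-the-envelope check, not anything from Mellanox. It assumes the common 4X link width (four lanes) and per-lane signaling rates of 2.5, 5, and 10 Gbit/s, none of which are spelled out above. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope check of the signaling vs. data rates quoted above.
# Assumptions (not stated in the article): 4X links (four lanes) and
# 8b/10b encoding, where 10 signal bits carry 8 data bits.

LANES = 4                       # assumed 4X link width
ENCODING_EFFICIENCY = 8 / 10    # 8b/10b encoding

# Assumed per-lane signaling rates in Gbit/s for each generation.
LANE_SIGNAL_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}


def link_rates(generation):
    """Return (signaling rate, data rate) in Gbit/s for a 4X link."""
    signaling = LANES * LANE_SIGNAL_GBPS[generation]
    data = signaling * ENCODING_EFFICIENCY
    return signaling, data


if __name__ == "__main__":
    for gen in ("SDR", "DDR", "QDR"):
        signaling, data = link_rates(gen)
        print(f"{gen}: {signaling:g} Gbit/s signaling, {data:g} Gbit/s data")
```

Under those assumptions the numbers line up with the 10/8, 20/16, and 40/32 Gbit/s figures in the text.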

Overall, InfiniBand provides performance benefits to many applications, including those that use MPI and those used in traditional data centers, such as Oracle, VMware, and financial applications. The ever-growing demand for compute capability from those applications drives the growth of InfiniBand.

A Quick Network Comparison

As you are probably aware, the network can have a big impact on code performance, particularly if you are running parallel codes that use MPI (or God help you - PVM). Table One below lists some common publicly reported interconnect characteristics for GigE, low-latency GigE, 10GigE, SDR InfiniBand (two flavors), and DDR InfiniBand (two flavors).

Table One - Common Network Characteristics

Network                                     | Latency (microseconds) | Bandwidth (MBps) | N/2 (bytes)
GigE                                        | ~29-120                | ~125             | ~8,000
Low Latency GigE: GAMMA                     | ~9.5 (MPI)             | ~125             | ~7,600
10 GigE: Chelsio (Copper)                   | 9.6                    | ~862             | ~100,000+
InfiniBand: Mellanox SDR InfiniHost (PCI-X) | 4.1                    | 760              | 512
InfiniBand: Mellanox InfiniHost III EX SDR  | 2.6                    | 938              | 480
InfiniBand: Mellanox InfiniHost III EX DDR  | 2.25                   | 1502             | 480
InfiniBand: Mellanox ConnectX DDR PCIe Gen2 | 1                      | 1880             | 256


I don't want to cover the details of these characteristics in this article (here's an article that might help despite its age). You can see from the table that SDR InfiniBand still has much lower latency and a far smaller N/2 than GigE, low-latency GigE, or even 10GigE, with bandwidth that is comparable to or better than 10GigE.
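To get a feel for what the latency and bandwidth columns mean for a single message, here is a rough sketch using the simple "latency plus size over bandwidth" model. This is a common first-order approximation, not how the numbers in Table One were measured, and it ignores things like protocol overhead and contention. The latency and bandwidth values are copied from Table One (using the best-case 29 microseconds for GigE); the function name and the choice of message sizes are just for illustration, and MBps is taken to mean 10^6 bytes per second.

```python
# First-order message time model: time = latency + size / bandwidth.
# Values are taken from Table One; the model itself is an assumption.

# name: (latency in microseconds, bandwidth in MB/s, with MB = 10**6 bytes)
NETWORKS = {
    "GigE (best-case latency)": (29.0, 125.0),
    "10 GigE: Chelsio (Copper)": (9.6, 862.0),
    "InfiniBand: Mellanox InfiniHost III EX SDR": (2.6, 938.0),
    "InfiniBand: Mellanox ConnectX DDR PCIe Gen2": (1.0, 1880.0),
}


def transfer_time_us(size_bytes, latency_us, bandwidth_mb_per_s):
    """Estimated one-way time in microseconds for one message."""
    # 1 MB/s is exactly 1 byte per microsecond when MB = 10**6 bytes.
    return latency_us + size_bytes / bandwidth_mb_per_s


if __name__ == "__main__":
    for name, (latency, bandwidth) in NETWORKS.items():
        for size in (64, 1024, 1024 * 1024):
            t = transfer_time_us(size, latency, bandwidth)
            print(f"{name}: {size} bytes in about {t:.1f} microseconds")
```

For small messages the latency term dominates, which is why the InfiniBand rows look so much better than GigE for the short messages typical of many MPI codes.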

The Rise of SDR InfiniBand

InfiniBand has been expensive for smaller clusters because the HCAs are fairly expensive and, most of the time, the smallest switch you could buy had 24 ports. So if you only had, let's say, 4 to 8 nodes, then the per-node cost for the switch was just too high (a factor of 3 to 6 compared to a full 24 nodes). But on the application performance side, smaller clusters could use InfiniBand, particularly as the number of cores per node increases. Smaller clusters don't necessarily need the huge bandwidth that DDR InfiniBand offers, and many times don't need its extremely low latency. The bandwidth and latency of SDR InfiniBand will greatly help the applications. But InfiniBand has always been considered too expensive. Until now.

Mellanox and Colfax International have teamed up to bring back SDR, but at a price point that makes it extremely attractive for small clusters. At this point you're saying "Shut up and tell me the prices!" As I tell my children, "Just relax," but I usually end up with something thrown in my general direction. Since I don't want anyone to throw things at me, let's go over the prices. BTW - the website with all of the prices is here.

Note: The HCA listed in Table Two does not seem to have recent public benchmark data available. Therefore, actual performance may differ from that shown in Table One.

Table Two - SDR InfiniBand Pricing from Colfax

Product | Price ($) without shipping | Colfax Product Description/Part Number
SDR HCA NIC PCI-Express x4 | $125 | MHES14-XTC InfiniHost III Lx, Single Port 4X InfiniBand / PCI-Express x4, Low Profile HCA Card, Memory Free, RoHS (R5) Compliant, (Tiger)
8-port 4X SDR switch | $750 | Flextronics ODM model F-X430066, 8 Port 4X SDR InfiniBand switch
24-port 4X 1U SDR InfiniBand switch (Unmanaged) | $2,400 | Flextronics ODM, 4X SDR InfiniBand switch model F-X430060, 24-port 4X SDR w/ Media Adapter Support, one power supply
0.5 meter SDR cable | $35 | MCC4L30-00A 4x microGiGaCN latch, 30 AWG, 0.5 meter
1 meter SDR cable | $39 | MCC4L30-001 4x microGiGaCN latch, 30 AWG, 1 meter
2 meter SDR cable | $46 | MCC4L30-002 4x microGiGaCN latch, 30 AWG, 2 meters
3 meter SDR cable | $52 | MCC4L30-003 4x microGiGaCN latch, 30 AWG, 3 meters
4 meter SDR cable | $58 | MCC4L28-004 4x microGiGaCN latch, 28 AWG, 4 meters
5 meter SDR cable | $65 | MCC4L28-005 4x microGiGaCN latch, 28 AWG, 5 meters
6 meter SDR cable | $86 | MCC4L24-006 4x microGiGaCN latch, 24 AWG, 6 meters
7 meter SDR cable | $93 | MCC4L24-007 4x microGiGaCN latch, 24 AWG, 7 meters
8 meter SDR cable | $99 | MCC4L24-008 4x microGiGaCN latch, 24 AWG, 8 meters


So let's do a little math. Table Three below has the InfiniBand prices for 8 nodes.

Table Three - 8 nodes with SDR InfiniBand

Item                            | Price ($) without shipping
HCAs (8 of them)                | $1,000
8-port SDR switch               | $750
1 meter CX-4 cables (8 of them) | $280
Total                           | $2,030
Price Per Node                  | $253.75


So if you buy SDR InfiniBand for 8 nodes you will pay less than $255 a node! (without shipping of course).

Let's do the same thing for a 24 node SDR cluster.

Table Four - 24 nodes with SDR InfiniBand

Item                             | Price ($) without shipping
HCAs (24 of them)                | $3,000
24-port SDR switch               | $2,400
1 meter CX-4 cables (24 of them) | $840
Total                            | $6,240
Price Per Node                   | $260.00


The price per node is slightly higher than for 8 nodes because of the switch cost ($100 per port versus $93.75). I don't know about you, but I think this is a fantastic price, and it is moving down in the general direction of GigE! (Well, not quite, but it's getting there!)
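If you want to redo the math for your own node count or cable length, the arithmetic in Tables Three and Four can be sketched as below. The HCA and switch prices come from Table Two. Note that the cable lines in Tables Three and Four work out to $35 per cable, which is the price Table Two lists for the 0.5 meter cable; Table Two lists the 1 meter cable at $39, so the sketch lets you plug in either. The function name and parameters are made up for illustration, and shipping is not included.

```python
# Per-node cost of an SDR InfiniBand setup, using the prices in Table Two.
# The names here are illustrative only; shipping is not included.

HCA_PRICE = 125                      # SDR HCA, PCI-Express x4
SWITCH_PRICES = {8: 750, 24: 2400}   # switch ports -> price
CABLE_PRICE_IN_TABLES = 35           # per-cable price implied by Tables Three/Four
CABLE_PRICE_1M = 39                  # 1 meter cable price listed in Table Two


def cluster_cost(nodes, switch_ports, cable_price=CABLE_PRICE_IN_TABLES):
    """Return (total cost, cost per node) in dollars for one switch."""
    if nodes > switch_ports:
        raise ValueError("more nodes than switch ports")
    total = nodes * HCA_PRICE + SWITCH_PRICES[switch_ports] + nodes * cable_price
    return total, total / nodes


if __name__ == "__main__":
    for nodes, ports in ((8, 8), (24, 24)):
        for cable in (CABLE_PRICE_IN_TABLES, CABLE_PRICE_1M):
            total, per_node = cluster_cost(nodes, ports, cable)
            print(f"{nodes} nodes, ${cable} cables: "
                  f"total ${total:,}, ${per_node:.2f} per node")
```

With $35 cables this reproduces the $2,030 and $6,240 totals above; with the $39 cable price the per-node cost goes up by $4.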

How do I Get Me Some of That?

Ordering SDR InfiniBand at these prices is easy. Colfax International has set up a webpage that allows you to order on-line! Just go to the page and place your order. If you need large quantities or special arrangements please send an email to sales( you know what to put here) colfaxdirect.com.

Please Note: Neither ClusterMonkey nor any of its authors has any financial interest in Colfax International. We just like cheap hardware.

To Infinity and Beyond!

I hate to end with a Buzz Lightyear quote, but it seems somewhat appropriate. For smaller clusters you usually had to rely on GigE as the interconnect. Now you can afford to add SDR InfiniBand to these systems without it being too expensive. So this means we now get a big boost in performance on these smaller systems (including the one in my basement! Woo! Hoo!). Now we can truly begin to think outside the box, or more like outside the server room.

We can start thinking about adding a parallel file system to these smaller clusters, or even about exporting NFS over native IB protocols from the master node. Also, don't forget that you can run TCP/IP over IB (IPoIB). (See the OpenFabrics Alliance for the complete software stack.) Even with SDR InfiniBand you will get much faster TCP performance over IB than over GigE. So you can start thinking about applications or places where GigE limits performance (anyone want to play multi-player games using IPoIB?).
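The nice thing about IPoIB is that the InfiniBand port shows up as an ordinary IP interface, so plain sockets code runs on it unchanged. As a rough illustration, here is a minimal TCP throughput sketch. It is not a real benchmark: to keep it self-contained it runs the sender and receiver in one process over loopback, whereas on a cluster you would run the two halves on different nodes and use the IP address assigned to the IPoIB interface (often named ib0 on Linux, which is an assumption here). The function name and defaults are made up for illustration.

```python
# Minimal TCP throughput sketch using ordinary sockets.
# Runs over loopback by default; on a real cluster you would pass the
# address of the IPoIB interface instead (an assumption, not shown here).
import socket
import threading
import time


def measure_tcp_throughput(bind_addr="127.0.0.1", total_bytes=8 * 1024 * 1024):
    """Send total_bytes over one TCP connection; return (bytes received, MB/s)."""
    received = []

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((bind_addr, 0))        # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def receiver():
        conn, _ = server.accept()
        count = 0
        with conn:
            while True:
                data = conn.recv(65536)
                if not data:           # sender closed the connection
                    break
                count += len(data)
        received.append(count)

    thread = threading.Thread(target=receiver)
    thread.start()

    chunk = b"x" * 65536
    start = time.perf_counter()
    with socket.create_connection((bind_addr, port)) as sender:
        sent = 0
        while sent < total_bytes:
            n = min(len(chunk), total_bytes - sent)
            sender.sendall(chunk[:n])
            sent += n
    thread.join()                      # wait until everything has arrived
    elapsed = time.perf_counter() - start
    server.close()

    return received[0], received[0] / elapsed / 1e6


if __name__ == "__main__":
    got, rate = measure_tcp_throughput()
    print(f"received {got} bytes at roughly {rate:.0f} MB/s")
```

Over loopback the number only tells you about your memory and kernel; the point is that the same code, pointed at an IPoIB address, would exercise the InfiniBand link with no changes.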


Jeff Layton is having way too much fun writing this article, proving that it's hard to keep a good geek down. When he's not creating havoc in his household, he can be found hanging out at the Fry's coffee shop (never during working hours) and admiring the shiny new CPUs that come in, and cringing when someone buys Microsoft Vista.

Last Updated ( Wednesday, 10 December 2008 )
 
Creative Commons License
  ©2005-2008 Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.