
Nothing like real data from real machines running real problems, really

The best way to evaluate any technology is through benchmarks. Recently, I read a short white paper entitled CORE-Direct: The Most Advanced Technology for MPI/SHMEM Collectives Offloads. It is a long title, but the paper describes a good idea that Mellanox has added to its end-to-end InfiniBand (IB) hardware stack. (By the way, Mellanox also has end-to-end 10 GbE solutions.) After reading the paper, I wondered whether there were any benchmarks for the CORE-Direct feature. After a little searching, I found some PDF presentation slides that are worth reviewing.

Both of the following presentations use OpenFOAM, a professional open source CFD (Computational Fluid Dynamics) software package. The first presentation, OpenFOAM Performance With MPI Collectives Acceleration, shows a performance increase of 17% using CORE-Direct collective offloads. Analysis of OpenFOAM shows that the MPI collective operation AllReduce generates most of the communication overhead (~80%). These results were for a "small" cluster (16 nodes, 192 cores), and further improvements are to be expected as the cluster size increases.
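To make the AllReduce operation concrete, here is a minimal pure-Python sketch of what the collective computes: every rank contributes a value, and every rank ends up with the combined result. It uses a recursive-doubling exchange pattern, which is only one of several ways an MPI library might implement the operation, and it assumes a power-of-two number of ranks. The function name `simulate_allreduce` is hypothetical; nothing here is Mellanox's or OpenFOAM's code, and no real MPI calls are made.

```python
def simulate_allreduce(values, op=lambda a, b: a + b):
    """Simulate an allreduce across len(values) ranks.

    Hypothetical illustration only: ranks are list indices, and each
    "exchange" is a list lookup rather than a network message.
    Assumes a power-of-two rank count and a commutative op.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")

    current = list(values)
    step = 1
    rounds = 0
    while step < n:
        # Each rank pairs with the rank whose index differs in one bit,
        # and both partners combine their partial results.
        current = [op(current[rank], current[rank ^ step]) for rank in range(n)]
        step *= 2
        rounds += 1
    return current, rounds


if __name__ == "__main__":
    result, rounds = simulate_allreduce([1, 2, 3, 4])
    print(result, rounds)
```

In this pattern every rank has to take part in each of the log2(N) exchange rounds before any rank has the final answer. That is one way to see why a collective such as AllReduce can dominate communication time, and why moving it off the host CPU is attractive.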

The second presentation, entitled OpenFOAM Performance Benchmark and Profiling, has more OpenFOAM benchmarks that compare GigE (GbE), 10GigE (10GbE), and InfiniBand. In the tests, InfiniBand delivered up to 219% higher performance than GigE and 109% higher than 10GigE using 24 Dell SC 1435 servers (dual 4-core AMD 2382 processors). Interestingly, if you compute the power cost savings due to the increased productivity (jobs finishing sooner), using IB can save up to $8400/year versus GigE and up to $6400/year versus 10GigE. Of course, your mileage may vary, but in general better efficiency does translate into a better total cost of ownership.
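Percentages such as "219% higher" are easy to misread, so a small sketch of the arithmetic may help. It assumes that "X% higher performance" means throughput is (1 + X/100) times the baseline, which is the usual reading but is not stated explicitly in the slides. The helper names are hypothetical.

```python
def speedup_factor(percent_higher):
    """Convert 'X% higher performance' into a throughput multiplier.

    Assumption: performance means jobs completed per unit time, so
    219% higher means 3.19 times the baseline throughput.
    """
    return 1.0 + percent_higher / 100.0


def relative_runtime(percent_higher):
    """Fraction of the baseline wall-clock time needed for the same job."""
    return 1.0 / speedup_factor(percent_higher)


if __name__ == "__main__":
    # Percentages quoted in the text; the labels are for display only.
    for label, pct in [("IB vs GigE", 219), ("IB vs 10GigE", 109), ("CORE-Direct", 17)]:
        print(f"{label}: {speedup_factor(pct):.2f}x throughput, "
              f"{relative_runtime(pct):.0%} of baseline runtime")
```

Under that assumption, the quoted best cases correspond to a job finishing in roughly a third of the GigE time and roughly half of the 10GigE time.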

Overall, the results seem to indicate that Mellanox is on the right track with its offload technology. In terms of Ethernet versus IB, the OpenFOAM results are clear: as the number of nodes increases, IB is a big winner. As always, benchmarking your own applications is the best choice!