Getting the data where it needs to go is only half the story. Getting it there quickly and with minimal latency is the real issue with clusters. Whether it is one byte or a gigabyte, interconnects are what get the work done.

When the best solution just won't fit the box and the budget

Note: Although this hack seems to work for point-to-point communications, when it is used with many simultaneous messages, as in an MPI program, there are some stalls that reduce performance. NAS benchmarks and HPL results for other similar switch-less designs will be posted soon. The good news is that performance is quite good for small numbers of nodes.

Modern Ethernet technology is based on network adapters and switches. Using Ethernet without a switch only happens in rare situations where a small number of systems (e.g. two) need to be directly connected together. Such a connection is often called a "crossover" connection because a special cable may be needed.

The cost of adapters and switches follows a very predictable commodity pricing trend. At first the cost is quite high; it then decreases as sales volumes increase. Currently, Gigabit Ethernet (GigE or GbE) enjoys low cost and wide availability from multiple vendors. Ten Gigabit Ethernet (10GigE or 10GbE) is now experiencing greater acceptance and thus decreased costs. Although volumes are growing, 10GigE still commands a high per-port price (adapter plus switch) and thus can be an expensive option for many small projects.

Nothing like real data from real machines running real problems, really

The best way to evaluate any technology is through benchmarks. Recently, I read a short white paper entitled "CORE-Direct: The Most Advanced Technology for MPI/SHMEM Collectives Offloads." It is a long title, but the paper describes a good idea Mellanox has added to its end-to-end InfiniBand (IB) hardware stack. (By the way, Mellanox also has end-to-end 10 GbE solutions.) After reading this paper, I wondered if there were any benchmarks for the CORE-Direct feature. After a little searching, I found some PDF presentation slides that are worth reviewing.

How to Benchmark TCP/IP Ethernet Performance

There are two ways to look at designing HPC clusters. On the positive side, there is a plethora of hardware and software options. On the negative side, there is a plethora of hardware and software options! The cluster designer has a special burden because, unlike putting together one or two servers for the office, a cluster multiplies your decisions by N, where N is the number of nodes. A wrong decision can have two negative consequences. First, a fix for the problem will probably require more money and time. Second, until the problem is fixed, you may only be getting a fraction of the performance possible from your cluster. Clusters have a way of amplifying bad decisions.




This work is licensed under CC BY-NC-SA 4.0

©2005-2023 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.