Infiniband: cost-effective switchless configurations

Alan Ward award at
Sun Jul 27 04:03:14 EDT 2003

If I understand correctly, you need all-to-all connectivity?
Do all the nodes need to access the whole data set, or only
share part of the data set between a few nodes each time?

I had a case where I wanted to share the whole data
set between all nodes, using point-to-point Ethernet connections
(no broadcast). I put them in a ring, so that with e.g. four nodes:

    A -----> B -----> C -----> D
    ^                          |
    |                          |
    +--------------------------+

Node A sends its data, plus C's and D's to node B.
Node B sends its data, plus D's and A's to node C.
Node C sends its data, plus A's and B's to node D.
Node D sends its data, plus B's and C's to node A.

Data that has done (N-1) hops is no longer forwarded.
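
The forwarding rule can be sketched as a small single-process simulation.
This is only an illustration, not the original program: the class name
RingAllGather, the lockstep "one send per step" loop and the {origin, hops}
tagging are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class RingAllGather {
    /**
     * Simulates n nodes in a ring. Each piece is an int[] {origin, hops}.
     * Returns, for each node, the set of origins whose data it ends up holding.
     */
    static List<TreeSet<Integer>> simulate(int n) {
        List<TreeSet<Integer>> held = new ArrayList<>();
        List<List<int[]>> outbox = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            TreeSet<Integer> own = new TreeSet<>();
            own.add(i);                       // every node starts with its own data
            held.add(own);
            List<int[]> out = new ArrayList<>();
            if (n > 1) out.add(new int[] {i, 0});
            outbox.add(out);
        }
        for (int step = 1; step < n; step++) {
            List<List<int[]>> nextOutbox = new ArrayList<>();
            for (int i = 0; i < n; i++) nextOutbox.add(new ArrayList<>());
            for (int i = 0; i < n; i++) {
                int next = (i + 1) % n;       // point-to-point link to the next node
                for (int[] piece : outbox.get(i)) {
                    int origin = piece[0];
                    int hops = piece[1] + 1;
                    held.get(next).add(origin);
                    // Data that has done (n-1) hops is no longer forwarded.
                    if (hops < n - 1) nextOutbox.get(next).add(new int[] {origin, hops});
                }
            }
            outbox = nextOutbox;
        }
        return held;
    }

    public static void main(String[] args) {
        List<TreeSet<Integer>> held = simulate(4);
        for (int i = 0; i < held.size(); i++) {
            System.out.println("node " + i + " holds data from " + held.get(i));
        }
    }
}
```

With four nodes, every piece stops one hop short of returning to the node
that produced it, so nothing circulates forever.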

We used a single Java program with 3 threads on each node:

- one to receive data and place it in a local array
- one to forward finished data to the next node
- one to perform calculations
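
A rough sketch of that three-thread layout, for one node, might look like
the following. It is not the original code: in-process queues stand in for
the network links, the names (RingNodeSketch, pump, STOP) are made up, the
"calculation" is just a sum, and running the calculation before the
forwarding step is an assumption.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

class RingNodeSketch {
    // Hypothetical end-of-input marker, compared by identity.
    private static final int[] STOP = new int[0];

    /** Moves blocks from in to out; if sum is non-null, also adds up their values. */
    private static void pump(BlockingQueue<int[]> in, BlockingQueue<int[]> out, AtomicLong sum) {
        try {
            while (true) {
                int[] block = in.take();
                if (block != STOP && sum != null) {
                    for (int v : block) sum.addAndGet(v);
                }
                out.put(block);
                if (block == STOP) return;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs the three threads over the incoming blocks; returns {sum, blocks forwarded}. */
    static long[] run(int[][] incoming) {
        BlockingQueue<int[]> fromPrev = new LinkedBlockingQueue<>(); // link from previous node
        BlockingQueue<int[]> received = new LinkedBlockingQueue<>(); // the "local array"
        BlockingQueue<int[]> finished = new LinkedBlockingQueue<>(); // used by the calculation
        BlockingQueue<int[]> toNext = new LinkedBlockingQueue<>();   // link to next node
        AtomicLong sum = new AtomicLong();

        Thread receiver = new Thread(() -> pump(fromPrev, received, null));
        Thread calculator = new Thread(() -> pump(received, finished, sum));
        Thread forwarder = new Thread(() -> pump(finished, toNext, null));
        receiver.start();
        calculator.start();
        forwarder.start();
        try {
            for (int[] block : incoming) fromPrev.put(block);
            fromPrev.put(STOP);
            receiver.join();
            calculator.join();
            forwarder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {sum.get(), toNext.size() - 1}; // do not count the STOP marker
    }
}
```

In a real node the first and last queues would be replaced by the sockets to
the neighbouring nodes, and the forwarder would also apply the (N-1) hop limit.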

The main drawback is that you need a smart algorithm to determine
which pieces of data are "new" and which are "used", i.e. have
already been used for calculation and forwarded to the next node,
and so can be chucked out to make space. Ours wasn't smart enough :-(
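
One possible way to do that bookkeeping (a sketch only, and not what we
actually ran) is to keep two flags per block and free the block once both
are set. The class BlockTracker and its method names are invented for this
example; blocks that have reached the hop limit are registered as not
needing forwarding.

```java
import java.util.HashMap;
import java.util.Map;

class BlockTracker {
    private static final int CALCULATED = 1;
    private static final int FORWARDED = 2;
    private static final int USED = CALCULATED | FORWARDED;

    // block id -> flags set so far; a block is "new" until both flags are set
    private final Map<Integer, Integer> state = new HashMap<>();

    /** Registers a block; one that has done (N-1) hops does not need forwarding. */
    synchronized void add(int id, boolean needsForwarding) {
        state.put(id, needsForwarding ? 0 : FORWARDED);
    }

    /** Returns true if the block is now "used" and its space can be reclaimed. */
    synchronized boolean markCalculated(int id) {
        return mark(id, CALCULATED);
    }

    /** Returns true if the block is now "used" and its space can be reclaimed. */
    synchronized boolean markForwarded(int id) {
        return mark(id, FORWARDED);
    }

    private boolean mark(int id, int flag) {
        Integer current = state.get(id);
        if (current == null) return false;   // unknown or already released
        int flags = current | flag;
        if (flags == USED) {
            state.remove(id);                // both steps done: chuck it out
            return true;
        }
        state.put(id, flags);
        return false;
    }

    /** Number of blocks still occupying space. */
    synchronized int live() {
        return state.size();
    }
}
```

The calculation thread would call markCalculated and the forwarding thread
markForwarded; whichever call returns true is the one that frees the slot.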

Alan Ward

Mikhail Kuzminsky wrote:
>   It's possible to build a 3-node switchless Infiniband-connected
> cluster with the following topology (I assume one 2-port Mellanox HCA card
> per node):
>     node2 -------IB------Central node-----IB-----node1
>      !                                             !
>      !                                             !
>      ----------------------IB-----------------------
> It gives complete node connectivity, and I assume there would be
> 3 separate subnets, each with its own subnet manager. But I think that
> if MPI broadcast must use hardware multicast,
> MPI broadcast will not work from nodes 1 and 2 (is that right?).
> OK. But maybe it's also possible to build the following topology
> (I assume 2 x 2-port Mellanox HCAs per node, which also gives
> complete connectivity of the nodes)?
>   node 2----IB-------- C e n t r a l  n o d e -----IB------node1
>        \              /                      \           /
>          \          /                         \         /
>            \       /                           \      /
>              \--node3                         node4--
> and I also establish additional IB links (2-1, 2-4, 3-1, 3-4, not
> shown in the "picture"), which gives me complete node connectivity.
> Is this possible (I am not considering changes to the device drivers)?
> If yes, it's a good way to build very small
> and cost-effective IB-based switchless clusters!
> BTW, if I use the IPoIB service, is it possible to use the netperf
> and/or netpipe tools to measure TCP/IP performance?
> Yours
> Mikhail Kuzminsky
> Zelinsky Institute of Organic Chemistry
> Moscow
> _______________________________________________
> Beowulf mailing list, Beowulf at
> To change your subscription (digest mode or unsubscribe) visit
