InfiniBand: cost-effective switchless configurations
Alan Ward
award at andorra.ad
Sun Jul 27 04:03:14 EDT 2003
If I understand correctly, you need all-to-all connectivity?
Do all the nodes need access to the whole data set, or do they
only share parts of it among a few nodes at a time?
I had a case where I wanted to share the whole data
set among all nodes, using point-to-point Ethernet connections
(no broadcast). I put them in a ring, so that with e.g. four nodes:
A -----> B -----> C -----> D
^                          |
|                          |
----------------------------
Node A sends its data, plus C's and D's to node B.
Node B sends its data, plus D's and A's to node C.
Node C sends its data, plus A's and B's to node D.
Node D sends its data, plus B's and C's to node A.
Data that has done (N-1) hops is no longer forwarded.
We used a single Java program with 3 threads on each node:
- one to receive data and place it in a local array
- one to forward finished data to the next node
- one to perform calculations
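
Roughly, each node was laid out along these lines (a minimal sketch only,
not our actual code: the Block format, the port number, the command-line
arguments and the use of Java serialization are all filled in here just
for illustration):

    import java.io.*;
    import java.net.*;
    import java.util.concurrent.*;

    public class RingNode {
        static final int N = 4;          // nodes in the ring
        static final int PORT = 5000;    // assumed listen port, same on every node

        // A block of data, tagged with its origin and the number of ring
        // hops it has made by the time it arrives at the receiver.
        static class Block implements Serializable {
            final int origin, hops;
            final double[] payload;
            Block(int origin, int hops, double[] payload) {
                this.origin = origin; this.hops = hops; this.payload = payload;
            }
        }

        public static void main(String[] args) throws Exception {
            final int myRank = Integer.parseInt(args[0]);   // 0 .. N-1
            final String nextHost = args[1];                // next node in the ring

            BlockingQueue<Block> toForward = new LinkedBlockingQueue<>();
            BlockingQueue<Block> toCompute = new LinkedBlockingQueue<>();

            // Thread 1: receive blocks from the previous node; every block goes
            // to the compute queue, and is re-queued for forwarding unless it
            // has already done N-1 hops.
            Thread receiver = new Thread(() -> {
                try (ServerSocket server = new ServerSocket(PORT);
                     Socket in = server.accept();
                     ObjectInputStream ois = new ObjectInputStream(in.getInputStream())) {
                    while (true) {
                        Block b = (Block) ois.readObject();
                        toCompute.put(b);
                        if (b.hops < N - 1)
                            toForward.put(new Block(b.origin, b.hops + 1, b.payload));
                    }
                } catch (Exception e) { e.printStackTrace(); }
            });

            // Thread 2: send our own data once, then forward whatever the
            // receiver queues up, to the next node in the ring.
            Thread forwarder = new Thread(() -> {
                try {
                    Socket out = connectWithRetry(nextHost, PORT);
                    ObjectOutputStream oos = new ObjectOutputStream(out.getOutputStream());
                    oos.writeObject(new Block(myRank, 1, localData()));
                    while (true) oos.writeObject(toForward.take());
                } catch (Exception e) { e.printStackTrace(); }
            });

            // Thread 3: the actual calculation, consuming blocks as they arrive.
            Thread compute = new Thread(() -> {
                try {
                    while (true) {
                        Block b = toCompute.take();
                        // ... use b.payload in the local calculation ...
                    }
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });

            receiver.start(); forwarder.start(); compute.start();
        }

        // The next node's ServerSocket may not be up yet when we start.
        static Socket connectWithRetry(String host, int port) throws InterruptedException {
            while (true) {
                try { return new Socket(host, port); }
                catch (IOException e) { Thread.sleep(500); }
            }
        }

        static double[] localData() { return new double[1024]; }  // placeholder
    }

Counting hops this way, every block lands on each of the other nodes
exactly once and is dropped before it would return to its origin.
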
The main drawback is that you need a smart algorithm to determine
which pieces of data are "new" and which are "used"; i.e. have
been used for calculation and been forwarded to the next node,
and can be chucked out to make space. Ours wasn't smart enough :-(
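
In hindsight, one way to do that bookkeeping (again just a sketch of an
idea, not what we actually had) is to give every buffer slot two flags
and only recycle it once the compute thread and the forwarding thread
have both finished with it:

    import java.util.concurrent.atomic.AtomicIntegerArray;

    // Per-slot bookkeeping: a slot may be reused only after its block has
    // been (a) used in the calculation and (b) forwarded, or found not to
    // need forwarding because it had already done N-1 hops.
    public class SlotTracker {
        private static final int COMPUTED = 1, FORWARDED = 2;
        private final AtomicIntegerArray state;

        public SlotTracker(int slots) { state = new AtomicIntegerArray(slots); }

        /** Called by the compute thread; true means the slot may be recycled. */
        public boolean markComputed(int slot)  { return mark(slot, COMPUTED); }

        /** Called by the forwarding thread; true means the slot may be recycled. */
        public boolean markForwarded(int slot) { return mark(slot, FORWARDED); }

        /** Call when handing the slot out again for a new block. */
        public void reset(int slot) { state.set(slot, 0); }

        private boolean mark(int slot, int bit) {
            return state.accumulateAndGet(slot, bit, (a, b) -> a | b)
                   == (COMPUTED | FORWARDED);
        }
    }

Each thread sets its own flag exactly once per block, so whichever call
sees both flags set can safely hand the slot back for reuse without any
extra locking.
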
Alan Ward
Mikhail Kuzminsky wrote:
> It's possible to build a 3-node switchless InfiniBand-connected
> cluster with the following topology (I assume one 2-port Mellanox HCA
> card per node):
>
> node2 -------IB------Central node-----IB-----node1
>   !                                            !
>   !                                            !
>   ----------------------IB----------------------
>
> This gives complete node connectivity, and I assume there would be
> 3 separate subnets, each with its own subnet manager. But I think that
> if MPI broadcast has to use hardware multicasting, MPI broadcast
> will not work from nodes 1 and 2 (is that right?).
>
> OK. But maybe it's also possible to build the following topology
> (I assume 2 x 2-port Mellanox HCAs per node, which also gives
> complete connectivity of the nodes)?:
>
>
> node 2----IB-------- C e n t r a l n o d e -----IB------node1
>        \            /                     \            /
>         \          /                       \          /
>          \        /                         \        /
>           \--node3                           node4--/
>
> and I also establish additional IB links (2-1, 2-4, 3-1, 3-4, not
> presented in the "picture"), which gives me complete node connectivity.
> Is this possible (assuming no changes to the device drivers)?
> If yes, it's a good way to build very small
> and cost-effective IB-based switchless clusters!
>
> BTW, if I use the IPoIB service, is it possible to use the netperf
> and/or NetPIPE tools to measure TCP/IP performance?
>
> Yours
> Mikhail Kuzminsky
> Zelinsky Institute of Organic Chemistry
> Moscow
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf