[Beowulf] Can one Infiniband net support MPI and a parallel file system?
Craig.Tierney at noaa.gov
Wed Aug 6 14:47:03 EDT 2008
andrew holway wrote:
> It works(ish) and people are doing it but my research has shown that
> it is not yet stable. I have been talking to various companies
> offering lustre support. They have all told me that they can do it but
> none have been able to offer a reference site.
We are running our filesystems and MPI traffic over the same IB network.
We are having no problems with this configuration. The system consists
of two trees (each with ~70% bisection bandwidth) connected via a top-level tree
to share IB communications between the filesystems and the compute nodes.
One side of the tree has ~350 Woodcrest nodes, the other ~250 Harpertown nodes.
We don't run jobs between the two systems, but both systems share the same filesystems.
Just to complicate matters, we are supporting both Rapidscale and Lustre
(v22.214.171.124) on our nodes. The most obvious job contention we have seen
on the IB network is at the filesystem, not between the filesystem traffic
and the MPI traffic.
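The "~70% bisection bandwidth" figure can be made concrete with a rough sketch of the arithmetic for a two-level tree. The sketch rests on three assumptions: all links run at the same rate, the spine is non-blocking, and the leaves are hypothetical 24-port switches split 14 ports down to nodes and 10 up to the spine. The post does not say which switches or port splits were actually used. The helper names are illustrative only.

```python
import math

# Hypothetical port split: the post does not name the switches or how
# ports were divided between nodes and uplinks.
LEAF_DOWN_PORTS = 14  # ports cabled to compute nodes (assumed)
LEAF_UP_PORTS = 10    # ports cabled to the spine (assumed)


def leaf_bisection_fraction(down_ports: int, up_ports: int) -> float:
    """Fraction of full bisection bandwidth a two-level tree offers when the
    leaves are the bottleneck: uplink capacity divided by node-facing
    capacity, assuming equal link rates and a non-blocking spine."""
    if down_ports <= 0 or up_ports < 0:
        raise ValueError("need at least one node port and no negative uplinks")
    return min(1.0, up_ports / down_ports)


def leaves_needed(nodes: int, down_ports: int) -> int:
    """Number of leaf switches required to attach `nodes` compute nodes."""
    return math.ceil(nodes / down_ports)


if __name__ == "__main__":
    frac = leaf_bisection_fraction(LEAF_DOWN_PORTS, LEAF_UP_PORTS)
    print(f"bisection fraction: {frac:.0%}")
    # Node counts are from the post; leaf counts follow only from the assumed split.
    for name, nodes in (("Woodcrest side", 350), ("Harpertown side", 250)):
        print(name, leaves_needed(nodes, LEAF_DOWN_PORTS), "leaf switches")
```

With 10 uplinks for 14 node ports, each leaf can push at most about 71% of its node traffic off-switch at once. That is the usual meaning of a roughly 70% tree.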
We had some issues with the subnet manager initially, but we have worked through
them. The latest version of Lustre has been quite stable in our environment.
As another poster already stated, I suspect that this configuration
could be an issue for codes that are very latency sensitive. Our
codes are more latency sensitive than bandwidth sensitive, and we haven't
seen any significant issues (and the configuration has been stable so far).
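A back-of-the-envelope sketch shows why very latency-sensitive codes might notice shared filesystem traffic. Suppose a small MPI message reaches a switch output port just after a full-MTU filesystem packet has started transmitting: it has to wait for that packet to finish. The sketch assumes 4X DDR InfiniBand (16 Gbit/s data rate after 8b/10b encoding) and a 2048-byte MTU. Neither detail comes from the post. The model also ignores packet headers, virtual-lane arbitration and queues deeper than one packet, so treat it as an order-of-magnitude estimate only.

```python
# Assumed link parameters (not stated in the post): 4X DDR InfiniBand carries
# 16 Gbit/s of data (20 Gbit/s signalling less 8b/10b encoding), and the
# fabric uses a 2048-byte IB MTU.
DDR_4X_DATA_BPS = 16e9
IB_MTU_BYTES = 2048


def serialization_delay_us(nbytes: int, link_bps: float = DDR_4X_DATA_BPS) -> float:
    """Microseconds needed to clock `nbytes` onto a link (headers ignored)."""
    return nbytes * 8 / link_bps * 1e6


def worst_case_wait_us(hops: int, mtu_bytes: int = IB_MTU_BYTES,
                       link_bps: float = DDR_4X_DATA_BPS) -> float:
    """Extra delay a small MPI message could pick up if, at every switch
    output port on its path, it arrives just after a full-MTU filesystem
    packet has started transmitting. Ignores virtual-lane arbitration,
    packet headers and queues deeper than one packet."""
    return hops * serialization_delay_us(mtu_bytes, link_bps)


if __name__ == "__main__":
    print(f"one MTU packet: {serialization_delay_us(IB_MTU_BYTES):.3f} us")
    # The 3-hop path is hypothetical; real hop counts depend on the topology.
    print(f"3 hops: {worst_case_wait_us(3):.3f} us")
```

Under these assumptions each contended hop costs roughly a microsecond. That is why the effect would show up first in codes that exchange many small messages.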
> I bet the chaps behind Roadrunner aren't going to be publishing
> any downtime figures.
> As Mark mentioned, if you try to force lots of stuff down the
> tubes you are going to break something. I guess it's a _bit_ like
> torrents on a naff home router: lots and lots of small torrent
> connections fill up the NAT table, which cannot purge itself fast
> enough, which means that larger downloads time out as they fall off the
> bottom of the table.
> Data on how hard IB switches have to work would be interesting. I have
> a feeling that many people are taking their fabric to the very edge
> and back again!
> Perhaps someone can shed more light on the problems?
> On Tue, Aug 5, 2008 at 10:25 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>> Hello Beowulf fans
>> Is anybody using Infiniband to provide both
>> MPI connection and parallel file system services on a Beowulf cluster?
>> I was thinking of having a storage node that would
>> serve a parallel file system to the Beowulf nodes over IB
>> (something like NFS on steroids).
>> The same IB net would also work as the MPI interconnect.
>> Is this design possible?
>> On a small cluster, does it require two separate IB physical networks (cards
>> and switch),
>> or can it be done with a single IB card per node and one switch?
>> Is this design efficient?
>> Are there other practical and cost effective alternatives to this idea?
>> Would this type of design work with GigE instead of IB?
>> I confess I know nothing about parallel file systems and IB.
>> So, please forgive me if my questions are nonsense.
>> I would also appreciate any links to readings that would mitigate my ignorance
>> on these subjects.
>> Thank you,
>> Gus Correa
>> Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
>> Lamont-Doherty Earth Observatory - Columbia University
>> P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
>> Beowulf mailing list, Beowulf at beowulf.org
>> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
Craig Tierney (craig.tierney at noaa.gov)