MPICH 1.2.5 failures (net_recv)

Mike Snitzer msnitzer at
Mon Jul 14 16:03:33 EDT 2003

On Fri, Jul 11 2003 at 10:13,
Jeff Layton <jeffrey.b.layton at> wrote:

> Good afternoon!
>    Our cluster has been recently upgraded (from a 2.2 kernel to a 2.4
> kernel). I've built MPICH-1.2.5 on it using the PGI 4.1 compilers,
> with the following configuration:
>    Does anybody have any ideas? I've searched around the net a bit and
> the results were inconclusive ("use LAM instead", may have bad NIC
> drivers, problematic TCP stack, etc.).

Hey Jeff,

You might try compiling MPICH with gcc to eliminate PGI as a potential
source of error.  If the gcc build runs cleanly, that at least verifies
the integrity of the drivers, TCP stack, NIC, etc.

PGI should be perfectly fine given the minimal MPICH configure you
provided, but the compiler is one variable that is easy enough to eliminate
as a potential problem. If you see the same problem with a gcc-compiled
MPICH, then the issue lies deeper than the compiler.  You might also
confine the mpirun to only 2 nodes at first and then scale up from there
to see at what size the failures begin.
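
Something like the following could drive that scale-up test. It is only a
rough sketch in Python: the program name ./cpi is an assumption (substitute
whatever test binary you built), and the doubling schedule is just one
reasonable choice. Note that "-np" sets the number of processes, so it only
matches the node count if your machines file gives one process per node.

```python
import subprocess


def node_counts(max_procs, start=2):
    """Return process counts start, 2*start, 4*start, ... ending at max_procs."""
    counts = []
    n = start
    while n < max_procs:
        counts.append(n)
        n *= 2
    if max_procs >= start:
        counts.append(max_procs)
    return counts


def mpirun_command(nprocs, program="./cpi"):
    """Build the mpirun argument list; './cpi' is a placeholder test program."""
    return ["mpirun", "-np", str(nprocs), program]


def first_failure(max_procs, program="./cpi", run=subprocess.call):
    """Run the program at each size and return the first size that fails.

    'run' takes an argument list and returns an exit status; it defaults to
    subprocess.call and can be swapped for a stub when testing the logic.
    Returns None if every size exits with status 0.
    """
    for n in node_counts(max_procs):
        if run(mpirun_command(n, program)) != 0:
            return n
    return None
```

Calling first_failure(16) would try 2, 4, 8 and 16 processes in turn and
stop at the first size where mpirun exits non-zero. Running it once against
the PGI build and once against the gcc build would show whether the
compiler makes any difference.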


Mike Snitzer                           msnitzer at
Linux Networx                 

Beowulf mailing list, Beowulf at