MPICH Problem

Gabriel J. Weinstock gabriel.weinstock at
Wed Oct 24 15:12:21 EDT 2001

  I'm trying to get MPICH running on a 4-node cluster of 1 GHz PIII 
machines. The tstmachines program runs without error, and the rsh mechanism is 
set up and functioning properly. LAM-MPI works out of the box, so we decided 
to use that for a while, but we're going to need a production environment and 
MPICH seemed more suitable.
  Anyway, I compile the example `cpi.c' program and run `mpirun -v -np 4 
cpi'. Nothing happens for a few minutes, then I get a flurry of `Connection 
failed for reason: : Connection timed out' messages, followed by:

p1_10899: p4_error: Timeout in establishing connection to remote process: 0
p3_15707: p4_error: net_recv read: probable EOF on socket: 1
bm_list_4303: (378.120857) Listener: Unable to interrupt client pid=4302.

  We had a similar problem about two months ago, which led us to abandon 
MPICH at the time. A number of people seem to be having this problem, but 
no one, and I mean no one, seems to know the answer. Any help would be 
greatly appreciated.

Beowulf mailing list, Beowulf at