pvm crashing

Iwan Maximus Cornelius ic02 at uow.edu.au
Sun Sep 14 23:31:45 EDT 2003


Hi, 
 I am having some problems running a Monte Carlo physics
simulation on a 9-machine cluster (all Red Hat Linux 9.0) using
pvm (version 3.4.4). The application I am running is composed
of a parent application and 9 identical daughter applications.
The parent essentially does housekeeping work: it starts the pvm
daemon, adds machines, etc., sets catchout and spawns the
daughters. It then sits in a loop, listening for messages from
the daughters. Being a Monte Carlo simulation, this can result
in 10,000,000 or so messages being passed between the daughters
and the parent. Well, it should, but the pvmd crashes
mysteriously after about 3,000,000 messages with the following
error message:

libpvm [t40001]: mxfer() mxinput bad return on pvmd sock
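
To make the structure of the parent's loop concrete, here is a minimal sketch in C of a receive loop that checks every return code and records how many messages had arrived when a receive failed. It is not the actual application code. The names run_parent_loop, loop_stats, recv_fn, fake_source and fake_recv are hypothetical, and the callback stands in for whatever wraps pvm_recv() and the unpacking calls in the real parent. It assumes only the convention that a negative return value signals an error, which is how pvm_recv() reports failure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical receive callback: returns >= 0 on success and a negative
 * code on failure. In a real parent this would wrap pvm_recv() and the
 * unpacking calls; here it is left abstract so the sketch stands alone. */
typedef int (*recv_fn)(void *ctx);

/* Summary of one run of the loop. */
typedef struct {
    long received;   /* messages received successfully */
    int  last_error; /* 0 if the loop finished, else the failing code */
} loop_stats;

/* Receive up to `expected` messages. Stop at the first failure and record
 * how far the loop got. If report_every > 0, print a progress line every
 * report_every messages so the point of failure can be bracketed. */
loop_stats run_parent_loop(recv_fn recv_one, void *ctx,
                           long expected, long report_every)
{
    loop_stats st = {0, 0};

    assert(recv_one != NULL);
    while (st.received < expected) {
        int rc = recv_one(ctx);
        if (rc < 0) {
            st.last_error = rc;
            fprintf(stderr, "receive failed with code %d after %ld messages\n",
                    rc, st.received);
            break;
        }
        st.received++;
        if (report_every > 0 && st.received % report_every == 0)
            fprintf(stderr, "%ld messages received\n", st.received);
    }
    return st;
}

/* Stand-in message source for trying the loop without a cluster: it
 * succeeds `fail_after` times and then returns an arbitrary negative code. */
typedef struct {
    long calls;
    long fail_after;
} fake_source;

int fake_recv(void *ctx)
{
    fake_source *src = (fake_source *)ctx;

    if (src->calls >= src->fail_after)
        return -1;
    src->calls++;
    return 0;
}
```

With a fake_source set to fail after 5 calls, run_parent_loop(fake_recv, &src, 10, 0) returns with received equal to 5 and last_error equal to -1. In the real parent, the count at the point of failure would show whether the crash always happens after the same number of messages.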

For some reason the pvmd is crashing. Can anybody see an
obvious cause here or, if not, suggest a way of debugging such
a problem?
Thanks for your time, 
Iwan

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


