pvm crashing
Iwan Maximus Cornelius
ic02 at uow.edu.au
Sun Sep 14 23:31:45 EDT 2003
Hi,
I am having some problems running a Monte Carlo physics
simulation on a 9-machine cluster (all Red Hat Linux 9) using
pvm (version 3.4.4). The application I am running is composed
of a parent application and 9 identical daughter applications.
The parent essentially does housekeeping work: it starts the pvm
daemon, adds the machines etc., sets catchout and spawns the
daughters. It then sits in a loop, listening for messages from
the daughters. Being a Monte Carlo simulation this can result
in 10,000,000 or so messages being passed between daughters
and the parent. At least it should, but the pvmd crashes
mysteriously after about 3,000,000 messages with the following
error message:
libpvm [t40001]: mxfer() mxinput bad return on pvmd sock
For some reason the pvmd is crashing. Could anybody tell me if
something here is obviously wrong or, if not, suggest a way of
debugging such a problem?
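
To make the setup concrete, here is a minimal sketch of the kind of
parent receive loop described above. It is not the actual code. The
message tag RESULT_TAG, the function name receive_results and the
progress interval are assumptions made for illustration. The stub
functions are also hypothetical and exist only so the sketch compiles
without PVM; define USE_PVM to build against the real pvm3.h. The idea
is to check the return value of every PVM call and count messages, so
that a failure can be tied to a specific call and message number.

```c
#include <assert.h>
#include <stdio.h>

#ifdef USE_PVM
#include <pvm3.h>   /* real PVM 3 library */
#else
/* Hypothetical stand-ins so the sketch compiles without PVM installed.
   They copy only the convention of the real calls: negative means error. */
static int stub_msgs_left = 5;   /* messages delivered before the stub fails */
static int pvm_recv(int tid, int tag)
{
    (void)tid; (void)tag;
    return (stub_msgs_left-- > 0) ? 1 : -1;
}
static int pvm_bufinfo(int bufid, int *bytes, int *tag, int *tid)
{
    (void)bufid;
    *bytes = (int)sizeof(int); *tag = 1; *tid = 0x40002;
    return 0;
}
static int pvm_perror(const char *msg)
{
    fprintf(stderr, "%s: stub error\n", msg);
    return 0;
}
#endif

#define RESULT_TAG 1   /* assumed tag used by the daughters */

/* Receive up to max_msgs results from any daughter. Returns the number
   of messages received before the first PVM error, or max_msgs if
   no error occurred. */
long receive_results(long max_msgs)
{
    long received = 0;
    assert(max_msgs >= 0);
    while (received < max_msgs) {
        int bytes, tag, sender;
        int bufid = pvm_recv(-1, RESULT_TAG);   /* -1: accept any sender */
        if (bufid < 0) {
            pvm_perror("pvm_recv");
            fprintf(stderr, "failed after %ld messages\n", received);
            break;
        }
        if (pvm_bufinfo(bufid, &bytes, &tag, &sender) < 0) {
            pvm_perror("pvm_bufinfo");
            break;
        }
        received++;
        if (received % 100000 == 0)   /* periodic progress line */
            fprintf(stderr, "%ld messages, last from t%x (%d bytes)\n",
                    received, (unsigned)sender, bytes);
    }
    return received;
}
```

Calling receive_results(10000000) from the parent after spawning the
daughters would then report how many messages arrived and which call
failed first.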
Thanks for your time,
Iwan