[Beowulf] Problems with a JS21 - Ah, the networking...

Bruce Allen ballen at gravity.phys.uwm.edu
Sat Sep 29 01:02:10 EDT 2007


> The Myrinet connection was working correctly, but sometimes a user program 
> just got stuck: one of the processes was sleeping while all the others were 
> running, and the program then hung.
>
> Any suggestions? I can provide any log necessary.

Ivan, you probably already know this, but in case you don't, it can be very 
useful. If your cluster is Linux-based, you can often attach the 'strace' 
utility to the stuck user program to see why it is sleeping: for example, 
which message it is waiting for that never arrives. This might help you 
diagnose the problem.
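
As a rough illustration of that approach, here is a minimal Python sketch, assuming a Linux node with /proc mounted and strace installed. It reads the state letter of each PID you give it from /proc/<pid>/stat and, for a sleeping process, prints an strace command you could run against it. The helper names (parse_state, process_state, strace_command) are made up for this example; the strace options used are -p (attach to a running PID), -f (also follow its threads and children) and -tt (timestamp each line).

```python
import sys


def parse_state(stat_text):
    """Return the one-letter process state from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before reading the state field.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Read the current state of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_state(f.read())


def strace_command(pid):
    """Build an strace invocation that attaches to an already running process."""
    return ["strace", "-f", "-tt", "-p", str(pid)]


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_sleeper.py PID [PID ...]
    for arg in sys.argv[1:]:
        pid = int(arg)
        state = process_state(pid)
        print(pid, state)
        # 'S' is interruptible sleep, 'D' is uninterruptible (usually I/O) sleep.
        if state in ("S", "D"):
            print("  try:", " ".join(strace_command(pid)))
```

Run it on the node with the PIDs of the job's processes; the one reported as S or D while the others show R is the one worth attaching strace to. Attaching normally requires being the process owner or root.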

Cheers,
 	Bruce