[Beowulf] Two problems related to slowness and TASK_UNINTERRUPTABLE process

Tahir Malas tmalas at ee.bilkent.edu.tr
Wed Jun 13 08:37:08 EDT 2007

> -----Original Message-----
> From: Mark Hahn [mailto:hahn at mcmaster.ca]
> Sent: Tuesday, June 12, 2007 6:15 PM
> To: Tahir Malas
> Cc: mvapich-discuss at cse.ohio-state.edu; beowulf at beowulf.org;
> teoman.terzi at gmail.com; 'Ozgur Ergul'
> Subject: Re: [Beowulf] Two problems related to slowness and
> > For 32 processes (4 processes per node), the arrays with 512-Byte size are
> > communicated slower than the 4096-Byte size arrays. For both of them, we
> do you mean that this is not the case in other configurations?
> an interconnect _should_ have some steep rise in effective bandwidth
> as packet size is increased.  it's a useful metric to know the packet
> size at which half-peak bandwidth is achieved, since this offers some
> "sense of scale" to programmers judging whether their own packet sizes
> are appropriate.
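
To make the half-peak idea concrete, here is a minimal sketch. It assumes the simplest linear cost model, T(n) = latency + n / peak_bandwidth, which a real interconnect follows only approximately. The function names and the sample numbers (5 us, 1000 bytes/us) are made up for illustration and are not measurements from our cluster.

```python
def effective_bandwidth(n_bytes, latency_us, peak_bw_bytes_per_us):
    """Effective bandwidth (bytes/us) for one message of n_bytes,
    assuming T(n) = latency + n / peak_bw."""
    transfer_time_us = latency_us + n_bytes / peak_bw_bytes_per_us
    return n_bytes / transfer_time_us


def half_peak_size(latency_us, peak_bw_bytes_per_us):
    """Message size (bytes) at which effective bandwidth is half of peak.

    Setting n / (latency + n / peak_bw) = peak_bw / 2 and solving for n
    gives n = latency * peak_bw under the linear model.
    """
    return latency_us * peak_bw_bytes_per_us


if __name__ == "__main__":
    # Hypothetical interconnect: 5 us latency, 1000 bytes/us peak bandwidth.
    n_half = half_peak_size(5.0, 1000.0)
    print(n_half, effective_bandwidth(n_half, 5.0, 1000.0))
```

Under this model, messages much smaller than latency * peak_bw are dominated by latency, so their effective bandwidth is expected to be low.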

> > this abnormal case is persistent. More specifically, communication of
> > 4k-Byte packages is 2 times faster than the communication of 512-Byte
> > packages.
> perhaps I'm dense this morning, but what's unexpected about that?
Given the measured latency and bandwidth, I expected the following
communication times (in us), computed as latency + size/bandwidth:
512: 5.48 + 512/592.34 = 6.34
4096: 11.02 + 4096/906.04 = 15.54
Our test:
512: 29.434
4096: 16.209

So isn't the communication time for 512 bytes unexpectedly slow?
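
For anyone who wants to redo the arithmetic, here is a small sketch of the same comparison. It assumes the bandwidth figures are in MB/s with 1 MB = 10^6 bytes, so that 1 MB/s equals 1 byte/us (this is what the division above implies). The function name `expected_time_us` is only illustrative.

```python
def expected_time_us(size_bytes, latency_us, bandwidth_mb_per_s):
    """Expected one-way time in microseconds: latency + size / bandwidth.

    Assumes 1 MB/s == 1 byte/us (MB taken as 10**6 bytes).
    """
    return latency_us + size_bytes / bandwidth_mb_per_s


# size in bytes -> (latency in us, bandwidth in MB/s, measured time in us),
# all taken from the figures quoted in this message.
cases = {
    512: (5.48, 592.34, 29.434),
    4096: (11.02, 906.04, 16.209),
}

if __name__ == "__main__":
    for size, (latency, bandwidth, measured) in cases.items():
        expected = expected_time_us(size, latency, bandwidth)
        print(f"{size:5d} B  expected {expected:6.2f} us  "
              f"measured {measured:6.3f} us  "
              f"ratio {measured / expected:4.2f}")
```

With these numbers, the 4096-byte case is within a few percent of the estimate, while the 512-byte case is roughly 4.6 times slower than expected.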

> >
> > 2. SOMETIMES, after the test with overall 32 processes, one of the four
> > processes at node3 hangs in TASK_UNINTERRUPTABLE "D" state. Hence, the
> test
> > program shows a "done." and waits for sometime. We can neither kill the
> > process nor soft reboot the node. We have to wait for that process to
> > terminate, which can last long.
> does /proc/$pid/wchan (on the 'D' state process) tell you anything?
> do all the ranks return from MPI_Finalize?
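
In case it helps others reproduce this check, the following is a rough sketch of how the state and wait channel could be collected for every 'D' process on a node. It assumes a Linux /proc filesystem. The helper names are hypothetical, and reading wchan may need root on some kernels (it can come back as "0" or empty otherwise).

```python
import os


def read_proc_field(pid, name):
    """Return the stripped contents of /proc/<pid>/<name>, or None if unreadable."""
    try:
        with open(f"/proc/{pid}/{name}") as f:
            return f.read().strip()
    except OSError:
        # Process exited, permission denied, or no /proc on this system.
        return None


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat ('D' = uninterruptible sleep)."""
    stat = read_proc_field(pid, "stat")
    if not stat or ")" not in stat:
        return None
    # The command name is in parentheses and may contain spaces,
    # so the state is the first field after the last ')'.
    fields = stat.rsplit(")", 1)[1].split()
    return fields[0] if fields else None


def d_state_processes():
    """List (pid, wchan) for every process currently in 'D' state."""
    if not os.path.isdir("/proc"):
        return []
    found = []
    for entry in os.listdir("/proc"):
        if entry.isdigit() and process_state(entry) == "D":
            found.append((int(entry), read_proc_field(entry, "wchan")))
    return found


if __name__ == "__main__":
    for pid, wchan in d_state_processes():
        print(pid, wchan)
```

A process can pass through 'D' briefly during normal I/O, so only entries that persist across repeated runs are of interest.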

The file contains "__lock_buffer". Yes, all ranks return, but I think this
problematic process (i.e., one of the processes on node3) always returns the

Thanks, and regards,


