[Beowulf] Re: SATA or SCSI drives - Multiple Read/write speeds.

Lombard, David N david.n.lombard at intel.com
Wed Dec 10 16:25:16 EST 2003


From: Robert G. Brown; Sent: Wednesday, December 10, 2003 11:08 AM
> 
> On Wed, 10 Dec 2003, Robin Laing wrote:
> 
> > > I have to ask.  Is it a P4?  Strictly empirically I have experienced
> > > similar things even without filling memory.  I actually moved my
> > > fileserver off onto a Celeron (which it has run flawlessly) because it
> > > was so visible, so annoying.
> >
> > Dell P4 with 512M ram.  IDE drive.
> 
> One thing Mark suggested (offline, I think) is that TOO MUCH memory can
> confuse the caching system of at least some kernels.  Since I never
> fully debugged this problem, but instead worked around it (a Celeron,
> memory, motherboard, case costs maybe $350 and my time and annoyance are
> worth much more than this) I don't know if this is true or not, but it
> got to where it could actually crash the system when it was running as
> an NFS server with lots of sporadic traffic.  It behaved like it was
> swapping (and getting behind in swapping at that), but it wasn't.  It
> may well have been a memory management problem, but it seemed pretty
> specific to that system.

This is very much like the kernel I/O tuning problems I described
earlier, which were fixed by replacing the kernel (the offending kernel
was a 2.4.17 or 2.4.18) or, in some cases, by tuning I/O parameters.

I first saw this on IPF systems with a very high-end I/O subsystem, and
later on other fast 32-bit systems.  All involved significant I/O
traffic: the system would appear to hang for extended periods and then
continue on.  The impact ranged from annoying (the IPF) to
debilitating.  The underlying cause was in the kernel's use and
retirement of buffers.  IIRC, the kernel got to the point of holding on
to too much cache, and then decided it needed to dump it all before
continuing on.
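
For anyone who wants to watch for this pattern, here is a minimal sketch
(my illustration, not the diagnosis used at the time) that reports how
much of RAM the kernel is holding as buffers plus page cache, given text
in the "Key: value kB" style of /proc/meminfo.  The function names and
the SAMPLE numbers are made up for the example, and the exact fields
present vary by kernel version.

```python
def parse_meminfo(text):
    """Parse 'Key:   <number> kB' lines into a dict of integer values.

    Lines without a 'Key: <number>' shape are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def cache_fraction(info):
    """Fraction of total memory held as Buffers + Cached."""
    held = info.get("Buffers", 0) + info.get("Cached", 0)
    return held / info["MemTotal"]


# Hypothetical numbers for illustration only, not measurements.
SAMPLE = """MemTotal:       515000 kB
MemFree:         12000 kB
Buffers:         50000 kB
Cached:         400000 kB
"""

sample_fraction = cache_fraction(parse_meminfo(SAMPLE))
```

On a live system one would pass the contents of /proc/meminfo instead of
SAMPLE and sample it every second or so during heavy I/O.  A fraction
that climbs steadily and then drops sharply at the moment the machine
stalls would be consistent with the behavior described above, though it
would not by itself prove the cause.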

As I said before, the problem was reported several times on the LK list.
The first reports involved really poor I/O devices and were dismissed
on that basis, but later reports came from well-configured I/O systems;
any system with the right I/O load could trigger it.

-- 
David N. Lombard

My comments represent my opinions, not those of Intel.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


