[Beowulf] SWAP management

Mark Hahn hahn at physics.mcmaster.ca
Thu Dec 11 12:02:39 EST 2003

> with scalar codes. I would like to learn more on how swap memory pages
> are handled by a Linux OS. 

in Linux, there is user memory and kernel memory.  the latter is unswappable,
and only for internal kernel uses, though that includes some user-visible
caches like dcache.  there's nothing you can do about it, so I'll 
ignore it here.

user-level memory includes cached pages of files, user-level stack or sbrk
heap, mmaped shared libraries, MAP_ANON memory, etc.  some of this is what 
you think of as being part of your process's virtual address space.  other
pages are allocated behind your back - especially for caching of file-backed pages.
all IO normally goes through the page cache and thus competes for physical
pages with all the other page users.  this means that by doing a lot of IO,
you can cause enough page scavenging to force other pages (sufficiently idle)
out to swap or backing store.  (for instance, backing store of an mmaped file
is the file itself, on disk.)
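to see how much physical memory the page cache is holding compared with what has gone out to swap, something like the following could be used.  this is only a sketch: it assumes a /proc/meminfo with "Name: value kB" lines and the field names MemTotal, MemFree, Buffers, Cached, SwapTotal and SwapFree as found on current kernels, and the function names (parse_meminfo, summarize, read_meminfo) are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def summarize(info):
    """Reduce the parsed fields to the numbers relevant to this discussion."""
    return {
        "total_kb": info.get("MemTotal", 0),
        "free_kb": info.get("MemFree", 0),
        # file-backed pages kept in RAM by the kernel (assumed field names)
        "page_cache_kb": info.get("Cached", 0) + info.get("Buffers", 0),
        "swap_used_kb": info.get("SwapTotal", 0) - info.get("SwapFree", 0),
    }


def read_meminfo(path="/proc/meminfo"):
    """Read and summarize the live values; the path is the usual Linux one."""
    with open(path) as f:
        return summarize(parse_meminfo(f.read()))
```

calling read_meminfo() before and after a burst of file IO should show page_cache_kb growing; if swap_used_kb grows at the same time, the cache is displacing idle process pages.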

> My problem is that when I'm running a code, it starts swapping even if
> its memory requirements are lower than the total amount of memory
> available. For example if there is 750 MB of memory, the program swaps when
> using only 450 MB.

are you also doing a lot of file IO?

with IO, the problem is that pages doing IO are "hot looking" to the kernel,
since they are touched by the device driver as well as userspace.  the kernel
will tend to leave them in the pagecache at the expense of other kinds of
pages, which may not be touched as often or fast.  in a way, this is really
a problem with the kernel simply not keeping enough information about how
each virtual page is being used.

> How can I avoid such a thing happening?

there is NOTHING wrong with swapping, since it is merely the kernel trying 
to find the set of pages that make the best use of a limited amount of ram.
a moderate amount of swap OUT traffic is very much a good thing, since 
it means that old/idle processes won't clutter up your ram which could be 
more effectively used by something recent.

the problem (if any) is swap IN - especially when there's also swapouts
happening.  when this happens, it means that the kernel is choosing the wrong
pages to swap out, and is winding up having to read them back in immediately.
this is called "thrashing", and barring kernel bugs (such as those in early
2.4 kernels) the only solution is to add more ram.
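the distinction between harmless swapout and thrashing can be checked by sampling the kernel's swap counters twice and looking at the differences.  a rough sketch, assuming the pswpin and pswpout counters in /proc/vmstat (present on 2.6 and later kernels; 2.4 reports swap counts elsewhere); the function names and the labels returned are invented for this example.

```python
def read_swap_counters(text):
    """Extract the cumulative pswpin/pswpout counters from /proc/vmstat text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def classify(before, after):
    """Label the swap traffic seen between two samples of the counters."""
    swapped_in = after.get("pswpin", 0) - before.get("pswpin", 0)
    swapped_out = after.get("pswpout", 0) - before.get("pswpout", 0)
    if swapped_in > 0 and swapped_out > 0:
        # pages going out and coming back in the same interval
        return "possible thrashing"
    if swapped_out > 0:
        return "swap-out only"
    if swapped_in > 0:
        return "swap-in only"
    return "no swap traffic"


def sample_vmstat(path="/proc/vmstat"):
    """Take one sample of the live counters; the path is an assumption."""
    with open(path) as f:
        return read_swap_counters(f.read())
```

take one sample_vmstat(), wait a few seconds while the job runs, take another, and pass both to classify().  a single interval showing both directions is not proof of thrashing; it is the sustained pattern that matters.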

> One solution
> could be not to create any SWAP partition during the installation but I
> think this is a very dramatic solution. 

disk is very cheap; ram is still a lot more expensive.  a modest amount of 
swapout is really a tradeoff: move idle ram pages onto cheap disk so the 
expensive ram can be used for something more important.

> Is there any other method to force a code to use only RAM ?

of course: mlock.
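for illustration, here is a minimal sketch of calling mlock (and its whole-process relative, mlockall) through Python's ctypes.  a compiled code would call the same functions directly via <sys/mman.h>.  assumptions: the MCL_CURRENT and MCL_FUTURE values below are the ones used on x86 Linux and differ on some other architectures, and the helper names (lock_buffer, unlock_buffer, lock_whole_process) are made up here.  the calls fail unless the process has the privilege to lock memory or stays within its RLIMIT_MEMLOCK limit, so the errno is returned rather than assumed to be zero.

```python
import ctypes
import ctypes.util

# load the C library; use_errno lets us read errno after a failed call
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.mlock.restype = ctypes.c_int
libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.munlock.restype = ctypes.c_int
libc.mlockall.argtypes = [ctypes.c_int]
libc.mlockall.restype = ctypes.c_int

# flag values as on x86 Linux (assumption: not portable to every architecture)
MCL_CURRENT = 1
MCL_FUTURE = 2


def lock_buffer(size):
    """Allocate `size` bytes and try to pin them in RAM.

    Returns (buffer, 0) on success or (buffer, errno) on failure.
    """
    buf = ctypes.create_string_buffer(size)
    if libc.mlock(ctypes.addressof(buf), ctypes.sizeof(buf)) != 0:
        return buf, ctypes.get_errno()
    return buf, 0


def unlock_buffer(buf):
    """Undo lock_buffer; returns 0 on success or errno on failure."""
    if libc.munlock(ctypes.addressof(buf), ctypes.sizeof(buf)) != 0:
        return ctypes.get_errno()
    return 0


def lock_whole_process():
    """Try to pin all current and future pages; returns 0 or errno."""
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        return ctypes.get_errno()
    return 0
```

note that locked pages are simply removed from the kernel's choices: if the locked set plus everything else doesn't fit in ram, something else gets swapped or the allocation fails.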

> It seems that my Linux OS [RH 7.3 basically, in some cases SuSE 8.2]
> tries to keep the percentage of memory used by a single process
> from rising above 60-70%.

I don't believe there is any such heuristic.  it wouldn't have anything to do 
with the distribution, of course, only with the kernel.

regards, mark hahn.
