Allocating Memory in a 2.4 kernel
On December 3, 2003, Roland Krause asked a fairly common question
on the Beowulf mailing list: why couldn't he allocate more than
2 GB of memory in a single contiguous chunk using the malloc
function? He was running a code on a dual Athlon system with
4 GB of memory, under Red Hat 9 with a 2.4.20 kernel. He had built
the kernel with CONFIG_HIGHMEM and CONFIG_HIGHMEM4G turned on so
that it could address all 4 GB of memory. He could allocate a
total close to the 3 GB limit, but only about 2 GB of it in a
contiguous chunk, and he wanted to get closer to 3 GB in a single
chunk. The unpatched Linux kernel, compiled with the typical 4 GB
options mentioned above, splits each process's 4 GB virtual
address space into two parts. The top 1 GB belongs to the kernel
and the lower 3 GB belongs to the user application. This is
typically called the 3/1 VM (Virtual Memory) split.
Mark Hahn quickly replied and gave us all a lesson in how memory
works on 32-bit Linux. He replied that the address space should
look something like the following:
0-128MB          zero page
128MB + small    program text
                 sbrk heap (grows up)
1GB              mmap arena (grows up)
3GB - small      stack base (grows down)
3GB-4GB          kernel direct-mapped area
Mark explained that the 2 GB contiguous chunk was allocated in
the mmap arena, which runs from 1 GB up toward the stack just
below 3 GB and is therefore limited to about 2 GB. The other
1 GB was allocated in the sbrk heap. He also explained that
by statically linking your code, you could bypass the mmap
arena entirely and see almost 3 GB for the heap or stack. He also
posted a very useful C program to explain and demonstrate all of
these concepts.
Mark also mentioned that it is possible to move these memory limits
around by modifying your kernel. He said that you could move the
default mmap base by adjusting the value of TASK_UNMAPPED_BASE
in the kernel source, then rebuilding and installing the kernel. He
also said you could raise the 3 GB barrier for user applications
by adjusting the value of the kernel constant TASK_SIZE. There
is a popular patch available that moves TASK_SIZE so that you
have 3.5 GB available for user space applications and 0.5 GB for the
kernel. Mark also mentioned a patch that makes TASK_UNMAPPED_BASE
a variable in the /proc filesystem that you can adjust on the fly
so you do not have to recompile the kernel.
Finally, Mark commented that there is a patch available that eliminates
the kernel's 1 GB area entirely. This is the 4G:4G patch from the
well-known kernel coder, Ingo Molnar. The patch allows a full separate
4 GB VM for the kernel and separate, full (and per-process)
4 GB VMs for user applications. However, the patch can come at a
price because it can impact kernel performance.
Roland Krause replied back to the list that he had modified his kernel,
changing the value of TASK_UNMAPPED_BASE (moving it
down) so that he could allocate enough memory for his application.
This short discussion provides some valuable insight into how memory
is allocated. Even though this isn't necessarily cluster specific,
it does help you tune the nodes in your cluster for your application.
It also points out how convenient it is to have the source code for
the kernel. It allows you to adjust certain kernel parameters to
match your needs. If anyone has any comments on kernel 2.6 memory
management, please add a comment below. A good summary of Linux
memory can be found here.
This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.
Jeff Layton has been a cluster enthusiast since 1997 and spends far
too much time reading mailing lists. He occasionally finds time to perform
experiments on clusters in his basement. He also has a Ph.D. in Aeronautical
and Astronautical Engineering and he's not afraid to use it.