Gigabit Switches, Channel Bonding, Opterons, and Large Memory Allocations


Allocating Memory in a 2.4 kernel

On December 3, 2003, Roland Krause asked a fairly common question on the Beowulf mailing list: why couldn't he allocate more than 2 GB of memory in a single contiguous chunk using the malloc function? He was running a code on a dual Athlon system with 4 GB of memory, running Red Hat 9 with a 2.4.20 kernel. He had built the kernel with CONFIG_HIGHMEM and CONFIG_HIGHMEM4G turned on so the kernel could address all 4 GB of memory. He could allocate a total amount of memory close to the 3 GB limit, but he could only allocate about 2 GB of it in a contiguous chunk, and he wanted to get closer to 3 GB in a single chunk.

The unpatched Linux kernel compiled with the typical 4 GB options (as mentioned previously) splits the 4 GB virtual address space of each process into two parts. The top 1 GB belongs to the kernel and the lower 3 GB belongs to the user application. This is typically called the 3/1 VM (Virtual Memory) split.

Mark Hahn quickly replied and gave us all a lesson in how memory works on 32-bit Linux. He replied that the address space should look something like the following:


  0-128MB        zero page
  128MB + small  program text
                 sbrk heap (grows up)
  1GB            mmap arena (grows up)
  3GB - small    stack base (grows down)
  3GB-4GB        kernel direct-mapped area

Mark explained that the 2 GB contiguous chunk was allocated in the mmap arena, which sits between the mmap base at 1 GB and the stack just below 3 GB. That 2 GB gap is where the 2 GB limit comes from. The other 1 GB was allocated in the sbrk heap. He also explained that by statically linking your code, you could bypass the mmap arena entirely and see almost 3 GB for the heap or stack. He also posted a very useful C code to explain and demonstrate all of these concepts.

Mark also mentioned that it is possible to move these memory limits around by modifying your kernel. He said that you could move the default mmap base by adjusting the value of TASK_UNMAPPED_BASE in the kernel source and then rebuilding and installing the kernel. He also said you could raise the 3 GB barrier for user applications by adjusting the value of TASK_SIZE in the kernel source. There is a popular patch available that moves TASK_SIZE so that you have 3.5 GB available for user space applications and 0.5 GB for the kernel. Mark also mentioned a patch that makes TASK_UNMAPPED_BASE a variable in the /proc filesystem that you can adjust on the fly, so you do not have to recompile the kernel.

Finally, Mark commented that there is a patch available that eliminates the kernel's 1 GB area entirely. This is the 4G:4G patch from the well-known kernel coder Ingo Molnar. The patch gives the kernel a full, separate 4 GB VM and gives each user process its own full 4 GB VM. However, the patch comes at a price because it can hurt kernel performance.

Roland Krause replied to the list that he had modified his kernel, lowering the value of TASK_UNMAPPED_BASE, so that he could allocate enough memory for his application.

This short discussion provides some valuable insight into how memory is allocated. Even though this isn't necessarily cluster-specific, it does help you tune the nodes in your cluster for your application. It also points out how convenient it is to have the source code for the kernel, because it allows you to adjust certain kernel parameters to match your needs. If anyone has any comments on kernel 2.6 memory management, please add a comment below. A good summary of Linux memory management is listed in the sidebar links.


Sidebar One: Links Mentioned in Column

Beowulf Mailing List

Scali MPI Connect

Channel Bonding

LAN switches

Linux Memory Management

STREAM benchmark

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux, you may wish to visit Linux Magazine.

Jeff Layton has been a cluster enthusiast since 1997 and spends far too much time reading mailing lists. He occasionally finds time to perform experiments on clusters in his basement. He also has a Ph.D. in Aeronautical and Astronautical Engineering and he's not afraid to use it.



    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.