Difference between revisions of "Memory"

From Cluster Documentation Project
(swiotlb)
 
Line 1: Line 1:
'''Error Detection And Correction'''
+
==Error Detection And Correction==
  
 
A cluster of systems with large amounts of RAM provides system integrators and administrators with an opportunity to become familiar with [http://en.wikipedia.org/wiki/Soft_error Soft Errors].
 
A cluster of systems with large amounts of RAM provides system integrators and administrators with an opportunity to become familiar with [http://en.wikipedia.org/wiki/Soft_error Soft Errors].
  
According to arch/*/kernel/mce.c and arch/*/kernel/traps.c, Linux kernels older than 2.6.16 will either see an uncorrectable bit error as a Machine Check Exception (MCE), print out a message with the DIMM bank, and panic; or as an NMI and continue on with a "Dazed" message.  An NMI would be seen if MCE panic was disabled with the mce=off boot parameter.
+
According to arch/*/kernel/mce.c and arch/*/kernel/traps.c, Linux kernels older than 2.6.16 will either see an uncorrectable bit error as a Machine Check Exception (MCE), print out a message with the DIMM bank, and panic; or as a Non-Maskable Interrupt (NMI) and continue on with a "Dazed" message.  An NMI would be seen if MCE panic was disabled with the mce=off kernel boot parameter.
  
 
There are new capabilities beginning with the 2.6.16 kernel.  The code from the [http://bluesmoke.sourceforge.net EDAC] project was merged into the kernel as optional modules.  The modules provide counters for correctable and uncorrectable errors, the ability to reset counters through sysfs, a reset counter - seconds since last reset, etc.
 
There are new capabilities beginning with the 2.6.16 kernel.  The code from the [http://bluesmoke.sourceforge.net EDAC] project was merged into the kernel as optional modules.  The modules provide counters for correctable and uncorrectable errors, the ability to reset counters through sysfs, a reset counter - seconds since last reset, etc.
  
  * [http://lwn.net/Articles/168972/ short LWN EDAC writeup]
+
* [http://lwn.net/Articles/168972/ short LWN EDAC writeup]
  * [http://lwn.net/Articles/168975/ edac.txt from the 2.6.16 kernel docs]
+
* [http://lwn.net/Articles/168975/ edac.txt from the 2.6.16 kernel docs]
  
 
The Linux EDAC modules support the following memory controllers:
 
The Linux EDAC modules support the following memory controllers:
  
* AMD 76x
+
* AMD 76x
* Intel e752x
+
* Intel e752x
* Intel e7xxx
+
* Intel e7xxx
* Intel 82860
+
* Intel 82860
* Intel D82875P
+
* Intel D82875P
* Radisys 82600
+
* Radisys 82600
  
 
   
 
   
'''I/O on 64-bit Systems With Large Amount of Memory'''
+
==I/O on 64-bit Systems With Large Amounts of Memory==
  
Part of the transition from 32-bit x86 to x86_64 with large amounts of memory involves handling I/O devices that only support 32-bit memory addresses.  The AMD chipsets include a hardware IOMMU that makes everything work transparently.  Intel EM64T (and IA64) chipsets do not include an IOMMU, so the Linux kernel implements a "software I/O translation buffer".  The memory allocated to the swiotlb is made unavailable to normal processes, and some devices (such a NVIDIA graphic cards) may require more memory to be reserved in order to operate reliably.  See the Linux kernel's documentation for information about the swiotlb boot parameter.  This paragraph is a short summary of part of an [http://lwn.net/Articles/91870/ LWN DMA article] by Jonathan Corbet.
+
Part of the transition from 32-bit x86 to x86_64 with large amounts of memory involves handling I/O devices that only support 32-bit memory addresses.  AMD products include a hardware IOMMU that makes everything work transparently, for the most part.  Intel EM64T and IA64 products do not include an IOMMU, so the Linux kernel implements a "software I/O translation buffer".  The memory allocated to the swiotlb is made unavailable to normal processes, and some device drivers (such as the proprietary [http://download.nvidia.com/XFree86/Linux-x86_64/1.0-8178/README/appendix-l.html NVIDIA graphics driver]) may require more memory to be reserved in order to operate reliably.  See the Linux kernel's documentation for information about the swiotlb and iommu boot parameters.  Much of the information summarized in this paragraph was learned from an [http://lwn.net/Articles/91870/ LWN DMA article] by Jonathan Corbet.

Latest revision as of 06:48, 21 February 2006

Error Detection And Correction

A cluster of systems with large amounts of RAM provides system integrators and administrators with an opportunity to become familiar with Soft Errors.

According to arch/*/kernel/mce.c and arch/*/kernel/traps.c, Linux kernels older than 2.6.16 will either see an uncorrectable bit error as a Machine Check Exception (MCE), print out a message with the DIMM bank, and panic; or as a Non-Maskable Interrupt (NMI) and continue on with a "Dazed" message. An NMI would be seen if MCE panic was disabled with the mce=off kernel boot parameter.

New capabilities are available beginning with the 2.6.16 kernel. The code from the EDAC project was merged into the kernel as optional modules. The modules provide counters for correctable and uncorrectable errors, the ability to reset the counters through sysfs, a counter of seconds since the last reset, and more.
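As a rough illustration of how the sysfs counters might be collected across cluster nodes, the sketch below reads the per-controller values. It assumes the layout described in the kernel's edac.txt, with one directory per memory controller under /sys/devices/system/edac/mc and attribute files named ce_count, ue_count and seconds_since_reset. The function name read_edac_counters is hypothetical, and the paths should be checked against the running kernel.

```python
from pathlib import Path

# Assumed sysfs location, taken from the kernel's edac.txt; verify on your kernel.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")

# Attribute files assumed to exist in each mcN directory.
ATTRIBUTES = ("ce_count", "ue_count", "seconds_since_reset")


def read_edac_counters(root=EDAC_MC_ROOT):
    """Return {controller: {attribute: int}} for every mcN directory under root."""
    root = Path(root)
    counters = {}
    if not root.is_dir():
        # EDAC modules not loaded, or no supported memory controller.
        return counters
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        values = {}
        for attr in ATTRIBUTES:
            try:
                values[attr] = int((mc_dir / attr).read_text().strip())
            except (OSError, ValueError):
                # Skip attributes that are missing or not a plain integer.
                continue
        counters[mc_dir.name] = values
    return counters


if __name__ == "__main__":
    for name, values in read_edac_counters().items():
        print(name, values)
```

Run periodically on each node, output like this could be compared over time to spot DIMMs whose correctable error count keeps climbing.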

The Linux EDAC modules support the following memory controllers:

  • AMD 76x
  • Intel e752x
  • Intel e7xxx
  • Intel 82860
  • Intel D82875P
  • Radisys 82600


I/O on 64-bit Systems With Large Amounts of Memory

Part of the transition from 32-bit x86 to x86_64 with large amounts of memory involves handling I/O devices that only support 32-bit memory addresses. AMD products include a hardware IOMMU that makes everything work transparently, for the most part. Intel EM64T and IA64 products do not include an IOMMU, so the Linux kernel implements a "software I/O translation buffer". The memory allocated to the swiotlb is made unavailable to normal processes, and some device drivers (such as the proprietary NVIDIA graphics driver) may require more memory to be reserved in order to operate reliably. See the Linux kernel's documentation for information about the swiotlb and iommu boot parameters. Much of the information summarized in this paragraph was learned from an LWN DMA article by Jonathan Corbet.
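When auditing a cluster, it can help to confirm which of these boot parameters a node was actually started with. The sketch below is one way to do that, by parsing the kernel command line that Linux exposes in /proc/cmdline. The helper name boot_params is hypothetical, and the sample values in the comment are illustrative only, not recommended settings.

```python
def boot_params(cmdline, names=("swiotlb", "iommu")):
    """Return the selected kernel boot parameters found in a command line string.

    A parameter given as key=value maps to its value; a bare flag maps to None.
    """
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key in names:
            found[key] = value if sep else None
    return found


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    with open("/proc/cmdline") as f:
        print(boot_params(f.read()))
    # Illustrative input: "ro root=/dev/sda1 swiotlb=65536 iommu=soft"
```

An empty result means neither parameter was passed, so the kernel is using its defaults.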