
Figure One: Memtester Output (one pass)

Hard Data on Soft Errors

The anecdotal evidence offered above is based on personal accounts rather than measurements or research. With this in mind, I periodically run memory-testing programs for extended periods of time to see just how sturdy non-ECC memory can be.

In my latest test, I ran memtester for 57 days (24x7) on 24 GBytes of DDR4-2667 non-ECC memory (Micron). Memtester runs 18 separate tests over the memory (see Figure One). The system, built around a 6-core Intel i7-8700 @ 3.20GHz, completed a total of 1296 passes through memory without a single error.
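To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the full 24 GBytes were under test for the whole 57 days and that soft errors arrive as independent (Poisson) events; both are my simplifying assumptions, not measurements. The function name `zero_error_upper_bound` is made up for this illustration.

```python
import math


def zero_error_upper_bound(exposure, confidence=0.95):
    """Upper bound on an event rate when zero events were seen.

    For a Poisson process, P(0 events) = exp(-rate * exposure).
    Solving exp(-rate * exposure) = 1 - confidence gives the bound.
    """
    return -math.log(1.0 - confidence) / exposure


# Figures taken from the test described above.
days = 57
gbytes = 24
passes = 1296

hours = days * 24                        # 1368 hours of testing
minutes_per_pass = hours * 60 / passes   # about 63 minutes per pass
gb_hours = gbytes * hours                # 32832 GByte-hours of exposure

bound = zero_error_upper_bound(gb_hours)

print(f"hours tested:      {hours}")
print(f"minutes per pass:  {minutes_per_pass:.1f}")
print(f"GByte-hours:       {gb_hours}")
print(f"95% upper bound:   {bound:.2e} errors per GByte-hour")
```

Under those assumptions, zero errors in 32832 GByte-hours is consistent with a true error rate anywhere from zero up to roughly 9e-5 errors per GByte-hour. In other words, one clean test on one machine shows the rate is low, but it does not show the rate is zero.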

For the record, the ambient temperature was a constant 72 F (22 C) and, according to Google Maps, the elevation is 328 ft. (100 m.) above sea level. One more piece of personal anecdotal evidence: I have never had any kind of issue using a cell phone, listening to music in airplane mode, or using a laptop while flying at high altitude in commercial aircraft.

The non-ECC test was stopped because the system was needed for other benchmarking. A similar test was performed on a system with ECC memory using a 6-core Xeon E-2176 @ 3.7GHz with DDR4-2667 ECC memory. This test was stopped after 41 days. There were no recorded events in the mcelog during the test (i.e., the ECC memory did not correct any errors).
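For readers who want to check their own ECC systems, the sketch below shows one possible way to read corrected and uncorrected error counts. It is not the method used for the test above, which relied on mcelog. Instead, it assumes a Linux system with an EDAC driver loaded that exposes `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc/`. The function name `read_edac_counts` is hypothetical.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": n, "ue_count": n}} from EDAC sysfs.

    ce_count = corrected errors, ue_count = uncorrected errors.
    Returns an empty dict when no memory controllers are exposed
    (for example, no EDAC driver or a non-ECC system).
    """
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    results = read_edac_counts()
    if not results:
        print("No EDAC memory controllers found.")
    for controller, entry in results.items():
        print(controller, entry)
```

A `ce_count` that stays at zero over a long run tells the same story as an empty mcelog: the ECC logic had nothing to correct.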

An additional data point for memory hard errors can be found on the Puget Systems website. This annual survey of hardware failures is quite valuable. The page reports very low "in-field" hard failure rates for both DDR4 non-ECC (1 in every 1000 DIMMs failed, or 0.1%) and DDR4 ECC (1 in every 5000 DIMMs failed, or 0.02%). These are hard errors that required replacement of the DIMM.

Somewhat Contentious Guidance

Based on both anecdotal evidence and extended tests, the use of non-ECC memory should not be discouraged when an ECC solution is not available. When designing and building Edge systems, many options may need to be considered due to space, power, and noise requirements. Non-ECC consumer-level processors like the new Raspberry Pi 4, the Intel Core series, and the AMD Ryzen series may offer the best Edge option. (Note: Some Ryzen processors and motherboards allow an unsupported ECC option.) It should also be pointed out that there are several best practices that can provide a high level of confidence when working with non-ECC memory.

Best Practices for all Types of Memory

Living with non-ECC memory can be made safer with some basic good memory practices, which, by the way, apply to ECC memory systems as well. It is worth noting that system faults and failed applications can come from a variety of sources, including random software bugs.
  • Write periodic checkpoints when running long HPC applications, which is good practice in any case.
  • Include a sanity check on intermediate results.
  • Run important applications at least twice to confirm the results.
  • Run a memory tester every so often during down times (memtester is freely available).
  • Buy quality hardware (motherboards and memory) or use a reputable vendor.
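The first three practices in the list above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the state fits in a JSON-serializable dictionary, the computation is deterministic, and the helper names (`save_checkpoint`, `load_checkpoint`, `sanity_check`, `run_twice`) are invented for the example. Real HPC codes will likely use their own checkpoint formats and domain-specific checks.

```python
import hashlib
import json
import math
import os
import tempfile


def save_checkpoint(path, state):
    """Write state (a JSON-serializable dict) together with a SHA-256 digest."""
    payload = json.dumps(state, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    record = {"sha256": digest, "payload": payload}
    # Write to a temporary file and rename it into place, so a crash
    # never leaves a half-written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Load a checkpoint and refuse it if the stored digest does not match."""
    with open(path) as f:
        record = json.load(f)
    digest = hashlib.sha256(record["payload"].encode()).hexdigest()
    if digest != record["sha256"]:
        raise ValueError("checkpoint digest mismatch")
    return json.loads(record["payload"])


def sanity_check(values):
    """Reject intermediate results that contain NaN or infinity."""
    return all(math.isfinite(v) for v in values)


def run_twice(compute, *args):
    """Run a deterministic computation twice; accept it only if both runs agree."""
    first = compute(*args)
    second = compute(*args)
    if first != second:
        raise RuntimeError("runs disagree; result is suspect")
    return first
```

Note that the digest only catches corruption that happens after it is computed (on disk or in a later read); a bit flipped in memory before the checkpoint is written will be faithfully saved. That is why the sanity check and the second run are worth having alongside it.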

A Conclusion for Computing Purists

Always use ECC memory. Good advice.

A Conclusion for Computing Pragmatists

Non-ECC memory is a viable option for designing many types of systems. In particular, Edge-based systems, with a somewhat more restrictive environmental envelope, may require the consideration of high-quality, non-enterprise-grade hardware. As stated in the beginning, ECC is always preferred, but there are times when it may limit choices and performance options. As with many things in the computer-verse, your mileage and choices will vary depending upon your needs.



This work is licensed under CC BY-NC-SA 4.0

©2005-2023 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.