memory nightmare

Stephen Gaudet sgaudet at wildopensource.com
Wed Jul 2 13:26:29 EDT 2003


Hello Jack,


<snip>

> So here is the problem:  I have these 4 batches, of 3 sticks each,
> which failed memtest86 when tested in batches of 3.  The failures did
> not occur on each pass of memtest's 16 tests.  Instead the sticks would
> pass all of the tests for several passes.  In one case the failure
> did not occur until after memtest86 had been running, without error,
> for 42 hours on that machine.  That particular failure was in a single
> word in test 6.  The worst of the 4 batches failed at 14 memory
> locations.  I have now been testing 9 of these 12 suspect sticks,
> 1 stick per motherboard, for several days.  Several have now passed
> more than 100 hours of memtest86 without error.
> 
> Can I trust them?
> 
> Should I keep them or return them?
> 
> If I return them, how long must I run memtest86 on the replacements
> before I can trust those?
> 
> Can I trust the 55 or so sticks that passed 48 hours of memtest86 in
> batches of 3?
> 
> The vendor has been making a good-faith effort to solve the problem,
> and has even agreed to refund my money for the whole purchase if I'm
> not happy with it.
> 
> What would you do in this situation?

First, I'd make sure the memory comes from a major supplier: Kingston,
Crucial, Virtium, Ventura, Transcend, etc.

Next, make sure all the RAM in a system uses the same memory chips
(Samsung, Infineon, etc.).  If the sticks in a system carry chips from
different manufacturers, they sometimes don't behave well together, so
try to make everything match.

Last, I'd check cooling.  Do these systems have proper cooling?



> Those are the most urgent questions for which I need answers, but I
> have a few others of a more general nature:
> 
> Is there a specific vendor or brand of memory that is much more
> reliable than others?  Since the above-described ordeal, I've heard
> that Kingston has a good reputation.  Anyone care to endorse or
> refute that?  Any other good brands/vendors you care to mention?

See above.  I personally never buy RAM unless it's on Intel's approved
list and comes with a lifetime warranty.  I realize this is an AMD
solution.  However, a vendor approved by Intel is, in most cases, a real
supplier with the technical depth that could have helped with this problem.

When I've had strange problems like this with various systems in the
past, Virtium, Ventura and others took a system into their lab to track
down and fix the problem.


> My understanding is that ECC can correct only single-bit errors, and
> so would not help with the kind of multibit errors that have been
> troubling me lately.  But I have some basic questions on ECC that
> you might be able to answer (I've asked the motherboard maker's tech
> support, but to no avail!):
> 
> In the bios for my GA7DPXDW-P motherboards, there are these 4
> alternatives for the SDRAM ECC Setting:
> 
>     Disabled
>     Check only
>     Correct Errors
>     Correct + scrub
> 
> I'm pretty sure I understand what 'Disabled' does.  Can anyone
> explain to me what the others do, and how they differ?  Also, if ECC
> correction is enabled, does this slow down the machine in any way?
> Is there any disadvantage to having ECC correction enabled?

What does the motherboard manufacturer recommend?
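For reference on the ECC question above: the single-bit limit Jack describes matches the SEC-DED (single-error-correct, double-error-detect) Hamming-style codes commonly used with ECC DIMMs.  The Python sketch below is a toy illustration only, assuming a tiny extended Hamming (8,4) code rather than the wider 72-bit code real DIMMs and chipsets use; the function names `encode` and `decode` are hypothetical.  It shows one flipped bit being corrected and two flipped bits being flagged but not fixed.

```python
# Toy extended Hamming (8,4) SEC-DED code (illustrative assumption):
# 4 data bits are stored as 8 bits. Real ECC memory uses a wider code,
# but the "correct one bit, detect two bits" behaviour is the same idea.

def encode(data):
    """Encode 4 data bits into an 8-bit codeword.

    Index 0 holds an overall parity bit; indices 1..7 form a classic
    Hamming(7,4) code with parity bits at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = data
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d1, d2, d3, d4
    c[1] = c[3] ^ c[5] ^ c[7]  # parity over positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]  # parity over positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]  # parity over positions 4, 5, 6, 7
    c[0] = sum(c[1:]) % 2      # overall parity over positions 1..7
    return c


def decode(word):
    """Return (status, data_bits) for a possibly corrupted codeword."""
    c = list(word)
    # Syndrome: XOR of the positions (1..7) holding a 1 bit. It is 0 for a
    # valid codeword and equals the position of a single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos
    overall = sum(c) % 2  # 1 if an odd number of bits flipped

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and fix it.
        # (Syndrome 0 means the overall parity bit itself flipped.)
        # Three or more flips can be silently miscorrected here.
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: two bits are bad.
        # The error is detected but cannot be located, so it is not fixed.
        status = "uncorrectable"
    return status, [c[3], c[5], c[6], c[7]]


if __name__ == "__main__":
    stored = encode([1, 0, 1, 1])

    one_bad = stored.copy()
    one_bad[5] ^= 1
    print(decode(one_bad))  # ('corrected', [1, 0, 1, 1])

    two_bad = stored.copy()
    two_bad[3] ^= 1
    two_bad[6] ^= 1
    print(decode(two_bad))  # ('uncorrectable', [0, 0, 0, 1]): flagged, not fixed
```

Note that with three or more flipped bits in one word, a SEC-DED code like this can miscorrect without any warning.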

Cheers, and Happy 4th of July,

Steve Gaudet

Wild Open Source (home office)
----------------------
Bedford, NH 03110
pH:603-488-1599
cell:603-498-1600
http://www.wildopensource.com


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org