memory nightmare

Jack Wathey wathey at salk.edu
Wed Jul 2 14:31:10 EDT 2003



On Wed, 2 Jul 2003, Donald Becker wrote:

>
> My immediate reaction is that you have a motherboard that has memory
> configuration restrictions.  A typical restriction is that you can only use
> two DIMMs when they are "double sided" (with two memory chips per signal
> line instead of one) or have larger-capacity memory chips.

I'll look into that. I doubt this is the problem, though, because last
December I got a batch of 30 1-gig sticks from the same vendor that pass
memtest86 just fine in batches of 3 per board, on the very same
motherboards.  The December batch used Nanya chips and was high-profile.
The latest batch uses Samsung chips and is low-profile.  I don't know if
these are "double-sided" or not.  The only restriction I know of, from the
motherboard manual, is that the memory must be "registered ECC ddr", which
these are.  Also, most of the failing sticks I've seen fail when tested
one stick per board.

>
> My second reaction is that you are running the chips too fast for ECC,
> either because the serial EEPROM has been reprogrammed to claim that the
> chips are faster or the BIOS settings have been tweaked.  Remember that
> an ECC memory system is slower than the same chips without ECC!

ECC was turned off during the memtest86 runs.  I'm using the default BIOS
settings for memory timing parameters.

>
> > In the BIOS for my GA7DPXDW-P motherboards, there are these 4
> > alternatives for the SDRAM ECC Setting:
> >
> >     Disabled
> >     Check only
>
>    As the memory read is happening, start checking the data.  If the check
>    fails, interrupt later.
>
> >     Correct Errors
>
>    When the memory read is started, check the data.  Hold the result
>    until the check passes or the data is corrected.
>
> >     Correct + scrub
>
>    Correct read data as above, holding the transaction and writing
>    corrected data back to the DIMM if an error is found.
>
> > I'm pretty sure I understand what 'Disabled' does.  Can anyone
> > explain to me what the others do, and how they differ?  Also, if ECC
> > correction is enabled, does this slow down the machine in any way?
>
> Yes.  The typical cost is one clock cycle of read latency.
> It might seem obviously easy to overlap the ECC check when it usually
> passes, but you can't really hide all of the cost.  The memory-read path is
> always latency-critical.

Thanks, Don!  That helps a lot.
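
The difference between the three ECC modes described above can be
illustrated with a toy model.  This is only a sketch under loud
assumptions: it uses a tiny 8-bit SECDED code (Hamming(7,4) plus an
overall parity bit) rather than the 72-bit words of a real ECC DIMM, and
every name in it (encode, check, read, the mode strings) is made up for
illustration.  It says nothing about how this particular chipset or BIOS
actually implements the settings.

```python
# Toy SECDED code: corrects single-bit errors, detects double-bit errors.
# Codeword layout (hypothetical): index 0 = overall parity,
# indices 1, 2, 4 = Hamming check bits, indices 3, 5, 6, 7 = data bits.

def encode(nibble):
    """Encode 4 data bits into an 8-bit codeword (list of 0/1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    w = [0] * 8
    w[3], w[5], w[6], w[7] = d
    w[1] = w[3] ^ w[5] ^ w[7]
    w[2] = w[3] ^ w[6] ^ w[7]
    w[4] = w[5] ^ w[6] ^ w[7]
    w[0] = sum(w[1:]) & 1          # make the total number of 1 bits even
    return w

def check(w):
    """Return (syndrome, overall_parity); (0, 0) means no error seen."""
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos
    return syndrome, sum(w) & 1

def data_of(w):
    """Extract the 4 data bits from a codeword."""
    return w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)

def read(memory, addr, mode):
    """Read one word. mode: 'disabled', 'check', 'correct' or 'scrub'.

    Returns (data, event), where event describes what the ECC logic did.
    """
    w = list(memory[addr])
    if mode == "disabled":
        return data_of(w), "none"
    syndrome, parity = check(w)
    if syndrome == 0 and parity == 0:
        return data_of(w), "none"
    if parity == 0:
        # Non-zero syndrome with even parity: two bits flipped.
        return data_of(w), "uncorrectable"
    if mode == "check":
        # Data goes out as stored; the error is only reported afterwards.
        return data_of(w), "error reported"
    w[syndrome] ^= 1               # syndrome is the position of the bad bit
    if mode == "scrub":
        memory[addr] = w           # write the repaired word back
        return data_of(w), "corrected and scrubbed"
    return data_of(w), "corrected"
```

For example, store encode(0b1011) at address 0 and flip one stored bit.
In this model a 'check' read hands back the damaged data and reports the
error, a 'correct' read hands back the right data but leaves the bad bit
in memory, and a 'scrub' read also repairs the stored word, so later
reads see no error at all.  The extra work on the 'correct' and 'scrub'
paths before the data is returned is a software analogue of the added
read latency Don mentions.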

Best wishes,
Jack

>
> --
> Donald Becker				becker at scyld.com
> Scyld Computing Corporation		http://www.scyld.com
> 914 Bay Ridge Road, Suite 220		Scyld Beowulf cluster system
> Annapolis MD 21403			410-990-9993
>

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org