AMD Opteron memory bandwidth (was Re: CPUs for a Beowulf)

Greg Lindahl lindahl at keyresearch.com
Wed Sep 10 03:01:34 EDT 2003


On Tue, Sep 09, 2003 at 10:20:04PM -0400, Donald Becker wrote:

> > That's kind-of OK for small systems,
> > but doesn't scale.
> 
> Errrm, I have exactly the opposite viewpoint: ECC will fail to catch and
> correct most multibit errors, and most HT errors will be multibit.
> It's better to fail on corruption than to silently further corrupt.

Well, I said "kind of OK" because you won't notice the failures on
small systems until you have a bunch of them. That's not really
acceptable, but then again lots of people seem willing to bet their
data on systems where they haven't really thought about errors at all.

I was speaking loosely when I said ECC. The CRC being used will (in
theory) catch most multi-bit errors, but it's always scary to not know
the pattern of the errors when you chose the CRC. The system does
crash quickly enough that it's unlikely that your bad data makes it
onto a network or disk.
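
To make the "in theory" part concrete, here is a minimal sketch of what a CRC does and does not guarantee. It is only an illustration: zlib's CRC-32 is a stand-in (the actual link CRC and its framing are not specified in this thread), and the helper names flip_burst and undetected_bursts are made up for the example. The one property it leans on is the standard one: a 32-bit CRC detects every burst error that spans 32 bits or fewer.

```python
import random
import zlib


def flip_burst(data: bytes, start_bit: int, burst_len: int,
               rng: random.Random) -> bytes:
    """Return a copy of data with a burst error of burst_len bits.

    The first and last bits of the burst are always flipped; bits in
    between are flipped at random. Bits are numbered LSB-first within
    each byte, which matches the bit order zlib's reflected CRC-32 uses.
    """
    buf = bytearray(data)
    for i in range(burst_len):
        on_edge = i == 0 or i == burst_len - 1
        if on_edge or rng.random() < 0.5:
            bit = start_bit + i
            buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


def undetected_bursts(payload_len: int = 64, burst_len: int = 32,
                      trials: int = 10000, seed: int = 0) -> int:
    """Count corrupted payloads whose CRC-32 still matches the original."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        payload = bytes(rng.getrandbits(8) for _ in range(payload_len))
        good_crc = zlib.crc32(payload)
        start = rng.randrange(payload_len * 8 - burst_len + 1)
        corrupted = flip_burst(payload, start, burst_len, rng)
        if zlib.crc32(corrupted) == good_crc:
            missed += 1
    return missed


if __name__ == "__main__":
    # Bursts no longer than the CRC width should never slip through.
    print("undetected 32-bit bursts:", undetected_bursts())
```

Once the damage is wider than the CRC, or scattered across the packet, the guarantee is gone and you are back to hoping the error patterns behave like the ones the polynomial was chosen for, which is exactly the scary part.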

I get a kick out of asking link layer people, "So, what bad packets
have you observed in the wild?" They give me the blankest looks...

> Remember the goal of a single commodity motherboard supporting either an
> AMD x86 chip or an Alpha?  Look where we are today...

It was a great idea until the mechanical problems nuked it: Slot A
vs. Socket A. I was not impressed by the reliability of Slot A.
Sometimes it's the little things that cause the biggest problems.

-- greg

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


