AMD Opteron memory bandwidth (was Re: CPUs for a Beowulf)

Donald Becker becker at
Tue Sep 9 22:20:04 EDT 2003

On Tue, 9 Sep 2003, Greg Lindahl wrote:

> While we're reinventing the T3E, it's worth noting:
> 1) HT's current generation doesn't have ECC, it just has a CRC-check
> and crashes when the check fails.

I think that it's more precisely a longitudinal check.  "CRC" usually
means a polynomial calculation over all of the bits, where any bit flip
might change any bit in the final check word.  A longitudinal-only check
means that a bit flip only impacts the check word in that bit position.

Longitudinal checks are much easier to implement in very high speed
systems because you don't have to handle data skew combined with
different length logic paths.  But they catch fewer errors precisely
because they are easier to implement -- they don't combine as many
source bits into the result.
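
To make the distinction concrete, here is a minimal Python sketch. It is a generic illustration, not the actual HT link check: the 32-bit word size, the sample packet, and the function names are all assumptions, and zlib's CRC-32 stands in for "a polynomial calculation over all of the bits".

```python
import zlib

def longitudinal_check(words):
    """XOR all words together: bit i of the result depends only on bit i of each word."""
    check = 0
    for w in words:
        check ^= w
    return check

def crc_check(words):
    """CRC-32 over the whole packet: every input bit can influence every check bit."""
    data = b"".join(w.to_bytes(4, "big") for w in words)
    return zlib.crc32(data)

# Hypothetical 4-word packet, chosen arbitrarily for the demonstration.
packet = [0x12345678, 0x9ABCDEF0, 0x0F0F0F0F, 0xDEADBEEF]

# Case 1: a single flip in bit 5 of word 1.
one_flip = list(packet)
one_flip[1] ^= 1 << 5
lrc_diff = longitudinal_check(packet) ^ longitudinal_check(one_flip)
crc_diff = crc_check(packet) ^ crc_check(one_flip)

# Case 2: two flips in the same bit position of different words.
two_flips = list(packet)
two_flips[0] ^= 1 << 5
two_flips[2] ^= 1 << 5
lrc_missed = longitudinal_check(packet) == longitudinal_check(two_flips)
crc_caught = crc_check(packet) != crc_check(two_flips)

print(f"single flip: longitudinal diff = {lrc_diff:#010x}, CRC diff = {crc_diff:#010x}")
print(f"paired flips: longitudinal check missed it = {lrc_missed}, CRC caught it = {crc_caught}")
```

In the single-flip case the longitudinal check word changes only in bit 5, while the CRC changes in several bit positions. In the paired-flip case the two flips cancel in the longitudinal check, so the error goes unnoticed, while the CRC still detects it.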

> That's kind-of OK for small systems,
> but doesn't scale.

Errrm, I have exactly the opposite viewpoint: ECC will fail to catch and
correct most multibit errors, and most HT errors will be multibit.
It's better to fail on corruption than to silently further corrupt.
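
A toy example of that failure mode, assuming a plain Hamming(7,4) single-error-correcting code. This is only a sketch: real ECC is typically SECDED, which does detect double-bit errors but can still miscorrect or miss errors of three or more bits. The function names and the sample data are hypothetical.

```python
def hamming74_encode(d):
    """Encode 4 data bits as [p1, p2, d1, p3, d2, d3, d4] (bit positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Return (data bits, syndrome) after 'correcting' the position the syndrome names."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1   # assumes exactly one bit is wrong
    return [c[2], c[4], c[5], c[6]], syndrome

def flip(codeword, positions):
    """Flip the given 1-based bit positions in a copy of the codeword."""
    out = list(codeword)
    for p in positions:
        out[p - 1] ^= 1
    return out

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)

# One flipped bit: the decoder repairs it.
single_decoded, _ = hamming74_decode(flip(codeword, [5]))

# Two flipped bits: the syndrome points at a third, innocent bit.
double_decoded, double_syndrome = hamming74_decode(flip(codeword, [3, 5]))
wrong_bits = sum(a != b for a, b in zip(data, double_decoded))

print("single error corrected:", single_decoded == data)
print("double error: syndrome", double_syndrome, "data bits wrong after 'correction':", wrong_bits)
```

With two bits flipped on the wire, the decoder sees a nonzero syndrome, flips a third bit, and hands back data with three wrong bits and no indication of trouble. A detect-and-fail check would have stopped instead.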

> 2) AMD's cache coherence protocol is snoopy, which doesn't scale.

That's a point often missed: in order to implement an HT switch for an SMP
system, you need to implement something like a cache directory.

> I'd never heard the rumor about HT being the EV7+ bus (and I'd really
> doubt it due to (1)), but I do know that AMD only bought the API
> Networks technical team, not their technology. Digital/Compaq did have
> a long-term technical collaboration with Samsung and AMD regarding the
> EV6 bus.

Remember the goal of a single commodity motherboard supporting either an
AMD x86 chip or an Alpha?  Look where we are today...

