D-Link switch and ecc-memory.

Serguei Patchkovskii patchkov at ucalgary.ca
Tue Jan 16 14:24:35 EST 2001

On Tue, 16 Jan 2001, Greg Lindahl wrote:
> > My best estimate is that our system corrects one single bit error (SBE)
> > per week in 37.5 GB of ECC memory.  This translates into SBE event
> > intervals of about 9 months per GB of RAM.  Your mileage may vary...
> Josip neglected to mention that he is at sea level. If you are at a higher
> altitude, you will see more errors.

Indeed. Here in Calgary (1 kilometer above sea level), I count an average
of 50 corrected memory errors _per_day_ for 220 Gbytes of memory over the
last three months - or roughly sixty times Josip's rate per Gbyte. This
average excludes three systems with failing memory, which we haven't got
around to replacing yet. (These three have an error rate of about 30 times
the median.)
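
As a rough check of the comparison, both figures can be normalized to
corrected errors per Gbyte per day. The sketch below is only arithmetic on
the numbers quoted in this thread; the function and variable names are made
up for illustration, and a month is assumed to be 30.4 days.

```python
# Normalize both reports to corrected errors per Gbyte per day.
DAYS_PER_WEEK = 7
DAYS_PER_MONTH = 30.4  # assumed average month length


def errors_per_gb_day(errors, days, gigabytes):
    """Corrected-error rate per Gbyte per day."""
    return errors / days / gigabytes


# Josip: one single bit error per week in 37.5 GB.
josip_rate = errors_per_gb_day(1, DAYS_PER_WEEK, 37.5)
# Calgary: 50 corrected errors per day in 220 Gbytes.
calgary_rate = errors_per_gb_day(50, 1, 220)

# How many times higher the Calgary rate is.
ratio = calgary_rate / josip_rate

# Mean time between corrected errors for a single Gbyte.
josip_interval_months = 1 / josip_rate / DAYS_PER_MONTH
calgary_interval_days = 1 / calgary_rate

print(f"Josip:   {josip_rate:.4f} errors/GB/day")
print(f"Calgary: {calgary_rate:.4f} errors/GB/day")
print(f"Ratio:   {ratio:.1f}")
print(f"Josip interval per GB:   {josip_interval_months:.1f} months")
print(f"Calgary interval per GB: {calgary_interval_days:.1f} days")
```

With these inputs the ratio comes out near sixty, and Josip's interval comes
out a little under nine months per GB, in line with his own estimate.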

How much of the difference is due to an increase in cosmic radiation, and
how much is due to the differences in parts quality and system design,
I am not qualified to assess.



Home page: http://www.cobalt.chem.ucalgary.ca/ps/
