Frequency of upsets was Re: [Beowulf] ECC support on motherboards?

Jim Lux James.P.Lux at jpl.nasa.gov
Wed May 14 12:37:38 EDT 2008


At 03:38 PM 5/13/2008, Greg Lindahl wrote:
>On Tue, May 13, 2008 at 03:27:11PM -0700, Jim Lux wrote:
>
> > Some data from Fermilab with 160 Gbit of DRAM
> > showed 2.5 upset/day.  Extrapolating (always
> > dangerous with these kinds of radiation effects
> > data, but I'll plunge in regardless).. that means
> > a workstation with 4-8 Gbyte of DRAM might see an upset per day.
>
>You can't extrapolate to devices of a different density or made
>with a different process, right?


You can and you can't.

In general, you are combining the overall flux through the device
against the cross-section of the device.  So, if you make the device
with half-sized geometry, you get 4 times as many bits in the same
sized die. The odds of that particle hitting a specific bit have been
cut to 1/4, but there are 4 times as many bits.  So, the "upsets/device/unit
time" will probably stay about the same.
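A back-of-envelope sketch of that scaling argument, in Python. It assumes, to first order, that expected upsets per day = particle flux x per-bit cross-section x number of bits, and that each bit's cross-section shrinks with its area. The flux, cross-section and bit-count values are hypothetical placeholders, not measured data, and the model deliberately ignores the multi-bit and energy-dependence effects mentioned below.

```python
def upsets_per_day(flux, sigma_per_bit, n_bits):
    """First-order model: expected upsets/day = flux x per-bit cross-section x bits.

    flux: particles per cm^2 per day (hypothetical value)
    sigma_per_bit: sensitive cross-section of one bit, in cm^2
    n_bits: number of bits on the die
    """
    return flux * sigma_per_bit * n_bits


# Hypothetical, illustrative numbers only (not measured data).
FLUX = 1.0e3          # particles / cm^2 / day
SIGMA_BIT = 1.0e-12   # cm^2 per bit at the original feature size
N_BITS = 1.0e9        # bits on the original die

baseline = upsets_per_day(FLUX, SIGMA_BIT, N_BITS)

# Halve the linear feature size on the same die: each bit's area (and, in
# this simple model, its cross-section) drops to 1/4, and 4x as many bits fit.
# Multi-bit upsets and energy-dependent sensitivity are NOT modeled here.
shrunk = upsets_per_day(FLUX, SIGMA_BIT / 4, N_BITS * 4)

print(f"baseline: {baseline:.3g} upsets/day")
print(f"shrunk:   {shrunk:.3g} upsets/day")
```

Both lines print the same rate, which is the "upsets/device/unit time stays about the same" point; real parts will deviate from this simple model.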

But there are other factors too... smaller geometries mean more devices
might get affected in one event.

Different geometries have different sensitivities to particles of a
particular energy.  (Consider the neutrino... lots o' energy, small
cross-section for interactions.)  Big slow heavy ions are very
different than zippy little protons.

However, if you're looking at rough orders of magnitude, and the year
of technology is similar, extrapolating is safe(r); i.e. everything 
built from 2002 technology parts tends to have similar technologies 
and feature sizes.  Be aware that in the space biz, we build stuff 
from old parts all the time.  For instance, the Phoenix spacecraft 
that will land on Mars next week was actually a spare from a 2001
mission, which in turn was built from spares from the 1998 missions.  So
if you see a paper in, say, 2010, talking about the upset behavior of
the Phoenix flight computer, it's talking about parts that were
probably bought in 1995, and based on technology that was matured in
1991 or 1992.
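For same-era parts, the extrapolation in the quoted text is just linear scaling of the Fermilab rate (2.5 upsets/day over 160 Gbit of DRAM) by memory size. A minimal Python sketch, assuming 8 bits per byte and a similar environment, and ignoring the density and process differences discussed above; the function and constant names are hypothetical:

```python
# Observed figures, as quoted earlier in this thread.
FERMILAB_UPSETS_PER_DAY = 2.5
FERMILAB_GBIT = 160


def extrapolate_upsets_per_day(gbyte):
    """Scale the observed upset rate linearly with DRAM size.

    Assumes same technology era and same environment; 1 Gbyte = 8 Gbit.
    """
    rate_per_gbit = FERMILAB_UPSETS_PER_DAY / FERMILAB_GBIT
    return rate_per_gbit * gbyte * 8


for gb in (4, 8):
    print(f"{gb} Gbyte: ~{extrapolate_upsets_per_day(gb):.2f} upsets/day")
# 4 Gbyte: ~0.50 upsets/day
# 8 Gbyte: ~1.00 upsets/day
```

That works out to roughly half an upset to one upset per day for a 4-8 Gbyte workstation, consistent with the "an upset per day" estimate.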

(Here at JPL, we keep those old databooks around... hiding them from
the office neatness police, of course: "Why do you need those dusty
old books? Everything is online, isn't it?"  Uh, no, not for parts
made in 1985, so we keep that ancient National Semiconductor databook
printed on grubby newsprint that is decaying as you read
this.)  My 1977 National Semiconductor CMOS Databook, with all the
data for the CD4000 and 74C series logic, is invaluable,
notwithstanding that it was printed before many of the engineers
here were born.  The old 4000 series CMOS is quite radiation tough
(giant feature sizes!), and, although ESD sensitive, can tolerate
huge voltage ranges.  And they still make it... probably some guy
with an old 3" fab line in a warehouse or something...

Jim 


_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf



