beowulf in space

Gerry Creager N5JXS gerry.creager at
Wed Apr 16 20:59:33 EDT 2003

We can consider a 10e2 (or more) cost multiplier for space-qualified 
hardware, excluding the design work someone like Harris does to 
radiation harden the processors... and memory... and glue-logic.  Intel 
doesn't tend to make space-qualified hardware, or rad-hard hardware. 
They license that out to Harris and some of the research labs.

Now: Using industrial-grade devices is more cost effective, and loses 
some of the paperwork burden (the two are tied intimately).  But nothing's 
been done about radiation hardening.  Which is an issue.

Let's talk about radiation hardening and single-event upsets.

Radiation hardening refers, generally, to resistance to the effects of 
transient bit resets due to hits by heavy particles.  (Is that the sound 
of RGB winding up?)  Transient bit flips are one thing: You have to do 
error detection (and correction?) but the device recovers.  In 
spacecraft memory, one runs almost continuous housecleaning code to 
detect permanent holes and remap the memory around them.  This is a very 
important aspect of planning.  If we're talking about losing enough 
cycles to housecleaning to drag our processing power down, are we really 
gaining much in "flying" a cluster?
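To make the "housecleaning" idea concrete, here is a minimal sketch of a scrubbing pass that pattern-tests each word and remaps permanent holes to spare words.  Everything in it is hypothetical and for illustration only: the ScrubbedMemory class, the 8-bit words, the stuck-bit model and the 0x55/0xAA test patterns are my assumptions, not how any particular flight system does it.  It also only covers the permanent-failure half of the problem; transient flips would need real EDAC on top.

```python
class ScrubbedMemory:
    """Toy 8-bit word memory with spare words and a remap table (hypothetical)."""

    def __init__(self, n_words, n_spares, stuck=None):
        self.n_words = n_words
        self.cells = [0] * (n_words + n_spares)
        # Assumed fault model: physical address -> (and_mask, or_mask).
        # A 0 bit in and_mask is stuck at 0; a 1 bit in or_mask is stuck at 1.
        self.stuck = dict(stuck or {})
        self.remap = {}  # logical address -> spare physical address
        self.free_spares = list(range(n_words, n_words + n_spares))

    def _phys(self, addr):
        # Follow the remap table if this logical word has been moved.
        return self.remap.get(addr, addr)

    def write(self, addr, value):
        phys = self._phys(addr)
        and_mask, or_mask = self.stuck.get(phys, (0xFF, 0x00))
        self.cells[phys] = ((value & 0xFF) & and_mask) | or_mask

    def read(self, addr):
        return self.cells[self._phys(addr)]

    def scrub(self):
        """One housecleaning pass; returns the logical addresses it remapped."""
        remapped = []
        for addr in range(self.n_words):
            saved = self.read(addr)
            healthy = True
            # Alternating-bit patterns exercise every bit as both 0 and 1.
            for pattern in (0x55, 0xAA):
                self.write(addr, pattern)
                if self.read(addr) != pattern:
                    healthy = False
            if not healthy:
                if not self.free_spares:
                    raise MemoryError("out of spare words")
                self.remap[addr] = self.free_spares.pop(0)
                remapped.append(addr)
            # Restore the contents (into the spare, if we just remapped).
            self.write(addr, saved)
        return remapped


if __name__ == "__main__":
    # Word 3 has bit 0 stuck at zero.
    mem = ScrubbedMemory(n_words=8, n_spares=2, stuck={3: (0xFE, 0x00)})
    print("remapped:", mem.scrub())
    print("remap table:", mem.remap)
```

Note that every pass touches every word several times, which is exactly the cycle cost being worried about above: the more often you scrub, the less processor is left for the actual job.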

Ah, yes... speed.  It's generally accepted in building flight 
processors that the faster they go, the easier they are to upset.  Thus, 
that 3GHz Pentium.... Oh.  Sorry.  The 2.4GHz (non-vaporware) device 
is significantly more prone to SEU than the Pentium I/166.

Trace/mask sizing makes a difference.  The finer the lines, the more 
prone to failure.  So, once again, the old stuff (especially CMOS) 
outlasts the new x-ray lithography chips.

OKAY.  Pretty pessimistic.  The real world of space-qualified processors 
_IS_ conservative, as changing a CPU requires a service call of a couple 
of hundred miles (vertical) plus the delta-v and guidance to manage to 
match orbits... So you get your industrial-grade devices, burn 'em in on 
the ground, in higher-than-expected temperatures ("accelerated life 
testing") and qualify your systems that way.  You review the literature 
(Sandia National Labs has some great stuff) and decide the break-points 
for memory, processor and bus speeds.  Overclocking is _right_out_.  You 
design your spaceframe to accommodate adequate cooling (remember those 
heat-pipes for the new processors?  Ever wonder where the technology 
came from?  Thank the USAF.)  You add some layered polyethylene and gold 
layers to improve hardening, and you rewrite your code to accomplish 
memory and (processor) register housecleaning.
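As a rough illustration of what burning in at elevated temperature buys you, here is a sketch using the Arrhenius acceleration-factor model, which is a common way to reason about temperature-accelerated life testing.  The choice of model, the 0.7 eV activation energy and the example temperatures are assumptions for illustration, not figures from any qualification program; the real activation energy depends on the failure mechanism you are chasing.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(t_use_c, t_stress_c, activation_energy_ev=0.7):
    """Acceleration factor of a stress temperature relative to a use temperature.

    Temperatures are in degrees Celsius.  The 0.7 eV default is an assumed,
    illustrative activation energy, not a measured value.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_use_k - 1.0 / t_stress_k
    )
    return math.exp(exponent)


def equivalent_use_hours(burn_in_hours, t_use_c, t_stress_c,
                         activation_energy_ev=0.7):
    """Hours at the use temperature represented by a burn-in at the stress temperature."""
    return burn_in_hours * arrhenius_acceleration(
        t_use_c, t_stress_c, activation_energy_ev
    )


if __name__ == "__main__":
    # Assumed example: parts expected to run at 55 C, burned in at 125 C.
    af = arrhenius_acceleration(55.0, 125.0)
    print(f"acceleration factor: {af:.1f}")
    print(f"168 h burn-in ~ {equivalent_use_hours(168, 55.0, 125.0):.0f} h of use")
```

This only models temperature-driven wear-out; it says nothing about radiation effects, which is why the literature review and shielding steps are separate items in the list above.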

It's not impossible, but it's not quite the same as building a 256-node 
COTS cluster, either.


Jim Lux wrote:
>> > There's also a non-negligible cost of having more items on the "bill of
>> > materials": each different kind of part needs drawings, documentation,
>> > test procedures, etc., a lot of which is what makes space stuff so
>> > expensive compared to the commercial parts (for which the primary cost
>> > driver is that of sand (raw materials) and marketing) so again, systems
>> > comprised of many identical parts have advantages.
>> Hmmm, so the primary cost determinant of VLSIC's is the cost of sand...?
>> Verrry Eeenteresting...
>> Now marketing, that I'd believe;-)
> Say it costs a billion dollars to set up the fab (which can be spread 
> over 2-3 years, probably), and maybe another half billion to design the 
> processor (I don't know... 2500 work years seems like a lot, but...?)... 
> How many Pentiums does Intel make? It's kind of hard to figure out just 
> how many chips Intel makes in a given time (such being a critical aspect 
> of their profitability), but...
> consider that Intel Revenue for 2002 was about $27B....
> As for marketing... in an article about P4s from April of 2001:
> Intel has told news sources that it plans to spend roughly $500 million 
> to promote the new technology among software makers, and another $300 
> million on general advertising.
> Such enormous volumes are why commodity computing even works.  The NRE 
> for truly high performance computing devices is spread over so many 
> units...
> _______________________________________________
> Beowulf mailing list, Beowulf at
> To change your subscription (digest mode or unsubscribe) visit 

Gerry Creager -- gerry.creager at
Network Engineering -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578
Page: 979.228.0173
Office: 903A Eller Bldg, TAMU, College Station, TX 77843
