beowulf in space
Gerry Creager N5JXS
gerry.creager at tamu.edu
Wed Apr 16 20:59:33 EDT 2003
We can consider a 10^2 (or greater) cost multiplier for space-qualified
hardware, excluding the design work someone like Harris does to
radiation-harden the processors... and memory... and glue logic. Intel
doesn't tend to make space-qualified or rad-hard hardware itself.
They license that out to Harris and some of the research labs.
Now: using industrial-grade devices is more cost-effective and sheds
some of the paperwork burden (the two are intimately tied). But it does
nothing for radiation hardening, which is an issue.
Let's talk about radiation hardening and single-event upsets (SEUs).
Radiation hardening refers, generally, to resistance to the effects of
transient bit resets caused by hits from heavy particles. (Is that the
sound of RGB winding up?) Transient bit flips are one thing: you have to
do error detection (and correction?), but the device recovers. Permanent
damage is another. In spacecraft memory, one runs almost continuous
housecleaning code to detect permanent holes and remap the memory around
them. This is a very important aspect of planning. If we lose enough
cycles to housecleaning to drag our processing power down, are we really
gaining much by "flying" a cluster?
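The detect/correct/remap housecleaning described above is usually called EDAC memory scrubbing. Below is a minimal sketch, assuming a toy memory of 4-bit words protected by a SECDED code (Hamming(7,4) plus an overall parity bit) and a small pool of spare cells. The class name `ScrubbedMemory`, the "stuck bit" model of a permanent hole, and the remap threshold of two corrections are all hypothetical choices for illustration, not taken from any flight system; real hardware often does the correction in the memory controller and leaves scrubbing and remapping policy to software.

```python
# Toy EDAC memory scrubber: 4-bit words stored as 8-bit SECDED codewords
# (Hamming(7,4) plus an overall parity bit), with spare cells for remapping.
# ScrubbedMemory, the stuck-bit model and the remap threshold are hypothetical.


def encode(nibble):
    """Encode a 4-bit value as an 8-bit SECDED codeword (bit i holds position i)."""
    bits = [0] * 8
    bits[3] = nibble & 1
    bits[5] = (nibble >> 1) & 1
    bits[6] = (nibble >> 2) & 1
    bits[7] = (nibble >> 3) & 1
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # parity over positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # parity over positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # parity over positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) % 2            # overall parity: tells one flip from two
    return sum(b << i for i, b in enumerate(bits))


def decode(word):
    """Return (nibble, corrected_word, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if (word >> pos) & 1:
            syndrome ^= pos                # XOR of set positions is 0 for a valid codeword
    overall = bin(word).count("1") % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:                     # odd number of flips: assume exactly one
        word ^= 1 << syndrome              # syndrome 0 means the parity bit itself flipped
        status = "corrected"
    else:                                  # even number of flips, nonzero syndrome: two bits bad
        return None, word, "uncorrectable"
    nibble = (((word >> 3) & 1) | ((word >> 5) & 1) << 1
              | ((word >> 6) & 1) << 2 | ((word >> 7) & 1) << 3)
    return nibble, word, status


class ScrubbedMemory:
    def __init__(self, size, spares, remap_threshold=2):
        self.cells = [encode(0)] * (size + spares)
        self.map = list(range(size))                  # logical address -> physical cell
        self.free_spares = list(range(size, size + spares))
        self.hits = [0] * (size + spares)             # corrections seen per physical cell
        self.stuck = {}                               # physical cell -> (bit, value): a permanent hole
        self.lost = set()                             # logical addresses whose data could not be recovered
        self.threshold = remap_threshold

    def _store(self, phys, word):
        if phys in self.stuck:                        # a stuck bit ignores whatever is written
            bit, value = self.stuck[phys]
            word = (word & ~(1 << bit)) | (value << bit)
        self.cells[phys] = word

    def write(self, addr, nibble):
        self._store(self.map[addr], encode(nibble))

    def read(self, addr):
        return decode(self.cells[self.map[addr]])[0]

    def scrub(self):
        """One housecleaning pass over every logical word."""
        for addr, phys in enumerate(self.map):
            _, fixed, status = decode(self.cells[phys])
            if status == "uncorrectable":
                self.lost.add(addr)                   # software must reload this word from elsewhere
            elif status == "corrected":
                self.hits[phys] += 1
                self._store(phys, fixed)              # write-back clears a transient upset
                if self.hits[phys] >= self.threshold and self.free_spares:
                    spare = self.free_spares.pop(0)   # repeated hits suggest a permanent hole
                    self._store(spare, fixed)
                    self.map[addr] = spare
```

A transient upset is repaired by the write-back and never seen again; a stuck bit reappears after every write-back, so its cell keeps accumulating hits until it is swapped for a spare. Note that every pass reads every word, which is exactly the cycle cost the question above is getting at.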
Ah, yes... speed. It's generally accepted among builders of flight
processors that the faster they go, the easier they are to upset. Thus
that 3GHz Pentium... Oh. Sorry. The 2.4GHz (non-vaporware) device
is significantly more prone to SEU than the Pentium I/166.
Trace/mask sizing makes a difference, too: the finer the lines, the more
prone the part is to failure. So, once again, the old stuff (especially
CMOS) outlasts the new x-ray lithography chips.
OKAY. Pretty pessimistic. The real world of space-qualified processors
_IS_ conservative, as changing a CPU requires a service call of a couple
of hundred miles (vertical) plus the delta-v and guidance needed to
match orbits... So you get your industrial-grade devices, burn 'em in on
the ground at higher-than-expected temperatures ("accelerated life
testing"), and qualify your systems that way. You review the literature
(Sandia National Labs has some great stuff) and decide the break-points
for memory, processor and bus speeds. Overclocking is _right_out_. You
design your spaceframe to accommodate adequate cooling (remember those
heat pipes for the new processors? Ever wonder where the technology
came from? Thank the USAF.) You add some layers of polyethylene and gold
to improve hardening, and you rewrite your code to perform memory and
(processor) register housecleaning.
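Accelerated life testing of the kind described above is commonly analyzed with the Arrhenius model: the acceleration factor is AF = exp((Ea/k) * (1/T_use - 1/T_stress)), with temperatures in kelvin and k the Boltzmann constant. The sketch below assumes that model; the activation energy (0.7 eV), the 55 C service temperature, the 125 C oven and the one-week burn-in are placeholder values chosen for illustration, not figures from this post or from Sandia.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(ea_ev, use_c, stress_c):
    """Arrhenius acceleration factor between a use and a (hotter) stress temperature, in Celsius."""
    use_k = use_c + 273.15
    stress_k = stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / use_k - 1.0 / stress_k))


def equivalent_field_hours(burn_in_hours, ea_ev, use_c, stress_c):
    """Field hours that a burn-in at stress_c stands in for, under the Arrhenius model."""
    return burn_in_hours * acceleration_factor(ea_ev, use_c, stress_c)


# ASSUMED inputs, for illustration only: Ea = 0.7 eV, 55 C in service, 125 C oven, 168 h burn-in.
af = acceleration_factor(0.7, 55.0, 125.0)
hours = equivalent_field_hours(168, 0.7, 55.0, 125.0)
print(f"acceleration factor ~{af:.0f}, 168 h burn-in ~ {hours:,.0f} field hours")
```

With those assumed inputs the factor comes out near 78, so a week in the oven stands in for roughly 13,000 hours (about a year and a half) at 55 C. Keep in mind that this models thermally driven wear-out only; it says nothing about single-event upset rates.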
It's not impossible, but it's not quite the same as building a 256-node
COTS cluster, either.
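For the register housecleaning mentioned above, one common software technique (not necessarily the one the author had in mind) is triple modular redundancy: keep a critical value in three copies, majority-vote on every read, and write the winner back so a single upset is repaired before a second one can land. A minimal sketch, in Python purely for illustration; real flight code would do this in C or assembly (or have the compiler insert it), ideally with the copies in separate memory regions. `TripleStored` is a hypothetical name.

```python
def vote(a, b, c):
    """Bitwise 2-of-3 majority: each result bit is the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


class TripleStored:
    """Hypothetical helper: a critical value kept in triplicate and repaired on each read."""

    def __init__(self, value):
        self.copies = [value, value, value]

    def set(self, value):
        self.copies = [value, value, value]

    def get(self):
        value = vote(*self.copies)
        self.copies = [value, value, value]  # write back so one upset cannot pair with a later one
        return value


# Usage: upsets in different bits of different copies are still outvoted.
state = TripleStored(0b1010_1010)
state.copies[0] ^= 0b0000_0001   # simulated SEU in copy 0
state.copies[2] ^= 0b0100_0000   # simulated SEU in copy 2, different bit
print(bin(state.get()))          # prints 0b10101010
```

Voting bit by bit, rather than word by word, is what lets it survive upsets that hit different bits in two of the copies.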
Jim Lux wrote:
>> > There's also a non-negligible cost of having more items on the "bill of
>> > materials": each different kind of part needs drawings, documentation,
>> > test procedures, etc., a lot of which is what makes space stuff so
>> > expensive compared to the commercial parts (for which the primary cost
>> > driver is that of sand (raw materials) and marketing), so again, systems
>> > comprised of many identical parts have advantages.
>> Hmmm, so the primary cost determinant of VLSICs is the cost of sand...?
>> Verrry Eeenteresting...
>> Now marketing, that I'd believe;-)
> Say it costs a billion dollars to set up the fab (which can be spread
> over 2-3 years, probably), and maybe another half billion to design the
> processor (I don't know... 2500 work years seems like a lot, but...?)...
> How many Pentiums does Intel make? It's kind of hard to figure out just
> how many chips Intel makes in a given time (such being a critical aspect
> of their profitability), but...
> consider that Intel revenue for 2002 was about $27B...
> As for marketing... in an article about P4s from April of 2001:
> Intel has told news sources that it plans to spend roughly $500 million
> to promote the new technology among software makers, and another $300
> million on general advertising.
> Such enormous volumes are why commodity computing even works. The NRE
> for truly high-performance computing devices is spread over so many
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
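The amortization argument in the quoted message comes down to dividing a fixed non-recurring engineering (NRE) bill by the number of parts shipped. A minimal sketch using the message's own rough figures (about $1B for the fab plus about $0.5B for the design); the unit volumes are hypothetical, since the message itself notes that Intel's actual volumes are hard to find.

```python
def nre_per_unit(nre_dollars, units):
    """Share of the one-time (non-recurring) engineering cost carried by each unit shipped."""
    return nre_dollars / units


# Rough figures from the quoted message: ~$1B fab setup plus ~$0.5B processor design.
NRE = 1.0e9 + 0.5e9

# HYPOTHETICAL volumes, for illustration only.
for units in (1_000, 1_000_000, 100_000_000):
    print(f"{units:>11,} units -> ${nre_per_unit(NRE, units):,.2f} of NRE per unit")
```

Spread over a hypothetical 100 million parts, the $1.5B comes to $15 per part; spread over a hypothetical 1,000 flight units, it would be $1.5M each, which is the gap between commodity and space-qualified parts in a nutshell.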
Gerry Creager -- gerry.creager at tamu.edu
Network Engineering -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.847.8578
Office: 903A Eller Bldg, TAMU, College Station, TX 77843