beowulf in space

Art Edwards edwardsa at plk.af.mil
Thu Apr 17 16:03:21 EDT 2003


Just so you don't think that the space program is run by a bunch of
out-of-the-loop dopes, we have been doing clustering, although these
are by no means beowulfs. I sent a message to one of the brightest
architectural designers, who is in our branch, and I paste his reply
below. Please copy him on any posts/responses to this.

From Jim Lyke

Pretty cool.  Sure, there have been publications on SAFE, and I have
submitted a longer paper for publication.

Sensor and Fusion Engine (SAFE) in its best case is 96 processors,
broken into 12 bussed groups (the bus a customized Futurebus+) with a
Myrinet bridge.  The system is small enough in scale to be serviced by a
single, 16-port duplex non-blocking Myrinet crossbar.  So, 12 of the
hubs are occupied with the 96 processors, which are of a special design
(microprogrammable with IEEE 754(?) double-precision floating point
support).  Two of the remaining four hubs are equipped with FPGA-based
front-end processors, to massage real-time sensor data into the packeted
formats amenable to the 96 nodes.  One of the remaining two hubs is
occupied by a boot processor, which distributes program loads over the
network and kicks off processor groups.  The final port is a
user/telemetry port, which could be a simple Linux box equipped with a
Myrinet card.
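
The port budget described above can be tallied in a few lines. This is
only a sketch of the arithmetic; the dictionary keys and variable names
are made up for illustration, and the even split of processors across
groups is an assumption the text implies but does not state.

```python
# Tally of the 16-port crossbar as described in the paragraph above.
# Names are illustrative, not taken from any SAFE documentation.
CROSSBAR_PORTS = 16
TOTAL_PROCESSORS = 96

port_allocation = {
    "processor groups": 12,        # each group sits behind one Myrinet bridge
    "FPGA front-end processors": 2,
    "boot processor": 1,
    "user/telemetry": 1,
}

used_ports = sum(port_allocation.values())

# Assumes the 96 processors are split evenly across the 12 bussed groups.
processors_per_group = TOTAL_PROCESSORS // port_allocation["processor groups"]

print(f"ports used: {used_ports} of {CROSSBAR_PORTS}")
print(f"processors per bussed group: {processors_per_group}")
```

Under that assumption every port on the crossbar is accounted for, with
eight processors sharing each bus.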

Everything above (except the Linux box) is designed to be crammed into a
conduction-cooled 5x5x8 inch parallelepiped container.  The Myrinet
protocol was gutted and replaced with a lower latency protocol with a
one sigma latency of about 2 us on messages, based on the statistics of
our problem.  The max sustainable peak is about 12 GFLOPS, which is
because the chips were built on a 0.5 um process.  The theoretical
density of the system (even so) is slightly over one TFLOP/cu.ft.  We
are moving forward to modernize the system, but are funding limited.
The ultimate barrier will be thermal.  Even though we use carbon-matrix
composite materials that have 5X better heat conduction than aluminum,
the ultimate power densities as we encroach on >10 TFLOPS/cu.ft. will
overtake the ability of that material to draw heat away.  There is
discussion of trying to create a new type of thermal management material
based on either carbon or boron nanotubes, which are claimed to beat
natural diamond by about 2X.
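
For scale, the enclosure size and the sustained figure above can be
turned into a volumetric density. This is a back-of-the-envelope sketch
only: it assumes the full 5x5x8 inch envelope is the relevant volume and
uses the 12 GFLOPS sustained number, so it comes out lower than the
theoretical density quoted above, which presumably rests on a different
peak figure or volume. The variable names are illustrative.

```python
# Back-of-the-envelope volumetric density from the numbers quoted above.
CUBIC_INCHES_PER_CUBIC_FOOT = 12 ** 3   # 1728

enclosure_inches = (5, 5, 8)            # container dimensions from the text
sustained_gflops = 12.0                 # "max sustainable peak" from the text

volume_cu_in = enclosure_inches[0] * enclosure_inches[1] * enclosure_inches[2]
volume_cu_ft = volume_cu_in / CUBIC_INCHES_PER_CUBIC_FOOT

# Density if only the sustained figure is counted against the whole box.
sustained_density_gflops_per_cu_ft = sustained_gflops / volume_cu_ft

# Throughput this same box would need to reach 1 TFLOP/cu.ft.
gflops_for_one_tflop_per_cu_ft = 1000.0 * volume_cu_ft

print(f"volume: {volume_cu_in} cu.in = {volume_cu_ft:.4f} cu.ft")
print(f"sustained density: {sustained_density_gflops_per_cu_ft:.1f} GFLOPS/cu.ft")
print(f"needed for 1 TFLOP/cu.ft: {gflops_for_one_tflop_per_cu_ft:.1f} GFLOPS")
```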

I wouldn't mind being copied on the posts/replies either.

END OF LYKE

Art Edwards

On Thu, Apr 17, 2003 at 08:14:38AM -0400, Robert G. Brown wrote:
> On Wed, 16 Apr 2003, Art Edwards wrote:
> 
> > I think I'm jumping into the middle of a conversation here, but our
> > branch is the shop through which most of the DoD processor programs are
> > managed. For real space applications there are radiation issues like
> > total dose hardness and single event upset that require special design
> > and, still, special processing. That is, you can't make these parts at
> > any foundry (yet). There are currently two hardened foundries through
> > which the most tolerant parts are fabricated. Where the commercial
> > market is ~100s of billions/year, the space electronics industry is
> > ~200 million/year. So parts are expensive, as Jim Lux says. But more
> > importantly, the current state-of-the-art for space processors is
> > several generations back. Now, with a 200 million market/year, who is
> > going to spend the money to build a new foundry? (anyone?) It's a huge
> > problem, and beowulfs in space will not give the economies of scale
> > necessary to move us forward. 
> > 
> > I don't know if this has been discussed here, but have you thought about
> > launch costs? They're huge. Weight, power, and mission lifetime are the 
> > crucial factors for space. These are the reasons that so much R&D goes
> > into space electronics. I apologize if I have gone over old ground.
> 
> Actually, this is the sort of thing that makes (as Eray pointed out) the
> idea of a cluster (leaving aside the COTS issue, the single-headed
> issue, and whether or not it could be a true "beowulf" cluster)
> attractive in space applications.  What you (and Gerry) are saying is
> that the space and DoD market is stuck using specially engineered,
> radiation hard, not-so-bleeding-VLSI processors from what amounts to
> several VLSI generations ago.  The parts are expensive, but the cost of
> building a newer better foundry for such a small and inelastic market
> is prohibitive, so they are the only game in town.
> 
> If you have an orbital project or application that needs considerably
> more speed than the undoubtedly pedestrian clock of these devices can
> provide, you have a HUGE cost barrier to developing a faster processor,
> and that barrier is largely out of your (DoD) or NASA's control -- you
> can only ask/hope for an industrial partner to make the investment
> required to up the chip generation in hardened technology with the
> promise of at least some guaranteed sales.  You also have a known per
> kilogram per liter cost for lifting stuff into space, and this is at
> least modestly under your own control.  So (presuming an efficiently
> parallelizable task) instead of effectively financing a couple of
> billion dollars in developing the nextgen hard chips to get a speedup of
> ten or so, you can engineer twelve systems based on the current,
> relatively cheap chips into a robust and fault tolerant cluster and pay
> the known immediate costs of lifting those twelve systems into orbit.
> 
> Again presuming that it is for some reason not feasible to simply
> establish a link to earth and do the processing here -- an application
> for which the latency would be bad, an application that requires
> immediate response in a changing environment when downlink
> communications may not be robust.
> 
> A question that you or Gerry or Jim may or may not be able to answer
> (with which Chip started this discussion):  Are there any specific
> non-classified instances that you know of where an actual "cluster"
> (defined loosely as multiple identical CPUs interconnected with some
> sort of communications bus or network and running a specific parallel
> numerical task, not e.g.  task-specific processors in several parts of a
> military jet) has been engineered, built, and shot into space?
> 
> This has been interesting enough that if there are any, I may indeed add
> a chapter to the book, if/when I next actually work on it.  I got dem
> end of semester blues, at the moment...:-)
> 
>   rgb
> 
> -- 
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> 
> 
> 
> 

-- 
Art Edwards
Senior Research Physicist
Air Force Research Laboratory
Electronics Foundations Branch
KAFB, New Mexico

(505) 853-6042 (v)
(505) 846-2290 (f)