beowulf in space

Art Edwards edwardsa at plk.af.mil
Thu Apr 17 16:54:23 EDT 2003


On Thu, Apr 17, 2003 at 05:14:15PM -0400, Robert G. Brown wrote:
> On Thu, 17 Apr 2003, Art Edwards wrote:
> 
> > Just so you don't think that the space program is run by a bunch of
> > out-of-the-loop dopes, we have been doing clustering, although these
> > are by no means beowulfs. I sent a message to one of the brightest
> > architectural designers, who is in our branch, and I paste his reply.
> > Please copy to him any posts/responses to this.
> 
> Who could possibly think that, given that the first beowulf was built
> and named by a NASA program and that CESDIS for years housed both the
> list and beowulf.org (with only one short hiatus when high level program
> overseers became inexplicably stricken with some sort of mental
> disease:-)?
> 
> However, this TOO looks very cool, sort of at the edge of the possible
> with clustering technology altogether.
> 
> It is obvious that clusters are indeed making their way into space, or
> will be soon.
> 
> I won't even ask what a "Sensor and Fusion Engine" might be -- it would
> be too much to hope that it would be a thermonuclear fusion engine that
> cannot AFAIK exist with current technology, existing anyway and
> preparing to really change the way we do space..:-)
In this case, "fusion" just refers to data fusion from sensors. Data
integration and processing might better capture what is meant. Signal
processing is a biggie for the Air Force.



Art Edwards
> 
>     rgb
> 
> > 
> > From Jim Lyke:
> > 
> > Pretty cool.  Sure, there have been publications on SAFE, and I have
> > submitted a longer paper for publication.
> > 
> > Sensor and Fusion Engine (SAFE) in its best case is 96 processors,
> > broken into 12 bussed groups (the bus is a customized Futurebus+) with
> > a Myrinet bridge.  The system is small enough in scale to be serviced
> > by a single 16-port duplex non-blocking Myrinet crossbar.  So, 12 of
> > the hubs are occupied with the 96 processors, which are of a special
> > design (microprogrammable, with IEEE 754(?) double-precision floating
> > point support).  Two of the remaining four hubs are equipped with
> > FPGA-based front-end processors, to massage real-time sensor data into
> > the packetized formats amenable to the 96 nodes.  One of the remaining
> > two hubs is occupied by a boot processor, which distributes program
> > loads over the network and kicks off processor groups.  The final port
> > is a user/telemetry port, which could be a simple Linux box equipped
> > with a Myrinet card.
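The port allocation described above can be tallied in a few lines. This is
only a bookkeeping sketch: the counts come from the description, while the
dictionary keys and variable names are made up for illustration.

```python
# Port budget for the single 16-port crossbar, using the counts given
# in the description. Role names are illustrative labels only.
CROSSBAR_PORTS = 16
PROCESSORS = 96
GROUPS = 12

allocation = {
    "processor groups": GROUPS,
    "FPGA front-end processors": 2,
    "boot processor": 1,
    "user/telemetry": 1,
}

# Each bussed group shares one crossbar port among its processors.
processors_per_group = PROCESSORS // GROUPS
ports_used = sum(allocation.values())

for role, count in allocation.items():
    print(f"{role:28s} {count:2d}")
print(f"{'total':28s} {ports_used:2d} of {CROSSBAR_PORTS}")
print(f"processors per bussed group: {processors_per_group}")
```

With these counts the crossbar is fully populated, and each group puts
eight processors behind one port.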
> > 
> > Everything above (except the Linux box) is designed to be crammed into
> > a conduction-cooled 5x5x8 inch parallelepiped container.  The Myrinet
> > protocol was gutted and replaced with a lower-latency protocol with a
> > one-sigma latency of about 2 us on messages, based on the statistics
> > of our problem.  The max sustainable peak is about 12 GFLOPS, which is
> > because the chips were built on a 0.5 um process.  The theoretical
> > density of the system (even so) is slightly over one TFLOP/cu.ft.  We
> > are moving forward to modernize the system, but are funding limited.
> > The ultimate barrier will be thermal.  Even though we use
> > carbon-matrix composite materials that have 5X better heat conduction
> > than aluminum, the ultimate power densities as we encroach on more
> > than 10 TFLOPS/cu.ft. will overtake the ability of that material to
> > draw heat away.  There is discussion of trying to create a new type of
> > thermal management material based on either carbon or boron nanotubes,
> > which are claimed to beat natural diamond by about 2X.
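For a sense of scale, here is a small unit-conversion sketch using only the
enclosure dimensions and the sustained figure quoted above. It does not try
to reproduce the theoretical density, since the basis for that number is
not given; the variable names are illustrative.

```python
# Sustained compute density implied by the quoted enclosure size and
# sustained throughput. This is plain unit conversion, not a model.
CUBIC_INCHES_PER_CUBIC_FOOT = 12 ** 3  # 1728

volume_cu_in = 5 * 5 * 8  # conduction-cooled container, in cubic inches
volume_cu_ft = volume_cu_in / CUBIC_INCHES_PER_CUBIC_FOOT

sustained_gflops = 12.0  # "max sustainable peak" from the description
sustained_density = sustained_gflops / volume_cu_ft  # GFLOPS per cu.ft.

print(f"volume: {volume_cu_in} cu.in. = {volume_cu_ft:.4f} cu.ft.")
print(f"sustained density: {sustained_density:.2f} GFLOPS/cu.ft.")
```

On the sustained number alone the box works out to roughly 0.1
TFLOPS/cu.ft., so the theoretical density quoted above presumably rests on
a different basis than the sustained rate.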
> > 
> > I wouldn't mind being copied on the posts/replies either.
> > 
> > END OF LYKE
> > 
> > Art Edwards
> > 
> > On Thu, Apr 17, 2003 at 08:14:38AM -0400, Robert G. Brown wrote:
> > > On Wed, 16 Apr 2003, Art Edwards wrote:
> > > 
> > > > I think I'm jumping into the middle of a conversation here, but our
> > > > branch is the shop through which most of the DoD processor programs are
> > > > managed. For real space applications there are radiation issues like
> > > > total dose hardness and single event upset that require special design
> > > > and, still, special processing. That is, you can't make these parts at
> > > > any foundry (yet). There are currently two hardened foundries through
> > > > which the most tolerant parts are fabricated. Where the commercial
> > > > market is ~100s of billions/year, the space electronics industry is
> > > > ~200 million/year. So parts are expensive, as Jim Lux says. But more
> > > > importantly, the current state-of-the-art for space processors is
> > > > several generations back. Now, with a 200 million/year market, who is
> > > > going to spend the money to build a new foundry? (anyone?) It's a huge
> > > > problem, and beowulfs in space will not give the economies of scale
> > > > necessary to move us forward. 
> > > > 
> > > > I don't know if this has been discussed here, but have you thought about
> > > > launch costs? They're huge. Weight, power, and mission lifetime are the 
> > > > crucial factors for space. These are the reasons that so much R&D goes
> > > > into space electronics. I apologize if I have gone over old ground.
> > > 
> > > Actually, this is the sort of thing that makes (as Eray pointed out) the
> > > idea of a cluster (leaving aside the COTS issue, the single-headed
> > > issue, and whether or not it could be a true "beowulf" cluster)
> > > attractive in space applications.  What you (and Gerry) are saying is
> > > that the space and DoD market is stuck using specially engineered,
> > > radiation hard, not-so-bleeding-VLSI processors from what amounts to
> > > several VLSI generations ago.  The parts are expensive, but the cost of
> > > building a newer better foundry for such a small and inelastic market
> > > is prohibitive, so they are the only game in town.
> > > 
> > > If you have an orbital project or application that needs considerably
> > > more speed than the undoubtedly pedestrian clock of these devices can
> > > provide, you have a HUGE cost barrier to developing a faster processor,
> > > and that barrier is largely out of your (DoD) or NASA's control -- you
> > > can only ask/hope for an industrial partner to make the investment
> > > required to up the chip generation in hardened technology with the
> > > promise of at least some guaranteed sales.  You also have a known per
> > > kilogram per liter cost for lifting stuff into space, and this is at
> > > least modestly under your own control.  So (presuming an efficiently
> > > parallelizable task) instead of effectively financing a couple of
> > > billion dollars in developing the nextgen hard chips to get a speedup of
> > > ten or so, you can engineer twelve systems based on the current,
> > > relatively cheap chips into a robust and fault tolerant cluster and pay
> > > the known immediate costs of lifting those twelve systems into orbit.
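The "efficiently parallelizable" proviso can be made concrete with Amdahl's
law. That model is not named in the argument above and is used here only as
a rough sketch; it ignores communication overhead and any nodes held back
for fault tolerance. The function names are illustrative.

```python
# Amdahl's law sketch: how parallel must a task be for twelve current
# chips to match one chip that is ten times faster?

def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup on `nodes` processors for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)

def required_parallel_fraction(target_speedup: float, nodes: int) -> float:
    """Parallel fraction needed to reach `target_speedup` on `nodes`."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / nodes)

nodes = 12
target = 10.0
p = required_parallel_fraction(target, nodes)

print(f"parallel fraction needed: {p:.4f}")
print(f"speedup at that fraction: {amdahl_speedup(p, nodes):.2f}")
```

Under this idealized model about 98% of the work has to parallelize for
twelve systems to deliver a tenfold speedup, which is why the proviso
matters.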
> > > 
> > > Again, this presumes that it is for some reason not feasible to simply
> > > establish a link to earth and do the processing here: for example, an
> > > application for which the latency would be bad, or one that requires
> > > immediate response in a changing environment when downlink
> > > communications may not be robust.
> > > 
> > > A question that you or Gerry or Jim may or may not be able to answer
> > > (with which Chip started this discussion):  Are there any specific
> > > non-classified instances that you know of where an actual "cluster"
> > > (defined loosely as multiple identical CPUs interconnected with some
> > > sort of communications bus or network and running a specific parallel
> > > numerical task, not e.g.  task-specific processors in several parts of a
> > > military jet) has been engineered, built, and shot into space?
> > > 
> > > This has been interesting enough that if there are any, I may indeed add
> > > a chapter to the book, if/when I next actually work on it.  I got dem
> > > end of semester blues, at the moment...:-)
> > > 
> > >   rgb
> > > 
> > > -- 
> > > Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> > > Duke University Dept. of Physics, Box 90305
> > > Durham, N.C. 27708-0305
> > > Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> > > 
> > > 
> > > 
> > > 
> > 
> > 
> 
> 
> 
> 

-- 
Art Edwards
Senior Research Physicist
Air Force Research Laboratory
Electronics Foundations Branch
KAFB, New Mexico

(505) 853-6042 (v)
(505) 846-2290 (f)
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf


