beowulf in space
Robert G. Brown
rgb at phy.duke.edu
Thu Apr 17 17:14:15 EDT 2003
On Thu, 17 Apr 2003, Art Edwards wrote:
> Just so you don't think that the space program is run by a bunch of
> out-of-the-loop dopes, we have been doing clustering, although these
> are by no means beowulfs. I sent a message to one of the brightest
> architectural designers, who is in our branch, and I paste his reply.
> Please copy to him any posts/responses to this.
Who could possibly think that, given that the first beowulf was built
and named by a NASA program and that CESDIS for years housed both the
list and beowulf.org (with only one short hiatus when high level program
overseers became inexplicably stricken with some sort of mental
However, this TOO looks very cool, sort of at the edge of the possible
with clustering technology altogether.
It is obvious that clusters are indeed making their way into space, or
will be soon.
I won't even ask what a "Sensor and Fusion Engine" might be -- it would
be too much to hope that it is a thermonuclear fusion engine, which AFAIK
cannot exist with current technology, existing anyway and preparing to
really change the way we do space..:-)
> >From Jim Lyke
> Pretty cool. Sure, there have been publications on SAFE, and I have
> submitted a longer paper for publication.
> Sensor and Fusion Engine (SAFE) in its best case is 96 processors, organized
> into 12 bussed groups (the bus a customized Futurebus+) with a Myrinet
> bridge. The system is small enough in scale to be serviced by a single,
> 16-port duplex non-blocking Myrinet crossbar. So, 12 of the hubs are
> occupied with the 96 processors, which are of a special design
> (microprogrammable with IEEE 754(?) double-precision floating point
> support). Two of the remaining four hubs are equipped with FPGA-based
> front-end processors, to massage real-time sensor data into the packeted
> formats amenable to the 96 nodes. One of the remaining two hubs is occupied
> by a boot processor, which distributes program loads over the network and
> kicks off processor groups. The final port is a user/telemetry port, which
> could be a simple Linux box equipped with a Myrinet card.
> Everything above (except the Linux box) is designed to be crammed into a
> conduction-cooled 5x5x8 inch parallelepiped container. The Myrinet
> was gutted and replaced with a lower latency protocol with a one sigma
> latency of about 2 us on messages based on the statistics of our problem.
> The max sustainable peak is about 12 GFLOPS, which is because the chips are
> built on 0.5um. The theoretic density of the system (even so) is
> over one TFLOP/cu.ft. We are moving forward to modernize the system, but
> are funding limited. The ultimate barrier will be thermal. Even though we
> use carbon-matrix composite materials that have 5X better heat conduction
> than aluminum, the ultimate power densities as we encroach on
> >10TFLOPS/cu.ft. will overtake the ability of that material to draw heat
> away. There is discussion of trying to create a new type of thermal
> management material based on either carbon or boron nanotubes, which are
> claimed to beat natural diamond by about 2X.
> I wouldn't mind being copied on the posts/replies either.
> END OF LYKE
> Art Edwards
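The SAFE figures quoted above can be sanity-checked with a few lines of arithmetic. The following is only a back-of-envelope sketch: every input is a number taken from Jim Lyke's message, while the variable names and the choice of what to derive are my own assumptions. Note that the density it computes is based on the quoted 12 GFLOPS sustained figure and the 5x5x8 inch container, which is not necessarily the same quantity as the "theoretic density" mentioned in the message.

```python
# Back-of-envelope check of the SAFE numbers quoted in the message.
# Inputs come from the quoted text; all names here are illustrative.

PROCESSORS = 96
BUSSED_GROUPS = 12
CROSSBAR_PORTS = 16
SUSTAINED_GFLOPS = 12.0
BOX_INCHES = (5, 5, 8)                 # conduction-cooled container
CUBIC_INCHES_PER_CUBIC_FOOT = 12 ** 3  # 1728

# 96 processors spread over 12 bussed groups.
procs_per_group = PROCESSORS // BUSSED_GROUPS

# Port budget of the 16-port crossbar, as described in the message.
ports_used = {
    "processor groups": BUSSED_GROUPS,
    "FPGA front-end processors": 2,
    "boot processor": 1,
    "user/telemetry": 1,
}
total_ports = sum(ports_used.values())

# Volume of the container and throughput derived from the sustained figure.
volume_cu_in = BOX_INCHES[0] * BOX_INCHES[1] * BOX_INCHES[2]
volume_cu_ft = volume_cu_in / CUBIC_INCHES_PER_CUBIC_FOOT
mflops_per_proc = SUSTAINED_GFLOPS * 1000 / PROCESSORS
gflops_per_cu_ft = SUSTAINED_GFLOPS / volume_cu_ft

print(f"processors per bussed group: {procs_per_group}")
print(f"crossbar ports used: {total_ports} of {CROSSBAR_PORTS}")
print(f"container volume: {volume_cu_in} cu.in. = {volume_cu_ft:.4f} cu.ft.")
print(f"sustained MFLOPS per processor: {mflops_per_proc:.0f}")
print(f"sustained GFLOPS per cu.ft.: {gflops_per_cu_ft:.1f}")
```

The port budget accounts for all 16 crossbar ports, which matches the description of a single crossbar servicing the whole system.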
> On Thu, Apr 17, 2003 at 08:14:38AM -0400, Robert G. Brown wrote:
> > On Wed, 16 Apr 2003, Art Edwards wrote:
> > > I think I'm jumping into the middle of a conversation here, but our
> > > branch is the shop through which most of the DoD processor programs are
> > > managed. For real space applications there are radiation issues like
> > > total dose hardness and single event upset that require special design
> > > and, still, special processing. That is, you can't make these parts at
> > > any foundry (yet). There are currently two hardened foundries through
> > > which the most tolerant parts are fabricated. Where the commercial
> > > market is ~100s of billions/year, the space electronics industry is
> > > ~200 million/year. So parts are expensive, as Jim Lux says. But more
> > > importantly, the current state-of-the-art for space processors is
> > > several generations back. Now, with a 200 million/year market, who is
> > > going to spend the money to build a new foundry? (anyone?) It's a huge
> > > problem, and beowulfs in space will not give the economies of scale
> > > necessary to move us forward.
> > >
> > > I don't know if this has been discussed here, but have you thought about
> > > launch costs? They're huge. Weight, power, and mission lifetime are the
> > > crucial factors for space. These are the reasons that so much R&D goes
> > > into space electronics. I apologize if I have gone over old ground.
> > Actually, this is the sort of thing that makes (as Eray pointed out) the
> > idea of a cluster (leaving aside the COTS issue, the single-headed
> > issue, and whether or not it could be a true "beowulf" cluster)
> > attractive in space applications. What you (and Gerry) are saying is
> > that the space and DoD market is stuck using specially engineered,
> > radiation hard, not-so-bleeding-VLSI processors from what amounts to
> > several VLSI generations ago. The parts are expensive, but the cost of
> > building a newer better foundry for such a small and inelastic market
> > is prohibitive, so they are the only game in town.
> > If you have an orbital project or application that needs considerably
> > more speed than the undoubtedly pedestrian clock of these devices can
> > provide, you have a HUGE cost barrier to developing a faster processor,
> > and that barrier is largely out of your (DoD) or NASA's control -- you
> > can only ask/hope for an industrial partner to make the investment
> > required to up the chip generation in hardened technology with the
> > promise of at least some guaranteed sales. You also have a known per
> > kilogram per liter cost for lifting stuff into space, and this is at
> > least modestly under your own control. So (presuming an efficiently
> > parallelizable task) instead of effectively financing a couple of
> > billion dollars in developing the nextgen hard chips to get a speedup of
> > ten or so, you can engineer twelve systems based on the current,
> > relatively cheap chips into a robust and fault tolerant cluster and pay
> > the known immediate costs of lifting those twelve systems into orbit.
> > Again presuming that it is for some reason not feasible to simply
> > establish a link to earth and do the processing here -- an application
> > for which the latency would be bad, an application that requires
> > immediate response in a changing environment when downlink
> > communications may not be robust.
> > A question that you or Gerry or Jim may or may not be able to answer
> > (with which Chip started this discussion): Are there any specific
> > non-classified instances that you know of where an actual "cluster"
> > (defined loosely as multiple identical CPUs interconnected with some
> > sort of communications bus or network and running a specific parallel
> > numerical task, not e.g. task-specific processors in several parts of a
> > military jet) has been engineered, built, and shot into space?
> > This has been interesting enough that if there are any, I may indeed add
> > a chapter to the book, if/when I next actually work on it. I got dem
> > end of semester blues, at the moment...:-)
> > rgb
> > --
> > Robert G. Brown http://www.phy.duke.edu/~rgb/
> > Duke University Dept. of Physics, Box 90305
> > Durham, N.C. 27708-0305
> > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
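The "twelve systems for a speedup of ten or so" argument quoted above hinges on the task being efficiently parallelizable. One way to put a number on that is Amdahl's law, which is my own choice of model here and is not something stated in the thread. The sketch below, with hypothetical function names, asks what parallel fraction a task needs for twelve nodes to deliver a tenfold speedup. It ignores communication overhead and any nodes held in reserve for fault tolerance, so the real requirement would be stricter.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


def required_parallel_fraction(target_speedup, nodes):
    """Parallel fraction p such that amdahl_speedup(p, nodes) == target.

    Obtained by solving 1/S = (1 - p) + p/N for p.
    """
    if nodes <= 1:
        raise ValueError("need more than one node")
    if not 1.0 <= target_speedup <= nodes:
        raise ValueError("target speedup must lie between 1 and nodes")
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / nodes)


if __name__ == "__main__":
    nodes = 12    # systems lifted into orbit, from the quoted text
    target = 10   # speedup hoped for from a next-generation chip
    p = required_parallel_fraction(target, nodes)
    print(f"parallel fraction needed: {p:.4f}")
    print(f"check: speedup at that fraction = {amdahl_speedup(p, nodes):.2f}")
    # A less parallel task gets much less out of the same twelve nodes.
    print(f"speedup at p = 0.90: {amdahl_speedup(0.90, nodes):.2f}")
```

Under this model roughly 98% of the work has to run in parallel before twelve nodes behave like one processor ten times faster, which is why the "presuming an efficiently parallelizable task" qualifier carries so much weight.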
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf