Some of these &quot;decay over time&quot; questions have already been worked on in the context of detector design in high-energy physics.  Everything in the big detectors needs to be radiation-hard...<br><br><div class="gmail_quote">

On Thu, Jul 7, 2011 at 2:38 PM, Prentice Bisbal <span dir="ltr">&lt;<a href="mailto:prentice@ias.edu">prentice@ias.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="im">On 07/07/2011 02:26 PM, Lux, Jim (337C) wrote:<br>

&gt;&gt;&gt; It&#39;s all about ultimate scalability.  Anybody with moderate competence (certainly anyone on this list) could devise a scheme to use 1000 perfect processors that never fail to do 1000 quanta of work in unit time.  It&#39;s substantially more challenging to devise a scheme to do 1000 quanta of work in unit time on, say, 1500 processors with a 20% failure rate.  Or even in 1.2*unit time.<br>

&gt;&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt; Just to be clear - I wasn&#39;t saying this was a bad idea. Scaling up to<br>

&gt;&gt; this size seems inevitable. I was just imagining the team of admins who<br>

&gt;&gt; would have to be working non-stop to replace dead processors!<br>

&gt;&gt;<br>

&gt;&gt; I wonder what the architecture for this system will be like. I imagine<br>

&gt;&gt; it will be built around small multi-socket blades that are hot-swappable<br>

&gt;&gt; to handle this.<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; I think that you just anticipate the failures and deal with them.  It&#39;s challenging to write code to do this, but it&#39;s certainly a worthy objective. I can easily see a situation where the cost to replace dead units is so high that you just don&#39;t bother doing it: it&#39;s cheaper to just add more live ones to the &quot;pool&quot;.<br>

&gt;<br>

<br>

</div>Did you read the paper that someone else posted a link to? I just read<br>

the first half of it. A good part of this research is focused on<br>

fault-tolerance/resiliency of computer systems. They&#39;re not just<br>

interested in creating a computer to mimic the brain, they want to learn<br>

how to mimic the brain&#39;s fault-tolerance in computers.<br>

<br>

To paraphrase the paper, we lose a neuron a second in our brains for our<br>

entire lives, but we never notice any problems from that. This research<br>

hopes to learn how to duplicate that with this computer, so you could<br>

say hardware failures are desirable and necessary for this research.<br>

<font color="#888888"><br>

Prentice<br>

</font><div><div></div><div class="h5">

</div></div></blockquote></div><br><br clear="all"><br>-- <br>- - - - - - -   - - - - - - -   - - - - - - - <br>Nathan Moore<br>Associate Professor, Physics<br>Winona State University<br>- - - - - - -   - - - - - - -   - - - - - - -<br>
