<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: Arial; font-size: 12pt; color: #000000'><br>Larry Stewart wrote:<div><br><div><div>&gt;Designing the communications network for this worst-case pattern has a</div><div>&gt;number of benefits: &nbsp;</div><div>&gt;</div><div>&gt; &nbsp; * it makes the machine less sensitive to the actual communications pattern</div><div>&gt; &nbsp; * it makes performance less variable run-to-run, when the job controller</div><div>&gt; &nbsp; &nbsp; chooses different subsets of the system</div><div><br></div><div>I agree with this and pretty much all of your other comments, but wanted to&nbsp;</div><div>make the point that a worst-case, hardware-only solution is not required,</div><div>nor is it necessarily where all of the research and development effort should</div><div>be placed for HPC as a whole. &nbsp;And let's not forget that unless such solutions are supported</div><div>by some coincidental volume requirement in another non-HPC market,</div><div>they will cost more (sometimes a lot). &nbsp;If worst-case hardware solutions were required, then clusters</div><div>would not&nbsp;have pushed out&nbsp;their HPC predecessors, and novel high-end designs</div><div>would not find it so hard to break into the market. Lower-cost hardware solutions often</div><div>stimulate&nbsp;more software-intelligent use of the additional resources that come along</div><div>for the&nbsp;ride. &nbsp;With clusters you paid less for interconnects, memory interfaces,</div><div>and packaged software, and got to spend the savings on more memory, more</div><div>memory bandwidth (aggregate), and more processing power. &nbsp;This in turn</div><div>had an effect on the problems tackled: weak scaling an application&nbsp;was an</div><div>approach to using the memory while managing the impact of a cheaper</div><div>interconnect. 
&nbsp;</div><div><br></div><div>So, yes, let's try to banish latency with cool state-of-the-art interconnects engineered</div><div>for worst-case, not common-case, scenarios (we have been&nbsp;hearing about the benefits of</div><div>high-radix switches), but remember that interconnect cost, data locality, and partitioning</div><div>will always&nbsp;matter and may make the worst-case interconnect unnecessary.</div><div><br></div><div>&gt;There's a paper in the IBM Journal of Research and Development about this,</div><div>&gt;they wound up using simulated annealing to find good placement on the most</div><div>&gt;regular machine around, because the "obvious" assignments weren't&nbsp;optimal.</div><div><br></div><div>Can you point me at this paper? It sounds very interesting.</div><div><br></div><div>&gt;Personally, I believe our thinking about interconnects has been poisoned by thinking</div><div>&gt;that NICs are I/O devices. &nbsp;We would be better off if they were coprocessors. &nbsp;Threads</div><div>&gt;should be able to send messages by writing to registers, and arriving packets should</div><div>&gt;activate a hyperthread that has full core capabilities for acting on them, and with the</div><div>&gt;ability to interact coherently with the memory hierarchy from the same end as other</div><div>&gt;processors. &nbsp;We had started kicking this around for the SiCortex gen-3 chip, but were</div><div>&gt;overtaken by events.</div><div><br></div><div>Yes to all this ... now that everyone has made the memory controller an integral</div><div>part of the processor, we can move on to the NIC ... ;-) ...</div><div><br></div><div>rbw</div><div><br></div></div></div></div></body></html>