<html><body>


<BLOCKQUOTE style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #1010ff 2px solid">

<P>-------------- Original message -------------- <BR>From: Håkon Bugge &lt;Hakon.Bugge@scali.com&gt; <BR><BR>&gt; I have a slightly different view. Hybrid <BR>&gt; programming is used for performance reasons, but <BR>&gt; only in cases where parallelization (to the same <BR>&gt; level) is impossible/impractical using the pure <BR>&gt; MPI mode, or the parallelization yields low <BR>&gt; efficiency. So, if you're able to achieve your <BR>&gt; performance with MPI, you probably will. But <BR>&gt; there are cases where you cannot; a) the <BR>&gt; "decomposition parallel efficiency" is not good <BR>&gt; enough or b) the processes need a huge (shared) table. <BR></P>

<P>I think what is being said here is that an application may be decomposable along some of its dimensions, but not all of them.&nbsp; If the performance benefit of managing the "unruly" dimensions locally is great enough, then a hybrid program may be worth the trouble.&nbsp; I suspect the number of real-world apps in this class is not large, or there would be more hybrid code.</P>

<P>Another perhaps relevant alternative, one that will at some point be able to take on both the partitionable and unpartitionable extremes and everything in between, is the PGAS language extensions (UPC and CAF).&nbsp; They are not yet at performance parity with well-coded MPI on distributed memory, but they have, arguably, an intrinsic programmability advantage in LOC and in data structure coverage.&nbsp; AMR codes tracking shedding vortices are inherently non-partitionable (or in need of regular repartitioning).&nbsp; Managing them in either MPI or OpenMP in a distributed memory environment is a chore.</P>

<P>And if you believe that ... ;-) ... then there is of course the "magic" of many-threaded latency hiding (can't say I am a true believer for the data-intensive OZ of HPC).&nbsp; Some would have you believe that a 64-thread, 8-core Niagara 2 (or perhaps a future design with some higher ratio of active threads to cores) can hide all your data latency events behind its active thread horizon.</P>

<P>Maybe the key is to combine PGAS with many-threads ... mmm ... anyone doing this?</P>

<P>;-) </P>

<P>rbw</P>

<P>-- <BR><BR>"Making predictions is hard, especially about the future." <BR><BR>Niels Bohr <BR><BR>-- <BR><BR>Richard Walsh <BR>Thrashing River Consulting <BR>5605 Alameda St. <BR>Shoreview, MN 55126 <BR><BR>Phone #: 612-382-4620</P></BLOCKQUOTE>



</body></html>