Scaling of hydro codes
joachim at ccrl-nece.de
Thu Apr 10 12:41:47 EDT 2003
> I don't expect latency to play a role for these timings, as we are only
> communicating a reasonably low number of large arrays in every time step;
> I suppose, Cactus does the same.
> And if saturation of the switch played a role, I would expect a
> well-defined drop at some critical value of Ncpu, not a power law.
I'd say that it's simply your network that is too slow (Gbit Ethernet is not
necessarily fast!) relative to the speed of the CPUs.
Without knowing your code, I'd guess that with increasing Ncpu, the number of
communication operations and the volume of data transported increase as well.
This leads to increased communication time, while the time each CPU needs to
run through its timestep remains constant (since you scaled the problem size
~ Ncpu).
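To make that concrete, here is a toy model of the effect (all numbers — per-rank compute time, halo volume, switch bandwidth — are illustrative assumptions, not measurements): if all ranks share one switch backplane, per-step communication time grows roughly linearly with Ncpu even though the per-rank halo is constant, so efficiency degrades smoothly rather than dropping sharply at some critical Ncpu.

```python
# Toy weak-scaling model: per-rank compute time is constant, but all
# ranks share one switch backplane, so communication time grows with Ncpu.
# All numbers below are illustrative assumptions, not measurements.

T_COMP = 1.0          # seconds of compute per rank per step (constant: weak scaling)
HALO_BYTES = 8e6      # halo data each rank sends per step (constant per rank)
B_SWITCH = 1e9        # aggregate switch bandwidth in bytes/s (shared resource)

def t_step(ncpu):
    """Time per step when ncpu ranks share the switch bandwidth."""
    t_comm = ncpu * HALO_BYTES / B_SWITCH   # total traffic / shared bandwidth
    return T_COMP + t_comm

def efficiency(ncpu):
    """Weak-scaling efficiency relative to one rank."""
    return t_step(1) / t_step(ncpu)

for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"Ncpu={n:3d}  t_step={t_step(n):.3f}s  efficiency={efficiency(n):.2f}")
```

Plotted on log-log axes over a limited range of Ncpu, this smooth decay can easily be mistaken for a power law.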
But wait, if you keep the workload per CPU constant with increasing Ncpu, how
come t_step scales as 1/Ncpu at all? Am I missing something here?
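For contrast, here are the idealized expectations under the two usual scaling modes, ignoring communication entirely (the single-CPU time below is an assumed number): with strong scaling (fixed total problem) t_step shrinks like 1/Ncpu; with weak scaling (problem grown with Ncpu) t_step should stay flat. Observing t_step ~ 1/Ncpu would therefore suggest the timings are strong-scaling numbers.

```python
# Idealized per-step times, ignoring communication entirely.
# T_SERIAL is an assumed single-CPU time for the full fixed problem.
T_SERIAL = 64.0  # seconds per step on one CPU (assumption)

def t_step_strong(ncpu):
    """Fixed total problem split over ncpu ranks: time drops as 1/Ncpu."""
    return T_SERIAL / ncpu

def t_step_weak(ncpu, t_per_rank=1.0):
    """Problem grown with ncpu so work per rank is constant: time is flat."""
    return t_per_rank

for n in (1, 4, 16, 64):
    print(f"Ncpu={n:3d}  strong={t_step_strong(n):6.2f}s  weak={t_step_weak(n):4.2f}s")
```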
Anyway, you should check whether a faster network could help you (by verifying
whether the reason I suspect is valid). You might do this with MPE or Vampir
(a commercial tool from Pallas, demo licenses available), or some other way of
profiling the communication.
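If neither tool is at hand, a low-tech alternative is to time the communication phase by hand. Here is a minimal sketch of such an accumulator (pure Python with time.perf_counter for illustration; in an MPI code you would wrap the exchange calls with MPI_Wtime instead — the phase names are hypothetical):

```python
import time
from collections import defaultdict

class PhaseTimer:
    """Accumulate wall-clock time per phase (e.g. 'comm' vs 'compute')."""
    def __init__(self):
        self.totals = defaultdict(float)
        self._start = {}

    def start(self, phase):
        self._start[phase] = time.perf_counter()

    def stop(self, phase):
        self.totals[phase] += time.perf_counter() - self._start.pop(phase)

    def report(self):
        total = sum(self.totals.values()) or 1.0
        for phase, t in sorted(self.totals.items()):
            print(f"{phase:10s} {t:8.4f}s  ({100 * t / total:5.1f}%)")

# Usage: wrap each timestep's halo exchange and solver work.
timer = PhaseTimer()
for _ in range(3):                 # stand-in for the time loop
    timer.start("comm");    time.sleep(0.001); timer.stop("comm")     # halo exchange
    timer.start("compute"); time.sleep(0.002); timer.stop("compute")  # solver work
timer.report()
```

If the "comm" fraction grows with Ncpu while "compute" stays flat, the network is the bottleneck.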
Joachim Worringen - NEC C&C research lab St.Augustin
fon +49-2241-9252.20 - fax .99 - http://www.ccrl-nece.de
Beowulf mailing list, Beowulf at beowulf.org