[Beowulf] ...Re: Benchmark results

Jim Lux James.P.Lux at jpl.nasa.gov
Wed Jan 7 19:35:36 EST 2004

At 05:18 PM 1/7/2004 -0700, Josip Loncaric wrote:
>Jim Lux wrote:
>>Josip Loncaric wrote:
>>>Here is my challenge to the Beowulf and MPI communities: Achieve 
>>>microsecond-level *global* synchronization and tracking of all system 
>>>clocks within a cluster, for application use via portable calls 
>>>gettimeofday() and/or MPI_Wtime().
>>Is added hardware on each node legal along with distributing some 
>>periodic time mark (like a 1 pulse per second from a GPS receiver)?
>I do not think this would be required, especially on clusters which 
>already have a high performance network.  These days, 1/2 round trip time 
>can be as low as 2.4 microseconds (Quadrics).  Even Fast Ethernet may 
>prove sufficient (over time) if the delays are not too noisy.  BTW, the 
>objective here is to support parallel timings within a cluster, so "global 
>time" is still "local to this cluster only", although one may want to run 
>NTP on the head node to stay close to the official time.
>Back when VXIbus was introduced, I really liked its extra trigger lines 
>which could synchronize multiple instruments within a few nanoseconds, but 
>it is unlikely that people would distribute clock that way in a 
>cluster.  Despite various software/hardware network layers and longer 
>distances involved, it should be possible to get microsecond 
>synchronization through the network by gradually tweaking local clock 
>offset and drift...

Most of the more recent NTP client implementations can do this sort of 
thing: they send messages both ways across the link to measure the latency 
(and the latency distribution).  There's a fairly big literature on 
synchronizing clocks over noisy, unreliable, non-deterministic 
communication links. (Some of the basic theory is tied into what's called 
the "Byzantine Generals" problem.)
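
As a rough illustration of the two-way measurement, here is a minimal sketch 
of the standard NTP offset and round-trip delay arithmetic from four 
timestamps. The names (Exchange, offset_and_delay, best_offset) are 
hypothetical, and keeping the sample with the smallest round-trip delay is 
just one simple way to use the latency distribution, not necessarily what 
any particular NTP client does.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One request/reply pair; all four timestamps in the same unit."""
    t1: float  # request sent     (client clock)
    t2: float  # request received (server clock)
    t3: float  # reply sent       (server clock)
    t4: float  # reply received   (client clock)


def offset_and_delay(x):
    """Return (offset, delay) for one exchange.

    offset is the estimated server-minus-client clock difference and is
    exact only if the two one-way delays are equal; delay is the round
    trip with the server's processing time removed.
    """
    offset = ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2.0
    delay = (x.t4 - x.t1) - (x.t3 - x.t2)
    return offset, delay


def best_offset(exchanges):
    """Assumed filter: trust the sample with the smallest round-trip delay,
    since it had the least room for asymmetric queueing."""
    return min((offset_and_delay(x) for x in exchanges),
               key=lambda od: od[1])[0]
```

With a server running 100 units ahead and a symmetric 5-unit one-way delay, 
Exchange(0, 105, 106, 11) yields an offset of 100 and a delay of 10. An 
exchange whose reply is held up in a queue gives a biased offset but also a 
larger delay, which is why the filter discards it.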

I wasn't so much advocating synchronization in an absolute sense as 
pointing out that one can synchronize a cluster to much better than a 
microsecond if one can distribute a common time mark to all processors in 
some way.  Even a really crummy communications network, plus a single time 
hack, can do it.  In some ways, I guess this could be considered a classic 
barrier synchronization scheme: conceptually, the time synchronization 
process in each thread gets to the barrier and waits for the pulse to 
release it.  In my own implementation, part of the work was done by 
moderately specialized hardware.  DSPs, while reasonably deterministic in 
their execution time, weren't good enough.
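
To make the arithmetic of the time-mark idea concrete, here is a minimal 
sketch, not my actual implementation (which relied on hardware). It assumes 
each node can capture its local clock reading at the instant the common 
pulse arrives, and that a label saying which instant that pulse marked 
arrives later over the slow network. The function names are hypothetical.

```python
def offsets_from_pulse(latched, pulse_time):
    """Compute a per-node correction from one shared pulse.

    latched:    dict of node name -> local clock reading captured at the
                pulse edge (assumed to be latched with negligible jitter).
    pulse_time: the cluster-wide time assigned to that pulse; it can be
                delivered late, as long as it is matched to the right pulse.
    """
    return {node: pulse_time - local for node, local in latched.items()}


def to_global(local_reading, offset):
    """Convert a later local clock reading to cluster time."""
    return local_reading + offset
```

For example, if the pulse marks cluster time 1000.0 and two nodes latched 
1000.25 and 999.5, their corrections are -0.25 and +0.5. The network latency 
never enters the result; only the accuracy of the latch does, which is where 
the specialized hardware earns its keep.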

James Lux, P.E.
Spacecraft Telecommunications Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875
