[Beowulf] ...Re: Benchmark results

Robert G. Brown rgb at phy.duke.edu
Thu Jan 8 11:34:26 EST 2004


On Thu, 8 Jan 2004, Josip Loncaric wrote:

> Jim Lux wrote:
> > 
> > At 05:18 PM 1/7/2004 -0700, Josip Loncaric wrote:
> > 
> >>  Despite various software/hardware network 
> >> layers and longer distances involved, it should be possible to get 
> >> microsecond synchronization through the network by gradually tweaking 
> >> local clock offset and drift...
> > 
> > 
> > Most of the more recent implementations of NTP clients can do this sort 
> > of thing, sending messages both ways across the link to measure the 
> > latency (and latency distribution) 
> 
> I love NTP, but in its current state it does not deliver microsecond 
> level synchronization.
> 
> I just checked my old cluster (which runs NTP) and the compute nodes are 
> off by 500-25000 microseconds.  Actually, most are within 1000 
> microseconds, but about 20% are off by up to 25000 microseconds.  This 
> happens over Fast Ethernet where ntpdate usually reports timing 
> dispersion of 0-20 microseconds (based on 7 ping-pong measurements).  In 
> other words, network timing is quite precise, but we'd need about 1000 
> times better clock synchronization.
> 
> NTP may need to be streamlined and tuned for clusters...

I'm not sure that the problem is NTP itself -- from what I recall
reading in its documentation, it is CAPABLE of achieving 1 usec
synchronization, over time.  However, in order to do that it needs
better tools than the current gettimeofday (and other time calls),
which seem to deliver no better than perhaps millisecond resolution.
It also very likely needs to be done not by a userspace daemon but by
the kernel, as only the kernel can drop into a true "single user mode"
for specified brief periods and pretty much ignore interrupts, pending
tasks, and internal sources of delay and noise.
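
To make the timer question concrete, here is a quick illustrative
sketch (untested, not taken from any existing tool; it assumes GCC on
ia32/x86-64 hardware with a reasonably constant-rate TSC) that compares
the smallest observable step of gettimeofday() with the granularity of
back-to-back reads of the CPU cycle counter via RDTSC:

/* Untested sketch: compare gettimeofday() granularity with the CPU
 * cycle counter.  Assumes GCC inline asm on ia32/x86-64. */
#include <stdio.h>
#include <sys/time.h>

/* Read the 64-bit time-stamp counter with the RDTSC instruction. */
static inline unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
    struct timeval t0, t1;
    unsigned long long c0, c1;

    /* Spin until gettimeofday() reports a new value; the difference is
     * the smallest step it can resolve on this machine. */
    gettimeofday(&t0, NULL);
    do {
        gettimeofday(&t1, NULL);
    } while (t1.tv_sec == t0.tv_sec && t1.tv_usec == t0.tv_usec);
    printf("gettimeofday step: %ld usec\n",
           (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec));

    /* Granularity of two back-to-back cycle counter reads, in cycles. */
    c0 = rdtsc();
    c1 = rdtsc();
    printf("back-to-back rdtsc: %llu cycles\n", c1 - c0);
    return 0;
}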

If one used the internal hardware cycle counter as the timer (with
sub-nanosecond hardware resolution) and worked solely on iteratively
refining a single clock set/sync variable (basically the offset of the
CPU cycle counter relative to the epoch), then one could IN PRINCIPLE
get below a microsecond even over Fast Ethernet, given enough
handshaking.  It might take a few tens of thousands of packets per node
on an otherwise quiet network, but eventually both ends would agree on
packet timings, reject exchanges that fell outside the empirically
determined minimum/optimum packet timing for the particular hardware,
and settle down to let the central limit theorem and enough exchanges
reduce the error bars to the desired level of precision.
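
To sketch what "agree on packet timings and reject the outliers" might
look like in code (purely illustrative -- struct exchange,
estimate_offset() and slack_usec are made-up names, and the timestamps
are assumed to already be converted to microseconds), using the usual
four-timestamp exchange where t1/t4 are local send/receive and t2/t3
are remote receive/send:

#include <stdio.h>
#include <math.h>

struct exchange {
    /* usec: local send, remote recv, remote send, local recv */
    double t1, t2, t3, t4;
};

/* Estimate the remote-minus-local clock offset from n ping-pong
 * exchanges: find the minimum round-trip time, throw away exchanges
 * that took noticeably longer than that, and average the rest. */
static double estimate_offset(const struct exchange *x, int n,
                              double slack_usec)
{
    double min_rtt = INFINITY, sum = 0.0;
    int i, kept = 0;

    /* Pass 1: best (minimum) round-trip time seen on this link. */
    for (i = 0; i < n; i++) {
        double rtt = (x[i].t4 - x[i].t1) - (x[i].t3 - x[i].t2);
        if (rtt < min_rtt)
            min_rtt = rtt;
    }

    /* Pass 2: reject delayed exchanges (queueing, interrupts, ...) and
     * average the offset estimates of the ones near the minimum RTT. */
    for (i = 0; i < n; i++) {
        double rtt = (x[i].t4 - x[i].t1) - (x[i].t3 - x[i].t2);
        if (rtt <= min_rtt + slack_usec) {
            sum += ((x[i].t2 - x[i].t1) + (x[i].t3 - x[i].t4)) / 2.0;
            kept++;
        }
    }
    printf("kept %d of %d exchanges (min RTT %.1f usec)\n", kept, n, min_rtt);
    return kept ? sum / kept : 0.0;
}

int main(void)
{
    /* Two synthetic exchanges with a true offset of +12.5 usec; the
     * second has a delayed return packet and should be rejected. */
    struct exchange demo[2] = {
        { 0.0,   52.5,  60.0,  87.5 },   /* symmetric path, RTT 80 usec  */
        { 100.0, 152.5, 160.0, 387.5 },  /* delayed return, RTT 280 usec */
    };
    printf("offset = %.1f usec\n", estimate_offset(demo, 2, 10.0));
    return 0;
}

The RTT filter is what rejects exchanges perturbed by interrupts or
queueing; averaging the survivors over many thousands of exchanges is
where the central limit theorem pulls the error bar down.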

That's why I think it is a reasonable project for a bright
undergraduate.  It doesn't seem to me that it would be that difficult,
at least not at the cluster/LAN/one-hop level, where the networking
hardware can be at least moderately deterministic.  NTP is really a WAN
tool, robust and portable at the expense of potentially achievable
precision.

   rgb

> 
> Sincerely,
> Josip
> 

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





