[Beowulf] ...Re: Benchmark results

Jim Lux james.p.lux at jpl.nasa.gov
Thu Jan 8 11:46:21 EST 2004

----- Original Message -----
From: "Josip Loncaric" <josip at lanl.gov>
To: "Jim Lux" <James.P.Lux at jpl.nasa.gov>
Cc: <beowulf at beowulf.org>
Sent: Thursday, January 08, 2004 7:44 AM
Subject: Re: [Beowulf] ...Re: Benchmark results

> Jim Lux wrote:
> >
> > At 05:18 PM 1/7/2004 -0700, Josip Loncaric wrote:
> >
> >>  Despite various software/hardware network
> >> layers and longer distances involved, it should be possible to get
> >> microsecond synchronization through the network by gradually tweaking
> >> local clock offset and drift...
> >
> >
> > Most of the more recent implementations of NTP clients can do this sort
> > of thing, sending messages both ways across the link to measure the
> > latency (and latency distribution)
> I love NTP, but in its current state it does not deliver microsecond
> level synchronization.
> I just checked my old cluster (which runs NTP) and the compute nodes are
> off by 500-25000 microseconds.  Actually, most are within 1000
> microseconds, but about 20% are off by up to 25000 microseconds.  This
> happens over Fast Ethernet where ntpdate usually reports timing
> dispersion of 0-20 microseconds (based on 7 ping-pong measurements).  In
> other words, network timing is quite precise, but we'd need about 1000
> times better clock synchronization.
> NTP may need to be streamlined and tuned for clusters...

Longer integration times in the smoothing algorithm?

The limiting factor might be the instability of the clock oscillator on the
motherboard.  These oscillators aren't necessarily designed for low phase
noise, and may have terrible short-term stability.
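
To make the trade-off concrete, here is a minimal sketch in Python.  It
assumes a plain exponentially weighted moving average as a stand-in for the
smoothing step (NTP's actual clock discipline is more elaborate), and the
offsets, jitter, and drift rate are made-up numbers in microseconds, not
measurements.  A longer effective window suppresses measurement jitter when
the oscillator holds still, but lags badly when the oscillator wanders.

```python
def ewma(samples, window):
    """Exponentially weighted moving average.

    `window` is the effective integration length in samples
    (hypothetical parameter; not an NTP setting).
    """
    alpha = 1.0 / window
    estimate = samples[0]
    smoothed = []
    for sample in samples:
        estimate += alpha * (sample - estimate)
        smoothed.append(estimate)
    return smoothed


def peak_error(estimates, truth, skip):
    """Largest absolute error after discarding the first `skip` samples."""
    pairs = list(zip(estimates, truth))[skip:]
    return max(abs(e - t) for e, t in pairs)


N = 2000
SKIP = 1000  # ignore the filter's start-up transient

# Assumed measurement jitter: alternating +/-20 us around the true offset.
jitter = [20.0 if i % 2 == 0 else -20.0 for i in range(N)]

# Case 1: stable oscillator, true offset fixed at 100 us.
steady_truth = [100.0] * N
steady_meas = [t + j for t, j in zip(steady_truth, jitter)]

# Case 2: wandering oscillator, true offset grows 0.5 us per sample.
drift_truth = [100.0 + 0.5 * i for i in range(N)]
drift_meas = [t + j for t, j in zip(drift_truth, jitter)]

SHORT, LONG = 4, 64  # short and long integration windows (in samples)

steady_short = peak_error(ewma(steady_meas, SHORT), steady_truth, SKIP)
steady_long = peak_error(ewma(steady_meas, LONG), steady_truth, SKIP)
drift_short = peak_error(ewma(drift_meas, SHORT), drift_truth, SKIP)
drift_long = peak_error(ewma(drift_meas, LONG), drift_truth, SKIP)

print("steady clock:   short window %.2f us, long window %.2f us"
      % (steady_short, steady_long))
print("drifting clock: short window %.2f us, long window %.2f us"
      % (drift_short, drift_long))
```

With the stable clock the long window wins; with the drifting clock it loses,
because the average is always chasing an offset that has already moved on.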

The other thing that will bite you with the NTP approach is the asynchronous
nature of the Ethernet interface against the CPU.  The Ethernet card has
its own clock, its own logic, etc., so the time from "receiving a message"
over the wire to the CPU knowing about it could vary quite widely, and the
statistics might be non-stationary, which would kill any sort of statistical
averaging scheme.
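
As a rough illustration of why that latency matters, the sketch below applies
the standard NTP on-wire offset and delay formulas to one simulated exchange.
The timestamps, the 1000 us true offset, and the path delays are hypothetical
values chosen for the example; `simulate_exchange` is a helper invented here,
not part of any NTP implementation.  The point is that the offset formula
assumes equal delay in both directions, so any extra receive-side latency
shows up as an offset error of half the asymmetry.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    Returns (estimated offset of server relative to client, round-trip delay).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, forward, backward, t1=0.0, server_hold=50.0):
    """Build the four timestamps for one request/reply (all in microseconds).

    `forward` and `backward` are one-way latencies, including any time the
    packet spends in the NIC and interrupt path before software sees it.
    """
    t2 = t1 + forward + true_offset      # server clock reading on arrival
    t3 = t2 + server_hold                # server clock reading on reply
    t4 = (t3 - true_offset) + backward   # client clock reading on arrival
    return t1, t2, t3, t4


TRUE_OFFSET = 1000.0  # assumed: server is 1000 us ahead of client

# Symmetric path: 100 us each way.
symmetric = ntp_offset_delay(*simulate_exchange(TRUE_OFFSET, 100.0, 100.0))

# Asymmetric path: an extra 30 us of latency on the request leg only.
asymmetric = ntp_offset_delay(*simulate_exchange(TRUE_OFFSET, 130.0, 100.0))

print("symmetric:  offset %.1f us, delay %.1f us" % symmetric)
print("asymmetric: offset %.1f us, delay %.1f us" % asymmetric)
print("offset error from asymmetry: %.1f us" % (asymmetric[0] - TRUE_OFFSET))
```

If the extra latency varies from packet to packet and its distribution shifts
over time, this error term does not average to a fixed value, which is the
failure mode described above.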

You're probably right.  NTP, using just Ethernet, will probably never get to
sub-microsecond without external help, or cleverly designed specialized
Ethernet hardware (use the bit clock in the PHY layer as the time transfer
