[Beowulf] ...Re: Benchmark results
James.P.Lux at jpl.nasa.gov
Thu Jan 8 13:25:53 EST 2004
At 11:13 AM 1/8/2004 -0700, Josip Loncaric wrote:
>Jim Lux wrote:
>>From: "Josip Loncaric" <josip at lanl.gov>
>>>NTP may need to be streamlined and tuned for clusters...
>>Longer integration times in the smoothing algorithm?
>Better yet, extended Kalman filtering (given my control theory background).
>BTW, there was a highly relevant paper 9 years ago:
>"On Efficiently Implementing Global Time for Performance Evaluation on
>Multiprocessor Systems," Eric Maillet and Cecile Tron, Journal of Parallel
>and Distributed Computing, No. 28, pp. 84-93 (1995).
I'll get a copy and take a look.
>Their experimental analysis of the relationship between the system clock
>and the reference clock over a 50-minute period showed a good fit to a
>linear model, within about 50-100 microseconds, with slow 16-minute
>periodicity apparently related to thermal swings in the computer room (A/C
>These measurements were done using their T-Node "Meganode" system (128
>T800 transputers) on the Transputer network, older than 1995 vintage,
>probably with clock frequency of about 3.8 MHz. Current networks are much
>faster, although clock drift behavior may still be similar.
What was the measurement/integration interval for their frequency
measurements? Robert (and I, and others) are interested in stability over
fairly short time scales (milliseconds?).
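One way to make the integration-interval question concrete is the Allan deviation, a standard measure of oscillator stability as a function of averaging time. The sketch below is an illustration only and is not taken from the Maillet and Tron paper: the function name `allan_deviation` and the synthetic sample list are assumptions, and it uses the simple non-overlapping estimator on fractional-frequency samples taken at a fixed interval.

```python
import math


def allan_deviation(freq, m):
    """Non-overlapping Allan deviation of fractional-frequency samples.

    freq: samples taken at a fixed interval tau0 (hypothetical input).
    m:    number of samples averaged per block, so tau = m * tau0.
    """
    n_blocks = len(freq) // m
    if n_blocks < 2:
        raise ValueError("need at least two averaging blocks")
    # Average the samples in consecutive blocks of length m.
    means = [sum(freq[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    # Squared differences between adjacent block averages.
    diffs = [(means[i + 1] - means[i]) ** 2 for i in range(n_blocks - 1)]
    return math.sqrt(sum(diffs) / (2 * len(diffs)))


# Synthetic series (made up): frequency flips between +1 and -1 every sample.
alternating = [1.0, -1.0] * 8
short_tau = allan_deviation(alternating, 1)  # instability visible at tau0
long_tau = allan_deviation(alternating, 2)   # averaged away at 2 * tau0
```

With this series the deviation is large at the single-sample interval and vanishes once two samples are averaged, which is the sense in which a long integration interval can hide short-term instability.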
>Presumably, clock drift coefficients could be correlated to motherboard
>temperature measurements. Over time, one should be able to construct a
>linearized model of clock dynamics, linking the local clock drift,
>motherboard temperature, and possibly even power supply voltages. This
>should overcome the main sources of timing uncertainty.
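A minimal sketch of the kind of linearized model described above, assuming clock drift is modelled as a straight-line function of motherboard temperature alone. The helper `fit_line` and the temperature and drift numbers are made up for illustration; real values would have to come from measurements on each node, and supply voltage could be added as a further regressor.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical readings: motherboard temperature (deg C) and the clock
# drift (ppm) observed against a reference at that temperature.
temps_c = [20.0, 25.0, 30.0]
drift_ppm = [10.0, 12.5, 15.0]

ppm_per_deg, ppm_at_zero = fit_line(temps_c, drift_ppm)

# Predicted drift at a temperature that was not measured directly.
predicted_ppm = ppm_per_deg * 27.0 + ppm_at_zero
```

A fit like this only captures the slow, temperature-driven part of the drift; it says nothing about behavior between samples.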
I wouldn't be too sure that you could build a suitable model. One could
certainly make a very accurate predictor in the long term sense: it would
very accurately model things like how many cycles of the processor clock
there are in every 10 or 60 second interval. But at shorter time scales,
the clocks MIGHT be pretty bad, with random jumps and stuff. I don't know,
but since it's relevant to my current research, I'm going to find out,
The run-of-the-mill clock oscillators you see on a motherboard do not have
very good short-term characteristics. For instance, you can't use them for
timing digital communications interfaces like SONET or OC-3 or ATM, where
they use much higher quality oscillators, with better jitter specs.
A typical processor isn't going to care much if there's, say, 10% jitter
(which is really hideously bad) in the clock supplied to it, provided the
clock multiplier loops on the chip can stay locked. Likewise, since the
processor is, generally, synchronously clocked logic (except for those of
you running PDP-8 based clusters!), instantaneous variations in clock rate
don't do much, other than set a bound on timing margins (more jitter means
you need more timing margin in the propagation paths on chip).
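The timing-margin point can be put in numbers with a short sketch. It assumes that "10% jitter" means an individual cycle can be up to 10% shorter than nominal, and the 100 MHz clock is a hypothetical figure chosen only to keep the arithmetic simple; `worst_case_period_ns` is an illustrative name.

```python
def worst_case_period_ns(freq_mhz, jitter_fraction):
    """Shortest clock period in ns, assuming a cycle can shrink by up to
    jitter_fraction of the nominal period (an assumed jitter definition)."""
    nominal_ns = 1000.0 / freq_mhz
    return nominal_ns * (1.0 - jitter_fraction)


# Hypothetical 100 MHz clock: nominal period is 10 ns.
nominal = worst_case_period_ns(100.0, 0.0)
shortest = worst_case_period_ns(100.0, 0.10)

# Every logic path must settle within the shortest cycle, so the jitter
# directly eats into the time available for propagation.
margin_lost_ns = nominal - shortest
```

Under these assumptions the on-chip paths have to settle in about 9 ns rather than 10 ns, so roughly 1 ns of each cycle goes to covering jitter.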
If you plot the frequency versus time for an oscillator, a "nice"
oscillator will show a fairly smooth monotonic drift with aging (usually
they drift up with time) and smooth variations with temperature and things
like gravitational force or acceleration (if you change the orientation of
the oscillator). A "bad" oscillator will have spontaneous jumps up and down.
We're talking parts per million here, though, just to keep things in
>P.S. Internally, clusters often have a single broadcast domain. Even if
>the head node broadcast an augmented NTP packet every second, this would
>still not impact the network much -- but may generate enough timing data
>so that each node's filter would have at least some good measurements to
James Lux, P.E.
Spacecraft Telecommunications Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109