Bug#396937: [pkg-ntp-maintainers] Bug#396937: Backgrounded ntpdate from ifup races with hwclock

Sat Nov 4 10:32:13 CET 2006

Re,

On Fri, Nov 03, 2006 at 11:19:42PM +0100, Kurt Roeckx wrote:
> On Fri, Nov 03, 2006 at 08:51:03PM +0100, beck wrote:
> > Package: ntpdate
> > Version: 1:4.2.2.p4+dfsg-1
> > Severity: normal
> > 
> > it appears that the new way ntpdate-debian is called from if-up.d (starting
> > it as a background process to prevent delays during boot) can lead to a
> > race with hwclock.sh which is called later from rcS. In my case, ntpdate
> > was seemingly started after the interface went up and ran a while in the
> > background. In the meantime, hwclock.sh corrected the clock according to
> > CMOS clock information. At this point, the clock is correct, but ntpdate
> > is still not completed. When it finally completes, it corrects the clock
> > by another hour, resulting in a wrong clock. The unsatisfactory result is
> > that though I have a correct CMOS clock, working NTP servers and I am
> > running both ntpdate and a local NTP server, I end up with the wrong
> > time. Due to the large offset, ntpd will not fix the clock, either.
> 
> ntpdate should never adjust the clock wrong by an hour, it should set it
> correct.

Yep, but it does. I equipped the relevant loop in rc with a date(1)
call after every single init script, which revealed that the time was
wrong (by misinterpretation of the CMOS clock as UTC) throughout the boot
process until S50hwclock.sh fixed it (which up to this point is expected
behavior). Both the output from that script (I even ran it with -xv) and
the date(1) immediately following it showed the correct time.
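
To make the size and direction of that initial error concrete, here is a
small sketch of what misreading a local-time CMOS clock as UTC does. The
fixed +1h CET offset and the sample reading are assumptions chosen for
illustration; this is not how hwclock itself is implemented.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET modelled as a fixed UTC+1 offset (no DST in November).
CET = timezone(timedelta(hours=1), "CET")

def misread_as_utc(cmos_wall: datetime) -> datetime:
    """Interpret a naive CMOS reading as UTC (the wrong assumption here)."""
    return cmos_wall.replace(tzinfo=timezone.utc)

def read_as_local(cmos_wall: datetime) -> datetime:
    """Interpret the same reading as local time (CET), which is correct here."""
    return cmos_wall.replace(tzinfo=CET)

# Hypothetical CMOS reading; the clock is kept in local time.
cmos = datetime(2006, 11, 4, 10, 32, 13)

# Positive error means the system clock runs ahead of true time.
error = misread_as_utc(cmos) - read_as_local(cmos)
print(error)  # 1:00:00
```

So until hwclock.sh runs, the kernel clock is one hour ahead of true time.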

Then, later and somehow magically, the time changed again to a value 1h
in the past. It happened after some init scripts (like exim startup) that
will not touch the clock by themselves. The only explanation I have for
this behavior is that the backgrounded ntpdate-debian hosed the clock
after finally getting data from its NTP servers. You can probably
reproduce this easily by listing one or two unreachable servers first in
the sequence.

> I think your problem is that hwclock is started after ntpdate.

At least this is way too late for hwclock as we all agree - and running
hwclock at a more proper time would likely fix it. What remains is the
knowledge that ntpdate does something silly, though - when it runs over
a macroscopic timescale due to unreachable servers or similar delays
and something else changes the kernel clock during this time, it might
end up offsetting the time *again*. Apparently it assumes it is the only
tool that controls the clock, and everything works perfectly when it is.
But now that it runs backgrounded, other tools might interfere. There is
probably not only hwclock, but other time correction tools that use
various sources might collide with it as well. IMO this should be fixed
upstream, even when there cannot be a perfect fix (a small chance for
a race condition will remain).
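
The suspected mechanism can be sketched as a toy simulation. It assumes
that the time client measures an offset early and applies that same offset
later, relative to whatever the clock shows by then; that is the hypothesis
from this mail, not something verified against ntpdate's source. All names
(SimClock, measure_offset, run_boot) are made up for the illustration.

```python
from datetime import timedelta

HOUR = timedelta(hours=1)

class SimClock:
    """Toy system clock, tracked only as its error relative to true time."""
    def __init__(self, error: timedelta):
        self.error = error

    def step(self, delta: timedelta) -> None:
        """Step the clock by delta, as a settimeofday-style correction would."""
        self.error += delta

def measure_offset(clock: SimClock) -> timedelta:
    """Offset needed to make the clock correct at the moment of measuring."""
    return -clock.error

def run_boot(apply_stale_offset: bool) -> timedelta:
    clock = SimClock(error=HOUR)      # CMOS local time misread as UTC: 1h ahead
    stale = measure_offset(clock)     # backgrounded client samples early (-1h)
    clock.step(-clock.error)          # meanwhile hwclock.sh fixes the clock
    # The client finishes late, e.g. after waiting on unreachable servers.
    offset = stale if apply_stale_offset else measure_offset(clock)
    clock.step(offset)
    return clock.error

print(run_boot(True).total_seconds() / 3600)   # -1.0
print(run_boot(False).total_seconds() / 3600)  # 0.0
```

With the stale offset the clock ends up one hour in the past, which matches
what I observed; re-measuring right before stepping shrinks the window for
the race but cannot close it completely.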

> > My CMOS clock is running in local time (intentionally) which is CET.
> 
> Do you know about #342887 and all its merged bugs, which is supposed to
> be fixed in util-linux 2.12r-13 just a few days ago?  It had a problem
> with time not in UTC.

Oh my goodness, what a biblical thread. No, I didn't know about it yet, but
it explains a lot. It seems what caused my problem in the first place was
the former "fix" that deactivated hwclockfirst.sh which used to fix the
clock in the early boot process.

> It seems to have moved to S8 according to the changelog, and I believe
> that should fix your problem.

In -14 it's now back at S11 and this might not be the last word, as there
is good reason it should run before checkroot. The old solution wasn't that
bad after all, it seems - it always worked for me, as I have no dedicated
filesystem for /usr. But hwclock.sh tries to write to / so it cannot be
easily moved before checkroot. What a mess ;)

> Can you please check that it works with the new util-linux?  I intend to
> reassign this to util-linux and close it.

Sadly, -13 (and above) are not yet in testing and might need some more
time to get there due to the freeze. I'll try to just move the existing
hwclock.sh to S11, which will likely fix the issue. Hopefully the chaos
around hwclock gets cleaned up before etch hits the streets.

Thanks,
Andre.
-- 
                  The _S_anta _C_laus _O_peration
  or "how to turn a complete illusion into a neverending money source"

-> Andre Beck    +++ ABP-RIPE +++    IBH Prof. Dr. Horn GmbH, Dresden <-