[pkg-ntp-maintainers] Bug#711548: Bug#711548: Fragile handling of pidfile by /etc/init.d/ntp

Sergio Gelato sergio.gelato at astro.su.se
Sat Jun 8 08:12:26 UTC 2013


On Fri, 7 Jun 2013 22:11:30 +0200, Kurt Roeckx <kurt at roeckx.be> wrote:
> On Fri, Jun 07, 2013 at 09:53:38PM +0200, Sergio Gelato wrote:
>> Package: ntp
>> Version: 1:4.2.6.p5+dfsg-2
>> 
>> The current /etc/init.d/ntp cannot recover from a situation in which
>> /var/run/ntpd.pid exists but does not contain the correct PID for the
>> running daemon.
>> 
>> How to reproduce:
>> 
>> # pgrep -f ntpd
>> 9219
>> # ps -f -p 9404
>> UID        PID  PPID  C STIME TTY          TIME CMD
>> root      9404  9395  0 20:38 pts/2    00:00:00 /bin/bash
>> # printf 9404 > /var/run/ntpd.pid
> 
> Why do you do this?

To reproduce a problem that was observed on a live system. I don't know
how the pidfile got broken; it happened right after a reboot, so I'm
guessing some kind of race between startup scripts was involved.

>> # invoke-rc.d ntp status
>> NTP server is running.
>> # echo $?
>> 0
>> # invoke-rc.d ntp restart
>> Stopping NTP server: ntpd.
>> Starting NTP server: ntpd.
>> # echo $?
>> 0
>> # pgrep -f ntpd
>> 9219
> 
> Did you expect something else, if you break the pid file?

Yes, actually. I don't particularly care about the pid file: it's just a
tool that may or may not be useful for managing the daemon. What I care
about is that "invoke-rc.d <whatever> status" and "invoke-rc.d <whatever>
restart" do what it says on the tin; the pid file is an internal
implementation detail.

> 
> It's running as 9219, but you decided to write 9404 to the file.
> So of course it didn't kill it.
> 
>> # cat /var/run/ntpd.pid
>> 9485
> 
> But you started a new one, which wrote a PID file, and then it
> died because it detected that another ntpd was still running,
> and you really only want 1 running.  It probably shouldn't have
> written the pid file in that case.  But it should never have
> gotten into that situation if you hadn't manually written something
> to that pid file.

I agree; and yet it did (on an Ubuntu precise system, so I'd report that
aspect of the problem to Launchpad and not to the Debian BTS). My concern
here is that regardless of how the pid file got corrupted, once it has been
corrupted it doesn't self-heal. I noticed because puppet started trying to
restart ntpd on every run. By default puppet relies on the status
functionality of the init script to decide whether a restart is needed; in
my testing I found that it makes better decisions (for ntp) if I tell it
not to trust the status reported by the init script. (I think it falls back
on running "ps" and parsing the output.)
> 
>> # ps -f -p 9404
>> UID        PID  PPID  C STIME TTY          TIME CMD
>> root      9404  9395  0 20:38 pts/2    00:00:00 /bin/bash
> 
> So the script did the right thing and did not kill some
> random process that happens to be written in the pid file.

Indeed I'm not complaining about that particular aspect of the problem. On
the contrary, couldn't this be part of the solution? Once the script
discovers that the pid file is inaccurate it could just delete it.
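As a rough illustration of that suggestion, here is a minimal POSIX sh sketch of a pidfile check that refuses to trust a PID belonging to some other program and deletes the file instead. It is not taken from the ntp package's init script. The function name `check_pidfile` and its arguments are hypothetical, and the sketch assumes Linux, where /proc/PID/comm holds the process's executable name (truncated to 15 characters).

```sh
#!/bin/sh
# Hypothetical helper, NOT part of the Debian ntp init script.
# Usage:  check_pidfile PIDFILE NAME
# Exit status 0: PIDFILE names a live process whose comm is NAME.
# Exit status 1: PIDFILE was missing or stale; a stale file is deleted.
# Assumes Linux /proc. POSIX sh has no "local", so these variables are global.
check_pidfile() {
    pidfile=$1
    name=$2

    # No readable pidfile: nothing to trust.
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")

    # Anything other than a plain non-empty digit string is garbage.
    case $pid in
        ''|*[!0-9]*) rm -f "$pidfile"; return 1 ;;
    esac

    # Trust the PID only if that process exists and is actually NAME.
    if [ -r "/proc/$pid/comm" ] && [ "$(cat "/proc/$pid/comm")" = "$name" ]; then
        return 0
    fi

    # The PID is dead or belongs to an unrelated process (e.g. a shell):
    # remove the file rather than keep reporting or acting on it.
    rm -f "$pidfile"
    return 1
}
```

With the pidfile from the reproduction (9404, a /bin/bash), `check_pidfile /var/run/ntpd.pid ntpd` would see `bash` rather than `ntpd`, remove the file and return 1. That only covers the "delete it" half: finding the ntpd that is actually running (9219) would need a separate lookup, e.g. `pgrep -x ntpd`, which this sketch does not attempt.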
> 
> 
> Kurt


