[pkg-dhcp-devel] Bug#704175: closed by Michael Gilbert <mgilbert at debian.org> (Re: Bug#704175: isc-dhcp-server: init script removes dhcpd.pid)

Alan Sundell sundell at gmail.com
Sun Sep 6 00:33:21 UTC 2015


Huh? I don't think you read the bug.  You seem to be confusing bug symptoms
with intent.

Why would anyone *try* to run two dhcpds off the same config file and
corrupt their lease database and confuse a failover partner? The intent
here is to run a perfectly normal dhcpd setup.

But the pid file is used for concurrency protection by dhcpd. That "rm -f"
of the pid file outside of dhcpd interferes with this locking and
introduces a race condition that allows two dhcpds to start, which is a
*bad thing*, hence the bug report.
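
To make the interleaving concrete, here is a minimal sketch, assuming a pid-file check of the general kind dhcpd performs (refuse to start if the recorded pid is alive). The helper try_start and the demo.pid path are hypothetical and only for illustration; this is not dhcpd's actual code:

```shell
#!/bin/sh
# Hypothetical stand-in for a daemon's pid-file check (not dhcpd's real code):
# refuse to start while the pid recorded in the pid file is still alive.
try_start() {
    pidfile=$1
    newpid=$2
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "refused"
        return 1
    fi
    echo "$newpid" > "$pidfile"
    echo "started"
}

dir=$(mktemp -d)
pidfile="$dir/demo.pid"

# $$ (this shell) plays the part of a dhcpd that is still running.
first=$(try_start "$pidfile" "$$")             # no pid file yet
second=$(try_start "$pidfile" "$$") || true    # live pid on record
rm -f "$pidfile"                               # what the init script's stop) does
third=$(try_start "$pidfile" "$$")             # check no longer sees the live pid

echo "$first $second $third"
rm -rf "$dir"
```

This prints "started refused started": once the pid file is removed from outside, the third start attempt succeeds even though the process it recorded is still alive.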

My guess is it's almost never safe to remove a pid file like this, though
for daemons that have other concurrency protection (such as bind()
failures), it would not be as important as it is for dhcpd.

On Sun, Sep 6, 2015 at 7:30 AM, Debian Bug Tracking System <
owner at bugs.debian.org> wrote:

> This is an automatic notification regarding your Bug report
> which was filed against the isc-dhcp-server package:
>
> #704175: isc-dhcp-server: init script removes dhcpd.pid
>
> It has been closed by Michael Gilbert <mgilbert at debian.org>.
>
> Their explanation is attached below along with your original report.
> If this explanation is unsatisfactory and you have not received a
> better one in a separate message then please contact Michael Gilbert
> <mgilbert at debian.org> by replying to this email.
>
>
> --
> 704175: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=704175
> Debian Bug Tracking System
> Contact owner at bugs.debian.org with problems
>
>
> ---------- Forwarded message ----------
> From: Michael Gilbert <mgilbert at debian.org>
> To: 704175-close at bugs.debian.org
> Cc:
> Date: Sat, 5 Sep 2015 19:27:21 -0400
> Subject: Re: [pkg-dhcp-devel] Bug#704175: isc-dhcp-server: init script
> removes dhcpd.pid
> On Thu, Mar 28, 2013 at 5:55 PM, Alan Sundell wrote:
> > In debian/isc-dhcp-server.init.d, there is the following code:
> >
> >         stop)
> >                 log_daemon_msg "Stopping $DESC" "$NAME"
> >                 start-stop-daemon --stop --quiet --pidfile "$DHCPD_PID"
> >                 log_end_msg $?
> >                 rm -f "$DHCPD_PID"
> >                 ;;
> >
> > So, you can end up in a situation like this:
> >    process a: stops dhcpd
> >    process b: starts dhcpd, which writes pid file
> >    process a: removes pid file
> >    process a: starts dhcpd, which writes another pid file
>
> I think you want to use different DHCPD_CONF files for your two server
> processes, each with different pid-file-name settings, to avoid this
> problem.
>
> Best wishes,
> Mike
>
> ---------- Forwarded message ----------
> From: Alan Sundell <sundell at gmail.com>
> To: Debian Bug Tracking System <submit at bugs.debian.org>
> Cc:
> Date: Thu, 28 Mar 2013 17:55:55 -0400
> Subject: isc-dhcp-server: init script removes dhcpd.pid
> Package: isc-dhcp-server
> Version: 4.2.4-5
> Severity: important
>
> Dear Maintainer,
>
> In debian/isc-dhcp-server.init.d, there is the following code:
>
>         stop)
>                 log_daemon_msg "Stopping $DESC" "$NAME"
>                 start-stop-daemon --stop --quiet --pidfile "$DHCPD_PID"
>                 log_end_msg $?
>                 rm -f "$DHCPD_PID"
>                 ;;
>
> So, you can end up in a situation like this:
>    process a: stops dhcpd
>    process b: starts dhcpd, which writes pid file
>    process a: removes pid file
>    process a: starts dhcpd, which writes another pid file
>
> Here's an strace from when I removed the pidfile and started dhcpd
> immediately afterwards, while another dhcpd was running:
>
> 10698 13:47:26 bind(5, {sa_family=AF_PACKET, proto=0x6574, if12392,
> pkttype=PACKET_HOST, addr(0)={0, }, 16) = 0
> 10698 13:47:26 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 7
> 10698 13:47:26 ioctl(7, SIOCGIFHWADDR, {ifr_name="eth0",
> ifr_hwaddr=aa:00:00:15:cb:1d}) = 0
> 10698 13:47:26 close(7)                 = 0
> 10698 13:47:26 setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER,
> "\v\0\0\0\0\0\0\0\340\373\237/\35\177\0\0", 16) = 0
> 10698 13:47:26 fcntl(5, F_SETFD, FD_CLOEXEC) = 0
> 10698 13:47:26 socket(PF_PACKET, SOCK_PACKET, 768) = 7
> 10698 13:47:26 bind(7, {sa_family=AF_PACKET, proto=0x6c6f, if0,
> pkttype=PACKET_HOST, addr(0)={0, }, 16) = 0
> 10698 13:47:26 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 8
> 10698 13:47:26 ioctl(8, SIOCGIFHWADDR, {ifr_name="lo",
> ifr_hwaddr=00:00:00:00:00:00}) = 0
> 10698 13:47:26 close(8)                 = 0
> 10698 13:47:26 setsockopt(7, SOL_SOCKET, SO_ATTACH_FILTER,
> "\v\0\0\0\0\0\0\0\340\373\237/\35\177\0\0", 16) = 0
> 10698 13:47:26 fcntl(7, F_SETFD, FD_CLOEXEC) = 0
> 10698 13:47:26 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 8
> 10698 13:47:26 setsockopt(8, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 10698 13:47:26 bind(8, {sa_family=AF_INET, sin_port=htons(67),
> sin_addr=inet_addr("0.0.0.0")}, 16) = 0
> 10698 13:47:26 fcntl(8, F_SETFD, FD_CLOEXEC) = 0
> 10698 13:47:26 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 9
> 10698 13:47:26 fcntl(9, F_SETFD, FD_CLOEXEC) = 0
> 10698 13:47:26 setsockopt(9, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 10698 13:47:26 bind(9, {sa_family=AF_INET, sin_port=htons(7911),
> sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
> 10698 13:47:26 close(9)                 = 0
> 10698 13:47:26 close(9)                 = -1 EBADF (Bad file descriptor)
> 10698 13:47:26 sendto(3, "<163>May 14 13:47:26 dhcpd: Can'"..., 77,
> MSG_NOSIGNAL, NULL, 0) = 77
> 10698 13:47:26 write(6, "\nfailover peer \"xxxxxx\" stat"..., 131) = 131
> 10698 13:47:26 fsync(6)                 = 0
> 10698 13:47:26 sendto(3, "<166>May 14 13:47:26 dhcpd: fail"..., 83,
> MSG_NOSIGNAL, NULL, 0) = 83
> 10698 13:47:26 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 9
> 10698 13:47:26 bind(9, {sa_family=AF_INET, sin_port=htons(0),
> sin_addr=inet_addr("172.30.192.69")}, 16) = 0
> 10698 13:47:26 fcntl(9, F_SETFD, FD_CLOEXEC) = 0
> 10698 13:47:26 setsockopt(9, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 10698 13:47:26 fcntl(9, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
> 10698 13:47:26 connect(9, {sa_family=AF_INET, sin_port=htons(520),
> sin_addr=inet_addr("172.24.154.70")}, 16) = -1 EINPROGRESS (Operation now
> in progress)
> 10698 13:47:26 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 10
> 10698 13:47:26 fcntl(10, F_SETFD, FD_CLOEXEC) = 0
> 10698 13:47:26 setsockopt(10, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> 10698 13:47:26 bind(10, {sa_family=AF_INET, sin_port=htons(520),
> sin_addr=inet_addr("172.30.192.69")}, 16) = -1 EADDRINUSE (Address already
> in use)
> 10698 13:47:26 close(10)                = 0
> 10698 13:47:26 close(10)                = -1 EBADF (Bad file descriptor)
> 10698 13:47:26 clone(child_stack=0,
> flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x7f1d2f72a9d0) = 10699
> 10698 13:47:26 exit_group(0)            = ?
> 10699 13:47:26 open("/var/run/dhcp-server/dhcpd.pid", O_RDONLY) = -1
> ENOENT (No such file or directory)
> 10699 13:47:26 open("/var/run/dhcp-server/dhcpd.pid",
> O_WRONLY|O_CREAT|O_TRUNC, 0644) = 10
> 10699 13:47:26 write(10, "10699\n", 6)  = 6
>
> dhcpd has some checks for bind() failures, but they don't seem to be
> triggered here (the bind() to port 67 succeeds because no listen() has been
> called, because it uses recvfrom(); the preceding ones are raw sockets, and
> dhcpd doesn't seem to treat the other failures as fatal).
>
> Now we have two dhcpds, and there is no locking around dhcpd.leases, so
> they get to fight over writing data there, which causes loss of lease data.
>
> Things get even more confusing if, as in this case, this is part of a
> failover pair.  One of the dhcpds will have the TCP ports for inbound
> connections from the peer (the other will indefinitely call add_timeout()
> to reschedule a bind attempt).  Both will be trying to connect to the peer
> and resolve conflicts, which will confuse the peer about the state.
>
> Obviously, there are some serious problems upstream with lack of
> concurrency protection in dhcpd (checking if the pid in the pidfile is
> alive is not very reliable).
>
> But I'm not sure why that "rm -f" was added, or what problem it solves,
> and it certainly interacts badly with the limited checks that dhcpd does
> to prevent multiple copies from running.  Hence this bug.
>
> [Note: I'm filing this from a system that has nothing to do with dhcpd, and
> I've marked the version in sid, but AFAIK, this 'rm -f' has been there a
> long time.  Also not sure about severity -- there is data loss potential,
> but it's lease data.  Eventually, in failover pairs, the state conflicts
> will result in a non-serving pair.]
>
> -- System Information:
> Debian Release: wheezy/sid
>   APT prefers precise-updates
>   APT policy: (600, 'precise-updates'), (600, 'precise-security'), (600,
> 'precise'), (400, 'precise-backports')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 3.5.0-26-generic (SMP w/4 CPU cores)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/bash
>
>