[pkg-dhcp-devel] Bug#704175: isc-dhcp-server: init script removes dhcpd.pid

Alan Sundell sundell at gmail.com
Thu Mar 28 21:55:55 UTC 2013


Package: isc-dhcp-server
Version: 4.2.4-5
Severity: important

Dear Maintainer,

In debian/isc-dhcp-server.init.d, there is the following code:

        stop)
                log_daemon_msg "Stopping $DESC" "$NAME"
                start-stop-daemon --stop --quiet --pidfile "$DHCPD_PID"
                log_end_msg $?
                rm -f "$DHCPD_PID"
                ;;

So, you can end up in a situation like this:
   process a: stops dhcpd
   process b: starts dhcpd, which writes pid file
   process a: removes pid file
   process a: starts dhcpd, which writes another pid file
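
The interleaving above can be replayed deterministically as a rough sketch. It uses a temporary file in place of dhcpd.pid and made-up PIDs (1000, 2000, 3000) in place of real daemons; none of these names come from the init script.

```shell
#!/bin/sh
# Replay of the interleaving, with a temp file standing in for dhcpd.pid
# and hypothetical PIDs standing in for real dhcpd processes.
pidfile=$(mktemp)
echo 1000 > "$pidfile"   # the original daemon is running

# process a: stops dhcpd (daemon 1000 exits)
# process b: starts dhcpd, which writes the pid file
echo 2000 > "$pidfile"

# process a: removes the pid file (the "rm -f" in the stop branch)
rm -f "$pidfile"

# process a: starts dhcpd; the new daemon looks for a pid file first
if [ -e "$pidfile" ]; then
    seen=$(cat "$pidfile")
else
    seen=none            # daemon 2000 is invisible to this check
fi
echo 3000 > "$pidfile"
recorded=$(cat "$pidfile")

echo "new daemon saw: $seen; pid file now records: $recorded"
rm -f "$pidfile"
```

Daemon 2000 would still be running at the end of this sequence, but nothing on disk refers to it any more.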

Here's an strace from when I removed the pidfile and started dhcpd
immediately afterwards, while another dhcpd was running:

10698 13:47:26 bind(5, {sa_family=AF_PACKET, proto=0x6574, if12392, pkttype=PACKET_HOST, addr(0)={0, }, 16) = 0
10698 13:47:26 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 7
10698 13:47:26 ioctl(7, SIOCGIFHWADDR, {ifr_name="eth0", ifr_hwaddr=aa:00:00:15:cb:1d}) = 0
10698 13:47:26 close(7)                 = 0
10698 13:47:26 setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, "\v\0\0\0\0\0\0\0\340\373\237/\35\177\0\0", 16) = 0
10698 13:47:26 fcntl(5, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 socket(PF_PACKET, SOCK_PACKET, 768) = 7
10698 13:47:26 bind(7, {sa_family=AF_PACKET, proto=0x6c6f, if0, pkttype=PACKET_HOST, addr(0)={0, }, 16) = 0
10698 13:47:26 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 8
10698 13:47:26 ioctl(8, SIOCGIFHWADDR, {ifr_name="lo", ifr_hwaddr=00:00:00:00:00:00}) = 0
10698 13:47:26 close(8)                 = 0
10698 13:47:26 setsockopt(7, SOL_SOCKET, SO_ATTACH_FILTER, "\v\0\0\0\0\0\0\0\340\373\237/\35\177\0\0", 16) = 0
10698 13:47:26 fcntl(7, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 8
10698 13:47:26 setsockopt(8, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
10698 13:47:26 bind(8, {sa_family=AF_INET, sin_port=htons(67), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
10698 13:47:26 fcntl(8, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 9
10698 13:47:26 fcntl(9, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 setsockopt(9, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
10698 13:47:26 bind(9, {sa_family=AF_INET, sin_port=htons(7911), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
10698 13:47:26 close(9)                 = 0
10698 13:47:26 close(9)                 = -1 EBADF (Bad file descriptor)
10698 13:47:26 sendto(3, "<163>May 14 13:47:26 dhcpd: Can'"..., 77, MSG_NOSIGNAL, NULL, 0) = 77
10698 13:47:26 write(6, "\nfailover peer \"xxxxxx\" stat"..., 131) = 131
10698 13:47:26 fsync(6)                 = 0
10698 13:47:26 sendto(3, "<166>May 14 13:47:26 dhcpd: fail"..., 83, MSG_NOSIGNAL, NULL, 0) = 83
10698 13:47:26 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 9
10698 13:47:26 bind(9, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.30.192.69")}, 16) = 0
10698 13:47:26 fcntl(9, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 setsockopt(9, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
10698 13:47:26 fcntl(9, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
10698 13:47:26 connect(9, {sa_family=AF_INET, sin_port=htons(520), sin_addr=inet_addr("172.24.154.70")}, 16) = -1 EINPROGRESS (Operation now in progress)
10698 13:47:26 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 10
10698 13:47:26 fcntl(10, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 setsockopt(10, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
10698 13:47:26 bind(10, {sa_family=AF_INET, sin_port=htons(520), sin_addr=inet_addr("172.30.192.69")}, 16) = -1 EADDRINUSE (Address already in use)
10698 13:47:26 close(10)                = 0
10698 13:47:26 close(10)                = -1 EBADF (Bad file descriptor)
10698 13:47:26 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f1d2f72a9d0) = 10699
10698 13:47:26 exit_group(0)            = ?
10699 13:47:26 open("/var/run/dhcp-server/dhcpd.pid", O_RDONLY) = -1 ENOENT (No such file or directory)
10699 13:47:26 open("/var/run/dhcp-server/dhcpd.pid", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 10
10699 13:47:26 write(10, "10699\n", 6)  = 6

dhcpd has some checks for bind() failures, but they don't seem to be triggered
here: the bind() to UDP port 67 succeeds because SO_REUSEADDR is set and no
listen() is involved (dhcpd just uses recvfrom()); the preceding ones are raw
sockets; and dhcpd doesn't seem to treat the other failures as fatal.

Now we have two dhcpds, and there is no locking around dhcpd.leases, so
they get to fight over writing data there, which causes loss of lease data.

Things get even more confusing if, as in this case, this is part of a
failover pair.  One of the dhcpds will hold the TCP ports for inbound
connections from the peer (the other will indefinitely call add_timeout()
to reschedule a bind attempt).  Both will be trying to connect to the peer
and resolve conflicts, which will confuse the peer about the state.

Obviously, there are some serious problems upstream with lack of concurrency
protection in dhcpd (checking if the pid in the pidfile is alive is not
very reliable).
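
To illustrate why a liveness check on the recorded pid is weak, here is a minimal sketch. It assumes the check amounts to a signal-0 probe of whatever pid is in the file; that is my assumption, not something confirmed against the dhcpd source here. The pid_alive helper and the temporary file are hypothetical.

```shell
#!/bin/sh
# Signal 0 delivers nothing; it only reports whether the PID exists and
# may be signalled by the caller. It says nothing about what the process is.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

pidfile=$(mktemp)       # stand-in for dhcpd.pid
echo $$ > "$pidfile"    # stale file whose PID now belongs to this shell

if pid_alive "$(cat "$pidfile")"; then
    verdict="looks alive"
else
    verdict="looks dead"
fi
echo "recorded PID $verdict, although it is not dhcpd"
rm -f "$pidfile"
```

A reused pid passes the probe just as a real dhcpd would, and a missing pid file (as in the strace above) skips the probe entirely.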

But I'm not sure why that "rm -f" was added, or what problem it solves,
and it certainly interacts badly with the limited checks that dhcpd does
to prevent multiple copies from running.  Hence this bug.
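
One possible mitigation, sketched here only as an illustration and not as the packaged script's behaviour, is to serialize init script actions with flock(1) from util-linux, so that one invocation's stop and cleanup cannot interleave with another's start. The lock path and the with_init_lock helper are hypothetical.

```shell
#!/bin/sh
# Hypothetical lock path; not part of the Debian package.
LOCKFILE="${LOCKFILE:-/tmp/dhcpd-init-example.lock}"

# Run a command while holding an exclusive lock on descriptor 9, so two
# concurrent invocations cannot interleave their stop/remove/start steps.
with_init_lock() {
    (
        flock 9 || exit 1
        "$@"
    ) 9>"$LOCKFILE"
}

with_init_lock echo "stop, cleanup and start would run here"
```

Note that a daemon started inside the locked section would inherit descriptor 9 and keep the lock for its whole lifetime, so the start command would need the descriptor closed (for example by appending 9>&- to it).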

[Note: I'm filing this from a system that has nothing to do with dhcpd, and
I've marked the version in sid, but AFAIK, this 'rm -f' has been there a long
time.  Also not sure about severity -- there is data loss potential, but it's
lease data.  Eventually, in failover pairs, the state conflicts will result
in a non-serving pair.]

-- System Information:
Debian Release: wheezy/sid
  APT prefers precise-updates
  APT policy: (600, 'precise-updates'), (600, 'precise-security'), (600, 'precise'), (400, 'precise-backports')
Architecture: amd64 (x86_64)

Kernel: Linux 3.5.0-26-generic (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash


