[pkg-dhcp-devel] Bug#704175: isc-dhcp-server: init script removes dhcpd.pid
Alan Sundell
sundell at gmail.com
Thu Mar 28 21:55:55 UTC 2013
Package: isc-dhcp-server
Version: 4.2.4-5
Severity: important
Dear Maintainer,
In debian/isc-dhcp-server.init.d, there is the following code:
    stop)
        log_daemon_msg "Stopping $DESC" "$NAME"
        start-stop-daemon --stop --quiet --pidfile "$DHCPD_PID"
        log_end_msg $?
        rm -f "$DHCPD_PID"
        ;;
So, you can end up in a situation like this:
process a: stops dhcpd
process b: starts dhcpd, which writes pid file
process a: removes pid file
process a: starts dhcpd, which writes another pid file
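The interleaving above can be imitated without a real dhcpd. This is only a
sketch: a background "sleep" stands in for the daemon started by process b, the
temporary pid file path is made up, and the liveness test is my approximation
of a pid-file check, not dhcpd's actual code.

```shell
#!/bin/sh
# Hypothetical simulation of the race; no real dhcpd is involved.
tmpdir=$(mktemp -d)
pidfile="$tmpdir/dhcpd.pid"

# process b: "starts dhcpd", which writes its pid file
sleep 30 &
daemon_b=$!
echo "$daemon_b" > "$pidfile"

# process a: reaches the end of its "stop" branch and removes the pid file
rm -f "$pidfile"

# process a: "starts dhcpd"; a pid-file based check finds nothing to object to,
# even though the daemon started by process b is still running
if [ -e "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    second_start=refused
else
    second_start=allowed
fi
echo "second start: $second_start"

# clean up the stand-in daemon and the temporary directory
kill "$daemon_b"
wait "$daemon_b" 2>/dev/null || true
rm -rf "$tmpdir"
```

With the "rm -f" line taken out of the simulation, the pid file would still
name a live process and the second start would be refused.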
Here's an strace from when I removed the pidfile and started dhcpd
immediately afterwards, while another dhcpd was running:
10698 13:47:26 bind(5, {sa_family=AF_PACKET, proto=0x6574, if12392, pkttype=PACKET_HOST, addr(0)={0, }, 16) = 0
10698 13:47:26 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 7
10698 13:47:26 ioctl(7, SIOCGIFHWADDR, {ifr_name="eth0", ifr_hwaddr=aa:00:00:15:cb:1d}) = 0
10698 13:47:26 close(7) = 0
10698 13:47:26 setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, "\v\0\0\0\0\0\0\0\340\373\237/\35\177\0\0", 16) = 0
10698 13:47:26 fcntl(5, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 socket(PF_PACKET, SOCK_PACKET, 768) = 7
10698 13:47:26 bind(7, {sa_family=AF_PACKET, proto=0x6c6f, if0, pkttype=PACKET_HOST, addr(0)={0, }, 16) = 0
10698 13:47:26 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 8
10698 13:47:26 ioctl(8, SIOCGIFHWADDR, {ifr_name="lo", ifr_hwaddr=00:00:00:00:00:00}) = 0
10698 13:47:26 close(8) = 0
10698 13:47:26 setsockopt(7, SOL_SOCKET, SO_ATTACH_FILTER, "\v\0\0\0\0\0\0\0\340\373\237/\35\177\0\0", 16) = 0
10698 13:47:26 fcntl(7, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 8
10698 13:47:26 setsockopt(8, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
10698 13:47:26 bind(8, {sa_family=AF_INET, sin_port=htons(67), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
10698 13:47:26 fcntl(8, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 9
10698 13:47:26 fcntl(9, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 setsockopt(9, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
10698 13:47:26 bind(9, {sa_family=AF_INET, sin_port=htons(7911), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
10698 13:47:26 close(9) = 0
10698 13:47:26 close(9) = -1 EBADF (Bad file descriptor)
10698 13:47:26 sendto(3, "<163>May 14 13:47:26 dhcpd: Can'"..., 77, MSG_NOSIGNAL, NULL, 0) = 77
10698 13:47:26 write(6, "\nfailover peer \"xxxxxx\" stat"..., 131) = 131
10698 13:47:26 fsync(6) = 0
10698 13:47:26 sendto(3, "<166>May 14 13:47:26 dhcpd: fail"..., 83, MSG_NOSIGNAL, NULL, 0) = 83
10698 13:47:26 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 9
10698 13:47:26 bind(9, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("172.30.192.69")}, 16) = 0
10698 13:47:26 fcntl(9, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 setsockopt(9, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
10698 13:47:26 fcntl(9, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
10698 13:47:26 connect(9, {sa_family=AF_INET, sin_port=htons(520), sin_addr=inet_addr("172.24.154.70")}, 16) = -1 EINPROGRESS (Operation now in progress)
10698 13:47:26 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 10
10698 13:47:26 fcntl(10, F_SETFD, FD_CLOEXEC) = 0
10698 13:47:26 setsockopt(10, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
10698 13:47:26 bind(10, {sa_family=AF_INET, sin_port=htons(520), sin_addr=inet_addr("172.30.192.69")}, 16) = -1 EADDRINUSE (Address already in use)
10698 13:47:26 close(10) = 0
10698 13:47:26 close(10) = -1 EBADF (Bad file descriptor)
10698 13:47:26 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f1d2f72a9d0) = 10699
10698 13:47:26 exit_group(0) = ?
10699 13:47:26 open("/var/run/dhcp-server/dhcpd.pid", O_RDONLY) = -1 ENOENT (No such file or directory)
10699 13:47:26 open("/var/run/dhcp-server/dhcpd.pid", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 10
10699 13:47:26 write(10, "10699\n", 6) = 6
dhcpd has some checks for bind() failures, but they don't seem to be triggered
here: the bind() to port 67 succeeds because no listen() has been called (dhcpd
uses recvfrom() on that socket), the preceding sockets are raw sockets, and
dhcpd doesn't seem to treat the other failures as fatal.
Now we have two dhcpds, and there is no locking around dhcpd.leases, so
they get to fight over writing data there, which causes loss of lease data.
Things get even more confusing if, as in this case, this is part of a
failover pair. One of the dhcpds will hold the TCP ports for inbound
connections from the peer (the other will call add_timeout() indefinitely
to reschedule a bind attempt). Both will be trying to connect to the peer
and resolve conflicts, which will confuse the peer about the state.
Obviously, there are some serious problems upstream with lack of concurrency
protection in dhcpd (checking if the pid in the pidfile is alive is not
very reliable).
But I'm not sure why that "rm -f" was added, or what problem it solves,
and it certainly interacts badly with the limited checks that dhcpd does
to prevent multiple copies from running. Hence this bug.
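If some cleanup of the pid file is wanted at all, one option would be to remove
it only when it is stale. The helper below is a rough sketch, not a tested
patch: remove_stale_pidfile is a name I made up, it is not in the packaged init
script, and it assumes the script runs as root so that "kill -0" can probe any
pid.

```shell
# Hypothetical helper, not part of the packaged init script.
# Removes the pid file only when the pid recorded in it is no longer alive.
remove_stale_pidfile() {
    pidfile=$1
    # Nothing to do if there is no pid file.
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile" 2>/dev/null)
    # kill -0 sends no signal; it only tests whether the pid exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 1    # a live process owns this file; leave it alone
    fi
    rm -f "$pidfile"
}
```

In the stop branch this would be called as remove_stale_pidfile "$DHCPD_PID"
in place of the unconditional "rm -f". It only narrows the window: there is
still a gap between the check and the removal, and pid reuse can make a stale
file look live, so it is no substitute for real locking.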
[Note: I'm filing this from a system that has nothing to do with dhcpd, and
I've marked the version in sid, but AFAIK, this 'rm -f' has been there a long
time. Also not sure about severity -- there is data loss potential, but it's
lease data. Eventually, in failover pairs, the state conflicts will result
in a non-serving pair.]
-- System Information:
Debian Release: wheezy/sid
APT prefers precise-updates
APT policy: (600, 'precise-updates'), (600, 'precise-security'), (600, 'precise'), (400, 'precise-backports')
Architecture: amd64 (x86_64)
Kernel: Linux 3.5.0-26-generic (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash