[pkg-s48-maint] Bug#450948: scsh-0.6: Logic error allows infinite block in event.c:queue_ready_ports()

Derek Upham sand at blarg.net
Mon Nov 12 16:00:56 UTC 2007


Package: scsh-0.6
Version: 0.6.7-4
Severity: normal

When using scsh-0.6.7 with the SUnet web server, the VM regularly
hangs.  strace shows that the VM (either the parent process, or a
child process spawned to handle a request) is blocked in the select()
system call:

    select(0, [], [], [], NULL)

If the final parameter were a pointer to a real struct timeval instead
of NULL, select() would block for at most that duration and return.
But with a NULL timeout and empty fd_set arguments, the call blocks
forever (or until a signal interrupts it, I suppose).
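
For contrast, here is a minimal sketch of the harmless variant: the same
empty descriptor sets, but with a finite timeout. The helper name
sleep_with_select() is made up for illustration and is not part of scsh.

```c
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper (not from event.c): call select() with no
   descriptors at all and a finite timeout, and return its result. */
static int sleep_with_select(long sec, long usec)
{
    fd_set reads, writes, alls;
    struct timeval tv;

    FD_ZERO(&reads);
    FD_ZERO(&writes);
    FD_ZERO(&alls);
    tv.tv_sec = sec;
    tv.tv_usec = usec;

    /* With a non-NULL timeout this returns 0 once the timeout expires
       (or -1 with errno set to EINTR if a signal arrives first).
       Passing NULL instead of &tv would leave only the signal case. */
    return select(0, &reads, &writes, &alls, &tv);
}
```

Calling sleep_with_select(0, 100000) should come back after roughly a
tenth of a second; replacing &tv with NULL reproduces the hang.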

Looking at the code, the only select() call that seems to allow this
combination of parameters is in queue_ready_ports() in event.c.  A
combination of 'wait' being true, 'seconds' being -1, and the global
'pending' variable being empty would trigger it:

  if ((! wait)
      &&  (pending.first == NULL))
    return (NO_ERRORS);
  FD_ZERO(&reads);
  FD_ZERO(&writes);
  FD_ZERO(&alls);
  limfd = 0;
  for (fdp = pending.first; fdp != NULL; fdp = fdp->next) {
    FD_SET(fdp->fd, fdp->is_input ? &reads : &writes);
    FD_SET(fdp->fd, &alls);
    if (limfd <= fdp->fd)
      limfd = fdp->fd + 1;
  }
  tvp = &tv;
  if (wait)
    if (seconds == -1){
      tvp = NULL;
    }
    else {
      tv.tv_sec = seconds;
      tv.tv_usec = ticks * (1000000 / TICKS_PER_SECOND);
    }
  else
    timerclear(&tv);

'wait' is true so we skip the 'return (NO_ERRORS)'.  'pending.first' is
NULL, so we don't hit any of the FD_SET calls.  And 'seconds' is -1 so
we set 'tvp' to NULL.  Then this call

    /* time gap */
    left = select(limfd, &reads, &writes, &alls, tvp);

is made with limfd = 0, three empty fd sets, and a NULL timeout,
which matches the strace output.

It looks like this is the same problem that causes the SUnet site's
own web-server to lock up periodically.

I'm not sure what the correct solution is for this bug.  We could put
a special check in at the top, and return immediately for that
combination of parameters.  Or this may be a "should never happen"
combination, pointing to a real issue elsewhere: it seems like the
real problem is the lack of 'pending' ports.
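
The first option could look something like the sketch below. The helper
would_block_forever() is hypothetical and does not exist in event.c; the
parameter names mirror 'wait', 'seconds' and 'pending.first' from the
code above, and their types here are guesses. Whether returning
NO_ERRORS is the right reaction is exactly the open question.

```c
#include <stddef.h>

/* Hypothetical helper, not part of event.c: true when the arguments
   would make select() wait forever with nothing to wait on. */
static int would_block_forever(int wait, long seconds,
                               const void *pending_first)
{
    return wait && seconds == -1 && pending_first == NULL;
}

/* Possible use at the top of queue_ready_ports() (an assumption, not
   a tested patch):

     if (would_block_forever(wait, seconds, pending.first))
       return (NO_ERRORS);
*/
```

This only papers over the symptom. If the combination really is a
"should never happen" case, logging it or reporting an error there
might be more useful than silently returning.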

It may be worth checking the latest Scheme48 sources for the same bug,
as well.

Derek

-- System Information:
Debian Release: lenny/sid
  APT prefers oldstable
  APT policy: (500, 'oldstable'), (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.22 (PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages scsh-0.6 depends on:
ii  libc6                         2.6.1-6    GNU C Library: Shared libraries
ii  libelfg0                      0.8.6-4    an ELF object file access library
ii  scsh-common-0.6               0.6.7-4    A `scheme' interpreter designed fo

scsh-0.6 recommends no packages.

-- no debconf information




