[Pkg-db-devel] Bug#232305: marked as done (libdb4.2: Hangs on SMP (HyperThreaded P4) system using slapd 2.1.25-1)

Debian Bug Tracking System owner@bugs.debian.org
Sat, 05 Feb 2005 03:18:15 -0800


Your message dated Sat, 5 Feb 2005 09:09:06 -0200
with message-id <20050205110906.GA14958@khazad-dum.debian.net>
and subject line Bug#232305: hangs of slapd should be fixed by now
has caused the attached Bug report to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)

--------------------------------------
Received: (at submit) by bugs.debian.org; 11 Feb 2004 22:47:09 +0000
>From hmh@debian.org Wed Feb 11 14:47:09 2004
Return-path: <hmh@debian.org>
Received: from master.debian.org [146.82.138.7] 
	by spohr.debian.org with esmtp (Exim 3.35 1 (Debian))
	id 1Ar38P-0003I8-00; Wed, 11 Feb 2004 14:47:09 -0800
Received: from khazad-dum.debian.net [200.196.10.6] 
	by master.debian.org with esmtp (Exim 3.35 1 (Debian))
	id 1Ar38O-0001cD-00; Wed, 11 Feb 2004 16:47:08 -0600
Received: from localhost (localhost [127.0.0.1])
	by localhost.khazad-dum.debian.net (Postfix) with ESMTP
	id 21C6C202C4F; Wed, 11 Feb 2004 20:47:02 -0200 (BRST)
Received: from khazad-dum.debian.net ([127.0.0.1])
	by localhost (khazad-dum [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id 24103-01; Wed, 11 Feb 2004 20:47:00 -0200 (BRST)
Received: by khazad-dum.debian.net (Postfix, from userid 1000)
	id 32F35202C47; Wed, 11 Feb 2004 20:47:00 -0200 (BRST)
Date: Wed, 11 Feb 2004 20:47:00 -0200
From: Henrique de Moraes Holschuh <hmh@debian.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: libdb4.2: Hangs on SMP (HyperThreaded P4) system using slapd 2.1.25-1
Message-ID: <20040211224700.GA23288@khazad-dum.debian.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Reportbug-Version: 2.44
X-GPG-Fingerprint-1: 1024D/128D36EE 50AC 661A 7963 0BBA 8155  43D5 6EF7 F36B 128D 36EE
X-GPG-Fingerprint-2: 1024D/1CDB0FE3 5422 5C61 F6B7 06FB 7E04  3738 EE25 DE3F 1CDB 0FE3
X-Debbugs-CC: Torsten Landschoff <torsten@debian.org>
User-Agent: Mutt/1.5.5.1+cvs20040105i
X-Virus-Scanned: by amavisd-new-20030616-p7 (Debian) at khazad-dum.debian.net
Delivered-To: submit@bugs.debian.org
X-Spam-Checker-Version: SpamAssassin 2.60-bugs.debian.org_2004_02_10 
	(1.212-2003-09-23-exp) on spohr.debian.org
X-Spam-Status: No, hits=-8.0 required=4.0 tests=HAS_PACKAGE,X_DEBBUGS_CC 
	autolearn=no version=2.60-bugs.debian.org_2004_02_10
X-Spam-Level: 

Package: libdb4.2
Version: 4.2.52-10
Severity: grave
Justification: stops critical system services dead in their tracks

slapd is hanging here about once every three days.  Stopping slapd and
trying to access the database directly using slapcat would hang, too.

Going to the database dir and running a db4.2_recover fixes the issue, which
leads me to believe the dreaded db4.2 hangs on SMP systems are back again,
and corrupting the db environment while at it (since a restart of slapd will
result in it deadlocking immediately).
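
A minimal sketch of the recovery sequence described above. The init script
path and the database directory (/etc/init.d/slapd and /var/lib/ldap) are
assumptions, not details taken from this report, and so is the final restart
step. The helper only builds and prints the command lines; nothing is
executed unless run=True is passed.

```python
import subprocess


def recovery_commands(db_home="/var/lib/ldap", init_script="/etc/init.d/slapd"):
    """Build the commands: stop slapd, recover the environment, start slapd.

    Both default paths are assumptions; adjust them to the local setup.
    """
    return [
        [init_script, "stop"],
        ["db4.2_recover", "-h", db_home],  # -h names the environment home directory
        [init_script, "start"],
    ]


def run_all(commands, run=False):
    """Print each command; execute it only when run is True."""
    for argv in commands:
        print(" ".join(argv))
        if run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(recovery_commands())  # dry run: prints the three commands
```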

The changelogs on libdb4.2 and slapd have me really worried.  A patch was
applied to libdb4.2, but there is also this in the slapd changelog:

    + servers/slapd/back-bdb/dbcache.c: Turn off subdatabases. This
      is an incompatible database format change, but according to
      Howard Chu "using them (subdatabases) is known to cause deadlocks
      on multiprocessor machines, among other issues."

Is this a libdb4.2 problem? Or is slapd using libdb4.2 incorrectly, in a way
that causes it to deadlock (in which case the bug should be reassigned)?

I remind you that running db4.2_recover was the _only_ way to get slapd
(slapcat, in this case) to work again, so it looks like the on-disk libdb4.2
environment got corrupted or ended up in a state that causes db4.2 to deadlock.

If it happens again, I will try to debug the deadlock further.  It is a very
lightly loaded system: at most two LDAP connections are open at the same
time.

The database was quite clean, since I had regenerated it two days ago (by
using slapcat to get the data in text format, rm -rf'ing the entire db
environment, and recreating everything with slapadd and slapindex).
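
The regeneration steps above can be sketched roughly as follows. The function
name, the db_home and ldif_path parameters and the injectable run argument are
hypothetical, introduced only for illustration; slapd is assumed to be stopped
and the slap* tools are assumed to pick up the default slapd configuration.
Emptying the directory also deletes any DB_CONFIG file stored there.

```python
import os
import shutil
import subprocess


def rebuild_environment(db_home, ldif_path, run=subprocess.run):
    """Dump to LDIF, empty the environment directory, then reload and reindex."""
    # Export all entries in text (LDIF) format.
    run(["slapcat", "-l", ldif_path], check=True)

    # Remove everything inside the old environment (the "rm -rf" step),
    # keeping the directory itself so its ownership and mode are preserved.
    for name in os.listdir(db_home):
        path = os.path.join(db_home, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)

    # Recreate the database from the dump, then rebuild the indexes.
    run(["slapadd", "-l", ldif_path], check=True)
    run(["slapindex"], check=True)
```

Passing a different callable as run makes it possible to rehearse the sequence
without touching the real tools.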

Kernel is 2.4.25-rc1 with Debian patches. CPU is Intel P4 2.8 HT, stepping
9.  Everything else is up-to-date unstable.

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (990, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.4.25-rc1
Locale: LANG=pt_BR, LC_CTYPE=pt_BR

Versions of packages libdb4.2 depends on:
ii  libc6                       2.3.2.ds1-11 GNU C Library: Shared libraries an

-- no debconf information


-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

---------------------------------------
Received: (at 232305-done) by bugs.debian.org; 5 Feb 2005 11:09:16 +0000
>From hmh@debian.org Sat Feb 05 03:09:15 2005
Return-path: <hmh@debian.org>
Received: from master.debian.org [146.82.138.7] 
	by spohr.debian.org with esmtp (Exim 3.35 1 (Debian))
	id 1CxNoR-0003xa-00; Sat, 05 Feb 2005 03:09:15 -0800
Received: from mac-200-220-128-101.nipnet.net.br (khazad-dum.debian.net) [200.220.128.101] 
	by master.debian.org with esmtp (Exim 3.35 1 (Debian))
	id 1CxNoP-00047c-00; Sat, 05 Feb 2005 05:09:13 -0600
Received: from localhost (localhost [127.0.0.1])
	by localhost.khazad-dum.debian.net (Postfix) with ESMTP id E09FF20613F;
	Sat,  5 Feb 2005 09:09:07 -0200 (BRST)
Received: from khazad-dum.debian.net ([127.0.0.1])
	by localhost (khazad-dum [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id 12268-01; Sat, 5 Feb 2005 09:09:06 -0200 (BRST)
Received: by khazad-dum.debian.net (Postfix, from userid 1000)
	id 6B245201F65; Sat,  5 Feb 2005 09:09:06 -0200 (BRST)
Date: Sat, 5 Feb 2005 09:09:06 -0200
From: Henrique de Moraes Holschuh <hmh@debian.org>
To: Andreas Barth <aba@not.so.argh.org>, 232305-done@bugs.debian.org
Subject: Re: Bug#232305: hangs of slapd should be fixed by now
Message-ID: <20050205110906.GA14958@khazad-dum.debian.net>
References: <20050205094931.GA22702@mails.so.argh.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20050205094931.GA22702@mails.so.argh.org>
X-GPG-Fingerprint-1: 1024D/128D36EE 50AC 661A 7963 0BBA 8155  43D5 6EF7 F36B 128D 36EE
X-GPG-Fingerprint-2: 1024D/1CDB0FE3 5422 5C61 F6B7 06FB 7E04  3738 EE25 DE3F 1CDB 0FE3
User-Agent: Mutt/1.5.6+20040907i
X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at khazad-dum.debian.net
Delivered-To: 232305-done@bugs.debian.org
X-Spam-Checker-Version: SpamAssassin 2.60-bugs.debian.org_2005_01_02 
	(1.212-2003-09-23-exp) on spohr.debian.org
X-Spam-Status: No, hits=-4.0 required=4.0 tests=BAYES_00,HAS_BUG_NUMBER,
	RCVD_IN_SBLXBL,RCVD_IN_SBLXBL_CBL autolearn=no 
	version=2.60-bugs.debian.org_2005_01_02
X-Spam-Level: 

On Sat, 05 Feb 2005, Andreas Barth wrote:
> a 2.4/2.6-compatible way. It would be great if you could re-check this
> (or just close this bug if you remember the same as I on our IRC
> conversation :).

Well, as I said, no more hangs on 2.4, so I am closing the bug.  If one of
the 2.6 machines hangs, I will reopen it, but that's not very likely to
happen.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh