[Pkg-postgresql-public] Bug#756606: postgresql-9.1: Init-Script does not work together with heartbeat

Marc Richter richter_marc at gmx.net
Thu Jul 31 09:44:17 UTC 2014


Package: postgresql-9.1
Version: 9.1.13-0wheezy1
Severity: important

Dear Maintainer,

After drbd, heartbeat and postgresql-9.1 are installed and given a basic configuration, running the postgresql init script from heartbeat's haresources fails in several ways.
Unfortunately, I cannot tell why: the ha-debug log shows no reason for the failure, even with debugging enabled.

The following lists what's going wrong:

1) A common setup for Postgres HA clusters is to keep /var/lib/postgresql on a DRBD-synced resource that is mounted on only one node at a time. When a resource group is configured to start drbddisk, mount /var/lib/postgresql and start postgresql (in that order; see the haresources file listed later in this report) and heartbeat is started on both nodes, these resources are started only on the primary node for this resource group (the first field in the haresources file). They are not acquired on the standby node.
Unfortunately, when heartbeat is stopped on the standby node, it nevertheless tries to give up the resources, even though it never acquired them. Since /var/lib/postgresql was never mounted on that node, "/etc/init.d/postgresql stop" fails there because it cannot find the necessary files in /var/lib/postgresql.
Even though no STONITH device is configured for heartbeat, this somehow leads to a hard server reset.
Proposed solution: "/etc/init.d/postgresql stop" should not return an error when the data directory is empty, so that it can be used together with heartbeat.
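
As an illustration of the behaviour proposed in 1), the following is a minimal sketch of a wrapper that could sit in front of the init script. It is only an assumption of how this might look, not a tested fix: the function name pg_ha and the PGDATA/INIT_SCRIPT overrides are made up for this example, and the default data directory is the path from the error message in the ha-debug log further down. It reports success for "stop" when the data directory is absent and passes every other call through unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper around the stock init script (illustration only).
# PGDATA and INIT_SCRIPT are assumed names; their defaults come from the
# paths that appear in this report.
PGDATA="${PGDATA:-/var/lib/postgresql/9.1/main}"
INIT_SCRIPT="${INIT_SCRIPT:-/etc/init.d/postgresql}"

pg_ha() {
    action="$1"
    case "$action" in
        stop)
            # A cluster cannot be running from a data directory that is
            # not present on this node, so "stop" is already satisfied.
            if [ ! -d "$PGDATA" ]; then
                echo "postgresql: $PGDATA not present, nothing to stop"
                return 0
            fi
            ;;
    esac
    # Every other case is handed to the real init script unchanged,
    # including its exit code.
    "$INIT_SCRIPT" "$action"
}

# Only act when called with an action, e.g. "stop".
if [ "$#" -gt 0 ]; then
    pg_ha "$@"
fi
```

Run with "stop" on the standby node, this would print the notice and exit 0 instead of 1. A missing directory does not distinguish "not mounted on this node" from a genuinely broken installation, so a real fix in the init script would probably need a more careful check.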

2) When heartbeat starts, postgresql does not appear to be started at all. I do not understand this, since all other init scripts I have tested (samba, cron) work fine when used in place of postgresql in the haresources file quoted below.
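
To narrow down 2), it may help to look at what the init script's status action prints and returns, since the ha-debug log below shows a status result ("Resource is stopped") ahead of the Filesystem and IPaddr start calls. The sketch below is a diagnostic helper, not heartbeat code. The pattern in looks_running is an assumption about how a text-based check might classify status output and should be compared against the ResourceManager script that is actually installed; probe_status and looks_running are names made up for this example.

```shell
#!/bin/sh
# Diagnostic sketch (illustration only, not part of heartbeat).

looks_running() {
    # $1: text printed by "<script> status".
    # ASSUMPTION: a text-based check that treats output mentioning
    # "running" or "OK" as "resource is already up".
    case "$1" in
        *[Rr]unning*|*OK*) return 0 ;;
        *) return 1 ;;
    esac
}

probe_status() {
    script="$1"
    rc=0
    # Capture stdout and stderr together, and keep the exit code.
    out=$("$script" status 2>&1) || rc=$?
    printf 'exit code: %s\n' "$rc"
    printf 'output:    %s\n' "$out"
    if looks_running "$out"; then
        echo "text-based check would say: running"
    else
        echo "text-based check would say: stopped"
    fi
}

# Usage: sh probe.sh /etc/init.d/postgresql
if [ "$#" -gt 0 ]; then
    probe_status "$1"
fi
```

If the exit code says "not running" while the text-based line says "running", a caller that looks only at the text would skip the start. Comparing the result for /etc/init.d/postgresql with that of a script that works (for example /etc/init.d/cron) might show whether this is what happens here.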

I have tried this on multiple clean Debian wheezy installations, from bare-metal servers to VirtualBox VMs on a workstation. The result is always the same.

The logs and configurations used follow:

/etc/ha.d/haresources :

prod-cl3  drbddisk::var_lib_postgres Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 IPaddr::192.168.20.18/24/eth0 postgresql

=======================

/etc/ha.d/ha.cf :

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 30
warntime 10
initdead 90
udpport 694
ucast eth1 10.250.250.16
auto_failback on
node   prod-cl3
node   prod-cl4

=======================

/etc/drbd.conf :

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

=======================

/etc/drbd.d/global_common.conf :

global {
        usage-count yes;
}

common {
        protocol C;

        startup {
                wfc-timeout  15;
                degr-wfc-timeout 120;
        }

        disk {
                on-io-error     detach;
        }

        net {
                after-sb-0pri disconnect;
                after-sb-1pri disconnect;
                after-sb-2pri disconnect;
                rr-conflict disconnect;
        }

        syncer {
                rate 96256;
        }
}

=======================

/etc/drbd.d/prod-cl.res :

resource var_lib_postgres {
        protocol C;
        on prod-cl3 {
                device  /dev/drbd0;
                disk    /dev/prod-cl3_data/var_lib_postgres;
                address 10.250.250.16:7789;
                meta-disk       internal;
        }
        on prod-cl4 {
                device  /dev/drbd0;
                disk    /dev/prod-cl4_data/var_lib_postgres;
                address 10.250.250.17:7789;
                meta-disk       internal;
        }
}

=======================

ha-debug log, showing that postgresql is not even started on the primary node when heartbeat starts:

Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Core dumps could be lost if multiple dumps occur.
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: Pacemaker support: false
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: **************************
Jul 30 13:51:11 prod-cl3 heartbeat: [20846]: info: Configuration validated. Starting heartbeat 3.0.5
Jul 30 13:51:11 prod-cl3 heartbeat: [20847]: info: heartbeat: version 3.0.5
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: Heartbeat generation: 1406638883
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: bound send socket to device: eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: bound receive socket to device: eth1
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: glib: ucast: started on port 694 interface eth1 to 10.250.250.17
Jul 30 13:51:12 prod-cl3 heartbeat: [20847]: info: Local status now set to: 'up'
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: node prod-cl4: is dead
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Comm_now_up(): updating status to active
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Local status now set to: 'active'
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: No STONITH device configured.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: WARN: Shared disks are not protected.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Resources being acquired from prod-cl4.
Jul 30 13:52:43 prod-cl3 heartbeat: [20876]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[20876]:    2014/07/30_13:52:43 info: Running /etc/ha.d//rc.d/status status
Jul 30 13:52:43 prod-cl3 heartbeat: [20877]: info: Local Resource acquisition completed.
mach_down[20910]:       2014/07/30_13:52:43 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: debug: StartNextRemoteRscReq(): child count 2
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: mach_down takeover complete.
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: info: Initial resource acquisition complete (mach_down)
Jul 30 13:52:43 prod-cl3 heartbeat: [20847]: debug: StartNextRemoteRscReq(): child count 1
mach_down[20910]:       2014/07/30_13:52:43 info: mach_down takeover complete for node prod-cl4.
Jul 30 13:52:43 prod-cl3 heartbeat: [20968]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[20968]:    2014/07/30_13:52:43 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp[20968]: 2014/07/30_13:52:43 received ip-request-resp drbddisk::var_lib_postgres OK yes
ResourceManager[20989]: 2014/07/30_13:52:43 info: Acquiring resource group: prod-cl3 drbddisk::var_lib_postgres Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 192.168.20.18/24/eth0 postgresql
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running /etc/ha.d/resource.d/drbddisk var_lib_postgres start
Filesystem[21057]:      2014/07/30_13:52:43 INFO:  Resource is stopped
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /var/lib/postgresql ext4 start
Filesystem[21131]:      2014/07/30_13:52:43 INFO: Running start for /dev/drbd0 on /var/lib/postgresql
FATAL: Module scsi_hostadapter not found.
Filesystem[21125]:      2014/07/30_13:52:43 INFO:  Success
INFO:  Success
IPaddr[21200]:  2014/07/30_13:52:43 INFO:  Resource is stopped
ResourceManager[20989]: 2014/07/30_13:52:43 info: Running /etc/ha.d/resource.d/IPaddr 192.168.20.18/24/eth0 start
IPaddr[21282]:  2014/07/30_13:52:43 INFO: Using calculated netmask for 192.168.20.18: 255.255.255.0
IPaddr[21282]:  2014/07/30_13:52:43 INFO: eval ifconfig eth0:0 192.168.20.18 netmask 255.255.255.0 broadcast 192.168.20.255
IPaddr[21258]:  2014/07/30_13:52:43 INFO:  Success
INFO:  Success
Jul 30 13:52:53 prod-cl3 heartbeat: [20847]: info: Local Resource acquisition completed. (none)
Jul 30 13:52:53 prod-cl3 heartbeat: [20847]: info: local resource transition completed.

=======================

ha-debug log, showing the server crash when postgresql is not stopped properly (due to the missing files in the data directory, as described above):

Jul 30 13:57:49 prod-cl4 heartbeat: [3340]: info: Heartbeat shutdown in progress. (3340)
Jul 30 13:57:49 prod-cl4 heartbeat: [3410]: info: Giving up all HA resources.
ResourceManager[3424]:  2014/07/30_13:57:49 info: Releasing resource group: prod-cl3 drbddisk::var_lib_postgres Filesystem::/dev/drbd0::/var/lib/postgresql::ext4 192.168.20.18/24/eth0 postgresql
ResourceManager[3424]:  2014/07/30_13:57:49 info: Running /etc/init.d/postgresql  stop
Stopping PostgreSQL 9.1 database server: mainError: /var/lib/postgresql/9.1/main is not accessible or does not exist ... failed!
 failed!
ResourceManager[3424]:  2014/07/30_13:57:50 ERROR: Return code 1 from /etc/init.d/postgresql
ResourceManager[3424]:  2014/07/30_13:57:51 info: Retrying failed stop operation [postgresql]
ResourceManager[3424]:  2014/07/30_13:5


-- System Information:
Debian Release: 7.6
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages postgresql-9.1 depends on:
ii  libc6                  2.13-38+deb7u3
ii  libcomerr2             1.42.5-1.1
ii  libgssapi-krb5-2       1.10.1+dfsg-5+deb7u1
ii  libkrb5-3              1.10.1+dfsg-5+deb7u1
ii  libldap-2.4-2          2.4.31-1+nmu2
ii  libpam0g               1.1.3-7.1
ii  libpq5                 9.1.13-0wheezy1
ii  libssl1.0.0            1.0.1e-2+deb7u11
ii  libxml2                2.8.0+dfsg1-7+wheezy1
ii  locales                2.13-38+deb7u3
ii  postgresql-client-9.1  9.1.13-0wheezy1
ii  postgresql-common      134wheezy4
ii  ssl-cert               1.0.32
ii  tzdata                 2014e-0wheezy1

postgresql-9.1 recommends no packages.

Versions of packages postgresql-9.1 suggests:
pn  locales-all             <none>
pn  oidentd | ident-server  <none>

-- no debconf information


