[Pkg-ganeti-devel] Bug#785788: ganeti: luxid dies regularly

Joerg Jaspert joerg at debian.org
Wed May 20 08:23:27 UTC 2015


Package: ganeti
Version: 2.11.6-1~bpo70+1
Severity: important

Dear Maintainer,

On the ganeti master node I found that luxid dies regularly. This
happens every 1 to 1.5 days, possibly depending on the
number of commands it handles. A simple /etc/init.d/ganeti restart
"fixes" it and one can use the cluster again, but obviously that is
not a good solution.

The last thing in the log when this happens is:

------------------------------------------------------------------------
2015-05-07 08:50:05,904644000000 CEST: ganeti-luxid pid=175797 INFO Rereading job 51157
2015-05-07 08:50:05,905380000000 CEST: ganeti-luxid pid=175797 INFO Finished jobs: (51157,JOB_STATUS_SUCCESS)
2015-05-07 08:54:21,256650000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,257120000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs: 
2015-05-07 08:54:21,257706000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,257828000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs: 
2015-05-07 08:54:21,258016000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,258163000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs: 
2015-05-07 08:54:21,258431000000 CEST: ganeti-luxid pid=175797 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-07 08:54:21,258550000000 CEST: ganeti-luxid pid=175797 WARNING Rescheduling jobs: 
2015-05-07 08:54:21,258755000000 CEST: ganeti-luxid pid=175797 INFO Waiting jobs: []; running jobs: [36853,36858,36883,36884,36886,36917,37010,44515,44518,44528,44529,44532,44533,44534,44535,44540,44554]
2015-05-07 08:55:03,053666000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled QueryGroups
2015-05-07 08:55:03,400245000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled Query
2015-05-07 08:55:04,784546000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled Query
2015-05-07 08:55:05,230757000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled Query
2015-05-07 08:55:05,574790000000 CEST: ganeti-luxid pid=175797 INFO New jobs enqueued: 51158
2015-05-07 08:55:05,575083000000 CEST: ganeti-luxid pid=175797 INFO Starting jobs: 51158
2015-05-07 08:55:05,575246000000 CEST: ganeti-luxid pid=175797 INFO Successfully handled SubmitJob
ganeti-luxid: file descriptor 1025 out of range for select (0--1024).
Recompile with -threaded to work around this.
------------------------------------------------------------------------

Note that the job starts failing with "Resource temporarily
unavailable" also happen at other times, when luxid does NOT die.
So while that is another thing to look at (wtf, there are enough
resources), I don't think it is the cause, especially as the
failure happens without that too:

------------------------------------------------------------------------
2015-05-13 08:40:05,864418000000 CEST: ganeti-luxid pid=844184 INFO Finished jobs: (51941,JOB_STATUS_SUCCESS)
2015-05-13 08:42:47,965607000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,965896000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs: 
2015-05-13 08:42:47,966336000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,966457000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs: 
2015-05-13 08:42:47,966727000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,966844000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs: 
2015-05-13 08:42:47,967252000000 CEST: ganeti-luxid pid=844184 INFO Waiting jobs: []; running jobs: [36853,36858,36883,36884,36886,36917,37010,44515,44518,44528,44529,44532,44533,44534,44535,44540,44554]
2015-05-13 08:42:47,967498000000 CEST: ganeti-luxid pid=844184 WARNING Starting jobs failed: connect: resource exhausted (Resource temporarily unavailable)
2015-05-13 08:42:47,967651000000 CEST: ganeti-luxid pid=844184 WARNING Rescheduling jobs: 
2015-05-13 08:45:02,422865000000 CEST: ganeti-luxid pid=844184 INFO Successfully handled QueryGroups
ganeti-luxid: file descriptor 1025 out of range for select (0--1024).
Recompile with -threaded to work around this.
------------------------------------------------------------------------
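
Both excerpts end with the same message: descriptor 1025 is past the
0--1024 range that select() accepts. One possible reading (an
assumption, not confirmed by anything above) is that the number of
descriptors luxid holds open grows over time. A minimal sketch for
watching that on Linux by counting the entries in /proc/<pid>/fd; the
helper names count_open_fds and fd_headroom are hypothetical and not
part of ganeti:

```python
import os
import sys


def count_open_fds(pid):
    """Return the number of open file descriptors of process `pid`.

    Linux only: counts the entries in /proc/<pid>/fd. Reading another
    user's process usually requires root.
    """
    return len(os.listdir("/proc/%d/fd" % pid))


def fd_headroom(pid, select_limit=1024):
    """Return how many descriptors are left below `select_limit`.

    The default of 1024 is taken from the range printed in the error
    message; it is an assumption that this is the relevant limit.
    """
    return select_limit - count_open_fds(pid)


if __name__ == "__main__":
    # Use the pid given on the command line, else inspect this process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print("pid %d: %d open fds, headroom %d"
          % (pid, count_open_fds(pid), fd_headroom(pid)))
```

Run periodically against the luxid pid (175797 in the first excerpt), a
count that climbs steadily toward 1024 between restarts would point at
a descriptor leak; a flat count would argue against it.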


-- System Information:
Debian Release: 7.8
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-0.bpo.4-amd64 (SMP w/2 CPU cores)
Locale: LANG=de_DE.utf8, LC_CTYPE=de_DE.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

-- 
bye Joerg


