[Virtual-pkg-base-maintainers] Bug#638316: base: Task blocked for more than 120 seconds

Roberto Dal Zilio roberto.dalzilio at ilvillage.it
Thu Aug 18 15:08:57 UTC 2011


Package: base
Severity: important


I have a serious problem with debian squeeze servers on virtualized environment (XenServer 5.6).
Frequent kernel errors appears on /var/log/messsages or dmesg and informs me that there are processes blocked.
When the process is apache2, seems, it remains in "D" (Uninterruptible sleep) status and don't respond at http requests. Also the virtual machine has an high CPU iowait.
This VM is part of an ocfs2 cluster with 8 nodes but the problem also occurs outside the ocfs2 cluster (for example with an ext3 filesystem).
Every virtual machine is in his own dedicated volume on Equallogic iSCSI SAN.   

The errors are like these:

[3759012.889335] INFO: task java:14127 blocked for more than 120 seconds.
[3759012.889341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[3759012.889345] java          D ffff8800da913f90     0 14127      1 0x00000000
[3759012.889348]  ffff8800da913f90 0000000000000286 ffff8800fdb45448 0000000000001be6
[3759012.889352]  0000000000000000 ffffffff8100e22f 000000000000f9e0 ffff88007b86bfd8
[3759012.889356]  0000000000015780 0000000000015780 ffff8800e5541530 ffff8800e5541828
[3759012.889360] Call Trace:
[3759012.889362]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[3759012.889369]  [<ffffffffa0142a67>] ? ocfs2_inode_unlock+0x144/0x1aa [ocfs2]
[3759012.889372]  [<ffffffff811531a6>] ? cap_inode_permission+0x0/0x3
[3759012.889374]  [<ffffffff812fb99b>] ? __mutex_lock_common+0x122/0x192
[3759012.889377]  [<ffffffff810e71b1>] ? kmem_cache_alloc+0x8c/0xf0
[3759012.889380]  [<ffffffff812fbac3>] ? mutex_lock+0x1a/0x31
[3759012.889382]  [<ffffffff810f89fd>] ? do_filp_open+0x226/0x94b
[3759012.889385]  [<ffffffff8100c2c2>] ? pfn_to_mfn+0xe/0x22
[3759012.889388]  [<ffffffff8100c33c>] ? pte_pfn_to_mfn+0x21/0x30
[3759012.889391]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[3759012.889393]  [<ffffffff81101d0d>] ? alloc_fd+0x67/0x10c
[3759012.889396]  [<ffffffff810ed1eb>] ? do_sys_open+0x55/0xfc
[3759012.889398]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
[3804012.893213] INFO: task java:6498 blocked for more than 120 seconds.
[3804012.893228] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[3804012.893237] java          D ffff8800038f7c40     0  6498      1 0x00000000
[3804012.893245]  ffff880002d8dbd0 0000000000000286 ffff8800ac4f9d38 ffff8800ac4f9d34
[3804012.893254]  0000000000000001 ffffffff8100e22f 000000000000f9e0 ffff8800ac4f9fd8
[3804012.893258]  0000000000015780 0000000000015780 ffff8800018a0710 ffff8800018a0a08
[3804012.893262] Call Trace:
[3804012.893271]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[3804012.893283]  [<ffffffffa0142a67>] ? ocfs2_inode_unlock+0x144/0x1aa [ocfs2]
[3804012.893288]  [<ffffffff812fb99b>] ? __mutex_lock_common+0x122/0x192
[3804012.893296]  [<ffffffff810e71b1>] ? kmem_cache_alloc+0x8c/0xf0
[3804012.893300]  [<ffffffff812fbac3>] ? mutex_lock+0x1a/0x31
[3804012.893304]  [<ffffffff810f89fd>] ? do_filp_open+0x226/0x94b
[3804012.893307]  [<ffffffff8100c2c2>] ? pfn_to_mfn+0xe/0x22
[3804012.893310]  [<ffffffff8100c33c>] ? pte_pfn_to_mfn+0x21/0x30
[3804012.893313]  [<ffffffff812fcaca>] ? error_exit+0x2a/0x60
[3804012.893316]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[3804012.893320]  [<ffffffff81101d0d>] ? alloc_fd+0x67/0x10c
[3804012.893323]  [<ffffffff810ed1eb>] ? do_sys_open+0x55/0xfc
[3804012.893326]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

[2086346.591237] INFO: task apache2:2218 blocked for more than 120 seconds.
[2086346.591241] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2086346.591245] apache2       D ffff8800f3f08710     0  2218    957 0x00000000
[2086346.591248]  ffff8800f3f08710 0000000000000286 ffff8800fe7b1c20 ffff8800feb03170
[2086346.591252]  ffff8800feb03170 ffff8800feb03170 000000000000f9e0 ffff8800fe7b1fd8
[2086346.591256]  0000000000015780 0000000000015780 ffff8800feb03170 ffff8800feb03468
[2086346.591259] Call Trace:
[2086346.591262]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[2086346.591265]  [<ffffffff812fc51a>] ? _spin_unlock_irqrestore+0xd/0xe
[2086346.591267]  [<ffffffff812fb99b>] ? __mutex_lock_common+0x122/0x192
[2086346.591270]  [<ffffffff812fbac3>] ? mutex_lock+0x1a/0x31
[2086346.591272]  [<ffffffff810f64c8>] ? do_lookup+0x80/0x15d
[2086346.591275]  [<ffffffff810f6f48>] ? __link_path_walk+0x5a5/0x6f5
[2086346.591277]  [<ffffffff810f72c6>] ? path_walk+0x66/0xc9
[2086346.591280]  [<ffffffff810f8730>] ? do_path_lookup+0x20/0x77
[2086346.591283]  [<ffffffff810f9c12>] ? user_path_at+0x48/0x79
[2086346.591285]  [<ffffffff81064f1a>] ? autoremove_wake_function+0x0/0x2e
[2086346.591288]  [<ffffffff81242247>] ? move_addr_to_user+0x61/0x7c
[2086346.591291]  [<ffffffff810f205e>] ? vfs_fstatat+0x2c/0x57
[2086346.591293]  [<ffffffff810f2151>] ? sys_newstat+0x11/0x30
[2086346.591296]  [<ffffffff8106c5e5>] ? getnstimeofday+0x55/0xaf
[2086346.591298]  [<ffffffff8106c692>] ? do_gettimeofday+0x12/0x34
[2086346.591301]  [<ffffffff8105245c>] ? sys_gettimeofday+0x2b/0x61
[2086346.591303]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
[2086346.591308] INFO: task apache2:2257 blocked for more than 120 seconds.
[2086346.591313] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2086346.591317] apache2       D ffff8800f3f0dbd0     0  2257    957 0x00000000
[2086346.591320]  ffff8800f3f0dbd0 0000000000000286 ffff8800febe3c20 ffff8800f3f0a350
[2086346.591324]  ffff8800f3f0a350 ffff8800f3f0a350 000000000000f9e0 ffff8800febe3fd8
[2086346.591327]  0000000000015780 0000000000015780 ffff8800f3f0a350 ffff8800f3f0a648
[2086346.591331] Call Trace:
[2086346.591334]  [<ffffffff8100e22f>] ? xen_restore_fl_direct_end+0x0/0x1
[2086346.591337]  [<ffffffff812fc51a>] ? _spin_unlock_irqrestore+0xd/0xe
[2086346.591339]  [<ffffffff812fb99b>] ? __mutex_lock_common+0x122/0x192
[2086346.591342]  [<ffffffff812fbac3>] ? mutex_lock+0x1a/0x31
[2086346.591344]  [<ffffffff810f64c8>] ? do_lookup+0x80/0x15d
[2086346.591347]  [<ffffffff810f6f48>] ? __link_path_walk+0x5a5/0x6f5
[2086346.591349]  [<ffffffff810f72c6>] ? path_walk+0x66/0xc9
[2086346.591352]  [<ffffffff810f8730>] ? do_path_lookup+0x20/0x77
[2086346.591354]  [<ffffffff810f9c12>] ? user_path_at+0x48/0x79
[2086346.591357]  [<ffffffff81064f1a>] ? autoremove_wake_function+0x0/0x2e
[2086346.591360]  [<ffffffff81242247>] ? move_addr_to_user+0x61/0x7c
[2086346.591363]  [<ffffffff810f205e>] ? vfs_fstatat+0x2c/0x57
[2086346.591365]  [<ffffffff810f2151>] ? sys_newstat+0x11/0x30
[2086346.591368]  [<ffffffff8106c5e5>] ? getnstimeofday+0x55/0xaf
[2086346.591370]  [<ffffffff8106c692>] ? do_gettimeofday+0x12/0x34
[2086346.591373]  [<ffffffff8105245c>] ? sys_gettimeofday+0x2b/0x61
[2086346.591375]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

Apache2 (2.2.19) servers are behind Ngnix reverse proxy and strace command reports this when an Apache2 child dies:

read(38, "GET /images/image001.g"..., 8000) = 721
gettimeofday({1313595282, 352344}, NULL) = 0
gettimeofday({1313595282, 352400}, NULL) = 0
gettimeofday({1313595282, 352448}, NULL) = 0
gettimeofday({1313595282, 352505}, NULL) = 0
gettimeofday({1313595282, 352552}, NULL) = 0
gettimeofday({1313595282, 352603}, NULL) = 0
gettimeofday({1313595282, 352653}, NULL) = 0
gettimeofday({1313595282, 352704}, NULL) = 0
gettimeofday({1313595282, 352750}, NULL) = 0
gettimeofday({1313595282, 352813}, NULL) = 0
gettimeofday({1313595282, 352863}, NULL) = 0
gettimeofday({1313595282, 352918}, NULL) = 0
gettimeofday({1313595282, 352964}, NULL) = 0
gettimeofday({1313595282, 353015}, NULL) = 0
gettimeofday({1313595282, 353062}, NULL) = 0
gettimeofday({1313595282, 353116}, NULL) = 0
gettimeofday({1313595282, 353164}, NULL) = 0
gettimeofday({1313595282, 353215}, NULL) = 0
gettimeofday({1313595282, 353278}, NULL) = 0
gettimeofday({1313595282, 353324}, NULL) = 0
gettimeofday({1313595282, 353409}, NULL) = 0
gettimeofday({1313595282, 353483}, NULL) = 0
gettimeofday({1313595282, 353538}, NULL) = 0
gettimeofday({1313595282, 353620}, NULL) = 0
gettimeofday({1313595282, 353668}, NULL) = 0
gettimeofday({1313595282, 353731}, NULL) = 0
gettimeofday({1313595282, 353782}, NULL) = 0
gettimeofday({1313595282, 353833}, NULL) = 0
gettimeofday({1313595282, 353883}, NULL) = 0
gettimeofday({1313595282, 353934}, NULL) = 0
gettimeofday({1313595282, 353984}, NULL) = 0
gettimeofday({1313595282, 354034}, NULL) = 0
gettimeofday({1313595282, 354079}, NULL) = 0
gettimeofday({1313595282, 354147}, NULL) = 0
gettimeofday({1313595282, 354219}, NULL) = 0
gettimeofday({1313595282, 354282}, NULL) = 0
gettimeofday({1313595282, 354331}, NULL) = 0
gettimeofday({1313595282, 354383}, NULL) = 0
gettimeofday({1313595282, 354435}, NULL) = 0
gettimeofday({1313595282, 354487}, NULL) = 0
gettimeofday({1313595282, 354536}, NULL) = 0
gettimeofday({1313595282, 354605}, NULL) = 0
gettimeofday({1313595282, 354654}, NULL) = 0
stat("/cluster_www/images/image001.gif", {st_mode=S_IFREG|0644, st_size=4731, ...}) = 0
open("/.htaccess", O_RDONLY|O_CLOEXEC)  = -1 ENOENT (No such file or directory)
open("/cluster_web/.htaccess", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/cluster_web/www/.htaccess", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
lstat("/cluster_www/images", {st_mode=S_IFDIR|0755, st_size=3896, ...}) = 0
lstat("/cluster_www/images/image001.gif", {st_mode=S_IFREG|0644, st_size=4731, ...}) = 0
open("/cluster_www/images/image001.gif", O_RDONLY|O_CLOEXEC) = 39
fcntl(39, F_GETFD)                      = 0x1 (flags FD_CLOEXEC)
setsockopt(38, SOL_TCP, TCP_CORK, [1], 4) = 0
writev(38, [{"HTTP/1.1 200 OK\r\nDate: Wed, 17 A"..., 284}], 1) = 284
sendfile(38, 39, [0], 4731)             = 4731
setsockopt(38, SOL_TCP, TCP_CORK, [0], 4) = 0
gettimeofday({1313595282, 355611}, NULL) = 0
write(27, "192.168.199.13 - - [17/Aug/2011"..., 211) = 211
gettimeofday({1313595282, 355742}, NULL) = 0
times({tms_utime=0, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 1889853582
gettimeofday({1313595282, 355871}, NULL) = 0
shutdown(38, 1 /* send */)              = 0
poll([{fd=38, events=POLLIN}], 1, 2000) = 1 ([{fd=38, revents=POLLIN|POLLHUP}])
read(38, "", 512)                       = 0
close(38)                               = 0
read(5, 0x7fff9c0a3dcf, 1)              = -1 EAGAIN (Resource temporarily unavailable)
close(39)                               = 0
gettimeofday({1313595282, 356503}, NULL) = 0
accept(4, {sa_family=AF_INET6, sin6_port=htons(54007), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 38
fcntl(38, F_GETFD)                      = 0
fcntl(38, F_SETFD, FD_CLOEXEC)          = 0
gettimeofday({1313595287, 894718}, NULL) = 0
getsockname(38, {sa_family=AF_INET6, sin6_port=htons(80), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
fcntl(38, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(38, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
gettimeofday({1313595287, 894970}, NULL) = 0
gettimeofday({1313595287, 895026}, NULL) = 0
gettimeofday({1313595287, 895075}, NULL) = 0
read(38, "OPTIONS * HTTP/1.0\r\nUser-Agent: "..., 8000) = 86
gettimeofday({1313595287, 895194}, NULL) = 0
gettimeofday({1313595287, 895251}, NULL) = 0
gettimeofday({1313595287, 895301}, NULL) = 0
gettimeofday({1313595287, 895355}, NULL) = 0
gettimeofday({1313595287, 895406}, NULL) = 0
gettimeofday({1313595287, 895465}, NULL) = 0
gettimeofday({1313595287, 895518}, NULL) = 0
writev(38, [{"HTTP/1.1 200 OK\r\nDate: Wed, 17 A"..., 152}], 1) = 152
gettimeofday({1313595287, 895670}, NULL) = 0
write(36, "::1 - - [17/Aug/2011:17:34:47 +0"..., 123) = 123
gettimeofday({1313595287, 895797}, NULL) = 0
times({tms_utime=0, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 1889854136
gettimeofday({1313595287, 895995}, NULL) = 0
shutdown(38, 1 /* send */)              = -1 ENOTCONN (Transport endpoint is not connected)
close(38)                               = 0
read(5, "!", 1)                         = 1
close(37)                               = 0
munmap(0x7fc77fa49000, 2126320)         = 0
munmap(0x7fc77fc51000, 2200840)         = 0
gettimeofday({1313595287, 897317}, NULL) = 0
munmap(0x7fc77fe6b000, 2236240)         = 0
munmap(0x7fc7804ad000, 2150792)         = 0
munmap(0x7fc78008d000, 4323776)         = 0
munmap(0x7fc781f3d000, 2163144)         = 0
munmap(0x7fc781cd8000, 2508584)         = 0
munmap(0x7fc781aa6000, 2301656)         = 0
munmap(0x7fc781881000, 2248208)         = 0
munmap(0x7fc781424000, 2417800)         = 0
munmap(0x7fc781673000, 2152936)         = 0
munmap(0x7fc78120c000, 2193632)         = 0
munmap(0x7fc780f61000, 2796456)         = 0
munmap(0x7fc780ce9000, 2587424)         = 0
munmap(0x7fc780ace000, 2204640)         = 0
munmap(0x7fc7808be000, 2161864)         = 0
munmap(0x7fc7806bb000, 2109040)         = 0
munmap(0x7fc77f81e000, 2272040)         = 0
munmap(0x7fc782add000, 323584)          = 0
munmap(0x7fc786b6d000, 8192)            = 0
munmap(0x7fc78445e000, 8192)            = 0
munmap(0x7fc784460000, 8192)            = 0
munmap(0x7fc784458000, 8192)            = 0
munmap(0x7fc78445a000, 8192)            = 0
munmap(0x7fc784454000, 8192)            = 0
munmap(0x7fc78445c000, 8192)            = 0
munmap(0x7fc784456000, 8192)            = 0
close(6)                                = 0
close(5)                                = 0
exit_group(0)                           = ?


Thanks!

Roberto


-- System Information:
Debian Release: 6.0.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'testing'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash





More information about the Virtual-pkg-base-maintainers mailing list