[Virtual-pkg-base-maintainers] Bug#646917: base: Debian gets unresponsive on higher IO load - BUG: soft lockup - CPU#0 stuck for 61s

Paul Greggo hostingnuggets at gmail.com
Fri Oct 28 10:46:06 UTC 2011


Package: base
Severity: important


Under higher IO load, for example when creating a new Xen VPS with xen-create-image, the system freezes and is totally unresponsive for a few seconds up to a minute. Even a ping to the server gets no reply, but once the operation has finished the server is responsive again. I also noticed that when running applications which generate more IO, such as a Java crawler, the server hangs a bit and pings to the server get quite slow, although I am pinging from the LAN. Below is the full stack trace that was generated while running xen-create-image (extract from dmesg).

[930756.024114] BUG: soft lockup - CPU#0 stuck for 61s! [kblockd/0:22]
[930756.024114] Modules linked in: dm_snapshot nfs lockd fscache nfs_acl auth_rpcgss sunrpc xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables ipmi_si mpt2sas scsi_transport_sas mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu xen_evtchn xenfs bridge stp bonding ext2 loop snd_pcm radeon snd_timer ttm drm_kms_helper snd drm joydev i2c_algo_bit psmouse soundcore i2c_core dcdbas snd_page_alloc serio_raw evdev pcspkr video output e752x_edac rng_core processor shpchp edac_core pci_hotplug button acpi_processor ext4 mbcache jbd2 crc16 dm_mod usbhid hid sd_mod sg sr_mod cdrom crc_t10dif ata_generic megaraid_mbox pata_sil680 uhci_hcd floppy ata_piix thermal thermal_sys megaraid_mm libata scsi_mod ehci_hcd e1000 usbcore nls_base [last unloaded: ipmi_si]
[930756.024114] CPU 0:
[930756.024114] Modules linked in: dm_snapshot nfs lockd fscache nfs_acl auth_rpcgss sunrpc xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev iptable_filter ip_tables x_tables ipmi_si mpt2sas scsi_transport_sas mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu xen_evtchn xenfs bridge stp bonding ext2 loop snd_pcm radeon snd_timer ttm drm_kms_helper snd drm joydev i2c_algo_bit psmouse soundcore i2c_core dcdbas snd_page_alloc serio_raw evdev pcspkr video output e752x_edac rng_core processor shpchp edac_core pci_hotplug button acpi_processor ext4 mbcache jbd2 crc16 dm_mod usbhid hid sd_mod sg sr_mod cdrom crc_t10dif ata_generic megaraid_mbox pata_sil680 uhci_hcd floppy ata_piix thermal thermal_sys megaraid_mm libata scsi_mod ehci_hcd e1000 usbcore nls_base [last unloaded: ipmi_si]
[930756.024114] Pid: 22, comm: kblockd/0 Not tainted 2.6.32-5-xen-amd64 #1 PowerEdge 2850
[930756.024114] RIP: e030:[<ffffffff8100922a>]  [<ffffffff8100922a>] hypercall_page+0x22a/0x1001
[930756.024114] RSP: e02b:ffff88000350ecc8  EFLAGS: 00000246
[930756.024114] RAX: 0000000000040000 RBX: 0000000000000000 RCX: ffffffff8100922a
[930756.024114] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[930756.024114] RBP: ffff880002f4e000 R08: ffff88003ebe0690 R09: ffffffff8118aed0
[930756.024114] R10: 0000000000000000 R11: 0000000000000246 R12: ffff88000243f800
[930756.024114] R13: ffff880002bdc000 R14: ffff88000243f848 R15: ffff8800024369c0
[930756.024114] FS:  00007fed4f83e700(0000) GS:ffff88000350b000(0000) knlGS:0000000000000000
[930756.024114] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[930756.024114] CR2: 0000000001ac31d0 CR3: 000000003e46d000 CR4: 0000000000000660
[930756.024114] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[930756.024114] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[930756.024114] Call Trace:
[930756.024114]  <IRQ>  [<ffffffff8100ecf2>] ? check_events+0x12/0x20
[930756.024114]  [<ffffffff8100e635>] ? xen_force_evtchn_callback+0x9/0xa
[930756.024114]  [<ffffffff8100ecf2>] ? check_events+0x12/0x20
[930756.024114]  [<ffffffff8118aed0>] ? __cfq_slice_expired+0xc4/0xd4
[930756.024114]  [<ffffffff8100ec99>] ? xen_irq_enable_direct_end+0x0/0x7
[930756.024114]  [<ffffffffa006a2f0>] ? scsi_request_fn+0x41f/0x506 [scsi_mod]
[930756.024114]  [<ffffffff8117ff72>] ? __blk_run_queue+0x35/0x66
[930756.024114]  [<ffffffff81180047>] ? blk_run_queue+0x21/0x35
[930756.024114]  [<ffffffffa0069881>] ? scsi_run_queue+0x2ce/0x36f [scsi_mod]
[930756.024114]  [<ffffffffa006a5d0>] ? scsi_next_command+0x2d/0x39 [scsi_mod]
[930756.024114]  [<ffffffffa006b0da>] ? scsi_io_completion+0x3d1/0x3fa [scsi_mod]
[930756.024114]  [<ffffffff8130d55c>] ? _spin_lock_irqsave+0x15/0x34
[930756.024114]  [<ffffffff811840ea>] ? blk_done_softirq+0x6e/0x7b
[930756.024114]  [<ffffffff81054d1b>] ? __do_softirq+0xdd/0x1a6
[930756.024114]  [<ffffffff81012cac>] ? call_softirq+0x1c/0x30
[930756.024114]  [<ffffffff8101422b>] ? do_softirq+0x3f/0x7c
[930756.024114]  [<ffffffff81054b8b>] ? irq_exit+0x36/0x76
[930756.024114]  [<ffffffff811f28a5>] ? xen_evtchn_do_upcall+0x33/0x42
[930756.024114]  [<ffffffff81012cfe>] ? xen_do_hypervisor_callback+0x1e/0x30
[930756.024114]  <EOI>  [<ffffffff8100922a>] ? hypercall_page+0x22a/0x1001
[930756.024114]  [<ffffffff8100922a>] ? hypercall_page+0x22a/0x1001
[930756.024114]  [<ffffffff8100ecf2>] ? check_events+0x12/0x20
[930756.024114]  [<ffffffff8100e635>] ? xen_force_evtchn_callback+0x9/0xa
[930756.024114]  [<ffffffff8100ecf2>] ? check_events+0x12/0x20
[930756.024114]  [<ffffffff8100ec99>] ? xen_irq_enable_direct_end+0x0/0x7
[930756.024114]  [<ffffffffa006a38f>] ? scsi_request_fn+0x4be/0x506 [scsi_mod]
[930756.024114]  [<ffffffff8100ecf2>] ? check_events+0x12/0x20
[930756.024114]  [<ffffffff811800f6>] ? generic_unplug_device+0x21/0x34
[930756.024114]  [<ffffffff81062953>] ? worker_thread+0x188/0x21d
[930756.024114]  [<ffffffff811783ca>] ? blk_unplug_work+0x0/0x44
[930756.024114]  [<ffffffff81065f86>] ? autoremove_wake_function+0x0/0x2e
[930756.024114]  [<ffffffff810627cb>] ? worker_thread+0x0/0x21d
[930756.024114]  [<ffffffff81065cb9>] ? kthread+0x79/0x81
[930756.024114]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
[930756.024114]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
[930756.024114]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
[930756.024114]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20
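
In case it helps with triaging a longer dmesg capture, here is a minimal Python sketch that pulls the soft-lockup header lines out of saved dmesg text. The names find_soft_lockups and LOCKUP_RE are made up for illustration, and the pattern only assumes the line format shown in the trace above; other kernel versions may word the message differently.

```python
import re

# Assumed line format, taken from the trace in this report:
# [930756.024114] BUG: soft lockup - CPU#0 stuck for 61s! [kblockd/0:22]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(dmesg_text):
    """Return one dict per soft-lockup header line found in dmesg_text."""
    events = []
    for line in dmesg_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            events.append({
                "timestamp": float(match.group("ts")),  # seconds since boot
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),    # reported stuck time
                "comm": match.group("comm"),            # task name
                "pid": int(match.group("pid")),
            })
    return events


if __name__ == "__main__":
    sample = (
        "[930756.024114] BUG: soft lockup - CPU#0 stuck for 61s! [kblockd/0:22]\n"
        "[930756.024114] CPU 0:\n"
    )
    for event in find_soft_lockups(sample):
        print(event)
```

Running it against the full output of dmesg saved to a file would show how often the lockups occur and which task is reported each time.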

Just in case here is the output of "lspci":

00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 09)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 09)
00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 09)
00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1 (rev 09)
00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 09)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
01:00.0 PCI bridge: Intel Corporation 80332 [Dobson] I/O processor (A-Segment Bridge) (rev 06)
01:00.2 PCI bridge: Intel Corporation 80332 [Dobson] I/O processor (B-Segment Bridge) (rev 06)
02:0e.0 RAID bus controller: Dell PowerEdge Expandable RAID controller 4 (rev 06)
05:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
05:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
06:07.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
07:08.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
08:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
08:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
0b:05.0 Unassigned class [ff00]: Dell Remote Access Card 4 Daughter Card
0b:05.1 Unassigned class [ff00]: Dell Remote Access Card 4 Daughter Card Virtual UART
0b:05.2 Unassigned class [ff00]: Dell Remote Access Card 4 Daughter Card SMIC interface
0b:06.0 IDE interface: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02)
0b:0d.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]

and the output of "/proc/cpuinfo":

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 3
model name	: Intel(R) Xeon(TM) CPU 3.40GHz
stepping	: 4
cpu MHz		: 3391.656
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc pni est cid hypervisor
bogomips	: 6783.31
clflush size	: 64
cache_alignment	: 128
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 15
model		: 3
model name	: Intel(R) Xeon(TM) CPU 3.40GHz
stepping	: 4
cpu MHz		: 3391.656
cache size	: 1024 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc pni est cid hypervisor
bogomips	: 6783.31
clflush size	: 64
cache_alignment	: 128
address sizes	: 36 bits physical, 48 bits virtual
power management:

Please let me know if you need more information; I am happy to provide anything that might help.

Many thanks in advance.


-- System Information:
Debian Release: 6.0.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-xen-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash




