Older Linux Kernels can hang/lock up on APD or Heavy Loads when using the vNVME Adapter

Article ID: 318038

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

The hang occurs when an ABORT command times out. The virtual machine may become completely unresponsive, or only the affected I/O task may hang.
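
To confirm a guest is hitting this signature, search its kernel log for the NVMe timeout/abort sequence followed by hung-task messages (a minimal sketch; the log file path varies by distribution):

# Match the characteristic timeout -> abort -> reset -> hung-task pattern
grep -E "nvme nvme[0-9]+: (I/O .+ timeout|Abort status)|blocked for more than" /var/log/kern.log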

Detailed log from /var/log/kern.log on the Ubuntu 18.04 VM:
Ubuntu 18.04.1: Linux eng26-dhcp20 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Mar  2 14:25:00 Ubuntu1804Server kernel: [17427.420654] nvme nvme1: I/O 42 QID 2 timeout, aborting
Mar  2 14:25:00 Ubuntu1804Server kernel: [17430.423909] nvme nvme1: Abort status: 0x0
Mar  2 14:25:30 Ubuntu1804Server kernel: [17460.570298] nvme nvme1: I/O 42 QID 2 timeout, reset controller
Mar  2 14:26:01 Ubuntu1804Server kernel: [17491.415229] nvme nvme1: I/O 42 QID 2 timeout, disable controller
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.034980] INFO: task kworker/1:1H:304 blocked for more than 120 seconds.
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035003]       Not tainted 4.15.0-20-generic #21-Ubuntu
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035038] kworker/1:1H    D    0   304      2 0x80000000
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035045] Workqueue: kblockd blk_mq_timeout_work
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035046] Call Trace:
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035051]  __schedule+0x297/0x8b0
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035054]  ? vprintk_emit+0x333/0x3a0
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035055]  schedule+0x2c/0x80
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035057]  schedule_preempt_disabled+0xe/0x10
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035058]  __mutex_lock.isra.2+0x18c/0x4d0
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035061]  ? ttwu_do_wakeup+0x1e/0x150
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035062]  __mutex_lock_slowpath+0x13/0x20
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035063]  ? __mutex_lock_slowpath+0x13/0x20
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035064]  mutex_lock+0x2f/0x40
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035070]  nvme_start_freeze+0x21/0x50 [nvme_core]
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035072]  nvme_dev_disable+0x3f9/0x450 [nvme]
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035077]  ? __dev_printk+0x3c/0x80
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035077]  ? dev_warn+0x64/0x80
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035079]  nvme_timeout+0x2de/0x310 [nvme]
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035081]  ? __switch_to_asm+0x40/0x70
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035082]  ? __switch_to_asm+0x34/0x70
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035083]  ? __switch_to_asm+0x40/0x70
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035084]  blk_mq_rq_timed_out+0x32/0x70
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035085]  blk_mq_check_expired+0x53/0x60
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035086]  bt_iter+0x4d/0x60
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035087]  blk_mq_queue_tag_busy_iter+0x162/0x260
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035089]  ? blk_mq_rq_timed_out+0x70/0x70
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035090]  ? blk_mq_rq_timed_out+0x70/0x70
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035092]  blk_mq_timeout_work+0x60/0x160
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035094]  process_one_work+0x1de/0x410
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035095]  worker_thread+0x32/0x410
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035097]  kthread+0x121/0x140
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035098]  ? process_one_work+0x410/0x410
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035100]  ? kthread_create_worker_on_cpu+0x70/0x70
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035101]  ret_from_fork+0x35/0x40
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035110] INFO: task kworker/u4:2:2105 blocked for more than 120 seconds.
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035129]       Not tainted 4.15.0-20-generic #21-Ubuntu
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035143] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035162] kworker/u4:2    D    0  2105      2 0x80000000
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035165] Workqueue: nvme-wq nvme_reset_work [nvme]
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035165] Call Trace:
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035167]  __schedule+0x297/0x8b0
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035169]  schedule+0x2c/0x80
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035170]  blk_mq_freeze_queue_wait+0x4b/0xb0
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035172]  ? wait_woken+0x80/0x80
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035175]  nvme_wait_freeze+0x33/0x50 [nvme_core]
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035176]  nvme_reset_work+0xb71/0x12e9 [nvme]
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035177]  process_one_work+0x1de/0x410
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035178]  worker_thread+0x32/0x410
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035180]  kthread+0x121/0x140
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035181]  ? process_one_work+0x410/0x410
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035182]  ? kthread_create_worker_on_cpu+0x70/0x70
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035184]  ? do_syscall_64+0x73/0x130
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035186]  ? SyS_exit_group+0x14/0x20
Mar  2 14:28:30 Ubuntu1804Server kernel: [17641.035187]  ret_from_fork+0x35/0x40


=============================================================
The same hang is also observed on RHEL 8.2 and RHEL 8.3:
RHEL 8.3: Linux promb-gen-dhcp47 4.18.0-240.el8.x86_64 #1 SMP Wed Sep 23 05:13:10 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux

Detailed log from /var/log/kern.log (this trace is from a 4.18.0-193.70.1.el8_2 kernel):
2022-02-25T19:27:02.333739+01:00 maestro-xcat kernel: [ 788.623401] nvme nvme2: I/O 72 QID 2 timeout, aborting
2022-02-25T19:27:02.333747+01:00 maestro-xcat kernel: [ 788.623516] nvme nvme2: Abort status: 0x0
2022-02-25T19:27:32.541729+01:00 maestro-xcat kernel: [ 818.831071] nvme nvme2: I/O 72 QID 2 timeout, reset controller
2022-02-25T19:27:32.546740+01:00 maestro-xcat kernel: [ 818.836274] WARNING: CPU: 0 PID: 934 at kernel/irq/chip.c:210 irq_startup+0xde/0xf0
2022-02-25T19:27:32.546745+01:00 maestro-xcat kernel: [ 818.836275] Modules linked in: 8021q garp mrp stp llc nfsv3 nfs fscache intel_rapl_msr intel_rapl_common nfit libnvdimm nft_reject_ipv4 kvm_intel nf_reject_ipv4 nft_reject nf_tables_set kvm nf_tables irqbypass nfnetlink vmw_vsock_vmci_transport vsock crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vmw_balloon intel_rapl_perf pcspkr joydev vmw_vmci i2c_piix4 vfat fat nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sr_mod cdrom sg ata_generic crc32c_intel serio_raw nvme drm_kms_helper nvme_core syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ahci libahci ata_piix libata vmxnet3
2022-02-25T19:27:32.546746+01:00 maestro-xcat kernel: [ 818.836304] CPU: 0 PID: 934 Comm: kworker/0:1H Kdump: loaded Not tainted 4.18.0-193.70.1.el8_2.x86_64 #1
2022-02-25T19:27:32.546748+01:00 maestro-xcat kernel: [ 818.836305] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.18227214.B64.2106252220 06/25/2021
2022-02-25T19:27:32.546749+01:00 maestro-xcat kernel: [ 818.836309] Workqueue: kblockd blk_mq_timeout_work
2022-02-25T19:27:32.546750+01:00 maestro-xcat kernel: [ 818.836310] RIP: 0010:irq_startup+0xde/0xf0
2022-02-25T19:27:32.546750+01:00 maestro-xcat kernel: [ 818.836312] Code: 31 f6 4c 89 ef e8 12 2f 00 00 85 c0 75 20 48 89 ee 31 d2 4c 89 ef e8 e1 d7 ff ff 48 89 df e8 b9 fe ff ff 89 c5 e9 59 ff ff ff <0f> 0b eb b9 0f 0b eb b5 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
2022-02-25T19:27:32.546752+01:00 maestro-xcat kernel: [ 818.836313] RSP: 0018:ffffbe09c2c2bbd8 EFLAGS: 00010002
2022-02-25T19:27:32.546752+01:00 maestro-xcat kernel: [ 818.836314] RAX: 0000000000000080 RBX: ffff997ba7725200 RCX: 0000000000000080
2022-02-25T19:27:32.546754+01:00 maestro-xcat kernel: [ 818.836314] RDX: 0000000000000001 RSI: ffffffffb560d980 RDI: ffff997ba7ced960
2022-02-25T19:27:32.546754+01:00 maestro-xcat kernel: [ 818.836315] RBP: ffff997ba7ced960 R08: 0000000000000000 R09: ffff997ba5ea0920
2022-02-25T19:27:32.546755+01:00 maestro-xcat kernel: [ 818.836315] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
2022-02-25T19:27:32.546756+01:00 maestro-xcat kernel: [ 818.836316] R13: 0000000000000001 R14: 0000000000000000 R15: ffff997ba92e0f00
2022-02-25T19:27:32.546756+01:00 maestro-xcat kernel: [ 818.836316] FS: 0000000000000000(0000) GS:ffff997badc00000(0000) knlGS:0000000000000000
2022-02-25T19:27:32.546757+01:00 maestro-xcat kernel: [ 818.836317] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2022-02-25T19:27:32.546757+01:00 maestro-xcat kernel: [ 818.836317] CR2: 00007fdc243de000 CR3: 00000003baa0a002 CR4: 00000000007606f0
2022-02-25T19:27:32.546758+01:00 maestro-xcat kernel: [ 818.836334] PKRU: 55555554
2022-02-25T19:27:32.546759+01:00 maestro-xcat kernel: [ 818.836334] Call Trace:
2022-02-25T19:27:32.546759+01:00 maestro-xcat kernel: [ 818.836339] enable_irq+0x46/0x90
2022-02-25T19:27:32.546760+01:00 maestro-xcat kernel: [ 818.836343] nvme_poll_irqdisable+0x2b7/0x370 [nvme]
2022-02-25T19:27:32.546760+01:00 maestro-xcat kernel: [ 818.836345] __nvme_disable_io_queues+0x1b2/0x1f0 [nvme]
2022-02-25T19:27:32.546761+01:00 maestro-xcat kernel: [ 818.836346] ? nvme_del_queue_end+0x20/0x20 [nvme]
2022-02-25T19:27:32.546761+01:00 maestro-xcat kernel: [ 818.836347] nvme_dev_disable+0x1ff/0x220 [nvme]
2022-02-25T19:27:32.546762+01:00 maestro-xcat kernel: [ 818.836348] nvme_timeout+0x2ca/0x320 [nvme]
2022-02-25T19:27:32.546763+01:00 maestro-xcat kernel: [ 818.836351] ? __switch_to_asm+0x41/0x70
2022-02-25T19:27:32.546763+01:00 maestro-xcat kernel: [ 818.836352] ? __switch_to_asm+0x35/0x70
2022-02-25T19:27:32.546763+01:00 maestro-xcat kernel: [ 818.836353] ? __switch_to_asm+0x41/0x70
2022-02-25T19:27:32.546764+01:00 maestro-xcat kernel: [ 818.836353] ? __switch_to_asm+0x35/0x70
2022-02-25T19:27:32.546764+01:00 maestro-xcat kernel: [ 818.836354] ? __switch_to_asm+0x41/0x70
2022-02-25T19:27:32.546765+01:00 maestro-xcat kernel: [ 818.836354] ? __switch_to_asm+0x35/0x70
2022-02-25T19:27:32.546766+01:00 maestro-xcat kernel: [ 818.836355] ? __switch_to_asm+0x41/0x70
2022-02-25T19:27:32.546766+01:00 maestro-xcat kernel: [ 818.836356] ? __switch_to_asm+0x35/0x70
2022-02-25T19:27:32.546767+01:00 maestro-xcat kernel: [ 818.836356] ? __switch_to_asm+0x41/0x70
2022-02-25T19:27:32.546767+01:00 maestro-xcat kernel: [ 818.836357] blk_mq_check_expired+0x11f/0x140
2022-02-25T19:27:32.546768+01:00 maestro-xcat kernel: [ 818.836360] blk_mq_queue_tag_busy_iter+0x19a/0x300
2022-02-25T19:27:32.546768+01:00 maestro-xcat kernel: [ 818.836361] ? blk_poll+0x350/0x350
2022-02-25T19:27:32.546769+01:00 maestro-xcat kernel: [ 818.836362] ? __switch_to_asm+0x41/0x70
2022-02-25T19:27:32.546770+01:00 maestro-xcat kernel: [ 818.836363] ? blk_poll+0x350/0x350
2022-02-25T19:27:32.546770+01:00 maestro-xcat kernel: [ 818.836365] ? release_thread+0x10/0x30
2022-02-25T19:27:32.546770+01:00 maestro-xcat kernel: [ 818.836366] blk_mq_timeout_work+0x54/0x130
2022-02-25T19:27:32.546771+01:00 maestro-xcat kernel: [ 818.836368] process_one_work+0x1a7/0x3b0
2022-02-25T19:27:32.546772+01:00 maestro-xcat kernel: [ 818.836369] worker_thread+0x30/0x390
2022-02-25T19:27:32.546773+01:00 maestro-xcat kernel: [ 818.836370] ? create_worker+0x1a0/0x1a0
2022-02-25T19:27:32.546774+01:00 maestro-xcat kernel: [ 818.836372] kthread+0x112/0x130
2022-02-25T19:27:32.546774+01:00 maestro-xcat kernel: [ 818.836374] ? kthread_flush_work_fn+0x10/0x10
2022-02-25T19:27:32.546774+01:00 maestro-xcat kernel: [ 818.836375] ret_from_fork+0x1f/0x40
2022-02-25T19:27:32.546775+01:00 maestro-xcat kernel: [ 818.836376] ---[ end trace 6500b1afc30fe12d ]---

Environment

VMware vSphere ESXi 7.0.3

Cause

Bugs in earlier Linux NVMe drivers can cause hangs/lockups in the Linux guest when the new vNVME retry handler is in use under APD or heavy I/O load conditions. This hang/lockup has been observed on 4.15.x and 4.18.x Linux kernel versions.
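
To check whether a guest is running one of the affected kernel lines:

# Print the running kernel release; 4.15.x and 4.18.x lines are affected
uname -r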

Resolution

The issue has been fixed in ESXi 7.0 P05 (7.0 Update 3f).
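
To verify that a host is at or above the fixed release, check the ESXi version and build number from the host shell (the P05/Update 3f build number is listed in the release notes):

# Print the ESXi version and build number
vmware -vl
esxcli system version get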


Additional Information

The vNVME retry handler was added in 7.0 U3c to mitigate APD conditions. The idea is to allow NVMe guests to retry continuously when an APD (or temporary NO CONNECT) condition occurs, which is on par with the way a SCSI guest works. Windows/Windows Server does this for both NVMe and SCSI disks. Linux NVMe drivers, on the other hand, perform only 5 retries (configurable) before failing a command; without an NVMe retry handler, this provides only about 28 seconds of APD tolerance before commands fail. With the new retry handler this can be extended to roughly 6 minutes of APD cushion, which is still not a great user experience. While testing the new retry handler, we discovered that some older Linux distributions have a bug in their timeout handler that causes the virtual machine to hang when an ABORT command times out, requiring a power cycle of that virtual machine.
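
For reference, the Linux retry limit mentioned above is exposed as the nvme_core.max_retries module parameter (default 5). A minimal sketch of checking and raising it in the guest, assuming a distribution that exposes the parameter at the usual sysfs path:

# Show the current NVMe command retry limit (default: 5)
cat /sys/module/nvme_core/parameters/max_retries

# Raise the limit on the running system (applies to new commands)
echo 10 | sudo tee /sys/module/nvme_core/parameters/max_retries

# To make the change persistent, add "nvme_core.max_retries=10" to the
# kernel command line (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub) and
# regenerate the bootloader configuration (update-grub on Debian/Ubuntu,
# grub2-mkconfig on RHEL).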

Because having a virtual machine lock up is so disruptive, an advanced config value has been added to turn off the vNVME retry sub-system. The default for this config is to turn retries off (disabled). Users who are confident that their Linux distributions do not have the timeout bug can enable the retry handler; in that case, we recommend also increasing the number of retries in the Linux NVMe driver (see above). Since Windows/Windows Server distributions do not have any issues with retries and will retry indefinitely in an APD situation, users may want to enable the NVMe retry handler as a cushion for APD conditions.
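
This article does not name the advanced option itself. As an illustration only, per-VM advanced options are typically applied as .vmx entries while the VM is powered off; the key below is a hypothetical placeholder, not the documented option name, so obtain the actual key from VMware support before use:

# HYPOTHETICAL sketch only: "vnvme.retryHandler.enabled" is a placeholder
# key, NOT the documented option name. With the VM powered off, append the
# real key/value pair to the VM's .vmx file, then power the VM back on.
echo 'vnvme.retryHandler.enabled = "TRUE"' >> /vmfs/volumes/datastore1/MyVM/MyVM.vmx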