Linux VM fails with the error "kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1413!"

Article ID: 316420


Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • The Linux virtual machine fails with an error similar to:
    BUG_ON: kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1413!

  • The virtual machine is unresponsive and cannot be suspended.

  • Halting the virtual machine's vmx process does not help.

  • In the /vmfs/volumes/datastore/virtual_machine/vmware.log file, you see entries similar to:

    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <4>[84978.843003] ------------[ cut here ]------------
    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <2>[84978.843130] kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c:1413!
    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <4>[84978.843167] invalid opcode: 0000 [#1] SMP
    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <4>[84978.843191] Modules linked in: vmw_vsock_vmci_transport(E) vsock(E) xt_conntrack(E) iptable_mangle(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) iptable_filter(E) ip_tables(E) xt_LOG(E) nf_conntrack(E) coretemp(E) hwmon(E) kvm_intel(E) kvm(E) irqbypass(E) mousedev(E) hid_generic(E) aesni_intel(E) vmw_balloon(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) evdev(E) cryptd(E) psmouse(E) usbhid(E) hid(E) nfit(E) intel_agp(E) vmw_vmci(E) battery(E) i2c_piix4(E) intel_gtt(E) acpi_cpufreq(E) tpm_tis(E) tpm_tis_core(E) tpm(E) ac(E) button(E) sch_fq_codel(E) crc32c_intel(E) uhci_hcd(E) ehci_pci(E) ehci_hcd(E) usbcore(E) usb_common(E) autofs4(E)
    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <4>[84978.843627] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E 4.9.13-2.ph2dev #1-photon
    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <4>[84978.843675] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <4>[84978.843735] task: ffffffff81e0e500 task.stack: ffffffff81e00000
    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <4>[84978.843769] RIP: 0010:[<ffffffff816bc651>] [<ffffffff816bc651>] vmxnet3_rq_rx_complete+0x941/0xe70
    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <4>[84978.843826] RSP: 0018:ffff88013fc03e20 EFLAGS: 00010297
    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <4>[84978.843857] RAX: 0000000000000040 RBX: ffff880136e69440 RCX: ffff88012f31dc00
    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <4>[84978.843897] RDX: 000000000000000a RSI: 0000000000000040 RDI: 0000000000000002
    2017-03-21T19:14:44.521Z| vcpu-0| I125: Guest: <4>[84978.843937] RBP: ffff88013fc03e98 R08: 0000000000000030 R09: 0000000000000000
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.843977] R10: ffff880136e688c0 R11: 0000000000000000 R12: ffff8801386071d0
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844016] R13: ffff88013863b320 R14: 000000000000001d R15: ffff8801386482b8
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844057] FS: 0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844102] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844134] CR2: 000055df51643548 CR3: 0000000136b7d000 CR4: 00000000000006f0
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844239] Stack:
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844255] ffffffff8179e8d0 ffff880136acf8e8 0000000000000000 ffff880136e688c0
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844304] ffff880136e69460 ffff88013fc00024 0000000000000002 0000000000000040
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844351] ffff880136e69528 0000000000000000 ffff880136e688c0 ffff880136e69460
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844399] Call Trace:
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844415] <IRQ>
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844433] [<ffffffff8179e8d0>] ? tcp_wfree+0x50/0xc0
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844465] [<ffffffff816bccc5>] vmxnet3_poll_rx_only+0x35/0xa0
    2017-03-21T19:14:44.522Z| vcpu-0| I125: Guest: <4>[84978.844502] [<ffffffff8174259b>] net_rx_action+0x20b/0x350
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.844538] [<ffffffff81077a88>] __do_softirq+0xe8/0x270
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.845698] [<ffffffff81077d71>] irq_exit+0xb1/0xc0
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.846850] [<ffffffff8102e38f>] do_IRQ+0x4f/0xd0
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.847974] [<ffffffff8184d5c2>] common_interrupt+0x82/0x82
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.849092] <EOI>
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.849109] [<ffffffff8184c4c6>] ? native_safe_halt+0x6/0x10
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.850241] [<ffffffff8184c209>] default_idle+0x19/0xd0
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.851357] [<ffffffff8103618a>] arch_cpu_idle+0xa/0x10
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.852443] [<ffffffff8184c61e>] default_idle_call+0x1e/0x30
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.853512] [<ffffffff810b1ac0>] cpu_startup_entry+0x1b0/0x220
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.854550] [<ffffffff8183fd52>] rest_init+0x72/0x80
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.855749] [<ffffffff81fe7f85>] start_kernel+0x44a/0x457
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.856741] [<ffffffff81fe7120>] ? early_idt_handler_array+0x120/0x120
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.857705] [<ffffffff81fe72d6>] x86_64_start_reservations+0x2a/0x2c
    2017-03-21T19:14:44.523Z| vcpu-0| I125: Guest: <4>[84978.858644] [<ffffffff81fe7413>] x86_64_start_kernel+0x13b/0x14a
    2017-03-21T19:14:44.524Z| vcpu-0| I125: Guest: <4>[84978.859563] Code: 53 fd ff ff 31 c0 4d 85 c9 0f 94 c0 e9 63 fd ff ff 48 8b 05 b2 b9 76 00 e9 2b fd ff ff 3b 93 78 01 00 00 0f 84 ff f7 ff ff 0f 0b <0f> 0b 31 c0 e9 0c fc ff ff 0f 0b 0f 0b 41 0f b6 45 04 66 85 c0
    2017-03-21T19:14:44.524Z| vcpu-0| I125: Guest: <1>[84978.862415] RIP [<ffffffff816bc651>] vmxnet3_rq_rx_complete+0x941/0xe70
    2017-03-21T19:14:44.524Z| vcpu-0| I125: Guest: <4>[84978.863349] RSP <ffff88013fc03e20>
    2017-03-21T19:14:44.524Z| vcpu-0| I125: Guest: <4>[84978.865880]

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
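
To look for this signature without reading the whole log, a search along the following lines can be run from the ESXi shell. This is only a sketch: the function name find_vmxnet3_bug is hypothetical, and the argument should be the path to the affected virtual machine's own vmware.log file.

```shell
# Hypothetical helper: print the vmware.log lines (with line numbers) that
# record a guest kernel BUG raised from the vmxnet3 driver source file.
# The source line number after the colon (1413 in the example above) is
# deliberately not matched, because the error is only described as "similar to".
find_vmxnet3_bug() {
    grep -n -F 'kernel BUG at drivers/net/vmxnet3/vmxnet3_drv.c' "$1"
}
```

The function returns a non-zero status when no matching line is found, so it can also be used in a condition, for example: find_vmxnet3_bug /vmfs/volumes/datastore/virtual_machine/vmware.log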


Environment

VMware vSphere ESXi 6.5

Cause

This issue is caused by a bug in the VMXNET3 vNIC backend, which is part of the VMkernel. It occurs when all of the following conditions are met:
  • The Linux VM is running kernel version 4.8 or later
  • The virtual machine hardware version is 13 or later
  • The ESXi version is 6.5
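
The first of these conditions can be checked from inside the guest. The sketch below compares a kernel release string against 4.8; the function name kernel_at_least_4_8 is hypothetical, and the hardware version and ESXi version still have to be checked separately on the host.

```shell
# Hypothetical helper: succeed if a kernel release string is 4.8 or later.
# With no argument it inspects the running kernel via "uname -r".
# Only the leading "major.minor" part is compared; anything after the
# minor number (patch level, suffixes such as "-2.ph2dev") is ignored.
kernel_at_least_4_8() {
    release="${1:-$(uname -r)}"
    major="${release%%.*}"        # text before the first dot
    rest="${release#*.}"          # text after the first dot
    minor="${rest%%[!0-9]*}"      # leading digits of that text
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "${minor:-0}" -ge 8 ]; }
}
```

For example, running kernel_at_least_4_8 && echo "kernel version matches the affected range" in the guest prints the message only when the running kernel is 4.8 or later.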
 

Resolution

This issue is resolved in VMware ESXi 6.5 U1, available at VMware Downloads.


Workaround:
To work around this issue without upgrading, use either of these options:
  • Add the vmxnet3.rev.30 = FALSE parameter to the virtual machine's vmx file:
    1. Power off the virtual machine.

    2. Edit the vmx file and add this parameter:

      vmxnet3.rev.30 = FALSE

       For more information, see Editing files on an ESXi host using vi (1020302) - https://kb.vmware.com/s/article/1020302

    3. Power on the virtual machine.
 
  • If you do not want to power off the virtual machine, disable the receive data ring for each VMXNET3 vNIC in the VM by running this command in the guest operating system:

    ethtool -G ethX rx-mini 0
    Note: Replace ethX with the name of the VMXNET3 interface in the virtual machine.
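
Both workarounds can be scripted. The following is a minimal sketch, not a supported tool: all three function names are hypothetical, the first function is meant for the ESXi host and the other two for the Linux guest, and the interface detection assumes that the guest exposes the bound driver as the usual symlink under /sys/class/net.

```shell
# ESXi host, virtual machine powered off.
# Hypothetical helper: append the workaround parameter to a .vmx file unless
# a vmxnet3.rev.30 entry already exists. A backup copy is written first.
add_vmxnet3_rev30_false() {
    vmx="$1"
    if grep -q '^vmxnet3\.rev\.30' "$vmx"; then
        echo "vmxnet3.rev.30 is already set in $vmx" >&2
        return 0
    fi
    cp "$vmx" "$vmx.bak" || return 1
    # Make sure the last existing line ends with a newline before appending.
    if [ -n "$(tail -c 1 "$vmx")" ]; then echo >> "$vmx"; fi
    echo 'vmxnet3.rev.30 = FALSE' >> "$vmx"
}

# Linux guest.
# Hypothetical helper: print the name of every network interface whose
# sysfs driver symlink points at the vmxnet3 driver.
list_vmxnet3_ifaces() {
    sysnet="${1:-/sys/class/net}"
    for dev in "$sysnet"/*; do
        [ -L "$dev/device/driver" ] || continue
        driver=$(basename "$(readlink "$dev/device/driver")")
        if [ "$driver" = "vmxnet3" ]; then echo "${dev##*/}"; fi
    done
    return 0
}

# Linux guest, as root.
# Hypothetical helper: run the workaround command on each vmxnet3 interface.
disable_rx_data_ring() {
    for ifname in $(list_vmxnet3_ifaces); do
        ethtool -G "$ifname" rx-mini 0
    done
}
```

On the host, the first function would be called with the path to the virtual machine's vmx file between powering the virtual machine off and on. In the guest, list_vmxnet3_ifaces can be run first to review which interfaces disable_rx_data_ring would change.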