hbr_filter searches for a whole contiguous region in the transfer bitmap. This usually works well when the regions are small. When the regions are large (for example, during a full sync of a large disk with checksumming disabled), iterating over them can result in a PSOD: the disk lock is held for so long that other CPUs contending for it exceed their spin count.
Resolution
This issue is resolved in VMware vSphere ESXi 6.0 Patch ESXi600-201909001, ESXi 6.5 U3, and ESXi 6.7 U3.
Workaround: To work around this issue, follow the steps below.
Identify the VM that is part of the replication and note its RDID. For example:
Replication ID of the disk: RDID-13d1285d-e660-4da9-8ffd-9e921a84ea2c
Corresponding replication group ID: GID-4f1df3b0-16fc-4e66-bddd-01ccc688a8d9
You can find the VM by checking the replication configuration of the VMs on the host:
$ vim-cmd hbrsvc/vmreplica.getConfig <vmID>
where <vmID> can be obtained from the list of the registered VMs:
$ vim-cmd vmsvc/getallvms
The replication group ID in the output should match the one noted earlier, in this example GID-4f1df3b0-16fc-4e66-bddd-01ccc688a8d9
Then stop the replication for this VM.
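The lookup described in the steps above can be scripted so that each registered VM does not have to be checked by hand. The following is a minimal sketch, not an official tool: the function name find_replicated_vms and the VIM_CMD override variable are hypothetical, and it assumes that the first column of the getallvms output is the numeric VM ID and that the GID appears verbatim in the getConfig output.

```shell
#!/bin/sh
# Hypothetical helper: print the ID of every registered VM whose replication
# configuration mentions the given replication group ID (GID).
# VIM_CMD is an assumption of this sketch; it only allows the command to be
# overridden and defaults to the real vim-cmd.
VIM_CMD="${VIM_CMD:-vim-cmd}"

find_replicated_vms() {
    gid="$1"
    # Keep only rows whose first column is numeric (this skips the header row).
    for vmid in $("$VIM_CMD" vmsvc/getallvms | awk '$1 ~ /^[0-9]+$/ {print $1}'); do
        # VMs without replication configured may return an error; ignore it.
        if "$VIM_CMD" hbrsvc/vmreplica.getConfig "$vmid" 2>/dev/null | grep -q "$gid"; then
            echo "$vmid"
        fi
    done
}
```

With the function loaded in a shell on the host, running find_replicated_vms GID-4f1df3b0-16fc-4e66-bddd-01ccc688a8d9 would list the matching VM IDs, one per line. Confirm the result against the full getConfig output before stopping replication.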
Additional Information
Impact/Risks: Applying the workaround stops replication of the VM that caused the crash.