Consolidating snapshot fails after third party backup succeeds
search cancel

Consolidating snapshot fails after third party backup succeeds

book

Article ID: 341620

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • Upon third party backup operation, delta files are not consolidated successfully.
  • A backup is completed successfully, however the consolidation is not performed correctly due to file locks.
  • In the vmware.log file, you see entries similar to:

    [...]
    2018-01-19T04:01:06.422Z| vmx| I125: VigorTransportProcessClientPayload: opID=26c5366f-b0-3630 seq=45158: Receiving Snapshot.Take request.
    2018-01-19T04:01:06.423Z| vmx| I125: SnapshotVMX_TakeSnapshot start: '__GX_BACKUP__', deviceState=0, lazy=0, quiesced=1, forceNative=0, tryNative=1, saveAllocMaps=0 cb=11754C0A20, cbData=1176CC76D0
    [...]

    (The backup is performed)

    [...]
    2018-01-19T06:28:07.959Z| vmx| I125: VigorTransportProcessClientPayload: opID=3aa3a556-b1-8415 seq=54103: Receiving Snapshot.Consolidate request.
    [...]
    2018-01-19T06:30:34.294Z| vcpu-0| I125: DISKLIB-LIB_BLOCKTRACK : Resuming change tracking.
    2018-01-19T06:30:34.301Z| vcpu-0| I125: DISKLIB-CBT : Initializing ESX kernel change tracking for fid 625481039.
    2018-01-19T06:30:34.301Z| vcpu-0| I125: DISKLIB-CBT : Successfuly created cbt node 2548154f-cbt.
    2018-01-19T06:30:34.301Z| vcpu-0| I125: DISKLIB-CBT : Opening cbt node /vmfs/devices/cbt/2548154f-cbt
    2018-01-19T06:30:34.302Z| vcpu-0| I125: DISKLIB-LIB : Opened "/vmfs/volumes/5a54e1ec-1d176125-fdda-a0369fe304f0/VM/VM_2.vmdk" (flags 0x8, type vmfsSparse).
    2018-01-19T06:30:34.303Z| vcpu-0| I125: AIOGNRC: Failed to open '/vmfs/volumes/5a54e1ec-1d176125-fdda-a0369fe304f0/VM/VM_2-flat.vmdk' : Failed to lock the file (40003) (0x2013).
    2018-01-19T06:30:34.303Z| vcpu-0| I125: AIOMGR: AIOMgr_OpenWithRetry: Descriptor file '/vmfs/volumes/5a54e1ec-1d176125-fdda-a0369fe304f0/VM/VM_2-flat.vmdk' locked (try 0)
    2018-01-19T06:30:34.605Z| vcpu-0| I125: AIOGNRC: Failed to open '/vmfs/volumes/5a54e1ec-1d176125-fdda-a0369fe304f0/VM/VM_2-flat.vmdk' : Failed to lock the file (40003) (0x2013).
    2018-01-19T06:30:34.605Z| vcpu-0| I125: AIOMGR: AIOMgr_OpenWithRetry: Descriptor file '/vmfs/volumes/5a54e1ec-1d176125-fdda-a0369fe304f0/VM/VM_2-flat.vmdk' locked (try 1)
    2018-01-19T06:30:34.907Z| vcpu-0| I125: AIOGNRC: Failed to open '/vmfs/volumes/5a54e1ec-1d176125-fdda-a0369fe304f0/VM/VM_2-flat.vmdk' : Failed to lock the file (40003) (0x2013).
    2018-01-19T06:30:34.907Z| vcpu-0| I125: AIOMGR: AIOMgr_OpenWithRetry: Descriptor file '/vmfs/volumes/5a54e1ec-1d176125-fdda-a0369fe304f0/VM/VM_2-flat.vmdk' locked (try 2)
    2018-01-19T06:30:35.208Z| vcpu-0| I125: AIOGNRC: Failed to open '/vmfs/volumes/5a54e1ec-1d176125-fdda-a0369fe304f0/VM/VM_2-flat.vmdk' : Failed to lock the file (40003) (0x2013).
    2018-01-19T06:30:35.208Z| vcpu-0| I125: AIOMGR: AIOMgr_OpenWithRetry: Descriptor file '/vmfs/volumes/5a54e1ec-1d176125-fdda-a0369fe304f0/VM/VM_2-flat.vmdk' locked (try 3)

    [...]
    2018-01-19T06:30:35.813Z| vcpu-0| I125: SNAPSHOT:Failed to open disk /vmfs/volumes/5a54e1ec-1d176125-fdda-a0369fe304f0/VM/VM_2.vmdk : Failed to lock the file (16392)
    2018-01-19T06:30:35.816Z| vcpu-0| I125: DISKLIB-CBT : Shutting down change tracking for untracked fid 625481039.
    2018-01-19T06:30:35.817Z| vcpu-0| I125: DISKLIB-CBT : Successfully disconnected CBT node.

    [...]
    2018-01-19T06:30:35.929Z| vcpu-0| I125: ConsolidateEnd: Snapshot consolidate complete: Failed to lock the file (5).
    [...]


    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.


Environment

VMware vSphere ESXi 6.5
VMware vSphere ESXi 5.5
VMware vSphere ESXi 6.7
VMware vSphere ESXi 6.0

Cause

When a backup is initiated, the third party backup provider creates a snapshot of the target VM. During a backup, opening a disk might take a longer time than the third party backup provider default timeout. When this happens, the third party backup provider considers the operation as a failure and as a consequence it tries to delete the snapshots. For example, a consolidation operation is initiated. The consolidation task will be started on the same disks which the third party backup provider attempted to open and caused the time out. Since the disks are still open, the consolidation will also fail, leaving stale delta files behind. The reason for the initial timeout and also why the disks are still in open state, is because when taking a backup of a VMDK file using SAN transport mode, a "Map Disk Region" call is initiated by VDDK to obtain the mapping of the disk. The use of SAN mode backups requires building up an entire mapping of the disk in-memory before allowing any I/O. This can take a very long time for large and/or fragmented disks.
 

Resolution

This is an known issue affecting the legacy SAN Transport Virtual Disk API Method utilized by 3rd Party Backup Software.

Currently, there is no resolution.

Workaround:
To work around this issue, use NBD backup.

Note: NDB backup may be faster overall as there is no need to build the VMDK block mapping. For more information, see the Virtual Disk Transport Methods section of the vSphere API/SDK Documentation.

Additional Information

Troubleshooting virtual machine snapshot descriptor problems

Impact/Risks:
Snapshots are not consolidated after a backup completes.