Best practices when using advanced transport for backup and restore (1035096)
This article is intended for developers of backup and restore software. Data-protection customers should query their backup providers about the issues in this article.
Best practices for SAN transport
For array-based storage, SAN transport is often the best-performing choice for backups when running on a physical proxy. It is disabled inside virtual machines, so on a virtual proxy use SCSI HotAdd instead.
SAN transport is not always the best choice for restores. It offers the best performance on thick disks, but the worst performance on thin disks, because of round trips through the disk manager APIs, AllocateBlock and ClearLazyZero. For thin disk restore, NBDSSL is usually faster, and NBD is even faster. Changed Block Tracking (CBT) must be disabled for SAN restores. Also, SAN transport does not support writing to redo logs (snapshots or child disks), only to base disks.
When writing in SAN mode during restore, the disk size should be a multiple of the underlying VMFS block size; otherwise, writes to the last fraction of the disk fail. For example, if a datastore has a 1MB block size and the virtual disk is 16.3MB in size, the write to the last 0.3MB fails with the Invalid Argument error. Your software must add 0.7MB of zeroes to complete the block. This caveat does not apply to eager-zeroed thick disks.
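The padding arithmetic can be sketched as follows. This is a minimal illustration, not VDDK code: the function name zero_padding_bytes and the example sizes are assumptions chosen for the sketch, and sizes are expressed in bytes using binary megabytes.

```python
def zero_padding_bytes(disk_size_bytes: int, vmfs_block_size_bytes: int) -> int:
    """Return how many zero bytes round the disk size up to a whole VMFS block."""
    if disk_size_bytes < 0 or vmfs_block_size_bytes <= 0:
        raise ValueError("disk size must be >= 0 and block size must be > 0")
    # Negating before the modulo yields the distance to the next block boundary,
    # and 0 when the size is already aligned.
    return (-disk_size_bytes) % vmfs_block_size_bytes


MIB = 1024 * 1024

# Hypothetical disk: 16 MiB plus 300 KiB on a datastore with 1 MiB blocks.
disk_size = 16 * MIB + 300 * 1024
padding = zero_padding_bytes(disk_size, MIB)
padded_size = disk_size + padding
```

A restore program would write `padding` bytes of zeroes after the last data so that the final SAN-mode write covers a complete block.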
Programs that open a local virtual disk in SAN mode might be able to read (if the disk is empty), but writing will throw an error. Even if programs call VixDiskLib_ConnectEx() with a NULL parameter to accept the default transport mode, SAN is selected as the preferred mode if SAN storage is connected to the ESXi host. VixDiskLib should, but does not, check SAN accessibility on open. With a local disk, programs must explicitly request NBD or NBDSSL mode.
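One way to encode the transport guidance so far is a small helper that builds the transport-mode string a program would pass instead of NULL. This is only a sketch: the function name and its parameters are hypothetical, the network fallbacks appended after SAN or HotAdd are an assumption rather than a recommendation from this article, and it assumes the library accepts a colon-separated list and honors the order given.

```python
def transport_modes(proxy_is_virtual: bool,
                    disk_is_local: bool,
                    restoring_thin_disk: bool = False,
                    prefer_encrypted: bool = True) -> str:
    """Build a colon-separated transport list from the rules in this article."""
    # Order of the two network modes is a policy choice (encryption vs. speed).
    network = "nbdssl:nbd" if prefer_encrypted else "nbd:nbdssl"

    # Local disks must not fall through to SAN; thin-disk restores from a
    # physical proxy are usually faster over the network than over SAN.
    if disk_is_local or (restoring_thin_disk and not proxy_is_virtual):
        return network

    # SAN is disabled inside virtual machines, so a virtual proxy uses HotAdd.
    if proxy_is_virtual:
        return "hotadd:" + network

    return "san:" + network


# Example: physical proxy restoring a thin disk, speed preferred over encryption.
modes = transport_modes(proxy_is_virtual=False, disk_is_local=False,
                        restoring_thin_disk=True, prefer_encrypted=False)
```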
For a Windows Server 2008 proxy, set SAN policy to onlineAll. Set SAN disk to read-only except for restore. You can use the diskpart utility to clear the read-only flag. SAN policy varies by Windows Server 2008 edition. For Enterprise and Datacenter editions, the default SAN policy is offline, which is not required when vSphere mediates SAN storage. For more information, see: Upgrading virtual hardware in ESX 4.x may cause Windows 2008 disks to go offline (1013109)
Best practices for HotAdd transport
With VMFS-3, deploy the proxy on volumes capable of large block size, so that the proxy can back up and restore very large virtual disks. The block size of the proxy's datastore should match the block size of the backed-up disk's datastore. VMFS-5 has a unified file block size and can always handle volumes up to about 60TB (see blog).
A redo log is created for HotAdded disks. Do not remove the target virtual machine (the one being backed up) while a HotAdded disk is still attached. If it is removed, HotAdd fails to properly clean up redo logs, so virtual disks must be removed manually from the backup appliance. Also, do not remove the snapshot until after cleanup. Removing it could result in an unconsolidated redo log.
HotAdd is a SCSI feature and does not work for IDE disks. The paravirtual SCSI controller (PVSCSI) is not supported for HotAdd; use the LSI controller instead.
Removing all disks on a controller with the vSphere Client also removes the controller. You might want to include some checks in your code to detect this in your appliance, and reconfigure to add controllers back in.
A virtual disk created on Windows by HotAdd backup or restore might have a different disk signature than the original virtual disk. You can work around this problem by rereading or rewriting the first disk sector in NBD mode.
HotAdded disks should be released with VixDiskLib_Cleanup() before snapshot delete. Cleanup might cause improper removal of the change tracking (ctk) file, but you can fix this by power cycling the virtual machine.
Customers running a Windows Server 2008 proxy on SAN storage should set SAN policy to onlineAll (see note about SAN policy above).
Best practices for NBDSSL transport
Before ESXi 5.0, there were no default network file copy (NFC) timeouts. Default NFC timeout values may change in future releases. VMware recommends that you specify default NFC timeouts in the VixDiskLib configuration file. If you do not specify a timeout, older versions of ESX/ESXi hold the corresponding disk open indefinitely, until vpxa or hostd is restarted. However, with a timeout, you might need to perform some "keepalive" operation to prevent the disk from being closed on the server side. Reading block 0 periodically is a good keepalive operation.
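A periodic keepalive could be structured roughly like this. The sketch is deliberately library-agnostic: DiskKeepAlive and the read_block0 callable are hypothetical names, and the callable stands in for whatever your code uses to read the first block of the open disk. Whether a disk handle can safely be read from a second thread depends on your VDDK version and design, so treat the threading here as an assumption and serialize with your main I/O if needed.

```python
import threading


class DiskKeepAlive:
    """Invoke a read callback at a fixed interval until stopped."""

    def __init__(self, read_block0, interval_seconds: float):
        # read_block0: hypothetical zero-argument callable that reads block 0.
        self._read_block0 = read_block0
        # The interval should be comfortably shorter than the configured timeout.
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called,
        # so the loop performs one read per interval and exits promptly on stop.
        while not self._stop.wait(self._interval):
            self._read_block0()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

A caller would start the keepalive after opening the disk and stop it before closing the disk, so that no read is attempted on a closed handle.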
As a starting point, recommended settings are 3 minutes for Accept and Request, 1 minute for Read, 10 minutes for Write, and no timeouts (0) for nfcFssrvr and nfcFssrvrWrite.
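Expressed in milliseconds, those starting values might be written to the configuration file as shown by the sketch below. The key names follow the vixDiskLib.nfc.*TimeoutMs pattern as I understand it from the VDDK documentation, and the key=value line format is likewise an assumption; check both against the documentation for your VDDK release. The helper render_config is hypothetical.

```python
MINUTE_MS = 60 * 1000

# Assumed key names; confirm them for your VDDK version before use.
NFC_TIMEOUTS_MS = {
    "vixDiskLib.nfc.AcceptTimeoutMs": 3 * MINUTE_MS,
    "vixDiskLib.nfc.RequestTimeoutMs": 3 * MINUTE_MS,
    "vixDiskLib.nfc.ReadTimeoutMs": 1 * MINUTE_MS,
    "vixDiskLib.nfc.WriteTimeoutMs": 10 * MINUTE_MS,
    "vixDiskLib.nfcFssrvr.TimeoutMs": 0,       # 0 means no timeout
    "vixDiskLib.nfcFssrvrWrite.TimeoutMs": 0,  # 0 means no timeout
}


def render_config(settings: dict) -> str:
    """Render settings as key=value lines, one per line, in insertion order."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


config_text = render_config(NFC_TIMEOUTS_MS)
```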
If too many NFC connections are made to an ESX/ESXi host, VMDK open fails and users see the error Failed to open NBD extent, NBD_ERR_GENERIC. For further details see: VDDK library returns the error: Failed to open NBD extent, NBD_ERR_GENERIC (1022543)
General backup and restore
For incremental backup of virtual disk, always enable changed block tracking (CBT) before the first snapshot. When doing full restores of virtual disk, disable CBT for the duration of the restore. File-based restores affect change tracking of course, but disabling CBT is optional, except for SAN transport restores. CBT must be disabled for SAN transport writes because the mechanism must account for thin-disk allocation and clear-lazy-zero operations.
Backup software should ignore independent disks (those not capable of snapshots). These virtual disks are unsuitable for backup. They throw an error if a snapshot is attempted on them.
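A filter for independent disks might look like the following sketch. The dictionary layout (keys "label" and "disk_mode") is hypothetical and stands in for whatever disk inventory your software collects; the two mode strings are the independent values of the vSphere virtual disk mode setting.

```python
# Disk modes that exclude a disk from snapshots.
INDEPENDENT_MODES = {"independent_persistent", "independent_nonpersistent"}


def disks_to_back_up(disks):
    """Return only the disks whose mode allows them to be snapshotted."""
    return [d for d in disks if d["disk_mode"] not in INDEPENDENT_MODES]


# Hypothetical inventory for one virtual machine.
inventory = [
    {"label": "Hard disk 1", "disk_mode": "persistent"},
    {"label": "Hard disk 2", "disk_mode": "independent_persistent"},
    {"label": "Hard disk 3", "disk_mode": "independent_nonpersistent"},
]
selected = disks_to_back_up(inventory)
```

Filtering before the snapshot request lets the backup job skip these disks with a clear log message instead of failing on the snapshot error.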
DNS must be configured on the backup proxy, ESXi hosts, and vCenter Server. If it is not properly configured, nslookup errors will result during FQDN resolution of the proxy, vCenter, and ESXi host, no matter what transport mode is used. The backup proxy, vCenter Server, and all ESXi hosts under vCenter with access to the VMDK files must be accessible to each other.
To back up thick disk, the proxy's datastore must have at least as much free space as the maximum configured disk size for the backed-up virtual machine. Thick disk takes up all its allocated size in the datastore. To preserve space you can choose thin-provisioned disk, which consumes only the space actually containing data.
If you do a full backup of lazy-zeroed thick disk with CBT not enabled, the software reads all sectors, converting data in empty (lazy-zero) sectors to actual zeros. Upon restore, this full backup will produce eager-zeroed thick disk. This is one reason why VMware recommends enabling CBT before the first snapshot.
Backup and restore of thin-provisioned disk
When applications perform random I/O or write to previously unallocated areas of thin-provisioned disk, subsequent backups can be larger than expected, even with CBT enabled. In some cases, disk defragmentation might help reduce the size of backups.
Thin-provisioned virtual disk is created on first write. So the first-time write to thin-provisioned disk involves extra overhead compared to thick disk, whether using NBD, NBDSSL, or HotAdd. This is due to block allocation overhead, not VDDK advanced transports. However once thin disk has been created, performance is similar to thick disk, as discussed in this performance study.