TSM backup of a live Windows VM not working due to the VMware Tools failing to quiesce its filesystem

System: VMware ESXi 5.5

- last updated on -

About: The document is meant to record an issue where the backup of an ESXi 5.5 based Windows VM to the Tivoli Storage Manager fails due the VMware Tools package that is deployed on the VM's OS not managing to quiesce its filesystem, which is a requirement in this scenario because TSM uses VMware's filesystem quiesced snapshotting mechanism to ensure the consistency of the data to be backed up.

What has been observed was that a long-standing VMware Tools installation will appear to eventually "break down" and have problems communicating with the underlying hypervisor, which in turn causes the filesystem quiescing feature to not work, the net result of this being that TSM-based backups of live VMs will simply not work. The fix, in this context, is to perform a clean reinstallation of the affected VMware Tools package.

Symptoms: The below fix should be implemented if the VMware Tools are found to be behaving erratically, as evidenced by two types of behaviour: the Tools cannot be properly restarted, as their service refuses to obey Windows' Service Control Manager, and/or the TSM backups of the hosting VM break due to a failure on the part of the VMware Tools to quiesce the OS' filesystem. For the purposes of exemplifying this sort of behaviour, the below sections have been included in this document: one shows the outcome of a backup attempt from TSM's point of view, the other shows the same outcome from the point of view of the hypervisor running the VM.

TSM interactive backup
tsm> backup vm [name redacted] -vmbackuptype=fullvm -mode=IFIncremental
Full BACKUP VM of virtual machines '[name redacted]'.


Backup VM command started.  Total number of virtual machines to process: 1

Creating "VMware Tools" snapshot for virtual machine '[name redacted]'
Starting Full VM backup of VMware Virtual Machine '[name redacted]'
    mode:	'Incremental Forever - Incremental'
    target node name:	'[name redacted]'
    data mover node name:	'[name redacted]'
    application protection type: 'VMware'
    application(s) protected:    'n/a'
ANS9365E VMware vStorage API error for virtual machine '[name redacted]'.
  TSM function name : CreateSnapshot_USCORETask
  TSM file          : ....\common\vm\vmvisdk.cpp (6498)
  API return code   : 1390
  API error message : An error occurred while saving the snapshot: Failed to qu
  iesce the virtual machine.
ANS4066I Snapshot operation attempt 1 of 2 for the guest virtual machine '[name 
redacted]' failed using "VMware Tools" snapshot.
Reattempting snapshot with "VMware Tools" snapshot.

Creating "VMware Tools" snapshot for virtual machine '[name redacted]'
Starting Full VM backup of VMware Virtual Machine '[name redacted]'
    mode:	'Incremental Forever - Incremental'
    target node name:	'[name redacted]'
    data mover node name:	'[name redacted]'
    application protection type:	'VMware'
    application(s) protected:	'n/a'
ANS9365E VMware vStorage API error for virtual machine '[name redacted]'.
  TSM function name : CreateSnapshot_USCORETask
  TSM file          : ....\common\vm\vmvisdk.cpp (6498)
  API return code   : 1390
  API error message : An error occurred while saving the snapshot: Failed to qu
  iesce the virtual machine.


ANS4067E Snapshot operation attempt 2 of 2 for the guest virtual machine '[name 
redacted]' failed using "VMware Tools" snapshot.
Backup failed.
ANS1228E Sending of object '[name redacted]' failed.
ANS5226E The virtual machine backup operation failed.

ANS4174E Full VM backup of VMware Virtual Machine '[name redacted]' failed with RC=43
79 mode=Incremental Forever - Incremental, target node name='[name redacted]', data
mover node name='[name redacted]'
Statistics for Virtual Machine '[name redacted]'.

Total number of objects inspected:            1
Total number of objects backed up:            0
Total number of objects updated:              0
Total number of objects rebound:              0
Total number of objects deleted:              0
Total number of objects expired:              0
Total number of objects failed:               1
Total number of objects skipped:              0
Total number of objects encrypted:            0
Total number of subfile objects:              0
Total number of objects grew:                 0
Total number of retries:                      0
Total number of bytes inspected:              0  B
Total number of bytes transferred:            0  B
Data transfer time:                        0.00 sec
Network data transfer rate:                0.00 KB/sec
Aggregate data transfer rate:              0.00 KB/sec
Objects compressed by:                        0%
Total data reduction ratio:                0,00%
Subfile objects reduced by:                   0%
Elapsed processing time:               00:00:03  

Backup VM command complete
Total number of virtual machines backed up successfully: 0
Total number of virtual machines failed: 1
  virtual machine [name redacted]
Total number of virtual machines skipped: 0
Total number of virtual machines processed: 1

VM logfile on ESXi
2016-10-10T13:10:27.849Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-10-10T13:10:41.358Z| vcpu-0| I120: Guest: toolbox: Version: build-3000743
2016-10-10T13:10:41.358Z| vcpu-0| I120: GuestRpc: Channel 4, Timeout
2016-10-10T13:10:41.358Z| vcpu-0| I120: GuestRpc: Reinitializing Channel 4(toolbox)
2016-10-10T13:10:41.358Z| vcpu-0| I120: GuestMsg: Channel 4, Cannot unpost because the previous post is already completed
2016-10-10T13:10:41.363Z| vcpu-0| I120: GuestRpc: Channel 5, guest application toolbox.
2016-10-10T13:10:41.365Z| vcpu-0| I120: TOOLS autoupgrade protocol version 2
2016-10-10T13:10:41.365Z| vcpu-0| I120: Vix: [19955035 mainDispatch.c:3884]: VMAutomationReportPowerStateChange: Reporting power state change (opcode=2, err=0).
2016-10-10T13:10:41.368Z| vcpu-0| I120: TOOLS Received tools.set.version rpc call, version = 10240.
2016-10-10T13:10:41.368Z| vcpu-0| I120: ToolsSetVersionWork did nothing; new tools version (10240) matches old Tools version
2016-10-10T13:10:42.851Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2016-10-10T13:10:57.853Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox-dnd timed out.
2016-10-10T13:10:57.853Z| vmx| I120: GuestRpc: app toolbox-dnd's second ping timeout; assuming app is down
2016-10-10T13:10:57.854Z| vmx| I120: GuestRpc: Reinitializing Channel 1(toolbox-dnd)
2016-10-10T13:10:57.854Z| vmx| I120: GuestMsg: Channel 1, Cannot unpost because the previous post is already completed
2016-10-10T13:11:00.404Z| vmx| I120: SnapshotVMX_TakeSnapshot start: 'TSM-VM Snapshot Mon Oct 10 15:11:02 2016', deviceState=0, lazy=0, logging=0, quiesced=1, forceNative=0, tryNative=1, sibling=0 saveAllocMaps=0 cb=4717C40, cbData=32586860
2016-10-10T13:11:12.857Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-10-10T13:11:21.173Z| vmx| I120: Tools: Tools heartbeat timeout.
2016-10-10T13:11:27.858Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-10-10T13:11:30.410Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.
2016-10-10T13:11:30.410Z| vmx| I120: Msg_Post: Warning
2016-10-10T13:11:30.410Z| vmx| I120: [msg.snapshot.quiesce.rpc_timeout] A timeout occurred while communicating with VMware Tools in the virtual machine.
2016-10-10T13:11:30.410Z| vmx| I120: ----------------------------------------
2016-10-10T13:11:30.415Z| vmx| I120: ToolsBackup: changing quiesce state: IDLE -> DONE
2016-10-10T13:11:30.415Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'TSM-VM Snapshot Mon Oct 10 15:11:02 2016': 0
2016-10-10T13:11:30.415Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine (40).
2016-10-10T13:11:32.790Z| vmx| I120: SnapshotVMX_TakeSnapshot start: 'TSM-VM Snapshot Mon Oct 10 15:11:34 2016', deviceState=0, lazy=0, logging=0, quiesced=1, forceNative=0, tryNative=1, sibling=0 saveAllocMaps=0 cb=4717C40, cbData=325CE4C0
2016-10-10T13:11:32.792Z| vmx| I120: Msg_Post: Warning
2016-10-10T13:11:32.792Z| vmx| I120: [msg.snapshot.quiesce.notools] Cannot quiesce this virtual machine because VMware Tools is not currently available.
2016-10-10T13:11:32.792Z| vmx| I120: ----------------------------------------
2016-10-10T13:11:32.795Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'TSM-VM Snapshot Mon Oct 10 15:11:34 2016': 0
2016-10-10T13:11:32.795Z| vmx| I120: SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine (40).
2016-10-10T13:11:42.861Z| vmx| I120: GuestRpcSendTimedOut: message to toolbox timed out.

Resolution:

References:

  1. https://kb.vmware.com/s/article/1012693
  2. https://communities.vmware.com/thread/408657