vCenter Server Appliance /storage/log partition runs out of space due to VMware Analytics service dump files (analytics/java_pidxxxx.hprof)

Article ID: 318173

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • The /storage/log partition is 99% full
  • The /storage/log/vmware/analytics/ folder contains multiple Java heap dump (.hprof) files
  • The analytics-runtime.log-0.stdout file shows an OutOfMemoryError:
    java.lang.OutOfMemoryError: GC overhead limit exceeded
    Dumping heap to /var/log/vmware/analytics/java_pid60392.hprof ...
    Heap dump file created [203776370 bytes in 0.977 secs]
  • The analytics.log.1.gz file shows the following error, which is the only error generated in the logs:
2020-12-14T10:16:44.048+11:00 data-app-collector-vsphere.adc.6_7P02 ERROR collector.internal.cdf.mapping.SafeMappingWrapper Error while executing object mapping. Returning null.
org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Unexpected failure during bean definition parsing
Offending resource: class path resource [com/vmware/vim/binding/vim/context_v2.xml]
Bean 'context'; nested exception is java.lang.OutOfMemoryError: GC overhead limit exceeded

 
  • Deleting the hprof files provides only temporary relief because new ones are created periodically
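
The first two symptoms can be checked from the appliance shell. The following is a minimal sketch, not part of the original article: `hprof_summary` is a hypothetical helper name, and the default directory is the analytics log path named in the symptoms above. It only reads file metadata and deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): count the heap dumps in a directory
# and report their combined size.
# Usage: hprof_summary [directory]
hprof_summary() {
    dir="${1:-/storage/log/vmware/analytics}"
    # find avoids the "no such file" error that ls prints when no dumps exist.
    count=$(find "$dir" -maxdepth 1 -name '*.hprof' -type f 2>/dev/null | wc -l)
    # Sum the sizes in KiB; prints 0 when there are no dumps.
    total_kb=$(find "$dir" -maxdepth 1 -name '*.hprof' -type f -exec du -k {} + 2>/dev/null |
        awk '{s += $1} END {print s + 0}')
    echo "hprof_files=$((count)) total_kb=$total_kb"
}

# On the appliance: show how full the partition is, then summarize the dumps.
if [ -d /storage/log ]; then
    df -h /storage/log
    hprof_summary
fi
```

A large `hprof_files` count together with a nearly full /storage/log in the `df` output matches the symptoms described above.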


Environment

VMware vCenter Server 6.7.x

Cause

The heap memory available to the Analytics service during weekly telemetry collection is insufficient, which leads to out-of-memory (OOM) errors.

Resolution


A permanent solution is available in vSphere 7.0 and later versions.
For vSphere 6.7 and earlier, apply the workaround described below.

Workaround:
Before proceeding, take an offline snapshot of vCenter.

The workaround has two parts:

1. Limit the number of hprof dumps to 4 to prevent /storage/log from filling up
2. Enable string deduplication so that the analytics service uses heap memory more efficiently and does not run out of memory
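
The retention rule in part 1 relies on `ls -1t` listing files newest first and `tail -n +5` printing everything from the fifth line onward, so the four newest dumps are never selected for deletion. The sketch below illustrates that rule in a scratch directory, well away from the live log path. `keep_newest` and the sample file names are hypothetical and are not part of the workaround itself.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): keep the N newest files that match
# a pattern in a directory and delete the older ones.
keep_newest() {
    dir=$1; pattern=$2; keep=$3
    # ls -1t lists newest first; tail -n +K prints from line K onward,
    # so K = keep + 1 selects every file older than the newest $keep.
    ls -1t "$dir"/$pattern 2>/dev/null | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
        rm -f -- "$f"
    done
}

# Demonstrate on six empty sample files in a temporary directory.
demo=$(mktemp -d)
for i in 1 2 3 4 5 6; do
    : > "$demo/java_pid$i.hprof"
    # Assign distinct, increasing modification times so the order is deterministic.
    touch -t "20240101000$i" "$demo/java_pid$i.hprof"
done

keep_newest "$demo" '*.hprof' 4
ls -1 "$demo"    # lists java_pid3.hprof through java_pid6.hprof
rm -rf "$demo"
```

The two oldest sample files are removed and the four newest remain, which is the behavior the cleanup script aims for on the real dump files.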
  1. Create a file called cleanup.sh in /etc/vmware-analytics/ and copy the following script into it:
#!/bin/sh

# Delete old hprof files, keeping the 4 most recent
HPROF_FILES_TO_REMOVE=$(ls -1t /storage/log/vmware/analytics/*.hprof | tail -n +5)
rm -f $HPROF_FILES_TO_REMOVE

# Delete old gclog files, keeping the 9 most recent.
# The bracket pattern skips names ending in any of ". c u r e n t", so the active *.current log is not matched.
GCLOG_FILES_TO_REMOVE=$(ls -1t /storage/log/vmware/analytics/*-gc*.log.*[^.current] | tail -n +10)
rm -f $GCLOG_FILES_TO_REMOVE
  2. Make the cleanup.sh file executable:
chmod +x /etc/vmware-analytics/cleanup.sh
  3. Add the following line to the StartCommandArgs of the analytics service in /etc/vmware/vmware-vmon/svcCfgfiles/analytics.json:
"-XX:OnOutOfMemoryError='/etc/vmware-analytics/cleanup.sh'",
  4. Set the following JVM arguments in the same StartCommandArgs section of /etc/vmware/vmware-vmon/svcCfgfiles/analytics.json to optimize heap memory usage in the analytics service:
"-XX:+UseG1GC",
"-XX:+UseStringDeduplication",
  5. Restart the analytics service:
vmon-cli -r analytics
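
After editing analytics.json, it can help to confirm that all three JVM arguments are actually in the file before relying on the restart. The following is a minimal sketch under stated assumptions: `check_jvm_args` is a hypothetical helper, not a VMware tool, and it only checks that the text is present, not that the JSON is valid or that each argument sits inside StartCommandArgs.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): report whether the three JVM
# arguments from the steps above appear in a configuration file.
# Returns 0 when all are present, 1 when any is missing.
check_jvm_args() {
    cfg=$1
    missing=0
    for arg in '-XX:OnOutOfMemoryError=' '-XX:+UseG1GC' '-XX:+UseStringDeduplication'; do
        # -F matches the text literally; -e stops the leading dash
        # from being parsed as a grep option.
        if grep -qF -e "$arg" "$cfg"; then
            echo "present: $arg"
        else
            echo "MISSING: $arg"
            missing=1
        fi
    done
    return "$missing"
}

# On the appliance: check the file edited in the steps above.
CFG=/etc/vmware/vmware-vmon/svcCfgfiles/analytics.json
if [ -f "$CFG" ]; then
    check_jvm_args "$CFG"
fi
```

Any line reported as MISSING points to a step that was skipped or mistyped.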


Additional Information


How to clear space on VCSA /storage/log partition (83070)


vCenter Server Appliance 7.0.x /storage/log partition runs out of space due to VMware Analytics service log file (analytics-runtime.log.stderr) (85468)

Impact/Risks:
  • If the analytics service crashes frequently during collections, customers may not be able to benefit from Skyline Health.
  • VMware Analytics Cloud may not receive the analytical data expected from the customer's environment, which limits the proactive recommendations it can provide to help prevent issues.