"java.lang.OutOfMemoryError: Java heap space" error causes the Cloud Director services to fail continuously after startup

Article ID: 325531

Products

VMware Cloud Director

Issue/Introduction

Symptoms:
  • Cloud Director vmware-vcd service is failing and the Cloud Director UI is unavailable.
  • The /opt/vmware/vcloud-director/logs/cell-runtime.log on Cloud Director cells shows errors similar to the following:
FATAL    | ... | UncaughtExceptionHandlerStartupAction | Uncaught Exception. Originating thread: Thread[...]. Message: Java heap space |
java.lang.OutOfMemoryError: Java heap space
  • The /opt/vmware/vcloud-director/logs/cell.log on Cloud Director cells shows errors similar to the following:
Cell startup completed in 1m 25s
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /opt/vmware/vcloud-director/logs/java_pid<PID>.hprof ...
Uncaught Exception. Originating thread: Thread[auto-pool-Scheduled Service-15-thread-1,5,main]. Message: Java heap space
Dump file is incomplete: No space left on device
  • The Cloud Director database contains a large number of entries (more than 100,000) in the jobs table with an operation type of BEHAVIOR_INVOCATION. This can be checked by running an SQL query like the following against the Cloud Director database:
SELECT COUNT(*) FROM jobs WHERE operation = 'BEHAVIOR_INVOCATION';
  • Java .hprof files are generated and present in the /opt/vmware/vcloud-director/logs/ directory of the Cloud Director cells.
  • If the Cloud Director cells continue to experience the issue, the cells' root / directory can become 100% full due to the accumulation of a large number of .hprof files.
  • A solution that performs operations on Runtime Defined Entities in Cloud Director, for example Container Service Extension server version 4.1 or later, is making API calls to the Cloud Director instance.
  • Stopping the solution that performs operations on Runtime Defined Entities in Cloud Director, for example Container Service Extension server version 4.1 or later, allows the Cloud Director services to start up without issue.
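
The file-based symptoms above can be checked on a cell with a short script. The following is a minimal sketch, not a supported tool: the log directory and the error string come from this article, while the helper names (count_oom_lines, list_heap_dumps, percent_used) are illustrative assumptions. The percentage it reports is computed from used and total bytes and may differ slightly from the figure shown by other disk utilities.

```python
import shutil
from pathlib import Path

# Directory and error string are taken from this article; helper names are illustrative.
LOG_DIR = Path("/opt/vmware/vcloud-director/logs")
OOM_MARKER = "java.lang.OutOfMemoryError: Java heap space"


def count_oom_lines(log_file):
    """Return the number of lines in log_file that contain the heap-space error."""
    path = Path(log_file)
    if not path.is_file():
        return 0
    with path.open(errors="replace") as handle:
        return sum(1 for line in handle if OOM_MARKER in line)


def list_heap_dumps(log_dir):
    """Return (path, size_in_bytes) for each .hprof file directly inside log_dir."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return []
    return sorted((p, p.stat().st_size) for p in directory.glob("*.hprof") if p.is_file())


def percent_used(mount_point="/"):
    """Return the percentage of the filesystem at mount_point that is in use."""
    usage = shutil.disk_usage(mount_point)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    for name in ("cell.log", "cell-runtime.log"):
        print(f"{name}: {count_oom_lines(LOG_DIR / name)} heap-space error line(s)")
    dumps = list_heap_dumps(LOG_DIR)
    print(f"{len(dumps)} .hprof file(s), {sum(size for _, size in dumps)} bytes in total")
    print(f"root filesystem: {percent_used('/'):.1f}% used")
```

The script only reads files and reports counts; it does not confirm the database symptom, which still requires the SQL query shown above.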


Environment

VMware Cloud Director 10.x

Cause

This is a known issue affecting Cloud Director 10.4 and later.
This issue can occur when the resolve operation is invoked on a Runtime Defined Entity (RDE) which has a large number of tasks associated with it.

Resolution

This is a known issue in Cloud Director 10.5 and 10.4.x.
A fix is planned for upcoming releases of Cloud Director 10.5.x and 10.4.x.

Workaround:

Temporarily stop any solutions that perform operations on Runtime Defined Entities in Cloud Director, for example Container Service Extension server version 4.1 or later. This will stop the Cloud Director service from continually failing.

If the Cloud Director cells' root / directory has become 100% full due to a large number of .hprof files, delete those files to free up space. The .hprof files are located in the /opt/vmware/vcloud-director/logs/ directory of the Cloud Director cells.
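
The cleanup of .hprof files can be sketched as follows. This is an illustrative example under stated assumptions, not a supported procedure: the directory is the one named in this article, the function name remove_heap_dumps and its dry_run parameter are hypothetical, and the function defaults to a dry run so that nothing is deleted until the output has been reviewed.

```python
from pathlib import Path

# Directory named in this article; the function name and dry_run flag are illustrative.
LOG_DIR = Path("/opt/vmware/vcloud-director/logs")


def remove_heap_dumps(log_dir, dry_run=True):
    """Delete .hprof files directly inside log_dir and return the bytes freed.

    With dry_run=True (the default) nothing is deleted; the return value is
    the number of bytes that would be freed.
    """
    directory = Path(log_dir)
    if not directory.is_dir():
        return 0
    freed = 0
    for dump in sorted(directory.glob("*.hprof")):
        if not dump.is_file():
            continue
        size = dump.stat().st_size
        if not dry_run:
            dump.unlink()
        freed += size
        print("would remove" if dry_run else "removed", dump, size, "bytes")
    return freed


if __name__ == "__main__":
    # Dry run only: report what would be deleted without touching anything.
    total = remove_heap_dumps(LOG_DIR, dry_run=True)
    print(f"{total} bytes would be freed")
```

Calling remove_heap_dumps(LOG_DIR, dry_run=False) performs the actual deletion. Only files ending in .hprof directly inside the directory are matched, so log files such as cell.log are left in place.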

Before restarting any solutions that perform operations on Runtime Defined Entities in Cloud Director, for example Container Service Extension server version 4.1 or later, the number of task entries associated with the affected Runtime Defined Entities must be reduced. Otherwise the Cloud Director services will fail again.
These database modifications require assistance from VMware Technical Support. Open a Support Request with VMware Technical Support and reference this KB article 95464.


Additional Information

Impact/Risks:

Failure of the Cloud Director vmware-vcd service can make the Cloud Director UI unavailable.

The Cloud Director cells' root / directory can become 100% full due to the presence of a large number of .hprof files.