Extending an ESXi diagnostic coredump partition on a vSAN 6.5/6.6 node (2147881)

Purpose

This article provides steps to resize an existing core dump partition to account for the vSAN node's DRAM size and vSAN usage.

Cause

The default size for the coredump partition is 2.5 GiB which is approximately 2.7 GB.
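
Because this article quotes sizes in both GiB and GB, a small conversion helper can make the relationship explicit. The following is a minimal sketch; the function name gib_to_gb is illustrative and is not part of the attached script.

```python
def gib_to_gb(gib):
    """Convert binary gibibytes (2**30 bytes) to decimal gigabytes (10**9 bytes)."""
    return gib * 2**30 / 10**9


if __name__ == "__main__":
    # The 2.5 GiB default partition expressed in decimal GB (roughly 2.7 GB).
    print(f"{gib_to_gb(2.5):.2f} GB")
```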
 
During install, the ESXi installer creates a coredump partition on the boot device for vSphere 5.5 and above.
 
The size requirement for the ESXi coredump device scales with the size of the host DRAM and also usage of vSAN. 
 
In some cases, vSAN installations, in particular those using SD/USB boot media, do not have a local datastore and therefore require a larger coredump device/slot.
 
However, the default size of 2.56 GB is sufficient for a host with 1 TB of DRAM that is not running vSAN, or for a vSAN-enabled host with 512 GB of DRAM and 250 GB of SSD configured in the caching tier.
 
For larger systems, using the default size results in truncated or partial core dumps, as the default partition/slot size may not accommodate a complete core dump of an ESXi host. For more information, see ESXi hosts with larger workloads may generate partial core dumps (2012362).
 
Note: In addition to automatically creating a 2.5 GB partition on the ESXi boot media during install, the installer automatically configures a core dump slot size, which is 100 MB by default.

Resolution

To address this for vSAN hosts running ESXi 6.5, a script is attached to this KB that allows an administrator to safely resize an existing core dump partition on an ESXi host with vSAN enabled.

The script will attempt to do the following:

Scanning the host
  • To check if the host is running ESXi 6.5
  • To check if vSAN is enabled
  • For existing core dump partitions
  • To check if the host is in maintenance mode
Modifying the host
  • The script will back up the existing bootbanks
  • The script will extend the core dump partition where applicable
  • The script will resize the disk dump slot size
  • The host will need to be rebooted for the changes to take effect

Download and extract the script attached to this KB, then copy it to a vSAN-enabled ESXi node. See VMware KB https://kb.vmware.com/kb/1918 for how to copy files to an ESXi host using scp.
 
Run the script on each vSAN-enabled node to modify the host boot disk and the default coredump slot size.
 
The script automatically computes the correct size of the core dump partition, uses the remaining free space on the boot drive, and applies the new configuration. In addition, the script resizes the coredump slot size.
 
The calculation is based on the vSAN cache tier size, the number of disk groups, and the DRAM configured on the host.
 
In essence the following guidelines will be used:
  • Without vSAN enabled:

    For every 1 TB of DRAM, the coredump partition should provide 2.5 GB

  • With vSAN enabled:

    In addition to the DRAM-based core dump size, the physical size of the caching tier SSD(s) in GB is used as the basis for calculating the additional core dump size requirements
    • Base requirement for vSAN is 3.981GB
    • For every 100GB cache tier, 0.181GB of space is required
    • Every disk group needs a base requirement of 1.32 GB
    • Data will be compressed by 75%

The formula for a single disk group on a vSAN node is as follows:

requirementOnSSDSize = (((size of SSD in GB)/100 GB) * 0.181) + 1.32
requirement = base + (requirementOnSSDSize1 +  requirementOnSSDSize2 +  requirementOnSSDSize3 ...)
sizeOfCoredumpBasedOnDG = requirement * 0.25
sizeOfCoredumpBasedOnDRAM = 2.56 GB * size of DRAM in TB

Coredump = sizeOfCoredumpBasedOnDG + sizeOfCoredumpBasedOnDRAM

Note: The formula for multiple disk groups depends on the cache tier size of each disk group. If the cache tier disks are of different sizes, the disk groups should be calculated individually. See Worked Example 3 below.

The formula for multiple uniformly sized diskgroups is as follows:
  
requirementOnSSDSize = (((size of SSD in GB)/100 GB) * 0.181)  + 1.32
requirement = base + (requirementOnSSDSize*number_of_DGs)
sizeOfCoredumpBasedOnDG = requirement * 0.25
sizeOfCoredumpBasedOnDRAM = 2.56 GB * size of DRAM in TB
Coredump = sizeOfCoredumpBasedOnDG + sizeOfCoredumpBasedOnDRAM
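
The formulas above can be sketched in Python as follows. This is an illustrative reimplementation using only the constants quoted in this article, not the code of the attached script; the function and constant names are assumptions made for the example. Passing one cache size per disk group covers both the uniform and the non-uniform case.

```python
# Constants taken from the guidelines in this article (all in GB).
BASE_VSAN_GB = 3.981        # base requirement for vSAN
PER_100GB_CACHE_GB = 0.181  # required per 100 GB of cache tier
PER_DISK_GROUP_GB = 1.32    # base requirement per disk group
COMPRESSION_FACTOR = 0.25   # data is compressed by 75%
DRAM_GB_PER_TB = 2.56       # coredump space per TB of DRAM


def estimate_coredump_gb(dram_tb, cache_sizes_gb=None):
    """Estimate the coredump size in GB.

    dram_tb:        host DRAM in TB.
    cache_sizes_gb: cache tier SSD size in GB for each disk group.
                    None means vSAN is not enabled; an empty list means
                    vSAN is enabled with no disk groups configured.
    """
    dram_part = DRAM_GB_PER_TB * dram_tb
    if cache_sizes_gb is None:
        return dram_part
    # requirementOnSSDSize, computed once per disk group.
    per_disk_group = [
        (size / 100.0) * PER_100GB_CACHE_GB + PER_DISK_GROUP_GB
        for size in cache_sizes_gb
    ]
    requirement = BASE_VSAN_GB + sum(per_disk_group)
    return requirement * COMPRESSION_FACTOR + dram_part


if __name__ == "__main__":
    # One disk group, 1 TB DRAM, 600 GB cache tier SSD.
    print(f"{estimate_coredump_gb(1, [600]):.3f} GB")
    # Two disk groups with different cache tier sizes, 1 TB DRAM.
    print(f"{estimate_coredump_gb(1, [400, 600]):.3f} GB")
```

Results may differ from hand calculations in the last decimal place, because the sketch does not round intermediate values. Passing an empty list applies only the vSAN base requirement plus the DRAM component, which corresponds to a vSAN node without disk groups.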

Worked Example 1: One Disk group and 1TB of DRAM, 600GB SSD used for cache tier

Calculate cache tier (requirementOnSSDSize = (((size of SSD in GB)/100 GB) * 0.181) + 1.32)

requirementOnSSDSize = (((600)/100 GB) * 0.181) + 1.32 = 2.406GB

Add the base overhead (requirement = base + requirementOnSSDSize)

requirement =  3.981+ 2.406 = 6.387GB

Apply compression  (sizeOfCoredumpBasedOnDG = requirement * 0.25) 

sizeOfCoredumpBasedOnDG = 6.387 * 0.25 = 1.597GB

Add DRAM overhead

sizeOfCoredumpBasedOnDRAM = 2.56GB * 1 TB = 2.56GB
 
Coredump = sizeOfCoredumpBasedOnDG + sizeOfCoredumpBasedOnDRAM
CoreDump = 1.597GB + 2.56GB 
CoreDump =  4.157GB

Worked Example 2:  2 TB DRAM per host, 5 Disk groups, with 1600GB for cache tier SSDs (uniform sized cache tier devices)
 
requirementOnSSDSizeX = (((1600)/100 GB) * 0.181) + 1.32 = 4.216GB
requirement = base + (requirementOnSSDSize1 +  requirementOnSSDSize2 + .... + requirementOnSSDSize5)
requirement = 3.981 + (4.216 + 4.216 + 4.216 + 4.216 + 4.216)
requirement = 3.981 + 21.08 = 25.061GB
 
sizeOfCoredumpBasedOnDG = 25.061 GB * 0.25 = 6.265GB
sizeOfCoredumpBasedOnDRAM = 2.56 GB * 2 TB = 5.12GB
 
Coredump = sizeOfCoredumpBasedOnDG + sizeOfCoredumpBasedOnDRAM
Coredump = 6.265GB + 5.12GB 
Coredump = 11.385 GB

Worked Example 3:  1 TB DRAM per host, 2 Disk groups, (non-uniform cache tier devices):
1 disk group (DG1) with 400GB cache tier SSD, 1 disk group (DG2) with 600 GB cache tier 
 
This example differs from Example 2: requirementOnSSDSize cannot simply be multiplied by the number of disk groups, because each disk group has a different cache tier disk.
 
The overhead of each differently sized disk group must be calculated separately, and the base is added once.

DG1-requirementOnSSDSize1 = (((400)/100 GB) * 0.181) + 1.32 = 2.044 GB
DG2-requirementOnSSDSize2 = (((600)/100 GB) * 0.181) + 1.32 = 2.406 GB
 
requirement = base + (requirementOnSSDSize1 + requirementOnSSDSize2)
requirement = 3.981 + (2.044 + 2.406) = 8.431GB
sizeOfCoredumpBasedOnDG = 8.431 * 0.25 = 2.108GB
sizeOfCoredumpBasedOnDRAM = 2.56 GB * 1 TB = 2.56GB
 
Coredump = sizeOfCoredumpBasedOnDG + sizeOfCoredumpBasedOnDRAM
Coredump = 2.108GB + 2.56GB
CoreDump = 4.668GB

Note: The worked examples above are for illustrative purposes only; the script calculates the required space when run against a vSAN-enabled node.

 
Procedure  

  1. Download and extract "2147881_coredumpResize.zip" and copy the attached script "coredumpResize.py" to the relevant ESXi hosts.
  2. Place the host into maintenance mode using the "Ensure accessibility" option.
  3. Enter the following command:

    # python /var/tmp/coredumpResize.py

    Note: In the above command, /var/tmp/ is the location to which the "coredumpResize.py" script was copied. If you copied the script to another location, modify the command accordingly.

    If the script detects that the host is not in maintenance mode, it asks for confirmation before continuing.

    For example: “It is advised to place the host in Maintenance Mode to resize the coredump partition. Do you want to continue?[y/n]”

  4. If an existing core dump partition is detected, the script lists the core dump partitions on the ESXi boot device.

    Following coredump partitions are found on the host

    1 ) mpx.vmhba36:C0:T0:L0:7
    Suggested partition to resize: mpx.vmhba36:C0:T0:L0:7

  5. The Administrator is prompted to select the appropriate device. In this case, the Administrator selects 1.

    Enter the number of coredump partition to be resized:[1,2..]1

  6. The script then calculates the size to which the core dump partition should be resized and asks the Administrator to confirm whether to continue.

    The present coredump partition size is 0.11 GB while suggested coredump partition size is 4.46 GB (4.35GB larger than current partition).
    Do you want to resize the coredump partition?[y/n]y

  7. Once the Administrator enters y to continue, the script attempts to resize the partition.

    The coredump partition is successfully resized from 0.11 GB to 4.46 GB. The ESXi host must be rebooted to finish the resizing operation.

  8. The host must be rebooted for the changes to take effect.

    The procedure must be repeated on all nodes in the cluster.

    If the boot media does not have enough free space to resize the partition to the suggested size, the script warns the Administrator and offers to use whatever free space remains.

    For example:

Error: Not enough free space for extending the coredump partition.
Present coredump partition size: 0.11 GB
Estimated size required for coredump partition: 4.46 GB
Estimated size of coredump partition after resize operation: 1.04 GB
Do you still want to resize to largest coredump possible which might still be undersized? (NOTE: This may impact the ability to get support.)[y/n]y


In this case, the partition was resized to 1.04 GB.

The coredump partition is successfully resized from 0.11 GB to 1.04 GB. The ESXi host must be rebooted to finish the resizing operation.

If there is no free space to expand the core dump partition, the script fails with the following error:

Error: Not enough free space for extending the coredump partition.
Present coredump partition size: 1.04 GB
Estimated size required for coredump partition: 4.46 GB
Estimated size of coredump partition after resize operation: 1.04 GB
ERROR: The resize operation on coredump partition failed.

The system will remain unchanged.
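
The three outcomes described above (a full resize, an undersized resize after confirmation, and a failure that leaves the system unchanged) can be summarized in a short sketch. This is a hypothetical illustration of the decision logic as described in this article, not the code of the attached script; the function name, the action labels, and the handling of a partition that is already large enough are assumptions.

```python
def plan_resize(current_gb, required_gb, free_gb):
    """Return (action, resulting_size_gb) for a coredump partition resize.

    current_gb:  present coredump partition size.
    required_gb: estimated size required for the coredump partition.
    free_gb:     free space available for extending the partition.
    """
    if current_gb >= required_gb:
        # Assumption: nothing to do when the partition is already large enough.
        return ("no_change", current_gb)
    largest_possible = current_gb + free_gb
    if largest_possible >= required_gb:
        # Enough free space: resize to the suggested size.
        return ("resize", required_gb)
    if free_gb > 0:
        # Not enough free space: offer the largest possible (undersized) partition.
        return ("resize_undersized_after_confirmation", largest_possible)
    # No free space at all: the operation fails and the system is unchanged.
    return ("fail", current_gb)


if __name__ == "__main__":
    print(plan_resize(0.11, 4.46, 10.0))
    print(plan_resize(1.04, 4.46, 0.0))
```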

Note: VMware highly recommends an adequately sized coredump partition in production environments so that, in the event of a diagnostic crash, enough space is available to store the coredump.

If the boot device does not have enough capacity, VMware recommends:
  1. Upgrading the capacity of the ESXi boot device (or the location of the core dump partition) so that it meets the requirements of DRAM and vSAN.
  2. If applicable to your environment, using a VMFS volume to configure ESXi coredumps to a file instead of a partition. See KB https://kb.vmware.com/kb/2077516
  3. If neither of the above options is viable, configuring the Network Dump Collector service. See KB https://kb.vmware.com/kb/2002954

Impact/Risks

It is highly recommended to place the vSAN node in maintenance mode when re-partitioning a live boot device. A reboot will be required to complete the operation.

Additional Information

In some scenarios, vSAN might be enabled on an ESXi node without any disk groups configured. Although this is not a recommended configuration, the base requirement of 3.981GB should still be included when calculating the correct core dump size for a vSAN node without disk groups.
