Recovering an ESX host from GRUB prompt

Article ID: 308598

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • The ESX host stops booting at the GRUB prompt. The screen displays grub> and waits for commands.
  • The ESX host does not continue booting past the GRUB prompt.
  • The /boot/grub/grub.conf file has incorrect entries


Environment

VMware ESX 4.0.x
VMware ESX 4.1.x
VMware ESX Server 3.0.x
VMware ESX Server 3.5.x

Resolution

On a working ESX host, the df -h command lists the mounted filesystems and the partitions behind them, similar to:

Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda2   4.9G  1.3G  3.4G   27%   /
/dev/sda1   97M   26M   67M    29%   /boot
none        132M  0     132M   0%    /dev/shm
/dev/sda6   2.0G  33M   1.8G   2%    /var/log

In this example, /boot is on the first partition of the first disk (/dev/sda1) and / is on the second partition (/dev/sda2). The partition that holds /var/log can vary from the fifth to the seventh, depending on configuration.
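If a df listing from the host is available (for example, from earlier documentation of the system), the devices behind / and /boot can be picked out with a short script. The following is a minimal sketch, not part of the original procedure. The helper name find_device is hypothetical, and the sample data is the listing shown above.

```shell
#!/bin/sh
# find_device (hypothetical helper): print the device backing a mount
# point, reading df-style output on stdin.
# $1 = mount point, for example / or /boot
find_device() {
    # Skip the header line, then match on the last column (the mount point).
    awk -v mp="$1" 'NR > 1 && $NF == mp { print $1 }'
}

# Sample data copied from the df -h listing in this article, so the
# sketch can be tried without access to the ESX host.
sample='Filesystem Size Used Avail Use% Mounted on
/dev/sda2 4.9G 1.3G 3.4G 27% /
/dev/sda1 97M 26M 67M 29% /boot
none 132M 0 132M 0% /dev/shm
/dev/sda6 2.0G 33M 1.8G 2% /var/log'

boot_dev=$(printf '%s\n' "$sample" | find_device /boot)
root_dev=$(printf '%s\n' "$sample" | find_device /)
echo "/boot is on $boot_dev, / is on $root_dev"
# prints: /boot is on /dev/sda1, / is on /dev/sda2
```

On a running host, the same helper could be fed live data with df -P | find_device /boot.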

Note: There are three options for recovering your ESX host from this point. Each option is progressively more complex. Perform each option in the order provided, until your ESX host has recovered.

Option 1 – Running Commands in the GRUB Prompt

With this option, you run commands at the GRUB prompt to boot the host manually.
This option is simple: you run only a few commands, your ESX host boots normally, and you can edit your grub.conf file later. However, you need to know the UUID of your / partition, which can be hard to find in many cases.
At the GRUB prompt, you must specify four things to let GRUB continue booting:
Note: Depending on your ESX version, you may need to change the kernel and initrd names used in this example.
  1. Type the location of the /boot directory and press Enter:

    root (hd0,0)

  2. Type the kernel name and the UUID and press Enter:

    For example:


    kernel /vmlinuz-2.4.21-47.0.1.ELvmnix ro root=UUID=932dad41-f43a-4a60-9257-198f026da80e


    Note: The UUID in this command is only an example. Obtain the UUID of the / partition on your own system before you run this command. At the GRUB prompt, you can sometimes use cat /grub/grub.conf to display it.

  3. Type the initrd value and press Enter:

    For example:

    initrd /initrd-2.4.21-47.0.1.ELvmnix.img

  4. Run the boot command:

    boot

If you cannot remember the names of the kernel and initrd files, type / and press Tab. GRUB lists the possible completions. You can also use this to check whether the filesystem is valid.

Run these commands after your ESX host is booted:
  • esxcfg-boot -p
  • esxcfg-boot -b
  • esxcfg-boot -r

Ensure that your /boot/grub/grub.conf resembles:

title VMware ESX Server
root (hd0,0)
kernel --no-mem-option /vmlinuz-2.4.21-47.0.1.ELvmnix ro root=UUID=44fc4a1c-d5ac-4ce1-a9cb-74acab0e61e8 mem=272M
initrd /initrd-2.4.21-47.0.1.ELvmnix.img
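Once the host is running again, it can be worth confirming that every kernel and initrd file named in grub.conf actually exists on the /boot partition. The following is a minimal sketch under that assumption. The helper name check_grub_paths is hypothetical, and the demonstration uses a throwaway directory rather than a real /boot.

```shell
#!/bin/sh
# check_grub_paths (hypothetical helper): report kernel and initrd files
# that grub.conf refers to but that are missing from the boot directory.
# $1 = path to grub.conf, $2 = directory where /boot is mounted
check_grub_paths() {
    conf=$1
    bootdir=$2
    missing=0
    # A kernel line may carry GRUB options such as --no-mem-option before
    # the path, so take the first field that starts with "/".
    for path in $(awk '$1 == "kernel" || $1 == "initrd" {
                           for (i = 2; i <= NF; i++)
                               if ($i ~ /^\//) { print $i; break }
                       }' "$conf"); do
        if [ ! -f "$bootdir$path" ]; then
            echo "missing: $bootdir$path"
            missing=1
        fi
    done
    return $missing
}

# Demonstration: the kernel exists, the initrd does not.
demo=$(mktemp -d)
mkdir -p "$demo/grub"
touch "$demo/vmlinuz-2.4.21-47.0.1.ELvmnix"
cat > "$demo/grub/grub.conf" <<'EOF'
title VMware ESX Server
root (hd0,0)
kernel --no-mem-option /vmlinuz-2.4.21-47.0.1.ELvmnix ro root=UUID=44fc4a1c-d5ac-4ce1-a9cb-74acab0e61e8 mem=272M
initrd /initrd-2.4.21-47.0.1.ELvmnix.img
EOF
check_grub_paths "$demo/grub/grub.conf" "$demo" || echo "grub.conf needs attention"
rm -rf "$demo"
```

On the host itself, the call would be check_grub_paths /boot/grub/grub.conf /boot. Commented lines in grub.conf are ignored because their first field is "#".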

Option 2 – Using your Live CD to Boot your ESX Host

Use a Live CD to boot your ESX host and fix the host from a chroot environment.
Unlike Option 1, you do not need to know the UUID of your / partition to recover. You can also find the UUID as part of this option and then continue with Option 1. However, more Linux commands are involved and you need a Live CD.

Note: You can also use a Live CD to take advantage of the esxcfg-boot command.

Examples of Live Linux CDs:
  • Gentoo Live CD
  • Redhat rescue CD/DVD
  • Knoppix Live CD
  • PClinuxOS Live CD
  • Ubuntu Live CD
Run these commands after you boot your ESX host from the Live CD:
  • fdisk -l (lists the device names of your filesystems)

Note: Use this command to discover the device names of your filesystems, because they may not be named /dev/sda. The device names in the following commands are examples only.

  • mkdir /mnt/esx
  • mount /dev/sda2 /mnt/esx
  • mount /dev/sda1 /mnt/esx/boot
  • mount /dev/sda6 /mnt/esx/var/log (you may need to use fdisk /dev/sda to find which partition holds /var/log)
  • chroot /mnt/esx
  • bash
  • touch /boot/grub/grub.conf (only if there is no grub.conf file)
  • esxcfg-boot -gr
  • vi /boot/grub/grub.conf (correct any problems in the file if needed)
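With the filesystems mounted, the UUID needed for Option 1 can often be read from an existing grub.conf. The following is a minimal sketch, assuming the file still contains a kernel line with root=UUID=. The helper name root_uuid_from_conf is hypothetical, and the demonstration runs against a temporary copy rather than the real file.

```shell
#!/bin/sh
# root_uuid_from_conf (hypothetical helper): print the first root=UUID=
# value found on an uncommented kernel line of the given grub.conf.
# $1 = path to grub.conf (inside the chroot: /boot/grub/grub.conf)
root_uuid_from_conf() {
    sed -n 's/^[[:space:]]*kernel[[:space:]].*root=UUID=\([0-9A-Fa-f-]*\).*/\1/p' "$1" |
        head -n 1
}

# Demonstration on a temporary file holding the example entry from this article.
sample=$(mktemp)
cat > "$sample" <<'EOF'
title VMware ESX Server
root (hd0,0)
kernel --no-mem-option /vmlinuz-2.4.21-47.0.1.ELvmnix ro root=UUID=44fc4a1c-d5ac-4ce1-a9cb-74acab0e61e8 mem=272M
initrd /initrd-2.4.21-47.0.1.ELvmnix.img
EOF
root_uuid_from_conf "$sample"
# prints: 44fc4a1c-d5ac-4ce1-a9cb-74acab0e61e8
rm -f "$sample"

# If grub.conf is missing or damaged, a Live CD that ships blkid can read
# the UUID from the partition itself (assumes /dev/sda2 holds /):
#   blkid -s UUID -o value /dev/sda2
```

Whether blkid is present depends on the Live CD, so treat that last command as an option to try rather than a guaranteed step.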
Because esxcfg-boot -gr runs inside the Live CD chroot environment, it does not detect that /boot and / are two separate partitions.

As a result, the generated /boot/grub/grub.conf contains entries similar to:
title VMware ESX Server
root (hd0,1)
uppermem 277504
kernel --no-mem-option /boot/vmlinuz-2.4.21-47.0.1.ELvmnix ro root=UUID=44fc4a1c-d5ac-4ce1-a9cb-74acab0e61e8 mem=272M
initrd /boot/initrd-2.4.21-47.0.1.ELvmnix.img
Change root to the partition that holds /boot, and remove /boot from the kernel and initrd paths:
title VMware ESX Server
root (hd0,0)
uppermem 277504
kernel --no-mem-option /vmlinuz-2.4.21-47.0.1.ELvmnix ro root=UUID=44fc4a1c-d5ac-4ce1-a9cb-74acab0e61e8 mem=272M
initrd /initrd-2.4.21-47.0.1.ELvmn
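The two corrections above can also be applied with sed instead of editing by hand. The following is a minimal sketch, assuming /boot is the first partition of the first disk as in this article. The helper name fix_chroot_grub_conf is hypothetical. It prints the corrected file to standard output and leaves the input untouched, so the result can be reviewed before it replaces grub.conf.

```shell
#!/bin/sh
# fix_chroot_grub_conf (hypothetical helper): rewrite a grub.conf that was
# generated inside the chroot so that it points at the /boot partition.
# Assumption: /boot is (hd0,0) and / is (hd0,1), as in this article.
# $1 = path to the generated grub.conf
fix_chroot_grub_conf() {
    sed -e '/^[[:space:]]*root[[:space:]]/s/(hd0,1)/(hd0,0)/' \
        -e '/^[[:space:]]*kernel[[:space:]]/s#\([[:space:]]\)/boot/#\1/#' \
        -e '/^[[:space:]]*initrd[[:space:]]/s#\([[:space:]]\)/boot/#\1/#' \
        "$1"
}

# Demonstration on a temporary copy of the generated entry.
sample=$(mktemp)
cat > "$sample" <<'EOF'
title VMware ESX Server
root (hd0,1)
uppermem 277504
kernel --no-mem-option /boot/vmlinuz-2.4.21-47.0.1.ELvmnix ro root=UUID=44fc4a1c-d5ac-4ce1-a9cb-74acab0e61e8 mem=272M
initrd /boot/initrd-2.4.21-47.0.1.ELvmnix.img
EOF
fix_chroot_grub_conf "$sample"
rm -f "$sample"
```

Each sed expression is limited to lines that begin with root, kernel, or initrd, so comments and title lines pass through unchanged.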

Option 3 – Using a Live CD to boot your ESX Host (Advanced)

Use a Live CD to boot your ESX host and fix the host from a chroot environment.
This option is more detailed than Option 2. It covers repairing damaged stage files and editing device.map, all without using esxcfg commands. However, you must be completely comfortable with Linux commands to perform this option. You also need a Live CD.
Caution: This process uses more advanced Linux commands than Option 1 and Option 2. If you are not comfortable with Option 3 and your ESX host is still not recovering, file a support request with VMware Support and note this KB Article ID in the problem description. For more information, see How to Submit a Support Request.
  1. Boot from the Rescue CD.
  2. List the device names of your filesystems by using the fdisk -l or df -h command.

  3. Check device names for / and /boot filesystems.

    For example, on an internal RAID controller, /boot can be /dev/cciss/c0d0p1 and / can be /dev/cciss/c0d0p7.

  4. Run these commands to mount the / filesystem and chroot into it:
    • mkdir /mnt/root
    • mount /dev/cciss/c0d0p7 /mnt/root
    • chroot /mnt/root

  5. Run this command to mount the /boot filesystem on the /boot mount point:

    mount /dev/cciss/c0d0p1 /boot

  6. Ensure that /boot contains the kernel, the initrd, and the grub/ subdirectory with the stage* files, grub.conf, and menu.lst, which is a symlink to grub.conf.

  7. Replace anything from step 6 that is missing. If any of the stage files are missing, copy all the files from /usr/share/grub/i386-redhat/ to /boot/grub/:

    cp /usr/share/grub/i386-redhat/* /boot/grub/

    If grub.conf is missing, create a new one or copy one from another server.

    An example of /boot/grub/grub.conf is:

    vmware:configversion 1
    # grub.conf generated by anaconda
    #
    # Note that you do not have to rerun grub after making changes to this file
    # NOTICE: You have a /boot partition. This means that
    # all kernel and initrd paths are relative to /boot/, eg.
    # root (hd0,0)
    # kernel /vmlinuz-version ro root=/dev/sdc2
    # initrd /initrd-version.img
    #boot=/dev/sdc
    timeout=10
    default=0
    title VMware ESX Server
    #vmware:autogenerated esx
    root (hd0,0)
    uppermem 277504
    kernel --no-mem-option /vmlinuz-2.4.21-47.0.1.ELvmnix ro root=/dev/cciss/c0d0p7 mem=272M
    initrd /initrd-2.4.21-47.0.1.ELvmnix.img
    title VMware ESX Server (debug mode)
    #vmware:autogenerated esx
    root (hd0,0)
    uppermem 277504
    kernel --no-mem-option /vmlinuz-2.4.21-47.0.1.ELvmnix ro root=/dev/cciss/c0d0p7 mem=272M console=ttyS0,115200 console=tty0 debug
    initrd /initrd-2.4.21-47.0.1.ELvmnix.img-dbg
    title Service Console only (troubleshooting mode)
    #vmware:autogenerated esx
    root (hd0,0)
    uppermem 277504
    kernel --no-mem-option /vmlinuz-2.4.21-47.0.1.ELvmnix ro root=/dev/cciss/c0d0p7 mem=272M tblsht
    initrd /initrd-2.4.21-47.0.1.ELvmnix.img-sc

  8. If the server has multiple drives or LUNs, it may be useful to create or edit a /boot/grub/device.map file with the following content:

    (hd0) /dev/cciss/c0d0p1


    Here, the device name under /dev/ is the boot partition device. Using a device.map file significantly speeds up the process, because GRUB does not have to autodetect devices.

  9. Start the GRUB shell. If you are using a device.map file, specify it on the command line:

    /sbin/grub --device-map=/boot/grub/device.map

  10. Run this command in the GRUB shell:

    root (hd0,0)


  11. Run this command in the GRUB shell:

    setup --stage2=stage2 --prefix=/grub (hd0)


    Note: This form of the command is for a setup where /boot is on (hd0). If it does not work, try:

    setup (hd0)

  12. Run the quit command to exit the GRUB shell.
  13. Run this command:

    sync

  14. Remove the Rescue CD and reboot the server.
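Before reinstalling GRUB, the check in step 6 can be scripted so that nothing is overlooked. The following is a minimal sketch, not part of the original procedure. The helper name check_boot_dir is hypothetical, and the vmlinuz-* and initrd-* patterns are assumptions based on the file names used in this article. The demonstration builds a throwaway directory in which stage2 is deliberately absent.

```shell
#!/bin/sh
# check_boot_dir (hypothetical helper): report which of the items listed
# in step 6 are missing from a mounted /boot directory.
# $1 = directory where /boot is mounted (default: /boot)
check_boot_dir() {
    boot=${1:-/boot}
    rc=0
    # Each pattern must match at least one regular file.
    for pattern in 'vmlinuz-*' 'initrd-*' 'grub/stage1' 'grub/stage2' 'grub/grub.conf'; do
        found=0
        for f in "$boot"/$pattern; do
            if [ -f "$f" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "missing: $boot/$pattern"
            rc=1
        fi
    done
    # menu.lst is expected to be a symlink to grub.conf.
    if [ ! -L "$boot/grub/menu.lst" ]; then
        echo "not a symlink: $boot/grub/menu.lst"
        rc=1
    fi
    return $rc
}

# Demonstration: everything is present except grub/stage2.
demo=$(mktemp -d)
mkdir -p "$demo/grub"
touch "$demo/vmlinuz-2.4.21-47.0.1.ELvmnix" "$demo/initrd-2.4.21-47.0.1.ELvmnix.img" \
      "$demo/grub/stage1" "$demo/grub/grub.conf"
ln -s grub.conf "$demo/grub/menu.lst"
check_boot_dir "$demo" || echo "restore the missing files before reinstalling GRUB"
rm -rf "$demo"
```

Inside the chroot from step 5, the call would simply be check_boot_dir /boot. A non-zero exit status means at least one item needs the repair described in step 7.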


Additional Information

If booting does not get past the GRUB screen and you cannot use the GRUB shell, GRUB may not have written the MBR properly.
To resolve this, set up the chroot environment and then run:
/sbin/grub-install /dev/sda