
Understanding an "Oops" purple diagnostic screen (1006802)


Purpose

This article describes the information displayed on a purple screen fault caused by the service console faulting (panic/oops). The purple diagnostic screen can include one or more of these messages:
  • Oops
  • CosPanic
  • COS Error
An example of this purple diagnostic screen is shown in the Resolution section below.

If you encounter a purple diagnostic screen that does not match the symptoms above, see Interpreting an ESX host purple diagnostic screen (1004250).

Resolution

When an Oops or panic occurs in the service console of a VMware ESX host, a purple screen fault is generated.
 
Note: If the advanced setting Misc.PsodOnCosPanic is set to zero (0), a purple screen fault does not occur when the service console panics. Ensure that this setting is enabled, as the purple screen information is necessary to diagnose any issues the host is experiencing. Also ensure that Misc.CosCoreFile is set correctly so that a core dump for the service console is generated.
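 
These settings can also be viewed and changed from the ESX service console with the esxcfg-advcfg command. For example (the option paths shown here may vary between ESX releases):
 
esxcfg-advcfg -g /Misc/PsodOnCosPanic     (display the current value; 1 = purple screen on COS panic)
esxcfg-advcfg -s 1 /Misc/PsodOnCosPanic   (enable the purple screen on COS panic)
esxcfg-advcfg -g /Misc/CosCoreFile        (display the configured service console core dump location)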
 
A purple screen caused by a service console fault contains two main components: the VMkernel purple screen output and the service console Linux kernel output. For more information about decoding the VMkernel portion, see Interpreting an ESX host purple diagnostic screen (1004250).
 
The contents from this example are:
 
VMware ESX Server [Releasebuild-64607]
Oops
frame=0x1f16d34 ip=0xc022e995 cr2=0x100 cr3=0x13401000 cr4=0x6f0
es=0x68 ds=0xc02a0068 fs=0x0 gs=0x0
eax=0x0 ebx=0x0 ecx=0x1 edx=0x800
ebp=0x0 esi=0x0 edi=0xc03a7b20 err=0 eflags=0x0
*0:1024/console 1:1025/idle1 2:1026/idle2 3:1027/idle3
4:1028/idle4 5:1029/idle5 6:1030/idle6 7:1031/idle7
0x0:[0xc022e995]blk_dev+0xbd98d934 stack: 0x0, 0x0, 0x0
VMK uptime: 0:00:02:17.807 TSC: 343459198808
0:00:02:11.319 cpu0:1024)Host: 4781: COS Error: Oops
Starting coredump to disk Starting coredump to disk Dumping using slot 1 of 1... using slot 1 of 1... log
 
Stack trace from cos log:
<4>EIP:    0060:[<c022e995>]    Tainted: P
<4>EFLAGS: 00010246
<4>
<4>EIP is at sr_finish [kernel] 0xa5 (2.4.21-47.0.1.ELvmnix/i686)
<4>eax: 00000000   ebx: 00000000   ecx: 00000001   edx: 00000800
<4>esi: 00000000   edi: c03a7b20   ebp: 00000000   esp: c204fd70
<4>ds: 0068   cs: 0060   es: 0068   ss: 0068
<4>Process esxcfg-rescan (pid: 2997, stackpage=c204f000)
<4>Stack: c045ca80 00000400 00000000 00000000 80000000 00000282 00000001 c03a7ac0
<4>       c9fc6e00 c022ca10 00000003 c021cb94 c02d6fa8 00000000 00000000 00000004
<4>       c204fdd0 c204fdd4 c204fdd8 c8c4ae00 c204fddc 00000000 00000004 c204fddc
<4>Call Trace:   [<c022ca10>] sd_attach [kernel] 0x0 (0xc204fd94)
<4>[<c021cb94>] scan_scsis [kernel] 0x3d4 (0xc204fd9c)
<4>[<c0123b79>] printk [kernel] 0x149 (0xc204feb8)
<4>[<c0212b44>] proc_scsi_gen_write [kernel] 0x624 (0xc204feec)
<4>[<c0168ffe>] locate_fd [kernel] 0xae (0xc204ff40)
<4>[<c0180130>] proc_file_write [kernel] 0x40 (0xc204ff80)
<4>[<c0158a73>] sys_write [kernel] 0xa3 (0xc204ff94)
<4>[<c02a406f>] no_timing [kernel] 0x7 (0xc204ffc0)
<4>[<c02a002b>] zlib_tr_flush_block [kernel] 0x3b (0xc204ffe0)
<4>
<4>Code: 89 90 00 01 00 00 a1 80 9f 4b c0 80 4c 18 12 01 a1 80 9f 4b
<4>
<4>
<4>dell_rbu  0xd2188060 -s .data 0xd2189dcc -s .bss 0xd2189e00
<4>ppdev     0xd2185060 -s .data 0xd2186b80 -s .bss 0xd2186c00
<4>parport   0xd217a060 -s .data 0xd2183540 -s .bss 0xd2183880
<4>ipmi_devintf0xd2160060 -s .data 0xd21614e0 -s .bss 0xd2161580
<4>ipmi_si_drv0xd2171060 -s .data 0xd2177f00 -s .bss 0xd21780c0
<4>ipmi_msghandler0xd2168060 -s .data 0xd216f170 -s .bss 0xd216f1e0
<4>ipt_REJECT0xd2165060 -s .data 0xd21662c0 -s .bss 0xd2166320
 
The service console panic output starts at the line:
 
Stack trace from cos log:
 
The first important piece of information is the EIP, which indicates where the fault occurred. This shows where in the Linux kernel the fault was caught (or triggered). In this example, the function running in the Linux kernel at the time was sr_finish, which is used in the processing of storage information.
 
<4>EIP:    0060:[<c022e995>]    Tainted: P
<4>EFLAGS: 00010246
<4>
<4>EIP is at sr_finish [kernel] 0xa5 (2.4.21-47.0.1.ELvmnix/i686)
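 
If a fault reports only an address without a symbol name, the address can usually be mapped to the nearest kernel function using the service console System.map file for the running kernel. As a rough sketch, assuming the file name matches the kernel version shown in this example:
 
grep c022e /boot/System.map-2.4.21-47.0.1.ELvmnix
 
The text (T/t) symbol with the largest address that is still below the EIP value (0xc022e995 here) identifies the function that was executing.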

The next lines are the register dump, showing each register and its contents at the time of the fault:
 
<4>eax: 00000000   ebx: 00000000   ecx: 00000001   edx: 00000800
<4>esi: 00000000   edi: c03a7b20   ebp: 00000000   esp: c204fd70
<4>ds: 0068   cs: 0060   es: 0068   ss: 0068
 
This line is very important: it shows the process that was running at the time of the fault. In this case, a storage rescan (esxcfg-rescan) was being performed:

<4>Process esxcfg-rescan (pid: 2997, stackpage=c204f000)
 
These lines contain the contents of the stack:
 
<4>Stack: c045ca80 00000400 00000000 00000000 80000000 00000282 00000001 c03a7ac0
<4>       c9fc6e00 c022ca10 00000003 c021cb94 c02d6fa8 00000000 00000000 00000004
<4>       c204fdd0 c204fdd4 c204fdd8 c8c4ae00 c204fddc 00000000 00000004 c204fddc

These lines are the call trace, showing what the Linux kernel was doing prior to the failure. Use this information to help diagnose the issue. In this example, a SCSI scan was in progress:
 
<4>Call Trace:   [<c022ca10>] sd_attach [kernel] 0x0 (0xc204fd94)
<4>[<c021cb94>] scan_scsis [kernel] 0x3d4 (0xc204fd9c)
<4>[<c0123b79>] printk [kernel] 0x149 (0xc204feb8)
<4>[<c0212b44>] proc_scsi_gen_write [kernel] 0x624 (0xc204feec)
<4>[<c0168ffe>] locate_fd [kernel] 0xae (0xc204ff40)
<4>[<c0180130>] proc_file_write [kernel] 0x40 (0xc204ff80)
<4>[<c0158a73>] sys_write [kernel] 0xa3 (0xc204ff94)
<4>[<c02a406f>] no_timing [kernel] 0x7 (0xc204ffc0)
<4>[<c02a002b>] zlib_tr_flush_block [kernel] 0x3b (0xc204ffe0)
<4>
 
This line shows the machine code that was executing on the CPU at the time of the fault:

<4>Code: 89 90 00 01 00 00 a1 80 9f 4b c0 80 4c 18 12 01 a1 80 9f 4b
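 
If required, these bytes can be disassembled to identify the exact instruction that faulted. As a rough sketch, assuming the xxd and objdump utilities are available on the service console or another Linux system:
 
echo "89 90 00 01 00 00 a1 80 9f 4b c0 80 4c 18 12 01 a1 80 9f 4b" | xxd -r -p > /tmp/oops-code.bin
objdump -D -b binary -m i386 /tmp/oops-code.bin
 
In this output format the dump typically begins at the faulting EIP, so the first decoded instruction is the one that failed.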
 
This is a list of the kernel modules loaded:

<4>dell_rbu  0xd2188060 -s .data 0xd2189dcc -s .bss 0xd2189e00
<4>ppdev     0xd2185060 -s .data 0xd2186b80 -s .bss 0xd2186c00
<4>parport   0xd217a060 -s .data 0xd2183540 -s .bss 0xd2183880
<4>ipmi_devintf0xd2160060 -s .data 0xd21614e0 -s .bss 0xd2161580
<4>ipmi_si_drv0xd2171060 -s .data 0xd2177f00 -s .bss 0xd21780c0
<4>ipmi_msghandler0xd2168060 -s .data 0xd216f170 -s .bss 0xd216f1e0
<4>ipt_REJECT0xd2165060 -s .data 0xd21662c0 -s .bss 0xd2166320
 

Note: If you need more assistance diagnosing your purple screen error, collect diagnostic information from the host and file a support request with VMware, as described under Additional Information below.

Additional Information

Known Issues

If you have an "Oops" purple diagnostic screen that exactly matches the error message outlined in one of these articles, follow the applicable directions:

Other Considerations

An Oops in the ESX service console may be triggered by a hardware issue, by a software issue in the ESX VMkernel or the Linux service console kernel, or by a driver or privileged third-party process running in the service console. If multiple failures have occurred, consider the pattern of failures before taking action.

If the error has not been documented within the knowledge base, collect diagnostic information from the ESX host and submit a support request. For more information, see Collecting Diagnostic Information for VMware Products (1008524) and How to Submit a Support Request.
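
Diagnostic information can typically be gathered by running the vm-support script from the ESX service console; the script creates a compressed bundle and reports its location so that it can be attached to the support request:

vm-support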

