NSX-T Manager service corfu-nonconfig-server is not running

Article ID: 322524

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • The NSX-T Manager UI loads and appears to be operational.
  • You have a single-node cluster, that is, one NSX-T Manager node instead of the recommended three.
  • Running the command 'get services' as the admin user does not show any stopped or failed services.
  • In /var/log/cbm/tanuki/tanuki.log, you see errors similar to:
2023-01-13T12:51:06.304Z ERROR pool-3-thread-1 ClusterManagerServiceHelper - - [nsx@6876 comp="cluster-manager" errorCode="CBM303" level="ERROR" subcomp="ClusterManagerServiceHelper"] [CBM303] Corfu (nonconfig) server is not found on this node. This condition is currently unsupported.
org.corfudb.runtime.exceptions.unrecoverable.UnrecoverableCorfuError: java.util.concurrent.ExecutionException: com.vmware.nsx.platform.clustering.persistence.exceptions.CorfuShutdownException: Disconnected from database. Terminating thread.
        at org.corfudb.runtime.CorfuRuntime.connect(CorfuRuntime.java:1074) ~[cluster-boot-manager-1.0.jar:?]
        at com.vmware.nsx.platform.clustering.persistence.corfu.CorfuDbDataStore.connect(CorfuDbDataStore.java:264) ~[cluster-boot-manager-1.0.jar:?]
        at com.vmware.nsx.cbm.factory.CorfuDbDataStoreFactoryImpl.initializeDataStoreImpl(CorfuDbDataStoreFactoryImpl.java:91) ~[cluster-boot-manager-1.0.jar:?]
...
  • As the root user, running the command 'service corfu-nonconfig-server status' shows the service as 'inactive (dead) since <DATE/TIME>'.
  • As the root user, running 'service corfu-nonconfig-server restart' attempts to start the service, but after approximately one minute it fails and returns to an inactive state.
  • The log file '/var/log/corfu-nonconfig/tanuki.log' shows an error similar to:
2023-01-09T12:42:38.847Z | ERROR |           WrapperSimpleAppMain | o.c.infrastructure.CorfuServer | CorfuServer: Server exiting due to unrecoverable error:
org.corfudb.runtime.exceptions.DataCorruptionException: Checksum mismatch detected while trying to read file. Segment File: /nonconfig/corfu/corfu/log/745.log. File size: 3707026. File position: 3707026. Global tail: -1. Tail segment: 745. Stream tails size: 0
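
The two log checks above can be scripted when triaging several managers. The following is a minimal sketch, not a VMware-provided tool: the function name match_corfu_nonconfig_symptoms and the regular expression are hypothetical, and they only match the message text quoted in this article, which may be worded differently in other NSX-T versions.

```python
import re
from typing import Optional

# Message fragments copied from the log excerpts in this article (assumption:
# other NSX-T versions may word these messages differently).
CBM_SIGNATURE = "[CBM303] Corfu (nonconfig) server is not found on this node"
CORRUPTION_SIGNATURE = "DataCorruptionException: Checksum mismatch detected"

# Captures the segment file path reported in the corruption message,
# e.g. "Segment File: /nonconfig/corfu/corfu/log/745.log."
SEGMENT_RE = re.compile(r"Segment File: (\S+\.log)")


def match_corfu_nonconfig_symptoms(cbm_log: str, corfu_log: str) -> dict:
    """Report which of this article's log signatures appear in the given log text.

    cbm_log:   contents of /var/log/cbm/tanuki/tanuki.log
    corfu_log: contents of /var/log/corfu-nonconfig/tanuki.log
    """
    segment: Optional[str] = None
    match = SEGMENT_RE.search(corfu_log)
    if match:
        segment = match.group(1)
    return {
        # CBM reports that it cannot find the corfu-nonconfig server.
        "cbm_cannot_find_corfu_nonconfig": CBM_SIGNATURE in cbm_log,
        # corfu-nonconfig-server exited on a checksum mismatch.
        "corfu_nonconfig_checksum_mismatch": CORRUPTION_SIGNATURE in corfu_log,
        # Path of the segment file named in the first corruption message, if any.
        "corrupt_segment_file": segment,
    }
```

To use it, read both tanuki.log files as the root user (for example with open(path, errors="replace").read()) and pass their contents to the function. Both flags being True matches the symptom pattern described here; the script only reads the logs and does not repair anything.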


Environment

VMware NSX-T Data Center

Cause

CBM (Cluster Boot Manager) fails to start because the corfu-nonconfig-server service is inactive.
The corfu-nonconfig-server service is inactive because the corfu-nonconfig database is corrupted, as shown by the checksum mismatch (DataCorruptionException) in its log.
This corruption can be caused by underlying storage issues. When such issues occur, we advise restoring from backup.
Standard Linux OS recovery commands cannot check for this type of corruption.

Resolution

There is no resolution for this issue. Restore the environment from a backup.

Workaround:
There is no workaround for this issue. Restore the environment from a backup.
Note that this issue can also occur in a three-node NSX-T Manager cluster.
If the three NSX-T Manager nodes are each on separate storage, a storage issue that corrupts the Corfu database affects only the node on that storage.
In that case, you can detach the impacted NSX-T Manager node from the cluster and deploy a new NSX-T Manager node to restore a healthy three-node cluster.