The primary node in an Enterprise PKS cluster is in a failing state due to ncp restarting

Article ID: 319525

Updated On:

Products

VMware

Issue/Introduction

Symptoms:
  • You see the ncp process restarting continuously when you run the watch monit summary command on a primary node.

  • You see messages similar to the following in the /var/vcap/sys/log/ncp/ncp.stderr.log file:

    long_project_name, long_service_name, int(port_num), lb_pool,
    ValueError: invalid literal for int() with base 10: 'http'

    Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
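To check for this symptom without reading the log by hand, a small helper along the following lines may be enough. This is a minimal sketch: the function name has_named_port_error is hypothetical, and it only looks for the fixed error text quoted above.

```shell
# Hypothetical helper: succeeds (exit status 0) when the given log file
# contains the int() parsing error quoted in the symptom above.
has_named_port_error() {
  # -F treats the pattern as a fixed string, -q suppresses output.
  grep -Fq "ValueError: invalid literal for int() with base 10" "$1"
}
```

On the primary node it could be called as: has_named_port_error /var/vcap/sys/log/ncp/ncp.stderr.log && echo "named servicePort error found"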



Environment

VMware PKS 1.x

Cause

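The issue is caused by a named servicePort in an Ingress. As the log excerpt above shows, ncp passes the port value to int(), so a port name such as http raises the ValueError. Use the following steps to confirm this: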
 

  1. Issue the following command to list all ingress resources:

    kubectl get ing --all-namespaces

  2. Issue a command similar to the following to inspect each ingress returned by the previous command, looking for a named servicePort:

    kubectl get ing <ingress name> -n <namespace> -o yaml

Note: For an affected ingress, you will see output similar to the following, where servicePort is a name rather than a number:

 backend:
   serviceName: test
   servicePort: http
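On a cluster with many ingresses, the two validation steps can be combined into one pass. The following is a sketch only: the function names list_ingress_ports and find_named_service_ports are hypothetical, and the JSONPath expression assumes the ingress uses the spec.rules[].http.paths[].backend.servicePort layout shown above. It does not inspect a default backend under spec.backend.

```shell
# Hypothetical helper: prints one line per ingress in the form
# "<namespace> <ingress> <port> [<port> ...]".
list_ingress_ports() {
  kubectl get ing --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.spec.rules[*].http.paths[*].backend.servicePort}{"\n"}{end}'
}

# Hypothetical helper: reads the lines produced above on stdin and prints
# "<namespace>/<ingress> <port>" for every port that is not a plain integer.
find_named_service_ports() {
  while read -r ns name ports; do
    for port in $ports; do
      case "$port" in
        *[!0-9]*) printf '%s/%s %s\n' "$ns" "$name" "$port" ;;
      esac
    done
  done
}
```

Running list_ingress_ports | find_named_service_ports should then list only the ingresses that need the workaround.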

Resolution

This is a known issue with VMware NCP plugin 2.3.2 affecting Enterprise PKS. Currently there is no resolution.


Workaround:

To work around this issue, use port numbers for servicePort instead of port names. You can issue a command similar to the following to accomplish this:

    kubectl edit ing <ingress name> -n <namespace>

In the editor, change servicePort in the backend section to the port number (for example, 80 instead of http):

 backend:
   serviceName: test
   servicePort: 80
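If an interactive edit is inconvenient, the same change can in principle be made with kubectl patch and a JSON patch document. The sketch below is an assumption-laden example, not a documented procedure for this issue: build_port_patch is a hypothetical helper, and the rule and path indexes must match the position of the backend inside the ingress you are changing.

```shell
# Hypothetical helper: prints a JSON patch that replaces the servicePort of
# the backend at spec.rules[<rule index>].http.paths[<path index>].
# Usage: build_port_patch <rule index> <path index> <numeric port>
build_port_patch() {
  rule_idx=$1
  path_idx=$2
  port=$3
  case "$port" in
    ''|*[!0-9]*)
      # Refuse anything that is not a plain integer, since a name is the
      # value that triggers the ncp restart in the first place.
      echo "port must be numeric: $port" >&2
      return 1
      ;;
  esac
  printf '[{"op":"replace","path":"/spec/rules/%s/http/paths/%s/backend/servicePort","value":%s}]\n' \
    "$rule_idx" "$path_idx" "$port"
}
```

For the first path of the first rule, this could be applied with: kubectl patch ing <ingress name> -n <namespace> --type=json -p "$(build_port_patch 0 0 80)"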

Note: After editing the ingress, ncp should become stable. Run the following command to verify ncp status on the primary node:

watch monit summary
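For a scripted check instead of watching the output, something like the following may help. It is a sketch that assumes monit summary prints a line of the form Process 'ncp' followed by the status, and ncp_is_running is a hypothetical name. Because the symptom is a restart loop, a single successful check is not conclusive, so repeat it over a few minutes.

```shell
# Hypothetical helper: reads "monit summary" output on stdin and succeeds
# only when the ncp process line reports the status "running".
ncp_is_running() {
  grep -Eq "^Process 'ncp'[[:space:]]+running[[:space:]]*$"
}
```

Example call on the primary node: monit summary | ncp_is_running && echo "ncp is running"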