The primary node in an Enterprise PKS cluster is in a failing state due to ncp restarting

Article ID: 319525

Updated On:

Products

VMware

Issue/Introduction

Symptoms:

You see the ncp process restarting continuously when running the watch monit summary command on a primary node.
You see messages similar to the following in the /var/vcap/sys/log/ncp/ncp.stderr.log file:

long_project_name, long_service_name, int(port_num), lb_pool,
ValueError: invalid literal for int() with base 10: 'http'

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
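
The ValueError in the excerpt is the error Python raises when int() receives a string that is not a number. The following is a minimal sketch of that behavior, not the NCP source code: only the name port_num comes from the log line, and the parse_port helper is hypothetical.

```python
# Illustrative only: reproduces the error message from the log excerpt.
# parse_port is a hypothetical helper, not a function from NCP.
def parse_port(port_num):
    """Convert a servicePort value to an integer, as int(port_num) does in the log line."""
    return int(port_num)


# A numeric port converts cleanly.
print(parse_port("80"))

# A named port cannot be converted and raises ValueError.
try:
    parse_port("http")
except ValueError as exc:
    message = str(exc)
    print(message)
```

The second call fails with the same message that appears in ncp.stderr.log, which is why a servicePort value of http is the thing to look for.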

Environment

VMware PKS 1.x

Cause

The issue is caused by having a named servicePort in an Ingress. The following steps can be used to validate this:

Issue the following command to list all ingress resources:

kubectl get ing --all-namespaces
Issue a command similar to the following to inspect each ingress returned by the previous command, looking for a named servicePort:

kubectl get ing <ingress name> -n <namespace> -o yaml

Note: You will see output similar to the following:

backend:
  serviceName: test
  servicePort: http
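
On a cluster with many ingresses, the manual inspection above could be scripted. The following is a rough sketch, assuming the ingress list has been exported as JSON (for example with kubectl get ing --all-namespaces -o json) and uses the serviceName/servicePort backend layout shown above. The function name find_named_service_ports and the sample data are hypothetical.

```python
import json


def find_named_service_ports(ingress_list):
    """Return (namespace, ingress name, servicePort) for each backend
    whose servicePort is not a plain number."""
    findings = []
    for item in ingress_list.get("items") or []:
        meta = item.get("metadata") or {}
        spec = item.get("spec") or {}

        # Collect the default backend plus every per-path backend.
        backends = []
        if spec.get("backend"):
            backends.append(spec["backend"])
        for rule in spec.get("rules") or []:
            for path in (rule.get("http") or {}).get("paths") or []:
                if path.get("backend"):
                    backends.append(path["backend"])

        for backend in backends:
            port = backend.get("servicePort")
            if port is None:
                continue
            # Anything that is not all digits would fail an int() conversion.
            if not str(port).isdigit():
                findings.append((meta.get("namespace"), meta.get("name"), port))
    return findings


# Hypothetical sample standing in for real kubectl JSON output.
sample = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "test-ingress", "namespace": "default"},
      "spec": {"rules": [{"http": {"paths": [
        {"path": "/", "backend": {"serviceName": "test", "servicePort": "http"}}
      ]}}]}
    },
    {
      "metadata": {"name": "ok-ingress", "namespace": "default"},
      "spec": {"backend": {"serviceName": "web", "servicePort": 80}}
    }
  ]
}
""")

named = find_named_service_ports(sample)
print(named)
```

With the sample data, only test-ingress is reported, because its servicePort is the name http rather than a number.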

Resolution

This is a known issue with VMware NCP plugin 2.3.2 affecting Enterprise PKS. Currently there is no resolution.

Workaround:

To work around this issue, use port numbers for servicePort instead of port names. You can issue a command similar to the following to accomplish this:

kubectl edit ing <ingress name> -n <namespace>

backend:
  serviceName: test
  servicePort: 80   # instead of http
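
If it is not obvious which number a port name stands for, it can be looked up in the Service that the ingress points to, since a named servicePort refers to a port name defined on that Service. The following is a small sketch, assuming the Service has been exported as JSON (for example with kubectl get svc <service name> -n <namespace> -o json). The function resolve_service_port and the sample Service are hypothetical.

```python
def resolve_service_port(service, port_name):
    """Return the numeric port of the Service port entry named port_name,
    or None if the Service defines no port with that name."""
    for entry in (service.get("spec") or {}).get("ports") or []:
        if entry.get("name") == port_name:
            return entry.get("port")
    return None


# Hypothetical Service definition standing in for real kubectl JSON output.
sample_service = {
    "metadata": {"name": "test", "namespace": "default"},
    "spec": {
        "ports": [
            {"name": "http", "port": 80, "targetPort": 8080},
            {"name": "https", "port": 443, "targetPort": 8443},
        ]
    },
}

resolved = resolve_service_port(sample_service, "http")
print(resolved)
```

The value returned for http is the number to enter as servicePort when editing the ingress.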

Note: After editing the ingress, ncp should become stable. Run the following command to verify ncp status on the primary node:

watch monit summary
