On NSX-T 3.2.0/3.2.1 DFW rules are matched intermittently

Article ID: 318275

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
  • NSX-T Data Center 3.2.0, 3.2.0.1 and 3.2.1
  • DFW rules are defined using Inventory Groups with either dynamic or static membership criteria
  • Intermittently, DFW rules are not matched as expected. For example, an allow rule exists for VM-to-VM traffic, but that traffic is sometimes blocked by the default block rule.
  • vMotion of the affected VM may temporarily resolve the issue.
  • When viewing Group members in the UI, IPs may be missing from the member list.
  • On the NSX-T Manager, /var/log/cloudnet/nsx-ccp.log may contain log entries such as the following:
2022-03-15T12:48:03.524Z  WARN Owl-worker-7 ExecutorTask 1513 - [nsx@6876 comp="nsx-controller" level="WARNING" subcomp="owl"] Failed to process dataUpdate for listener ContainerEventsListenerNewImpl, error message: null, error stack:java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
        at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
        at com.vmware.nsx.rooster.cache.ContainerCacheEntry$IpMacMembers.toStringInternal(ContainerCacheEntry.java:159)
        at com.vmware.nsx.rooster.cache.ContainerCacheEntry$IpMacMembers.toString(ContainerCacheEntry.java:145)
        at java.lang.String.valueOf(String.java:2994)
        at java.lang.StringBuilder.append(StringBuilder.java:131)
        at com.vmware.nsx.rooster.cache.ContainerCacheEntry.toString(ContainerCacheEntry.java:481)
        at com.vmware.nsx.rooster.cache.ContainerCacheImpl.lookup(ContainerCacheImpl.java:119)
        at com.vmware.nsx.rooster.cache.ContainerCacheImpl.lookup(ContainerCacheImpl.java:111)
        at com.vmware.nsx.rooster.event.ContainerEventsListenerNewImpl.processContainers(ContainerEventsListenerNewImpl.java:722)
        at com.vmware.nsx.rooster.event.ContainerEventsListenerNewImpl.onDataDiscovered(ContainerEventsListenerNewImpl.java:563)
        at com.vmware.nsx.ccp.owl.ExecutorTask.compute(ExecutorTask.java:83)
        at java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)

  
  
2022-03-10T21:49:42.674Z  WARN nsx-rpc:CCP-AphProvider-0fac58d6-5083-4f03-82ef-9cc7d363709b:user-executor-3 GroupingMembershipServiceImpl 19814 - [nsx@6876 comp="nsx-controller" level="WARNING" subcomp="Grouping API"] Exception is caught while fetch all members null
java.util.ConcurrentModificationException: null
        at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445) ~[?:1.8.0_301]
        at java.util.HashMap$KeyIterator.next(HashMap.java:1469) ~[?:1.8.0_301]
        at com.vmware.nsx.rooster.cache.ContainerCacheEntry$IpMacMembers.toStringInternal(ContainerCacheEntry.java:154) ~[libnsx_ccp_rooster.jar:?]
        at com.vmware.nsx.rooster.cache.ContainerCacheEntry$CacheEntryMembers.toString(ContainerCacheEntry.java:300) ~[libnsx_ccp_rooster.jar:?]
        at java.lang.String.valueOf(String.java:2994) ~[?:1.8.0_301]
        at java.lang.StringBuilder.append(StringBuilder.java:131) ~[?:1.8.0_301]
        at com.vmware.nsx.rooster.cache.ContainerCacheEntry.toString(ContainerCacheEntry.java:483) ~[libnsx_ccp_rooster.jar:?]
        at com.vmware.nsx.rooster.cache.ContainerCacheImpl.lookup(ContainerCacheImpl.java:119) ~[libnsx_ccp_rooster.jar:?]
        at com.vmware.nsx.rooster.service.GroupingMembershipServiceImpl.fetchAllMembers(GroupingMembershipServiceImpl.java:1466) ~[libnsx_ccp_rooster.jar:?]
        at com.vmware.nsx.rooster.service.GroupingMembershipServiceImpl.getEffectiveMembers(GroupingMembershipServiceImpl.java:422) [libnsx_ccp_rooster.jar:?]
        at vmware.nsx.grouping.GroupingMembershipServiceNsxRpc$MethodHandlers.invoke(GroupingMembershipServiceNsxRpc.java:530) [libgrouping-membership-java-rpc.jar:?]
        at com.vmware.nsx.rpc.call.ServerCalls$AsyncUnaryCallObserver.next(ServerCalls.java:140) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.call.ServerCalls$AsyncUnaryCallObserver.next(ServerCalls.java:121) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.call.NsxRpcCall$ActiveCallStateBase.invokeNext(NsxRpcCall.java:266) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.call.NsxRpcCall$ActiveCallState.doReceiveNonStreamingRemote(NsxRpcCall.java:384) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.call.NsxRpcCall$ActiveCallState.doReceive(NsxRpcCall.java:482) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.call.NsxRpcCall.doReceive(NsxRpcCall.java:999) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.channel.NsxRpcChannel.doReceiveNewCall(NsxRpcChannel.java:683) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.channel.NsxRpcChannel.doReceive(NsxRpcChannel.java:634) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.channel.task.ChannelReceiveTask.doRun(ChannelReceiveTask.java:21) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.channel.task.ChannelTask.run(ChannelTask.java:45) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.channel.NsxRpcChannel.processOperations(NsxRpcChannel.java:844) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.core.Scheduler.process(Scheduler.java:112) [libnsx_rpc.jar:?]
        at com.vmware.nsx.rpc.core.Scheduler.run(Scheduler.java:90) [libnsx_rpc.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_301]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_301]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_301]
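A plain text search for the exception name is usually enough to check a Manager for these entries. The sketch below goes one step further. It is a hypothetical helper, named CcpLogScan here for illustration, that counts ConcurrentModificationException traces containing at least one ContainerCacheEntry frame, which is the pattern shown in the two excerpts above.

It assumes:
  • the log has been copied to a machine with a Java 8 or later runtime;
  • each trace is followed by indented "at ..." frames, as in these excerpts.

The match strings come only from these excerpts, so other builds may log differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not a VMware tool.
public class CcpLogScan {
    // Signature strings copied from the nsx-ccp.log excerpts in this article.
    static final String EXCEPTION = "java.util.ConcurrentModificationException";
    static final String CACHE_FRAME = "com.vmware.nsx.rooster.cache.ContainerCacheEntry";

    /**
     * Counts traces that mention ConcurrentModificationException and contain at
     * least one ContainerCacheEntry frame. A trace starts at the line naming the
     * exception and ends at the first line that is not an indented "at ..." frame.
     */
    static int countMatches(List<String> lines) {
        int matches = 0;
        boolean inTrace = false;  // currently reading the frames of a CME trace
        boolean counted = false;  // this trace has already been counted once
        for (String line : lines) {
            String trimmed = line.trim();
            if (line.contains(EXCEPTION)) {
                inTrace = true;
                counted = false;
            } else if (inTrace && trimmed.startsWith("at ")) {
                if (!counted && trimmed.contains(CACHE_FRAME)) {
                    matches++;
                    counted = true;
                }
            } else {
                inTrace = false;  // any non-frame line ends the current trace
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Default path from this article; pass a path to scan a copied log instead.
        Path log = Paths.get(args.length > 0 ? args[0] : "/var/log/cloudnet/nsx-ccp.log");
        // ISO-8859-1 maps every byte to a char, so stray non-UTF-8 bytes cannot abort the read.
        List<String> lines = Files.readAllLines(log, StandardCharsets.ISO_8859_1);
        System.out.println("Matching ConcurrentModificationException traces: " + countMatches(lines));
    }
}
```

A non-zero count suggests the condition described in this article, but it is not conclusive on its own. Compare the count against the other symptoms listed above.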


Environment

VMware NSX-T Data Center 3.x
VMware NSX-T Data Center

Cause

Starting with NSX-T Data Center 3.2.0, responsibility for managing Group membership moved from the Manager to the Controller component.
A race condition can occur while the Controller handles Group membership updates. As a result, some DFW rules may not be applied to VMs correctly.
The issue is less likely to occur in NSX-T 3.2.1 and is resolved in NSX-T 3.2.1.1.
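Both stack traces above fail inside HashMap$KeyIterator.next while ContainerCacheEntry code is building a string from a member collection. This is consistent with another thread changing that collection during iteration.

Java's HashMap and HashSet iterators are fail-fast. If the collection is structurally modified outside the iterator, the next call to next() throws ConcurrentModificationException. HashSet is backed by a HashMap, so its iterator is HashMap$KeyIterator.

The hypothetical FailFastDemo below reproduces this effect deterministically in a single thread by adding an element mid-iteration. In a multi-threaded service, whether the exception occurs depends on timing, which fits the intermittent symptoms. The demo also shows that a set from ConcurrentHashMap.newKeySet() does not throw, because its iterators are weakly consistent.

This is a generic Java illustration under the stated assumptions, not NSX source code. This article does not say how NSX-T 3.2.1.1 resolves the race.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of a fail-fast iterator; not NSX code.
public class FailFastDemo {
    /**
     * Walks the members the way a toString() might, and adds one element after
     * the first step to simulate an update that lands mid-iteration.
     * Returns true if the iteration threw ConcurrentModificationException.
     */
    static boolean modifyDuringIteration(Set<String> members) {
        StringBuilder sb = new StringBuilder();
        boolean added = false;
        try {
            for (String ip : members) {
                sb.append(ip).append(',');
                if (!added) {
                    // Structural modification made outside the live iterator.
                    members.add("10.0.0.99");
                    added = true;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Three elements, so the iterator still has work left after the add.
        Set<String> plain = new HashSet<>(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        Set<String> concurrent = ConcurrentHashMap.newKeySet();
        concurrent.addAll(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));

        System.out.println("HashSet threw CME: " + modifyDuringIteration(plain));
        System.out.println("Concurrent key set threw CME: " + modifyDuringIteration(concurrent));
    }
}
```

Running main prints true for the HashSet and false for the concurrent set. A concurrent collection is only one common way to avoid this class of race. Copying the collection under a lock before iterating is another.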

Resolution

This issue is resolved in NSX-T Data Center 3.2.1.1, available at VMware Downloads.

Workaround:
Upgrading NSX-T is recommended. If an upgrade is not possible, use one of the following workarounds.

If the issue is already present in the environment, restart the Controller service on each NSX Manager node to clear the condition. Log in to the CLI as the admin user and run:

>restart service controller

It may take a few minutes for the condition to clear, and the condition may recur later.

To fully prevent this issue on affected versions, use only DFW rules that do not reference Groups.