VMware SD-WAN Dynamic Multipath Optimization (DMPO)



Article ID: 312356


Updated On:

Products

VMware SD-WAN by VeloCloud

Issue/Introduction

This article provides an in-depth overview of Dynamic Multipath Optimization as used by the VMware SD-WAN service.

Environment

VMware SD-WAN by VeloCloud

Resolution

Background

The VMware SD-WAN™ solution enables enterprises and service providers to use multiple WAN transports simultaneously in order to maximize bandwidth while ensuring application performance. The Cloud-Delivered architecture offers these benefits for both on-premises and cloud applications (SaaS/IaaS). This requires building an overlay network, consisting of multiple tunnels, that monitors and adapts to changes in the underlying WAN transports in real time. To deliver a resilient overlay network that takes into account the real-time performance of the WAN links, VMware SD-WAN has developed Dynamic Multipath Optimization (DMPO). This document explains the key functionalities and benefits that DMPO provides.


Key Functionalities

DMPO is used between all VMware SD-WAN components that process and forward data traffic. The VMware SD-WAN Edge and VMware SD-WAN Gateway are the DMPO endpoints. For connectivity within enterprise locations (branch to branch or branch to hub), the Edges establish DMPO tunnels between themselves. For connectivity to the cloud applications, each Edge establishes DMPO tunnels with one or more Gateways. The three key functionalities DMPO has are discussed below. 


Continuous Monitoring

Automated Bandwidth Discovery: Once the VMware SD-WAN Edge detects a WAN link, it first establishes DMPO tunnels with one or more VMware SD-WAN Gateways and runs a bandwidth test with the closest Gateway. The bandwidth test is performed by sending a short burst of bi-directional traffic and measuring the received rate at each end. Since the Gateway is deployed at the Internet PoPs, it can also identify the real public IP address of the WAN link in case the Edge interface is behind a NAT or PAT device. A similar process applies to private links. For Edges acting as the hub or headend, the WAN bandwidth is statically defined. However, when a branch Edge establishes a DMPO tunnel with the hub Edges, it follows the same bandwidth test procedure used between an Edge and a Gateway on a public link.
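The rate calculation behind such a burst test can be illustrated with a minimal Python sketch. The function name, byte counts, and the one-second window are assumptions made for illustration; the article does not describe the actual burst size or duration.

```python
def measured_rate_mbps(bytes_received: int, interval_s: float) -> float:
    """Received rate in Mbit/s for a burst observed over interval_s seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return bytes_received * 8 / interval_s / 1_000_000

# Hypothetical burst: each end reports what it received during the test window.
uplink_mbps = measured_rate_mbps(bytes_received=6_250_000, interval_s=1.0)     # seen by the Gateway
downlink_mbps = measured_rate_mbps(bytes_received=12_500_000, interval_s=1.0)  # seen by the Edge
```

Because each direction is measured at its receiving end, the uplink and downlink capacities are obtained independently.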

Continuous Path Monitoring: DMPO performs continuous, uni-directional measurements of performance metrics (loss, latency, and jitter) of every packet, on every tunnel between any two DMPO endpoints, Edge or Gateway. VMware SD-WAN’s per-packet steering allows independent decisions in both uplink and downlink directions without introducing any asymmetric routing. DMPO uses both passive and active monitoring approaches. While there is user traffic, the DMPO tunnel header carries additional performance information, including a sequence number and a timestamp. This enables the DMPO endpoints to identify lost and out-of-order packets, and to calculate jitter and latency in each direction. The DMPO endpoints communicate the performance metrics of the path to each other every 100 ms.
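As a rough illustration of how a receiver could derive these metrics from a sequence number and a timestamp, consider the Python sketch below. It is not the DMPO implementation: the `Arrival` type and `summarize` function are hypothetical, sender and receiver clocks are assumed to be synchronized, and jitter is taken as the mean absolute change in latency between consecutive arrivals.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Arrival:
    seq: int            # sequence number from the tunnel header
    sent_ms: float      # sender timestamp from the tunnel header
    received_ms: float  # local receive time (clocks assumed synchronized)

def summarize(arrivals):
    """Return loss, reordering, latency, and jitter for one direction of a path."""
    if not arrivals:
        raise ValueError("need at least one arrival")
    highest = arrivals[0].seq
    out_of_order = 0
    for a in arrivals[1:]:
        if a.seq < highest:
            out_of_order += 1  # arrived after a later-numbered packet
        highest = max(highest, a.seq)
    seen = {a.seq for a in arrivals}
    expected = highest - min(seen) + 1
    latencies = [a.received_ms - a.sent_ms for a in arrivals]
    deltas = [abs(b - a) for a, b in zip(latencies, latencies[1:])]
    return {
        "lost": expected - len(seen),
        "out_of_order": out_of_order,
        "latency_ms": mean(latencies),
        "jitter_ms": mean(deltas) if deltas else 0.0,
    }

# Packet 2 arrives after packet 3, and packet 4 never arrives.
sample = [Arrival(0, 0.0, 50.0), Arrival(1, 20.0, 72.0), Arrival(3, 60.0, 110.0),
          Arrival(2, 40.0, 95.0), Arrival(5, 100.0, 150.0)]
stats = summarize(sample)
```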

While there is no user traffic, an active probe is sent every 100 ms and, after 5 minutes of no high priority user traffic, the probe interval is increased to 500 ms. This comprehensive measurement enables DMPO to react very quickly to changes in the underlying WAN condition, resulting in the ability to deliver sub-second protection against sudden drops in bandwidth capacity and outages in the WAN.
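The probe timing described above reduces to a simple rule, sketched here in Python. The function and constant names are hypothetical; only the 100 ms, 500 ms, and 5 minute values come from the text.

```python
ACTIVE_PROBE_MS = 100    # probe interval while the path is considered active
IDLE_PROBE_MS = 500      # relaxed interval after a long idle period
IDLE_THRESHOLD_S = 300   # 5 minutes without high priority user traffic

def probe_interval_ms(seconds_without_high_priority_traffic: float) -> int:
    """Pick the active probe interval for a tunnel that carries no user traffic."""
    if seconds_without_high_priority_traffic >= IDLE_THRESHOLD_S:
        return IDLE_PROBE_MS
    return ACTIVE_PROBE_MS
```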

MPLS Class of Service (CoS): For a private link that has a CoS agreement, DMPO can be configured to take CoS into account for both monitoring and application steering decisions.


Dynamic Application Steering

Application-aware Per-packet Steering: DMPO identifies traffic using layer 2 to 7 attributes, e.g. VLAN, IP address, protocol, and application. VMware SD-WAN performs application-aware per-packet steering based on the business policy configuration and real-time link conditions. The business policy contains out-of-the-box Smart Defaults that specify the default steering behavior and priority of more than 2500 applications. Customers can immediately make use of dynamic packet steering and application-aware prioritization without having to define any policy.

Throughout its lifetime, any traffic flow can be steered onto one or more DMPO tunnels, even in the middle of the communication, with no impact to the flow. A link that is completely down is referred to as having an outage condition. A link that is unable to deliver the SLA for a given application is referred to as having a brownout condition. VMware SD-WAN offers sub-second protection against outages and sudden drops in bandwidth capacity. With the continuous monitoring of all the WAN links, DMPO detects a sudden loss of SLA or an outage condition within 300-500 ms and immediately steers the traffic flow to protect application performance, while ensuring no impact to the active flow and user experience. There is a one-minute hold time, starting when the brownout or outage condition on the link is cleared, before DMPO steers the traffic flow back onto the preferred link, if one is specified in the business policy.
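The one-minute hold time can be expressed as a small check, sketched below in Python. The function name and the use of a timestamp in seconds are assumptions; the article only states that a one-minute hold applies after the condition clears.

```python
from typing import Optional

HOLD_TIME_S = 60.0  # one-minute hold after the brownout or outage clears

def may_steer_back(now_s: float, cleared_at_s: Optional[float]) -> bool:
    """True once the preferred link has stayed healthy for the full hold time.

    cleared_at_s is None while the brownout or outage is still present.
    """
    if cleared_at_s is None:
        return False
    return now_s - cleared_at_s >= HOLD_TIME_S
```

A hold time of this kind avoids flapping between links when a degraded link recovers only briefly.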

Intelligent learning enables application steering based on the first packet of the application by caching classification results. This is necessary for application-based redirection, e.g. redirecting Netflix onto the branch Internet link, bypassing the DMPO tunnel, while backhauling Office 365 to the enterprise regional hub or data center.

Example: Smart Defaults specifies that Microsoft Lync is a High Priority, Real-Time application. Assume there are two links with latencies of 50 ms and 60 ms respectively, and that all other SLAs are equal or met. DMPO will choose the link with the better latency, i.e. the link with 50 ms latency. If the link onto which the Lync traffic is steered then experiences a high latency of 200 ms, within less than a second the packets of the same Lync flow are steered onto the other link, which has the better latency of 60 ms.
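The Lync example can be reproduced with a minimal Python sketch that simply picks the link with the lowest measured latency. The function and link names are hypothetical, and the sketch assumes, as the example does, that all other SLA metrics are equal or met.

```python
def pick_lowest_latency(latency_ms_by_link: dict) -> str:
    """Return the name of the link with the lowest measured latency."""
    if not latency_ms_by_link:
        raise ValueError("no links available")
    return min(latency_ms_by_link, key=latency_ms_by_link.get)

before = pick_lowest_latency({"link-1": 50, "link-2": 60})   # initial choice
after = pick_lowest_latency({"link-1": 200, "link-2": 60})   # link-1 degrades to 200 ms
```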

Bandwidth Aggregation for Single Flow: For the type of applications that can benefit from more bandwidth, e.g. file transfer, DMPO performs per-packet load balancing, utilizing all available links to deliver all packets of a single flow to the destination. DMPO takes into account the real-time WAN performance and decides which paths should be used for sending the packets of the flow. It also performs resequencing at the receiving end to ensure that no out-of-order packets are introduced as a result of per-packet load balancing.

Example: Two 50 Mbps links deliver 100 Mbps of aggregated capacity for a single traffic flow. QoS is applied at both the aggregate and individual link level.
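The idea of spreading one flow across two links and restoring order at the receiver can be sketched in Python as follows. This is a simplified illustration, not the DMPO algorithm: it alternates packets between two links, assumes sequence numbers start at 0, and assumes no packet is lost (a real resequencer would also need a timeout for missing packets).

```python
import heapq

def resequence(arrivals):
    """Release sequence numbers in order from an out-of-order arrival stream."""
    pending = []   # min-heap of packets waiting for an earlier one
    next_seq = 0
    delivered = []
    for seq in arrivals:
        heapq.heappush(pending, seq)
        while pending and pending[0] == next_seq:
            delivered.append(heapq.heappop(pending))
            next_seq += 1
    return delivered

packets = list(range(6))
link_a = packets[0::2]   # even packets sent on the first link
link_b = packets[1::2]   # odd packets sent on the second link

# Hypothetical arrival order after the two links interleave with different delays.
arrival_order = [0, 2, 1, 4, 3, 5]
in_order = resequence(arrival_order)
```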


On-demand Remediation

Error and Jitter Correction: In a scenario where it may not be possible to steer the traffic flow onto a better link, e.g. a single-link deployment or multiple links having issues at the same time, DMPO can enable error correction for the duration that the WAN links have issues. The type of error correction used depends on the type of application and the type of error.

Real-time applications such as voice and video flows can benefit from Forward Error Correction (FEC) when there is packet loss. DMPO automatically enables FEC on single or multiple links. When there are multiple links, DMPO will select up to two of the best links at any given time for FEC. Duplicated packets are discarded and out-of-order packets are re-ordered at the receiving end before delivery to the final destination.

DMPO enables a jitter buffer for real-time applications when the WAN links experience jitter. TCP applications such as file transfer benefit from Negative Acknowledgement (NACK). Upon detecting a missing packet, the receiving DMPO endpoint informs the sending DMPO endpoint to retransmit the missing packet. Doing so protects the end applications from detecting packet loss and, as a result, maximizes the TCP window and delivers high TCP throughput even during lossy conditions.
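The receiver side of a NACK scheme amounts to finding gaps in the received sequence numbers and asking the sender for them. The Python sketch below shows only that gap detection; the function name is hypothetical and the actual DMPO message format is not described in this article.

```python
def missing_sequences(received):
    """Return the sequence numbers absent between the lowest and highest received."""
    if not received:
        return []
    seen = set(received)
    return [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]

# The receiver would request retransmission of these packets from the sender.
to_request = missing_sequences([10, 11, 13, 14, 17])
```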


DMPO Real World Results



Scenario 2: TCP Performance with and without VMware SD-WAN for Single and Multiple Links. These results demonstrate both NACK and per-packet load balancing.



Scenario 3: Hybrid WAN scenario with an outage on the MPLS link and both jitter and loss on the Internet (Comcast) link. These results demonstrate sub-second outage protection by steering application flows onto the Internet link, with on-demand remediation applied on that link at the same time.

Business Policy Framework and Smart Defaults

The IT administrator controls the QoS, steering, and services to be applied to the application traffic through the business policy. Smart Defaults provides an out-of-the-box business policy that supports over 2500 applications. DMPO makes steering decisions based on the type of application, the real-time link condition (congestion, latency, jitter, and packet loss), and the business policy. The following is an example of a business policy.

Each application is assigned a category. Each category has a default action, which is a combination of Business Priority, Network Service, Link Steering, and Service Class. In addition to the default application list, custom applications can be defined.



Each application is assigned one of the three Service Classes – Real Time, Transactional, or Bulk. For the default applications, the Service Class cannot be modified. However, when customers define their own custom applications, they also specify the Service Class.

For prioritization and QoS, each application is also assigned one of the three Business Priorities – High, Normal, or Low, which can be modified by the customers.

There are 4 types of Network Services – Direct, Multipath, Cloud Proxy and Internet Backhaul. By default, an application is assigned one of the default Network Services, which can be modified by the customers.

  • Direct: This action is typically used for non-critical, trusted Internet applications that should be sent directly, bypassing the DMPO tunnel. An example is Netflix, which is considered a non-business, high-bandwidth application and should not be sent over the DMPO tunnels. Traffic sent directly can be load balanced at the flow level. By default, all low priority applications are given the Direct action for Network Service.
  • Multi-Path: This action is typically given to important applications. By inserting the Multi-Path service, the Internet-based traffic is sent to the VMware SD-WAN Gateway. The table below shows the default link steering and on-demand remediation technique for a given Service Class. By default, high and normal priority applications are given the Multi-Path action for Network Service.
  • Cloud-Proxy: This action redirects the application flow to a cloud proxy such as WebSense (now ForcePoint).
  • Internet Backhaul: This action redirects the Internet applications to a specified enterprise location that may or may not have a VMware SD-WAN Edge. The typical use case is to force important Internet applications through a site that has security devices such as a firewall, IPS, and content filtering before the traffic is allowed to exit to the Internet.
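The default mapping from Business Priority to Network Service stated in the list above can be written as a short Python sketch. The function name is hypothetical, and the sketch covers only the two defaults named in the text (Direct for low priority, Multi-Path for high and normal priority).

```python
def default_network_service(business_priority: str) -> str:
    """Default Network Service for an Internet application, by Business Priority."""
    priority = business_priority.strip().lower()
    if priority == "low":
        return "Direct"        # sent directly, bypassing the DMPO tunnel
    if priority in ("high", "normal"):
        return "Multi-Path"    # sent through the VMware SD-WAN Gateway
    raise ValueError(f"unknown business priority: {business_priority!r}")
```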

Smart Defaults for Network Service

Below are the default values for the Network Service action. Note that VPN traffic is always sent through the tunnels (specifying the Direct action for Network Service does not apply to VPN traffic).

Link Steering Abstraction With Transport Group

Across different branch and hub locations, there may be different models of the VMware SD-WAN Edge with different WAN interfaces and carriers. In order to enforce a centralized link steering policy using a Profile, it is important that the interfaces and carriers are abstracted. Transport Group provides the abstraction of the actual interfaces of the devices and the carriers used at various locations. The business policy at the Profile level can be applied to the Transport Group, while the business policy at the individual Edge level can be applied to the Transport Group, WAN Link (carrier), and Interfaces.

Link Steering by Transport Group
Different locations may have different WAN transports, e.g. different WAN carrier names and WAN interface names. DMPO uses the concept of a transport group to abstract the underlying WAN carriers or interfaces from the business policy configuration. The business policy configuration can specify the transport group (public wired, public wireless, private wired, etc.) in the steering policy so that the same business policy configuration can be applied across different device types or locations, which may have completely different WAN carriers and WAN interfaces. When DMPO performs the WAN link discovery, it also assigns the transport group to the WAN link. This is the most desirable option for specifying the links in the business policy because it eliminates the need for IT administrators to know the physical connectivity or WAN carrier.
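To illustrate why the abstraction helps, the Python sketch below resolves one Profile-level policy against two Edges with different physical links. The Edge names, interface names, carrier names, and data layout are all made up for illustration and do not reflect the Orchestrator's data model.

```python
# Hypothetical per-Edge link inventory as it might look after WAN link discovery.
EDGE_LINKS = {
    "branch-1": [
        {"interface": "GE3", "carrier": "CarrierA", "group": "public wired"},
        {"interface": "USB1", "carrier": "CarrierB", "group": "public wireless"},
    ],
    "branch-2": [
        {"interface": "GE2", "carrier": "CarrierC", "group": "public wired"},
        {"interface": "GE4", "carrier": "CarrierD", "group": "private wired"},
    ],
}

# One Profile-level policy that names only the transport group.
PROFILE_POLICY = {"application": "video", "transport_group": "public wired"}

def resolve_interfaces(edge: str, policy: dict) -> list:
    """Map the policy's transport group to the concrete interfaces on one Edge."""
    return [link["interface"] for link in EDGE_LINKS[edge]
            if link["group"] == policy["transport_group"]]
```

The same policy resolves to GE3 on the first Edge and GE2 on the second without the policy mentioning either interface.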


Link Steering by WAN Link
The WAN interface is connected to a WAN carrier, which is specific to the location of the Edge. DMPO automatically detects the WAN carrier by performing a GeoIP lookup, or the IT administrator can specify the WAN carrier.



Link Steering by Interface
The link steering policy can be applied to the interface, e.g. GE2, GE3, which will be different depending on the Edge model and the location. This is the least desirable option to use in the business policy because IT administrators have to be fully aware of how the Edge is connected to be able to specify which interface to use.


Link Steering and On-demand Remediation

There are four possible options for Link Steering – Auto, Preferred, Mandatory, and Available.

Link Selection: Mandatory--Pin the traffic to the link or the transport group. The traffic is never steered away regardless of the condition of the link including outage. On-demand remediation is triggered to mitigate brownout condition such as packet loss and jitter.
Example: Netflix is a low priority application and is required to stay on the public wired links at all times.

Link Selection: Preferred--Select the link to be marked as "preferred".  Depending on the type of WAN links available on the Edge, there are three possible scenarios:
  • Where the preferred internet link has multiple public WAN link alternatives: Application traffic stays on the preferred link as long as it meets SLA for that application, and steers to other public links once the preferred link cannot deliver the SLA needed by the application. In the situation that there is no link to steer to, meaning all public links fail to deliver the SLA needed by the application, on-demand remediation is enabled. Alternatively, instead of steering the application away as soon as the current link cannot deliver the SLA needed by the application, DMPO can enable the on-demand remediation until the degradation is too severe to be remediated, then DMPO will steer the application to the better link.
Example: Prefer the video collaboration application on the Internet link until it fails to deliver the SLA needed by video, then steers to a public link that meets this application's SLA.
  • Where the preferred internet link has multiple public WAN link and private WAN link alternatives: Application traffic stays on the preferred link as long as it meets SLA for that application, and steers to the other public link once the preferred link cannot deliver the SLA needed by the application. Traffic will NOT steer to the private link in the event of an SLA failure; it steers to the private link only if both the preferred link and the optional public links are either unstable or completely down. In the situation that there is no link to steer to, meaning the optional public links failed to deliver the SLA needed by the application, on-demand remediation is enabled. Alternatively, instead of steering the application away as soon as the current link cannot deliver the SLA needed by the application, DMPO can enable the on-demand remediation until the degradation is too severe to be remediated, then DMPO will steer the application to the better link.
Example A: Prefer the video collaboration application on the Internet link until it fails to deliver the SLA needed by video, then steers to a public link that meets this application's SLA. 
Example B: Prefer the video collaboration application on the Internet link until the link goes unstable or drops completely, the public link alternatives are also unstable or have also dropped completely, then steers to an available private link.
  • Where the preferred internet link has only private WAN link alternatives: Application traffic stays on the preferred link regardless of the SLA status for that application, and will not steer to the other private link(s) even if the preferred link cannot deliver the SLA needed by the application. Instead of steering to the private links on an SLA failure for that application, on-demand remediation is enabled. Traffic steers to the optional private link(s) only if the preferred link is either unstable or completely down.
Example: Prefer the video collaboration application on the Internet link until the link goes unstable or drops completely, and then steers to an available private link.
 
Note: The default manner in which a private link is treated with reference to a preferred link (in other words, that a preferred link will only steer to a private link if the preferred link is unstable or offline) will be configurable through a setting to be added to the Orchestrator UI in a later release.

Link Selection: Available--Pick the available link as long as the link is up. If the link fails to deliver the SLA, DMPO enables on-demand remediation. DMPO will not steer the application flows to another link unless the link is completely down.

Example: Web traffic is backhauled to the hub site over the Internet link as long as the link is active, regardless of SLA.
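The Mandatory, Preferred, and Available rules described above can be summarized in a simplified Python sketch. It is an interpretation of the text, not the product logic: the names are hypothetical, and the Preferred branch models only the case where the alternatives are public links and steering happens as soon as the SLA is missed (the private-link cases and the remediate-first alternative are left out).

```python
from typing import NamedTuple

class Decision(NamedTuple):
    steer: bool      # move the flow to another link
    remediate: bool  # enable on-demand remediation on the current link

def decide(mode: str, link_up: bool, meets_sla: bool,
           public_alternative_ok: bool = False) -> Decision:
    """Simplified steering decision for one flow on its configured link."""
    brownout = link_up and not meets_sla
    if mode == "mandatory":
        # Pinned: never steered away, even during an outage.
        return Decision(steer=False, remediate=brownout)
    if mode == "available":
        # Steer only when the link is completely down.
        return Decision(steer=not link_up, remediate=brownout)
    if mode == "preferred":
        needs_move = not link_up or not meets_sla
        steer = needs_move and public_alternative_ok
        return Decision(steer=steer, remediate=brownout and not steer)
    raise ValueError(f"unsupported mode: {mode!r}")
```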

Link Selection: Auto--By default, all applications are given the Link Selection of Auto. This means DMPO automatically picks the best links based on the type of application and automatically enables on-demand remediation when necessary. There are four possible combinations of Link Steering and On-demand Remediation for Internet applications. As mentioned earlier, traffic within the enterprise (VPN) always goes through the DMPO tunnels, hence it always receives the benefits of on-demand remediation.



The examples below explain the default DMPO behavior for different types of applications and link conditions. Please see the appendix section for the default SLA for different application types.

Example: Real-Time applications
1. Scenario: At least one link that satisfies the SLA for the application
    Expected DMPO behavior: Pick the best available link.
2. Scenario: Single link with packet loss exceeding the SLA for the application
    Expected DMPO behavior: Enable FEC for the real-time applications sent on this link
3. Scenario: Two links with loss on only one link
    Expected DMPO behavior: Enable FEC on both Links.
4. Scenario: Multiple links with loss on multiple links
    Expected DMPO behavior: Enable FEC on two best links.
5. Scenario: Two links but one link appears unstable, i.e. missing three consecutive heartbeats
    Expected DMPO behavior: Mark link un-usable and steer the flow to the next best available link.
6. Scenario: Both jitter and loss on two links
    Expected DMPO behavior: Enable FEC on both links and enable the jitter buffer on the receiving side. The jitter buffer is enabled when jitter is greater than 7 ms for voice and greater than 5 ms for video. The sending DMPO endpoint notifies the receiving DMPO endpoint to enable the jitter buffer. The receiving DMPO endpoint will buffer up to 10 packets or 200 ms of traffic, whichever happens first. The receiving DMPO endpoint uses the original timestamp embedded in the DMPO header to calculate the flow rate to use in the de-jitter buffer. If the flow is not sent at a constant rate, jitter buffering is disabled.
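Two of the rules in the scenarios above, the jitter buffer thresholds and the choice of up to two links for FEC, can be sketched in Python as follows. The function names are hypothetical, and ranking the "best" links by lowest packet loss is an assumption; the article does not state how DMPO ranks links.

```python
JITTER_THRESHOLD_MS = {"voice": 7.0, "video": 5.0}  # values from the text

def jitter_buffer_needed(app_kind: str, jitter_ms: float) -> bool:
    """True when measured jitter exceeds the threshold for this application kind."""
    return jitter_ms > JITTER_THRESHOLD_MS[app_kind]

def fec_links(loss_pct_by_link: dict, limit: int = 2) -> list:
    """Pick up to `limit` links for FEC, ranked here by lowest packet loss."""
    return sorted(loss_pct_by_link, key=loss_pct_by_link.get)[:limit]

chosen = fec_links({"mpls": 3.0, "cable": 1.0, "lte": 5.0})
```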

Example: Transactional and bulk applications
Enables NACK if packet loss exceeds the threshold that is acceptable per application type (see appendix for value). 


Secure Traffic Transmission

For private or internal traffic, DMPO encrypts both the payload, which contains the user traffic, and the tunnel header with IPSec transport mode end-to-end. DMPO supports AES128 and AES256 for encryption. IPSec key management and authentication are done using PKI and the IKEv2 protocol.


Protocols and Ports Used

DMPO uses the following protocols and ports.
UDP/2426 – used for overlay tunnel management and information exchange between the two DMPO endpoints (Edges and Gateways). It is also used for data traffic that may already be secured or is not important, e.g. SFDC traffic from branch to the cloud between Edge and Gateway because SFDC traffic is already encrypted with TLS.
UDP/500 and UDP/4500 – used for IKEv2 negotiation and for IPSec NAT transparency
IP/50 – in case there is no NAT between the two DMPO endpoints, the IPSec is established over native IP protocol 50 (ESP). 
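For reference when writing firewall rules, the list above can be captured as a small Python table, together with the choice between native ESP and UDP encapsulation. The dictionary keys and function name are hypothetical, and assigning UDP/500 to IKE and UDP/4500 to NAT traversal follows common IPSec practice rather than a statement in this article.

```python
# Protocol and port pairs listed in this article.
DMPO_PORTS = {
    "tunnel_management": ("udp", 2426),
    "ike": ("udp", 500),
    "ipsec_nat_traversal": ("udp", 4500),
    "esp": ("ip", 50),  # IP protocol number, not a port
}

def ipsec_transport(nat_between_endpoints: bool) -> tuple:
    """Native ESP when no NAT sits between the endpoints, UDP encapsulation otherwise."""
    key = "ipsec_nat_traversal" if nat_between_endpoints else "esp"
    return DMPO_PORTS[key]
```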


Appendix: QoE threshold and Application SLA

DMPO uses the following SLA thresholds for different types of applications. Once the WAN link condition exceeds one or more thresholds, DMPO will immediately take action to either steer the affected application flows or perform on-demand remediation. Packet loss is calculated as the ratio of lost packets to total packets in the last 1-minute interval. The number of lost packets is communicated between the DMPO endpoints every second. The same thresholds are also reflected in the Quality of Experience (QoE) report.
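The loss calculation over a rolling one-minute interval, fed by one report per second, can be sketched in Python as below. The class and method names are hypothetical, and the per-second report format (lost count and total count) is an assumption.

```python
from collections import deque

class LossWindow:
    """Rolling packet-loss percentage over the most recent per-second reports."""

    def __init__(self, seconds: int = 60):
        self.samples = deque(maxlen=seconds)  # oldest report drops out automatically

    def report(self, lost: int, total: int) -> None:
        """Record one per-second report of lost and total packet counts."""
        self.samples.append((lost, total))

    def loss_percent(self) -> float:
        lost = sum(l for l, _ in self.samples)
        total = sum(t for _, t in self.samples)
        return 100.0 * lost / total if total else 0.0

window = LossWindow()
for _ in range(60):
    window.report(lost=1, total=100)
lossy_minute = window.loss_percent()
for _ in range(60):
    window.report(lost=0, total=100)
clean_minute = window.loss_percent()
```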

Another condition that will immediately trigger DMPO to take action is the loss of communications, i.e. no user data or probes received within 300 ms.