Welcome back! In parts 1 and 2, we reviewed how different technologies such as BGP and MPLS-TE implement traffic engineering. In this post, we delve into SD-WAN with a focus on the applicability of TE in SD-WAN networks.
SD-WAN, being an overlay technology, is different from the scenarios we saw in previous sections. Inspired by the hourglass principle [OfAge], [IAPrinciples], it pushes more intelligence to the edge, where the main promise is service provider independence. Indeed, an enterprise has the ultimate power in choosing providers at each of its branches and stitching them together using the overlay.
The need for traffic engineering stems from the fact that enterprises, more often than not, connect multiple uplinks from different service providers at each site. This is done to (a) increase network capacity, (b) build resiliency/redundancy, (c) bring differentiated SLA characteristics into the network (e.g. MPLS uplinks for critical, latency-sensitive applications and best-effort broadband for bandwidth-hungry applications), and (d) minimize cost.
Given this, here are a few real-life traffic engineering requirements:
VoIP and point-of-sale traffic from spoke to hub should traverse MPLS links only.
Traffic for a set of trusted SaaS applications (e.g. Office365) should exit out to the Internet locally from each site. The rest of the Internet traffic should be sent to a hub (for instance, for additional scrubbing).
Both site-to-site and Internet traffic should make use of all available uplinks in both outbound and inbound directions, except for LTE links, which should be used only as a backup.
Both site-to-site and Internet traffic should traverse links that adhere to strict SLA metric constraints. For instance, VoIP traffic should be forwarded and received on links with <= 100msec latency and <= 1% loss.
In a dual-stack deployment, forward VPN traffic over IPv6 and Internet traffic over IPv4.
Figure 3 depicts the major traffic engineering components of SD-WAN.
Like MPLS-TE, SD-WAN uses tunneling technology for the Forwarding layer to ensure a loop-free and precise packet path.
Forwarding decisions are derived based on the following inputs: (1) route table from routing protocols running in the enterprise as well as with the service provider, (2) network SLA metrics from continuous network performance monitoring, and (3) high-level policies as configured by the enterprise. These three constitute the Control layer.
Since there is no visibility into, or access to, the provider networks, network performance monitoring becomes the linchpin of this system. It also serves as the adaptation layer, re-evaluating forwarding decisions on a continuous basis.
We will now describe the traffic engineering-specific functions in SPAN.
With multiple uplinks from each site, the first step is to determine each distinct path through which a site can reach the other site. For example, if site 1 has broadband connections from Comcast and AT&T and site 2 has broadband connections from Cox and Verizon, the distinct paths are essentially the unordered cartesian product of the two sets: (Comcast, Cox), (Comcast, Verizon), (AT&T, Cox), and (AT&T, Verizon). Each SPAN node discovers other nodes and associated uplinks through an elaborate mechanism that also involves NAT traversal. It enables the nodes to be placed behind firewalls or NAT gateways with no network reconfiguration. Once a path is discovered, the nodes create a DTLS tunnel with each other over that path. Note: in some cases, there may not be reachability over all possible paths. For example, if each site has one MPLS uplink and one broadband uplink, the only feasible paths may be (site-1-MPLS, site-2-MPLS) and (site-1-broadband, site-2-broadband).
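The path enumeration above can be sketched as a cartesian product pruned by a reachability predicate. This is a minimal illustration, not SPAN's actual implementation; the uplink descriptors and the `same_realm` predicate are assumptions for the example.

```python
from itertools import product

def candidate_paths(site_a, site_b, reachable=lambda a, b: True):
    """Enumerate every (uplink, uplink) pair between two sites,
    pruned by a reachability predicate."""
    return [(a, b) for a, b in product(site_a, site_b) if reachable(a, b)]

# Hypothetical uplink descriptor: (provider, circuit_type)
site1 = [("Comcast", "broadband"), ("ISP-A", "mpls")]
site2 = [("Cox", "broadband"), ("ISP-B", "mpls")]

# An MPLS uplink typically sits in a private VPN, so only
# like-typed pairs are mutually reachable in this example.
same_realm = lambda a, b: (a[1] == "mpls") == (b[1] == "mpls")
feasible = candidate_paths(site1, site2, same_realm)
```

With the predicate in place, only the broadband-to-broadband and MPLS-to-MPLS pairs survive, matching the note about infeasible paths.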
Once tunnels are established over all possible paths, the next step is to select the set of tunnels that satisfy an application’s requirements. To express this policy, SPAN offers a construct called a path selector. The user specifies an ordered list of path preferences using this construct, where each preference represents a path filter (for instance, positive or negative matches on uplink circuit types and SLA constraints).
SPAN nodes select the desired paths by pruning tunnels that don’t satisfy these constraints, much like CSPF we saw with MPLS-TE. See figure 2 for an illustration. Note, the selection process needs to be symmetric so that traffic flows in both directions over the desired path.
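The pruning process can be sketched as walking an ordered list of preferences and returning the first non-empty match. The tunnel attributes and preference predicates below are hypothetical, chosen to mirror the VoIP requirement stated earlier; this is not SPAN's actual data model.

```python
def select_paths(tunnels, preferences):
    """Return the tunnels matched by the first preference that
    yields a non-empty set, akin to CSPF-style pruning."""
    for pref in preferences:
        matched = [t for t in tunnels if pref(t)]
        if matched:
            return matched
    return []

tunnels = [
    {"type": "mpls", "latency_ms": 40, "loss_pct": 0.1},
    {"type": "broadband", "latency_ms": 120, "loss_pct": 2.0},
    {"type": "lte", "latency_ms": 80, "loss_pct": 0.5},
]

# Ordered preferences for VoIP: MPLS within SLA first,
# then anything except LTE as a fallback.
voip_prefs = [
    lambda t: t["type"] == "mpls" and t["latency_ms"] <= 100 and t["loss_pct"] <= 1,
    lambda t: t["type"] != "lte",
]
selected = select_paths(tunnels, voip_prefs)
```

Here the first preference already matches the MPLS tunnel, so the fallback is never consulted; if the MPLS tunnel later violated its SLA, re-evaluation would fall through to the next preference.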
For forwarding policies, SPAN provides a construct called a path policy. This is essentially a hashmap from traffic classes to the path selectors described in the previous section. The user can configure as many traffic classes as the business requires and map individual applications to the corresponding traffic classes.
SPAN nodes run a DPI engine that classifies incoming traffic to an application, and hence to the configured traffic class. Based on this traffic class, the node finds the set of paths over which to forward the traffic.
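The two-level lookup (application → traffic class → path selector) can be sketched as follows. The class names, selector names, and application labels are hypothetical placeholders, not SPAN's configuration schema.

```python
# Hypothetical path policy: traffic class -> path selector name.
path_policy = {
    "realtime": "mpls-only",
    "trusted-saas": "local-breakout",
    "default": "any-non-lte",
}

# Application -> traffic class, as configured by the user.
app_to_class = {"voip": "realtime", "office365": "trusted-saas"}

def lookup_selector(app):
    """Map a DPI-classified application to its path selector,
    falling back to the default traffic class."""
    tclass = app_to_class.get(app, "default")
    return path_policy[tclass]

selector = lookup_selector("voip")
```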
Note: DPI usually takes multiple packets to deterministically identify an application. For Internet-destined packets that are NAT’ed, however, the node must identify the application on the first packet of the flow, since the flow can’t be switched to another interface mid-stream. SPAN employs a set of techniques (for instance, DNS-based identification) to recognize such traffic, collectively called first packet identification.
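One way DNS-based identification can work, sketched under assumptions (the source only names the technique): the node snoops DNS responses and caches the resolved IP against the application, so the first packet of a later flow to that IP can be classified immediately. The class and domain mappings below are illustrative.

```python
class FirstPacketClassifier:
    """Sketch of DNS-based first packet identification: map
    DNS-resolved IPs to applications ahead of the flow's first packet."""

    def __init__(self, domain_to_app):
        self.domain_to_app = domain_to_app  # e.g. domain -> app label
        self.ip_to_app = {}

    def on_dns_response(self, domain, resolved_ip):
        # Snooped from DNS replies traversing the node.
        app = self.domain_to_app.get(domain)
        if app:
            self.ip_to_app[resolved_ip] = app

    def classify_first_packet(self, dst_ip):
        # None means: fall back to multi-packet DPI.
        return self.ip_to_app.get(dst_ip)

clf = FirstPacketClassifier({"outlook.example.com": "office365"})
clf.on_dns_response("outlook.example.com", "192.0.2.10")
app = clf.classify_first_packet("192.0.2.10")
```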
Network performance monitoring
SPAN nodes always run a function called active path monitoring: probes are exchanged (every second, by default) on each tunnel between each pair of nodes. The probes measure different metrics of the tunnels, such as loss, latency, jitter, and router hops, which are fed back into the path selection process.
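As a rough sketch of how a round of probe results might be turned into the metrics mentioned above, assuming lost probes are recorded as `None` and jitter is taken as mean deviation from the average RTT (one of several possible definitions, not necessarily SPAN's):

```python
def tunnel_metrics(rtts_ms):
    """Derive loss, one-way latency (approximated as RTT/2), and
    jitter from one round of probe RTTs; None marks a lost probe."""
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    mean_rtt = sum(received) / len(received)
    jitter = sum(abs(r - mean_rtt) for r in received) / len(received)
    return {
        "loss_pct": loss_pct,
        "latency_ms": mean_rtt / 2,
        "jitter_ms": jitter,
    }

m = tunnel_metrics([40.0, 44.0, None, 42.0])
```

These per-tunnel numbers are exactly what the path selector's SLA filters (e.g. "<= 100msec latency and <= 1% loss") evaluate against on each re-evaluation.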
That’s it for today! We covered the basics of traffic engineering, reviewed how some of the most popular technologies implement TE, and demonstrated its applicability with SD-WAN.