We were recently pointed to a discussion on a few mailing lists about overlay tunnel scale in SD-WAN networks. As the number of nodes in the network increases, a full-mesh overlay leads to an O(n^2) explosion in the number of tunnels. In what follows, we provide our (short) perspective on some of these scale properties.
Multiple uplink (access) circuits are commonplace in SD-WAN networks. Consequently, each node attempts to establish a tunnel over each of its uplinks to every uplink of a peer node, resulting in one tunnel per uplink combination. That does lead to the full-mesh complexity of O(n^2), but that is the number of tunnels in the whole network. The number of tunnels on each individual node is a different quantity.
Instead of “42. (The answer to life, the universe, and everything.) - Douglas Adams”, we can actually express these two numbers in mathematical terms.
The number of tunnels on any given node j can be expressed as:
T_j = numUplinks_j * ( SUM(numUplinks_i) from i=1 to i=numNodes, i != j )
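As a quick check of the per-node formula, here is a minimal Python sketch. The function name and the sample uplink counts are illustrative assumptions, not taken from any SPAN software.

```python
def tunnels_on_node(uplinks, j):
    """Return T_j: the number of tunnels terminating on node j.

    uplinks[i] is the number of uplinks on node i (0-based here,
    whereas the formula above counts nodes from 1).
    """
    # Every local uplink pairs with every uplink on every *other* node.
    remote_uplinks = sum(uplinks) - uplinks[j]
    return uplinks[j] * remote_uplinks


# Hypothetical network: four nodes with 2, 2, 1 and 3 uplinks.
uplinks = [2, 2, 1, 3]
per_node = [tunnels_on_node(uplinks, j) for j in range(len(uplinks))]
print(per_node)  # [12, 12, 7, 15]
```

Note that the count on a node grows with the total number of remote uplinks, which is linear in the number of nodes when the uplinks per node are bounded.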
As for the total number of tunnels in the network, it’s a bit more subtle than the plain combination formula C(N, 2), i.e. "N choose 2", as a node doesn’t establish tunnels between its own uplinks. This, in fact, can be represented as a complete multipartite graph, with each set containing the corresponding node’s uplinks. Accordingly, the total number of tunnels in the network can be expressed as follows, where C(a, 2) = a(a-1)/2:
N = SUM(numUplinks_i) from i=1 to i=numNodes,
T = C(N, 2) - ( SUM(C(numUplinks_i, 2)) from i=1 to i=numNodes )
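The network-wide formula can be cross-checked against a brute-force enumeration of uplink pairs. The sketch below is illustrative only; the function names and the sample uplink counts are assumptions made for the example.

```python
from itertools import combinations
from math import comb


def total_tunnels(uplinks):
    """Closed form: all uplink pairs minus the pairs inside one node."""
    n_uplinks = sum(uplinks)
    return comb(n_uplinks, 2) - sum(comb(u, 2) for u in uplinks)


def total_tunnels_brute_force(uplinks):
    """Enumerate every uplink pair and keep those spanning two nodes."""
    endpoints = [(node, u) for node, count in enumerate(uplinks)
                 for u in range(count)]
    return sum(1 for a, b in combinations(endpoints, 2) if a[0] != b[0])


# Hypothetical network: four nodes with 2, 2, 1 and 3 uplinks.
uplinks = [2, 2, 1, 3]
print(total_tunnels(uplinks))              # 23
print(total_tunnels_brute_force(uplinks))  # 23
```

Each tunnel has two endpoints, so the total also equals half the sum of the per-node counts. With a uniform k uplinks on each of n nodes, the closed form reduces to k^2 * n(n-1)/2.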
In summary, with n nodes and a bounded number of uplinks per node, the number of tunnels a node needs to manage is in O(n). The total number of tunnels in the network is in O(n^2).
The real world isn’t always as perfect as the above numbers. The reasons the actual number of tunnels deviates from the above theoretical numbers are:
The actual number of tunnels, in reality, is thus less than the theoretical max.
In other words, can any SD-WAN system or CPE support this tunnel scale even when the number of nodes in the network grows arbitrarily high? The good news is that the right implementation strategy goes a long way: the overlay tunnels should not be exposed as independent interfaces in the system, which would be quite restrictive in terms of system resources.
For example, SPAN devices utilize DTLS to create overlay tunnels. It is lightweight, lives completely in user space, and takes up less encapsulation space compared to other VPN technologies. In addition, it lets us express each tunnel, logically, as simply a data structure (instead of creating a separate interface/device construct in the system).
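To illustrate what "a tunnel as simply a data structure" could look like, here is a minimal sketch. It is not SPAN's implementation: the class names, the key layout, and the opaque `state` dictionary are all hypothetical, and no DTLS handling is shown.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TunnelKey:
    """Hypothetical identity of a tunnel: one uplink pair between two nodes."""
    local_uplink: str
    peer_node: str
    peer_uplink: str


@dataclass
class Tunnel:
    """A tunnel held as a plain record, not as an OS-level interface."""
    key: TunnelKey
    up: bool = False
    state: dict = field(default_factory=dict)  # placeholder for per-tunnel state


class TunnelTable:
    """In-memory table of tunnels, indexed by their key."""

    def __init__(self):
        self._tunnels = {}

    def add(self, local_uplink, peer_node, peer_uplink):
        key = TunnelKey(local_uplink, peer_node, peer_uplink)
        # Adding the same uplink pair twice returns the existing record.
        return self._tunnels.setdefault(key, Tunnel(key))

    def for_peer(self, peer_node):
        return [t for k, t in self._tunnels.items() if k.peer_node == peer_node]

    def __len__(self):
        return len(self._tunnels)


table = TunnelTable()
for local in ("wan1", "wan2"):
    for remote in ("wan1", "wan2"):
        table.add(local, "branch-7", remote)
print(len(table))  # 4
```

In a layout like this, adding a tunnel costs one dictionary entry rather than a new interface or device in the system.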
That said, as the saying goes, “there is no such thing as a free lunch”: each overlay tunnel will include at least the following state:
Now enter the “feature creep” that constantly adds more per-tunnel state and processing overhead, including the following:
The following table summarizes how these affect the base system resources:
A good implementation will want to keep all of the optional features pluggable and tunable to achieve scale.
Most SPAN devices scale to thousands of overlay tunnels easily. See the following figure for a snapshot of CPU and memory usage as we scale the number of tunnels on a mid-range SPAN device. This is with a fully-featured configuration. The DPI-level stats maintenance per tunnel contributes the most to the memory usage in our system.
SPAN also provides an extensible policy framework to build dynamic topology groups by matching on specific attributes. For example, if the link is LTE, the topology should be hub-and-spoke. This is quite useful, as LTE links are quite sensitive to the amount of data sent over them.
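The attribute-matching idea can be sketched as a small first-match rule table. This is a hypothetical illustration, not the SPAN policy framework: the `Link` and `TopologyRule` types, the attribute names, and the "full-mesh" default are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    node: str
    name: str
    kind: str  # illustrative values: "lte", "broadband", "mpls"


@dataclass
class TopologyRule:
    match: dict    # attribute name -> required value
    topology: str  # topology group to use when every attribute matches


def topology_for(link, rules, default="full-mesh"):
    """Return the topology of the first rule whose attributes all match."""
    for rule in rules:
        if all(getattr(link, attr, None) == value
               for attr, value in rule.match.items()):
            return rule.topology
    return default


rules = [TopologyRule(match={"kind": "lte"}, topology="hub-and-spoke")]

print(topology_for(Link("branch-1", "wan2", "lte"), rules))        # hub-and-spoke
print(topology_for(Link("branch-1", "wan1", "broadband"), rules))  # full-mesh
```

With a rule like this, an LTE uplink builds tunnels only toward its hubs, which keeps both the tunnel count and the keepalive traffic on that link low.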
Now to the real question: does it make sense to always go for a full-mesh overlay topology irrespective of the number of nodes in the network? The answer lies in a property of system design that’s often overlooked: debuggability. If each node in the network has thousands of tunnels, how does the network administrator really debug: (a) are all the tunnels up?, (b) is the data going over the right set of tunnels?, and so on. Although the SD-WAN system provides a substantial set of tools and visibility, they are not enough to troubleshoot such issues at scale.
For large networks, it thus makes more practical sense to decompose into smaller subnetworks and build a hierarchical topology.
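To put rough numbers on the benefit, the sketch below compares a flat full mesh with one simple two-level layout. The model is an assumption made for illustration: every node has the same number of uplinks, the network splits evenly into regions, each region is a full mesh, and one node per region doubles as a hub that also meshes with the hubs of the other regions.

```python
def full_mesh_tunnels(nodes, k):
    """Total tunnels in a full mesh of `nodes` nodes with k uplinks each."""
    return k * k * nodes * (nodes - 1) // 2


def hierarchical_tunnels(nodes, k, region_size):
    """Total tunnels for full-mesh regions plus a full mesh among hubs."""
    if nodes % region_size:
        raise ValueError("this sketch assumes evenly sized regions")
    regions = nodes // region_size
    intra = regions * full_mesh_tunnels(region_size, k)
    inter = full_mesh_tunnels(regions, k)  # one hub per region
    return intra + inter


def hub_tunnels(nodes, k, region_size):
    """Tunnels on a hub: its region peers plus the other hubs."""
    regions = nodes // region_size
    return k * k * ((region_size - 1) + (regions - 1))


# Hypothetical network: 1000 nodes, 2 uplinks each, regions of 50 nodes.
print(full_mesh_tunnels(1000, 2))         # 1998000
print(hierarchical_tunnels(1000, 2, 50))  # 98760
print(2 * 2 * (1000 - 1))                 # 3996 tunnels per node, flat
print(hub_tunnels(1000, 2, 50))           # 272 tunnels on a hub
```

Under these assumptions, the busiest node manages a few hundred tunnels instead of several thousand, at the cost of inter-region traffic taking extra hops through the hubs.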