Ethernet VPN (EVPN) Introduction
EVPN is not a protocol in itself but a solution that uses the Multi-Protocol Border Gateway Protocol (MP-BGP) as the control plane of an overlay network. For the data plane of the overlay network, EVPN employs Virtual eXtensible Local Area Network (VXLAN) encapsulation.
EVPN Control Plane: MP-BGP AFI: L2VPN, SAFI: EVPN
Multi-Protocol BGP (MP-BGP) is an extension of BGP-4 that allows BGP speakers to encode Network Layer Reachability Information (NLRI) of various address types, including IPv4/IPv6, VPNv4, and MAC addresses, into BGP Update messages.
The MP_REACH_NLRI path attribute (PA) carried within MP-BGP Update messages includes the Address Family Identifier (AFI) and Subsequent Address Family Identifier (SAFI) fields. The combination of AFI and SAFI determines the semantics of the carried NLRI. For example, AFI 25 (L2VPN) with SAFI 70 (EVPN) defines an MP-BGP-based L2VPN solution, Ethernet VPN (EVPN), which extends a broadcast domain in a multipoint manner over a routed IPv4 infrastructure.
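The AFI/SAFI pair can be illustrated with a minimal Python sketch that packs the leading fields of an MP_REACH_NLRI attribute value (AFI, SAFI, next-hop length, next hop, reserved octet). The function names and the next-hop address are hypothetical and chosen only for illustration. The sketch leaves out the attribute flags, the type code, and the NLRI itself.

```python
import struct
from typing import Tuple

AFI_L2VPN = 25  # Address Family Identifier: L2VPN
SAFI_EVPN = 70  # Subsequent Address Family Identifier: EVPN


def encode_mp_reach_header(afi: int, safi: int, next_hop: bytes) -> bytes:
    """Pack AFI (2 octets), SAFI (1), next-hop length (1), next hop, reserved (1)."""
    return struct.pack("!HBB", afi, safi, len(next_hop)) + next_hop + b"\x00"


def decode_afi_safi(value: bytes) -> Tuple[int, int]:
    """Read the AFI/SAFI pair from the start of an MP_REACH_NLRI value."""
    afi, safi = struct.unpack("!HB", value[:3])
    return afi, safi


# Hypothetical IPv4 next hop 192.168.100.11 (a VTEP address used for illustration).
header = encode_mp_reach_header(AFI_L2VPN, SAFI_EVPN, bytes([192, 168, 100, 11]))
```

A receiver reads the first three octets to decide how to parse the rest of the attribute: decode_afi_safi(header) yields (25, 70), which selects the EVPN NLRI parser.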
BGP EVPN Route Types (EVPN RT) carried in BGP Update messages describe the type of the advertised EVPN NLRI. Besides publishing IP prefix information with the IP Prefix Route (EVPN RT 5), BGP EVPN uses the MAC/IP Advertisement Route (EVPN RT 2) for advertising hosts’ MAC/IP address reachability information. The Virtual Network Identifier (VNI) carried in the route identifies the VXLAN segment of the advertised MAC/IP addresses.
In addition to these two fundamental route types, BGP EVPN can build a delivery tree for Layer 2 Broadcast, Unknown Unicast, and Multicast (BUM) traffic using the Inclusive Multicast Ethernet Tag Route (EVPN RT 3), with which VTEPs join an Ingress Replication tunnel. This solution does not require a Multicast-enabled Underlay Network. The other option for BUM traffic is a Multicast-capable Underlay Network.
While EVPN RT 3 is used for building a delivery tree for BUM traffic, the Tenant Routed Multicast (TRM) solution provides tenant-specific multicast forwarding between senders and receivers. TRM is based on Multicast VPN (BGP AFI 1/SAFI 5, IPv4/Mcast-VPN). TRM uses the MVPN Source Active A-D Route (MVPN RT 5) to publish the source address and group of a Multicast stream.
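As a rough illustration of Ingress Replication, the following Python sketch builds a per-VNI flood list from received EVPN RT 3 routes. Each route is reduced to a hypothetical (L2VNI, originating VTEP IP) pair, and the addresses are made up. A real implementation also processes the tunnel attribute and Route Targets carried with the route.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_flood_lists(
    imet_routes: Iterable[Tuple[int, str]], local_vtep: str
) -> Dict[int, Set[str]]:
    """Collect, per L2VNI, the remote VTEPs that must receive a copy of BUM frames."""
    flood_lists: Dict[int, Set[str]] = defaultdict(set)
    for l2vni, originator_ip in imet_routes:
        if originator_ip != local_vtep:  # never replicate to ourselves
            flood_lists[l2vni].add(originator_ip)
    return dict(flood_lists)


# Hypothetical RT 3 routes seen by VTEP 192.168.100.11.
routes = [
    (10000, "192.168.100.11"),
    (10000, "192.168.100.12"),
    (10000, "192.168.100.13"),
    (20000, "192.168.100.12"),
]
flood = build_flood_lists(routes, local_vtep="192.168.100.11")
```

With Ingress Replication, the local VTEP sends one unicast VXLAN copy of each BUM frame to every address in the flood list of the frame's L2VNI, which is why no Multicast support is needed in the Underlay Network.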
Using BGP EVPN's native multihoming solution, we can establish a port channel between a Tenant System (TS) and two or more VTEP switches. From the perspective of the TS, a traditional port channel is deployed by bundling a set of Ethernet links into a single logical link. On the multihoming VTEP switches, these links are associated with a logical Port-Channel interface, which represents an Ethernet Segment (ES).
EVPN utilizes the EVPN Ethernet Segment Route (EVPN RT 4) as a signaling mechanism between member units to indicate which Ethernet Segments they are connected to. Additionally, VTEP switches use this EVPN RT 4 for selecting a Designated Forwarder (DF) for Broadcast, Unknown unicast, and Multicast (BUM) traffic.
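The text above does not say how the Designated Forwarder is chosen. One commonly described approach is the default modulo-based election from RFC 7432: the VTEPs that advertised EVPN RT 4 for the same Ethernet Segment are ordered by IP address, and the VTEP at position (VLAN ID mod N) becomes the DF for that VLAN. The sketch below assumes this default method; a given platform may use a different algorithm, and the addresses are hypothetical.

```python
import ipaddress
from typing import List


def elect_df(vtep_ips: List[str], vlan_id: int) -> str:
    """Return the DF for a VLAN among the VTEPs attached to one Ethernet Segment."""
    if not vtep_ips:
        raise ValueError("at least one VTEP must be attached to the segment")
    # Order candidates numerically by IP address; ordinals start from 0.
    ordered = sorted(vtep_ips, key=lambda ip: int(ipaddress.ip_address(ip)))
    return ordered[vlan_id % len(ordered)]


# Hypothetical VTEPs discovered through EVPN RT 4 for the same segment.
segment_members = ["192.168.100.12", "192.168.100.11"]
df_vlan10 = elect_df(segment_members, 10)
df_vlan11 = elect_df(segment_members, 11)
```

Because the result depends on the VLAN ID, the DF role is spread across the member switches rather than assigned to a single VTEP for all VLANs.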
When EVPN Multihoming is enabled on a set of VTEP switches, all local MAC/IP Advertisement Routes include the ES Type and ES Identifier (ESI). The EVPN multihoming solution employs the EVPN Ethernet Auto-Discovery (A-D) Route (EVPN RT 1) for rapid convergence. Leveraging EVPN RT 1, a VTEP switch can withdraw all MAC/IP addresses learned via a failed ES at once by carrying the ESI value in the MP_UNREACH_NLRI Path Attribute.
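The mass withdrawal idea can be sketched from the point of view of a remote VTEP. In this simplified Python model, which uses hypothetical names and values, MAC entries point to an ESI rather than directly to a next hop, so withdrawing one EVPN RT 1 route updates every MAC behind that segment in a single step.

```python
from typing import Dict, List, Set


def withdraw_ethernet_ad(es_paths: Dict[str, Set[str]], esi: str, vtep: str) -> None:
    """Process a withdrawn EVPN RT 1: the VTEP no longer reaches the segment."""
    es_paths.get(esi, set()).discard(vtep)


def reachable_macs(
    mac_to_esi: Dict[str, str], es_paths: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Resolve each MAC to the VTEPs still attached to its Ethernet Segment."""
    return {
        mac: sorted(es_paths[esi])
        for mac, esi in mac_to_esi.items()
        if es_paths.get(esi)
    }


# Hypothetical state: two MACs behind one segment that is reachable via two VTEPs.
es_paths = {"ESI-1": {"192.168.100.11", "192.168.100.12"}}
mac_to_esi = {"1000.0010.beef": "ESI-1", "1000.0010.cafe": "ESI-1"}

withdraw_ethernet_ad(es_paths, "ESI-1", "192.168.100.11")
after_failure = reachable_macs(mac_to_esi, es_paths)
```

After the withdrawal, both MAC addresses resolve only to 192.168.100.12, without the failed VTEP having to withdraw each MAC/IP Advertisement Route individually.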
An EVPN fabric employs a proactive Control Plane learning model, while networks based on Spanning Tree Protocol (STP) rely on a reactive flood-and-learn-based Data Plane learning model. In an EVPN fabric, data paths between Tenant Systems are established before data exchange. It's worth noting that without enabling ARP suppression, local VTEP switches flood ARP Request messages. However, remote VTEP switches do not learn the source MAC address from the VXLAN encapsulated frames.
BGP EVPN provides various methods for filtering reachability information. For instance, we can establish an import/export policy based on BGP Route Targets (BGP RT). Additionally, we can deploy ingress/egress filters using elements such as prefix lists or BGP path attributes, like BGP Autonomous System numbers. Besides, BGP, OSPF, and IS-IS all support peer authentication.
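A Route Target based import policy boils down to a set intersection: a route is imported when at least one of its Route Targets matches the import list of the local EVPN Instance. The Python sketch below uses a hypothetical dictionary layout for routes and made-up Route Target values in ASN:VNI form.

```python
from typing import Dict, List, Set


def import_routes(routes: List[Dict], import_rts: Set[str]) -> List[Dict]:
    """Keep only the routes carrying at least one importable Route Target."""
    return [r for r in routes if import_rts & set(r["route_targets"])]


# Hypothetical received EVPN routes.
received = [
    {"mac": "1000.0010.beef", "route_targets": ["65000:10000"]},
    {"mac": "1000.0020.cafe", "route_targets": ["65000:20000"]},
    {"mac": "1000.0030.abba", "route_targets": ["65000:10000", "65000:10077"]},
]
imported = import_routes(received, import_rts={"65000:10000"})
```

Routes that match no import Route Target are simply not installed into the instance, which keeps the tables of each tenant separate.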
EVPN Data Plane: VXLAN Introduction
Virtual Extensible LAN (VXLAN) is an encapsulation scheme that enables Broadcast Domain/VLAN stretching over a Layer 3 network. Switches or hosts performing encapsulation/decapsulation are called VXLAN Tunnel End Points (VTEP). VTEPs encapsulate the Ethernet frames originated by local Tenant Systems (TS) within outer MAC and IP headers, followed by a UDP header with destination port 4789 and a source port calculated from the payload (typically a hash of the inner frame headers). Between the UDP header and the original Ethernet frame is the VXLAN header, which identifies the VXLAN segment with a VXLAN Network Identifier (VNI). The VNI is a 24-bit field, allowing (theoretically) for over 16 million unique VXLAN segments.
VTEP devices allocate a Layer 2 VNI (L2VNI) for Intra-VN connections and a Layer 3 VNI (L3VNI) for Inter-VN connections. Each VXLAN segment has a unique L2VNI, while a tenant has one common L3VNI for tenant-specific Inter-VN communication. Besides, the Generic Protocol Extension for VXLAN (VXLAN-GPE) enables leaf switches to add Group Policy information to data packets.
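The header layout described above can be sketched in Python. The 8-octet VXLAN header follows RFC 7348: a flags octet with the I bit (0x08) set, 24 reserved bits, the 24-bit VNI, and a final reserved octet. The source-port function is only an assumption for illustration: it uses CRC32 over the inner Ethernet header as a stand-in for whatever hash a given platform computes.

```python
import struct
import zlib

VXLAN_UDP_PORT = 4789      # destination UDP port for VXLAN
VXLAN_FLAG_VALID_VNI = 0x08  # I flag: the VNI field is valid


def encode_vxlan_header(vni: int) -> bytes:
    """Build the 8-octet VXLAN header for the given 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VALID_VNI << 24, vni << 8)


def decode_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8


def source_port(inner_frame: bytes) -> int:
    """Derive a UDP source port in 49152-65535 from the inner Ethernet header.

    CRC32 is a stand-in hash (assumption); real VTEPs use their own hash function.
    """
    return 49152 + zlib.crc32(inner_frame[:14]) % 16384


header = encode_vxlan_header(10000)  # hypothetical L2VNI 10000
```

Deriving the source port from the inner frame gives different flows different outer UDP ports, so Underlay Network switches can spread VXLAN traffic across equal-cost paths while keeping each flow on one path.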
EVPN Building Blocks
I have divided Figure 1-2 into four domains: 1) Service Abstraction – Broadcast Domain, 2) Overlay Control Plane, 3) Overlay Data Plane, and 4) Route Propagation. These domains consist of several components which have cross-domain dependencies.
Service Abstraction - Broadcast Domain: Virtual LAN:
A Broadcast Domain (BD) is a logical network segment where all connected devices share the same subnet and can reach each other with Broadcast and Unicast messages. A Virtual LAN (VLAN) can be considered an abstraction of a BD. When we create a new VLAN and associate access/trunk interfaces with it, a switch starts building an address table of source MAC addresses from received frames originated by local Tenant Systems (TS). By TS, I am referring to physical or virtual hosts. Besides, a Tenant System can be a forwarding component, such as a firewall or load balancer, attached to one or more Tenant-specific Virtual Networks.
Service Abstraction - Broadcast Domain: EVPN Instance:
An EVPN Instance (EVI) is identified by a Layer 2 Virtual Network Identifier (L2VNI). Besides the L2VNI, each EVPN Instance has a unique Route Distinguisher (RD), which allows overlapping addresses between different Tenants, and BGP Route Targets (BGP RT) for BGP import and export policies. Before deploying an EVI, we must configure the VLAN and associate it with the VN segment (EVPN Instance). This is because the autogenerated Route Distinguisher associated with the EVI requires a VLAN identifier in the local administrator part of the RD (a base value of 32767 + the associated VLAN ID). When we deploy an EVPN Instance, the Layer 2 Forwarding Manager (L2FM) starts encoding local MAC address information from the MAC table into the EVI-specific MAC-VRF (L2RIB) and the other way around.
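The autogenerated Route Distinguisher can be sketched as follows. The text above gives only the local administrator part (32767 + VLAN ID). This sketch assumes, in addition, that the administrator part is the BGP router ID of the switch; the router ID and VLAN values are hypothetical.

```python
def auto_rd(router_id: str, vlan_id: int, base: int = 32767) -> str:
    """Derive an RD as <router-id>:<base + VLAN ID> (router ID part is an assumption)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in the range 1-4094")
    return f"{router_id}:{base + vlan_id}"


# The same VN segment mapped to VLAN 10 on one VTEP and to VLAN 20 on another.
rd_leaf_101 = auto_rd("192.168.0.11", 10)
rd_leaf_102 = auto_rd("192.168.0.12", 20)
```

Since VLAN IDs are locally significant, two VTEPs can derive different local administrator values for the same VN segment, as the two hypothetical leaf switches do here.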
Overlay Control Plane
VTEP switches use BGP EVPN for publishing Tenant Systems’ (TS) reachability information. The BGP Routing Information Base (BRIB) consists of the Local RIB (Loc-RIB) and the Adjacency RIB In/Out (Adj-RIB-In and Adj-RIB-Out) tables. The BGP process stores all valid local and remote Network Layer Reachability Information (NLRI) in the Loc-RIB, while the Adj-RIB-Out is a peer-specific table where NLRIs are installed through the BGP Policy Engine. The Policy Engine executes our deployed BGP peer policy. An example of Policy Engine operation in a Single-AS Fabric is a peer-specific route-reflector-client definition deployed on Spine switches. By setting a peered Leaf switch as a Route-Reflector (RR) client, we allow Spine switches to publish NLRIs received from one iBGP peer to another iBGP peer, which the default BGP policy does not permit. Local Tenant Systems' MAC addresses and source interfaces are encoded into the BGP Loc-RIB from the L2RIB, with the encapsulation type and source IP address obtained from the NVE interface configuration.
When a VTEP receives an EVPN NLRI with importable Route Targets from a remote VTEP, it validates the route by checking that it was received from a configured BGP peer, with the correct remote ASN and a reachable source IP address. Then, it installs the NLRI information (RD, Encapsulation Type, Next Hop, other standard and extended communities, and VNIs) into the BGP Loc-RIB. Note that the local administrator part of the RD may change during the process if the VN segment is associated with a different VLAN than on the remote VTEP. Remember that VLANs are locally significant, while EVPN Instances have fabric-wide meaning. Next, the best MAC route is encoded into the L2RIB with the topology information (VLAN ID associated with the VXLAN segment) and the next-hop information. Besides, the L2RIB describes the route source as BGP. Finally, L2FM programs the information into the MAC address table and sets the NVE peer interface ID as the next hop. Note that VXLAN Manager learns VXLAN peers from the data plane based on the source IP address.
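The route-reflector example can be reduced to a small decision function. This Python sketch assumes every peer is an iBGP peer and models only the reflection rule of the Policy Engine; the peer names are hypothetical, and loop-prevention attributes such as ORIGINATOR_ID and CLUSTER_LIST are left out.

```python
from typing import Optional, Set


def should_advertise(
    learned_from: Optional[str], to_peer: str, rr_clients: Set[str]
) -> bool:
    """Decide whether a Loc-RIB route may enter the Adj-RIB-Out of an iBGP peer."""
    if learned_from is None:
        return True  # locally originated routes go to every peer
    if learned_from == to_peer:
        return False  # simplification: never send a route back to its sender
    # An iBGP-learned route is passed on only if the sender or the receiver is an RR client.
    return learned_from in rr_clients or to_peer in rr_clients


# Spine switch with both Leaf switches defined as route-reflector clients.
clients = {"leaf-101", "leaf-102"}
reflected = should_advertise("leaf-101", "leaf-102", clients)
# The same Spine switch without any route-reflector-client definitions.
blocked = should_advertise("leaf-101", "leaf-102", set())
```

Without the client definition, the Spine switch keeps the route in its Loc-RIB but never installs it into the Adj-RIB-Out of the other Leaf switch, so the Leaf switches would not learn each other's MAC routes.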
Overlay Data Plane: Network Virtualization Edge (NVE) Interface:
The configuration of a logical NVE Interface dictates the encapsulation type and tunnel IP address for VXLAN tunnels. The VXLAN tunnel source IP address is obtained from a logical Loopback interface, which must be reachable across fabric switches. The IP address of the NVE interface is used as the next-hop address in the MP_REACH_NLRI Path Attribute of BGP Update messages. The VXLAN encapsulation type is published as a BGP EXTENDED_COMMUNITY Path Attribute along with the Route Targets (for L2VNI and L3VNI) and the System MAC (if an IP address is included).
EVPN instances (EVI) are associated with an NVE interface as Member VNs. We must define the Layer 2 BUM traffic forwarding mode (Ingress-Replication or Multicast Group) under each member VN. VXLAN Manager is responsible for data plane encapsulation and decapsulation processes.
MAC Route Propagation: Local VTEP
The previous sections provided an overview of the MAC Route propagation process. This section recaps the operation. Tenant Systems can verify the uniqueness of their IP address by sending a Gratuitous ARP (GARP), which is an unsolicited ARP Reply. The VTEP switch learns the source MAC address from the incoming frame and adds it to the MAC address table. The VLAN ID associated with the MAC entry is derived from the configuration of the Attachment Circuit (incoming interface) or the 802.1Q tag in the Ethernet header. The Attachment Circuit serves as the next hop.
The Layer 2 Forwarding Manager (L2FM) transfers information from the MAC address table to the L2RIB of the MAC-VRF. Subsequently, the MAC route is encoded into the BGP Loc-RIB. The BGP process attaches the EVPN Instance-specific Route Distinguisher to the EVPN NLRI. Besides, EVI-specific Route Targets are attached as EXTENDED_COMMUNITY Path Attributes, along with the VXLAN encapsulation type defined in the NVE interface configuration. The Next Hop for the EVPN NLRI is the IP address associated with the local NVE interface. Finally, the MAC route is sent from the Loc-RIB through the BGP Policy Engine to the Adj-RIB-Out and forwarded to the BGP EVPN Peer.
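The local propagation steps can be summarized in a small Python sketch that turns a learned MAC address into the fields of an EVPN RT 2 advertisement. All names and values are hypothetical. The sketch assumes an RD of the form router-ID:(32767 + VLAN ID), a Route Target of the form ASN:L2VNI, and tunnel encapsulation type 8 for VXLAN.

```python
from dataclasses import dataclass
from typing import Dict, List

VXLAN_ENCAP_TYPE = 8  # BGP tunnel encapsulation type for VXLAN


@dataclass
class EvpnMacRoute:
    rd: str
    mac: str
    l2vni: int
    next_hop: str
    ext_communities: List[str]


def build_mac_route(
    mac: str,
    vlan_id: int,
    vlan_to_vni: Dict[int, int],
    router_id: str,
    nve_ip: str,
    asn: int,
) -> EvpnMacRoute:
    """Model the MAC table -> L2RIB -> BGP Loc-RIB steps for one local MAC address."""
    l2vni = vlan_to_vni[vlan_id]  # VLAN to VN segment mapping
    return EvpnMacRoute(
        rd=f"{router_id}:{32767 + vlan_id}",  # EVI-specific RD (assumed format)
        mac=mac,
        l2vni=l2vni,
        next_hop=nve_ip,  # IP address of the local NVE interface
        ext_communities=[f"RT:{asn}:{l2vni}", f"ENCAP:{VXLAN_ENCAP_TYPE}"],
    )


route = build_mac_route(
    mac="1000.0010.beef",
    vlan_id=10,
    vlan_to_vni={10: 10000},
    router_id="192.168.0.11",
    nve_ip="192.168.100.11",
    asn=65000,
)
```

The resulting object carries the same pieces the text lists: the RD, the MAC address, the L2VNI, the NVE-derived next hop, and the extended communities that the BGP Policy Engine passes on to the Adj-RIB-Out.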