Thursday 2 May 2024

Configuration of BGP afi/safi L2VPN EVPN and NVE Tunnel Interface

Overlay Network Routing: MP-BGP L2VPN/EVPN



EVPN Fabric Control Plane – MP-BGP


EVPN is not a protocol in itself but a solution that uses the Multi-Protocol Border Gateway Protocol (MP-BGP) as the control plane of an overlay network. For the overlay network's data plane, EVPN employs Virtual eXtensible Local Area Network (VXLAN) encapsulation.

Multi-Protocol BGP (MP-BGP) is an extension of BGP-4 that allows BGP speakers to encode Network Layer Reachability Information (NLRI) of various address types, including IPv4/IPv6, VPNv4, and MAC addresses, into BGP Update messages. The MP_REACH_NLRI path attribute (PA) carried within MP-BGP Update messages includes the Address Family Identifier (AFI) and Subsequent Address Family Identifier (SAFI) attributes. The combination of AFI and SAFI determines the semantics of the carried NLRI. For example, AFI 25 (L2VPN) with SAFI 70 (EVPN) defines an MP-BGP-based L2VPN solution, which extends a broadcast domain in a multipoint manner over a routed IPv4 infrastructure using an Ethernet VPN (EVPN) solution.

BGP EVPN Route Types (BGP RT) carried in BGP Update messages describe the type of the advertised EVPN NLRIs. In addition to publishing IP prefix information with the IP Prefix Route (EVPN RT 5), BGP EVPN uses the MAC Advertisement Route (EVPN RT 2) for advertising hosts' MAC/IP address reachability information. A Virtual Network Identifier (VNI) identifies the VXLAN segment of the advertised MAC/IP addresses.
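Once EVPN instances carry reachability information, NX-OS can display the received routes per route type. A sketch of the verification commands (the output naturally depends on the deployed EVPN instances):

show bgp l2vpn evpn route-type 2
show bgp l2vpn evpn route-type 5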

Besides these two fundamental route types, BGP EVPN can create a shared delivery tree for Layer 2 Broadcast, Unknown unicast, and Multicast (BUM) traffic using the Inclusive Multicast Ethernet Tag Route (EVPN RT 3) for joining an Ingress Replication tunnel. This solution does not require a multicast-enabled Underlay Network. The other option for BUM traffic is a multicast-capable Underlay Network.
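As a sketch of these two BUM handling options on NX-OS, the method is chosen per VNI under the NVE interface; the VNI values and the multicast group address below are assumed example values, not part of our fabric configuration:

interface nve1
  member vni 10000
    ingress-replication protocol bgp
  member vni 20000
    mcast-group 239.1.1.1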

While EVPN RT 3 is used for building a delivery tree for BUM traffic, the Tenant Routed Multicast (TRM) solution provides tenant-specific multicast forwarding between senders and receivers. TRM is based on Multicast VPN (BGP AFI 1/SAFI 5 – IPv4/Mcast-VPN). TRM uses the MVPN Source Active A-D Route (MVPN RT 5) for publishing the multicast stream's source address and group.

Using BGP EVPN's native multihoming solution, we can establish a Port-Channel between a Tenant System (TS) and two or more VTEP switches. From the perspective of the TS, a traditional Port-Channel is deployed by bundling a set of Ethernet links into a single logical link. On the multihoming VTEP switches, these links are associated with a logical Port-Channel interface referred to as an Ethernet Segment (ES).

EVPN utilizes the EVPN Ethernet Segment Route (EVPN RT 4) as a signaling mechanism between member units to indicate which Ethernet Segments they are connected to. Additionally, VTEP switches use this EVPN RT 4 for selecting a Designated Forwarder (DF) for Broadcast, Unknown unicast, and Multicast (BUM) traffic.

When EVPN Multihoming is enabled on a set of VTEP switches, all local MAC/IP Advertisement Routes include the ES Type and ES Identifier (ESI). The EVPN multihoming solution employs the EVPN Ethernet A-D Route (EVPN RT 1) for rapid convergence. Leveraging EVPN RT 1, a VTEP switch can withdraw all MAC/IP addresses learned via a failed ES at once by including the ESI value in the MP_UNREACH_NLRI Path Attribute.

Note! ESI multihoming is supported only on first-generation Cisco Nexus 9300 switches. Nexus 9200 and 9300-EX switches and newer models do not support ESI multihoming.
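On platforms that support it, ESI multihoming is enabled globally, and the Ethernet Segment is defined under the Port-Channel interface. A hedged sketch; the ES identifier and system MAC below are assumed example values:

evpn esi multihoming
!
interface port-channel11
  ethernet-segment 2011
    system-mac 0000.0000.2011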

An EVPN fabric employs a proactive Control Plane learning model, while networks based on the Spanning Tree Protocol (STP) rely on a reactive, flood-and-learn Data Plane learning model. In an EVPN fabric, data paths between Tenant Systems are established prior to data exchange. It is worth noting that without ARP suppression enabled, local VTEP switches still flood ARP Request messages. However, remote VTEP switches do not learn the source MAC address from the VXLAN-encapsulated frames.
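ARP suppression, which lets a VTEP answer ARP Requests locally based on Control Plane learned MAC/IP bindings, is enabled per L2VNI under the NVE interface. A sketch with an assumed VNI value:

interface nve1
  member vni 10000
    suppress-arp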

BGP EVPN provides various methods for filtering reachability information. For instance, we can establish an import/export policy based on BGP Route Targets (BGP RT). Additionally, we can deploy ingress/egress filters using elements such as prefix lists or BGP path attributes, like BGP Autonomous System numbers. In addition, BGP, OSPF, and IS-IS all support peer authentication.
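As a sketch of a Route Target based import/export policy on NX-OS, the Route Targets of an EVPN Instance are defined under the evpn configuration; the VNI value below is an assumption:

evpn
  vni 10000 l2
    rd auto
    route-target import auto
    route-target export auto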

EVPN Fabric Data Plane – VXLAN


The Virtual eXtensible LAN (VXLAN) is an encapsulation schema that enables Broadcast Domain/VLAN stretching over a Layer 3 network. Switches or hosts performing encapsulation/decapsulation are called VXLAN Tunnel End Points (VTEPs). VTEPs encapsulate the Ethernet frames originated by local Tenant Systems (TS) within outer MAC and IP headers, followed by a UDP header whose destination port is 4789 and whose source port is calculated from a hash of the payload. Between the UDP header and the original Ethernet frame is the VXLAN header, which identifies the VXLAN segment with a VXLAN Network Identifier (VNI). A VNI is a 24-bit field, theoretically allowing for over 16 million unique VXLAN segments.
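Because the outer MAC (14 bytes), outer IP (20 bytes), UDP (8 bytes), and VXLAN (8 bytes) headers add 50 bytes of overhead, the Underlay network interfaces should use an MTU large enough to carry full-sized tenant frames plus the encapsulation. A sketch with an assumed fabric interface:

interface Ethernet1/1
  mtu 9216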

VTEP devices allocate a Layer 2 VNI (L2VNI) for Intra-VN connections and a Layer 3 VNI (L3VNI) for Inter-VN connections. There is a unique L2VNI for each VXLAN segment but one common L3VNI for tenant-specific Inter-VN communication. In addition, the Group Based Policy extension for VXLAN (VXLAN-GBP) enables leaf switches to add Group Policy information to data packets.
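As a sketch of the L2VNI/L3VNI allocation on NX-OS, a VLAN is mapped to an L2VNI with the vn-segment command, and the tenant VRF is associated with the common L3VNI; the VLAN IDs, VNI values, and VRF name below are assumptions:

vlan 10
  vn-segment 10000
!
vlan 999
  vn-segment 50000
!
vrf context TENANT1
  vni 50000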

When a VTEP receives an EVPN NLRI from a remote VTEP with importable Route Targets, it validates the route by checking that it was received from a configured BGP peer, with the right remote ASN and a reachable source IP address. Then, it installs the NLRI information (RD, Encapsulation Type, Next Hop, other standard and extended communities, and VNIs) into the BGP Loc-RIB. Note that the local administrator part of the RD may change during the process if the VN segment is associated with a different VLAN than on the remote VTEP. Remember that VLANs are locally significant, while EVPN Instances have fabric-wide meaning. Next, the best MAC route (or routes, if ECMP is enabled) is encoded into the L2RIB with the topology information (the VLAN ID associated with the VXLAN segment) and the next-hop information. The L2RIB also records the route source as BGP. Finally, the L2FM programs the information into the MAC address table and sets the NVE peer interface ID as the next hop. Note that the VXLAN Manager learns VXLAN peers from the data plane based on the source IP address.
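The steps above can be verified hop by hop with NX-OS show commands; a sketch (the exact output depends on the deployed EVPN instances):

show bgp l2vpn evpn
show l2route evpn mac all
show mac address-table dynamic
show nve peers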

Our EVPN Fabric is a Single-AS solution, where the Leaf and Spine switches belong to the same BGP AS, making them iBGP neighbors. We assign BGP AS number 65000 to all switches and configure both Spine switches as BGP Route Reflectors, as shown in Figure 2-6. We reserve the IP subnet 192.168.10.0/24 for the Overlay network's BGP process, from which we take IP addresses for the logical interface Loopback 10. We use these addresses a) as BGP Router IDs, b) for defining BGP neighbors, and c) as source addresses for BGP Update messages.

Leaf switches act as VXLAN Tunnel Endpoints (VTEPs), responsible for encapsulating/decapsulating data packets to/from Customer networks on the Fabric's Transport network side. The logical Network Virtual Edge (NVE) interfaces of the Leaf switches use VXLAN tunneling, where the tunnel source IP address is the IP address of Loopback 20. We reserve the subnet 192.168.20.0/24 for this purpose, as shown in Figure 2-6.

In Figure 2-6, I have listed the VTEP Loopback identifier and IP address sections as belonging to the Underlay network. The reason is that the source/destination IP addresses used for tunneling between VTEP devices must be routable by the devices in the Transport network (Underlay Network). In the context of BGP EVPN, the term "Overlay" refers to the fact that it advertises only the MAC and IP addresses and subnets required for IP communication among devices connected to EVPN segments.

Figure 2-6 also lists the mandatory NX-OS features that we must enable to configure both the BGP EVPN Control Plane and the Data Plane.



Figure 2-6: EVPN Fabric Overlay Network Control Plane and Data Plane.


Figure 2-7 depicts our implementation of a Single-AS EVPN Fabric. The Spine switch serves as a BGP Route Reflector, forwarding BGP Update messages from Leaf switches to other Leaf switches. The BGP process on the Leaf switches sets the IP address of the Loopback 20 interface as the Next-hop in the MP_REACH_NLRI Path Attribute for all advertised EVPN NLRI Route Types.

The Network Virtual Edge (NVE) interfaces use the IP address of Loopback 20 for VXLAN tunneling. The NVE interface sub-command "host-reachability protocol bgp" instructs the NVE interface to use the Control Plane learning model, based on the received BGP Updates carrying EVPN NLRIs.




Figure 2-7: EVPN Fabric Overlay Network Control Plane and Data Plane Building Blocks.



BGP EVPN Configuration


Example 2-18 shows the BGP configuration of Spine-12. The first two commands enable BGP EVPN. In the actual BGP configuration, we first specify the BGP AS number as 65000. Then, we set the IP address we defined for Loopback 10 as the BGP Router ID. The command address-family l2vpn evpn with the subcommand maximum-paths 2 enables flow-based load sharing across two BGP peers if their EVPN NLRI AS_PATH attributes are identical. The commonly used term for this is Equal Cost Multi-Pathing (ECMP).

Using the neighbor command, we define the BGP neighbor's IP address. For each BGP neighbor, we define a BGP AS number and the source IP address for locally generated BGP Update messages. With the command address-family l2vpn evpn, we indicate that we want to exchange EVPN NLRI information with this neighbor.

Depending on the advertised EVPN Route Type, a set of BGP Extended Community attributes are carried with advertised EVPN NLRIs. Hence, we need the command send-community extended. By default, the BGP loop prevention mechanism prevents iBGP peers from advertising NLRI information learned from other iBGP peers. We bypass this mechanism by configuring the Spine switches as BGP Route Reflectors using the neighbor-specific route-reflector-client command.


feature bgp
nv overlay evpn
!
router bgp 65000
  router-id 192.168.10.12
  address-family l2vpn evpn
    maximum-paths 2
  neighbor 192.168.10.101
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
!
  neighbor 192.168.10.102
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
!
  neighbor 192.168.10.103
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
!
  neighbor 192.168.10.104
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client

Example 2-18: Spine Switches BGP Configuration.
Example 2-19 illustrates the BGP configuration of switch Leaf-101. The BGP configurations of all Leaf switches are identical except for the BGP router ID.

feature bgp
nv overlay evpn
!
router bgp 65000
  router-id 192.168.10.101
  address-family l2vpn evpn
    maximum-paths 2
  neighbor 192.168.10.11
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended

  neighbor 192.168.10.12
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended

Example 2-19: Leaf Switches BGP Configuration.

BGP EVPN Verification

From Example 2-20, we can see the BGP commands we have associated with the BGP neighbor Leaf-101 on Spine-11.


Spine-11# sh bgp l2vpn evpn neighbors 192.168.10.101 commands
Command information for 192.168.10.101
                 Update Source: locally configured
                     Remote AS: locally configured

 Address Family: L2VPN EVPN
                Send Community: locally configured
            Send Ext-community: locally configured
        Route Reflector Client: locally configured
Spine-11#

Example 2-20: BGP Commands Associated with Neighbor Leaf-101 on Spine-11.

Example 2-21 shows the BGP neighbors of Spine-11 with their AS numbers and statistics regarding received and sent BGP messages (Open, Keepalive, Update, and Notification). All EVPN Route Type counters are zero because we haven't yet deployed EVPN instances.


Spine-11# sh bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 192.168.10.12, local AS number 65000
BGP table version is 6, L2VPN EVPN config peers 4, capable peers 4
0 network entries and 0 paths using 0 bytes of memory
BGP attribute entries [0/0], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor        V    AS    MsgRcvd    MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.10.101  4 65000         14         17        0    0    0 00:00:02 0
192.168.10.102  4 65000         19         20        0    0    0 00:00:02 0
192.168.10.103  4 65000          6          4        0    0    0 00:00:06 0
192.168.10.104  4 65000         14         17        0    0    0 00:00:02 0

Neighbor        T    AS PfxRcd     Type-2     Type-3     Type-4     Type-5     Type-12
192.168.10.101  I 65000 0          0          0          0          0          0
192.168.10.102  I 65000 0          0          0          0          0          0
192.168.10.103  I 65000 0          0          0          0          0          0
192.168.10.104  I 65000 0          0          0          0          0          0
Spine-11#

Example 2-21: BGP Summary Information on Spine-11.


Example 2-21 shows information and statistics about the BGP neighborship between switches Spine-11 and Leaf-101. Leaf-101 belongs to the same BGP Autonomous System (AS) 65000 as Spine-11, making it an iBGP neighbor. I have highlighted the parts that confirm the functionality of our configuration. The neighborship state is "Established", indicating that the switches are ready to exchange BGP Update messages. Spine-11 uses the logical interface Loopback10 as the source of its BGP Update messages. The Capabilities and Graceful Restart sections show that both switches support the BGP address family L2VPN EVPN. At the end of the output, we see that Leaf-101 is configured as a Route Reflector Client.

Spine-11# sh bgp l2vpn evpn neighbors 192.168.10.101
BGP neighbor is 192.168.10.101, remote AS 65000, ibgp link, Peer index 3
  BGP version 4, remote router ID 192.168.10.101
  Neighbor previous state = OpenConfirm
  BGP state = Established, up for 00:02:40
  Neighbor vrf: default
  Using loopback10 as update source for this peer
  Using iod 71 (loopback10) as update source
  Last read 00:00:35, hold time = 180, keepalive interval is 60 seconds
  Last written 00:00:35, keepalive timer expiry due 00:00:24
  Received 18 messages, 0 notifications, 0 bytes in queue
  Sent 21 messages, 1 notifications, 0(0) bytes in queue
  Enhanced error processing: On
    0 discarded attributes
  Connections established 2, dropped 1
  Last update recd 00:02:35, Last update sent  = never
   Last reset by us 00:02:51, due to router-id configuration change
  Last error length sent: 0
  Reset error value sent: 0
  Reset error sent major: 6 minor: 107
  Notification data sent:
  Last reset by peer never, due to No error
  Last error length received: 0
  Reset error value received 0
  Reset error received major: 0 minor: 0
  Notification data received:

  Neighbor capabilities:
  Dynamic capability: advertised (mp, refresh, gr) received (mp, refresh, gr)
  Dynamic capability (old): advertised received
  Route refresh capability (new): advertised received
  Route refresh capability (old): advertised received
  4-Byte AS capability: advertised received
  Address family L2VPN EVPN: advertised received
  Graceful Restart capability: advertised received

  Graceful Restart Parameters:
  Address families advertised to peer:
    L2VPN EVPN
  Address families received from peer:
    L2VPN EVPN
  Forwarding state preserved by peer for:
  Restart time advertised to peer: 120 seconds
  Stale time for routes advertised by peer: 300 seconds
  Restart time advertised by peer: 120 seconds
  Extended Next Hop Encoding Capability: advertised received
  Receive IPv6 next hop encoding Capability for AF:
    IPv4 Unicast  VPNv4 Unicast

  Message statistics:
                              Sent               Rcvd
  Opens:                         4                  2
  Notifications:                 1                  0
  Updates:                       2                  2
  Keepalives:                   12                 12
  Route Refresh:                 0                  0
  Capability:                    2                  2
  Total:                        21                 18
  Total bytes:                 327                306
  Bytes in queue:                0                  0

  For address family: L2VPN EVPN
  BGP table version 10, neighbor version 10
  0 accepted prefixes (0 paths), consuming 0 bytes of memory
  0 received prefixes treated as withdrawn
  0 sent prefixes (0 paths)
  Community attribute sent to this neighbor
  Extended community attribute sent to this neighbor
  Third-party Nexthop will not be computed.
  Advertise GW IP is enabled
  Route reflector client
  Last End-of-RIB received 00:00:05 after session start
  Last End-of-RIB sent 00:00:05 after session start
  First convergence 00:00:05 after session start with 0 routes sent

  Local host: 192.168.10.11, Local port: 33940
  Foreign host: 192.168.10.101, Foreign port: 179
  fd = 90
Example 2-21: BGP Neighbor Details on Spine-11.

Overlay Network Data Plane: VXLAN 



NVE Interface Configuration


Example 2-22 shows the configuration of the NVE interface and the feature configuration required for client overlay networks. The command feature nv overlay enables VXLAN overlay networks. The command feature vn-segment-vlan-based enables VLAN-based VXLAN, where only the MAC addresses of the VLAN associated with the respective EVPN Instance (EVI) are stored in the MAC-VRF's Layer 2 RIB (L2RIB). In other words, the EVPN Instance forms a single broadcast domain. Under the NVE interface, we define the logical interface Loopback20's IP address as the tunnel source address. Additionally, we specify that the NVE interface implements the Control Plane learning model, meaning the switch learns remote MAC addresses from BGP Update messages, not from the data traffic received through the tunnel interface (Data Plane learning).

feature nv overlay
feature interface-vlan
feature vn-segment-vlan-based
!
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback20

Example 2-22: Leaf Switches NVE Interface Configuration.

NVE Interface Verification


Example 2-23 shows summary information about the settings of interface NVE 1. Leaf-101 uses Loopback20 as the source interface when sending traffic over interface NVE1. Besides, Leaf-101 uses the Control Plane learning model. Leaf-101 encodes its router MAC address into BGP Update messages as the "Router MAC" Extended Community associated with EVPN Route Type 2 (MAC/IP Advertisement Route) when the update carries both MAC and IP addresses. The remote leaf switches use it as the source MAC address in the inner Ethernet frame when forwarding Inter-VN traffic.

Leaf-101# show nve interface nve 1
Interface: nve1, State: Up, encapsulation: VXLAN
 VPC Capability: VPC-VIP-Only [not-notified]
 Local Router MAC: 5003.0000.1b08
 Host Learning Mode: Control-Plane
 Source-Interface: loopback20 (primary: 192.168.20.101, secondary: 0.0.0.0)
Example 2-23: NVE Interface Settings on Leaf-101.

Example 2-24 demonstrates that Leaf-101 currently lacks any NVE peers because its VXLAN manager initiates an NVE peer relationship with other VTEPs upon receiving the first data packet over the NVE interface.


Leaf-101# show nve peers detail
Leaf-101#
Example 2-24: NVE Peers on Leaf-101.

At this stage, we have configured the EVPN Fabric to the point where we can deploy our first EVPN instances and test and analyze both the Intra-VN and Inter-VN Control Plane and Data Plane perspectives.

