Tuesday 14 May 2024

EVPN Instance Deployment Scenario 1: L2-Only EVPN Instance

In this scenario, we are building a protected Broadcast Domain (BD), which we extend to the VXLAN Tunnel Endpoint (VTEP) switches of the EVPN Fabric, Leaf-101 and Leaf-102. Note that the VTEP operates in the Network Virtualization Edge (NVE) role for the VXLAN segment. The term NVE refers to devices that encapsulate data packets to transport them over routed IP infrastructure. Another example of an NVE device is the MPLS Provider Edge (MPLS-PE) router at the edge of the MPLS network, doing MPLS labeling. The term “Tenant System” (TS) refers to a physical host, virtual machine, or an intra-tenant forwarding component attached to one or more Tenant-specific Virtual Networks. Examples of TS forwarding components include firewalls, load balancers, switches, and routers. 

We begin by configuring L2 VLAN 10 on Leaf-101 and Leaf-102 and associating it with the vn-segment 10010. From the NVE perspective, this constitutes an L2-Only network segment, meaning we do not configure an Anycast Gateway (AGW) for the segment, and it does not have any VRF association.

Next, we deploy a Layer 2 EVPN Instance (EVI) with VXLAN Network Identifier (VNI) 10010. We utilize the 'auto' option to generate the Route Distinguisher (RD) and the Route Target (RT) import and export values for the EVI. The RD value combines the BGP Router ID with the sum of the base value 32767 and the VLAN Identifier (VLAN 10) associated with the EVI (e.g., 192.168.10.101:32777). Because the VLAN ID is part of the automatically generated RD value, the VLAN is configured before the EVPN Instance. Similarly, the RT values are derived from the BGP ASN and the VNI (e.g., 65000:10010).

As the final step for EVPN Instance deployment, we add EVI 10010 under the NVE interface configuration as a member vni with the Multicast Group 239.1.1.1 we are using for Broadcast, Unknown Unicast, and Multicast (BUM) traffic. 

To connect TS1 and TS2 to the Broadcast Domain, we configure Leaf-101's interface Eth1/5 and Leaf-102's interface Eth1/3 as access ports for VLAN 10.

A few words regarding the terminology utilized in Figure 3-2. '3-Stage Routed Clos Fabric' denotes both the physical topology of the network and the model for forwarding data packets. The 3-Stage Clos topology has three switches (ingress, spine, and egress) between the attached Tenant Systems. Routed, in turn, means that switches forward packets based on the destination IP address.

With the term VXLAN Segment, I refer to a stretched Broadcast Domain, identified by the VXLAN Network Identifier value defined under the EVPN Instance on Leaf switches.



Figure 3-2: L2-Only Intra VN Connection.

Figure 3-3 depicts the Cisco Nexus Dashboard Fabric Controller (NDFC) Fabric Builder’s Resources for reserving Identifier ranges for VLANs and VXLAN Layer 2 Networks.



Figure 3-3: Cisco NDFC: Define Id Ranges for VLANs and Layer 2 VXLAN VNIs.

Deploying EVPN Instance

Local Broadcast Domain - Virtual LAN

A Broadcast Domain (BD) is a logical network segment where all connected devices share the same subnet and can reach each other with Broadcast and Unicast messages. A Virtual LAN (VLAN) can be considered an abstraction of a local, switch-based BD in an EVPN Fabric. Phase 1 in Figure 3-4 shows the local VLAN ID 10 and the vn-segment association with EVPN Instance 10010. Example 3-1 shows NX-OS Command Line Interface (CLI) commands for creating a VLAN.


Fabric-Wide Broadcast Domain - EVPN Instance 

EVPN Instance is identified by a Layer 2 VXLAN Network Identifier (L2VNI). Besides L2VNI, EVPN instances have a unique Route Distinguisher (RD), allowing overlapping addresses between different Tenants and BGP Route Targets (RT) for BGP import and export policies. 

We employ auto-generated RD and RT values for the EVPN instance. The IP address of the BGP RID interface serves as the global admin, while the VLAN ID value added to the Base Number 32767 forms the local admin. For instance, on Leaf-101, the resulting RD value for EVPN Instance 10010 (associated with VLAN 10) is 192.168.10.101:32777. Each VTEP uses a unique RD value, which makes it possible to differentiate routes for the same address received from different VTEPs within the EVPN Fabric network.
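The RD arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation stated in the text, not of how NX-OS implements it; the function name auto_rd and the constant RD_BASE are hypothetical.

```python
import ipaddress

RD_BASE = 32767  # base number the text gives for the auto-generated local admin


def auto_rd(bgp_router_id: str, vlan_id: int) -> str:
    """Return '<BGP RID>:<32767 + VLAN ID>' (hypothetical helper)."""
    ipaddress.IPv4Address(bgp_router_id)  # raises ValueError on a malformed address
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    return f"{bgp_router_id}:{RD_BASE + vlan_id}"


print(auto_rd("192.168.10.101", 10))  # Leaf-101: 192.168.10.101:32777
print(auto_rd("192.168.10.102", 10))  # Leaf-102: 192.168.10.102:32777
```

The two results differ only in the global admin part, which is what keeps the RD unique per VTEP even though both leaf switches use VLAN 10.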

The BGP RT value is created using the BGP AS Number as the global admin and the VNI defined for the EVPN instance as the local admin. In our example, the EVPN Instance receives a Route Target value of 65000:10010. All VTEP devices share the same EVPN Instance-specific RT values, ensuring the proper functioning of the EVPN route import/export policy.
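As a rough illustration of the RT derivation, the sketch below builds the value from the ASN and the VNI. It assumes a 2-byte ASN, as in our fabric; the function name auto_rt is hypothetical and the code only mirrors the rule stated in the text.

```python
def auto_rt(bgp_asn: int, vni: int) -> str:
    """Return '<ASN>:<VNI>' (hypothetical helper, 2-byte ASN assumed)."""
    if not 0 < bgp_asn <= 0xFFFF:
        raise ValueError("this sketch only handles 2-byte AS numbers")
    if not 0 < vni < 2**24:
        raise ValueError("VNI must fit into 24 bits")
    return f"{bgp_asn}:{vni}"


# Every VTEP in the same AS derives the same value for the same VNI,
# so the export RT of one leaf matches the import RT of the others.
leaf_101_export = auto_rt(65000, 10010)
leaf_102_import = auto_rt(65000, 10010)
print(leaf_101_export, leaf_101_export == leaf_102_import)  # 65000:10010 True
```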

Subsequently, we add the VNI of the EVPN instance under the NVE interface configuration and specify it to utilize Multicast Group 239.1.1.1 for Broadcast, Unknown unicast, and Multicast (BUM) traffic. Example 3-1 shows the NX-OS CLI commands. 



Figure 3-4: The Configuration of VLAN, EVPN Instance and NVE Interface.

Example 3-1 illustrates all the configurations required for deploying L2-Only EVPN Instance.

vlan 10
  vn-segment 10010
!
evpn
  vni 10010 l2
    rd auto
    route-target import auto
    route-target export auto
!
interface nve1
  member vni 10010
    mcast-group 239.1.1.1
!
interface Ethernet1/5
  switchport mode access
  switchport access vlan 10

Example 3-1: L2-Only Broadcast Domain Configuration: VLAN, EVI, and NVE Interface.


High-Level Control Plane Analysis


When we create a new VLAN and associate access/trunk interfaces with it, a switch starts building an address table of source MAC addresses learned from the frames received from the local Tenant Systems. In Figure 3-5, we have connected TS1 to Leaf-101's Interface Eth1/5 (Access Port for VLAN 10). Leaf-101 records the source MAC address from the received Ethernet frame in the MAC Address Table, with VLAN ID 10 and next-hop interface Eth1/5.

When an EVPN Instance is created, the Layer 2 Forwarding Manager (L2FM) begins encoding MAC address entries from the MAC Address Table associated with the EVI into the MAC-VRF, an EVI-specific Layer 2 Routing and Forwarding instance (L2RIB). In Figure 3-5, the L2FM copies MAC address information about TS1 from MAC Address Table 10. Since the address is learned locally from the Data Plane, it is designated as "Prod: Local".

The MAC information from the MAC-VRF is passed to the BGP process, which then encodes this data into the BGP Loc-RIB table. The Route Target value and encapsulation type are added as EXTENDED_COMMUNITIES. The MAC Address information is encoded within the MP_REACH_NLRI Path Attribute. The Address Family Identifier (AFI) and Subsequent Address Family Identifier (SAFI) define that this entry describes EVPN Network Layer Reachability Information (NLRI). The Next-Hop address is the IP address of the interface NVE1. The EVPN Route Type for MAC addresses is EVPN Route Type 2, MAC Advertisement Route. The Route Distinguisher is encoded within the NLRI. Since the MAC address belongs to a VLAN without a routing interface, there is no ARP Table entry from which the IP address field could be populated. Consequently, the IP address is not included in the NLRI. The received label field defines the L2 VNI, which the remote VTEP must use in the VXLAN tunnel header when forwarding data packets to TS1.

Figure 3-5: Local VTEP Leaf-101: MAC Address Learning Process.

Figure 3-6 shows the Remote Leaf-102 Control Plane process when it receives a BGP Update message from Leaf-101 via the Spine switch. Leaf-102 stores the BGP Update message information unchanged in the BGP Adj-RIB-In table. According to the BGP import policy (import RT 65000:10010) configured for EVPN Instance 10010 on Leaf-102, the information from the BGP Update message is stored in the BGP Loc-RIB. During the transfer process, the global admin field of the NLRI's RD value is replaced with the BGP Router ID of Leaf-102 (192.168.10.102). In addition, Leaf-102 checks that the BGP Update message was received from one of its configured BGP peers and that a route to the IP address in the Next-Hop field exists in the Unicast RIB.

The MAC address, along with the Next-Hop IP address and L2VNI, is stored from the BGP Loc-RIB into the MAC-VRF. The information is marked as learned from the received BGP Update message (Prod: BGP, Flags: Received). L2VNI 10010 is associated with the Next-Hop IP address 192.168.20.101. Subsequently, L2FM stores the information in the MAC address table. Control Plane MAC indicates that the address has been learned through the Control Plane. The destination port is marked as the one indicated by MAC-VRF's Next-Hop IP.


Figure 3-6: Remote VTEP Leaf-102: MAC Address Learning Process.
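The import steps described for Leaf-102 can be modeled with a short Python sketch. It is a simplified model of the checks named in the text (configured peer, matching import RT, reachable next-hop, RD rewrite), not the NX-OS implementation. The class MacRoute, the function import_mac_route, and the contents of the unicast RIB are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MacRoute:
    rd: str
    mac: str
    next_hop: str
    l2vni: int
    route_targets: frozenset


def import_mac_route(route, received_from, configured_peers, import_rts,
                     unicast_rib, local_rid, local_vlan):
    """Return the route as stored in the local Loc-RIB, or None if rejected."""
    if received_from not in configured_peers:
        return None  # Update did not come from a configured BGP peer
    if not (route.route_targets & import_rts):
        return None  # no Route Target matches the import policy
    next_hop = ipaddress.ip_address(route.next_hop)
    if not any(next_hop in ipaddress.ip_network(p) for p in unicast_rib):
        return None  # next-hop is not reachable via the Unicast RIB
    # The RD is rewritten with the local BGP RID and the local VLAN ID.
    return replace(route, rd=f"{local_rid}:{32767 + local_vlan}")


received = MacRoute("192.168.10.101:32777", "0050.7966.6806",
                    "192.168.20.101", 10010, frozenset({"65000:10010"}))
imported = import_mac_route(
    received,
    received_from="192.168.10.11",
    configured_peers={"192.168.10.11", "192.168.10.12"},
    import_rts={"65000:10010"},
    unicast_rib=["192.168.20.101/32"],  # assumed underlay host route
    local_rid="192.168.10.102",
    local_vlan=10,
)
print(imported.rd)  # 192.168.10.102:32777
```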

Local VTEP Leaf-101: Low-Level Control Plane Analysis


In this section, we examine how the MAC address of TS1 is propagated across the Broadcast Domain from the VLAN 10 MAC address table on the local VTEP Leaf-101 to the VLAN 10 MAC address table on the remote VTEP Leaf-102. Figure 3-7 depicts the databases where the MAC address is stored during the propagation process from Leaf-101 to Leaf-102 via Spine-11.

Local Learning: MAC Address Table Update


In Figure 3-7, Interface Eth1/5 on Leaf-101 is a VLAN 10 Access port. When TS1 sends the first Ethernet frame, the Layer 2 Forwarding Manager (L2FM) records the source MAC address in the VLAN 10 MAC address table. Besides the VLAN ID and the ingress port, the entry shows the Type as dynamic.



Figure 3-7: Local MAC Address Table Update.

Example 3-2 confirms that Leaf-101 has learned the MAC address 0050.7966.6806 from port Eth1/5 and it belongs to VLAN 10. 

Leaf-101# sh mac address vlan 10
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan,
        (NA)- Not Applicable A – ESI Active Path, S – ESI Standby Path
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*   10     0050.7966.6806   dynamic  NA         F      F    Eth1/5
Leaf-101#

Example 3-2: VLAN 10 MAC Address Table Entry About MAC Address 0050.7966.6806.

After updating the VLAN 10 MAC address table, L2FM sends the information to the MAC-VRF table (Layer 2 RIB).


Figure 3-8: Local MAC Address Table Update > MAC-VRF.

Example 3-3 verifies that the MAC address has been learned on interface Eth1/5 (Interface Index 0x1a000800) and stored in the MAC address table (Db: 0-MACDB) by L2FM (Src: 1-L2FM). Subsequently, the address information is sent to the MAC-VRF (Op: SEND_MAC_INS_TO_L2RI).
 
Leaf-101# show system internal l2fm l2dbg macdb address 0050.7966.6806 vlan 10
Legend
------
Db:  0-MACDB, 1-GWMACDB, 2-SMACDB, 3-RMDB, 4-SECMACDB  5-STAGEDB
Db:  6-MACFAILDB, 7-PEER_SYNC_DB, 8-CACHE_DB, 9-HOLD_DB
Src: 0-UNKNOWN, 1-L2FM, 2-PEER, 3-LC, 4-HSRP
     5-GLBP, 6-VRRP, 7-STP, 8-DOTX, 9-PSEC 10-CLI 11-PVLAN
     12-ETHPM, 13-ALW_LRN, 14-Non_PI_MOD, 15-MCT_DOWN, 16 - SDB
     17-OTV, 18-Deounce Timer, 19-AM, 20-PCM_DOWN, 21 - MCT_UP
     22-VxLAN, 23-L2RIB 24-CTRL, 25-UFDM 26-VRRPV3 27-VIM 28-DEJAVU 29-SMAC_MV
     30-ARP, 31-DHCP
Slot:0 based for LCS 31-MCEC 20-OTV/ORIB

 VLAN: 10 MAC: 0050.7966.6806 FE ID: 0
  Time                     If/swid    Db Op                    Src Slot  FE-BMP  Count Detail
May  3 13:27:14 2024:107617 0x1a000800 0  SEND_MAC_INS_TO_L2RI 1    0    0xffff   --

Example 3-3: MAC Address Table > MAC-VRF.

The example below displays the SNMP interface index to Ethernet Interface mapping.

Leaf-101# show interface snmp-ifindex | i 0x1a000800
Eth1/5          436209664  (0x1a000800)

Example 3-4: SNMP Interface Index to Ethernet Interface Mapping.
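The hexadecimal interface index in Example 3-3 and the decimal SNMP ifIndex in Example 3-4 are the same number in two notations. A minimal Python check of that conversion, with hypothetical helper names, looks like this:

```python
def ifindex_to_decimal(ifindex_hex: str) -> int:
    """Convert a hexadecimal interface index such as '0x1a000800' to decimal."""
    return int(ifindex_hex, 16)


def ifindex_to_hex(ifindex_dec: int) -> str:
    """Convert a decimal SNMP ifIndex to the '0x...' notation."""
    return hex(ifindex_dec)


print(ifindex_to_decimal("0x1a000800"))  # 436209664
print(ifindex_to_hex(436209664))         # 0x1a000800
```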

Local Learning: MAC-VRF/L2RIB Table Update


In this section, we will examine how MAC-VRF updating works.


Figure 3-9: Local MAC-VRF(L2RIB) Update Process – Receive.

Example 3-5 demonstrates how the L2RIB update process works. Read the output from bottom to top. L2RIB receives local MAC address information and creates a new entry for it. The Next-Hop port is set to Eth1/5, and the VXLAN Network Identifier/EVPN Instance identifier (VNI/EVI) is set to 10010.


Leaf-101# show system internal l2rib event-history mac | i 0050.7966.6806

[l2rib_show_mac_rt:2929] (10,0050.7966.6806,3): 
VNI/EVI: 10010 rtFlags: L, adminDist: 6, seqNum: 0 ecmpLabel: 0 SOO: 0(N/A)

[l2rib_svr_mac_ent_gpb_encode:1199] (10,0050.7966.6806,3): 
Encoding MAC best route (ADD, client id 8)

[l2rib_obj_mac_route_create:3721] (10,0050.7966.6806,3): 
NH[0]: Eth1/5 

[l2rib_obj_mac_route_create:3703] (10,0050.7966.6806,3): 
MAC route created with seqNum: 0, flags: L, (),

[l2rib_obj_mac_route_create:3607] (10,0050.7966.6806,3): 
Route is local, isMacRemoteAtTheDelete: 0

[l2rib_client_show_route_msg:1787] 
Rcvd MAC ROUTE msg: (10, 0050.7966.6806), vni 0, admin_dist 0, seq 0, soo 0,

Example 3-5: MAC Learning Process: MAC-VRF (L2RIB) Update.

Example 3-6 shows that we have associated VLAN 10 with VNI/EVI 10010.

Leaf-101# show vlan id 10 vn-segment 
VLAN Segment-id
---- -----------
10   10010       

Example 3-6: Local VLAN to EVPN Instance Mapping.

Example 3-7 illustrates the MAC address stored in the MAC-VRF with its reachability information. The MAC address belongs to VLAN 10 (Topology 10) and was learned locally (Prod: Local, Flags: L). The Next-Hops field points to the local ingress interface Eth1/5.


Leaf-101# show l2route evpn mac evi vni 10010

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Asy):Asymmetric (Gw):Gateway
(Bh):Blackhole, (Dum):Dummy
(Pf):Permanently-Frozen, (Orp): Orphan

(PipOrp): Directly connected Orphan to PIP based vPC BGW
(PipPeerOrp): Orphan connected to peer of PIP based vPC BGW
Topology    Mac Address    Prod   Flags              Seq No     Next-Hops
----------- -------------- ------ ------------------ ---------- ---------
10          0050.7966.6806 Local  L,                 0          Eth1/5

Example 3-7: MAC-VRF (L2RIB) Entry About MAC Address 0050.7966.6806.


Figure 3-10: MAC Address Propagation Process – From L2RIB to BGP Loc-RIB.

Example 3-8 displays the information that is sent from the MAC-VRF to the BGP process.

Leaf-101# show l2route dataplane mac topology 10 detail

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Asy):Asymmetric (Gw):Gateway
(Bh):Blackhole, (Dum):Dummy
(Pf):Permanently-Frozen, (Orp): Orphan

(PipOrp): Directly connected Orphan to PIP based vPC BGW
(PipPeerOrp): Orphan connected to peer of PIP based vPC BGW
Topology    Mac Address    Prod   Flags              Seq No     Next-Hops
----------- -------------- ------ ------------------ ---------- --------------------
10          0050.7966.6806 Local  L,                 0          Eth1/5
            Route Resolution Type: Regular
            Forwarding State: Resolved
            Sent To: BGP

Example 3-8: MAC-VRF (L2RIB) to BGP Table.

Local Learning: BGP Processes


First, we examine how the BGP process receives the MAC address with its reachability information.


Figure 3-11: MAC Address Propagation Process – From BGP Loc-RIB.

Example 3-9 illustrates how the BGP process receives the MAC Route 0050.7966.6806 from the L2RIB on L2VNI 10010.


Leaf-101# show bgp event-history l2rib | i 0050.7966.6806

<snip>
[bgp_l2rib_message_cb:15240] 
L2RIB: (EVI 0/10010) Received add MAC route 0050.7966.6806 ESI none flags 0x000002 soo 0 seq 0 reorig 0 tag 0 pctag 0 blackhole no dummy_mac 0
<snip>

Example 3-9: BGP Table Update Process#1.

Example 3-10 confirms that the MAC address is stored in the BGP table as an EVPN Route Type 2 (MAC Advertisement Route). We examine the fields of the MAC Advertisement Route in Example 3-11, which displays the MAC Advertisement Route entry in the BGP Loc-RIB.


Leaf-101# show bgp internal event-history events | i 0050.7966.6806

<snip>
[bgp_l2vpn_evpn_process_one_prefix:9559] (default) RIB: [L2VPN EVPN] (EVI L2-10010) add prefix 192.168.10.101:32777:[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0] OK, 1 local paths in BRIB
<snip>

Example 3-10: BGP Table Update Process#2.

Figure 3-9: MAC Address Propagation Process – Local MAC-VRF Address Table.

Example 3-11 shows the complete MAC Address information stored in the BGP Loc-RIB. The first part of the auto-generated Route Distinguisher (RD) is Leaf-101's BGP RID. The second part adds the VLAN ID to the base number 32767. In our example, the complete RD is 192.168.10.101:32777. For MAC-Only routing entries, the significant address fields are:
  • [2] - EVPN Route Type: MAC Advertisement Route,
  • [48] - MAC address length in bits,
  • [0050.7966.6806] - MAC address.
All other fields are set to zero. 
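To make the bracket notation easier to read, the following Python sketch splits a Route Type 2 prefix string into named fields. The text only names the route type, MAC length, and MAC address. The remaining field names (ESI, Ethernet Tag, IP length, IP) are an assumption taken from the field order of the NLRI in Capture 3-1, and the function name is hypothetical.

```python
import re

# Field order assumed from the NLRI layout in the packet capture.
FIELD_NAMES = ("route_type", "esi", "ethernet_tag",
               "mac_length", "mac", "ip_length", "ip")


def parse_type2_prefix(prefix: str) -> dict:
    """Split '[2]:[0]:[0]:[48]:[mac]:[0]:[ip]/len' into named fields."""
    body = prefix.split("/", 1)[0]            # drop the '/216' suffix
    values = re.findall(r"\[([^\]]*)\]", body)
    if len(values) != len(FIELD_NAMES):
        raise ValueError("unexpected number of fields")
    fields = dict(zip(FIELD_NAMES, values))
    if fields["route_type"] != "2":
        raise ValueError("not a MAC Advertisement Route")
    return fields


fields = parse_type2_prefix(
    "[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/216")
print(fields["mac"], fields["mac_length"], fields["ip_length"])
```

For a MAC-Only entry, the IP length is 0 and the IP field stays at 0.0.0.0, which matches the statement that all other fields are set to zero.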

The BGP process uses the IP address 192.168.20.101 of Interface NVE1 as the next-hop and IP address 192.168.10.101 as the BGP Update source address. The BGP process derives the Extended Communities Route Target (RT) 65000:10010 from BGP ASN and EVPN Instance ID and the Encapsulation type from the NVE1 tunnel interface, which uses VXLAN encapsulation.

The last row shows that the MAC Advertisement route is advertised to both our spine switches, to Spine-11 (192.168.10.11) and Spine-12 (192.168.10.12). 


Leaf-101# sh bgp l2vpn evpn 0050.7966.6806
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.10.101:32777    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/216, version 43
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn
Multipath: eBGP iBGP

  Advertised path-id 1
  Path type: local, path is valid, is best path, no labeled nexthop
  AS-Path: NONE, path locally originated
    192.168.20.101 (metric 0) from 0.0.0.0 (192.168.10.101)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10010
      Extcommunity: RT:65000:10010 ENCAP:8

  Path-id 1 advertised to peers:
    192.168.10.11      192.168.10.12

Example 3-11: BGP Table Routing Entry for MAC Address of TS1.


After encoding the BGP EVPN MAC Advertisement route into the BGP Loc-RIB, the BGP process proceeds to program this information into neighbor-specific Adj-RIB-Out tables for all eligible BGP EVPN peers. From the perspective of Leaf-101, 'eligible' means that only local MAC Advertisement routes will be propagated to spine switches. The default BGP loop prevention mechanism ensures that routes learned from one iBGP peer are not advertised to another iBGP peer. In other words, Leaf-101 refrains from advertising NLRIs learned from Spine-11 to Spine-12.
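The eligibility rule for Leaf-101 can be expressed as a tiny Python predicate. This is a deliberately simplified model of the iBGP rule described above for a switch that is not a Route Reflector. The function name is hypothetical, and the path type strings follow the "Path type" values shown in the BGP table output.

```python
def leaf_advertises_to_ibgp_peer(path_type: str) -> bool:
    """Return True if a non-Route-Reflector switch may advertise the path
    to an iBGP peer (simplified model)."""
    if path_type not in {"local", "internal", "external"}:
        raise ValueError(f"unknown path type: {path_type}")
    # Paths learned from an iBGP peer are not passed on to other iBGP peers.
    return path_type != "internal"


print(leaf_advertises_to_ibgp_peer("local"))     # True: TS1's MAC route
print(leaf_advertises_to_ibgp_peer("internal"))  # False: learned from a spine
```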



Figure 3-12: MAC Address Propagation Process – From BGP Adj-RIB-Out.

In Example 3-13, Leaf-101 sends the EVPN MAC Advertisement Route to Spine-11. The Route Distinguisher is 192.168.10.101:32777, and the published local route is [2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/216 with the next-hop 192.168.20.101 (NVE1). The BGP Update message instructs remote VTEPs to utilize Layer 2 VXLAN Network Identifier (L2VNI) 10010 in the VXLAN tunnel header for this destination.

Leaf-101# show bgp l2vpn evpn neighbors 192.168.10.11 advertised-routes

Peer 192.168.10.11 routes for address family L2VPN EVPN:
BGP table version is 35, Local Router ID is 192.168.10.101
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 192.168.10.101:32777    (L2VNI 10010)
*>l[2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/216
                      192.168.20.101                    100      32768 i

Example 3-13: EVPN MAC Advertisement Route from Leaf-101 to Spine-11.

Capture 3-1 shows the packet capture of the BGP Update message carrying the EVPN MAC Advertisement Route sent by Leaf-101 to Spine-11.

Ethernet II, Src: 50:03:00:00:1b:08, Dst: 50:01:00:00:1b:08
Internet Protocol Version 4, Src: 192.168.10.101, Dst: 192.168.10.11
Transmission Control Protocol, Src Port: 36574, Dst Port: 179, Seq: 1, Ack: 1, Len: 104
<snip>
    TCP payload (104 bytes)
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 104
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 81
    Path attributes
        Path Attribute - MP_REACH_NLRI
            <snip>
            Type Code: MP_REACH_NLRI (14)
            Length: 44
            Address family identifier (AFI): Layer-2 VPN (25)
            Subsequent address family identifier (SAFI): EVPN (70)
            Next hop: 192.168.20.101
                IPv4 Address: 192.168.20.101
            Number of Subnetwork points of attachment (SNPA): 0
            Network Layer Reachability Information (NLRI)
                EVPN NLRI: MAC Advertisement Route
                    Route Type: MAC Advertisement Route (2)
                    Length: 33
                    Route Distinguisher: 0001c0a80a658009 (192.168.10.101:32777)
                    ESI: 00:00:00:00:00:00:00:00:00:00
                        ESI Type: ESI 9 bytes value (0)
                        ESI Value: 00 00 00 00 00 00 00 00 00
                        ESI 9 bytes value: 00 00 00 00 00 00 00 00 00
                    Ethernet Tag ID: 0
                    MAC Address Length: 48
                    MAC Address: 00:50:79:66:68:06
                    IP Address Length: 0
                    IP Address: NOT INCLUDED
                    VNI: 10010
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: empty
        Path Attribute - LOCAL_PREF: 100
        Path Attribute - EXTENDED_COMMUNITIES
            <snip>
            Type Code: EXTENDED_COMMUNITIES (16)
            Length: 16
            Carried extended communities: (2 communities)
                Route Target: 65000:10010 [Transitive 2-Octet AS-Specific]
                Encapsulation: VXLAN Encapsulation [Transitive Opaque]
                    Tunnel type: VXLAN Encapsulation (8)

Capture 3-1: BGP Update Message from Leaf-101 to Spine-11.
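Two values in the capture can be verified with a few lines of Python: the raw Route Distinguisher bytes and the NLRI length of 33. The sketch below decodes the RD hex string and sums the field sizes of a MAC-Only Route Type 2 NLRI. The helper names are hypothetical, and the decoder only handles the Type 1 RD format (IPv4 address plus a 2-byte number) seen in the capture.

```python
import ipaddress
import struct


def decode_rd(rd_hex: str) -> str:
    """Decode an 8-byte Type 1 RD (IPv4 address : 2-byte number)."""
    raw = bytes.fromhex(rd_hex)
    if len(raw) != 8:
        raise ValueError("an RD is 8 bytes long")
    (rd_type,) = struct.unpack("!H", raw[:2])
    if rd_type != 1:
        raise ValueError("this sketch only handles Type 1 RDs")
    admin, assigned = struct.unpack("!4sH", raw[2:])
    return f"{ipaddress.IPv4Address(admin)}:{assigned}"


# Field sizes in bytes for a Route Type 2 NLRI without an IP address.
MAC_ONLY_NLRI_FIELDS = {
    "route_distinguisher": 8,
    "esi": 10,
    "ethernet_tag_id": 4,
    "mac_address_length": 1,
    "mac_address": 6,
    "ip_address_length": 1,
    "ip_address": 0,   # not included for a MAC-Only route
    "vni_label": 3,
}

print(decode_rd("0001c0a80a658009"))       # 192.168.10.101:32777
print(sum(MAC_ONLY_NLRI_FIELDS.values()))  # 33
```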


Example 3-14 verifies that Spine-11 has received the MAC Advertisement route from Leaf-101. However, as the example shows, Spine-11 has not imported it to Loc-RIB. And how do we know that? First, the Route Distinguisher global admin value is the BGP RID of Leaf-101, not the BGP RID of Spine-11 itself. Second, neither "Import to" nor "Import from" sections are listed in the output. The last line verifies that Spine-11, as a BGP Route Reflector, has advertised this EVPN MAC Advertisement route to leaf switches Leaf-102, Leaf-103, and Leaf-104.

Spine-11# sh bgp l2vpn evpn 0050.7966.6806
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.10.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.7966.6806]:[0]:[0.0.0.0]/216, version 130
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Multipath: eBGP iBGP

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    192.168.20.101 (metric 41) from 192.168.10.101 (192.168.10.101)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010
      Extcommunity: RT:65000:10010 ENCAP:8

  Path-id 1 advertised to peers:
    192.168.10.102     192.168.10.103     192.168.10.104

Example 3-14: Spine-11: Received EVPN MAC Advertisement Route.


Wednesday 8 May 2024

Deploying and Analyzing EVPN Instances: Deployment Scenarios

In the previous section, we built a Single-AS EVPN Fabric with OSPF-enabled Underlay Unicast routing and PIM-SM for Multicast routing using Any Source Multicast service. In this section, we configure two L2-Only EVPN Instances (L2-EVI) and two L2/L3 EVPN Instances (L2/3-EVI) in the EVPN Fabric. We examine their operations in six scenarios depicted in Figure 3-1.

Scenario 1 (L2-Only EVI, Intra-VN): 

In the Deployment section, we configure an L2-Only EVI with a Layer 2 VXLAN Network Identifier (L2VNI) of 10010. The Default Gateway for the VLAN associated with the EVI is a firewall. In the Analyze section, we observe the Control Plane and Data Plane operation when a) connecting Tenant Systems TS1 and TS2 to the segment, and b) TS1 communicates with TS2 (Intra-VN Communication).

Scenario 2 (L2-Only EVI, Inter-VN): 

In the Deployment section, we configure another L2-Only EVI with L2VNI 10020, to which we attach TS3 and TS4. In the Analyze section, we examine EVPN Fabric's Control Plane and Data Plane operations when TS2 (L2VNI 10010) sends data to TS3 (L2VNI 10020), Inter-VN Communication.

Scenario 3 (L2/L3 EVI, Intra-VN): 

In the Deployment section, we configure a Virtual Routing and Forwarding (VRF) Instance named VRF-NWKT with L3VNI 10077. Next, we configure the EVI with L2VNI 10030. We attach VLAN 10 to this segment and bind its Anycast Gateway (AGW) to the routing domain VRF-NWKT. In the Analyze section, we study the Control Plane process when TS5 joins the network, focusing mainly on TS5's host IP address propagation.

Scenario 4 (Intra-VN, Silent Host): 

In the Deployment section, we configure an EVI with L2VNI 10040 in the EVPN Fabric, where the VLAN attached to it belongs to the same routing domain VRF-NWKT as EVI 10030. This EVI includes a "Silent Host" TS8, which generates no data traffic unless requested. Besides, we publish the segment-specific subnetwork within the routing domain VRF-NWKT. In the Analyze section, we focus on examining the Control Plane aspect of the EVPN Route Type 5 (IP Prefix Route) process.

Scenario 5 (Inter-VN, Symmetric IRB): 

In this section, we examine the Integrated Routing and Bridging (IRB) Symmetric routing model between two EVPN Instances. We analyze Control Plane and Data Plane functionality by studying Inter-VN communication from the perspective of TS6 to destinations TS7 and TS8 (silent host).

Scenario 6 (Inter-VN between protected and unprotected VNs): 

In this final scenario's Deployment section, we configure the firewall to advertise the subnetworks of protected L2-Only EVPN instances to the routing domain VRF-NWKT. Then, in the Analyze section, we examine how these networks appear to unprotected EVPN Instances attached to the VRF-NWKT routing domain. We also investigate Data Plane packet forwarding concerning traffic between TS5 and TS1.

We will go through each scenario in detail in the upcoming chapters.

Figure 3-1: EVPN Instance Deploying and Analyzing Scenarios.


Thursday 2 May 2024

Configuration of BGP afi/safi L2VPN EVPN and NVE Tunnel Interface

Overlay Network Routing: MP-BGP L2VPN/EVPN



EVPN Fabric Control Plane – MP-BGP


Instead of being a protocol, EVPN is a solution that utilizes the Multi-Protocol Border Gateway Protocol (MP-BGP) for its control plane in an overlay network. Besides, EVPN employs Virtual eXtensible Local Area Network (VXLAN) encapsulation for the data plane of the overlay network.

Multi-Protocol BGP (MP-BGP) is an extension of BGP-4 that allows BGP speakers to encode Network Layer Reachability Information (NLRI) of various address types, including IPv4/6, VPNv4, and MAC addresses, into BGP Update messages. The MP_REACH_NLRI path attribute (PA) carried within MP-BGP update messages includes Address Family Identifier (AFI) and Subsequent Address Family Identifier (SAFI) attributes. The combination of AFI and SAFI determines the semantics of the carried Network Layer Reachability Information (NLRI). For example, AFI-25 (L2VPN) with SAFI-70 (EVPN) defines an MP-BGP-based L2VPN solution, which extends a broadcast domain in a multipoint manner over a routed IPv4 infrastructure using an Ethernet VPN (EVPN) solution.

BGP EVPN Route Types (EVPN RT) carried in BGP update messages describe the type of the advertised EVPN NLRI. Besides publishing IP Prefix information with the IP Prefix Route (EVPN RT 5), BGP EVPN uses the MAC Advertisement Route (EVPN RT 2) for advertising hosts’ MAC/IP address reachability information. The Virtual Network Identifiers (VNI) describe the VXLAN segment of the advertised MAC/IP addresses.

In addition to these two fundamental route types, BGP EVPN can create a shared delivery tree for Layer 2 Broadcast, Unknown Unicast, and Multicast (BUM) traffic using the Inclusive Multicast Ethernet Tag Route (EVPN RT 3) for joining an Ingress Replication tunnel. This solution does not require a Multicast-enabled Underlay Network. Another option for BUM traffic is a Multicast-capable Underlay Network.

While EVPN RT 3 is used for building a Multicast tree for BUM traffic, the Tenant Routed Multicast (TRM) solution provides tenant-specific multicast forwarding between senders and receivers. TRM is based on the Multicast VPN (BGP AFI:1/SAFI:5 – IPv4/Mcast-VPN). TRM uses the MVPN Source Active A-D Route (MVPN RT 5) for publishing the Multicast stream's source address and group.

Using BGP EVPN's native multihoming solution, we can establish a Port-Channel between Tenant Systems (TS) and two or more VTEP switches. From the perspective of the TS, a traditional Port-Channel is deployed by bundling a set of Ethernet links into a single logical link. On the multihoming VTEP switches, these links are associated with a logical Port-Channel interface referred to as an Ethernet Segment (ES).

EVPN utilizes the EVPN Ethernet Segment Route (EVPN RT 4) as a signaling mechanism between member units to indicate which Ethernet Segments they are connected to. Additionally, VTEP switches use this EVPN RT 4 for selecting a Designated Forwarder (DF) for Broadcast, Unknown unicast, and Multicast (BUM) traffic.

When EVPN Multihoming is enabled on a set of VTEP switches, all local MAC/IP Advertisement Routes include the ES Type and ES Identifier. The EVPN multihoming solution employs the EVPN Ethernet A-D Route (EVPN RT 1) for rapid convergence. Leveraging EVPN RT 1, a VTEP switch can withdraw all MAC/IP Addresses learned via a failed ES at once by describing the ESI value in the MP_UNREACH_NLRI Path Attribute.

Note! ESI multi-homing is supported only on the first-generation Cisco Nexus 9300 switches. Nexus 9200, 9300-EX switches and newer models do not support ESI multi-homing.

An EVPN fabric employs a proactive Control Plane learning model, while networks based on Spanning Tree Protocol (STP) rely on a reactive flood-and-learn-based Data Plane learning model. In an EVPN fabric, data paths between Tenant Systems are established prior to data exchange. It's worth noting that without enabling ARP suppression, local VTEP switches flood ARP Request messages. However, remote VTEP switches do not learn the source MAC address from the VXLAN encapsulated frames.

BGP EVPN provides various methods for filtering reachability information. For instance, we can establish an import/export policy based on BGP Route Targets (BGP RT). Additionally, we can deploy ingress/egress filters using elements such as prefix-lists or BGP path attributes, like BGP Autonomous System numbers. Besides, BGP, OSPF, and IS-IS all support peer authentication.

EVPN Fabric Data Plane – VXLAN


The Virtual eXtensible LAN (VXLAN) is an encapsulation scheme that enables Broadcast Domain/VLAN stretching over a Layer 3 network. Switches or hosts performing encapsulation/decapsulation are called VXLAN Tunnel End Points (VTEP). VTEPs encapsulate the Ethernet frames, originated by local Tenant Systems (TS), within outer MAC and IP headers followed by a UDP header. The UDP destination port is 4789, and the source port is calculated from the payload. Between the UDP header and the original Ethernet frame is the VXLAN header describing the VXLAN segment with the VXLAN Network Identifier (VNI). A VNI is a 24-bit field, theoretically allowing for over 16 million unique VXLAN segments.
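The header layout and the source port calculation can be sketched in Python. The 8-byte VXLAN header below follows the standard layout (flags, reserved bits, 24-bit VNI, reserved byte). The source port function is only an assumption for illustration: it uses a CRC32 of the inner frame mapped to the dynamic port range 49152-65535, whereas real VTEPs use their own hash over inner header fields. The function names are hypothetical.

```python
import struct
import zlib

VXLAN_UDP_DST_PORT = 4789
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for the given 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit into 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits)
    # Word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def udp_source_port(inner_frame: bytes) -> int:
    """Illustrative hash only: CRC32 folded into ports 49152-65535."""
    return 49152 + zlib.crc32(inner_frame) % 16384


print(vxlan_header(10010).hex())  # 0800000000271a00
print(2**24)                      # 16777216 possible VNI values
```

Because the source port depends on the inner frame, different flows between the same pair of VTEPs get different outer UDP source ports, which lets the underlay spread them over equal-cost paths.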

VTEP devices allocate a Layer 2 VNI (L2VNI) for Intra-VN connections and a Layer 3 VNI (L3VNI) for Inter-VN connections. Each VXLAN segment has a unique L2VNI, but there is one common L3VNI for tenant-specific Inter-VN communication. Besides, the Generic Protocol Extension for VXLAN (VXLAN-GPE) enables leaf switches to add Group Policy information to data packets.

When a VTEP receives an EVPN NLRI from a remote VTEP with importable Route Targets, it validates the route by checking that it was received from a configured BGP peer with the right remote ASN and a reachable source IP address. Then, it installs the NLRI information (RD, Encapsulation Type, Next Hop, other standard and extended communities, and VNIs) into the BGP Loc-RIB. Note that the local administrator part of the RD may change during the process if the VN segment is associated with a different VLAN than on the remote VTEP. Remember that VLANs are locally significant, while EVPN Instances have fabric-wide meaning. Next, the best MAC route (or routes, if ECMP is enabled) is installed into the L2RIB with the topology information (VLAN ID associated with the VXLAN segment) and the next-hop information. Besides, the L2RIB describes the route source as BGP. Finally, L2FM programs the information into the MAC address table and sets the NVE peer interface ID as the next hop. Note that the VXLAN Manager learns VXLAN peers from the data plane based on the source IP address.
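The import decision and the RD note above can be illustrated with a small Python sketch. It assumes the 'auto' derivation rules described at the beginning of this chapter (RD = IP address:32767 + VLAN ID, RT = ASN:VNI). The function names are hypothetical, and the sketch is a simplification, not the NX-OS implementation.

```python
RD_BASE = 32767  # base value added to the VLAN ID for auto-derived RDs


def auto_rd(ip_address: str, vlan_id: int) -> str:
    """Auto-derived Route Distinguisher, e.g. 192.168.100.101:32777 for VLAN 10."""
    return f"{ip_address}:{RD_BASE + vlan_id}"


def auto_rt(asn: int, vni: int) -> str:
    """Auto-derived Route Target, e.g. 65000:10010."""
    return f"{asn}:{vni}"


def is_importable(route_rts: set, import_rts: set) -> bool:
    """A route is a candidate for import if it carries at least one local import RT."""
    return not route_rts.isdisjoint(import_rts)
```

If one leaf maps VNI 10010 to VLAN 10 and another maps it to VLAN 20, auto_rd() returns different local administrator values (32777 and 32787), while auto_rt(65000, 10010) is the same on both. This is why the import still succeeds even though the RD differs between the VTEPs.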

Our EVPN Fabric is a Single-AS solution, where Leaf and Spine switches are in the same BGP AS, making Leaf and Spine switches iBGP neighbors. We assign the BGP AS number 65000 to all switches and configure both Spine switches as BGP Route Reflectors, as shown in Figure 2-6. We reserve the IP subnet 192.168.10.0/24 for the Overlay network's BGP process, from which we take IP addresses for the logical interface Loopback 10. We use these addresses a) as BGP Router Identifiers (BRIDs), b) for defining BGP neighbors, and c) as source addresses for BGP Update messages.

Leaf switches act as VXLAN Tunnel Endpoints (VTEPs), responsible for encapsulating/decapsulating data packets to/from Customer networks on the Fabric's Transport network side. The logical Network Virtualization Edge (NVE) interfaces of Leaf switches use VXLAN tunneling, where the tunnel source IP address is the IP address of Loopback 20. We reserve the subnet 192.168.20.0/24 for this purpose, as shown in Figure 2-6.

In Figure 2-6, I have listed the VTEP Loopback identifier and IP address sections belonging to the Underlay network. The reason is that the source/destination IP addresses used for tunneling between VTEP devices must be routable by the devices in the Transport network (Underlay Network). In the context of BGP EVPN, the term "Overlay" refers to the fact that it advertises only the MAC and IP addresses and subnets required for IP communication among devices connected to EVPN segments.

The following image also lists mandatory NX-OS features that we must enable to configure both the BGP EVPN Control Plane and the Data Plane.



Figure 2-6: EVPN Fabric Overlay Network Control Plane and Data Plane.


Figure 2-7 depicts our implementation of a Single-AS EVPN Fabric. The Spine switch serves as a BGP Route Reflector, forwarding BGP Update messages from Leaf switches to other Leaf switches. The BGP process on Leaf switches sets the IP address of the Loopback 20 interface as the Next-hop in the MP_REACH_NLRI Path Attribute for all advertised EVPN NLRI Route Types.

The Network Virtualization Edge (NVE) interfaces use the IP address of Loopback 20 for VXLAN tunneling. The NVE interface sub-command "host-reachability protocol bgp" instructs the NVE interface to use the Control Plane learning model based on the received BGP Updates about EVPN NLRIs.




Figure 2-7: EVPN Fabric Overlay Network Control Plane and Data Plane Building Blocks.



BGP EVPN Configuration


Example 2-18 shows the configuration of Spine-12 for BGP. The first two commands enable BGP EVPN. In the actual BGP configuration, we first specify the BGP AS number as 65000. Then, we attach the IP address we defined for Loopback 10 as the BGP Router ID. The command address-family l2vpn evpn with the subcommand maximum-paths 2 enables flow-based load sharing across two BGP peers if their EVPN NLRI AS_PATH attributes are identical. The commonly used term for this is Equal Cost Multi-Pathing (ECMP).

Using the neighbor command, we define the BGP neighbor's IP address. For each BGP neighbor, we define a BGP AS number and the source IP address for the locally generated BGP Update messages. With the command address-family l2vpn evpn, we indicate that we want to exchange EVPN NLRI information with this neighbor.

Depending on the advertised EVPN Route Type, a set of BGP Extended Community attributes is carried with the advertised EVPN NLRIs. Hence, we need the command send-community extended. By default, the BGP loop prevention mechanism prevents iBGP peers from advertising NLRI information learned from other iBGP peers. We bypass this mechanism by configuring the Spine switches as BGP Route Reflectors using the neighbor-specific route-reflector-client command.


feature bgp
nv overlay evpn
!
router bgp 65000
  router-id 192.168.10.12
  address-family l2vpn evpn
    maximum-paths 2
  neighbor 192.168.10.101
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
!
  neighbor 192.168.10.102
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
!
  neighbor 192.168.10.103
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
!
  neighbor 192.168.10.104
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client

Example 2-18: Spine Switches BGP Configuration.

Example 2-19 illustrates the BGP configuration of switch Leaf-101. The BGP configurations of all Leaf switches are identical except for the BGP router ID.

feature bgp
nv overlay evpn
!
router bgp 65000
  router-id 192.168.10.101
  address-family l2vpn evpn
    maximum-paths 2
  neighbor 192.168.10.11
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended

  neighbor 192.168.10.12
    remote-as 65000
    update-source loopback10
    address-family l2vpn evpn
      send-community
      send-community extended

Example 2-19: Leaf Switches BGP Configuration.
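
Because the neighbor stanzas in Examples 2-18 and 2-19 differ only in the neighbor IP address and the route-reflector-client line, they lend themselves to templating. The following Python sketch renders both configurations from a list of addresses. The function names are hypothetical, and the rendered commands are limited to those already shown in the two examples.

```python
def render_neighbor(ip: str, asn: int, rr_client: bool) -> str:
    """Render one iBGP neighbor stanza for the L2VPN EVPN address family."""
    lines = [
        f"  neighbor {ip}",
        f"    remote-as {asn}",
        "    update-source loopback10",
        "    address-family l2vpn evpn",
        "      send-community",
        "      send-community extended",
    ]
    if rr_client:
        # Only Spine switches (Route Reflectors) mark their neighbors as clients.
        lines.append("      route-reflector-client")
    return "\n".join(lines)


def render_bgp(router_id: str, asn: int, neighbors: list, rr_client: bool) -> str:
    """Render the BGP EVPN configuration for a Spine (rr_client=True) or a Leaf."""
    header = [
        "feature bgp",
        "nv overlay evpn",
        f"router bgp {asn}",
        f"  router-id {router_id}",
        "  address-family l2vpn evpn",
        "    maximum-paths 2",
    ]
    stanzas = [render_neighbor(ip, asn, rr_client) for ip in neighbors]
    return "\n".join(header + stanzas)


leaf_ips = [f"192.168.10.{n}" for n in range(101, 105)]
spine_cfg = render_bgp("192.168.10.12", 65000, leaf_ips, rr_client=True)
leaf_cfg = render_bgp("192.168.10.101", 65000,
                      ["192.168.10.11", "192.168.10.12"], rr_client=False)
```

Printing spine_cfg yields the commands of Example 2-18 (without the "!" separator lines), and leaf_cfg yields those of Example 2-19.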

BGP EVPN Verification

From Example 2-20, we can see the BGP commands we have associated with the BGP neighbor Leaf-101 on Spine-11.


Spine-11# sh bgp l2vpn evpn neighbors 192.168.10.101 commands
Command information for 192.168.10.101
                 Update Source: locally configured
                     Remote AS: locally configured

 Address Family: L2VPN EVPN
                Send Community: locally configured
            Send Ext-community: locally configured
        Route Reflector Client: locally configured
Spine-11#

Example 2-20: BGP Neighbor Commands on Spine-11.

Example 2-21 shows the BGP neighbors of Spine-11 with their AS numbers and statistics regarding received and sent BGP messages (Open, Keepalive, Update, and Notification). All EVPN Route Type counters are zero because we haven't yet deployed EVPN instances.


Spine-11# sh bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 192.168.10.12, local AS number 65000
BGP table version is 6, L2VPN EVPN config peers 4, capable peers 4
0 network entries and 0 paths using 0 bytes of memory
BGP attribute entries [0/0], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor        V    AS    MsgRcvd    MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.10.101  4 65000         14         17        0    0    0 00:00:02 0
192.168.10.102  4 65000         19         20        0    0    0 00:00:02 0
192.168.10.103  4 65000          6          4        0    0    0 00:00:06 0
192.168.10.104  4 65000         14         17        0    0    0 00:00:02 0

Neighbor        T    AS PfxRcd     Type-2     Type-3     Type-4     Type-5     Type-12
192.168.10.101  I 65000 0          0          0          0          0          0
192.168.10.102  I 65000 0          0          0          0          0          0
192.168.10.103  I 65000 0          0          0          0          0          0
192.168.10.104  I 65000 0          0          0          0          0          0
Spine-11#

Example 2-21: BGP L2VPN EVPN Summary on Spine-11.
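
When verifying many switches, the neighbor table in the summary output can be checked programmatically. The sketch below is one possible approach, assuming the column layout shown in Example 2-21. The function name and the regular expression are hypothetical. It relies on the State/PfxRcd column showing a prefix count for an established session and a state name (such as Idle) otherwise.

```python
import re

# Matches rows of the first neighbor table, for example:
# 192.168.10.101  4 65000  14  17  0  0  0 00:00:02 0
ROW = re.compile(
    r"^(?P<ip>\d+\.\d+\.\d+\.\d+)\s+4\s+(?P<asn>\d+)"
    r"\s+\d+\s+\d+\s+\d+\s+\d+\s+\d+\s+"
    r"(?P<updown>\S+)\s+(?P<state>\S+)\s*$"
)


def parse_summary(output: str) -> dict:
    """Return {neighbor_ip: {"asn", "established", "prefixes"}} from the summary."""
    peers = {}
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip headers and the per-route-type table
        state = match.group("state")
        peers[match.group("ip")] = {
            "asn": int(match.group("asn")),
            "established": state.isdigit(),
            "prefixes": int(state) if state.isdigit() else None,
        }
    return peers
```

Applied to the output above, parse_summary() returns four neighbors in AS 65000, all established and all with zero received prefixes, which matches our expectation before any EVPN instances are deployed.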


Example 2-21 shows information and statistics about the BGP neighborship between switches Spine-11 and Leaf-101. Leaf-101 belongs to the same BGP Autonomous System (AS) 65000 as Spine-11, making Leaf-101 an iBGP neighbor. I have highlighted the parts that confirm the functionality of our configuration. The neighborship state is "Established", indicating that the switches are ready to send and receive BGP Update messages. Spine-11 uses the logical interface Loopback10 as its source address in BGP Update messages. The Capabilities and Graceful Restart sections show that the switches support the BGP address family L2VPN EVPN. At the end of the output, we see that Leaf-101 is configured as a Route-Reflector Client.

Spine-11# sh bgp l2vpn evpn neighbors 192.168.10.101
BGP neighbor is 192.168.10.101, remote AS 65000, ibgp link, Peer index 3
  BGP version 4, remote router ID 192.168.10.101
  Neighbor previous state = OpenConfirm
  BGP state = Established, up for 00:02:40
  Neighbor vrf: default
  Using loopback10 as update source for this peer
  Using iod 71 (loopback10) as update source
  Last read 00:00:35, hold time = 180, keepalive interval is 60 seconds
  Last written 00:00:35, keepalive timer expiry due 00:00:24
  Received 18 messages, 0 notifications, 0 bytes in queue
  Sent 21 messages, 1 notifications, 0(0) bytes in queue
  Enhanced error processing: On
    0 discarded attributes
  Connections established 2, dropped 1
  Last update recd 00:02:35, Last update sent  = never
   Last reset by us 00:02:51, due to router-id configuration change
  Last error length sent: 0
  Reset error value sent: 0
  Reset error sent major: 6 minor: 107
  Notification data sent:
  Last reset by peer never, due to No error
  Last error length received: 0
  Reset error value received 0
  Reset error received major: 0 minor: 0
  Notification data received:

  Neighbor capabilities:
  Dynamic capability: advertised (mp, refresh, gr) received (mp, refresh, gr)
  Dynamic capability (old): advertised received
  Route refresh capability (new): advertised received
  Route refresh capability (old): advertised received
  4-Byte AS capability: advertised received
  Address family L2VPN EVPN: advertised received
  Graceful Restart capability: advertised received

  Graceful Restart Parameters:
  Address families advertised to peer:
    L2VPN EVPN
  Address families received from peer:
    L2VPN EVPN
  Forwarding state preserved by peer for:
  Restart time advertised to peer: 120 seconds
  Stale time for routes advertised by peer: 300 seconds
  Restart time advertised by peer: 120 seconds
  Extended Next Hop Encoding Capability: advertised received
  Receive IPv6 next hop encoding Capability for AF:
    IPv4 Unicast  VPNv4 Unicast

  Message statistics:
                              Sent               Rcvd
  Opens:                         4                  2
  Notifications:                 1                  0
  Updates:                       2                  2
  Keepalives:                   12                 12
  Route Refresh:                 0                  0
  Capability:                    2                  2
  Total:                        21                 18
  Total bytes:                 327                306
  Bytes in queue:                0                  0

  For address family: L2VPN EVPN
  BGP table version 10, neighbor version 10
  0 accepted prefixes (0 paths), consuming 0 bytes of memory
  0 received prefixes treated as withdrawn
  0 sent prefixes (0 paths)
  Community attribute sent to this neighbor
  Extended community attribute sent to this neighbor
  Third-party Nexthop will not be computed.
  Advertise GW IP is enabled
  Route reflector client
  Last End-of-RIB received 00:00:05 after session start
  Last End-of-RIB sent 00:00:05 after session start
  First convergence 00:00:05 after session start with 0 routes sent

  Local host: 192.168.10.11, Local port: 33940
  Foreign host: 192.168.10.101, Foreign port: 179
  fd = 90

Example 2-21: BGP Neighbor Details for Leaf-101 on Spine-11.

Overlay Network Data Plane: VXLAN 



NVE Interface Configuration


Example 2-22 shows the configuration of the NVE interface and the required feature configuration for client overlay networks. The command "feature nv overlay" enables VXLAN overlay networks. The command "feature vn-segment-vlan-based" specifies that only the MAC addresses of the VLAN associated with the respective EVPN instance (EVI) are stored in the MAC-VRF's Layer 2 RIB (L2RIB). In other words, the EVPN instance forms a single broadcast domain. Under the NVE interface, we define the logical interface Loopback20's IP address as the tunnel source address. Additionally, we specify that the NVE interface implements the Control Plane learning model, meaning the switch learns remote MAC addresses from BGP Update messages, not from the data traffic received through the tunnel interface (Data Plane learning).

feature nv overlay
feature interface-vlan
feature vn-segment-vlan-based
!
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback20

Example 2-22: Leaf Switches NVE Interface Configuration.

NVE Interface Verification


Example 2-23 shows the summary information about the settings of the interface NVE 1. Leaf-101 uses Loopback20 as a source interface when sending traffic over the interface NVE1. Besides, Leaf-101 uses the Control Plane learning model. Leaf-101 encodes the router MAC address into BGP Update messages as the "Router MAC" Extended Community associated with EVPN Route Type 2 (MAC/IP Advertisement Route) when the update carries both MAC and IP addresses. The remote leaf switches use it as the destination MAC address in the inner Ethernet frame when forwarding Inter-VN traffic to Leaf-101.

Leaf-101# show nve interface nve 1
Interface: nve1, State: Up, encapsulation: VXLAN
 VPC Capability: VPC-VIP-Only [not-notified]
 Local Router MAC: 5003.0000.1b08
 Host Learning Mode: Control-Plane
 Source-Interface: loopback20 (primary: 192.168.20.101, secondary: 0.0.0.0)

Example 2-23: NVE Interface Summary on Leaf-101.

Example 2-24 demonstrates that Leaf-101 currently lacks any NVE peers because its VXLAN manager initiates an NVE peer relationship with other VTEPs upon receiving the first data packet over the NVE interface.


Leaf-101# show nve peers detail
Leaf-101#

Example 2-24: NVE Peers on Leaf-101.

At this stage, we have configured the EVPN Fabric to the point where we can deploy our first EVPN instances and test and analyze both the Intra-VN and Inter-VN Control Plane and Data Plane perspectives.