Sunday 5 May 2019

VXLAN Underlay Routing - Part IV: Two-AS eBGP



eBGP as an Underlay Network Routing Protocol: Two-AS eBGP

This post explains the Two-AS eBGP solution in a VXLAN fabric, where all Leaf switches share one AS and all Spine switches share another. It also discusses how the default eBGP operating model has to be modified to achieve the routing solution that a VXLAN fabric requires. These modifications mainly concern the BGP loop-prevention mechanism and the processing of the BGP next-hop path attribute.

Figure 1-1 illustrates the topology used in this chapter. Leaf-101 and Leaf-102 both belong to BGP AS 65000, while Spine-11 belongs to BGP AS 65099. Loopback interfaces used for Overlay Network BGP peering (L100) and for NVE peering (L50) are advertised over BGP AFI IPv4 peering (Underlay Network Control Plane). Host MAC/IP address information is advertised over BGP AFI L2VPN EVPN peering (Overlay Network Control Plane). Ethernet frames between host Café and Abba are encapsulated with a VXLAN tunnel header where the source and destination IP addresses used in the outer IP header are taken from NVE1 interfaces.





Figure 1-1: High-Level operation of VXLAN Fabric



Underlay Network Control Plane eBGP

Figure 1-2 illustrates the Underlay Network addressing scheme and the peering model. BGP IPv4 peering is configured between the physical interfaces. Examples 1-1 through 1-3 show the basic BGP IPv4 peering configurations of the switches. Both Leaf-101 and Leaf-102 advertise the IP addresses of Loopback 50 and Loopback 100 to Spine-11, while Spine-11 advertises only the IP address of its Loopback 100 to the Leaf switches.


Figure 1-2: VXLAN Fabric Underlay Network eBGP IPv4 peering.



router bgp 65000
  router-id 192.168.0.101
  address-family ipv4 unicast
    network 192.168.50.101/32
    network 192.168.100.101/32
  neighbor 10.101.11.11
    remote-as 65099
    description ** BGP Underlay to Spine-11 **
    address-family ipv4 unicast
Example 1-1: Leaf-101 basic BGP configuration.

router bgp 65000
  router-id 192.168.0.102
  address-family ipv4 unicast
    network 192.168.50.102/32
    network 192.168.100.102/32
  neighbor 10.102.11.11
    remote-as 65099
    description ** BGP Underlay to Spine-11 **
    address-family ipv4 unicast
Example 1-2: Leaf-102 basic BGP configuration.

router bgp 65099
  router-id 192.168.0.11
  address-family ipv4 unicast
    network 192.168.100.11/32
  neighbor 10.101.11.101
    remote-as 65000
    description ** BGP Underlay to Leaf-101 **
    address-family ipv4 unicast
  neighbor 10.102.11.102
    remote-as 65000
    description ** BGP Underlay to Leaf-102 **
    address-family ipv4 unicast
Example 1-3: Spine-11 basic BGP configuration.

At this stage, the BGP peering between Leaf-101 and Spine-11 is up as can be seen from example 1-4.

Leaf-101# sh ip bgp summ | beg Neigh
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.101.11.11    4 65099    2969    2954       11    0    0 02:27:42 2        
Example 1-4: Leaf-101 BGP peering.

However, there are no entries concerning routes originated by Leaf-102 in Leaf-101 BGP table (example 1-5).

Leaf-101# sh ip bgp
<snipped>

   Network             Next Hop       Metric  LocPrf   Weight Path
*>l192.168.50.101/32   0.0.0.0                100       32768 i
*>e192.168.100.11/32   10.101.11.11                         0 65099 i
*>l192.168.100.101/32  0.0.0.0                100       32768 i
*>e192.168.238.0/29    10.101.11.11                         0 65099 i
Example 1-5: Leaf-101 BGP routes.

There are two reasons why BGP Updates originated by Leaf-102 do not end up in the BGP table of Leaf-101. First, Spine-11 does not forward BGP Updates received from Leaf-102 to Leaf-101. Example 1-6 shows that Spine-11 advertises only self-originated routes to Leaf-101. This is because the AS-PATH path attribute of the routes received from Leaf-102 includes AS 65000, which is also the remote AS configured for the Leaf-101 IPv4 BGP peering. Spine-11 checks the peer AS before sending an update and suppresses such routes as part of its default loop prevention.

Spine-11# sh ip bgp neighbors 10.101.11.101 advertised-routes
<snipped>
   Network            Next Hop            Metric     LocPrf     Weight Path
*>l192.168.100.11/32  0.0.0.0                           100      32768 i
*>l192.168.238.0/29   0.0.0.0                           100      32768 i
Example 1-6: Routes advertised to Leaf-101 by Spine-11.

Disabling the peer-AS check that runs before a BGP Update is sent, with the command disable-peer-as-check (example 1-7), changes this default behavior.

router bgp 65099
  neighbor 10.101.11.101
    address-family ipv4 unicast
      disable-peer-as-check
Example 1-7: Disabling peer-AS verification on Spine-11.
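
The sender-side check can be modeled in a few lines of Python. This is a simplified sketch, not the NX-OS implementation: the function name should_advertise and its parameters are hypothetical, and the check is reduced to "is the peer's AS present in the AS-PATH", as described above.

```python
def should_advertise(as_path, peer_as, disable_peer_as_check=False):
    """Return True if a route with this AS-PATH may be sent to a peer in peer_as.

    Simplified model (assumption): the route is suppressed when the peer's AS
    already appears in the AS-PATH, unless the check is disabled.
    """
    if disable_peer_as_check:
        return True
    return peer_as not in as_path


# Route originated by Leaf-102 as seen on Spine-11: the AS-PATH is [65000].
leaf102_route_path = [65000]

# Default behavior towards Leaf-101 (remote AS 65000): suppressed.
default_result = should_advertise(leaf102_route_path, peer_as=65000)

# With disable-peer-as-check on the neighbor: advertised.
relaxed_result = should_advertise(
    leaf102_route_path, peer_as=65000, disable_peer_as_check=True
)

# Routes originated by Spine-11 itself have an empty AS-PATH and always pass.
local_result = should_advertise([], peer_as=65000)
```

In this model, only the routes learned from the other Leaf are affected, which matches the output in example 1-6 where the self-originated routes are still advertised.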

As can be seen from example 1-8, Spine-11 now forwards the BGP Updates received from Leaf-102 to Leaf-101.

Spine-11# sh ip bgp neighbors 10.101.11.101 advertised-routes
<snipped>
Network               Next Hop       Metric LocPrf    Weight Path
*>e192.168.50.102/32  10.102.11.102                        0 65000 i
*>l192.168.100.11/32  0.0.0.0               100        32768 i
*>e192.168.100.102/32 10.102.11.102                        0 65000 i
*>l192.168.238.0/29   0.0.0.0               100        32768 i
Example 1-8: Routes advertised to Leaf-101 by Spine-11.

The second reason why the routes do not end up in the Leaf-101 BGP table is that even though Leaf-101 receives the routes, it rejects them. The BGP process discards BGP Update messages learned from an eBGP peer if they carry the receiving device's own AS number in the AS-PATH list. This is a default BGP loop-prevention mechanism. It can be bypassed with the “allowas-in” command under the peer-specific configuration (example 1-9).

router bgp 65000
  neighbor 10.101.11.11
    address-family ipv4 unicast
      allowas-in 3
Example 1-9: Allow-as in on Leaf-101.
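
The receiver-side check can be sketched the same way. The function accept_update below is a hypothetical illustration, assuming that the number given to allowas-in is the maximum number of times the local AS may appear in a received AS-PATH.

```python
def accept_update(as_path, local_as, allowas_in=0):
    """Return True if an eBGP update with this AS-PATH is accepted.

    Assumption: allowas_in is the maximum number of occurrences of the local
    AS that is tolerated in the AS-PATH (0 models the default loop check).
    """
    occurrences = as_path.count(local_as)
    return occurrences <= allowas_in


# Route from Leaf-102 as received by Leaf-101 via Spine-11.
path_via_spine = [65099, 65000]

# Default loop prevention on Leaf-101 (AS 65000): rejected.
rejected_by_default = not accept_update(path_via_spine, local_as=65000)

# With "allowas-in 3" under the neighbor: accepted.
accepted_with_allowas = accept_update(path_via_spine, local_as=65000, allowas_in=3)
```

The limit still gives some protection: in this model, a path that contains AS 65000 four times would be rejected even with allowas-in 3.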

After this addition, Leaf-101 accepts and installs routes originated by Leaf-102 into BGP table (example 1-10).

Leaf-101# sh ip bgp
<snipped>
   Network            Next Hop    Metric   LocPrf   Weight Path
*>l192.168.50.101/32  0.0.0.0               100      32768 i
*>e192.168.50.102/32  10.101.11.11                       0 65099 65000 i
*>e192.168.100.11/32  10.101.11.11                       0 65099 i
*>l192.168.100.101/32 0.0.0.0               100      32768 i
*>e192.168.100.102/32 10.101.11.11                       0 65099 65000 i
*>e192.168.238.0/29   10.101.11.11                       0 65099 i
Example 1-10: Leaf-101 BGP routes after allowas-in.

The IP connectivity between the Leaf switches can now be verified by pinging between the Loopback interfaces (example 1-11).

Leaf-101# ping 192.168.100.102 source 192.168.100.101 count 2
<snipped>
64 bytes from 192.168.100.102: icmp_seq=0 ttl=253 time=9.268 ms
64 bytes from 192.168.100.102: icmp_seq=1 ttl=253 time=6.586 ms

<snipped>

Leaf-101# ping 192.168.50.102 source 192.168.50.101 count 2
<snipped>
64 bytes from 192.168.50.102: icmp_seq=0 ttl=253 time=27.166 ms
64 bytes from 192.168.50.102: icmp_seq=1 ttl=253 time=17.275 ms
Example 1-11: ping from Leaf-101 to Leaf-102.

Overlay Network Control Plane eBGP

Figure 1-3 illustrates the Overlay Network addressing scheme and peering topology. BGP L2VPN EVPN peering is configured between the Loopback 100 interfaces. Examples 1-12 and 1-13 show the basic BGP L2VPN EVPN AFI peering configurations of the switches.



Figure 1-3: VXLAN Fabric Overlay Network eBGP L2VPN EVPN peering.



Both Leaf switches can use the same configuration template because BGP L2VPN EVPN peering is configured between Loopback interfaces. By default, BGP sets the TTL of messages sent to an eBGP peer to one. When peering between logical interfaces instead of directly connected physical interfaces, this default TTL has to be increased by one with the “ebgp-multihop 2” command. In addition, peering between Loopback interfaces requires changing the source IP address, since by default the IP address of the outgoing physical interface is used as the source IP for BGP messages sent to an external peer. This is done with the “update-source loopback 100” command under the peer-specific configuration section. Finally, the same BGP loop-prevention mechanism that rejects routes carrying the device's own AS number also applies in the Overlay Network, so “allowas-in” is needed (example 1-12).

router bgp 65000
  neighbor 192.168.100.11
    remote-as 65099
    description ** BGP Overlay to Spine-11 **
    update-source loopback100
    ebgp-multihop 2
    address-family l2vpn evpn
      allowas-in 3
      send-community
      send-community extended
Example 1-12: Basic BGP L2VPN EVPN configuration on Leaf-101 and Leaf-102.

Example 1-13 illustrates the Spine-11 BGP L2VPN EVPN peering configuration with Leaf-101. The “disable-peer-as-check” command is needed in Overlay BGP L2VPN EVPN peering just like it was needed in Underlay BGP IPv4 peering.

router bgp 65099
  neighbor 192.168.100.101
    remote-as 65000
    description ** BGP Overlay to Leaf-101 **
    update-source loopback100
    ebgp-multihop 2
    address-family l2vpn evpn
      disable-peer-as-check
      send-community
      send-community extended
Example 1-13: Basic BGP L2VPN EVPN peering configuration on Spine-11.

Now the BGP L2VPN EVPN peering is up, though Spine-11 has not installed any routes from either Leaf-101 or Leaf-102 into its BGP table (example 1-14).

Spine-11# sh bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 192.168.0.11, local AS number 65099
BGP table version is 4, L2VPN EVPN config peers 2, capable peers 2
0 network entries and 0 paths using 0 bytes of memory
BGP attribute entries [0/0], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.100.101 4 65000       6       6        4    0    0 00:00:13 0        
192.168.100.102 4 65000       6       6        4    0    0 00:00:03 0        
Example 1-14: show bgp l2vpn evpn summary on Spine-11.

L2VPN EVPN NLRIs are imported and exported based on Route-Target (RT) values. On Leaf-101 and Leaf-102, there is an EVPN instance where the import/export policy is defined (example 1-15).

Leaf-101# sh run bgp | sec evpn
<snipped>
evpn
  vni 10000 l2
    rd auto
    route-target import auto
    route-target export auto
    route-target both auto evpn
Example 1-15: evpn vni 10000 Route-Target import/export policy on Leaf-101.
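
The values produced by the auto keyword can be illustrated with a small Python sketch. It assumes the commonly documented NX-OS convention that an automatic Route-Target is built as ASN:VNI and that an automatic Route Distinguisher for an L2VNI is built as router-id:(32767 + VLAN ID). The function names are hypothetical, and VLAN ID 10 is an assumption inferred from the topology ID shown in the L2 RIB output of this post.

```python
def auto_route_target(asn, vni):
    """Automatic Route-Target, assuming the ASN:VNI convention."""
    return f"{asn}:{vni}"


def auto_route_distinguisher(router_id, vlan_id):
    """Automatic L2VNI Route Distinguisher, assuming router-id:(32767 + VLAN ID)."""
    return f"{router_id}:{32767 + vlan_id}"


# Both Leaf switches are in AS 65000, so they derive the same RT for VNI 10000.
rt_leaf101 = auto_route_target(65000, 10000)
rt_leaf102 = auto_route_target(65000, 10000)

# The RD stays unique per switch because it is based on the router-id.
rd_leaf101 = auto_route_distinguisher("192.168.0.101", 10)
rd_leaf102 = auto_route_distinguisher("192.168.0.102", 10)
```

Under these assumptions, the results agree with the RT:65000:10000 and 192.168.0.101:32777 values visible in the BGP outputs of this post. Because all Leaf switches share one AS in the Two-AS design, the automatically derived import and export RTs match without manual configuration.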

There is no local EVPN instance configured on Spine-11, so no import Route-Target matches the received NLRIs. Therefore Spine-11 does not retain the EVPN updates and cannot forward them from one eBGP peer to another. This rule applies to eBGP peering. For Spine-11 to operate like a route-reflector, the command “retain route-target all” is needed under the global BGP L2VPN EVPN address-family (example 1-16). This command only controls which routes are retained; the next-hop handling is covered later in this post.

Spine-11(config)# router bgp 65099
Spine-11(config-router)# address-family l2vpn evpn
Spine-11(config-router-af)# retain route-target all
Example 1-16: retain route-target all command on Spine-11.
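
The retention rule can be sketched in Python as follows. This is a simplified model with hypothetical names (keeps_route, retain_all): a received EVPN route is kept only if one of its Route-Targets matches a locally configured import RT, unless all routes are retained.

```python
def keeps_route(route_rts, local_import_rts, retain_all=False):
    """Return True if a received EVPN route is kept in the BGP table.

    Simplified model (assumption): keep the route when retain_all is set or
    when at least one RT on the route matches a local import RT.
    """
    if retain_all:
        return True
    return bool(set(route_rts) & set(local_import_rts))


route_rts = {"65000:10000"}

# Spine-11 has no EVPN instance, so it has no import RTs: route is dropped.
spine_default = keeps_route(route_rts, local_import_rts=set())

# With "retain route-target all": route is kept and can be forwarded.
spine_retain = keeps_route(route_rts, local_import_rts=set(), retain_all=True)

# A Leaf switch with a matching import RT keeps the route without the command.
leaf_result = keeps_route(route_rts, local_import_rts={"65000:10000"})
```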

Now, when the BGP L2VPN EVPN NLRIs are resent to Spine-11 by Leaf-101…

Leaf-101# clear bgp l2vpn evpn 192.168.100.11 soft out
Example 1-17: clear bgp l2vpn evpn on Leaf-101.

… the MAC-only and MAC-IP NLRIs are received and installed into BGP table of Spine-11. Note! Timestamps are removed (entries are updated from bottom to top).

Spine-11# sh bgp internal event-history events | i cafe
RIB: [L2VPN EVPN] Add/delete 192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/112, flags=0x200, evi_ctx invalid, in_rib: no

RIB: [L2VPN EVPN] Add/delete 192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[172.16.10.101]/144, flags=0x200, evi_ctx invalid, in_rib: no

BRIB: [L2VPN EVPN] (192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/112 (192.168.100.101)): returning from bgp_brib_add, reeval=0new_path: 1, change: 1, undelete: 0, history: 0, force: 0, (pflags=0x40002020) rnh_flag_change 0

BRIB: [L2VPN EVPN] (192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/112 (192.168.100.101)): bgp_brib_add: handling nexthop, path->flags2: 0x80000

BRIB: [L2VPN EVPN] Created new path to 192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/112 via 192.168.0.101 (pflags=0x40000000, pflags2=0x0)

BRIB: [L2VPN EVPN] Installing prefix 192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/112 (192.168.100.101) via 192.168.50.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 ENCAP:8

BRIB: [L2VPN EVPN] (192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[172.16.10.101]/144 (192.168.100.101)): returning from bgp_brib_add, reeval=0new_path: 1, change: 1, undelete: 0, history: 0, force: 0, (pflags=0x40002020) rnh_flag_c

BRIB: [L2VPN EVPN] (192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[172.16.10.101]/144 (192.168.100.101)): bgp_brib_add: handling nexthop, path->flags2: 0x80000

BRIB: [L2VPN EVPN] Created new path to 192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[172.16.10.101]/144 via 192.168.0.101 (pflags=0x40000000, pflags2=0x0)

BRIB: [L2VPN EVPN] Installing prefix 192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[172.16.10.101]/144 (192.168.100.101) via 192.168.50.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 RT:65000:10077 ENC
Example 1-18: “sh bgp internal event-history events | i cafe” on Spine-11.

This can also be verified by checking the BGP table. 

Spine-11# sh bgp l2vpn evpn 1000.0010.cafe
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.0.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/216, version 93
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path
  AS-Path: 65000 , path sourced external to AS
    192.168.50.101 (metric 0) from 192.168.100.101 (192.168.0.101)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8

  Path-id 1 advertised to peers:
    192.168.100.102
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[172.16.10.101]/272, version 90
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path
  AS-Path: 65000 , path sourced external to AS
    192.168.50.101 (metric 0) from 192.168.100.101 (192.168.0.101)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007

  Path-id 1 advertised to peers:
    192.168.100.102
Example 1-19: sh bgp l2vpn evpn 1000.0010.cafe on Spine-11.

Example 1-20 shows that Leaf-102 has received the BGP Update from Spine-11. A closer examination of the BGP table shows that the next-hop is Spine-11 (192.168.100.11), though it should be Leaf-101.

Leaf-102# sh bgp l2vpn evpn 1000.0010.cafe
! <------ COMMENT: BGP Adj-RIB-In information ----->
!          <--- Comment: MAC-only entry --->
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.0.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/216, version 19
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path
             Imported to 1 destination(s)
  AS-Path: 65099 65000 , path sourced external to AS
    192.168.100.11 (metric 0) from 192.168.100.11 (192.168.0.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8

  Path-id 1 not advertised to any peer

!         <--- Comment: MAC-IP entry --->
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[172.16.10.101]/272, version 4
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path
             Imported to 3 destination(s)
  AS-Path: 65099 65000 , path sourced external to AS
    192.168.100.11 (metric 0) from 192.168.100.11 (192.168.0.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007

  Path-id 1 not advertised to any peer

! <------ COMMENT: BGP Loc-RIB information (from Adj-RIB-In)----->

Route Distinguisher: 192.168.0.102:32777    (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/216, version 20
Paths: (1 available, best #1)
Flags: (0x000212) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path, in rib
             Imported from 192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/216
  AS-Path: 65099 65000 , path sourced external to AS
    192.168.100.11 (metric 0) from 192.168.100.11 (192.168.0.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8

  Path-id 1 not advertised to any peer
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[172.16.10.101]/272, version 5
Paths: (1 available, best #1)
Flags: (0x000212) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path, in rib
             Imported from 192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[172.16.10.101]/272
  AS-Path: 65099 65000 , path sourced external to AS
    192.168.100.11 (metric 0) from 192.168.100.11 (192.168.0.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007

  Path-id 1 not advertised to any peer

Route Distinguisher: 192.168.0.102:3    (L3VNI 10077)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[172.16.10.101]/272, version 6
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path
             Imported from 192.168.0.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[172.16.10.101]/272
  AS-Path: 65099 65000 , path sourced external to AS
    192.168.100.11 (metric 0) from 192.168.100.11 (192.168.0.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007

  Path-id 1 not advertised to any peer
Example 1-20: show bgp l2vpn evpn 1000.0010.cafe on Leaf-102.

In addition, the “show nve peers detail” command shows that the NVE peering is between Leaf-102 and Spine-11, while it should be between Leaf-102 and Leaf-101 (192.168.50.101). The reason is that Spine-11 changes the next-hop to its own IP address when it forwards the BGP Update originated by Leaf-101 to Leaf-102, and the NVE peer information is taken from the next-hop field of the L2VPN EVPN BGP Update (example 1-21).

Leaf-102# sh nve peers detail
Details of nve Peers:
----------------------------------------
Peer-Ip: 192.168.100.11
    NVE Interface       : nve1
    Peer State          : Up
    Peer Uptime         : 00:31:36
    Router-Mac          : 5e00.0000.0007
    Peer First VNI      : 10000
    Time since Create   : 00:31:36
    Configured VNIs     : 10000,10077
    Provision State     : peer-add-complete
    Learnt CP VNIs      : 10000,10077
    vni assignment mode : SYMMETRIC
    Peer Location       : N/A
Example 1-21: show nve peers detail on Leaf-102.

This means that there is no L2/L3 connectivity between host Café and host Abba as can be seen from example 1-22.

Cafe#ping 172.16.10.102
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.10.102, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
Example 1-22: ping from host Café to host Abba.

Capture 1-1 is taken from the link between Leaf-101 and Spine-11 while host Café tries to ping host Abba. First, since the hosts are in the same subnet 172.16.10.0/24, host Café has to resolve the MAC address of host Abba. It sends an ARP request (L2 broadcast).

Ethernet II, Src: 10:00:00:10:ca:fe, Dst: ff:ff:ff:ff:ff:ff
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: 10:00:00:10:ca:fe
    Sender IP address: 172.16.10.101
    Target MAC address: 00:00:00:00:00:00
    Target IP address: 172.16.10.102
Capture 1-1: ARP request from host Cafe.

ARP suppression is enabled for VNI 10000 on both Leaf switches. Since Leaf-101 knows the MAC address of host Abba (learned via BGP), it answers the ARP request by sending an ARP reply as a unicast straight to host Café (capture 1-2).

Ethernet II, Src: 10:00:00:10:ab:ba, Dst: 10:00:00:10:ca:fe
Address Resolution Protocol (reply)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (2)
    Sender MAC address: 10:00:00:10:ab:ba
    Sender IP address: 172.16.10.102
    Target MAC address: 10:00:00:10:ca:fe
    Target IP address: 172.16.10.101
Capture 1-2: ARP reply from Leaf-101.
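
The ARP suppression decision can be pictured as a simple table lookup. The sketch below is a hypothetical model, not the switch implementation: the cache is a plain dictionary that stands in for the MAC/IP bindings learned via BGP, and the function name handle_arp_request is an assumption.

```python
def handle_arp_request(target_ip, suppression_cache):
    """Decide how a Leaf switch handles an ARP request from a local host.

    Returns ("reply", mac) when the target is known from the cache, so the
    switch can answer locally, and ("flood", None) otherwise.
    """
    mac = suppression_cache.get(target_ip)
    if mac is None:
        return ("flood", None)
    return ("reply", mac)


# MAC/IP binding of host Abba, learned by Leaf-101 from a BGP MAC-IP route.
cache_on_leaf101 = {"172.16.10.102": "1000.0010.abba"}

known_target = handle_arp_request("172.16.10.102", cache_on_leaf101)
unknown_target = handle_arp_request("172.16.10.200", cache_on_leaf101)
```

This also explains why host Café can resolve the MAC address of host Abba even though the data path between the Leaf switches is still broken at this point: the reply is generated locally from control-plane information.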



Now host Café has resolved the MAC address of host Abba and sends an ICMP request towards it. Leaf-101 receives the ICMP request and makes a forwarding decision based on the L2 RIB, where the next-hop incorrectly points to Spine-11 (example 1-23).

Leaf-101# sh l2route mac all

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops     
----------- -------------- ------ ------------- ---------- ----------------
10          1000.0010.abba BGP    SplRcv        0          192.168.100.11
10          1000.0010.cafe Local  L,            0          Eth1/3        
77          5e00.0002.0007 VXLAN  Rmac          0          192.168.100.11
Example 1-23: L2 RIB on Leaf-101.

Leaf-101 encapsulates the frame and sets the outer destination IP address to 192.168.100.11 (capture 1-3). Spine-11 does not act as a VTEP in this fabric, so when it receives the packet it cannot process it and drops it.

Ethernet II, Src: 1e:af:01:01:1e:11, Dst: c0:8e:00:11:1e:11
Internet Protocol Version 4, Src: 192.168.50.101, Dst: 192.168.100.11
User Datagram Protocol, Src Port: 54810, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10000
    Reserved: 0
Ethernet II, Src: Private_10:ca:fe (10:00:00:10:ca:fe), Dst: Private_10:ab:ba (10:00:00:10:ab:ba)
Internet Protocol Version 4, Src: 172.16.10.101, Dst: 172.16.10.102
Internet Control Message Protocol
Capture 1-3: forwarded frame by Leaf-101.
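
To make the VXLAN header in the capture more concrete, the following Python sketch packs and unpacks the basic 8-byte header layout defined in RFC 7348 (8 flag bits, 24 reserved bits, a 24-bit VNI, 8 reserved bits). The function names are hypothetical, and the Group Policy ID field shown in the capture is left at zero.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value


def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # First 32 bits: flags byte followed by 24 reserved bits.
    # Second 32 bits: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header):
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8


header = pack_vxlan_header(10000)
decoded_vni = unpack_vni(header)
```

A receiver has to be able to map the decoded VNI to a local bridge domain. The header itself says nothing about which device should decapsulate the packet; that is determined by the outer destination IP address.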




Figure 1-4: ICMP process.



In order to fix this, Spine-11 has to send L2VPN EVPN BGP Updates without modifying the next-hop path attribute. First, a route-map that leaves the next-hop unchanged is defined. This route-map is then applied in the outbound direction to both L2VPN EVPN neighbors (example 1-24).

route-map DO-NOT-MODIFY-NH permit 10
  set ip next-hop unchanged
!
router bgp 65099
  router-id 192.168.0.11
  address-family ipv4 unicast
    network 192.168.100.11/32
    network 192.168.238.0/29
  address-family l2vpn evpn
    nexthop route-map DO-NOT-MODIFY-NH
    retain route-target all
  neighbor 10.101.11.101
    remote-as 65000
    description ** BGP Underlay to Leaf-101 **
    address-family ipv4 unicast
      disable-peer-as-check
  neighbor 10.102.11.102
    remote-as 65000
    description ** BGP Underlay to Leaf-102 **
    address-family ipv4 unicast
      disable-peer-as-check
  neighbor 192.168.100.101
    remote-as 65000
    description ** BGP Overlay to Leaf-101 **
    update-source loopback100
    ebgp-multihop 2
    address-family l2vpn evpn
      disable-peer-as-check
      send-community
      send-community extended
      route-map DO-NOT-MODIFY-NH out
  neighbor 192.168.100.102
    remote-as 65000
    description ** BGP Overlay to Leaf-102 **
    update-source loopback100
    ebgp-multihop 2
    address-family l2vpn evpn
      disable-peer-as-check
      send-community
      send-community extended
      route-map DO-NOT-MODIFY-NH out
Example 1-24: Spine-11 bgp final configuration.
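
The effect of the route-map can be summarized with a short Python sketch. It is a simplified model with hypothetical names: by default an eBGP speaker sets itself as the next-hop when it forwards an update to another eBGP peer, while the "unchanged" setting preserves the next-hop set by the originator.

```python
def advertised_next_hop(received_next_hop, own_address, next_hop_unchanged=False):
    """Next-hop that an eBGP speaker places in an update it forwards.

    Simplified model (assumption): the speaker rewrites the next-hop to its
    own address unless it is told to leave the next-hop unchanged.
    """
    if next_hop_unchanged:
        return received_next_hop
    return own_address


LEAF101_NVE = "192.168.50.101"   # next-hop set by Leaf-101 (NVE loopback)
SPINE11_PEER = "192.168.100.11"  # Spine-11 overlay peering address

# Default eBGP behavior: Leaf-102 would build its NVE peer towards Spine-11.
default_nh = advertised_next_hop(LEAF101_NVE, SPINE11_PEER)

# With the route-map applied: Leaf-102 builds its NVE peer towards Leaf-101.
fixed_nh = advertised_next_hop(LEAF101_NVE, SPINE11_PEER, next_hop_unchanged=True)
```

Since the Leaf switches take the NVE peer address from this next-hop, the value returned here corresponds to the Peer-Ip shown by the NVE peer output.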

Now, Leaf-101 learns MAC/IP routes with correct next-hop information.

Leaf-101# sh bgp l2vpn evpn 1000.0010.abba
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.0.101:32777    (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216, version 395
Paths: (1 available, best #1)
Flags: (0x000212) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path, in rib
             Imported from 192.168.0.102:32777:[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
  AS-Path: 65099 65000 , path sourced external to AS
    192.168.50.102 (metric 0) from 192.168.100.11 (192.168.0.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8

  Path-id 1 not advertised to any peer
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.abba]:[32]:[172.16.10.102]/272, version 369
Paths: (1 available, best #1)
Flags: (0x000212) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path, in rib
             Imported from 192.168.0.102:32777:[2]:[0]:[0]:[48]:[1000.0010.abba]:[32]:[172.16.10.102]/272
  AS-Path: 65099 65000 , path sourced external to AS
    192.168.50.102 (metric 0) from 192.168.100.11 (192.168.0.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0002.0007

  Path-id 1 not advertised to any peer

Route Distinguisher: 192.168.0.102:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216, version 394
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path
             Imported to 1 destination(s)
  AS-Path: 65099 65000 , path sourced external to AS
    192.168.50.102 (metric 0) from 192.168.100.11 (192.168.0.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8

  Path-id 1 not advertised to any peer
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.abba]:[32]:[172.16.10.102]/272, version 367
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path
             Imported to 3 destination(s)
  AS-Path: 65099 65000 , path sourced external to AS
    192.168.50.102 (metric 0) from 192.168.100.11 (192.168.0.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0002.0007

  Path-id 1 not advertised to any peer

Route Distinguisher: 192.168.0.101:3    (L3VNI 10077)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.abba]:[32]:[172.16.10.102]/272, version 370
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: external, path is valid, is best path
             Imported from 192.168.0.102:32777:[2]:[0]:[0]:[48]:[1000.0010.abba]:[32]:[172.16.10.102]/272
  AS-Path: 65099 65000 , path sourced external to AS
    192.168.50.102 (metric 0) from 192.168.100.11 (192.168.0.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0002.0007

  Path-id 1 not advertised to any peer
Example 1-25: BGP table on Leaf-101 concerning host Abba.

The NVE peer information (example 1-26) as well as the L2 routing information in the L2 RIB (example 1-27) are now correct.

Leaf-102# sh nve peer detail
Details of nve Peers:
----------------------------------------
Peer-Ip: 192.168.50.101
    NVE Interface       : nve1
    Peer State          : Up
    Peer Uptime         : 01:03:58
    Router-Mac          : 5e00.0000.0007
    Peer First VNI      : 10000
    Time since Create   : 01:03:58
    Configured VNIs     : 10000,10077
    Provision State     : peer-add-complete
    Learnt CP VNIs      : 10000,10077
    vni assignment mode : SYMMETRIC
    Peer Location       : N/A
Example 1-26: NVE peer information on Leaf-102.

Leaf-101# sh l2route mac all

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops     
----------- -------------- ------ ------------- ---------- ----------------
10          1000.0010.abba BGP    SplRcv        0          192.168.50.102
10          1000.0010.cafe Local  L,            0          Eth1/3        
77          5e00.0002.0007 VXLAN  Rmac          0          192.168.50.102
Example 1-27: L2 RIB on Leaf-101.

Host Café is now able to ping host Abba (example 1-28).

Cafe#ping 172.16.10.102
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.10.102, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 19/25/33 ms
Example 1-28: ping from host Cafe to host Abba.

As a final verification, capture 1-4 shows that the ICMP packets are sent inside VXLAN encapsulation to 192.168.50.102.

Ethernet II, Src: 1e:af:01:01:1e:11, Dst: c0:8e:00:11:1e:11
IPv4, Src: 192.168.50.101, Dst: 192.168.50.102
User Datagram Protocol, Src Port: 59959, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10000
    Reserved: 0
Ethernet II, Src: 10:00:00:10:ca:fe, Dst: 10:00:00:10:ab:ba
IPv4, Src: 172.16.10.101, Dst: 172.16.10.102
Internet Control Message Protocol
Capture 1-4: VXLAN encapsulated ICMP packets.


Author: Toni Pasanen CCIE#28158
Published: 3.5.2019
Updated: 
-------------------------------------------------

3 comments:

  1. hi Toni,
    please allow me to ask one question here:
    on spine switches command "retain route-target" is used as Spine11 does now have any vrf configured so L2VPN EVPN NLRIs installed into BGP table or RIB.

    while when using BGP or ospf as underlay network protocol, we do not use this command and in these cases Spine 11 does not have EVPN instance as well. why do spine11 in these case forward EVPN NLRIs from leaf101 to Leaf102 without command "retain route-target"?

    Regards
    Michael

    1. I just changed the sentence. Does it now make more sense?

    2. it makes perfect sentences now !
      Thanks Toni.
      Michael
