Monday 19 November 2018

VXLAN Part XIV: Control Plane Operation in BGP EVPN VXLAN Fabric

You can now also download my VXLAN book from Leanpub.com:

"Virtual Extensible LAN VXLAN - A Practical guide to VXLAN Solution Part 1" (373 pages)

The focus of this post is Control Plane operation in a VXLAN fabric. First, we are going to see how the local switch Leaf-101 learns the MAC address and IP address information of host Beef and installs it into its databases. Then, we are going to see how Leaf-101 advertises the information to remote Leaf-102 by using BGP EVPN. After that, we are going to see how the remote switch Leaf-102 receives the BGP EVPN Update and imports the routes into the MAC-VRF and from there into its databases. Note that on Leaf-101 VLAN 10 is attached to VNI 10000, while on Leaf-102 VLAN 20 is attached to the same VN-segment.


Figure 14-1: IP- and MAC addressing and VLAN-to-VN-segment mapping.


L2VNI MAC-only and MAC-IP learning process

Let’s first consider what information Leaf switches need in order to forward Intra-VNI Unicast traffic over the VXLAN fabric. Figure 14-2 gives an overview of the MAC and MAC-IP learning process from the local Leaf and remote Leaf point of view. The MAC learning process starts when host Beef comes up and connects to Leaf-101. Beef sends a GARP message to announce its existence to the network and to verify the uniqueness of its IP address. The L2FWDER component of local Leaf-101 notices the incoming frame and stores the MAC address information in the MAC address-table. From the MAC address-table, the information is installed into the L2RIB as a MAC-only entry. The MAC-IP information is installed into the L2RIB as a MAC-IP entry by the Host Mobility Manager (HMM) component, which also installs a host route into the L3RIB. From the L2RIB, the MAC-IP information is installed into the ARP Suppression-Cache (ARP suppression is enabled in our example). The MAC-IP information is also installed into the local ARP table. To summarize the local Leaf-101 perspective: the MAC-only information is stored in the MAC address-table and the L2RIB, while the MAC-IP information is installed into the ARP table, the L2RIB, and the ARP Suppression-Cache.

After the local learning process, the MAC-only and MAC-IP information are advertised to remote Leaf-102 as two separate BGP EVPN Route-Type 2 advertisements. The MAC-only advertisement carries the BGP Extended Community Route-Target (RT) 65000:10000, while the MAC-IP advertisement carries the additional RT 65000:10077. On Leaf-102, the routes are imported into the corresponding MAC-VRF, where the MAC-only route is installed into the L2RIB and from there into the MAC address-table of Leaf-102. The MAC-IP route is installed into the L2RIB as a MAC-IP entry, and from there into the ARP Suppression-Cache, but not into the local ARP table. The /32 host route is also installed into the L3 routing table of the VRF. In addition, there is a BGP Recursive Next Hop database, which describes how the next-hop address received in the BGP EVPN Update can be reached.

The same thing happens when host Abba comes up and connects to Leaf-102.

Figure 14-2: MAC and MAC/IP learning process overview.
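
The overview above can be condensed into a small lookup table. The following is only a summary sketch of the description in this section; the dictionary layout and the helper name `tables_for` are illustrative assumptions and not anything NX-OS exposes.

```python
# Which tables hold information about host Beef, per the overview above.
# Keys and the helper name are illustrative, not NX-OS terminology.
TABLES = {
    "local (Leaf-101)": {
        "mac-only": ["MAC address-table", "L2RIB"],
        "mac-ip": ["ARP table", "L2RIB", "ARP Suppression-Cache"],
        "host-route": ["L3RIB (installed by HMM)"],
    },
    "remote (Leaf-102)": {
        "mac-only": ["L2RIB", "MAC address-table"],
        # The remote Leaf does not install the binding into its ARP table.
        "mac-ip": ["L2RIB", "ARP Suppression-Cache"],
        "host-route": ["L3RIB (learned via BGP)"],
    },
}


def tables_for(role: str, kind: str) -> list[str]:
    """Return the tables that hold the given kind of entry on the given Leaf."""
    return TABLES[role][kind]


print(tables_for("remote (Leaf-102)", "mac-ip"))
```

The point to notice is the asymmetry: the MAC-IP binding reaches the ARP table only on the Leaf where the host is locally connected.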

Next, we look at the MAC-only and MAC-IP learning process in more detail.

MAC address table

In the beginning, Leaf-101 and Leaf-102 do not have any information about directly connected hosts. We start by pinging from hosts Abba and Beef to the Anycast Gateway (AGW) 192.168.11.1, which triggers the MAC learning process. Our focus is to examine how the MAC and MAC-IP information of host Beef (MAC: 1000.0010.beef/IP: 192.168.11.12) is learned, from the perspective of both the local switch Leaf-101 and the remote switch Leaf-102.

In example 14-1, we can see that Leaf-101 has learned the MAC address of host Beef via interface E1/2, which belongs to VLAN 10. The default MAC address aging time is 1800 seconds.

Leaf-101# show system internal l2fwder mac
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*     1    5e00.0007.0000   dynamic   00:03:18   F     F     Eth1/2 
*    10    1000.0010.beef   dynamic   00:01:53   F     F     Eth1/2 
*     1    5e00.0006.0000   dynamic   00:00:21   F     F     Eth1/2 
G    77    5e00.0000.0007    static   -          F     F   sup-eth1(R)
*    10    1000.0020.abba    static   -          F     F  (0x47000001) nve-peer1 192.168 
G    20    5e00.0000.0007    static   -          F     F   sup-eth1(R)
G    10    5e00.0000.0007    static   -          F     F   sup-eth1(R)
G     1    5e00.0000.0007    static   -          F     F   sup-eth1(R)
    1           1         -00:01:00:01:00:01         -             1

Example 14-1: MAC address table of Leaf-101.

In example 14-2, we can verify that the VLAN 10 is attached to VNI 10000.

Leaf-101# show vlan id 10 vn-segment


VLAN Segment-id
---- -----------
10   10000     
Example 14-2: Leaf-101 EVPN instance VLAN mapping.

At this phase, the MAC address-table of local switch Leaf-101 is updated.


Figure 14-3: Leaf-101 MAC address tables.

L2 Routing Information Base (L2RIB) – MAC-only entry

In example 14-3, we can see that the VLAN, MAC address, and next-hop information concerning host Beef is installed locally by the L2FWDER component into the L2RIB of EVPN Instance 10000 (also called MAC-VRF). The L2RIB exists because BGP advertises only routes that can be found in a local RIB; with BGP EVPN we also advertise L2 routes, and these are taken from the L2RIB. Note that the show output does not display the actual VN-segment, but it can be verified with the command “show vlan id 10 vn-segment”, as shown in example 14-2.

Leaf-101# show l2route evpn mac evi 10

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops     
----------- -------------- ------ ------------- ---------- ----------------
10          1000.0010.beef Local  L,            0          Eth1/2        
10          1000.0020.abba BGP    SplRcv        0          192.168.100.102
Example 14-3: L2 Routing Information Base (L2RIB) MAC-Only

At this phase, the MAC address-table and L2RIB of local switch Leaf-101 are updated.

Figure 14-4: MAC address tables and L2RIB

L2 Routing Information Base (L2RIB) – MAC-IP entry

In addition to advertising the MAC-only information of host Beef to remote Leaf-102, Leaf-101 is also going to advertise MAC-IP information. The HMM component installs the MAC-IP information into L2RIB of VRF.

In example 14-4, we can see the HMM table.

Leaf-101# show fabric forwarding ip local-host-db vrf TENANT77

HMM host IPv4 routing table information for VRF TENANT77
Status: *-valid, x-deleted, D-Duplicate, DF-Duplicate and frozen,
        c-cleaned in 00:06:13

    Host                 MAC Address        SVI        Flags      Physical Interface
*   192.168.11.12/32     1000.0010.beef     Vlan10     0x420201   Ethernet1/2
Example 14-4: VRF TENANT77 HMM information


In example 14-5, we can see that the HMM table information is installed into the L2RIB. We can also see that the information is sent to BGP.

Leaf-101# sh l2route evpn mac-ip evi 10 detail
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated
Topology    Mac Address    Prod   Flags         Seq No     Host IP         Next-Hops     
----------- -------------- ------ ---------- --------------- ---------------
10          1000.0010.beef HMM    --            0          192.168.11.12  Local         
            Sent To: BGP
            L3-Info: 10077
10          1000.0020.abba BGP    --            0          192.168.11.22  192.168.100.102
            Sent To: ARP
Example 14-5: L2 Routing Information Base (L2RIB) – MAC-IP

Now the MAC address-table, L2RIB (MAC-only and MAC-IP), and HMM table of local switch Leaf-101 are updated.

Figure 14-5: MAC, L2RIB MAC-only, HMM and L2RIB MAC-IP.

ARP-table, ARP Suppression-Cache, and L3RIB

ARP-table information is learned from the GARP message sent by host Beef. In NX-OS, the default aging time for locally learned ARP entries is 1500 seconds, which is 300 seconds shorter than the MAC address aging timer. When the ARP aging timer expires, the switch checks the presence of the host by sending an ARP request to it. If the host responds to the ARP request, the switch resets the aging timer. If the host does not reply, the entry is removed from the ARP table but kept in the BGP EVPN table for 1800 seconds (the MAC aging timer) before the withdraw message is sent. The MAC address aging timer should be longer than the ARP aging timer, because the ARP refresh process also updates the MAC address-table, which avoids unnecessary flooding.

Leaf-101# sh ip arp vrf TENANT77

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context TENANT77
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
192.168.11.12   00:00:44  1000.0010.beef  Vlan10        
Example 14-6: ARP-table
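
The timer relationship described before Example 14-6 can be written out as simple arithmetic. This is a rough sketch that takes the description literally and assumes the 1800 seconds are counted from the moment the ARP entry is removed; the function names are hypothetical and the real NX-OS timing may differ in detail.

```python
ARP_AGING_S = 1500  # default ARP aging time quoted in the text
MAC_AGING_S = 1800  # default MAC address aging time quoted in the text


def silent_host_timeline(arp_aging: int = ARP_AGING_S,
                         mac_aging: int = MAC_AGING_S) -> dict:
    """Seconds after the last ARP refresh at which each event happens,
    assuming the host never answers the refresh ARP request."""
    arp_removed = arp_aging
    # Assumption: the route stays in BGP EVPN for one MAC aging period more.
    bgp_withdraw = arp_removed + mac_aging
    return {"arp_entry_removed": arp_removed, "bgp_withdraw_sent": bgp_withdraw}


def arp_refresh_precedes_mac_expiry(arp_aging: int = ARP_AGING_S,
                                    mac_aging: int = MAC_AGING_S) -> bool:
    """True when the ARP refresh fires before the MAC entry would age out."""
    return arp_aging < mac_aging


print(silent_host_timeline())
print(arp_refresh_precedes_mac_expiry())
```

With the default values the ARP refresh fires 300 seconds before the MAC entry would expire, which is the margin the text recommends keeping.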

The ARP Suppression-Cache entry is based on the MAC-IP information found in the L2RIB.

Leaf-101# sh ip arp suppression-cache detail

Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

192.168.11.12   00:03:41 1000.0010.beef   10 Ethernet1/2         L
192.168.11.22   01:09:01 1000.0020.abba   10 (null)              R        192.168.100.102
Example 14-7: ARP Suppression-Cache.

The host route is also installed into L3RIB of VRF by HMM component.

Leaf-101# show ip route 192.168.11.12 vrf TENANT77
IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.11.12/32, ubest/mbest: 1/0, attached
    *via 192.168.11.12, Vlan10, [190/0], 01:03:22, hmm
Leaf-101#
Example 14-8: L3 Routing Information Base (L3RIB) – host route/32.

At this point, the MAC address-table, L2RIB (MAC-only and MAC-IP), HMM table, ARP table, ARP Suppression-Cache, and L3RIB of local switch Leaf-101 are updated. Leaf-101 is now able to switch Intra-VNI (L2VNI) traffic as well as route Inter-VNI (L3VNI) traffic to and from host Beef.

Figure 14-6: MAC, L2RIB MAC-only, HMM, L2RIB MAC-IP, ARP-table, ARP Suppression-Cache, L3RIB.

Local leaf BGP EVPN Instance/MAC-VRF

The MAC-only and MAC-IP routing information is installed from the L2RIB into the BGP EVPN EVI 10000 table as two individual BGP routes (BGP EVPN Route-Type = [2]). In example 14-9 we can see that both routes carry the BGP Extended Community Route-Target 65000:10000, which is used for the Intra-VNI BGP export/import policy (L2VNI 10000). This RT value is configured as an auto-generated RT under EVPN VNI 10000. “Received label” defines the VN-segment Id. ENCAP:8 means that packets destined to the host must use VXLAN encapsulation.
The MAC-IP route has two additional BGP Extended Communities attached to it. RT 65000:10077 is used for the Inter-VNI BGP export/import policy (L3VNI 10077); this RT value is derived from the VRF Context configuration. The Router MAC (RMAC) is used in VXLAN encapsulation: VXLAN is a MAC-over-IP/UDP encapsulation, and the RMAC is used as a source/destination MAC address in the inner Ethernet header of VXLAN-encapsulated packets.

Note! The usage and generation mechanism of Route-Targets and Route Distinguishers are explained in detail in VXLAN Part VII: VXLAN Control Plane Operation.

In the route key, the value in square brackets in front of the MAC address, [48], is the length of the MAC address in bits, and the value in square brackets in front of the IP address is the length of the IP address mask ([0] for MAC-only and [32] for MAC-IP). The last value, /216 or /272, is the bit count of the whole entry.

Leaf-101# show bgp l2vpn evpn 1000.0010.beef
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.101:32777    (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216, version 60
Paths: (1 available, best #1)
Flags: (0x000102) on xmit-list, is not in l2rib/evpn

  Advertised path-id 1
  Path type: local, path is valid, is best path
  AS-Path: NONE, path locally originated
    192.168.100.101 (metric 0) from 0.0.0.0 (192.168.77.101)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8

  Path-id 1 advertised to peers:
    192.168.77.11 
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 4
Paths: (1 available, best #1)
Flags: (0x000102) on xmit-list, is not in l2rib/evpn

  Advertised path-id 1
  Path type: local, path is valid, is best path
  AS-Path: NONE, path locally originated
    192.168.100.101 (metric 0) from 0.0.0.0 (192.168.77.101)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007

  Path-id 1 advertised to peers:
    192.168.77.11 
Example 14-9: BGP entry about host Beef NLRI information on Leaf-101.
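
The route-key notation seen in Example 14-9 can be split into its fields with a small parser. This is a sketch for reading the show output, not part of any Cisco tooling. The names `Type2Key` and `parse_type2_key` are hypothetical, and labelling the second and third fields as ESI and Ethernet Tag is an assumption, since the text above only explains the route type, the two length fields, and the trailing bit count.

```python
import re
from typing import NamedTuple


class Type2Key(NamedTuple):
    route_type: int
    esi: str        # assumed meaning of the second field
    eth_tag: str    # assumed meaning of the third field
    mac_len: int    # length of the MAC address in bits
    mac: str
    ip_len: int     # 0 for MAC-only, 32 for MAC-IP with IPv4
    ip: str
    total_bits: int


_KEY_RE = re.compile(
    r"\[(\d+)\]:\[([^\]]*)\]:\[([^\]]*)\]:\[(\d+)\]:"
    r"\[([0-9a-fA-F.]+)\]:\[(\d+)\]:\[([^\]]+)\]/(\d+)"
)


def parse_type2_key(text: str) -> Type2Key:
    """Parse a key such as [2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216."""
    match = _KEY_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a Route-Type 2 key: {text!r}")
    rtype, esi, tag, mac_len, mac, ip_len, ip, bits = match.groups()
    return Type2Key(int(rtype), esi, tag, int(mac_len), mac, int(ip_len), ip, int(bits))


def is_mac_only(key: Type2Key) -> bool:
    """A MAC-only route carries an IP length of zero."""
    return key.ip_len == 0


key = parse_type2_key("[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272")
print(key.mac, key.ip, is_mac_only(key))
```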

Figure 14-7: BGP EVPN EVI 10000 table on Leaf-101.

Remote leaf BGP EVPN Instance/MAC-VRF

Remote Leaf-102 receives both the MAC-only and MAC-IP BGP EVPN Route-Type 2 updates sent by Leaf-101. In figure 14-8, we can see that the MAC-only route is imported from the BGP L2VPN EVPN address-family table into the BGP EVPN Instance 10000 BRIB. This RT 65000:10000 based BGP import policy is configured under the EVPN Instance 10000 VN-segment configuration.


Figure 14-8: BGP EVPN EVI 10000 table on Leaf-102 (MAC-Only).

In figure 14-9, we can see that the MAC-IP route is also imported from the BGP L2VPN EVPN address-family table into the BGP EVPN Instance 10000 BRIB.

Figure 14-9: BGP EVPN EVI 10000 table on Leaf-102 (MAC-IP).

In example 14-10, under Comment#1, we can see the original BGP EVPN Updates installed into the BGP address-family L2VPN EVPN table. There we can see both Route-Type 2 BGP EVPN Updates, MAC-only and MAC-IP, concerning host Beef. The Route-Distinguisher in the original update is 192.168.77.101:32777 (Leaf-101 BGP RID:32767 + VLAN Id 10).

Under Comment#2, we can see that both the MAC-only and MAC-IP routes are imported into the BGP EVPN Instance 10000 (L2VNI) BRIB. If we take a look at the RD, we can see that when Leaf-102 imports the routes from the original update into the BGP EVPN Instance 10000 BRIB, the RD is changed to correspond to the Leaf-102 definitions. The BGP RID is changed to 192.168.77.102 (the BGP RID of Leaf-102), and 32777 is changed to 32787, since Leaf-102 uses VLAN Id 20 locally for VNI 10000 (32767 + 20). This gives the Route-Distinguisher 192.168.77.102:32787.
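
The RD arithmetic is easy to reproduce. The sketch below follows the "BGP RID:32767 + VLAN Id" rule stated above and uses the router IDs shown in the show output of Example 14-10; the function name `l2vni_rd` is a made-up helper for illustration.

```python
L2VNI_RD_BASE = 32767  # base value used in the text for L2VNI RDs


def l2vni_rd(router_id: str, vlan_id: int) -> str:
    """Build the L2VNI Route Distinguisher as <BGP RID>:<32767 + local VLAN Id>."""
    return f"{router_id}:{L2VNI_RD_BASE + vlan_id}"


print(l2vni_rd("192.168.77.101", 10))  # 192.168.77.101:32777 (Leaf-101, VLAN 10)
print(l2vni_rd("192.168.77.102", 20))  # 192.168.77.102:32787 (Leaf-102, VLAN 20)
```

Because the local VLAN Id is part of the value, the same VNI 10000 ends up with a different RD number on each Leaf.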

Both of these route imports are based on RT 65000:10000. The MAC-only route is used for switching, whereas the MAC-IP route is used to generate the ARP Suppression-Cache entry so that Leaf-102 is able to answer ARP requests on behalf of Beef. The ARP Suppression-Cache is used only when ARP suppression is enabled, as it is in our case.

Note! ARP suppression, if used, should be enabled per VN-segment only after all the initial testing has been done.

The output after Comment#3 shows the route import into BGP EVPN Instance 10077 (L3VNI). This information is used for Inter-VNI routing inside the VRF. Note that IP prefix information for the VRF is sent as BGP EVPN Route-Type 5 (IP Prefix) advertisements. The Route Distinguisher is based on the BGP RID:VRF Id combination. The VRF Id can be verified by using the “show vrf” command. This import policy is based on the RT export/import configured under the VRF Context.


Leaf-102# show bgp l2vpn evpn 1000.0010.beef
BGP routing table information for VRF default, address family L2VPN EVPN
#----> Comment#1: BGP table of address-family L2VPN EVPN <--------
Route Distinguisher: 192.168.77.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216, version 73
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 1 destination(s)
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 4
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 3 destination(s)
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer
#----> Comment#2 BGP EVPN Instance 10000 BRIB (MAC-Only and MAC-IP) <--------
Route Distinguisher: 192.168.77.102:32787    (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216, version 74
Paths: (1 available, best #1)
Flags: (0x000212) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, in rib
             Imported from 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 5
Paths: (1 available, best #1)
Flags: (0x000212) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, in rib
             Imported from 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer
#----> Comment#3 L3 NLRI for VRF 10077  <--------
Route Distinguisher: 192.168.77.102:3    (L3VNI 10077)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 6
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported from 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

Leaf-102#
Example 14-10: BGP entry about host Beef NLRI information on Leaf-102.
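
The RT-based import decisions behind Example 14-10 can be sketched as a simple match between the Route-Targets carried by a route and the import RTs of each local table. This is a simplified model and not the NX-OS implementation; `EvpnRoute`, `IMPORT_POLICY`, and `import_destinations` are hypothetical names, and the sketch does not try to reproduce the "Imported to N destination(s)" counters in the output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvpnRoute:
    key: str
    route_targets: frozenset


# Import RT per local table on Leaf-102 (table names are illustrative).
IMPORT_POLICY = {
    "L2VNI 10000 (MAC-VRF)": "65000:10000",
    "L3VNI 10077 (VRF TENANT77)": "65000:10077",
}


def import_destinations(route: EvpnRoute) -> list[str]:
    """Return the local tables whose import RT matches one of the route's RTs."""
    return [table for table, rt in IMPORT_POLICY.items() if rt in route.route_targets]


mac_only = EvpnRoute(
    "[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216",
    frozenset({"65000:10000"}),
)
mac_ip = EvpnRoute(
    "[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272",
    frozenset({"65000:10000", "65000:10077"}),
)

print(import_destinations(mac_only))
print(import_destinations(mac_ip))
```

The MAC-only route matches only the L2VNI policy, while the MAC-IP route matches both, which is why it also appears under the L3VNI 10077 section of the output.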

The BGP EVPN Instance 10000 BRIB can be seen in example 14-11.

Leaf-102# show bgp l2vpn evpn vni-id 10000
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 70, Local Router ID is 192.168.77.102
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 192.168.77.102:32787    (L2VNI 10000)
*>i[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216
                      192.168.100.101                   100          0 i
*>l[2]:[0]:[0]:[48]:[1000.0020.abba]:[0]:[0.0.0.0]/216
                      192.168.100.102                   100      32768 i
*>i[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272
                      192.168.100.101                   100          0 i
*>l[2]:[0]:[0]:[48]:[1000.0020.abba]:[32]:[192.168.11.22]/272
                      192.168.100.102                   100     
Example 14-11: BGP BRIB of VNI 10000 on Leaf-102.


L2RIB and MAC address-table on remote Leaf-102

In example 14-12, we can see that the MAC-only information is installed into the L2RIB from the BGP EVPN Instance BRIB. The topology identifier 20 is the local VLAN Id for VNI 10000 on Leaf-102. The Next-Hop is Leaf-101 (192.168.100.101), the switch to which host Beef is locally connected.

Leaf-102# show l2route evpn mac evi 20

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops     
----------- -------------- ------ ------------- ---------- ----------------
20          1000.0010.beef BGP    SplRcv        0          192.168.100.101
20          1000.0020.abba Local  L,            0          Eth1/2        
Example 14-12: L2RIB on Leaf-102.

The MAC-only information is installed into the MAC address-table from the L2RIB.

Leaf-102# show system internal l2fwder mac
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*     1    5e00.0005.0000   dynamic   00:04:36   F     F     Eth1/2 
G    77    5e00.0004.0007    static   -          F     F   sup-eth1(R)
*     1    5e00.0007.0000   dynamic   00:02:28   F     F     Eth1/2 
G    20    5e00.0004.0007    static   -          F     F   sup-eth1(R)
G     1    5e00.0004.0007    static   -          F     F   sup-eth1(R)
*    20    1000.0010.beef    static   -          F     F  (0x47000001) nve-peer1 192.168 
*    20    1000.0020.abba   dynamic   00:01:30   F     F     Eth1/2 
    1           1         -00:01:00:01:00:01         -             1
Example 14-13: MAC-address table on Leaf-102.

In Figure 14-10, we can see that Leaf-102 has imported the MAC-only route of host Beef into the L2RIB, from where it is installed into the VLAN 20 MAC address-table.

Figure 14-10: BGP EVPN EVI 10000 table on Leaf-101.


L2RIB MAC-IP and ARP Suppression-Cache on remote Leaf-102

In example 14-14, we can see that the MAC-IP route is imported into the L2RIB from BGP with Leaf-101 as the Next-Hop.

Leaf-102# show l2route evpn mac-ip evi 20
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated
Topology    Mac Address    Prod   Flags         Seq No     Host IP         Next-Hops     
----------- -------------- ------ ---------- --------------- ---------------
20          1000.0010.beef BGP    --            0          192.168.11.12  192.168.100.101
20          1000.0020.abba HMM    --            0          192.168.11.22  Local         
Example 14-14: L2RIB MAC-IP information on Leaf-102.

In example 14-15, we can see that the L2RIB MAC-IP binding information is installed into the ARP Suppression-Cache. There is no aging timer in the ARP Suppression-Cache; entries are removed only when a BGP withdraw message is received from Leaf-101, which “owns” the route. Now, if the locally connected host Abba sends an ARP request to resolve the MAC-IP binding of host Beef, Leaf-102 is able to send the ARP reply itself, without forwarding the ARP request to all Leaf switches participating in VNI 10000. If the ARP Suppression-Cache lookup is a miss, Leaf-102 forwards the request to the other Leaf switches by using the VN-segment specific Multicast Group.

Leaf-102# sh ip arp suppression-cache detail

Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

192.168.11.22   00:04:37 1000.0020.abba   20 Ethernet1/2         L
192.168.11.12   01:10:11 1000.0010.beef   20 (null)              R        192.168.100.101
Example 14-15: ARP Suppression-Cache on Leaf-102.
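
The hit/miss behaviour of the ARP Suppression-Cache described before Example 14-15 can be sketched as a dictionary lookup. This is only a conceptual model using the two bindings from the output; the names `SUPPRESSION_CACHE` and `handle_arp_request` are hypothetical.

```python
# IP-to-MAC bindings as listed in the suppression cache of Leaf-102.
SUPPRESSION_CACHE = {
    "192.168.11.22": "1000.0020.abba",  # local host Abba
    "192.168.11.12": "1000.0010.beef",  # remote host Beef, learned via BGP EVPN
}


def handle_arp_request(target_ip: str, cache: dict = SUPPRESSION_CACHE) -> str:
    """Decide what the Leaf does with an ARP request from a local host."""
    mac = cache.get(target_ip)
    if mac is not None:
        # Cache hit: the Leaf answers on behalf of the target host.
        return f"reply locally: {target_ip} is-at {mac}"
    # Cache miss: the request is flooded to the other Leaf switches.
    return "miss: forward ARP request to the VN-segment multicast group"


print(handle_arp_request("192.168.11.12"))
print(handle_arp_request("192.168.11.99"))
```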

In example 14-16, we can see that the MAC-IP binding information of host Beef is not stored in the ARP table.

Leaf-102# sh ip arp vrf TENANT77

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context TENANT77
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
192.168.11.22   00:02:00  1000.0020.abba  Vlan20         
Example 14-16: ARP table of Leaf-102.


Figure 14-11: MAC-IP and ARP Suppression-Cache on Leaf-102.

L3RIB of VRF on remote Leaf-102

For Inter-VNI routing, Leaf-102 installs the route from the BGP table into the VRF RIB (L3VNI). Example 14-17 shows that the route to 192.168.11.12 (Beef) is learned via iBGP, that it is reachable through the VXLAN tunnel with id 0xc0a86465, and that L3VNI Id 10077 must be used in the VXLAN header.

Leaf-102# show ip route 192.168.11.12 vrf TENANT77
IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.11.12/32, ubest/mbest: 1/0
    *via 192.168.100.101%default, [200/0], 01:02:30, bgp-65000, internal, tag 65000 (evpn) segid: 10077 tunnelid: 0xc0a86465 encap: VXLAN

Example 14-17: L3RIB host route to Beef on Leaf-102.
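
A side note on the tunnel id in Example 14-17: read as a 32-bit number, 0xc0a86465 is the same value as the IPv4 next-hop address shown in the route. The short sketch below only demonstrates that arithmetic; that NX-OS always derives the tunnel id this way is an assumption, and `tunnel_id_to_ip` is a hypothetical helper.

```python
import ipaddress


def tunnel_id_to_ip(tunnel_id: int) -> str:
    """Interpret a 32-bit tunnel id as a dotted-decimal IPv4 address."""
    return str(ipaddress.IPv4Address(tunnel_id))


print(tunnel_id_to_ip(0xC0A86465))  # 192.168.100.101
```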

Example 14-18 shows the BGP Recursive Next Hop (RNH) database, where we can see that L3VNI routed traffic towards Next-Hop 192.168.100.101 uses the system MAC address of Leaf-101 (NVE1 interface MAC).

Leaf-102# show nve internal bgp rnh database
--------------------------------------------
Total peer-vni msgs recvd from bgp: 2
Peer add requests: 2
Peer update requests: 0
Peer delete requests: 0
Peer add/update requests: 2
Peer add ignored (peer exists): 0
Peer update ignored (invalid opc): 0
Peer delete ignored (invalid opc): 0
Peer add/update ignored (malloc error): 0
Peer add/update ignored (vni not cp): 0
Peer delete ignored (vni not cp): 0
--------------------------------------------
Showing BGP RNH Database, size : 2 vni 0

Flag codes: 0 - ISSU Done/ISSU N/A        1 - ADD_ISSU_PENDING        
            2 - DEL_ISSU_PENDING          3 - UPD_ISSU_PENDING
       

VNI    Peer-IP            Peer-MAC            Tunnel-ID  Encap     (A/S)  Flags
10000  192.168.100.101    0000.0000.0000      0x0        vxlan     (1/0)    0
10077  192.168.100.101    5e00.0000.0007      0xc0a86465 vxlan     (1/0)    0

Leaf-102#
Example 14-18: BGP RNH DB of Leaf-102.

Figure 14-12: Information needed for Inter-VNI Unicast forwarding.

As a last verification step, we check the status of the tunnel between Leaf-101 and Leaf-102. We can see that the peer was created an hour ago and has been up ever since. We also see the VN-segments used with the NVE peer.

Leaf-102# show nve peers detail
Details of nve Peers:
----------------------------------------
Peer-Ip: 192.168.100.101
    NVE Interface       : nve1
    Peer State          : Up
    Peer Uptime         : 01:00:50
    Router-Mac          : 5e00.0000.0007
    Peer First VNI      : 10000
    Time since Create   : 01:00:50
    Configured VNIs     : 10000,10077,20000,30000
    Provision State     : peer-add-complete
    Learnt CP VNIs      : 10000,10077
    vni assignment mode : SYMMETRIC
    Peer Location       : N/A

Leaf-102#
Example 14-19: NVE peer details on Leaf-102.


Figure 14-13: Information needed for Inter-VNI Unicast forwarding.

This post introduced the Control Plane operation of a BGP EVPN VXLAN fabric. To keep the post reasonably short, I decided not to include Data Plane operation here; I am going to write a dedicated post focused on it.

Author: Toni Pasanen CCIE#28158
Published: 19.11.2018
References: Building Data Centers with VXLAN BGP EVPN – A Cisco NX-OS Perspective
ISBN-10: 1-58714-467-0 – Lukas Krattiger, Shyam Kapadia, and David Jansen

12 comments:

  1. Excellent write up. The control-plane components nicely demystified.

  2. Hi Toni,
    Recently I deployed a small NSX VXLAN system and noticed an interesting point:
    Say a host in VLAN A on one VTEP wishes to speak to VLAN B on another VTEP, and the host in VLAN B is a silent host, so there is no record of it in the ARP table, the ARP suppression table, or the MAC table.
    hostA--------vlanA---VTEP1-----------------------VTEP2-----------vlanB------------hostB(silent)
    In this case, NSX will use a hybrid method (unicast and multicast) to find VLAN B's details.

    What about this case in a Cisco VXLAN network?
    Following the normal logic, when hostA wishes to speak to hostB, it will first notice that hostB is on a different network and send the packet to the default gateway (VTEP). The VTEP will check the routing table (at this stage, there is no record of hostB).
    Here is the part that confuses me:
    1. If the VTEP has VLAN B (hostB), the VTEP can flood the ARP request and then follow the VXLAN process (this is my assumption).
    2. What if there is no VLAN B on this VTEP? What will happen?

    Michael

    Replies
    1. Hi Michael,
      you can find the answers in the next part, XV, which is a more detailed explanation of control plane and data plane operation. At the end of the post there are examples of both solutions:
      a) VTEP-1-VLAN10+VLAN20-------VTEP-2-VLAN10+VLAN20
      b) VTEP-1-VLAN10-------VTEP-2-VLAN20

    2. My apologies, Toni. It seems I was in too much of a hurry to ask these questions. Thanks again for your kindness.
      Another question comes up for b):
      how does VTEP2 know that VTEP1 does not have VLAN 20, so that it will forward type 5 to VTEP1?
      Or should I ask whether there is some kind of sync mechanism in EVPN recording who has which VLAN?

      Regards
      Michael

  3. Hi Toni,

    have you considered the aging time issue:
    hostA-----vlanA-----VTEP1-------------VTEP2---------vlanB---------hostB
    Say hostB leaves the network, and after a certain time VTEP2 removes hostB from its MAC table. After this, will VTEP2 use EVPN to inform VTEP1 of the same?
    If yes, will VTEP2 periodically send updates to all other VTEPs at each aging time, or just send out such a message when a host leaves the network?

    Your Sincerely
    Michael

    Replies
    1. I copied this one from the part XV:

      "Example 1-16 shows the ARP table of VRF TENANT77. The default aging time for locally learned ARP entries in NX-OS is 1500 seconds, which is 300 seconds shorter than the MAC address aging timer. When the ARP aging timer expires, the switch checks the presence of the host by sending an ARP request to the host. If the host responds to the ARP request, the switch resets the aging timer. If the host does not reply, the entry is removed from the ARP table but kept in the BGP EVPN table for an additional 1800 seconds (the MAC aging timer) before the withdraw message is sent. The MAC address aging timer should be longer than the ARP aging timer. This is because the ARP refresh process also updates the MAC table, so unnecessary flooding can be avoided."

      Cheers - Toni
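
The timing described in the quote above can be worked through with a small Python sketch. It only adds up the two default timers mentioned in the quote (1500 seconds for ARP, 1800 seconds for MAC); the function name and the returned keys are made up for illustration and are not NX-OS terms.

```python
# Default timers taken from the quoted text (NX-OS defaults).
ARP_AGING_S = 1500   # ARP aging timer, in seconds
MAC_AGING_S = 1800   # MAC address aging timer, in seconds


def silent_host_timeline(last_seen_s: int = 0) -> dict:
    """Return when each event happens for a host that never answers again.

    last_seen_s is the time (in seconds) at which the host last refreshed
    its ARP entry. The key names are hypothetical labels for this sketch.
    """
    # The ARP timer expires, the switch probes the host with an ARP request,
    # gets no reply, and removes the ARP entry.
    arp_entry_removed = last_seen_s + ARP_AGING_S
    # The entry stays in the BGP EVPN table for one more MAC aging period,
    # and only then is the withdraw sent.
    evpn_withdraw_sent = arp_entry_removed + MAC_AGING_S
    return {
        "arp_entry_removed": arp_entry_removed,
        "evpn_withdraw_sent": evpn_withdraw_sent,
    }


timeline = silent_host_timeline()
print(timeline)
```

With the default timers, the withdraw for a host that has gone silent is sent about 3300 seconds after its last ARP refresh. The message is triggered by the entry being removed, not by a periodic refresh.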

    2. Dear Toni,
      I kept searching for this on the Internet and found the following explanation for my confusion. Please correct me if I am wrong.
      When the EVPN neighborship is built, type 3 is sent out to tell the peer which L2 VNIs the sender has, so the peer knows whether to use type 2 or type 5 for updates.
      Say:
      vlan10---vlan20----vtep1---------------vtep2---------vlan20
      When vtep1 and vtep2 form the EVPN neighborship, vtep2 will send out an update with NLRI type 3 telling vtep1 it has vlan 10 and 20. So vtep1 knows vtep2 only has vlan 20, and when it advertises vlan 20 hosts to vtep2, vtep1 will use type 2, and when it advertises vlan 10 hosts to vtep2, it will use type 5.

      Cheers
      Michael

    3. Hi Michael,

      Route-Type 3 (Inclusive Multicast Route) is used for building an L2 BUM distribution tree over a non-multicast Underlay Network. It is not needed when the Underlay Network is multicast-enabled.
      If we have the topology:
      V10--V20------V20.
      VTEP-B will send Route-Type 2 advertisements about the MAC/IP addresses known via VLAN 20 to VTEP-A.
      This information is then received by VTEP-A and imported into the BGP table based on the L2VNI RT.
      VTEP-B does not have VLAN 10, so it will not import VLAN 10 specific information into the BGP table based on the L2VNI RT.
      However, both VTEPs participate in the same VRF Context, which uses the same Route-Target and, let us say, VLAN 100 for Inter-VN routing.
      So when VTEP-A advertises a MAC/IP address known via VLAN 10, it also attaches the L3VNI-specific Route-Target, which is imported by VTEP-B as well.
      This way both VTEPs can route traffic between VLAN 10 and VLAN 20.
      Cheers - Toni
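
The import logic in this reply can be modelled with a few lines of Python. This is only a sketch of the Route-Target matching idea, not how NX-OS implements it: the RT values, the MAC and IP addresses, and the names `MacIpRoute` and `import_decision` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical Route-Target values, chosen only for this illustration.
L2_RT_VLAN10 = "65000:10010"   # L2VNI RT for VLAN 10
L2_RT_VLAN20 = "65000:10020"   # L2VNI RT for VLAN 20
L3_RT_VRF = "65000:10077"      # L3VNI RT shared by the VRF on both VTEPs


@dataclass(frozen=True)
class MacIpRoute:
    """Simplified Route-Type 2 MAC-IP advertisement."""
    mac: str
    ip: str
    route_targets: frozenset


def import_decision(route, l2_import_rts, l3_import_rts):
    """Decide where a receiving VTEP imports the route, based on RT overlap."""
    return {
        # L2 import needs a matching L2VNI RT (the VLAN exists locally).
        "l2_import": bool(route.route_targets & l2_import_rts),
        # L3 import needs a matching L3VNI RT (the VRF exists locally).
        "l3_import": bool(route.route_targets & l3_import_rts),
    }


# VTEP-A advertises a VLAN 10 host with both the L2VNI and the L3VNI RT.
vlan10_host = MacIpRoute(
    mac="0000.0000.0010",
    ip="192.168.10.10",
    route_targets=frozenset({L2_RT_VLAN10, L3_RT_VRF}),
)

# VTEP-B has only VLAN 20 locally, plus the shared VRF.
vtep_b = import_decision(
    vlan10_host,
    l2_import_rts=frozenset({L2_RT_VLAN20}),
    l3_import_rts=frozenset({L3_RT_VRF}),
)
print(vtep_b)  # {'l2_import': False, 'l3_import': True}
```

In this model VTEP-B ignores the route at Layer 2 because it has no VLAN 10, but still learns the host through the shared L3VNI Route-Target, which is what allows routing between VLAN 10 and VLAN 20.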

  4. Hello all,

    In my production environment I observed that the same RT is assigned to both the Route-Type 2 MAC and MAC-IP routes.
    My question is: in what situation does the switch assign the same RT to both Type 2 routes (MAC and MAC-IP)?
    What is the harm or benefit of having the same or different RTs in a BGP EVPN L2 route scenario?
    Thanks and regards

    Replies
    1. The MAC-only route is for switching, while the MAC-IP route is for ARP. I have explained that in my previous post https://nwktimes.blogspot.com/2018/05/vxlan-part-vii-vxlan-bgp-evpn-control.html. MAC and MAC-IP NLRIs can be sent within one BGP Update or as separate Updates. There is no reason why they should have different RTs, because in the end both describe the reachability of the same endpoint.

    2. Thanks, Toni, for the explanation. So you mean that in some cases I will see different RTs and in some cases the same RT, and it does not matter, correct? Can I specify the L3VNI through which my Type 2 routes are forwarded?
      For example, I have multiple VLANs/SVIs (L2VNIs) and multiple VRFs (L3VNIs) on my leaf switch. I want some VLANs to use one VRF's L3VNI and the rest of the VLANs to use the other VRF's L3VNI when forwarding Type 2 routes from one leaf to another.
      Thanks for the support and explanation.


