Friday, 28 December 2018

VXLAN Part XV: Analysis of the BGP EVPN Control Plane Operation

Document Status: Unfinished
Edited: Monday, 7 January 2019

This chapter covers the following topics:

MAC address learning process (Intra-VNI switching): This section describes how the local VTEP switch learns the MAC addresses of its directly connected hosts from ingress frames and how the L2 forwarding component (L2FWDER) installs the information into the MAC VRF in the Layer 2 Routing Information Base (L2RIB). This section also shows how the local VTEP switch advertises the MAC address information to the remote VTEP switch by using the BGP EVPN Route Type 2 advertisement (MAC Advertisement Route) and how the remote VTEP switch installs the information into the MAC VRF in the L2RIB and from there into the MAC address table. Intra-L2VNI (switching) Data Plane operation is explained at the end of the section with various frame capture examples. The white “MAC line” represents these processes in figure 1-1.

MAC-IP address learning process (ARP for Intra-VNI switching and ): This section gives a detailed description of how the local VTEP switch learns the IP addresses of its locally connected hosts from the ARP messages generated by the hosts and how the Host Mobility Manager component (HMM) installs the information into the IP VRF. This section also shows how the local VTEP switch advertises the IP address information to the remote VTEP switch by using the BGP EVPN Route Type 2 (MAC Advertisement Route) advertisement and how the remote VTEP switch installs this information into the IP VRF in the L2RIB as well as into the L3RIB of VRF TENANT77. In addition, this section explains how the ARP Suppression mechanism uses the MAC-IP binding information to reduce BUM (Broadcast, Unknown Unicast, and Multicast) traffic in the VXLAN Fabric. The grey “IP line” represents these processes in figure 1-1.


Prefix advertisement: This section covers how the local VTEP switch redistributes its Anycast Gateway (AGW) subnets into BGP and advertises this information to the remote VTEP switch by using the BGP EVPN Route Type 5 (IP Prefix Route) advertisement. It also explains how the information is used to discover silent hosts and how the remote VTEP installs the route from BGP into its local L3RIB. The black “Prefix line” represents these processes in figure 1-1.

Figure 1-1: BGP EVPN Control Plane Operational Overview.



MAC Address Learning Process (Intra-VNI Switching)



Overview

Phase 1: MAC Address Table on Local VTEP


Virtual machine vmBeef comes up. It announces its existence to the network and validates the uniqueness of its IP address by sending a Gratuitous ARP (GARP). VTEP switch Leaf-101 receives the GARP message on interface e1/2 and stores the MAC address from the Source MAC address field of the Ethernet header in the MAC address table of VLAN 10.


Phase 2: MAC VRF on Local VTEP


The L2FWDER component notices the new MAC address learned from interface e1/2. L2FWDER then installs the MAC address into the MAC VRF (also called an EVPN Instance, EVI) located in the L2 Routing Information Base (L2RIB) of VRF TENANT77. The MAC VRF in L2RIB contains the MAC address and the source port as well as the topology Id (= VLAN Id). The Flag field of the learned entry is marked with the Local flag (locally learned MAC address).
Why do we have two nearly identical L2 databases in VTEP switches (MAC address table vs. MAC VRF)? A route can be sent to BGP only if it is in the RIB. In addition, routes from BGP can be installed into the RIB but not directly into the MAC address table.



Phase 3: BGP MAC Route Export on Local VTEP

VTEP switch Leaf-101 exports the MAC route from the L2RIB into the BGP Loc-RIB, from where it is sent through the Output Policy Engine to Adj-RIB-Out (Pre). From Adj-RIB-Out (Pre), the route is installed through the policy into Adj-RIB-Out (Post) with the Path Attributes based on the BGP peer type (iBGP/eBGP/RR-Client). From Adj-RIB-Out (Post), the MAC Advertisement Route (Route Type 2) Update message is sent to Spine-11 (Route-Reflector). The RR Spine-11 forwards the message to its RR-Client Leaf-102. Figure 1-1 illustrates the whole process, while the following figures are simplified and show Adj-RIB-In/Out as one entity without the Pre/Post sub-databases.

In addition to the MAC address and Next Hop information, the NLRI includes the Route Distinguisher (RD), which acts as a kind of prefix. The RD of a MAC route is formed from the BGP RID of the sending VTEP switch and a number derived from the VLAN Id that the MAC address belongs to (32767 + VLAN Id in this example). In Leaf-101, the RD value 192.168.77.101:32777 is attached to all outgoing MAC route advertisements concerning VLAN 10. Spine switches use the RD to differentiate possibly overlapping MAC/IP information (Spine switches are not L2VNI/VRF aware).
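The RD values seen in this chapter are consistent with an RD of the form RID:(32767 + VLAN Id). The following is a minimal Python sketch of that arithmetic. The function name `auto_rd` and the constant `L2VNI_RD_BASE` are illustrative names introduced here, not switch output.

```python
# Sketch: L2VNI Route Distinguisher built as "BGP RID : (32767 + VLAN Id)".
# The base value 32767 is inferred from the RDs shown in this chapter
# (VLAN 10 -> 32777 on Leaf-101, VLAN 20 -> 32787 on Leaf-102).
L2VNI_RD_BASE = 32767  # assumption, see comment above


def auto_rd(bgp_rid: str, vlan_id: int) -> str:
    """Return the RD string for a MAC VRF bound to the given VLAN."""
    return f"{bgp_rid}:{L2VNI_RD_BASE + vlan_id}"


print(auto_rd("192.168.77.101", 10))  # Leaf-101, VLAN 10
print(auto_rd("192.168.77.102", 20))  # Leaf-102, VLAN 20
```

Because the VLAN Id is only locally significant, two VTEPs that map different VLANs to the same L2VNI end up with different RD numbers for the same MAC VRF.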

There is also an MPLS Label Stack 1 field in the NLRI, which carries the L2VNI identifier. Leaf-101 local VLAN 10 is mapped to VNI 10000 (= MPLS Label Stack 1: 10000). The VNI is used in the VXLAN header in the Data Plane.

The Update message carries two BGP Extended Community attributes. The first one, the Route-Target attribute, is used by VTEP switches for the route export/import policy. The second one, the Encapsulation type, defines the encapsulation used in the Data Plane (Type 8 = VXLAN).
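To make the two Extended Communities concrete, the sketch below decodes the 16 bytes that follow the attribute header in the hex dump of Capture 1-1. It is a small illustration that only handles the two community types used in this chapter; the function name `decode_ext_community` is hypothetical.

```python
import struct

# The two 8-byte extended communities as they appear in the hex dump of Capture 1-1.
RAW = bytes.fromhex("0002fde800002710" "030c000000000008")


def decode_ext_community(chunk: bytes) -> str:
    """Decode the two community types used in this chapter; others are returned as hex."""
    type_, subtype = chunk[0], chunk[1]
    if (type_, subtype) == (0x00, 0x02):      # Transitive 2-octet AS-specific, Route Target
        asn, number = struct.unpack("!HI", chunk[2:8])
        return f"RT:{asn}:{number}"
    if (type_, subtype) == (0x03, 0x0C):      # Transitive Opaque, Encapsulation
        (tunnel_type,) = struct.unpack("!H", chunk[6:8])
        return f"ENCAP:{tunnel_type}"
    return chunk.hex()


communities = [decode_ext_community(RAW[i:i + 8]) for i in range(0, len(RAW), 8)]
print(communities)
```

The decoded strings match the `Extcommunity: RT:65000:10000 ENCAP:8` line that the switch prints for this route.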


Phase 4: BGP AFI L2EVPN MAC Route Import on Remote VTEP

VTEP switch Leaf-102 receives the MAC Advertisement Route and installs it into the Adj-RIB-In (Pre) database without modification. Routes are imported into Adj-RIB-In (Post) based on the EVPN import policies. During this import process, the received RD is replaced with the RD defined under the local EVPN Instance. Routes moved into Adj-RIB-In (Post) are then run through the BGP Best Path decision process, and the best route is installed into the Loc-RIB.


Phase 5: MAC VRF on Remote VTEP

From the Loc-RIB, the route information is imported into the L2RIB (MAC VRF). Based on the L2VNI Id carried in the MPLS Label Stack 1 field, the MAC route is installed into the MAC VRF with topology Id 20 (VLAN 20). The source of the information is BGP. The next-hop information points to the NVE1 interface IP address of the remote VTEP switch Leaf-101.


Phase 6: MAC Address Table on Remote VTEP

As the last step, the L2FWDER component of the remote VTEP Leaf-102 installs the MAC reachability information from the MAC VRF into its VLAN 20 MAC address table. The Next-Hop points to the NVE1 interface of Leaf-101.

Now both Leaf-101 and Leaf-102 have up-to-date reachability information for the MAC address of host vmBeef in their databases, and they are able to send frames to vmBeef.
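The six phases above can be summarized with a toy model. This is a deliberately simplified sketch, not how NX-OS is implemented: the class `Vtep` and its methods are hypothetical, the route reflector is left out, and both the RD form (RID:32767 + VLAN Id) and the Route-Target form (65000:VNI) are assumptions taken from the values shown in this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class Vtep:
    name: str
    rid: str                 # BGP router id
    nve_ip: str              # NVE1 source address
    vlan_to_vni: dict        # local VLAN -> L2VNI
    mac_table: dict = field(default_factory=dict)   # (vlan, mac) -> port or peer
    l2rib: dict = field(default_factory=dict)       # (topology, mac) -> (producer, next hop)

    def learn_local(self, vlan, mac, port):
        """Phases 1-3: MAC table, L2RIB (Local), then a Route Type 2 style dict."""
        self.mac_table[(vlan, mac)] = port
        self.l2rib[(vlan, mac)] = ("Local", port)
        vni = self.vlan_to_vni[vlan]
        return {"rd": f"{self.rid}:{32767 + vlan}", "mac": mac, "vni": vni,
                "next_hop": self.nve_ip, "rt": f"65000:{vni}"}

    def import_route(self, route):
        """Phases 4-6: RT match, RD rewrite, L2RIB (BGP), then MAC table."""
        for vlan, vni in self.vlan_to_vni.items():
            if route["rt"] == f"65000:{vni}":
                imported = dict(route, rd=f"{self.rid}:{32767 + vlan}")
                self.l2rib[(vlan, route["mac"])] = ("BGP", route["next_hop"])
                self.mac_table[(vlan, route["mac"])] = f"nve-peer {route['next_hop']}"
                return imported
        return None  # no matching import policy: route stays out of the MAC VRF


leaf101 = Vtep("Leaf-101", "192.168.77.101", "192.168.100.101", {10: 10000})
leaf102 = Vtep("Leaf-102", "192.168.77.102", "192.168.100.102", {20: 10000})

update = leaf101.learn_local(10, "1000.0010.beef", "Eth1/2")
imported = leaf102.import_route(update)   # the route reflector is omitted here
print(update["rd"], "->", imported["rd"])
print(leaf102.l2rib)
```

The model keeps the MAC address table and the L2RIB as separate dictionaries on purpose, mirroring the point made in Phase 2 that BGP exchanges routes with the RIB and not with the MAC address table directly.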


BGP EVPN Route type 2: MAC advertisement
Figure 1-3: BGP EVPN Control Plane Operational MAC advertisement.

 
Monitoring

Phase 1: MAC Address Table on Local VTEP

Example 1-1 shows the MAC address table of the local VTEP switch Leaf-101. The MAC address 1000.0010.beef is located behind port e1/2 and belongs to VLAN 10. The default MAC entry aging time is 1800 seconds.

Leaf-101# show system internal l2fwder mac
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*    10    1000.0010.beef   dynamic   00:03:27   F     F     Eth1/2
Example 1-1: show system internal l2fwder mac


Phase 2: MAC VRF on Local VTEP

Example 1-2 illustrates how the L2FWDER component notices the new MAC address entering from interface e1/2 (interface index 0x1a000200). Example 1-3 shows how the L2RIB is updated, and example 1-4 verifies the if_index to interface mapping. The received frame has an 802.1Q tag, where the VLAN Id is set to 10. Based on the VLAN Id, the L2FWDER component is able to install the MAC reachability information into the right MAC VRF. Example 1-5 verifies the VLAN to VNI topology mapping. Example 1-6 illustrates the actual content of the MAC VRF in L2RIB.

Leaf-101# show system internal l2fwder event-history events | i beef
l2fwder_dbg_ev, 690 l2fwder_vxlan_mac_update, 886MAC move 1000.0010.beef (10) 0x0 -> 0x1a000200

l2fwder_dbg_ev, 690 l2fwder_l2rib_add_delete_local_mac_routes, 154Adding route  topo-id: 10, macaddr: 1000.0010.beef, nhifindx: 0x1a000200

l2fwder_dbg_ev, 690 l2fwder_l2rib_mac_update, 736MAC move 1000.0010.beef (10) 0x0 -> 0x1a000200

l2fwder_construct_and_send_macmv_ntf_per_cookie, 5261 mac 1000.0010.beef vlan 1 new if_index = 1a000200, old if_index = 0, is_del=0
Example 1-2: show system internal l2fwder event-history events | i beef

Example 1-3 shows, from top to bottom, how the L2RIB is updated.

Leaf-101# sh system internal l2rib event-history mac | i beef

Rcvd MAC ROUTE msg: (10, 1000.0010.beef), vni 0, admin_dist 0, seq 0, soo 0,

(10,1000.0010.beef):Mobility check for new rte from prod: 3

(10,1000.0010.beef):Current non-del-pending route local:no, remote:no, linked mac-ip count:1

(10,1000.0010.beef):Clearing routelist flags: Del_Pend,

(10,1000.0010.beef,3):Is local route. is_mac_remote_at_the_delete: 0

(10,1000.0010.beef,3):MAC route created with seq 0, flags L, (),

(10,1000.0010.beef,3): soo 0, peerid 0, pc-ifindex 0

(10,1000.0010.beef,3):Encoding MAC best route (ADD, client id 5)

(10,1000.0010.beef,3):vni:10000 rt_flags:L, admin_dist:6, seq_num:0 ecmp_label:0 soo:0(--)

(10,1000.0010.beef,3):res:Regular esi:(F) peerid:0 nve_ifhdl:1224736769 mh_pc_ifidx:0 nh_count:1

(10,1000.0010.beef,3):NH[0]:Eth1/2

Example 1-3: show system internal l2rib event-history mac | i beef


Example 1-4 shows that the if-index 0x1a000200 points to interface e1/2.

Leaf-101# show interface snmp-ifindex | i 0x1a000200

Eth1/2          436208128  (0x1a000200)

Example 1-4: show interface snmp-ifindex | i 0x1a000200

Example 1-5 shows that the VLAN 10 is attached to L2VNI 10000.
Leaf-101# show vlan id 10 vn-segment


VLAN Segment-id
---- -----------
10   10000   

Example 1-5: show vlan id 10 vn-segment

Example 1-6 illustrates the MAC VRF entry in the L2RIB of L2VNI 10000.

Leaf-101#  show l2route evpn mac evi 10

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops     
----------- -------------- ------ ------------- ---------- ----------------
10          1000.0010.beef Local  L,            0          Eth1/2        

Example 1-6: show l2route evpn mac evi 10

Phase 3: BGP MAC route processing on Local VTEP

Example 1-7 shows how the BGP process of the local VTEP switch Leaf-101 receives the MAC route sent from L2RIB. Leaf-101 installs the MAC route into the BGP Loc-RIB with the information required for the BGP EVPN Route Type 2 advertisement (L2VNI identifier, Route-Target, and Encapsulation type). The bit count /112 at the end of the address is the sum of the bits of the RD (8 octets) and the MAC address (6 octets): 14 octets = 112 bits.

Leaf-101# sh bgp internal event-history events | i beef

BRIB: [L2VPN EVPN] Installing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 (local) via 192.168.100.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 ENCAP:8

EVT: Received from L2RIB MAC route: Add ESI 0000.0000.0000.0000.0000 topo 10000 mac 1000.0010.beef flags 0x000002 soo 0 seq 0 reorig: 0

Example 1-7: sh bgp internal event-history events | i beef


Example 1-8 shows the BGP Loc-RIB entry concerning the NLRI of vmBeef. The address fields of the BGP entry are explained below:

§  Route Distinguisher 192.168.77.101:32777
§  [2] - BGP EVPN Route-Type 2, MAC/IP Advertisement Route
§  [0] - Ethernet Segment Identifier (ESI), all zeroed out = single homed site
§  [0] - Ethernet Tag Id, EVPN routes must use value 0
§  [48] - Length of MAC address
§  [1000.0010.beef] - MAC address
§  [0] - Length of IP address
§  [0.0.0.0] - Carried IP address
§  /216 - Length of the MAC VRF NLRI in bits: RD (8 octets) + MAC address (6 octets) + L2VNI Id (3 octets) + ESI (10 octets) = 27 octets = 216 bits.
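The two prefix lengths (/112 in the event-history log, /216 in the BGP table) and the Length field of the NLRI on the wire can be cross-checked with simple arithmetic. The sketch below follows the field breakdown given in the text. The variable names are illustrative.

```python
# Field sizes in octets for an EVPN Route Type 2 NLRI without an IP address.
RD, ESI, ETH_TAG, MAC_LEN, MAC, IP_LEN, LABEL = 8, 10, 4, 1, 6, 1, 3

event_history_bits = (RD + MAC) * 8              # /112 in the event-history log
bgp_table_bits = (RD + MAC + LABEL + ESI) * 8    # /216 in "show bgp l2vpn evpn"
wire_length = RD + ESI + ETH_TAG + MAC_LEN + MAC + IP_LEN + LABEL  # "Length: 33" in Capture 1-1

print(event_history_bits, bgp_table_bits, wire_length)
```

The three numbers count different subsets of the same fields, which is why the same route shows up with different lengths depending on where it is displayed.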

The L2VNI information is shown in the Received Label field. There are also two BGP Extended Community Path Attributes:

§  Route-Target: 65000:10000 - Used for export/Import policy (Control Plane)
§  Encapsulation 8: Defines the encapsulation type VXLAN (Data Plane).

Leaf-101# show bgp l2vpn evpn 1000.0010.beef
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.101:32777    (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216, version 28
Paths: (1 available, best #1)
Flags: (0x000102) on xmit-list, is not in l2rib/evpn

  Advertised path-id 1
  Path type: local, path is valid, is best path
  AS-Path: NONE, path locally originated
    192.168.100.101 (metric 0) from 0.0.0.0 (192.168.77.101)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8

  Path-id 1 advertised to peers:
    192.168.77.11 
<- Comment: For the simplicity, the MAC-IP entry removed from this output->

Example 1-8: show bgp l2vpn evpn 1000.0010.beef

Capture 1-1 shows the BGP EVPN Update message sent by Leaf-101. Note that the Next Hop address and the MPLS Label Stack (L2VNI Id) are only visible in the HEX portion of the capture. The decoded value 625 for MPLS Label Stack 1 comes from reading the field as a 20-bit MPLS label (0x00271) instead of a 24-bit VNI:
Next Hop:               HEX c0 a8 64 65 = 192.168.100.101
MPLS Label Stack 1:     HEX 00 27 10 = 10000 (L2VNI Id)
                  
Border Gateway Protocol - UPDATE Message
    Type: UPDATE Message (2)
    Path attributes
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: empty
        Path Attribute - LOCAL_PREF: 100
        Path Attribute - EXTENDED_COMMUNITIES
            Type Code: EXTENDED_COMMUNITIES (16)
            Carried extended communities: (2 communities)
                Route Target: 65000:10000 [Transitive 2-Octet AS-Specific]
                    Type: Transitive 2-Octet AS-Specific (0x00)
                    Subtype (AS2): Route Target (0x02)
                    2-Octet AS: 65000
                    4-Octet AN: 10000
                Encapsulation: VXLAN Encapsulation [Transitive Opaque]
                    Type: Transitive Opaque (0x03)
                    Subtype (Opaque): Encapsulation (0x0c)
                    Tunnel type: VXLAN Encapsulation (8)
        Path Attribute - MP_REACH_NLRI
            Flags: 0x90, Optional, Extended-Length, Non-transitive, Complete
            Length: 44
            Address family identifier (AFI): Layer-2 VPN (25)
            Subsequent address family identifier (SAFI): EVPN (70)
            Next hop network address (4 bytes)
            Number of Subnetwork points of attachment (SNPA): 0
            Network layer reachability information (35 bytes)
                EVPN NLRI: MAC Advertisement Route
                  Route Type: MAC Advertisement Route (2)
                  Length: 33
                  Route Distinguisher: 0001c0a84d658009 (192.168.77.101:32777)
                  ESI: 00 00 00 00 00 00 00 00 00
                        ESI Type: ESI 9 bytes value (0)
                        ESI 9 bytes value: 00 00 00 00 00 00 00 00 00
                  Ethernet Tag ID: 0
                  MAC Address Length: 48
                  MAC Address: Private_10:be:ef (10:00:00:10:be:ef)
                  IP Address Length: 0
                  IP Address: NOT INCLUDED
                  MPLS Label Stack 1: 625, (BOGUS: Bottom of Stack NOT set!)

0000  5e 00 00 01 00 07 5e 00 00 00 00 07 08 00 45 c0   ^.....^.......E.
0010  00 9c 74 69 00 00 40 06 e9 71 c0 a8 4d 65 c0 a8   ..ti..@..q..Me..
0020  4d 0b 66 ea 00 b3 52 ff 40 6e f4 86 72 ab 80 18   M.f...R.@n..r...
0030  0e 42 7f 75 00 00 01 01 08 0a 00 0f 04 b0 00 0f   .B.u............
0040  02 c5 ff ff ff ff ff ff ff ff ff ff ff ff ff ff   ................
0050  ff ff 00 68 02 00 00 00 51 40 01 01 00 40 02 00   ...h....Q@...@..
0060  40 05 04 00 00 00 64 c0 10 10 00 02 fd e8 00 00   @.....d.........
0070  27 10 03 0c 00 00 00 00 00 08 90 0e 00 2c 00 19   '............,..
0080  46 04 c0 a8 64 65 00 02 21 00 01 c0 a8 4d 65 80   F...de..!....Me.
0090  09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 30   ...............0
00a0  10 00 00 10 be ef 00 00 27 10                     ........'.

Capture 1-1: BGP EVPN Update concerning the MAC address of vmBeef
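As a cross-check of the hex dump in Capture 1-1, the following sketch parses the last 35 bytes of the Update (the EVPN NLRI) by hand. The field offsets follow the Route Type 2 layout shown in the decoded part of the capture. The function name `parse_type2` is hypothetical, and the parser only handles a route without an IP address.

```python
import ipaddress

# EVPN NLRI bytes copied from the hex dump of Capture 1-1 (route type, length, then 33 octets).
NLRI = bytes.fromhex(
    "0221"                      # Route Type 2, Length 33
    "0001c0a84d658009"          # Route Distinguisher (type 1: IPv4 address + number)
    "00000000000000000000"      # ESI (10 octets)
    "00000000"                  # Ethernet Tag ID
    "30" "10000010beef"         # MAC length (48 bits) and MAC address
    "00"                        # IP length (0 = no IP address carried)
    "002710"                    # MPLS Label 1 field, carries the L2VNI
)


def parse_type2(nlri: bytes) -> dict:
    route_type, length = nlri[0], nlri[1]
    body = nlri[2:2 + length]
    rd_ip = ipaddress.IPv4Address(body[2:6])
    rd_num = int.from_bytes(body[6:8], "big")
    mac = body[23:29].hex()
    label_field = int.from_bytes(body[30:33], "big")
    return {
        "route_type": route_type,
        "rd": f"{rd_ip}:{rd_num}",
        "mac": ".".join(mac[i:i + 4] for i in range(0, 12, 4)),
        "vni": label_field,                 # VXLAN view: all 24 bits are the VNI
        "as_mpls_label": label_field >> 4,  # MPLS view: only the top 20 bits are the label
    }


route = parse_type2(NLRI)
print(route)
```

The `as_mpls_label` value reproduces the 625 printed by the dissector, while `vni` gives the 10000 that the switch reports as the label.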

Phase 4: BGP MAC Route Import on Remote VTEP

Example 1-9 shows partial output of the BGP Adj-RIB-In and BGP Loc-RIB tables of remote VTEP switch Leaf-102 concerning the NLRI of the vmBeef MAC address. The first part, after Comment-1, shows the update entry stored in Adj-RIB-In. The main difference compared to the BGP Loc-RIB entry of VTEP Leaf-101 is that Spine-11 (RR) has added the “Originator” (Leaf-101) and “Cluster list” (Spine-11) information to the Update message. The second part, after Comment-2, shows the BGP Loc-RIB entry imported from BGP Adj-RIB-In through the Policy Engine and the decision process. Comparing the NLRI between BGP Adj-RIB-In and BGP Loc-RIB shows that the only NLRI information changed during the import process is the Route Distinguisher. The IP address part is changed to the BGP RID of Leaf-102, and the latter part is changed from 32777 to 32787 because a different VLAN Id is attached to L2VNI 10000 in Leaf-102 (VLAN 10 in Leaf-101 and VLAN 20 in Leaf-102).


Leaf-102# show bgp l2vpn evpn 1000.0010.beef

<Comment-1: this BGP Adj-RIB-In received from Spine-11>
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216, version 277
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 1 destination(s)
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

<MAC-IP part Snipped>

<Comment-2: this BGP Loc-RIB Imported from BGP Adj-RIB-In>
Route Distinguisher: 192.168.77.102:32787    (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216, version 278
Paths: (1 available, best #1)
Flags: (0x000212) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, in rib
             Imported from 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

Example 1-9: show bgp l2vpn evpn 1000.0010.beef



Example 1-10 shows the BGP import process (partial output for simplicity). VTEP Leaf-102 receives the BGP EVPN Update and installs the route into BGP Adj-RIB-In. It validates the Next Hop, and the route is then imported into BGP Loc-RIB. From BGP Loc-RIB, the route is sent to L2RIB. The comments in the output are numbered in processing order, so the example is read from the bottom (Comment-1) to the top (Comment-3).

Leaf-102#  sh bgp internal event-history events | i beef

<Comment-3: Route is sent to L2RIB from BGP Loc-RIB>

RIB: [L2VPN EVPN]: Send to L2RIB 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112

RIB: [L2VPN EVPN] For 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112, added 1 next hops, suppress 0

RIB: [L2VPN EVPN] Adding 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 via 192.168.100.101 to NH list (flags2: 0x0)

RIB: [L2VPN EVPN] Add/delete 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112, flags=0x200, in_rib: no

IMP: [L2VPN EVPN] Created import destination entry for 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112

<Comment-2: Route is installed into BGP Loc-RIB from the BGP Adj-RIB-In>

IMP: [L2VPN EVPN] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 to <default> RD 192.168.77.102:32787

BRIB: [L2VPN EVPN] (192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 (192.168.77.11)): returning from bgp_brib_add, reeval=0new_path: 1, change: 1, undelete: 0, history: 0, force: 0, (pflags=0x40002010) rnh_flag_change 0

BRIB: [L2VPN EVPN] (192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 (192.168.77.11)): bgp_brib_add: handling nexthop, path->flags2: 0x80000

BRIB: [L2VPN EVPN] Created new path to 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 via 192.168.77.111 (pflags=0x40000000, pflags2=0x0)

<Comment-1: Route is installed into BGP Adj-RIB-In>

BRIB: [L2VPN EVPN] Installing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 (192.168.77.11) via 192.168.100.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 ENCAP:8

Example 1-10: sh bgp internal event-history events | i beef



Phase 5: MAC VRF on Remote VTEP

As shown in the previous example 1-10, the MAC route information is sent from BGP Loc-RIB to L2RIB. Example 1-11 shows the operation of L2FWDER, and example 1-12 shows the installation process. Example 1-13 verifies the VLAN to VNI topology mapping. Example 1-14 illustrates the actual content of the MAC VRF in L2RIB.
Leaf-102# show system internal l2fwder event-history events | i beef
l2fwder_dbg_ev, 690 l2fwder_l2rib_add_remote_entry, 299Add remote mac entry mac: 1000.0010.beef vni: 20 sw_bd 20 vtep ip: 192.168.100.101

l2fwder_dbg_ev, 690 l2fwder_l2rib_msg_cb, 453MAC address: 1000.0010.beef

Example 1-11: show system internal l2fwder event-history events | i beef

Example 1-12 shows, from top to bottom, how the L2RIB is updated.

Leaf-102# sh system internal l2rib event-history mac | i beef
Rcvd MAC ROUTE msg: (20, 1000.0010.beef), vni 0, admin_dist 0, seq 0, soo 0,

(20,1000.0010.beef):Mobility check for new rte from prod: 5

(20,1000.0010.beef):Current non-del-pending route local:no, remote:yes, linked mac-ip count:1

(20,1000.0010.beef):Mobility type: remote-to-remote:

(20,1000.0010.beef): New route ESI: (F), SOO: 0, Seq num: 0Existing route ESI: (F), SOO: 0, Seq num: 0 , rt_type: 1

(20,1000.0010.beef,5):Using seq number from Recv-based BGP route

(20,1000.0010.beef,5):Setting Recv flag

(20,1000.0010.beef,5):MAC route modified (rc=0) with seq num:0, flags: (SplRcv), soo:0, peerid:1, MH<truncated>

(20,1000.0010.beef,5):Encoding MAC route (ADD, client id 0)

(20,1000.0010.beef,5):vni:10000 rt_flags: admin_dist:20, seq_num:0 ecmp_label:0 soo:0(--)

(20,1000.0010.beef,5):res:Regular esi:(F) peerid:1 nve_ifhdl:1224736769 mh_pc_ifidx:0 nh_count:1

(20,1000.0010.beef,5):NH[0]:192.168.100.101

Example 1-12: sh system internal l2rib event-history mac | i beef
  
Example 1-13 shows that the VLAN 20 is attached to L2VNI 10000.
Leaf-102# sh vlan id 20 vn-segment

VLAN Segment-id
---- -----------
20   10000     

Example 1-13: sh vlan id 20 vn-segment

Example 1-14 shows the MAC VRF entry in L2RIB of L2VNI 10000.

Leaf-102# show l2route evpn mac evi 20

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops     
----------- -------------- ------ ------------- ---------- ----------------
20          1000.0010.beef BGP    SplRcv        0          192.168.100.101

Example 1-14: show l2route evpn mac evi 20

Phase 6: MAC Address Table on Remote VTEP

Example 1-15 shows the updated MAC address table of VTEP switch Leaf-102.

Leaf-102# show system internal l2fwder mac | i beef
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*    20    1000.0010.beef    static   -          F     F  (0x47000001) nve-peer1

Example 1-15: show system internal l2fwder mac | i beef


Data Plane testing

ARP Request/Reply
Both virtual machines, vmBeef and vmAbba, belong to the same subnet 192.168.11.0/24. VmBeef starts pinging vmAbba. VmBeef does not yet have the MAC address of host vmAbba in its ARP table, so it starts the address resolution process (figure 1-4). It sends an ARP-request message asking who has the IP address 192.168.11.22. The destination MAC address of the ARP request is the L2 Broadcast address ff:ff:ff:ff:ff:ff.

When Leaf-101 receives the frame on its port e1/2, it checks the VLAN Id in the 802.1Q tag. Based on it, Leaf-101 knows that the Broadcast frame belongs to, and has to be switched inside, local VLAN 10 and global L2VNI 10000. Leaf-101 removes the 802.1Q tag from the original Ethernet frame and adds the VXLAN header, UDP header, IP header, and outer Ethernet header.

The outer Ethernet header gets its source MAC address from the Leaf-101 NVE 1 interface, while the destination MAC address is derived from the destination Multicast Group IP address 238.0.0.10. The IP header destination address is 238.0.0.10, which is the Mcast group address used in VNI 10000 for BUM traffic. The source IP address is taken from the NVE 1 interface. The UDP destination port is 4789, which is reserved for VXLAN, and the UDP source port is derived from a hash of the inner frame. Note that, for traffic between the same pair of VTEPs, the UDP source port is the only field of the 5-tuple (destination IP/source IP, Layer 4 protocol, and source port/destination port) that varies, so it is what allows ECMP load balancing across equal-cost links. The Virtual Network Identifier (VNI) in the VXLAN header is taken from the VLAN to VNI database entry of VLAN 10 (VNI 10000). Leaf-101 forwards the packet out of all interfaces belonging to the Outgoing Interface List (OIL) of Mcast Group 238.0.0.10.
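The destination MAC address of the outer Ethernet header follows the standard IPv4 multicast mapping: the fixed prefix 01:00:5e followed by the low-order 23 bits of the group address. A minimal Python sketch (the function name `multicast_mac` is illustrative):

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC (01:00:5e + low 23 bits)."""
    low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    return ":".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))


print(multicast_mac("238.0.0.10"))
```

For group 238.0.0.10 this gives the same destination MAC address that appears in the outer Ethernet header of Capture 1-3.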

VTEP switch Leaf-102 receives the ARP-request sent by vmBeef. Leaf-102 removes the headers used for VXLAN tunneling (outer Ethernet header, IP header, UDP header, and VXLAN header). Based on the VNI-to-VLAN mapping database, Leaf-102 knows that it has to switch the received Broadcast Ethernet frame out of its interfaces participating in VLAN 20. Leaf-102 adds the 802.1Q tag with VLAN Id 20 to the frame and forwards it out of interface e1/2 towards vmAbba.
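The VLAN Id is only locally significant, while the VNI is what travels across the fabric. The sketch below illustrates the two lookups, using the VLAN-to-VNI mappings from Examples 1-5 and 1-13. The dictionary and function names are hypothetical.

```python
# VLAN-to-VNI mappings taken from Examples 1-5 and 1-13.
VLAN_TO_VNI = {
    "Leaf-101": {10: 10000},
    "Leaf-102": {20: 10000},
}


def ingress_vni(vtep: str, vlan: int) -> int:
    """Encapsulation side: local VLAN -> VNI written into the VXLAN header."""
    return VLAN_TO_VNI[vtep][vlan]


def egress_vlan(vtep: str, vni: int) -> int:
    """Decapsulation side: VNI from the VXLAN header -> local VLAN for the 802.1Q tag."""
    for vlan, mapped_vni in VLAN_TO_VNI[vtep].items():
        if mapped_vni == vni:
            return vlan
    raise KeyError(f"{vtep} has no VLAN mapped to VNI {vni}")


vni = ingress_vni("Leaf-101", 10)
print(vni, egress_vlan("Leaf-102", vni))
```

This is why the same ARP request carries VLAN Id 10 on the vmBeef side and VLAN Id 20 on the vmAbba side.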


Figure 1-4: ARP request processing

Capture 1-2 shows the ARP-request message captured from the Host-1 uplink to VTEP switch Leaf-101.



Ethernet II, Src: 10:00:00:10:be:ef, Dst: Broadcast (ff:ff:ff:ff:ff:ff)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0000 1010 = ID: 10
    Type: ARP (0x0806)
    Padding: 0000000000000000000000000000
    Trailer: 00000000
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: 10:00:00:10:be:ef
    Sender IP address: 192.168.11.12
    Target MAC address: 00:00:00_00:00:00
    Target IP address: 192.168.11.22

Capture 1-2: ARP request from vmBeef to vmAbba: vmBeef to Leaf-101.



Capture 1-3 shows the ARP-message captured from the link between the VTEP switch Leaf-101 and Spine switch Spine-11.


Ethernet II, Src: 5e:00:00:00:00:07 (5e:00:00:00:00:07), Dst: IPv4mcast_0a (01:00:5e:00:00:0a)
    Destination: IPv4mcast_0a (01:00:5e:00:00:0a)
    Source: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.100.101, Dst: 238.0.0.10
User Datagram Protocol, Src Port: 62378, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10000
    Reserved: 0
Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
    Destination: Broadcast (ff:ff:ff:ff:ff:ff)
    Source: Private_10:be:ef (10:00:00:10:be:ef)
    Type: ARP (0x0806)
    Trailer: 000000000000000000000000000000000000
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: Private_10:be:ef (10:00:00:10:be:ef)
    Sender IP address: 192.168.11.12
    Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Target IP address: 192.168.11.22

Capture 1-3: ARP request from vmBeef to vmAbba: Leaf-101 to Spine-11.
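The VXLAN header seen in Capture 1-3 is eight bytes long: a flags byte with the VNI-present bit set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. A minimal sketch of packing and unpacking it with Python's `struct` module follows. The helper names `vxlan_header` and `vni_of` are illustrative.

```python
import struct


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags (I bit set), reserved, 24-bit VNI, reserved."""
    flags_and_reserved = 0x08 << 24      # 0x08 = "VNI present" flag, then 24 reserved bits
    vni_and_reserved = vni << 8          # 24-bit VNI followed by 8 reserved bits
    return struct.pack("!II", flags_and_reserved, vni_and_reserved)


def vni_of(header: bytes) -> int:
    """Extract the VNI from an 8-byte VXLAN header."""
    _, second_word = struct.unpack("!II", header)
    return second_word >> 8


hdr = vxlan_header(10000)
print(hdr.hex(), vni_of(hdr))
```

The first two bytes of the packed header (08 00) correspond to the `Flags: 0x0800` line in the capture.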

Capture 1-4 shows the ARP Request message captured from the link between the VTEP switch Leaf-102 and vmAbba.

Ethernet II, Src: 10:00:00:10:be:ef, Dst: ff:ff:ff:ff:ff:ff
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 20
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0001 0100 = ID: 20
    Type: ARP (0x0806)
    Padding: 0000000000000000000000000000
    Trailer: 00000000
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: Private_10:be:ef (10:00:00:10:be:ef)
    Sender IP address: 192.168.11.12
    Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
    Target IP address: 192.168.11.22

Capture 1-4: ARP request from vmBeef to vmAbba: Captured from the link Leaf-102 to vmAbba.

VmAbba receives the ARP-request. It sends an ARP-reply message as Unicast to vmBeef. The process of frame handling is illustrated in figure 1-5.

 Figure 1-5: ARP reply processing


Capture 1-5 shows the ARP Reply message captured from the link between the VTEP switch Leaf-102 and vmAbba.

Ethernet II, Src: 10:00:00:20:ab:ba, Dst: 10:00:00:10:be:ef
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 20
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0001 0100 = ID: 20
    Type: ARP (0x0806)
    Padding: 0000000000000000000000000000
    Trailer: 00000000
Address Resolution Protocol (reply)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (2)
    Sender MAC address: 10:00:00:20:ab:ba
    Sender IP address: 192.168.11.22
    Target MAC address: 10:00:00:10:be:ef
    Target IP address: 192.168.11.12
Capture 1-5: ARP reply from vmAbba to vmBeef: Captured from the link Host-2 and Leaf-102.

Capture 1-6 shows the ARP-Reply message captured from the link between the VTEP switch Leaf-101 and Spine-11.

Ethernet II, Src: 5e:00:00:01:00:07, Dst: 5e:00:00:00:00:07
Internet Protocol Version 4, Src: 192.168.100.102, Dst: 192.168.100.101
User Datagram Protocol, Src Port: 59206, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10000
    Reserved: 0
Ethernet II, Src: 10:00:00:20:ab:ba, Dst: 10:00:00:10:be:ef
Address Resolution Protocol (reply)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (2)
    Sender MAC address: 10:00:00:20:ab:ba
    Sender IP address: 192.168.11.22
    Target MAC address: 10:00:00:10:be:ef
    Target IP address: 192.168.11.12
Capture 1-6: ARP reply from vmAbba to vmBeef: Leaf-101 to Spine-11.

Capture 1-7 shows the ARP-Reply message captured from the link between the VTEP switch Leaf-101 and vmBeef.

Ethernet II, Src: 10:00:00:20:ab:ba, Dst: 10:00:00:10:be:ef
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0000 1010 = ID: 10
    Type: ARP (0x0806)
    Padding: 0000000000000000000000000000
    Trailer: 00000000
Address Resolution Protocol (reply)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (2)
    Sender MAC address: Private_20:ab:ba (10:00:00:20:ab:ba)
    Sender IP address: 192.168.11.22
    Target MAC address: Private_10:be:ef (10:00:00:10:be:ef)
    Target IP address: 192.168.11.12

Capture 1-7: ARP reply from vmAbba to vmBeef: Captured from the link Leaf-101 to vmBeef.
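
The 802.1Q lines in the captures above (Priority, DEI, ID) are bit fields of one 16-bit Tag Control Information value. The following is a minimal Python sketch of how those bits split into fields. The function name decode_tci is illustrative only and is not part of any tool used in this chapter.

```python
def decode_tci(tci: int) -> dict:
    """Split a 16-bit 802.1Q Tag Control Information value into its fields."""
    if not 0 <= tci <= 0xFFFF:
        raise ValueError("TCI must fit in 16 bits")
    return {
        "pri": tci >> 13,          # top 3 bits: priority
        "dei": (tci >> 12) & 0x1,  # next bit: drop eligible indicator
        "vid": tci & 0x0FFF,       # low 12 bits: VLAN Id
    }


# Bit patterns copied from the captures: VLAN 20 toward vmAbba, VLAN 10 toward vmBeef
print(decode_tci(0b0000_0000_0001_0100))
print(decode_tci(0b0000_0000_0000_1010))
```

The two VLAN Ids differ because the VLAN Id is only locally significant on each VTEP switch, while both map to the same L2VNI 10000.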

ICMP Request/Reply

After resolving the MAC address of vmAbba, vmBeef sends an ICMP request to vmAbba. The destination IP address of the ICMP request is 192.168.11.22 and the destination MAC address in the Ethernet frame is the previously resolved MAC address 1000.0020.abba.

VTEP switch Leaf-101 receives the frame and, based on VLAN Id 10 in the 802.1Q header, notices that the frame belongs to L2VNI 10000. Leaf-101 forwards the frame based on the information found in the MAC address table of VLAN 10. The MAC address entry for vmAbba is taken from L2RIB, which in turn has received it from BGP. Leaf-101 encapsulates the frame inside a new Ethernet header, IP header, UDP header, and VXLAN header and forwards it towards Leaf-102 via Spine-11.

When VTEP switch Leaf-102 receives the unicast frame, it removes the outer Ethernet header, outer IP header, UDP header, and VXLAN header and forwards the original frame, tagged with an 802.1Q tag with VLAN Id 20, to vmAbba.

Figure 1-6: ICMP request from vmBeef to vmAbba.
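
The encapsulation step described above can be sketched in Python. This is only an illustration under stated assumptions: the 8-byte header layout follows what the captures display (16 bits of flags with value 0x0800, a 16-bit Group Policy ID, a 24-bit VNI, and 8 reserved bits), the VLAN-to-VNI table is copied from the captures, and all function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789      # destination UDP port seen in the captures
FLAG_VNI_VALID = 0x0800    # "Flags: 0x0800" in the captures

# VLAN-to-VNI mappings as seen in the captures; VLAN Ids are locally significant
VLAN_TO_VNI = {"Leaf-101": {10: 10000}, "Leaf-102": {20: 10000}}


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # flags (16 bits), group policy id (16 bits), VNI (24 bits) + reserved (8 bits)
    return struct.pack("!HHI", FLAG_VNI_VALID, 0, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Return the VNI carried in a VXLAN header."""
    flags, _group_policy, vni_and_reserved = struct.unpack("!HHI", header[:8])
    if not flags & FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_and_reserved >> 8


def egress_vlan(leaf: str, vni: int) -> int:
    """Find the local VLAN that a VNI maps to on the given leaf."""
    for vlan, mapped_vni in VLAN_TO_VNI[leaf].items():
        if mapped_vni == vni:
            return vlan
    raise KeyError(f"VNI {vni} is not configured on {leaf}")


header = pack_vxlan_header(VLAN_TO_VNI["Leaf-101"][10])   # ingress on Leaf-101
print(egress_vlan("Leaf-102", unpack_vni(header)))        # egress VLAN on Leaf-102
```

The sketch leaves out the outer Ethernet, IP, and UDP headers; it only shows how VLAN 10 on Leaf-101 and VLAN 20 on Leaf-102 meet in the same VNI.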

Capture 1-8 shows the ICMP Request message captured from the link between the VTEP switch Leaf-101 and vmBeef.


Ethernet II, Src: 10:00:00:10:be:ef, Dst: 10:00:00:20:ab:ba
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0000 1010 = ID: 10
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.11.22
Internet Control Message Protocol

Capture 1-8: ICMP request from vmBeef to vmAbba: Capture from Leaf-101 to vmBeef.

Capture 1-9 shows the ICMP Request message captured from the link between the VTEP switch Leaf-101 and Spine-11.

Ethernet II, Src: 5e:00:00:00:00:07, Dst: 5e:00:00:01:00:07
Internet Protocol Version 4, Src: 192.168.100.101, Dst: 192.168.100.102
User Datagram Protocol, Src Port: 57986, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10000
    Reserved: 0
Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: Private_20:ab:ba (10:00:00:20:ab:ba)
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.11.22
Internet Control Message Protocol

Capture 1-9: ICMP request from vmBeef to vmAbba: Capture from Leaf-101 to Spine-11.

Capture 1-10 shows the ICMP Request message captured from the link between the VTEP switch Leaf-102 and Host-2.

Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: Private_20:ab:ba (10:00:00:20:ab:ba)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 20
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0001 0100 = ID: 20
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.11.22
Internet Control Message Protocol

Capture 1-10: ICMP request from vmBeef to vmAbba: Capture from Leaf-102 to vmAbba.


When vmAbba receives the ICMP Request, it replies by sending an ICMP Reply message to vmBeef. The frame processing mirrors the ICMP Request processing described above, in the reverse direction.

  Figure 1-7: ICMP Reply from vmAbba to vmBeef.


Capture 1-11 shows the ICMP Reply message captured from the link between the VTEP switch Leaf-102 and vmAbba.

Ethernet II, Src: Private_20:ab:ba (10:00:00:20:ab:ba), Dst: Private_10:be:ef (10:00:00:10:be:ef)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 20
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0001 0100 = ID: 20
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.11.22, Dst: 192.168.11.12
Internet Control Message Protocol

Capture 1-11: ICMP reply from vmAbba to vmBeef: Capture from the link Leaf-102 to vmAbba.


Capture 1-12 shows the ICMP Reply message captured from the link between the VTEP switch Leaf-101 and Spine-11.

Ethernet II, Src: 5e:00:00:01:00:07 (5e:00:00:01:00:07), Dst: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
Internet Protocol Version 4, Src: 192.168.100.102, Dst: 192.168.100.101
User Datagram Protocol, Src Port: 57648, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10000
    Reserved: 0
Ethernet II, Src: Private_20:ab:ba (10:00:00:20:ab:ba), Dst: Private_10:be:ef (10:00:00:10:be:ef)
Internet Protocol Version 4, Src: 192.168.11.22, Dst: 192.168.11.12
Internet Control Message Protocol

Capture 1-12: ICMP reply from vmAbba to vmBeef: Capture from the link Leaf-101 to Spine-11.

Capture 1-13 shows the ICMP Reply message captured from the link between the VTEP switch Leaf-101 and vmBeef.

Ethernet II, Src: Private_20:ab:ba (10:00:00:20:ab:ba), Dst: Private_10:be:ef (10:00:00:10:be:ef)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0000 1010 = ID: 10
    Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.11.22, Dst: 192.168.11.12
Internet Control Message Protocol

Capture 1-13: ICMP reply from vmAbba to vmBeef: Captured from the link Leaf-101 to vmBeef.

Summary

This section shows how the local VTEP switch learns the MAC addresses of its connected hosts and how this information is advertised to remote VTEP switches. It also shows the Data Plane operation between hosts connected to different VTEP switches in the same L2VNI (Layer 2 domain).



MAC-IP Address Learning Process (ARP for Intra-VNI Switching)


The previous section explains how MAC address information is propagated in the VXLAN Fabric. This section starts by explaining how the local VTEP switch Leaf-101 learns the MAC-IP information of its connected host vmBeef and how it delivers the information to the remote VTEP by using BGP EVPN. The second part of this section explains how VTEP switches use the MAC-IP information to reduce BUM traffic in the VXLAN fabric by using ARP suppression.

The MAC-IP learning process starts when vmBeef comes up and sends an ARP message. This can be a GARP, with which vmBeef announces its existence to the network and verifies the uniqueness of its IP address, or an ARP request with which vmBeef tries to resolve the MAC address of its gateway. VTEP switch Leaf-101 installs the MAC-IP address information from the ARP payload into the ARP table. When ARP suppression is enabled (per VNI), the MAC-IP binding information is also saved into the local ARP Suppression Cache. The Host Mobility Manager (HMM) component installs the information into the Local Host Database and sends the MAC-IP information to the BGP process, where it is stored in the BGP Loc-RIB. The information is advertised to remote VTEP switches in a BGP EVPN Route Type 2 Update (MAC/IP Advertisement Route). The receiving VTEP switch Leaf-102 first installs the route into the BGP Adj-RIB-In, from where it is imported into the BGP Loc-RIB based on the import policy defined under the specific EVPN Instance. From the BGP Loc-RIB, the information is stored in the IP VRF in L2RIB. As the last step, the MAC-IP information is stored in the ARP Suppression Cache (if ARP suppression is enabled).

This section starts with the MAC-IP learning process overview and then explains the process with examples. Figure 1-8 illustrates the components and databases related to the MAC-IP learning process.


MAC-IP Address Learning Overview


Phase 1: ARP Table on Local VTEP


Virtual machine vmBeef, located in host-1, comes up. It announces its existence to the network and validates the uniqueness of its IP address by sending a GARP. VTEP switch Leaf-101 receives the GARP message on interface e1/2 and stores the MAC-IP address binding, taken from the Sender MAC and Sender IP fields of the GARP payload, in the ARP table.


Phase 2-3: MAC-IP on Local VTEP


The Host Mobility Manager component (HMM) learns the MAC-IP information as a local route. HMM installs the information into the Local Host Database and forwards the MAC-IP information into the IP VRF of L2RIB (MAC-only information is installed into the MAC VRF). The Local Host Database includes the IP address (/32), MAC address, SVI, and local interface. L2RIB has the same information without the SVI.

  

Phase 4: BGP Route Export on Local VTEP


VTEP switch Leaf-101 installs the MAC-IP route from the L2RIB into the BGP Loc-RIB. The MAC-IP information is advertised as a separate BGP EVPN Route Type 2 advertisement (MAC-only and MAC-IP NLRIs are sent in dedicated updates). The difference in the carried NLRI information is that the MAC-IP advertisement also carries the host IP address and mask as well as an additional MPLS Label Stack 2 field, which defines the L3VNI used in VRF TENANT77. The update also carries two additional Extended Communities: RT 65000:10077 and Router MAC 5e00.0000.0007.

Phase 5: BGP Route Import on Remote VTEP


VTEP switch Leaf-102 receives the BGP EVPN MAC/IP route advertisement and installs it into the BGP Adj-RIB-In database without any modification. From there, Leaf-102 imports the route into its BGP Loc-RIB database based on the RT import policy. When importing the route from the BGP Adj-RIB-In into the BGP Loc-RIB, Leaf-102 changes the RD to 192.168.77.102:32787, which is based on its BGP RID and VLAN Id. This process is the same as the MAC-only route import and is based on the same RT 65000:10000.
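
The RD values that appear in this section (192.168.77.101:32777 for VLAN 10 on Leaf-101 and 192.168.77.102:32787 for VLAN 20 on Leaf-102) are consistent with an RD built as router ID plus (32767 + VLAN Id). The Python sketch below reproduces those two values. The offset 32767 is an assumption inferred from the outputs shown here, and the function names are hypothetical.

```python
L2VNI_RD_OFFSET = 32767  # assumption: inferred from 32777 (VLAN 10) and 32787 (VLAN 20)


def l2vni_rd(router_id: str, vlan_id: int) -> str:
    """Derive an L2VNI Route Distinguisher from the BGP router ID and local VLAN Id."""
    return f"{router_id}:{L2VNI_RD_OFFSET + vlan_id}"


def rewrite_rd(route: dict, local_rd: str) -> dict:
    """Return a copy of an imported route where only the RD has changed."""
    return {**route, "rd": local_rd}


received = {
    "rd": l2vni_rd("192.168.77.101", 10),   # RD set by Leaf-101
    "mac": "1000.0010.beef",
    "ip": "192.168.11.12",
}
imported = rewrite_rd(received, l2vni_rd("192.168.77.102", 20))  # RD set by Leaf-102
print(received["rd"], "->", imported["rd"])
```

Everything except the RD is left untouched by the import, which matches the comparison between Adj-RIB-In and Loc-RIB later in this section.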

Phase 6: IP VRF on Remote VTEP


Remote VTEP Leaf-102 verifies the reachability of the Next Hop IP address carried in the NLRI, and since it is reachable, the L2FWDER component installs the MAC-IP route into L2RIB as an IP VRF entry. The local topology ID is now 20 (based on VLAN 20) and the source of the information is BGP. The port information points to the NVE1 interface IP address of VTEP switch Leaf-101.

At this phase, both VTEP switches have the MAC-IP information of vmBeef in the IP VRF of their L2RIB as well as in their BGP tables, but only the local VTEP switch Leaf-101 has installed the MAC-IP binding information into its ARP table.

Figure 1-8: MAC-IP learning process.


MAC-IP Address Monitoring


Phase 1: ARP Table on Local VTEP

Example 1-16 shows the ARP table of VRF TENANT77. The default aging time for locally learned ARP entries in NX-OS is 1500 seconds, which is 300 seconds shorter than the MAC address aging timer. When the ARP aging timer expires, the switch checks the presence of the host by sending an ARP request to it. If the host responds, the switch resets the aging timer. If the host does not reply, the entry is removed from the ARP table but kept in the BGP EVPN table for an additional 1800 seconds (MAC aging timer) before the withdraw message is sent. The MAC address aging timer should be longer than the ARP aging timer, because the ARP refresh process also updates the MAC table, which avoids unnecessary flooding.

Leaf-101# sh ip arp vrf TENANT77

<snipped>

IP ARP Table for context TENANT77
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
192.168.11.12   00:03:34  1000.0010.beef  Vlan10 

Example 1-16: sh ip arp vrf TENANT77
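
The aging behaviour described before Example 1-16 can be put on a simple timeline. The Python sketch below uses only the two timer values named in the text (1500 and 1800 seconds). It ignores the time the switch waits for an answer to its ARP probe, which is a simplification, and the function names are illustrative.

```python
ARP_AGING_S = 1500   # default ARP aging time named in the text
MAC_AGING_S = 1800   # MAC address aging time named in the text


def silent_host_timeline(last_seen_s: int = 0) -> dict:
    """Seconds at which a host that stops answering is aged out.

    Simplification: the wait for an answer to the ARP probe is treated as zero.
    """
    arp_removed = last_seen_s + ARP_AGING_S    # probe sent, no reply, ARP entry removed
    bgp_withdraw = arp_removed + MAC_AGING_S   # route kept this much longer, then withdrawn
    return {"arp_removed": arp_removed, "bgp_withdraw": bgp_withdraw}


def timers_are_sane(arp_aging_s: int, mac_aging_s: int) -> bool:
    """The MAC aging timer should be longer than the ARP aging timer."""
    return mac_aging_s > arp_aging_s


print(silent_host_timeline())
print(timers_are_sane(ARP_AGING_S, MAC_AGING_S))
```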


Phase 2-3: MAC-IP on Local VTEP

Example 1-17 shows the partial MAC-IP learning process on Leaf-101.

Leaf-101# show system internal l2rib event-history mac-ip

L2RIB MAC-IP Object Event Logs:

Rcvd MAC-IP ROUTE BASE msg: obj_type: 13 oper_type: 1 oper_sbtype: 0 producer: 12

Rcvd MAC-IP ROUTE msg: (10, 1000.0010.beef, 192.168.11.12), l2 vni 0, l3 vni 10077,

(10,1000.0010.beef,192.168.11.12):MAC-IP entry created

(10,1000.0010.beef,192.168.11.12,12):MAC-IP route created with flags 0, l3 vni 10077, seq 0

(10,1000.0010.beef,192.168.11.12,12): admin dist 7, soo 0, peerid 0, peer ifindex 0

(10,1000.0010.beef,192.168.11.12,12): esi (F), pc-ifindex 0

(10,1000.0010.beef,192.168.11.12,12):Encoding MAC-IP best route (ADD, client id 5), esi: (F)

Example 1-17: show system internal l2rib event-history mac-ip


Example 1-18 shows the information related to vmBeef MAC-IP binding in Local Host Database (HMM RIB) of VRF TENANT77.

Leaf-101# show fabric forwarding ip local-host-db vrf TENANT77

HMM host IPv4 routing table information for VRF TENANT77
<snipped>
    Host              MAC Address     SVI     Flags     Physical Interface
*   192.168.11.12/32  1000.0010.beef  Vlan10  0x420201   Ethernet1/2

Example 1-18: show fabric forwarding ip local-host-db vrf TENANT77


Example 1-19 shows that the information concerning the MAC-IP of vmBeef in IP VRF in L2RIB is produced by HMM component.

Leaf-101# show l2route mac-ip topology 10 detail

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated
Topology    Mac Address    Prod   Flags  Seq No  Host IP        Next-Hops     
----------- -------------- ------ ------ ------- -------------  -----------
10          1000.0010.beef HMM    --     0       192.168.11.12  Local         
            L3-Info: 10077

Example 1-19: show l2route mac-ip topology 10 detail



Phase 4: BGP Route Export on Local VTEP

Example 1-20 shows the internal process by which VTEP switch Leaf-101 receives the MAC-IP route information and installs it into the RIB and BGP Loc-RIB. Note that the BGP Extended Community Router MAC information is not shown because the output line is truncated. The /144 prefix length covers RD (8 octets) + MAC address (6 octets) + IP address (4 octets) = 18 octets = 144 bits. The octet count of the prefix can be seen in the RIB event “Adding prefix”.


Leaf-101# sh bgp internal event-history events | i beef
BRIB:
[L2VPN EVPN] Installing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 (local) via 192.168.100.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Rou

RIB:
[L2VPN EVPN] Adding prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12] Route Length 16 Prefix Length 18:

EVT:
Received from L2RIB MAC-IP route: Add ESI 0000.0000.0000.0000.0000 topo 10000 mac 1000.0010.beef ip 192.168.11.12 L3 VNI 10077 flags 00000000 soo 0 seq 0, reorig :0

Example 1-20: sh bgp internal event-history events | i beef


Example 1-21 shows the BGP Loc-RIB entry for the MAC-IP NLRI of vmBeef. The prefix information is explained below:

§  Route Distinguisher
§  [2] - BGP EVPN Route-Type 2, MAC/IP Advertisement Route
§  [0] - Ethernet Segment Identifier (ESI), all zeroed out = single homed site
§  [0] - Ethernet Tag Id, EVPN routes must use value 0
§  [48] - Length of MAC address
§  [1000.0010.beef] - MAC address
§  [32] - Length of IP address
§  [192.168.11.12] - Carried IP address
§  /272 - Length of the MAC-IP VRF NLRI in bits: RD (8 octets) + MAC address (6 octets) + L2VNI Id (3 octets) + L3VNI Id (3 octets) + IP address (4 octets) + ESI (10 octets) = 34 octets = 272 bits.

The L2VNI and L3VNI information is shown in the Received label field. There are also four BGP Extended Community Path Attributes:

§  Route-Target: 65000:10000 - Used for export/Import policy (L2VNI)
§  Route-Target: 65000:10077 - Used for export/Import policy (L3VNI)
§  Encapsulation 8: Defines the encapsulation type VXLAN (Data Plane)
§  Router MAC: 5e00.0000.0007 - Used as the inner MAC header source address for routed packets. This is needed because VXLAN is a MAC-in-IP/UDP tunneling mechanism, and a routed payload crossing an L3 boundary does not carry the source host MAC address. This is where the RMAC is used.
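
The prefix notation and the two length values (/144 in the event history and /272 in the BGP table) can be checked with a short Python sketch. The regular expression only covers the Route Type 2 format printed in this section, the octet counts are the ones listed in the text, and all names are illustrative.

```python
import re

# Matches the Route Type 2 prefix as printed in this section
ROUTE_RE = re.compile(
    r"\[(?P<route_type>\d+)\]:\[(?P<esi>\d+)\]:\[(?P<eth_tag>\d+)\]:"
    r"\[(?P<mac_len>\d+)\]:\[(?P<mac>[0-9a-f.]+)\]:"
    r"\[(?P<ip_len>\d+)\]:\[(?P<ip>[0-9.]+)\]/(?P<length>\d+)"
)

# Octet counts as listed in the text
LOC_RIB_OCTETS = {"rd": 8, "mac": 6, "l2vni": 3, "l3vni": 3, "ip": 4, "esi": 10}
EVENT_LOG_OCTETS = {"rd": 8, "mac": 6, "ip": 4}


def parse_type2(prefix: str) -> dict:
    """Split a printed MAC-IP prefix into named fields."""
    match = ROUTE_RE.fullmatch(prefix)
    if match is None:
        raise ValueError(f"not a Route Type 2 prefix: {prefix}")
    fields = match.groupdict()
    for key in ("route_type", "esi", "eth_tag", "mac_len", "ip_len", "length"):
        fields[key] = int(fields[key])
    return fields


def length_in_bits(octets: dict) -> int:
    """Sum the octet counts and convert to bits."""
    return 8 * sum(octets.values())


route = parse_type2("[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272")
print(route["mac"], route["ip"], route["length"])
print(length_in_bits(LOC_RIB_OCTETS), length_in_bits(EVENT_LOG_OCTETS))
```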


Leaf-101# sh bgp l2vpn evpn 192.168.11.12

BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.101:32777    (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 5
Paths: (1 available, best #1)
Flags: (0x000102) on xmit-list, is not in l2rib/evpn

  Advertised path-id 1
  Path type: local, path is valid, is best path
  AS-Path: NONE, path locally originated
    192.168.100.101 (metric 0) from 0.0.0.0 (192.168.77.101)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007

  Path-id 1 advertised to peers:
    192.168.77.11 

Example 1-21: sh bgp l2vpn evpn 192.168.11.12


Phase 5: BGP AFI L2VPN EVPN MAC Route Import on Remote VTEP

Example 1-21 shows the internal process where the received MAC-IP route is installed into the BGP Adj-RIB-In with RD 192.168.77.101:32777. This route is imported into the BGP Loc-RIB with RD 192.168.77.102:32787 and sent to L2RIB. Note that the example also includes the installation process for the L3RIB.

Leaf-102# sh bgp internal event-history events | i beef


RIB: [L2VPN EVPN]: Send to L2RIB 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144

RIB: [L2VPN EVPN] For 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144, added 1 next hops, suppress 0

RIB: [L2VPN EVPN] Adding 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 via 192.168.100.101 to NH list (flags2: 0x0)

RIB: [L2VPN EVPN] Add/delete 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144, flags=0x200, in_rib: no

IMP: [L2VPN EVPN] Created import destination entry for 192.168.77.102:3:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144

IMP: [L2VPN EVPN] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <default> RD 192.168.77.102:3

IMP: [L2VPN EVPN] Created import destination entry for 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144

IMP: [L2VPN EVPN] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <default> RD 192.168.77.102:32787

IMP: [IPv4 Unicast] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <TENANT77> RD 192.168.77.102:3

RIB: [L2VPN EVPN] Add/delete 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144, flags=0x200, evi_ctx invalid, in_rib: no

BRIB: [L2VPN EVPN] (192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 (192.168.77.11)): returning from bgp_brib_add, reeval=0new_path: 1, change: 1, undelete: 0, history: 0, force: 0, (pflags=0x40002010) rnh_flag_ch

BRIB: [L2VPN EVPN] (192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 (192.168.77.11)): bgp_brib_add: handling nexthop, path->flags2: 0x80000

BRIB: [L2VPN EVPN] Created new path to 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 via 192.168.77.111 (pflags=0x40000000, pflags2=0x0)

BRIB: [L2VPN EVPN] Installing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 (192.168.77.11) via 192.168.100.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 RT:65000:10077 ENC

Example 1-21: sh bgp internal event-history events | i beef

Example 1-22 shows partial output of the BGP Adj-RIB-In and BGP Loc-RIB tables. The L3VNI routing information is excluded for simplicity. The first part, after Comment#1, includes the information received via the BGP EVPN Route Type 2 MAC-IP route advertisement originated by VTEP switch Leaf-101. The only notable difference compared to the BGP Loc-RIB of VTEP Leaf-101 is that the switch Spine-11 (RR) has added the “Originator” (Leaf-101) and “Cluster list” (Spine-11) information to the update message. The second part, after Comment#2, shows the BGP Loc-RIB information imported from the BGP Adj-RIB-In. Comparing the two entries shows that the only NLRI information that changes during the import from Adj-RIB-In into Loc-RIB is the Route Distinguisher, just as in the case of the MAC-only route import.

Leaf-102# sh bgp l2vpn evpn 192.168.11.12

BGP routing table information for VRF default, address family L2VPN EVPN

<  Comment#1 BGP Adj-RIB-In update originated by Leaf-101 >
Route Distinguisher: 192.168.77.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 6
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 3 destination(s)
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

<  Comment#2 – BGP Loc-RIB imported from Adj-RIB >
Route Distinguisher: 192.168.77.102:32787    (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 7
Paths: (1 available, best #1)
Flags: (0x000212) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, in rib
             Imported from 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8
      Router MAC:5e00.0000.0007
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

< Comment#3 - L3VNI 10077 information removed for simplicity  >

Example 1-22: sh bgp l2vpn evpn 192.168.11.12


Phase 6: IP VRF on Remote VTEP

Example 1-23 shows the partial MAC-IP learning process.

Leaf-102# sh system internal l2rib event-history mac-ip

L2RIB MAC-IP Object Event Logs:

Rcvd MAC-IP ROUTE BASE msg: obj_type:13 oper_type:1 oper_sbtype: 0 producer: 5
Rcvd MAC-IP ROUTE msg:(20, 1000.0010.beef, 192.168.11.12), l2 vni 0, l3 vni 0,
Rcvd MAC-IP ROUTE msg: flags , admin_dist 0, seq 0, soo 0, peerid 0,
Rcvd MAC-IP ROUTE msg: res 0, esi (F), ifindex 0, nh_count 1, pc-ifindex 0
NH: 192.168.100.101
(20,1000.0010.beef,192.168.11.12):MAC-IP entry created
(20,1000.0010.beef,192.168.11.12,5):MAC-IP route created with flags 0, l3 vni 0, seq 0
(20,1000.0010.beef,192.168.11.12,5): admin dist 20, soo 0, peerid 0, peer ifindex 0
(20,1000.0010.beef,192.168.11.12,5): esi (F), pc-ifindex 0

Example 1-23: sh system internal l2rib event-history mac-ip

Example 1-24 shows that the MAC-IP information in L2RIB is produced by BGP.

Leaf-102# show l2route mac-ip topology 20 detail

<snipped>
Topology    Mac Address    Prod   Flags Seq No  Host IP        Next-Hops     
----------- -------------- ------ ----- -----   ------         ------------
20          1000.0010.beef BGP    --     0      192.168.11.12  192.168.100.101
Example 1-24: show l2route mac-ip topology 20 detail

At this phase, both VTEP switches have the MAC-IP address information of vmBeef.

ARP-Suppression


The previous section explains how the MAC-IP address information is propagated in BGP EVPN VXLAN fabric. This section describes how the VTEP switches use MAC-IP binding information to reduce the unnecessary Broadcast traffic in VXLAN fabric.

We start from the phase where vmBeef comes up and sends a GARP/ARP message to the network. Leaf-101 installs the MAC-IP binding information into the ARP table of VRF TENANT77. Example 1-25 shows the ARP table and figure 1-9 illustrates the overall process.

 Figure 1-9: MAC-IP information in ARP table and ARP Suppress Cache.



Leaf-101# sh ip arp vrf TENANT77 | b Address

Address         Age       MAC Address     Interface       Flags
192.168.11.12   00:02:01  1000.0010.beef  Vlan10         
Example 1-25: sh ip arp vrf TENANT77 | b Address

When VNI-based ARP suppression is enabled on the local VTEP switch, the MAC-IP address binding information is also installed into the local ARP Suppression Cache from the ARP table (Example 1-26).

Leaf-101# sh ip arp suppression-cache detail

<snipped>

Ip Address    Age  Mac Address  Vlan Physical-ifindex  Flags Remote Vtep Addrs

192.168.11.12   00:03:06 1000.0010.beef   10 Ethernet1/2   L
Example 1-26: sh ip arp suppression-cache detail

When ARP suppression is enabled on remote VTEP switches, the ARP Suppression Cache information is taken from the IP VRF of L2RIB. Example 1-27 illustrates this from the perspective of Leaf-102.

Leaf-102# show ip arp suppression-cache detail

<snipped>
Ip Address   Age   Mac Address Vlan Physical-ifindex  Flags Remote Vtep Addrs

192.168.11.12   00:03:33 1000.0010.beef   20 (null)     R      192.168.100.101
Example 1-27: show ip arp suppression-cache detail

Figure 1-10 illustrates the ARP operation with and without ARP suppression as well as with Unknown Unicast Suppression.

No Suppression: All ARP requests are flooded towards the multicast group defined for the specific VNI. All VTEP switches that have joined the group receive the ARP request and forward it out of the ports participating in the broadcast domain defined by the VNI Id in the VXLAN header.

ARP Suppression: The local VTEP switch checks whether the requested MAC-IP binding information is stored in the local ARP Suppression Cache. If the check is a hit, the switch sends an ARP reply back to the requester without flooding the actual ARP request to the network. If the check is a miss, the ARP request is flooded to the network. ARP suppression should be enabled only after initial Intra-VNI reachability testing.

ARP and Unknown Unicast Suppression: Works the same way as ARP suppression when the ARP Suppression Cache check is a hit, but in case of a miss, the ARP request is dropped. This option requires that there are no silent hosts in the VXLAN Fabric.

Figure 1-10: MAC-IP information in ARP table and ARP Suppress Cache.
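
The three modes described above differ only in what happens on a cache hit and on a cache miss. The Python sketch below models that decision. It is an illustration of the logic in the text, not of the NX-OS implementation, and the names Mode and handle_arp_request are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NO_SUPPRESSION = "none"
    ARP_SUPPRESSION = "arp"
    ARP_AND_UU_SUPPRESSION = "arp+uu"


def handle_arp_request(mode: Mode, cache: dict, target_ip: str) -> tuple:
    """Return (action, mac) for an ARP request received on the local VTEP."""
    if mode is Mode.NO_SUPPRESSION:
        return ("flood", None)           # always flooded to the VNI's multicast group
    mac = cache.get(target_ip)
    if mac is not None:
        return ("reply", mac)            # hit: VTEP answers on behalf of the target
    if mode is Mode.ARP_SUPPRESSION:
        return ("flood", None)           # miss: fall back to flooding
    return ("drop", None)                # miss with unknown unicast suppression


# Suppression cache content as seen on Leaf-102 in Example 1-27
cache = {"192.168.11.12": "1000.0010.beef"}
for mode in Mode:
    print(mode.name, handle_arp_request(mode, cache, "192.168.11.12"),
          handle_arp_request(mode, cache, "192.168.11.99"))
```

The last case shows why the third mode requires that there are no silent hosts: a request for an address that is not in the cache is never delivered.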


At this phase, the network is able to work as a transparent Layer 2 switch for hosts participating in L2VNI 10000 and switch frames between the hosts connected to it.

 Figure 1-11: MAC-IP information in ARP table and ARP Suppress Cache.


Host route and Prefix Advertisement: Inter-VNI routing (L3VNI)


The first two sections explain how the MAC and MAC-IP information of hosts is propagated over the VXLAN Fabric and how the information is used for Intra-VNI switching and MAC address resolution as well as for reducing BUM traffic. This section explains how host routes are imported into the L3RIB and how this information is used for Inter-VNI routing. In addition, this section explains how the MAC address information of silent hosts is resolved by using prefix route advertisement.

Host Route from the Inter-VNI routing perspective


Phase 1. Host Route in Local Routing Information Base (RIB)

Section “MAC-IP Learning Process” describes how the local VTEP switch installs the MAC-IP address binding information into ARP table and how the HMM component installs the information into IP VRF. In addition to this process, HMM component installs the MAC-IP information from the ARP-Table into L3RIB.  

Phase 2. Host Route BGP Process on Local VTEP

Section “MAC-IP Learning Process” also covers how the MAC-IP information is sent from the IP VRF to the Loc-RIB through the decision process, and from there to the Adj-RIB-Out, where it is advertised as a BGP EVPN Route Type 2 Update to remote VTEP switches.

Phase 3. Host Route BGP Process on Remote VTEP

The section “MAC-IP Learning Process” did not explain how the MAC-IP routing information ends up in the L3RIB of the remote VTEP switch. The BGP EVPN Route Type 2 Update concerning the MAC-IP NLRI of vmBeef also includes Route Target 65000:10077 (L3VNI). The received NLRI information is sent through the Import Policy Engine (import is based on RT 65000:10077) and the decision process into the Loc-RIB as an L3VNI entry. During import policy processing, the original RD 192.168.77.101:32777 is changed to the VRF TENANT77 specific RD 192.168.77.102:3 (3 = VRF Id of VRF TENANT77). The RD is used to differentiate overlapping IP addresses in different VRFs.
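
The RT-based import can be sketched in Python as follows. One received route carries two Route Targets and is therefore imported twice, each time with the RD of the importing table. The RT and RD values are the ones shown in this section. The class names and the third import target (OTHER-VRF with RT 65000:10099) are hypothetical and exist only to show a non-matching case.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvpnRoute:
    rd: str
    mac: str
    ip: str
    route_targets: frozenset


@dataclass(frozen=True)
class ImportTarget:
    name: str
    import_rt: str
    local_rd: str


def import_route(route: EvpnRoute, targets: list) -> dict:
    """Copy the route into every table whose import RT it carries, rewriting the RD."""
    return {
        target.name: replace(route, rd=target.local_rd)
        for target in targets
        if target.import_rt in route.route_targets
    }


received = EvpnRoute(
    rd="192.168.77.101:32777",
    mac="1000.0010.beef",
    ip="192.168.11.12",
    route_targets=frozenset({"65000:10000", "65000:10077"}),
)
leaf_102_targets = [
    ImportTarget("L2VNI 10000", "65000:10000", "192.168.77.102:32787"),
    ImportTarget("TENANT77", "65000:10077", "192.168.77.102:3"),
    ImportTarget("OTHER-VRF", "65000:10099", "192.168.77.102:4"),  # hypothetical
]
for table, route in import_route(received, leaf_102_targets).items():
    print(table, route.rd)
```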

Phase 4. Installing Host Route into RIB of Remote VTEP

The route is installed into the L3RIB from the BGP Loc-RIB. The RIB entry includes the Next Hop address and tunnel Id, encapsulation type (VXLAN), segment Id, and route source (BGP). At this phase, both the local VTEP switch Leaf-101 and the remote VTEP switch Leaf-102 are able to route traffic to vmBeef (belonging to L2VNI 10000) from hosts participating in a different L2VNI.

Figure 1-12: Host route propagation over VXLAN Fabric.


Monitoring

Phase 1. Host Route in Local Routing Information Base (RIB)

Example 1-28 shows the RIB of VRF TENANT77 on the local VTEP switch Leaf-101. The route is learned from VLAN 10 and it is installed into the RIB by HMM.

Leaf-101# show ip route 192.168.11.12 vrf TENANT77

IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.11.12/32, ubest/mbest: 1/0, attached
    *via 192.168.11.12, Vlan10, [190/0], 03:34:14, hmm

Example 1-28: show ip route 192.168.11.12 vrf TENANT77


Phase 2. Host Route BGP Process on Local VTEP

Example 1-29 shows the BGP Loc-RIB entry for the IP address of vmBeef. The same output was explained in detail in example 1-21.

Leaf-101# sh bgp l2vpn evpn 192.168.11.12

BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.101:32777    (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 16
Paths: (1 available, best #1)
Flags: (0x000102) on xmit-list, is not in l2rib/evpn

  Advertised path-id 1
  Path type: local, path is valid, is best path
  AS-Path: NONE, path locally originated
    192.168.100.101 (metric 0) from 0.0.0.0 (192.168.77.101)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007

  Path-id 1 advertised to peers:
    192.168.77.11 
Example 1-29: sh bgp l2vpn evpn 192.168.11.12

Phase 3. Host Route BGP Process on Remote VTEP

Example 1-30 shows the L3 import process in remote Leaf-102. The received message is the same MAC/IP routing advertisement where the MAC-IP information was imported into IP VRF in L2RIB and sent to ARP Suppression Cache. The import into L2RIB is based on RT 65000:10000 while importing route into L3RIB of VRF TENANT77 is based on RT 65000:10077.

Leaf-102# sh bgp internal event-history events | i beef

IMP: [L2VPN EVPN] Created import destination entry for 192.168.77.102:3:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144

IMP: [L2VPN EVPN] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <default> RD 192.168.77.102:3

IMP: [L2VPN EVPN] Created import destination entry for 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144

IMP: [L2VPN EVPN] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <default> RD 192.168.77.102:3

IMP: [IPv4 Unicast] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <TENANT77> RD 192.168.77.102:3

BRIB: [L2VPN EVPN] Installing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 (192.168.77.11) via 192.168.100.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8

Example 1-30: sh bgp internal event-history events | i beef
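
The Route Target based import described above can be sketched in Python as a simple matching rule. This is a simplified model and not how NX-OS implements it; the table labels and the function name are hypothetical, and only the RT values are taken from the outputs.

```python
def import_targets(route_rts, import_map):
    """Return the local tables whose import RT is carried by the route."""
    return [name for name, rt in import_map.items() if rt in route_rts]


# Import RTs of Leaf-102 as used in this lab; the table names are labels only.
LEAF_102_IMPORTS = {
    "L2RIB (L2VNI 10000)": "65000:10000",
    "L3RIB (VRF TENANT77)": "65000:10077",
}

# The MAC-IP route of vmBeef carries both Route Targets.
mac_ip_route_rts = {"65000:10000", "65000:10077"}
print(import_targets(mac_ip_route_rts, LEAF_102_IMPORTS))
```

A route that carries only RT 65000:10077 would, under the same rule, match only the L3RIB entry.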


Example 1-31 shows the BGP Adj-RIB-In and Loc-RIB. The section after the first comment is the received NLRI update in Adj-RIB-In. The section after the second comment is the same update after it has been imported through the Input Policy Engine and the decision process into Loc-RIB. The import is based on RT 65000:10077. The RD is changed from 192.168.77.101:32777 to 192.168.77.102:3. Example 1-32 shows the VRF Id of VRF TENANT77.

Leaf-102# show bgp l2vpn evpn 192.168.11.12
< Comment-1: BGP Adj-RIB-In >

BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 22
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 3 destination(s)
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

<L2VNI snipped for simplicity>
< Comment-2: BGP Loc-RIB >

Route Distinguisher: 192.168.77.102:3    (L3VNI 10077)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 24
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported from 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer


Example 1-31: show bgp l2vpn evpn 192.168.11.12

Leaf-102# show vrf TENANT77

VRF-Name                           VRF-ID State   Reason                       
TENANT77                                3 Up      --          
Example 1-32: show vrf TENANT77

Phase 4. Installing Host Route into RIB of Remote VTEP

Example 1-33 shows the VRF TENANT77 RIB entry for the host route 192.168.11.12/32.

Leaf-102# show ip route 192.168.11.12 vrf TENANT77

IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.11.12/32, ubest/mbest: 1/0
    *via 192.168.100.101%default, [200/0], 04:20:01, bgp-65000, internal, tag 65000 (evpn) segid: 10077 tunnelid: 0xc0a86465 encap: VXLAN

Example 1-33: show ip route 192.168.11.12 vrf TENANT77
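
As a side note, the tunnelid value in the route entry looks like the Next Hop IPv4 address written in hexadecimal. The short Python sketch below checks that observation against the value in Example 1-33. The function name is hypothetical, and the interpretation is based only on the outputs shown in this chapter.

```python
import ipaddress


def tunnel_id_to_ip(tunnel_id: int) -> str:
    """Render a 32-bit tunnel id as a dotted-decimal IPv4 address."""
    return str(ipaddress.IPv4Address(tunnel_id))


print(tunnel_id_to_ip(0xC0A86465))  # 192.168.100.101
```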

Example 1-34 shows the BGP Recursive Next Hop (RNH) database information for the Next Hop attached to 192.168.11.12.
Leaf-102# show nve internal bgp rnh database vni 10077

--------------------------------------------
Total peer-vni msgs recvd from bgp: 10
Peer add requests: 6
Peer update requests: 0
Peer delete requests: 4
Peer add/update requests: 6
Peer add ignored (peer exists): 0
Peer update ignored (invalid opc): 0
Peer delete ignored (invalid opc): 0
Peer add/update ignored (malloc error): 0
Peer add/update ignored (vni not cp): 0
Peer delete ignored (vni not cp): 0
--------------------------------------------
Showing BGP RNH Database, size : 2 vni 10077

Flag codes: 0 - ISSU Done/ISSU N/A        1 - ADD_ISSU_PENDING        
            2 - DEL_ISSU_PENDING          3 - UPD_ISSU_PENDING
       

VNI    Peer-IP            Peer-MAC            Tunnel-ID  Encap     (A/S)  Flags
10077  192.168.100.101    5e00.0000.0007      0xc0a86465 vxlan     (1/0)    0
Example 1-34: show nve internal bgp rnh database vni 10077

Example 1-35 shows the status of the connection to NVE peer 192.168.100.101 (Leaf-101).
Leaf-102# show nve peers detail

Details of nve Peers:
----------------------------------------
Peer-Ip: 192.168.100.101
    NVE Interface       : nve1
    Peer State          : Up
    Peer Uptime         : 04:28:50
    Router-Mac          : 5e00.0000.0007
    Peer First VNI      : 10000
    Time since Create   : 04:28:50
    Configured VNIs     : 10000,10077,20000,30000
    Provision State     : peer-add-complete
    Learnt CP VNIs      : 10000,10077
    vni assignment mode : SYMMETRIC
    Peer Location       : N/A
Example 1-35: show nve peers detail


Data Plane operation
Figure 1-13 shows the Data Plane operation when vmBebe in L2VNI 30000 sends an ICMP request to vmBeef in L2VNI 10000.
Phase 1. Switching in VNI30000 on VTEP-102
Because the destination IP address is in a different subnet, vmBebe sends the ICMP request message to its default gateway Leaf-102 using the Anycast Gateway MAC (AGM) 0001.0001.0001 as the destination MAC address.

Phase 2. Routing from VNI30000 to VNI 10077 on VTEP-102
Local VTEP switch Leaf-102 receives the frame. The destination IP address is learned via BGP and installed into the RIB with the Next Hop IP address 192.168.100.101 (Leaf-101) and additional information used in the Data Plane, such as the L3VNI and the encapsulation type. Leaf-102 performs a recursive route lookup for the Next Hop address, encapsulates the original packet with a VXLAN header carrying VNI 10077 (L3VNI), and routes the packet towards Leaf-101 via Spine-11 (the outer destination MAC address belongs to Spine-11). Because VXLAN is a MAC-in-IP/UDP tunneling mechanism, the inner frame needs source and destination MAC addresses. The inner source MAC address is taken from the SVI used for Inter-VNI routing, in our case SVI VLAN 77. The inner destination MAC address is the RMAC received in the BGP Update as a BGP Extended Community.

Phase 3. Routing from VNI10077 to VNI 10000 on VTEP-101
When the VTEP switch Leaf-101 receives the VXLAN encapsulated packet, it removes the outer headers used in VXLAN tunneling. Since VNI 10077 is attached to VRF TENANT77, the routing decision is based on the RIB of VRF TENANT77. Leaf-101 routes the original ICMP request to VLAN 10 and switches it out of the interface e1/2 with an additional 802.1Q tag with VLAN Id 10.

This process describes the Symmetric Integrated Route and Bridge (IRB) model where the packet is first switched by the local VTEP, which then routes it over the VXLAN fabric by using common VNI for all VRF routed traffic in VXLAN header. The receiving VTEP switch removes VXLAN encapsulation and makes the routing decision based on the target IP address of the original IP packet. After routing decision, the packet is switched to the destination (bridge-route-route-bridge). The return traffic follows the same model.
Using symmetric IRB gives design flexibility since, unlike in Asymmetric IRB, there is no need to add all VNIs to all VTEP switches. Asymmetric IRB is based on a bridge-route-bridge model where there is no dedicated VNI for Inter-VNI routing. As an example: if we were using Asymmetric IRB in our VXLAN fabric, vmBebe would send the packet to its default gateway (switched), just like in the case of symmetric IRB. Local VTEP switch Leaf-102 makes the routing decision, but instead of using a common VNI, it uses VNI 10000 in the VXLAN header, which is attached to VLAN 20 (the local VLAN for VNI 10000). This is the “routed” part. Receiving VTEP switch Leaf-101 removes the VXLAN header and, based on VNI 10000, switches the packet out of VLAN 10 (locally attached to VNI 10000).
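
The difference between the two IRB models comes down to which VNI the ingress VTEP writes into the VXLAN header of a routed packet. A minimal Python sketch of that choice, using the VNI values of this lab, could look like this. The class and function names are hypothetical, and the model ignores everything except the VNI selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    """Destination subnet and the L2VNI it is mapped to."""
    name: str
    l2vni: int


def vxlan_vni(mode: str, dst: Subnet, l3vni: int) -> int:
    """Pick the VNI used in the VXLAN header for an inter-VNI routed packet."""
    if mode == "symmetric":
        return l3vni        # common VNI of the VRF (bridge-route-route-bridge)
    if mode == "asymmetric":
        return dst.l2vni    # L2VNI of the destination (bridge-route-bridge)
    raise ValueError(f"unknown IRB mode: {mode!r}")


vmbeef_subnet = Subnet("192.168.11.0/24", l2vni=10000)
print(vxlan_vni("symmetric", vmbeef_subnet, l3vni=10077))   # 10077
print(vxlan_vni("asymmetric", vmbeef_subnet, l3vni=10077))  # 10000
```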

Figure 1-13: Inter-VNI routing process.



Capture 1-12 is taken from the link between Spine-11 and Leaf-101 while pinging from vmBebe to vmBeef.
Ethernet II, Src: 5e:00:00:04:00:07 (5e:00:00:04:00:07), Dst: 5e:00:00:01:00:07 (5e:00:00:01:00:07)
Internet Protocol Version 4, Src: 192.168.100.102, Dst: 192.168.100.101
User Datagram Protocol, Src Port: 63384, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10077
    Reserved: 0
Ethernet II, Src: 5e:00:00:04:00:07 (5e:00:00:04:00:07), Dst: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
Internet Protocol Version 4, Src: 192.168.30.30, Dst: 192.168.11.12
Internet Control Message Protocol
Capture 1-12: ICMP request captured from the link between Leaf-101 and Spine-11.
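
To make the VXLAN header fields of Capture 1-12 more concrete, the Python sketch below packs and unpacks the eight header bytes using the same layout the capture displays: 16 bits of flags, a 16-bit Group Policy ID, the 24-bit VNI and 8 reserved bits. The function names are hypothetical and the code is only an illustration, not a full VXLAN implementation.

```python
import struct


def build_vxlan_header(vni: int, flags: int = 0x0800,
                       group_policy_id: int = 0) -> bytes:
    """Pack an 8-byte VXLAN header; the VNI occupies the upper 24 bits
    of the last 32-bit word, followed by 8 reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit into 24 bits")
    return struct.pack("!HHI", flags, group_policy_id, vni << 8)


def parse_vxlan_header(data: bytes) -> dict:
    """Unpack the first 8 bytes of a VXLAN header into its fields."""
    flags, gpid, vni_and_reserved = struct.unpack("!HHI", data[:8])
    return {
        "flags": flags,
        "group_policy_id": gpid,
        "vni": vni_and_reserved >> 8,
        "reserved": vni_and_reserved & 0xFF,
    }


header = build_vxlan_header(10077)
print(header.hex())
print(parse_vxlan_header(header))
```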

Summary
This section explained how the IP addresses of hosts are propagated across the VXLAN fabric and how they are installed into the L3RIB.

Prefix Advertisement

Prefix advertisement is a simple process, but why is it needed if all VTEP switches know the MAC and IP addresses of all connected hosts? One reason is, of course, connectivity with networks external to the VXLAN fabric. The other reason is related to connectivity inside the VXLAN fabric: there might be silent hosts, which do not generate any traffic unless requested. In some cases, this might lead to a situation where hosts in one L2VNI do not have connectivity to a silent host in another L2VNI.

The first example shows the processes when vmBeef in VNI 10000 connected to Leaf-101 pings the silent host vmBebe in VNI 30000 connected to Leaf-102. In this example, both VTEP switches have VNI 30000. IP prefix redistribution in this example is not needed. Figures 1-14 and 1-15 illustrate the whole process.

Phase 1: vmBeef starts pinging vmBebe

At this stage, vmBeef has resolved the MAC address of its default gateway. It sends the ICMP request towards 192.168.30.30. Since the destination vmBebe is in a different subnet than the sender vmBeef, vmBeef sends the ICMP request to the default gateway. There is no response to the first ICMP request.


Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.30.30
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0x574b [correct]
    [Checksum Status: Good]
    Identifier (BE): 0 (0x0000)
    Identifier (LE): 0 (0x0000)
    Sequence number (BE): 0 (0x0000)
    Sequence number (LE): 0 (0x0000)
    [No response seen]
    Data (72 bytes)
Capture 1-13: ICMP request captured from the link between Leaf-101 and vmBeef.

Phase 2: Local VTEP Leaf-101: ARP process

VTEP switch Leaf-101 has both VNI 10000 and VNI 30000 configured locally. Even though there is no host route to vmBebe in the RIB, there is a routing entry for the local subnet 192.168.30.0/24 (VLAN 30 attached to VNI 30000), so the packet is routed from VNI 10000 to VNI 30000. After routing, Leaf-101 needs the MAC-IP binding information of vmBebe, so it sends an ARP request to the Mcast group used in VNI 30000. Example 1-36 shows the routing table of Leaf-101 and Capture 1-14 shows the ARP request captured from the link between Leaf-101 and Spine-11.

Leaf-101# show ip route vrf TENANT77

IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.11.0/24, ubest/mbest: 1/0, attached
    *via 192.168.11.1, Vlan10, [0/0], 01:09:38, direct, tag 77
192.168.11.1/32, ubest/mbest: 1/0, attached
    *via 192.168.11.1, Vlan10, [0/0], 01:09:38, local, tag 77
192.168.11.22/32, ubest/mbest: 1/0
    *via 192.168.100.102%default, [200/0], 00:45:30, bgp-65000, internal, tag 65000 (evpn) segid: 10077 tunnelid: 0xc0a86466 encap: VXLAN

192.168.30.0/24, ubest/mbest: 1/0, attached
    *via 192.168.30.1, Vlan30, [0/0], 00:02:36, direct
192.168.30.1/32, ubest/mbest: 1/0, attached
    *via 192.168.30.1, Vlan30, [0/0], 00:02:36, local
Example 1-36: show ip route vrf TENANT77

The ARP process is explained in the ARP request/reply section. Because this packet is switched inside L2VNI 30000, the source MAC address of the inner Ethernet header is the Anycast Gateway MAC (AGM) address of VLAN 30, which is shared by every host-facing SVI (but not by SVI 77, which is used for routing). Thanks to the AGM, hosts do not have to resolve the MAC address of the gateway again when they move from one VTEP to another. The destination MAC address of the outer Ethernet header is derived from the Mcast Group IP address.


Ethernet II, Src: 5e:00:00:00:00:07 (5e:00:00:00:00:07), Dst: IPv4mcast_0a (01:00:5e:00:00:0a)
Internet Protocol Version 4, Src: 192.168.100.101, Dst: 238.0.0.10
User Datagram Protocol, Src Port: 57522, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 30000
    Reserved: 0
Ethernet II, Src: EquipTra_01:00:01 (00:01:00:01:00:01), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: EquipTra_01:00:01 (00:01:00:01:00:01)
    Sender IP address: 192.168.30.1
    Target MAC address: Broadcast (ff:ff:ff:ff:ff:ff)
    Target IP address: 192.168.30.30
Capture 1-14: ARP request captured from the link between Leaf-101 and Spine-11.
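
The outer destination MAC address in Capture 1-14 follows the standard IPv4 multicast mapping: the fixed prefix 01:00:5e followed by the low-order 23 bits of the group address. The Python sketch below reproduces the mapping for the Mcast group 238.0.0.10 used in VNI 30000. The function name is hypothetical.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet destination MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF          # keep the low-order 23 bits
    mac = (0x01005E << 24) | low23        # prepend the 01:00:5e prefix
    return ":".join(f"{(mac >> shift) & 0xFF:02x}"
                    for shift in range(40, -8, -8))


print(multicast_mac("238.0.0.10"))  # 01:00:5e:00:00:0a
```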



Phase 3: Remote VTEP Leaf-102: ARP process - Request

The remote VTEP switch Leaf-102 receives the ARP request. Based on VNI 30000 in the VXLAN header, it knows that this packet belongs to VLAN 30. It removes the VXLAN encapsulation and forwards the ARP request out of all interfaces participating in VLAN 30. Leaf-102 inserts an 802.1Q tag with VLAN Id 30 into the frame and sends it out of interface e1/2.

Ethernet II, Src: EquipTra_01:00:01 (00:01:00:01:00:01), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0001 1110 = ID: 30
    Type: ARP (0x0806)
    Padding: 0000000000000000000000000000
    Trailer: 00000000
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: EquipTra_01:00:01 (00:01:00:01:00:01)
    Sender IP address: 192.168.30.1
    Target MAC address: Broadcast (ff:ff:ff:ff:ff:ff)
    Target IP address: 192.168.30.30
Capture 1-15: ARP request sent to vmBebe.

Phase 4: vmBebe: ARP process - Reply

The ARP request reaches vmBebe, and since the target IP address belongs to it, vmBebe responds with an ARP reply. The source MAC address in the received ARP request is the AGM, which is also used by Leaf-102. When vmBebe sends the unicast ARP reply with the MAC address 0001.0001.0001 (AGM) as the destination, the message is terminated at Leaf-102. This means that Leaf-102 never forwards the ARP reply to Leaf-101.

Ethernet II, Src: 30:00:00:30:be:be (30:00:00:30:be:be), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0001 1110 = ID: 30
    Type: ARP (0x0806)
    Padding: 0000000000000000000000000000
    Trailer: 00000000
Address Resolution Protocol (reply)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (2)
    Sender MAC address: 30:00:00:30:be:be (30:00:00:30:be:be)
    Sender IP address: 192.168.30.30
    Target MAC address: EquipTra_01:00:01 (00:01:00:01:00:01)
    Target IP address: 192.168.30.1
Capture 1-16: ARP reply sent by vmBebe.

Phase 5: Remote VTEP switch Leaf-102: BGP Update

When the remote VTEP switch Leaf-102 receives the ARP reply, it learns the MAC-IP information of vmBebe from the ARP payload and generates two BGP EVPN route type 2 (MAC Advertisement Route) updates: one carries only the MAC address and the other carries the MAC-IP address information of vmBebe.


Ethernet II, Src: 5e:00:00:01:00:07 (5e:00:00:01:00:07), Dst: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
Internet Protocol Version 4, Src: 192.168.77.11, Dst: 192.168.77.101
Transmission Control Protocol, Src Port: 179, Dst Port: 54583, Seq: 1, Ack: 232, Len: 141
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 141
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 118
    Path attributes
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: empty
        Path Attribute - LOCAL_PREF: 100
        Path Attribute - EXTENDED_COMMUNITIES
            Flags: 0xc0, Optional, Transitive, Complete
            Type Code: EXTENDED_COMMUNITIES (16)
            Length: 32
            Carried extended communities: (4 communities)
                Route Target: 65000:10077
                Route Target: 65000:30000
                Encapsulation: VXLAN Encapsulation
                Unknown subtype 0x03: 0x5e00 0x0004 0x0007
        Path Attribute - ORIGINATOR_ID: 192.168.77.102
        Path Attribute - CLUSTER_LIST: 192.168.77.111
        Path Attribute - MP_REACH_NLRI
            Type Code: MP_REACH_NLRI (14)
            Length: 51
            Address family identifier (AFI): Layer-2 VPN (25)
            Subsequent address family identifier (SAFI): EVPN (70)
            Next hop network address (4 bytes)
            Number of Subnetwork points of attachment (SNPA): 0
            Network layer reachability information (42 bytes)
                EVPN NLRI: MAC Advertisement Route
                    Route Type: MAC Advertisement Route (2)
                    Length: 40
                    Route Distinguisher: 0001c0a84d66801d      (192.168.77.102:32797)
                    ESI: 00 00 00 00 00 00 00 00 00
                    Ethernet Tag ID: 0
                    MAC Address Length: 48
                    MAC Address: 30:00:00:30:be:be (30:00:00:30:be:be)
                    IP Address Length: 32
                    IPv4 address: 192.168.30.30
                    MPLS Label Stack 1: 1875, (BOGUS: Bottom of Stack NOT set!)
                    MPLS Label Stack 2: 629 (bottom)
Capture 1-17: BGP EVPN Update (route type 2) sent by Spine-11 to Leaf-101.
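
A note on the last two lines of Capture 1-17: the dissector presents the two 24-bit label fields as MPLS labels. The Python sketch below interprets a 24-bit value the way an MPLS label field is laid out (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag). Under that reading, the displayed values 1875 and 629 are consistent with L2VNI 30000 and L3VNI 10077 used in this lab. The function name is hypothetical.

```python
def as_mpls_label(field: int) -> tuple:
    """Interpret a 24-bit field as (label, traffic class, bottom-of-stack)."""
    label = field >> 4                  # upper 20 bits
    traffic_class = (field >> 1) & 0x7  # next 3 bits
    bottom_of_stack = bool(field & 0x1) # last bit
    return label, traffic_class, bottom_of_stack


print(as_mpls_label(30000))  # (1875, 0, False)
print(as_mpls_label(10077))  # (629, 6, True)
```

The low-order bits of the VNI end up in the traffic class and bottom-of-stack positions, which would explain both the "Bottom of Stack NOT set" remark for 1875 and the "(bottom)" remark for 629. It also means the VNI cannot always be recovered from the displayed label alone: 629 shifted back by four bits gives 10064, not 10077.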

Phase 6: Local VTEP switch Leaf-101: BGP Update

Local VTEP switch Leaf-101 receives the BGP EVPN Updates and installs the routing information into the MAC and IP VRF tables in the L2RIB of VNI 30000. This is explained in the section “MAC/IP address learning process”. Right after the L2RIB update, Leaf-101 is able to route packets sent by vmBeef to vmBebe.

Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.30.30
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0x574b [correct]
    [Checksum Status: Good]
    Identifier (BE): 0 (0x0000)
    Identifier (LE): 0 (0x0000)
    Sequence number (BE): 0 (0x0000)
    Sequence number (LE): 0 (0x0000)
    [No response seen]
    Data (72 bytes)
Capture 1-18: ICMP request captured from the link between vmBeef and Leaf-101.

Figure 1-14: Silent host discovery process, Phases 1-3



Figure 1-15: Silent host discovery process, Phases 4-6



What if not all VNIs are implemented in each VTEP switch? In the scenario where the VTEP switch Leaf-101 has only VNI 10000, it does not have any L2/L3 address information about the silent host vmBebe, which means that Leaf-101 is not able to switch or route the packet to any host in network 192.168.30.0/24. The solution is prefix advertisement on Leaf-102.
As a starting point, VTEP switch Leaf-102 redistributes the local network 192.168.30.0/24 into BGP via a route-map. The update is sent as BGP EVPN route type 5. Example 1-37 shows the BGP RIB (both Adj-RIB-In and Loc-RIB) of Leaf-101 for the NLRI 192.168.30.0/24. The BGP EVPN route type 5 update carries only RT 65000:10077, which is used for importing the route from Adj-RIB-In into the Loc-RIB of VRF TENANT77. The Received Label field defines the L3VNI. The original RD carried in the NLRI is generated from the BGP RID and the VRF Id.

Leaf-101# show bgp l2vpn evpn 192.168.30.0

BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:3
BGP routing table entry for [5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224, version 505
Paths: (1 available, best #1)
Flags: (0x000002) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 2 destination(s)
  AS-Path: NONE, path sourced internal to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin incomplete, MED 0, localpref 100, weight 0
      Received label 10077
      Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0004.0007
      Originator: 192.168.77.102 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

Route Distinguisher: 192.168.77.101:3    (L3VNI 10077)
BGP routing table entry for [5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224, version 506
Paths: (1 available, best #1)
Flags: (0x000002) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported from 192.168.77.102:3:[5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224
  AS-Path: NONE, path sourced internal to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin incomplete, MED 0, localpref 100, weight 0
      Received label 10077
      Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0004.0007
      Originator: 192.168.77.102 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

Example 1-37: show bgp l2vpn evpn 192.168.30.0
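
The RD values seen in this chapter can be reproduced with a short Python sketch. For the L3VNI the RD is built from the BGP RID and the VRF Id, as stated above. For the L2VNI, the outputs are consistent with RID:(32767 + VLAN Id); treat that base value as an assumption inferred from this lab rather than a rule taken from the text. The last function encodes an RD in the 8-byte type 1 format (2-byte type, 4-byte IPv4 address, 2-byte number) shown in hexadecimal in Capture 1-17. All function names are hypothetical.

```python
import ipaddress
import struct

# Assumption inferred from the outputs (32777 for VLAN 10, 32797 for VLAN 30).
L2VNI_RD_BASE = 32767


def l3vni_rd(router_id: str, vrf_id: int) -> str:
    """RD of an IP VRF: BGP RID and VRF Id."""
    return f"{router_id}:{vrf_id}"


def l2vni_rd(router_id: str, vlan_id: int) -> str:
    """RD of a MAC VRF, assuming the base value above plus the VLAN Id."""
    return f"{router_id}:{L2VNI_RD_BASE + vlan_id}"


def encode_type1_rd(rd: str) -> bytes:
    """Encode 'a.b.c.d:nn' as an 8-byte type 1 Route Distinguisher."""
    admin, _, assigned = rd.rpartition(":")
    return struct.pack("!H4sH", 1,
                       ipaddress.IPv4Address(admin).packed, int(assigned))


print(l3vni_rd("192.168.77.102", 3))    # 192.168.77.102:3
print(l2vni_rd("192.168.77.102", 30))   # 192.168.77.102:32797
print(encode_type1_rd("192.168.77.102:32797").hex())  # 0001c0a84d66801d
```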

Capture 1-19 shows the BGP EVPN Prefix Advertisement (route type 5). Note that Extended Community Unknown Subtype 0x03 defines the RMAC.

Ethernet II, Src: 5e:00:00:01:00:07, Dst: 5e:00:00:00:00:07
Internet Protocol Version 4, Src: 192.168.77.11, Dst: 192.168.77.101
Transmission Control Protocol, Src Port: 179, Dst Port: 54583, Seq: 294, Ack: 246, Len: 134
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 134
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 111
    Path attributes
        Path Attribute - ORIGIN: INCOMPLETE
        Path Attribute - AS_PATH: empty
        Path Attribute - MULTI_EXIT_DISC: 0 0
        Path Attribute - LOCAL_PREF: 100
        Path Attribute - EXTENDED_COMMUNITIES
            Flags: 0xc0, Optional, Transitive, Complete
            Type Code: EXTENDED_COMMUNITIES (16)
            Length: 24
            Carried extended communities: (3 communities)
                Route Target: 65000:10077
                Encapsulation: VXLAN
                Unknown subtype 0x03: 0x5e00 0x0004 0x0007
        Path Attribute - ORIGINATOR_ID: 192.168.77.102
        Path Attribute - CLUSTER_LIST: 192.168.77.111
        Path Attribute - MP_REACH_NLRI
            Flags: 0x90, Optional, Extended-Length, Non-transitive, Complete
            Type Code: MP_REACH_NLRI (14)
            Length: 45
            Address family identifier (AFI): Layer-2 VPN (25)
            Subsequent address family identifier (SAFI): EVPN (70)
            Next hop network address (4 bytes)
            Number of Subnetwork points of attachment (SNPA): 0
            Network layer reachability information (36 bytes)
                EVPN NLRI: IP Prefix route
                    Route Type: IP Prefix route (5)
                    Length: 34
                    Route Distinguisher: 192.168.77.102:3
                    ESI: 00 00 00 00 00 00 00 00 00
                    Ethernet Tag ID: 0
                    IP prefix length: 24
                    IPv4 address: 192.168.30.0
                    IPv4 Gateway address: 0.0.0.0
                    MPLS Label Stack: 629 (bottom)
Capture 1-19: BGP EVPN Update (route type 5) captured from the link between Leaf-101 and Spine-11.


Leaf-101 verifies the reachability of the Next Hop reported in MP_REACH_NLRI. Leaf-101 has an entry for the reported Next Hop in its BGP RNH database, and it installs the route from the BGP Loc-RIB into the RIB (example 1-38). Example 1-34 shows a sample of the BGP RNH database output.

Leaf-101# show ip route 192.168.30.0 vrf TENANT77

IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.30.0/24, ubest/mbest: 1/0
    *via 192.168.100.102%default, [200/0], 00:10:27, bgp-65000, internal, tag 65000 (evpn) segid: 10077 tunnelid: 0xc0a86466 encap: VXLAN
Example 1-38: show ip route 192.168.30.0 vrf TENANT77

Figure 1-16: BGP EVPN Route type 5 – Prefix advertisement.


Data Plane testing

Phase 1: vmBeef starts pinging vmBebe

At this stage, vmBeef has resolved the MAC address of its default gateway. It sends an ICMP request to 192.168.30.30. Since the destination is in a different subnet than vmBeef, it sends the packet to its default gateway.

Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.30.30
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0x574b [correct]
    [Checksum Status: Good]
    Identifier (BE): 0 (0x0000)
    Identifier (LE): 0 (0x0000)
    Sequence number (BE): 0 (0x0000)
    Sequence number (LE): 0 (0x0000)
    [No response seen]
    Data (72 bytes)
Capture 1-19: ICMP request sent by vmBeef: capture from the link vmBeef-Leaf-101.

Phase 2: Local VTEP Leaf-101: Routing

VTEP switch Leaf-101 receives the ICMP packet from vmBeef with the destination IP address 192.168.30.30. In the previous example, Leaf-101 had both VNI 10000 (subnet 192.168.11.0/24) and VNI 30000 (192.168.30.0/24) implemented. That is why Leaf-101 started the address resolution process by sending an ARP request to the Mcast Group specific to VNI 30000. In this scenario, VNI 30000 is not implemented on Leaf-101. Instead of starting the ARP process, Leaf-101 now routes the packet based on the longest match 192.168.30.0/24 found in its RIB. It routes the packet towards the next hop address 192.168.100.102 (Leaf-102). The real next hop is resolved through the recursive route lookup. Leaf-101 encapsulates the ICMP request with a VXLAN header carrying L3VNI 10077. Capture 1-20 shows the VXLAN encapsulated packet taken from the link between Leaf-101 and Spine-11.

Ethernet II, Src: 5e:00:00:00:00:07 (5e:00:00:00:00:07), Dst: 5e:00:00:01:00:07 (5e:00:00:01:00:07)
Internet Protocol Version 4, Src: 192.168.100.101, Dst: 192.168.100.102
User Datagram Protocol, Src Port: 58173, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10077
    Reserved: 0
Ethernet II, Src: 5e:00:00:00:00:07 (5e:00:00:00:00:07), Dst: 5e:00:00:04:00:07 (5e:00:00:04:00:07)
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.30.30
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0x2861 [correct]
    [Checksum Status: Good]
    Identifier (BE): 5 (0x0005)
    Identifier (LE): 1280 (0x0500)
    Sequence number (BE): 0 (0x0000)
    Sequence number (LE): 0 (0x0000)
    [No response seen]
    Data (72 bytes)
Capture 1-20: ICMP request captured from the link between the Leaf-101 and Spine-11.
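
The longest match lookup used in Phase 2 can be illustrated with a few lines of Python. The RIB below is a hand-written simplification of the VRF TENANT77 routes of Leaf-101 in this scenario, and the next-hop strings are labels only. The function name is hypothetical and the linear search is for clarity, not a model of how a switch performs the lookup.

```python
import ipaddress


def longest_match(dst: str, rib: dict):
    """Return (prefix, next hop) of the most specific route covering dst."""
    addr = ipaddress.ip_address(dst)
    candidates = [ipaddress.ip_network(prefix) for prefix in rib
                  if addr in ipaddress.ip_network(prefix)]
    if not candidates:
        return None
    best = max(candidates, key=lambda net: net.prefixlen)
    return str(best), rib[str(best)]


# Simplified view of VRF TENANT77 on Leaf-101 (only VNI 10000 is local).
rib = {
    "192.168.11.0/24": "Vlan10, direct",
    "192.168.11.12/32": "Vlan10, hmm",
    "192.168.30.0/24": "192.168.100.102, L3VNI 10077",
}
print(longest_match("192.168.30.30", rib))

# Once the host route of vmBebe has been learned, it becomes the best match.
rib["192.168.30.30/32"] = "192.168.100.102, L3VNI 10077"
print(longest_match("192.168.30.30", rib))
```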

Phase 3-4: Remote VTEP Leaf-102: ARP request

Remote VTEP switch Leaf-102 receives the ICMP request. Based on VNI 10077 in the VXLAN header, it knows that this packet belongs to VRF TENANT77 and has to be routed based on the RIB of that VRF. It removes the VXLAN header and does a routing lookup. The packet is routed based on the longest prefix match 192.168.30.0/24 (local VLAN 30). Because Leaf-102 does not have MAC-IP binding information for IP 192.168.30.30, it sends an ARP request out to VLAN 30 (attached to network 192.168.30.0/24). Capture 1-21 is from the trunk link between Leaf-102 and vmBebe.

Ethernet II, Src: EquipTra_01:00:01 (00:01:00:01:00:01), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0001 1110 = ID: 30
    Type: ARP (0x0806)
    Padding: 0000000000000000000000000000
    Trailer: 00000000
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: EquipTra_01:00:01 (00:01:00:01:00:01)
    Sender IP address: 192.168.30.1
    Target MAC address: Broadcast (ff:ff:ff:ff:ff:ff)
    Target IP address: 192.168.30.30
Capture 1-21: ARP request captured from the trunk link between vmBebe and Leaf-102.


Phase 5: vmBebe: ARP Reply

VmBebe receives the ARP request and responds to it by sending ARP reply message as a unicast to VTEP switch Leaf-102.

Ethernet II, Src: 30:00:00:30:be:be (30:00:00:30:be:be), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
    000. .... .... .... = Priority: Best Effort (default) (0)
    ...0 .... .... .... = DEI: Ineligible
    .... 0000 0001 1110 = ID: 30
    Type: ARP (0x0806)
    Padding: 0000000000000000000000000000
    Trailer: 00000000
Address Resolution Protocol (reply)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (2)
    Sender MAC address: 30:00:00:30:be:be (30:00:00:30:be:be)
    Sender IP address: 192.168.30.30
    Target MAC address: EquipTra_01:00:01 (00:01:00:01:00:01)
    Target IP address: 192.168.30.1
Capture 1-22: ARP reply captured from the link between vmBebe and Leaf-102.

Phase 6: Remote VTEP Leaf-102: ICMP Request forwarding

Now Leaf-102 is able to forward the ICMP request to vmBebe.

Ethernet II, Src: 5e:00:00:04:00:07 (5e:00:00:04:00:07), Dst: 30:00:00:30:be:be (30:00:00:30:be:be)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.30.30
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
Capture 1-23: ICMP request captured from the link between Leaf-102 and vmBebe.


Phase 7: vmBebe: ICMP reply

VmBebe receives the ICMP Request and sends an ICMP reply back to vmBeef.

Ethernet II, Src: 30:00:00:30:be:be (30:00:00:30:be:be), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
Internet Protocol Version 4, Src: 192.168.30.30, Dst: 192.168.11.12
Internet Control Message Protocol
    Type: 0 (Echo (ping) reply)
Capture 1-24: ICMP reply captured from the link between vmBebe and Leaf-102.

Phase 8-9: Remote VTEP Leaf-102: Routing decision and ICMP reply

The ICMP reply is sent to Leaf-101 by Leaf-102 over VNI 10077.

Ethernet II, Src: 5e:00:00:01:00:07 (5e:00:00:01:00:07), Dst: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
Internet Protocol Version 4, Src: 192.168.100.102, Dst: 192.168.100.101
User Datagram Protocol, Src Port: 60112, Dst Port: 4789
Virtual eXtensible Local Area Network
    Flags: 0x0800, VXLAN Network ID (VNI)
    Group Policy ID: 0
    VXLAN Network Identifier (VNI): 10077
    Reserved: 0
Ethernet II, Src: 5e:00:00:04:00:07 (5e:00:00:04:00:07), Dst: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
Internet Protocol Version 4, Src: 192.168.30.30, Dst: 192.168.11.12
Internet Control Message Protocol
    Type: 0 (Echo (ping) reply)
Capture 1-25: ICMP reply captured from the link between Leaf-101 and Spine-11.
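
The VXLAN header in Capture 1-25 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the RFC 7348 header layout (8-bit flags, 24 reserved bits, 24-bit VNI, 8 reserved bits). The function names are illustrative only and do not belong to any switch software or library.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value


def build_vxlan_header(vni: int) -> bytes:
    """Pack an 8-byte VXLAN header for the given 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, remaining 24 bits reserved (zero).
    # Word 2: VNI in the top 24 bits, last byte reserved (zero).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vxlan_header(header: bytes) -> tuple:
    """Return (flags, vni) from the first 8 bytes of a VXLAN header."""
    word1, word2 = struct.unpack("!II", header[:8])
    return word1 >> 24, word2 >> 8


header = build_vxlan_header(10077)  # L3VNI used in this chapter
print(header.hex())                 # 0800000000275d00
print(parse_vxlan_header(header))   # (8, 10077)
```

The first two bytes, 0x0800, correspond to the Flags value that the capture shows, because the analyzer displays the flags as a 16-bit field.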

Phase 10-11: Local VTEP Leaf-101: Routing decision and ICMP reply

VTEP switch Leaf-101 receives the ICMP reply packet and removes the VXLAN encapsulation. Based on VNI 10077, it knows that the packet belongs to VRF TENANT77 and that the route lookup has to be done in the RIB of VRF TENANT77. The destination IP address 192.168.11.12 belongs to VLAN 10. Leaf-101 has the MAC-IP binding information for 192.168.11.12, so it switches the packet out of the interface e1/2.

Ethernet II, Src: 5e:00:00:00:00:07 (5e:00:00:00:00:07), Dst: Private_10:be:ef (10:00:00:10:be:ef)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
Internet Protocol Version 4, Src: 192.168.30.30, Dst: 192.168.11.12
Internet Control Message Protocol
    Type: 0 (Echo (ping) reply)
Capture 1-26: ICMP reply captured from the link between Leaf-101 and vmBeef.
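
The routing decision that Leaf-101 makes in Phases 10-11 can be sketched as a VNI-to-VRF mapping followed by a longest-prefix-match lookup. The tables below are simplified assumptions built from values shown in this chapter. They are not a dump of real switch state, and the function name is hypothetical.

```python
import ipaddress

# Assumed mapping: L3VNI -> VRF, as configured on Leaf-101 in this chapter.
VNI_TO_VRF = {10077: "TENANT77"}

# Assumed, simplified per-VRF RIB: prefix -> next-hop description.
VRF_RIB = {
    "TENANT77": {
        "192.168.11.0/24": "Vlan10 (direct)",
        "192.168.11.12/32": "Vlan10 (hmm)",
        "192.168.30.0/24": "VTEP 192.168.100.102 (L3VNI 10077)",
    }
}


def route_decapsulated_packet(vni: int, dst_ip: str):
    """Pick the VRF from the VNI, then do a longest-prefix match."""
    vrf = VNI_TO_VRF[vni]
    dst = ipaddress.ip_address(dst_ip)
    candidates = []
    for prefix, next_hop in VRF_RIB[vrf].items():
        network = ipaddress.ip_network(prefix)
        if dst in network:
            candidates.append((network, next_hop))
    if not candidates:
        return vrf, None, None
    # The most specific (longest) matching prefix wins.
    network, next_hop = max(candidates, key=lambda c: c[0].prefixlen)
    return vrf, str(network), next_hop


print(route_decapsulated_packet(10077, "192.168.11.12"))
```

With these assumed tables, the /32 host route learned through HMM is preferred over the connected /24, which matches the behavior described above: the packet is handed to VLAN 10 and switched toward vmBeef.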


Figure 1-17: Silent host discovery process, Phases 1-4.


Figure 1-18: Silent host discovery process, Phases 5-11.


Just as in the previous example, where Leaf-101 has both VNI 10000 and VNI 30000 implemented locally, this scenario uses the Symmetric IRB model. The packet is switched in local VLAN 10 and then routed over the VXLAN fabric using VNI 10077 (L3VNI). On the remote VTEP switch Leaf-102, the packet is first routed based on the RIB of VRF TENANT77 and then switched in local VLAN 30.
During the process, Leaf-102 learns the MAC-IP information of vmBebe. This information is advertised to VTEP switch Leaf-101, which in turn installs the routing information in its BGP RIB.

Example 1-39 shows the BGP entries stored in the Adj-RIB-In. The entries for host route 192.168.30.30/32 and subnet 192.168.30.0/24 under RD 192.168.77.101:3 are the routes that are actually imported into the BGP Loc-RIB of Leaf-101.

Leaf-101# sh bgp l2vpn evpn

BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 56, Local Router ID is 192.168.77.101
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 192.168.77.101:32777    (L2VNI 10000)
*>l[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216
                      192.168.100.101                   100      32768 i
*>i[2]:[0]:[0]:[48]:[1000.0020.abba]:[0]:[0.0.0.0]/216
                      192.168.100.102                   100          0 i
*>l[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272
                      192.168.100.101                   100      32768 i

Route Distinguisher: 192.168.77.102:3
*>i[5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224
                      192.168.100.102          0        100          0 ?

Route Distinguisher: 192.168.77.102:32787
*>i[2]:[0]:[0]:[48]:[1000.0020.abba]:[0]:[0.0.0.0]/216
                      192.168.100.102                   100          0 i

Route Distinguisher: 192.168.77.102:32797
*>i[2]:[0]:[0]:[48]:[3000.0030.bebe]:[0]:[0.0.0.0]/216
                      192.168.100.102                   100          0 i
*>i[2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272
                      192.168.100.102                   100          0 i

Route Distinguisher: 192.168.77.101:3    (L3VNI 10077)
*>i[2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272
                      192.168.100.102                   100          0 i
*>i[5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224
                      192.168.100.102          0        100          0 ?
Example 1-39: sh bgp l2vpn evpn
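
The bracketed route keys in Example 1-39 can be split into named fields for easier reading. The sketch below is a hedged illustration: the field names (ESI, Ethernet Tag, and so on) are an assumed interpretation of the NX-OS display format, and the parser is hypothetical rather than part of any Cisco tool. The trailing length after the slash is stored as shown, without interpreting it.

```python
import re

# Assumed field order for the two route types that appear in this chapter.
TYPE2_FIELDS = ("route_type", "esi", "eth_tag", "mac_len", "mac", "ip_len", "ip")
TYPE5_FIELDS = ("route_type", "esi", "eth_tag", "prefix_len", "prefix", "gateway")


def parse_evpn_route_key(text: str) -> dict:
    """Split a '[..]:[..]/len' EVPN route key into a dictionary."""
    body, _, length = text.strip().rpartition("/")
    values = re.findall(r"\[([^\]]*)\]", body)
    if not values:
        raise ValueError("no bracketed fields found")
    if values[0] == "2":
        names = TYPE2_FIELDS
    elif values[0] == "5":
        names = TYPE5_FIELDS
    else:
        raise ValueError("unsupported route type: " + values[0])
    if len(values) != len(names):
        raise ValueError("unexpected number of fields")
    route = dict(zip(names, values))
    route["key_len"] = int(length)
    return route


mac_ip = parse_evpn_route_key(
    "[2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272")
prefix = parse_evpn_route_key("[5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224")
print(mac_ip["mac"], mac_ip["ip"])
print(prefix["prefix"] + "/" + prefix["prefix_len"])
```

Read this way, a MAC-only Route Type 2 entry is the one with an IP length of 0 and the address 0.0.0.0, while the MAC-IP entry carries an IP length of 32 and the host address.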

Example 1-40 shows that host route 192.168.30.30 is installed from the BGP Adj-RIB-In to Loc-RIB based on RT 65000:10077. During the process, the Input Policy engine changes the RD 192.168.77.102:32797 (L2VNI) to 192.168.77.101:3 (3 = VRF Id of VRF TENANT77).


Leaf-101# sh bgp l2vpn evpn 192.168.30.30

BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:32797
BGP routing table entry for [2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272, version 65
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 2 destination(s)
  AS-Path: NONE, path sourced internal to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 30000 10077
      Extcommunity: RT:65000:10077 RT:65000:30000 ENCAP:8 Router MAC:5e00.0004.0007
      Originator: 192.168.77.102 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

Route Distinguisher: 192.168.77.101:3    (L3VNI 10077)
BGP routing table entry for [2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272, version 46
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported from 192.168.77.102:32797:[2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272
  AS-Path: NONE, path sourced internal to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 30000 10077
      Extcommunity: RT:65000:10077 RT:65000:30000 ENCAP:8 Router MAC:5e00.0004.0007
      Originator: 192.168.77.102 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer
Example 1-40: sh bgp l2vpn evpn 192.168.30.30
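
The import step described for Example 1-40 can be sketched as a match on the Route Target followed by a rewrite of the Route Distinguisher. This is a simplified model under stated assumptions: the `EvpnRoute` class, the `IMPORT_POLICY` table, and the function name are hypothetical and only mirror the values shown in the output above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvpnRoute:
    rd: str
    key: str
    route_targets: frozenset
    next_hop: str


# Assumed import policy on Leaf-101: import RT -> local RD of the importing VRF.
IMPORT_POLICY = {"65000:10077": "192.168.77.101:3"}  # VRF TENANT77, L3VNI 10077


def import_route(route: EvpnRoute) -> list:
    """Return one copy per matching import RT, each with the local RD."""
    copies = []
    for import_rt, local_rd in IMPORT_POLICY.items():
        if import_rt in route.route_targets:
            # Only the RD changes; the key, RTs, and next hop are kept.
            copies.append(replace(route, rd=local_rd))
    return copies


received = EvpnRoute(
    rd="192.168.77.102:32797",
    key="[2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272",
    route_targets=frozenset({"65000:10077", "65000:30000"}),
    next_hop="192.168.100.102",
)
for imported in import_route(received):
    print(imported.rd, imported.key)
```

The original entry stays in the table under RD 192.168.77.102:32797, and the imported copy appears under RD 192.168.77.101:3, which corresponds to the two entries listed in Example 1-40.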

Also, the BGP EVPN route type 5 (Prefix Route) is installed from the BGP Adj-RIB-In into Loc-RIB.

Leaf-101# sh bgp l2vpn evpn 192.168.30.0

BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:3
BGP routing table entry for [5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224, version 63
Paths: (1 available, best #1)
Flags: (0x000002) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 2 destination(s)
  AS-Path: NONE, path sourced internal to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin incomplete, MED 0, localpref 100, weight 0
      Received label 10077
      Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0004.0007
      Originator: 192.168.77.102 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

Route Distinguisher: 192.168.77.101:3    (L3VNI 10077)
BGP routing table entry for [5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224, version 5
Paths: (1 available, best #1)
Flags: (0x000002) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported from 192.168.77.102:3:[5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224
  AS-Path: NONE, path sourced internal to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin incomplete, MED 0, localpref 100, weight 0
      Received label 10077
      Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0004.0007
      Originator: 192.168.77.102 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

Example 1-41: sh bgp l2vpn evpn 192.168.30.0

Example 1-42 shows that both the host route 192.168.30.30/32 and the prefix route 192.168.30.0/24 are installed from the BGP Loc-RIB into the L3RIB of VRF TENANT77.

Leaf-101# show ip route vrf TENANT77

IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

192.168.11.0/24, ubest/mbest: 1/0, attached
    *via 192.168.11.1, Vlan10, [0/0], 01:05:03, direct, tag 77
192.168.11.1/32, ubest/mbest: 1/0, attached
    *via 192.168.11.1, Vlan10, [0/0], 01:05:03, local, tag 77
192.168.11.12/32, ubest/mbest: 1/0, attached
    *via 192.168.11.12, Vlan10, [190/0], 00:17:15, hmm
192.168.30.0/24, ubest/mbest: 1/0
    *via 192.168.100.102%default, [200/0], 01:02:54, bgp-65000, internal, tag 65000 (evpn) segid: 10077 tunnelid: 0xc0a86466 encap: VXLAN

192.168.30.30/32, ubest/mbest: 1/0
    *via 192.168.100.102%default, [200/0], 00:17:10, bgp-65000, internal, tag 65000 (evpn) segid: 10077 tunnelid: 0xc0a86466 encap: VXLAN
Example 1-42: show ip route vrf TENANT77
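
The tunnelid value in Example 1-42 appears to be the next-hop VTEP address written in hexadecimal. A short Python check, assuming that interpretation holds, converts the value back to dotted-decimal notation. The helper name is illustrative.

```python
import ipaddress


def tunnel_id_to_ipv4(tunnel_id: str) -> str:
    """Interpret a hexadecimal tunnel id as a 32-bit IPv4 address."""
    return str(ipaddress.IPv4Address(int(tunnel_id, 16)))


print(tunnel_id_to_ipv4("0xc0a86466"))  # 192.168.100.102
```

The result matches the next-hop address 192.168.100.102 shown for both routes, which is the address Leaf-102 uses as next hop in its BGP EVPN advertisements.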


Summary

This chapter describes the BGP EVPN Control Plane and Data Plane operation for Layer 2 (switching) and Layer 3 (routing). It also explains the components used in a BGP EVPN VXLAN fabric (such as the L2RIB, MAC table, MAC VRF, IP VRF, L3RIB, ARP table, ARP Suppression Cache, and the BGP Adj-RIB-In, Loc-RIB, and Adj-RIB-Out) and how these components interact with each other.

References

draft-ietf-bess-evpn-inter-subnet-forwarding-05  - Integrated Routing and Bridging in EVPN: https://tools.ietf.org/html/draft-ietf-bess-evpn-inter-subnet-forwarding-05
RFC 4271 - A Border Gateway Protocol 4 (BGP-4): https://tools.ietf.org/html/rfc4271
RFC 4760 - Multiprotocol Extensions for BGP-4: https://tools.ietf.org/html/rfc4760
RFC 7432 - BGP MPLS-Based Ethernet VPN: https://tools.ietf.org/html/rfc7432
Building Data Centers with VXLAN BGP EVPN: A Cisco NX-OS Perspective - Lukas Krattiger, Shyam Kapadia, and David Jansen. ISBN-10: 1-58714-467-0

4 comments:

  1. hi Toni,
    please allow to ask for Prefix Advertisement part, you actually discussed two scenarios,
    1.two VTEPs and each VTEP has all VNI
    2.two VTEPs and they has different VNI
    for the first part the packet goes vlan10-----vlan30---------------vlan30
    for the second part the packet goes like this vlan10-----vlan77-------vlan77---vlan30

    I remember vlan 77 is created for routing only and does not have an IP.
    if our Vxlan network has all VTEPS and each of VTEPs has all VNIs, then, I believe there is no need to configure Vlan 77, am I correct?

    All the Best
    Michael

    Replies
    1. Hi Michael,
      In theory, you do not need a separate ”routing vlan” if all VLANs are implemented in every VTEP. In reality, this is probably not the case because there are also external connections and service segments of which L3 interface are implemented in service/external leaf.
      Cheers - Toni

  2. Why do you show the IP-VRF in the L2RIB? The MAC-VRF has both the MAC-only and MAC-IP Type 2 routes.

