Document Status: Unfinished
Edited: Monday, 7 January 2019
This chapter covers the following topics:
MAC address learning process (Intra-VNI switching): This section describes how the local VTEP switch learns the MAC addresses of its directly connected hosts from ingress frames and how the L2 forwarding component (L2FWDER) installs the information into the MAC VRF in the Layer 2 Routing Information Base (L2RIB). This section also shows how the local VTEP switch advertises the MAC address information to the remote VTEP switch by using a BGP EVPN Route Type 2 advertisement (MAC Advertisement Route) and how the remote VTEP switch installs the information into the MAC VRF in L2RIB and from there into the MAC address table. Intra-L2VNI (switching) Data Plane operation is explained at the end of the section with various frame capture examples. The white “MAC line” represents these processes in figure 7-1.
MAC-IP address learning process (ARP for Intra-VNI switching and ): This section gives a detailed description of how the local VTEP switch learns the IP addresses of its locally connected hosts from ARP messages generated by the hosts and how the Host Mobility Manager component (HMM) installs the information into the IP VRF. This section also shows how the local VTEP switch advertises the IP address information to the remote VTEP switch by using a BGP EVPN Route Type 2 (MAC Advertisement Route) advertisement and how the remote VTEP switch installs this information into the IP VRF in L2RIB as well as into the L3RIB of VRF TENANT77. In addition, this section explains how the ARP Suppression mechanism uses MAC-IP binding information to reduce BUM (Broadcast, Unknown Unicast, and Multicast) traffic in the VXLAN Fabric. The grey “IP line” represents these processes in figure 7-1.
Prefix advertisement: This section covers how the local VTEP switch redistributes its Anycast Gateway (AGW) subnets into BGP and advertises this information to the remote VTEP switch by using BGP EVPN Route Type 5 (IP Prefix Route) advertisement. This section also explains how the information is used to discover silent hosts. This section also describes how the remote VTEP installs the route from the BGP into local L3RIB. The black “Prefix line” represents these processes in figure 7-1.
MAC Address Learning Process (Intra-VNI Switching)
Overview
Phase 1: MAC Address Table on Local VTEP
Virtual Machine vmBeef comes up. It announces its existence to the network and validates the uniqueness of its IP address by sending a Gratuitous ARP (GARP). VTEP switch Leaf-101 receives the GARP message on interface e1/2 and stores the MAC address from the Source MAC address field of the Ethernet header into the MAC address table of VLAN 10.
Phase 2: MAC VRF on Local VTEP
The L2FWDER component notices the new MAC address from the interface e1/2. L2FWDER then installs the MAC address into MAC VRF (also called EVI instance) located in L2 Routing Information Base (L2RIB) of VRF TENANT77. MAC VRF in L2RIB contains the MAC address and source port information as well as information about topology id (=VLAN Id). Flag field of the learned entry is marked with Local Flag (locally learned MAC address).
Why do we have two nearly identical L2 databases in VTEP switches (MAC address table vs. MAC VRF)? A route can be sent to BGP only if it is in the RIB. In addition, routes from BGP can be installed into the RIB but not directly into the MAC address table.
Phase 3: BGP MAC Route Export on Local VTEP
VTEP switch Leaf-101 exports the MAC route from the L2RIB into the BGP Loc-RIB, from where it is sent through the Output Policy Engine to Adj-RIB-Out (Pre). From Adj-RIB-Out (Pre), the route is installed through the policy into Adj-RIB-Out (Post) with the Path Attributes based on the BGP peer type (iBGP/eBGP/RR-Client). From Adj-RIB-Out (Post), the MAC Advertisement Route (Route Type 2) Update message is sent to Spine-11 (Route-Reflector). The RR Spine-11 forwards the message to its RR-Client Leaf-102. Figure 1-1 illustrates the whole process, while the following figures are simplified and show Adj-RIB-In/Out as one entity without the Pre/Post sub-databases.
In addition to the MAC address and Next Hop information, the NLRI includes the Route Distinguisher (RD), which is a kind of prefix. The RD for a MAC route is formed from the BGP RID of the sending VTEP switch and a number derived from the Id of the VLAN that the MAC address belongs to. In the outputs of this chapter, that number equals 32767 + VLAN Id (32777 for VLAN 10 and 32787 for VLAN 20). In Leaf-101, the RD value 192.168.77.101:32777 is attached to all outgoing MAC route advertisements concerning VLAN 10. Spine switches use the RD information to differentiate possibly overlapping MAC/IP information (Spine switches are not L2VNI/VRF aware).
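The RD construction described above can be sketched in a few lines of Python. This is only an illustration: the base value 32767 is an assumption inferred from the RD values shown in this chapter, and the function name derive_l2vni_rd is hypothetical.

```python
# Assumption: the auto-derived RD is "<BGP RID>:<32767 + VLAN Id>", which
# matches the values 32777 (VLAN 10) and 32787 (VLAN 20) seen in this chapter.
RD_BASE = 32767


def derive_l2vni_rd(bgp_rid: str, vlan_id: int) -> str:
    """Return the Route Distinguisher string for a VLAN-based MAC VRF."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN Id must be in the range 1-4094")
    return f"{bgp_rid}:{RD_BASE + vlan_id}"


print(derive_l2vni_rd("192.168.77.101", 10))  # Leaf-101, VLAN 10
print(derive_l2vni_rd("192.168.77.102", 20))  # Leaf-102, VLAN 20
```

Because the VLAN Id is locally significant, the same L2VNI can end up with a different RD on each VTEP switch.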
There is also MPLS Label Stack 1 field in NLRI, which includes the L2VNI Identifier. Leaf-101 local VLAN 10 is mapped to VNI 10000 (= MPLS Label Stack 1: 10000). VNI Id is used in Data Plane in VXLAN header.
The update message has two BGP Extended Community Attributes. First one, the Route-Target attribute is used for route export/import policy by VTEP switches. The second one, Encapsulation type defines the encapsulation used in Data Plane (Type 8 = VXLAN).
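As a rough illustration of how these two Extended Communities look on the wire, the following Python sketch packs them into their 8-octet form. The type and subtype values are the ones visible in Capture 1-1 of this chapter; the function names are hypothetical.

```python
import struct


def encode_route_target(asn: int, number: int) -> bytes:
    """Transitive 2-octet AS-specific extended community, subtype Route Target."""
    # Type 0x00, subtype 0x02, 2-octet AS number, 4-octet assigned number.
    return struct.pack("!BBHI", 0x00, 0x02, asn, number)


def encode_encapsulation(tunnel_type: int) -> bytes:
    """Transitive opaque extended community, subtype Encapsulation."""
    # Type 0x03, subtype 0x0c, 4 reserved octets, 2-octet tunnel type.
    return struct.pack("!BBIH", 0x03, 0x0C, 0, tunnel_type)


rt = encode_route_target(65000, 10000)
encap = encode_encapsulation(8)  # 8 = VXLAN
print(rt.hex())
print(encap.hex())
```

The two resulting byte strings can be compared against the octets that follow the EXTENDED_COMMUNITIES attribute header (c0 10 10) in the hex portion of Capture 1-1.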
Phase 4: BGP AFI L2EVPN MAC Route Import on Remote VTEP
VTEP switch Leaf-102 receives the MAC route Advertisement and installs it into Adj-RIB-In Pre database without modification. Routes are imported based on EVPN import policies into Adj-RIB-In Post. During this import process, the RDs are changed from the received RD to RD defined under EVPN Instance. Routes moved into Adj-RIB-In Post are then run through the BGP Best Path decision process and the best route is installed into Loc-RIB.
Phase 5: MAC VRF on Remote VTEP
From the Loc-RIB, route information is imported into L2RIB (MAC VRF). Based on the L2VNI Id carried in MPLS Label Stack 1 field, MAC route is installed into MAC VRF with topology Id 20 (VLAN 20). The source of the information is BGP. Port information points to the remote NVE1 interface IP address of VTEP switch Leaf-101.
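The lookup from the received L2VNI Id to the local topology Id can be pictured as a simple reverse mapping. The sketch below is a simplified model, not the switch implementation; the mapping values come from this chapter, while the dictionary layout and function name are hypothetical.

```python
# Per-switch VLAN-to-VNI mappings used in this chapter.
VLAN_TO_VNI = {
    "Leaf-101": {10: 10000},
    "Leaf-102": {20: 10000},
}


def topology_id_for_vni(switch: str, vni: int) -> int:
    """Return the local VLAN Id (topology Id) mapped to the given L2VNI."""
    for vlan_id, mapped_vni in VLAN_TO_VNI[switch].items():
        if mapped_vni == vni:
            return vlan_id
    raise LookupError(f"VNI {vni} is not mapped to any VLAN on {switch}")


print(topology_id_for_vni("Leaf-102", 10000))  # topology Id on the remote VTEP
print(topology_id_for_vni("Leaf-101", 10000))  # topology Id on the local VTEP
```

The same VNI resolves to a different VLAN on each switch, which is why the MAC route learned in VLAN 10 on Leaf-101 is installed with topology Id 20 on Leaf-102.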
Phase 6: MAC Address Table on Remote VTEP
As the last step, the remote VTEP Leaf-102 L2FWDER component installs the MAC reachability information from the MAC VRF into its VLAN 20 MAC address table. The Next-Hop points to Leaf-101 NVE1 interface.
Now both Leaf-101 and Leaf-102 have up-to-date reachability information for the MAC address of host vmBeef in their databases, and they are able to send frames to vmBeef.
Figure 1-3: BGP EVPN Control Plane Operational MAC advertisement.
Monitoring
Phase 1: MAC Address Table on Local VTEP
Example 1-1 shows the MAC address table of the local VTEP switch Leaf-101. The MAC address 1000.0010.beef is located behind port e1/2 and it belongs to VLAN 10. The default MAC entry aging time is 1800 seconds.
Leaf-101# show system internal l2fwder mac
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC
VLAN MAC Address Type age Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
* 10 1000.0010.beef dynamic 00:03:27 F F Eth1/2
Example 1-1: show system internal l2fwder mac
Phase 2: MAC VRF on Local VTEP
Example 1-2 illustrates how the L2FWDER component notices the new MAC address entering from interface e1/2 (interface index 0x1a000200). Example 1-3 shows how the L2RIB is updated, and example 1-4 verifies the if_index to interface mapping. The received frame has an 802.1Q tag, where the VLAN Id is set to 10. Based on the VLAN Id, the L2FWDER component is able to install the MAC reachability information into the right MAC VRF. Example 1-5 verifies the VLAN to VNI topology mapping. Example 1-6 illustrates the actual content of the MAC VRF in L2RIB.
Leaf-101# show system internal l2fwder event-history events | i beef
l2fwder_dbg_ev, 690 l2fwder_vxlan_mac_update, 886MAC move 1000.0010.beef (10) 0x0 -> 0x1a000200
l2fwder_dbg_ev, 690 l2fwder_l2rib_add_delete_local_mac_routes, 154Adding route topo-id: 10, macaddr: 1000.0010.beef, nhifindx: 0x1a000200
l2fwder_dbg_ev, 690 l2fwder_l2rib_mac_update, 736MAC move 1000.0010.beef (10) 0x0 -> 0x1a000200
l2fwder_construct_and_send_macmv_ntf_per_cookie, 5261 mac 1000.0010.beef vlan 1 new if_index = 1a000200, old if_index = 0, is_del=0
Example 1-2: show system internal l2fwder event-history events | i beef
Example 1-3 shows, from top to bottom, how the L2RIB is updated.
Leaf-101# sh system internal l2rib event-history mac | i beef
Rcvd MAC ROUTE msg: (10, 1000.0010.beef), vni 0, admin_dist 0, seq 0, soo 0,
(10,1000.0010.beef):Mobility check for new rte from prod: 3
(10,1000.0010.beef):Current non-del-pending route local:no, remote:no, linked mac-ip count:1
(10,1000.0010.beef):Clearing routelist flags: Del_Pend,
(10,1000.0010.beef,3):Is local route. is_mac_remote_at_the_delete: 0
(10,1000.0010.beef,3):MAC route created with seq 0, flags L, (),
(10,1000.0010.beef,3): soo 0, peerid 0, pc-ifindex 0
(10,1000.0010.beef,3):Encoding MAC best route (ADD, client id 5)
(10,1000.0010.beef,3):vni:10000 rt_flags:L, admin_dist:6, seq_num:0 ecmp_label:0 soo:0(--)
(10,1000.0010.beef,3):res:Regular esi:(F) peerid:0 nve_ifhdl:1224736769 mh_pc_ifidx:0 nh_count:1
(10,1000.0010.beef,3):NH[0]:Eth1/2
Example 1-3: show system internal l2rib event-history mac | i beef
Example 1-4 shows that the if-index 0x1a000200 points to interface e1/2.
Leaf-101# show interface snmp-ifindex | i 0x1a000200
Eth1/2 436208128 (0x1a000200)
Example 1-4: show interface snmp-ifindex | i 0x1a000200
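The event-history outputs print the interface index in hexadecimal, while the SNMP ifindex column shows the same value in decimal. A minimal Python sketch of the conversion (the helper names are hypothetical):

```python
def ifindex_hex_to_decimal(hex_ifindex: str) -> int:
    """Convert an if_index printed in hex (with or without 0x) to decimal."""
    return int(hex_ifindex, 16)


def ifindex_decimal_to_hex(ifindex: int) -> str:
    """Format a decimal ifindex the way the show command prints it."""
    return f"0x{ifindex:08x}"


print(ifindex_hex_to_decimal("0x1a000200"))
print(ifindex_decimal_to_hex(436208128))
```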
Example 1-5 shows that the VLAN 10 is attached to L2VNI 10000.
Leaf-101# show vlan id 10 vn-segment
VLAN Segment-id
---- -----------
10 10000
Example 1-5: show vlan id 10 vn-segment
Example 1-6 illustrates the MAC VRF entry in L2RIB of L2VNI 10000.
Leaf-101# show l2route evpn mac evi 10
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen
Topology Mac Address Prod Flags Seq No Next-Hops
----------- -------------- ------ ------------- ---------- ----------------
10 1000.0010.beef Local L, 0 Eth1/2
Example 1-6: show l2route evpn mac evi 10
Phase 3: BGP MAC route processing on Local VTEP
Example 1-7 shows how the BGP process of local VTEP switch Leaf-101 receives the MAC route sent from L2RIB. Leaf-101 installs the MAC route information into BGP Loc-RIB with required information related to BGP EVPN Route-Type 2 advertisement (L2VNI Identifier, Route-Target and Encapsulation type). The bit count /112 at the end of address is the sum of bits for RD (8 octets) + MAC address (6 octets) = 14 octets = 112 bits.
Leaf-101# sh bgp internal event-history events | i beef
BRIB: [L2VPN EVPN] Installing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 (local) via 192.168.100.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 ENCAP:8
EVT: Received from L2RIB MAC route: Add ESI 0000.0000.0000.0000.0000 topo 10000 mac 1000.0010.beef flags 0x000002 soo 0 seq 0 reorig: 0
Example 1-7: sh bgp internal event-history events | i beef
Example 1-8 shows the BGP Loc-RIB entry concerning the NLRI of vmBeef. The address information in the BGP entry is explained below:
§ Route Distinguisher 192.168.77.101:32777
§ [2] - BGP EVPN Route-Type 2, MAC/IP Advertisement Route
§ [0] - Ethernet Segment Identifier (ESI), all zeroed out = single homed site
§ [0] - Ethernet Tag Id, EVPN routes must use value 0
§ [48] - Length of MAC address
§ [1000.0010.beef] - MAC address
§ [0] - Length of IP address
§ [0.0.0.0] - Carried IP address
§ /216 - Length of the MAC VRF NLRI in bits: RD (8 octets) + MAC address (6 octets) + L2VNI Id (3 octets) + ESI (10 octets) = 27 octets = 216 bits.
The L2VNI information is shown in the Received Label field. There are also two BGP Extended Community Path Attributes:
§ Route-Target: 65000:10000 - Used for export/Import policy (Control Plane)
§ Encapsulation 8: Defines the encapsulation type VXLAN (Data Plane).
Leaf-101# show bgp l2vpn evpn 1000.0010.beef
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.101:32777 (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216, version 28
Paths: (1 available, best #1)
Flags: (0x000102) on xmit-list, is not in l2rib/evpn
Advertised path-id 1
Path type: local, path is valid, is best path
AS-Path: NONE, path locally originated
192.168.100.101 (metric 0) from 0.0.0.0 (192.168.77.101)
Origin IGP, MED not set, localpref 100, weight 32768
Received label 10000
Extcommunity: RT:65000:10000 ENCAP:8
Path-id 1 advertised to peers:
192.168.77.11
<- Comment: For the simplicity, the MAC-IP entry removed from this output->
Example 1-8: show bgp l2vpn evpn 1000.0010.beef
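The two prefix lengths seen in the outputs (/112 in the event history and /216 in the BGP table) follow directly from the field sizes listed for Example 1-8. The Python sketch below splits the prefix string into named fields and redoes the arithmetic. The field names and the function parse_type2_prefix are hypothetical; the string format is the one shown in the show command output.

```python
import re

# Field sizes in octets, as listed in the explanation of Example 1-8.
RD_LEN, MAC_LEN, LABEL_LEN, ESI_LEN = 8, 6, 3, 10

FIELD_NAMES = ["route_type", "esi", "eth_tag", "mac_len", "mac", "ip_len", "ip"]


def parse_type2_prefix(prefix: str) -> dict:
    """Split a Route Type 2 prefix string into named fields."""
    body, bits = prefix.rsplit("/", 1)
    fields = re.findall(r"\[([^\]]*)\]", body)
    if len(fields) != len(FIELD_NAMES):
        raise ValueError("unexpected number of fields in prefix")
    parsed = dict(zip(FIELD_NAMES, fields))
    parsed["prefix_bits"] = int(bits)
    return parsed


entry = parse_type2_prefix("[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216")
short_bits = (RD_LEN + MAC_LEN) * 8                       # RD + MAC
full_bits = (RD_LEN + MAC_LEN + LABEL_LEN + ESI_LEN) * 8  # RD + MAC + label + ESI
print(entry["mac"], entry["prefix_bits"], short_bits, full_bits)
```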
Capture 1-1 shows the BGP EVPN Update message sent by Leaf-101. Note that the Next Hop address and the MPLS Label Stack (L2VNI ID) are only visible in HEX portion of the capture:
Next Hop: HEX c0 a8 64 65 = DEC 192.168.100.101
MPLS Label Stack 1: HEX 00 27 10 = 10000 (L2VNI id)
Border Gateway Protocol - UPDATE Message
Type: UPDATE Message (2)
Path attributes
Path Attribute - ORIGIN: IGP
Path Attribute - AS_PATH: empty
Path Attribute - LOCAL_PREF: 100
Path Attribute - EXTENDED_COMMUNITIES
Type Code: EXTENDED_COMMUNITIES (16)
Carried extended communities: (2 communities)
Route Target: 65000:10000 [Transitive 2-Octet AS-Specific]
Type: Transitive 2-Octet AS-Specific (0x00)
Subtype (AS2): Route Target (0x02)
2-Octet AS: 65000
4-Octet AN: 10000
Encapsulation: VXLAN Encapsulation [Transitive Opaque]
Type: Transitive Opaque (0x03)
Subtype (Opaque): Encapsulation (0x0c)
Tunnel type: VXLAN Encapsulation (8)
Path Attribute - MP_REACH_NLRI
Flags: 0x90, Optional, Extended-Length, Non-transitive, Complete
Length: 44
Address family identifier (AFI): Layer-2 VPN (25)
Subsequent address family identifier (SAFI): EVPN (70)
Next hop network address (4 bytes)
Number of Subnetwork points of attachment (SNPA): 0
Network layer reachability information (35 bytes)
EVPN NLRI: MAC Advertisement Route
Route Type: MAC Advertisement Route (2)
Length: 33
Route Distinguisher: 0001c0a84d658009 (192.168.77.101:32777)
ESI: 00 00 00 00 00 00 00 00 00
ESI Type: ESI 9 bytes value (0)
ESI 9 bytes value: 00 00 00 00 00 00 00 00 00
Ethernet Tag ID: 0
MAC Address Length: 48
MAC Address: Private_10:be:ef (10:00:00:10:be:ef)
IP Address Length: 0
IP Address: NOT INCLUDED
MPLS Label Stack 1: 625, (BOGUS: Bottom of Stack NOT set!)
0000 5e 00 00 01 00 07 5e 00 00 00 00 07 08 00 45 c0 ^.....^.......E.
0010 00 9c 74 69 00 00 40 06 e9 71 c0 a8 4d 65 c0 a8 ..ti..@..q..Me..
0020 4d 0b 66 ea 00 b3 52 ff 40 6e f4 86 72 ab 80 18 M.f...R.@n..r...
0030 0e 42 7f 75 00 00 01 01 08 0a 00 0f 04 b0 00 0f .B.u............
0040 02 c5 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................
0050 ff ff 00 68 02 00 00 00 51 40 01 01 00 40 02 00 ...h....Q@...@..
0060 40 05 04 00 00 00 64 c0 10 10 00 02 fd e8 00 00 @.....d.........
0070 27 10 03 0c 00 00 00 00 00 08 90 0e 00 2c 00 19 '............,..
0080 46 04 c0 a8 64 65 00 02 21 00 01 c0 a8 4d 65 80 F...de..!....Me.
0090 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 30 ...............0
00a0 10 00 00 10 be ef 00 00 27 10 ........'.
Capture 1-1: BGP EVPN Update concerning the MAC address of vmBeef
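The decoder in Capture 1-1 reports "MPLS Label Stack 1: 625" although the L2VNI is 10000. The following Python sketch decodes the NLRI octets copied from the hex portion of the capture and shows where both numbers come from: read as a plain 24-bit value the field is 10000, while a decoder that treats it as an MPLS label entry takes only the upper 20 bits. The field offsets assume the Route Type 2 layout shown in the capture (RD, ESI, Ethernet Tag Id, MAC, IP, label); the function name is hypothetical.

```python
import ipaddress
import struct

# Route Type 2 NLRI body from Capture 1-1 (the 33 octets that follow the
# Route Type and Length octets "02 21").
NLRI = bytes.fromhex(
    "0001c0a84d658009"      # Route Distinguisher (type 1: IPv4 address + number)
    "00000000000000000000"  # ESI, 10 octets
    "00000000"              # Ethernet Tag Id
    "30"                    # MAC address length in bits
    "10000010beef"          # MAC address
    "00"                    # IP address length in bits (no IP address carried)
    "002710"                # label field carrying the L2VNI
)


def decode_type2(nlri: bytes) -> dict:
    """Decode the fields of a MAC-only Route Type 2 NLRI."""
    rd_type, rd_ip, rd_num = struct.unpack("!H4sH", nlri[0:8])
    label_field = int.from_bytes(nlri[30:33], "big")
    return {
        "rd": f"{ipaddress.IPv4Address(rd_ip)}:{rd_num}",
        "rd_type": rd_type,
        "mac": ":".join(f"{octet:02x}" for octet in nlri[23:29]),
        "vni": label_field,                  # whole 24-bit field read as a VNI
        "as_mpls_label": label_field >> 4,   # upper 20 bits, MPLS-style decoding
        "bottom_of_stack": label_field & 1,  # lowest bit, MPLS-style decoding
    }


decoded = decode_type2(NLRI)
next_hop = ipaddress.IPv4Address(bytes.fromhex("c0a86465"))
print(decoded)
print(next_hop)
```

The MPLS-style reading also explains the "Bottom of Stack NOT set" remark: the lowest bit of 0x002710 is zero.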
Phase 4: BGP MAC Route Import on Remote VTEP
Example 1-9 shows partial output of the BGP Adj-RIB-In and BGP Loc-RIB tables of the remote VTEP switch Leaf-102 concerning the MAC address NLRI of vmBeef. The first part, after Comment-1, shows the update entries stored in Adj-RIB-In. The only difference compared to what was seen in the BGP Loc-RIB of VTEP Leaf-101 is that Spine-11 (RR) has added the “Originator” (Leaf-101) and “Cluster list” (Spine-11) information to the update message. The second part, after Comment-2, shows the BGP Loc-RIB information imported from BGP Adj-RIB-In through the Policy Engine and the decision process. If we compare the NLRI information between BGP Adj-RIB-In and BGP Loc-RIB, we can see that the only NLRI information changed during the import process is the Route Distinguisher. The IP address part is changed to correspond to the BGP RID of Leaf-102, and the latter part has changed from 32777 to 32787 because of the different VLAN Id attached to L2VNI 10000 in Leaf-102 (VLAN 10 in Leaf-101 and VLAN 20 in Leaf-102).
Leaf-102# show bgp l2vpn evpn 1000.0010.beef
<Comment-1: this BGP Adj-RIB-In received from Spine-11>
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216, version 277
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported to 1 destination(s)
AS-Path: NONE, path sourced internal to AS
192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10000
Extcommunity: RT:65000:10000 ENCAP:8
Originator: 192.168.77.101 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
<MAC-IP part Snipped>
<Comment-2: this BGP Loc-RIB Imported from BGP Adj-RIB-In>
Route Distinguisher: 192.168.77.102:32787 (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216, version 278
Paths: (1 available, best #1)
Flags: (0x000212) on xmit-list, is in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path, in rib
Imported from 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216
AS-Path: NONE, path sourced internal to AS
192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10000
Extcommunity: RT:65000:10000 ENCAP:8
Originator: 192.168.77.101 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
Example 1-9: show bgp l2vpn evpn 1000.0010.beef
Example 1-10 shows the BGP import process (only partial output for simplicity). Note that the newest events are listed first, so the output is read from bottom to top. The VTEP Leaf-102 receives the BGP EVPN Update and installs the route into BGP Adj-RIB-In (Comment-1). It validates the Next Hop, and then the route is imported into the BGP Loc-RIB (Comment-2). From the BGP Loc-RIB, the route is sent to L2RIB (Comment-3).
Leaf-102# sh bgp internal event-history events | i beef
<Comment-3: Route is sent to L2RIB from BGP Loc-RIB>
RIB: [L2VPN EVPN]: Send to L2RIB 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112
RIB: [L2VPN EVPN] For 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112, added 1 next hops, suppress 0
RIB: [L2VPN EVPN] Adding 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 via 192.168.100.101 to NH list (flags2: 0x0)
RIB: [L2VPN EVPN] Add/delete 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112, flags=0x200, in_rib: no
IMP: [L2VPN EVPN] Created import destination entry for 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112
<Comment-2: Route is installed into BGP Loc-RIB from the BGP Adj-RIB-In>
IMP: [L2VPN EVPN] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 to <default> RD 192.168.77.102:32787
BRIB: [L2VPN EVPN] (192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 (192.168.77.11)): returning from bgp_brib_add, reeval=0new_path: 1, change: 1, undelete: 0, history: 0, force: 0, (pflags=0x40002010) rnh_flag_change 0
BRIB: [L2VPN EVPN] (192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 (192.168.77.11)): bgp_brib_add: handling nexthop, path->flags2: 0x80000
BRIB: [L2VPN EVPN] Created new path to 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 via 192.168.77.111 (pflags=0x40000000, pflags2=0x0)
<Comment-1: Route is installed into BGP Adj-RIB-In>
BRIB: [L2VPN EVPN] Installing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/112 (192.168.77.11) via 192.168.100.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 ENCAP:8
Example 1-10: sh bgp internal event-history events | i beef
Phase 5: MAC VRF on Remote VTEP
As shown in previous example 1-10, the MAC route information is sent from BGP Loc-RIB to L2RIB. Example 1-11 shows the operation of L2FWDER and example 1-12 shows the installation process. Example 1-13 verifies the VLAN to VNI topology mapping. Example 1-14 illustrates the actual content of MAC VRF in L2RIB.
Leaf-102# show system internal l2fwder event-history events | i beef
l2fwder_dbg_ev, 690 l2fwder_l2rib_add_remote_entry, 299Add remote mac entry mac: 1000.0010.beef vni: 20 sw_bd 20 vtep ip: 192.168.100.101
l2fwder_dbg_ev, 690 l2fwder_l2rib_msg_cb, 453MAC address: 1000.0010.beef
Example 1-11: show system internal l2fwder event-history events | i beef
Example 1-12 shows, from top to bottom, how the L2RIB is updated.
Leaf-102# sh system internal l2rib event-history mac | i beef
Rcvd MAC ROUTE msg: (20, 1000.0010.beef), vni 0, admin_dist 0, seq 0, soo 0,
(20,1000.0010.beef):Mobility check for new rte from prod: 5
(20,1000.0010.beef):Current non-del-pending route local:no, remote:yes, linked mac-ip count:1
(20,1000.0010.beef):Mobility type: remote-to-remote:
(20,1000.0010.beef): New route ESI: (F), SOO: 0, Seq num: 0Existing route ESI: (F), SOO: 0, Seq num: 0 , rt_type: 1
(20,1000.0010.beef,5):Using seq number from Recv-based BGP route
(20,1000.0010.beef,5):Setting Recv flag
(20,1000.0010.beef,5):MAC route modified (rc=0) with seq num:0, flags: (SplRcv), soo:0, peerid:1, MH<truncated>
(20,1000.0010.beef,5):Encoding MAC route (ADD, client id 0)
(20,1000.0010.beef,5):vni:10000 rt_flags: admin_dist:20, seq_num:0 ecmp_label:0 soo:0(--)
(20,1000.0010.beef,5):res:Regular esi:(F) peerid:1 nve_ifhdl:1224736769 mh_pc_ifidx:0 nh_count:1
(20,1000.0010.beef,5):NH[0]:192.168.100.101
Example 1-12: sh system internal l2rib event-history mac | i beef
Example 1-13 shows that the VLAN 20 is attached to L2VNI 10000.
Leaf-102# sh vlan id 20 vn-segment
VLAN Segment-id
---- -----------
20 10000
Example 1-13: sh vlan id 20 vn-segment
Example 1-14 shows the MAC VRF entry in L2RIB of L2VNI 10000.
Leaf-102# show l2route evpn mac evi 20
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen
Topology Mac Address Prod Flags Seq No Next-Hops
----------- -------------- ------ ------------- ---------- ----------------
20 1000.0010.beef BGP SplRcv 0 192.168.100.101
Example 1-14: show l2route evpn mac evi 20
Phase 6: MAC Address Table on Remote VTEP
Example 1-15 shows the updated MAC address table of VTEP switch Leaf-102.
Leaf-102# show system internal l2fwder mac | i beef
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC
VLAN MAC Address Type age Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
* 20 1000.0010.beef static - F F (0x47000001) nve-peer1
Example 1-15: show system internal l2fwder mac | i beef
Data Plane testing
ARP Request/Reply
Both Virtual Machines vmBeef and vmAbba belong to the same subnet 192.168.11.0/24. VmBeef starts pinging vmAbba. VmBeef does not yet have the MAC address of host vmAbba in its ARP table, so it starts the address resolution process (figure 1-4). It sends an ARP-request message asking who has the IP address 192.168.11.22. The destination MAC address of the ARP request is the L2 Broadcast address ff:ff:ff:ff:ff:ff.
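For readers who want to see the ARP request as bytes, the following Python sketch builds the 28-octet ARP payload with the same field values that appear in the captures of this section. It is an illustration only; the function name build_arp_request is hypothetical, and the sender IP address 192.168.11.12 is taken from the captures.

```python
import ipaddress
import struct


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build an ARP request payload for IPv4 over Ethernet."""
    sha = bytes.fromhex(sender_mac.replace(":", ""))
    spa = ipaddress.IPv4Address(sender_ip).packed
    tha = bytes(6)  # target MAC is unknown in a request, so all zeros
    tpa = ipaddress.IPv4Address(target_ip).packed
    # Hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # hardware size 6, protocol size 4, opcode 1 (request).
    return struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, 1, sha, spa, tha, tpa)


arp = build_arp_request("10:00:00:10:be:ef", "192.168.11.12", "192.168.11.22")
print(len(arp), arp.hex())
```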
When Leaf-101 receives the frame on its port e1/2, it checks the VLAN Id from the 802.1Q tag. Based on it, Leaf-101 knows that the Broadcast frame has to be switched inside the local VLAN 10 and the global L2VNI 10000. Leaf-101 removes the 802.1Q tag from the original Ethernet frame and adds the VXLAN header, UDP header, IP header, and outer Ethernet header.
The outer Ethernet header gets its source MAC address from the Leaf-101 NVE 1 interface, while the destination MAC address is derived from the destination Multicast Group IP address 238.0.0.10. The IP header destination address is 238.0.0.10, which is the Mcast group address used in VNI 10000 for BUM traffic. The source IP address is taken from the NVE 1 interface. The UDP destination port is 4789, which is reserved for VXLAN, and the UDP source port is generated based on the inner frame. Note that the UDP source port is the only changing variable when doing ECMP load balancing between the equal-cost links based on the 5-tuple input (destination IP/source IP, Layer 4 protocol, and source port/destination port).
The Virtual Network Identifier (VNI) in the VXLAN header is taken from the VLAN-to-VNI mapping of VLAN 10 (VNI 10000). Leaf-101 forwards the packet out of all interfaces belonging to the Outgoing Interface List (OIL) of Mcast Group 238.0.0.10.
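The derivation of the outer destination MAC address from the Multicast Group address follows the standard IPv4 multicast mapping: the fixed prefix 01:00:5e followed by the low-order 23 bits of the group address. A short Python sketch (the function name is hypothetical):

```python
import ipaddress


def multicast_ip_to_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its Ethernet multicast MAC."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF        # keep the low-order 23 bits
    mac = (0x01005E << 24) | low23      # prepend the fixed 01:00:5e prefix
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


print(multicast_ip_to_mac("238.0.0.10"))
```

The result can be compared with the outer destination MAC address in the capture taken between Leaf-101 and Spine-11.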
VTEP switch Leaf-102 receives the ARP-request sent by vmBeef. Leaf-102 removes the headers used for VXLAN tunneling (outer Ethernet header, IP header, UDP header, and VXLAN header). Based on the VNI-to-VLAN mapping database, Leaf-102 knows that it has to switch the received Broadcast Ethernet frame out of its interfaces participating in VLAN 20. Leaf-102 adds the 802.1Q tag with VLAN Id 20 into the frame and forwards it out of interface e1/2 towards vmAbba.
Figure 1-4: ARP request processing
Ethernet II, Src: 10:00:00:10:be:ef, Dst: Broadcast (ff:ff:ff:ff:ff:ff)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0000 1010 = ID: 10
Type: ARP (0x0806)
Padding: 0000000000000000000000000000
Trailer: 00000000
Address Resolution Protocol (request)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (1)
Sender MAC address: 10:00:00:10:be:ef
Sender IP address: 192.168.11.12
Target MAC address: 00:00:00_00:00:00
Target IP address: 192.168.11.22
Capture 1-2: ARP request from vmBeef to vmAbba: vmBeef to Leaf-101.
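The three bit-fields of the 802.1Q tag shown in the capture (Priority, DEI, and VLAN Id) share one 16-bit Tag Control Information (TCI) value. The Python sketch below builds and parses the tag; the function names are hypothetical.

```python
import struct

TPID_DOT1Q = 0x8100  # EtherType value that identifies an 802.1Q tag


def build_dot1q_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Return the 4-octet 802.1Q tag (TPID + TCI)."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN Id must fit in 12 bits")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_DOT1Q, tci)


def parse_tci(tci: int) -> tuple:
    """Split a TCI value into (priority, dei, vlan_id)."""
    return tci >> 13, (tci >> 12) & 0x1, tci & 0x0FFF


print(build_dot1q_tag(10).hex())  # tag used between vmBeef and Leaf-101
print(parse_tci(0x0014))          # TCI of a frame tagged with VLAN 20
```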
Capture 1-3 shows the ARP-message captured from the link between the VTEP switch Leaf-101 and Spine switch Spine-11.
Ethernet II, Src: 5e:00:00:00:00:07 (5e:00:00:00:00:07), Dst: IPv4mcast_0a (01:00:5e:00:00:0a)
Destination: IPv4mcast_0a (01:00:5e:00:00:0a)
Source: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.100.101, Dst: 238.0.0.10
User Datagram Protocol, Src Port: 62378, Dst Port: 4789
Virtual eXtensible Local Area Network
Flags: 0x0800, VXLAN Network ID (VNI)
Group Policy ID: 0
VXLAN Network Identifier (VNI): 10000
Reserved: 0
Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Destination: Broadcast (ff:ff:ff:ff:ff:ff)
Source: Private_10:be:ef (10:00:00:10:be:ef)
Type: ARP (0x0806)
Trailer: 000000000000000000000000000000000000
Address Resolution Protocol (request)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (1)
Sender MAC address: Private_10:be:ef (10:00:00:10:be:ef)
Sender IP address: 192.168.11.12
Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
Target IP address: 192.168.11.22
Capture 1-3: ARP request from vmBeef to vmAbba: Leaf-101 to Spine-11.
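The VXLAN header seen in the capture is only eight octets long: a flags octet with the "VNI valid" bit (0x08) set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The following Python sketch packs and unpacks that layout. It ignores the Group Policy extension that the decoder in the capture also displays, and the function names are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-octet VXLAN header for the given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # First word: flags octet + 24 reserved bits.
    # Second word: 24-bit VNI + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vxlan_header(header: bytes) -> int:
    """Return the VNI carried in a VXLAN header."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return word2 >> 8


header = build_vxlan_header(10000)
print(header.hex())
print(parse_vxlan_header(header))
```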
Capture 1-4 shows the ARP Request message captured from the link between the VTEP switch Leaf-102 and vmAbba.
Ethernet II, Src: 10:00:00:10:be:ef, Dst: ff:ff:ff:ff:ff:ff
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 20
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0001 0100 = ID: 20
Type: ARP (0x0806)
Padding: 0000000000000000000000000000
Trailer: 00000000
Address Resolution Protocol (request)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (1)
Sender MAC address: Private_10:be:ef (10:00:00:10:be:ef)
Sender IP address: 192.168.11.12
Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
Target IP address: 192.168.11.22
Capture 1-4: ARP request from vmBeef to vmAbba: Captured from the link Leaf-102 to vmAbba.
VmAbba receives the ARP-request. It sends an ARP-reply message as Unicast to vmBeef. The process of frame handling is illustrated in figure 1-5.
Capture 1-5 shows the ARP Reply message captured from the link between the VTEP switch Leaf-102 and vmAbba.
Ethernet II, Src: 10:00:00:20:ab:ba, Dst: 10:00:00:10:be:ef
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 20
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0001 0100 = ID: 20
Type: ARP (0x0806)
Padding: 0000000000000000000000000000
Trailer: 00000000
Address Resolution Protocol (reply)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: reply (2)
Sender MAC address: 10:00:00:20:ab:ba
Sender IP address: 192.168.11.22
Target MAC address: 10:00:00:10:be:ef
Target IP address: 192.168.11.12
Capture 1-5: ARP reply from vmAbba to vmBeef: Captured from the link between Host-2 and Leaf-102.
Capture 1-6 shows the ARP-Reply message captured from the link between the VTEP switch Leaf-101 and Spine-11.
Ethernet II, Src: 5e:00:00:01:00:07, Dst: 5e:00:00:00:00:07
Internet Protocol Version 4, Src: 192.168.100.102, Dst: 192.168.100.101
User Datagram Protocol, Src Port: 59206, Dst Port: 4789
Virtual eXtensible Local Area Network
Flags: 0x0800, VXLAN Network ID (VNI)
Group Policy ID: 0
VXLAN Network Identifier (VNI): 10000
Reserved: 0
Ethernet II, Src: 10:00:00:20:ab:ba, Dst: 10:00:00:10:be:ef
Address Resolution Protocol (reply)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: reply (2)
Sender MAC address: 10:00:00:20:ab:ba
Sender IP address: 192.168.11.22
Target MAC address: 10:00:00:10:be:ef
Target IP address: 192.168.11.12
Capture 1-6: ARP reply from vmAbba to vmBeef: Leaf-101 to Spine-11.
Capture 1-7 shows the ARP-Reply message captured from the link between the VTEP switch Leaf-101 and vmBeef.
Ethernet II, Src: 10:00:00:20:ab:ba, Dst: 10:00:00:10:be:ef
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0000 1010 = ID: 10
Type: ARP (0x0806)
Padding: 0000000000000000000000000000
Trailer: 00000000
Address Resolution Protocol (reply)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: reply (2)
Sender MAC address: Private_20:ab:ba (10:00:00:20:ab:ba)
Sender IP address: 192.168.11.22
Target MAC address: Private_10:be:ef (10:00:00:10:be:ef)
Target IP address: 192.168.11.12
Capture 1-7: ARP reply from vmAbba to vmBeef: Captured from the link Leaf-101 to vmBeef.
ICMP Request/Reply
After resolving the MAC address of vmAbba, vmBeef sends an ICMP request to vmAbba. The ICMP-request message has the destination IP address 192.168.11.22. The destination MAC address in the Ethernet frame is the previously resolved MAC address 1000.0020.abba.
VTEP switch Leaf-101 receives the frame and, based on VLAN Id 10 in the 802.1Q tag, notices that the frame belongs to L2VNI 10000. Leaf-101 forwards the frame based on the information found in the MAC address table of VLAN 10. The MAC address entry for vmAbba was installed from L2RIB, which in turn received it from BGP. Leaf-101 encapsulates the frame inside a new Ethernet header, IP header, UDP header, and VXLAN header and forwards it towards Leaf-102 via Spine-11.
VTEP switch Leaf receives the Unicast frame, it removes the outer Ethernet header, outer IP header, UDP header, and VXLAN header and forwards the original frame tagged with 802.1Q tag with VLAN Id 20 to vmAbba.
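The encapsulation step can be illustrated with a minimal Python sketch that builds and parses the 8-byte VXLAN header as defined in RFC 7348. The function names are hypothetical, and only the VXLAN header is modelled; the outer Ethernet, IP, and UDP headers are left out. The VLAN-to-VNI mapping mirrors the topology of this chapter (VLAN 10 on Leaf-101 and VLAN 20 on Leaf-102 both map to L2VNI 10000).

```python
import struct

VXLAN_UDP_PORT = 4789          # destination UDP port seen in the captures
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vxlan_header(header: bytes) -> int:
    """Return the VNI carried in an 8-byte VXLAN header."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return word2 >> 8


# Per-switch VLAN-to-VNI mapping used in the examples of this chapter.
vlan_to_vni = {"Leaf-101": {10: 10000}, "Leaf-102": {20: 10000}}

header = build_vxlan_header(vlan_to_vni["Leaf-101"][10])
print(header.hex())                # 0800000000271000
print(parse_vxlan_header(header))  # 10000
```

The leading bytes 08 00 correspond to the flags value 0x0800 shown in the captures, and the VNI is locally mapped back to a VLAN on the receiving switch, which is why the VLAN Id can differ between Leaf-101 and Leaf-102.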
Figure 1-6: ICMP request from vmBeef to vmAbba.
Capture 1-8 shows the ICMP Request message captured from the link between the VTEP switch Leaf-101 and vmBeef.
Ethernet II, Src: 10:00:00:10:be:ef, Dst: 10:00:00:20:ab:ba
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0000 1010 = ID: 10
Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.11.22
Internet Control Message Protocol
Capture 1-8: ICMP request from vmBeef to vmAbba: Capture from Leaf-101 to vmBeef.
Capture 1-9 shows the ICMP Request message captured from the link between the VTEP switch Leaf-101 and Spine-11.
Ethernet II, Src: 5e:00:00:00:00:07, Dst: 5e:00:00:01:00:07
Internet Protocol Version 4, Src: 192.168.100.101, Dst: 192.168.100.102
User Datagram Protocol, Src Port: 57986, Dst Port: 4789
Virtual eXtensible Local Area Network
Flags: 0x0800, VXLAN Network ID (VNI)
Group Policy ID: 0
VXLAN Network Identifier (VNI): 10000
Reserved: 0
Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: Private_20:ab:ba (10:00:00:20:ab:ba)
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.11.22
Internet Control Message Protocol
Capture 1-9: ICMP request from vmBeef to vmAbba: Capture from Leaf-101 to Spine-11.
Capture 1-10 shows the ICMP Request message captured from the link between the VTEP switch Leaf-102 and vmAbba.
Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: Private_20:ab:ba (10:00:00:20:ab:ba)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 20
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0001 0100 = ID: 20
Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.11.22
Internet Control Message Protocol
Capture 1-10: ICMP request from vmBeef to vmAbba: Capture from Leaf-102 to vmAbba.
When vmAbba receives the ICMP Request, it replies by sending an ICMP Reply message to vmBeef. The frame processing is the same as that shown for the ICMP Request, only in the reverse direction.
Figure 1-7: ICMP Reply from vmAbba to vmBeef.
Capture 1-11 shows the ICMP Reply message captured from the link between the VTEP switch Leaf-102 and vmAbba.
Ethernet II, Src: Private_20:ab:ba (10:00:00:20:ab:ba), Dst: Private_10:be:ef (10:00:00:10:be:ef)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 20
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0001 0100 = ID: 20
Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.11.22, Dst: 192.168.11.12
Internet Control Message Protocol
Capture 1-11: ICMP reply from vmAbba to vmBeef: Capture from the link Leaf-102 to vmAbba.
Capture 1-12 shows the ICMP Reply message captured from the link between the VTEP switch Leaf-101 and Spine-11.
Ethernet II, Src: 5e:00:00:01:00:07 (5e:00:00:01:00:07), Dst: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
Internet Protocol Version 4, Src: 192.168.100.102, Dst: 192.168.100.101
User Datagram Protocol, Src Port: 57648, Dst Port: 4789
Virtual eXtensible Local Area Network
Flags: 0x0800, VXLAN Network ID (VNI)
Group Policy ID: 0
VXLAN Network Identifier (VNI): 10000
Reserved: 0
Ethernet II, Src: Private_20:ab:ba (10:00:00:20:ab:ba), Dst: Private_10:be:ef (10:00:00:10:be:ef)
Internet Protocol Version 4, Src: 192.168.11.22, Dst: 192.168.11.12
Internet Control Message Protocol
Capture 1-12: ICMP reply from vmAbba to vmBeef: Capture from the link Leaf-101 to Spine-11.
Capture 1-13 shows the ICMP Reply message captured from the link between the VTEP switch Leaf-101 and vmBeef.
Ethernet II, Src: Private_20:ab:ba (10:00:00:20:ab:ba), Dst: Private_10:be:ef (10:00:00:10:be:ef)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0000 1010 = ID: 10
Type: IPv4 (0x0800)
Internet Protocol Version 4, Src: 192.168.11.22, Dst: 192.168.11.12
Internet Control Message Protocol
Capture 1-13: ICMP reply from vmAbba to vmBeef: Captured from the link Leaf-101 to vmBeef.
Summary
This section shows how the local VTEP switch learns the MAC addresses of its connected hosts and how this information is advertised to remote VTEP switches. This section also shows the Data Plane operation between hosts connected to different VTEP switches in the same L2VNI (Layer 2 domain).
MAC-IP Address Learning Process (ARP for Intra-VNI Switching)
The previous section explains how MAC address information is propagated in the VXLAN Fabric. This section starts by explaining how the local VTEP switch Leaf-101 learns the MAC-IP information of its connected host vmBeef and how it delivers the information to the remote VTEP by using BGP EVPN. The second part of this section explains how VTEP switches use the MAC-IP information to reduce BUM traffic in the VXLAN Fabric by using ARP Suppression.
The MAC-IP learning process starts when vmBeef comes up and sends an ARP message. This ARP message can be a GARP, by which vmBeef announces its existence to the network and verifies the uniqueness of its IP address, or an ARP request by which vmBeef tries to resolve the MAC address of its gateway. VTEP switch Leaf-101 installs the MAC-IP address information from the ARP payload into the ARP table. When ARP suppression is enabled (per VNI), the MAC-IP binding information is also saved into the local ARP Suppression Cache. The Host Mobility Manager (HMM) component installs the information into the Local Host Database and sends the MAC-IP information to the BGP process, where it is stored in the BGP Loc-RIB. The information is advertised to remote VTEP switches by using a BGP EVPN Route Type 2 Update (MAC/IP Advertisement Route). The receiving VTEP switch Leaf-102 first installs the route into the BGP Adj-RIB-In, from where the route is imported into the BGP Loc-RIB based on the import policy defined under the specific EVPN Instance. From the BGP Loc-RIB, the information is stored into the IP VRF in L2RIB. As the last step, the MAC-IP information is stored into the ARP Suppression Cache (if ARP suppression is enabled).
This section starts with the MAC-IP learning process overview and then explains the process with examples. Figure 1-8 illustrates the components and databases related to the MAC-IP learning process.
MAC-IP Address Learning Overview
Phase 1: ARP Table on Local VTEP
Virtual machine vmBeef, located in host-1, comes up. It announces its existence to the network and validates the uniqueness of its IP address by sending a GARP. VTEP switch Leaf-101 receives the GARP message on interface e1/2 and stores the MAC-IP address binding from the Sender MAC and Sender IP fields of the GARP payload into the ARP table.
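As a rough illustration of Phase 1, the following Python sketch extracts the Sender MAC and Sender IP fields from a raw Ethernet/IPv4 ARP payload and stores the binding in a dictionary that stands in for the ARP table. The function name and the dictionary are hypothetical; this is not how NX-OS implements the ARP table.

```python
import socket
import struct
from typing import Tuple

ARP_FORMAT = "!HHBBH6s4s6s4s"  # 28-byte ARP payload for Ethernet/IPv4


def parse_arp_binding(arp_payload: bytes) -> Tuple[str, str]:
    """Return (sender MAC, sender IP) from an Ethernet/IPv4 ARP payload."""
    htype, ptype, hlen, plen, _opcode, sha, spa, _tha, _tpa = struct.unpack(
        ARP_FORMAT, arp_payload[:28]
    )
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        raise ValueError("not an Ethernet/IPv4 ARP message")
    mac = sha.hex()
    # Render the MAC in the dotted notation used in the switch outputs.
    dotted_mac = ".".join(mac[i:i + 4] for i in range(0, 12, 4))
    return dotted_mac, socket.inet_ntoa(spa)


# GARP as sent by vmBeef: sender IP and target IP are the same address.
garp = struct.pack(
    ARP_FORMAT, 1, 0x0800, 6, 4, 1,
    bytes.fromhex("10000010beef"), socket.inet_aton("192.168.11.12"),
    bytes(6), socket.inet_aton("192.168.11.12"),
)

arp_table = {}
mac, ip = parse_arp_binding(garp)
arp_table[ip] = mac
print(arp_table)  # {'192.168.11.12': '1000.0010.beef'}
```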
Phase 2-3: MAC-IP on Local VTEP
The Host Mobility Manager component (HMM) learns the MAC-IP information as a local route. HMM installs the information into the Local Host Database and forwards the MAC-IP information into the IP VRF of L2RIB (MAC-only information is installed into the MAC VRF). The Local Host Database includes information about the IP address (/32), MAC address, SVI, and local interface. L2RIB has the same information without the SVI information.
Phase 4: BGP Route Export on Local VTEP
VTEP switch Leaf-101 installs the MAC-IP route from the L2RIB into the BGP Loc-RIB. The MAC-IP information is advertised as a separate BGP EVPN Route Type 2 advertisement (there are dedicated updates for the MAC-only and the MAC-IP NLRIs). The difference between the NLRI carried in the MAC-only and the MAC-IP route advertisement is that the latter also carries the host IP address and mask information as well as an additional MPLS Label 2 field that defines the L3VNI used in VRF TENANT77. There are also two additional Extended Communities carried within the update: RT 65000:10077 and Router MAC 5e00.0000.0007.
Phase 5: BGP Route Import on Remote VTEP
VTEP switch Leaf-102 receives the BGP EVPN MAC-IP route advertisement and installs it into the BGP Adj-RIB-In database without any modification. From there, Leaf-102 imports the route into its BGP Loc-RIB database based on the RT import policy. When importing the route from the BGP Adj-RIB-In into the BGP Loc-RIB, Leaf-102 changes the RD to 192.168.77.102:32787 based on its BGP RID and VLAN Id. This process is the same as in the MAC-only route import and is based on the same RT 65000:10000.
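The RD values seen in this chapter are consistent with an automatically derived RD of the form RID:(32767 + VLAN Id). The short Python sketch below assumes that rule; the function name and the constant name are hypothetical and serve only to show the arithmetic.

```python
L2_RD_BASE = 32767  # assumed offset; it matches the RDs shown in this chapter


def auto_l2_rd(bgp_router_id: str, vlan_id: int) -> str:
    """Derive the L2VNI Route Distinguisher from the BGP RID and VLAN Id."""
    return f"{bgp_router_id}:{L2_RD_BASE + vlan_id}"


print(auto_l2_rd("192.168.77.101", 10))  # 192.168.77.101:32777
print(auto_l2_rd("192.168.77.102", 20))  # 192.168.77.102:32787
```

Because the VLAN Id is locally significant, the same L2VNI 10000 gets a different RD on Leaf-101 (VLAN 10) and on Leaf-102 (VLAN 20).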
Phase 6: IP VRF on Remote VTEP
Remote VTEP Leaf-102 verifies the reachability of the Next Hop IP address carried in the update. Since the Next Hop is reachable, the L2FWDER component installs the MAC-IP route into L2RIB as an IP VRF entry. The local topology ID is now 20 (based on VLAN 20) and the source of the information is BGP. The port information points to the NVE1 interface IP address of VTEP switch Leaf-101.
At this phase, both VTEP switches have the MAC-IP information of vmBeef in the IP VRF of their L2RIB as well as in their BGP tables, but only the local VTEP switch Leaf-101 has installed the MAC-IP binding information into its ARP table.
Figure 1-8: MAC-IP learning process.
MAC-IP Address Monitoring
Phase 1: ARP Table on Local VTEP
Example 1-16 shows the ARP table of VRF TENANT77. The default aging time for locally learned ARP entries in NX-OS is 1500 seconds, which is 300 seconds shorter than the MAC address aging timer. When the ARP aging timer expires, the switch checks the presence of the host by sending an ARP request to the host. If the host responds to the ARP request, the switch resets the aging timer. If the host does not reply, the entry is removed from the ARP table but kept in the BGP EVPN table for an additional 1800 seconds (MAC aging timer) before the withdraw message is sent. The MAC address aging timer should be longer than the ARP aging timer. This is because the ARP refresh process also updates the MAC table, so unnecessary flooding can be avoided.
Leaf-101# sh ip arp vrf TENANT77
<snipped>
IP ARP Table for context TENANT77
Total number of entries: 1
Address Age MAC Address Interface Flags
192.168.11.12 00:03:34 1000.0010.beef Vlan10
Example 1-16: sh ip arp vrf TENANT77
Phase 2-3: MAC-IP on Local VTEP
Example 1-17 shows the partial MAC-IP learning process on Leaf-101.
Leaf-101# show system internal l2rib event-history mac-ip
L2RIB MAC-IP Object Event Logs:
Rcvd MAC-IP ROUTE BASE msg: obj_type: 13 oper_type: 1 oper_sbtype: 0 producer: 12
Rcvd MAC-IP ROUTE msg: (10, 1000.0010.beef, 192.168.11.12), l2 vni 0, l3 vni 10077,
(10,1000.0010.beef,192.168.11.12):MAC-IP entry created
(10,1000.0010.beef,192.168.11.12,12):MAC-IP route created with flags 0, l3 vni 10077, seq 0
(10,1000.0010.beef,192.168.11.12,12): admin dist 7, soo 0, peerid 0, peer ifindex 0
(10,1000.0010.beef,192.168.11.12,12): esi (F), pc-ifindex 0
(10,1000.0010.beef,192.168.11.12,12):Encoding MAC-IP best route (ADD, client id 5), esi: (F)
Example 1-17: show system internal l2rib event-history mac-ip
Example 1-18 shows the information related to vmBeef MAC-IP binding in Local Host Database (HMM RIB) of VRF TENANT77.
Leaf-101# show fabric forwarding ip local-host-db vrf TENANT77
HMM host IPv4 routing table information for VRF TENANT77
<snipped>
Host MAC Address SVI Flags Physical Interface
* 192.168.11.12/32 1000.0010.beef Vlan10 0x420201 Ethernet1/2
Example 1-18: show fabric forwarding ip local-host-db vrf TENANT77
Example 1-19 shows that the information concerning the MAC-IP of vmBeef in IP VRF in L2RIB is produced by HMM component.
Leaf-101# show l2route mac-ip topology 10 detail
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated
Topology Mac Address Prod Flags Seq No Host IP Next-Hops
----------- -------------- ------ ------ ------- ------------- -----------
10 1000.0010.beef HMM -- 0 192.168.11.12 Local
L3-Info: 10077
Example 1-19: show l2route mac-ip topology 10 detail
Phase 4: BGP Route Export on Local VTEP
Example 1-20 shows the internal process by which VTEP switch Leaf-101 receives the MAC-IP route information and installs it into the RIB and BGP Loc-RIB. Note that the BGP Extended Community Router MAC information is not shown in the output. The prefix length includes RD (8 octets) + MAC address (6 octets) + IP address (4 octets) = 18 octets = 144 bits. The octet count of the prefix can be seen in the RIB event “Adding prefix”.
Leaf-101# sh bgp internal event-history events | i beef
BRIB:
[L2VPN EVPN] Installing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 (local) via 192.168.100.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Rou
RIB:
[L2VPN EVPN] Adding prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12] Route Length 16 Prefix Length 18:
EVT:
Received from L2RIB MAC-IP route: Add ESI 0000.0000.0000.0000.0000 topo 10000 mac 1000.0010.beef ip 192.168.11.12 L3 VNI 10077 flags 00000000 soo 0 seq 0, reorig :0
Example 1-20: sh bgp internal event-history events | i beef
Example 1-21 shows the BGP Loc-RIB entry concerning the MAC-IP NLRI of vmBeef. The prefix information is explained below:
§ Route Distinguisher
§ [2] - BGP EVPN Route-Type 2, MAC/IP Advertisement Route
§ [0] - Ethernet Segment Identifier (ESI), all zeroed out = single homed site
§ [0] - Ethernet Tag Id, EVPN routes must use value 0
§ [48] - Length of MAC address
§ [1000.0010.beef] - MAC address
§ [32] - Length of IP address
§ [192.168.11.12] - Carried IP address
§ /272 - Length of the MAC-IP VRF NLRI in bits: RD (8 octets) + MAC address (6 octets) + L2VNI Id (3 octets) + L3VNI Id (3 octets) + IP address (4 octets) + ESI (10 octets) = 34 octets = 272 bits.
The L2VNI information is shown in the Received label field. There are also four BGP Extended Community Path Attributes:
§ Route-Target: 65000:10000 - Used for export/Import policy (L2VNI)
§ Route-Target: 65000:10077 - Used for export/Import policy (L3VNI)
§ Encapsulation 8: Defines the encapsulation type VXLAN (Data Plane)
§ Router MAC: 5e00.0000.0007 - Used for Inner MAC Header source address for routed packets. This is needed because VXLAN is MAC in IP/UDP encapsulation tunneling mechanism and data payload over L3 border does not carry source host MAC address information. This is where the RMAC is used.
Leaf-101# sh bgp l2vpn evpn 192.168.11.12
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.101:32777 (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 5
Paths: (1 available, best #1)
Flags: (0x000102) on xmit-list, is not in l2rib/evpn
Advertised path-id 1
Path type: local, path is valid, is best path
AS-Path: NONE, path locally originated
192.168.100.101 (metric 0) from 0.0.0.0 (192.168.77.101)
Origin IGP, MED not set, localpref 100, weight 32768
Received label 10000 10077
Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
Path-id 1 advertised to peers:
192.168.77.11
Example 1-21: sh bgp l2vpn evpn 192.168.11.12
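The same route is shown with two prefix lengths, /144 in the event-history log and /272 in the BGP table. Both follow from the field sizes listed earlier. The Python sketch below recomputes the lengths and splits the Route Type 2 prefix string into its fields. The helper names and the regular expression are hypothetical and are written only for the output format shown in this chapter.

```python
import re

# Field sizes in octets, as listed in the prefix explanation.
FIELD_OCTETS = {"RD": 8, "ESI": 10, "MAC": 6, "L2VNI": 3, "L3VNI": 3, "IPv4": 4}


def nlri_bits(fields):
    """Return the total length in bits of the given NLRI fields."""
    return 8 * sum(FIELD_OCTETS[name] for name in fields)


ROUTE_RE = re.compile(
    r"\[(\d+)\]:\[(\d+)\]:\[(\d+)\]:\[(\d+)\]:\[([0-9a-f.]+)\]"
    r":\[(\d+)\]:\[([\d.]+)\]/(\d+)"
)


def parse_type2(prefix):
    """Split a Route Type 2 prefix string into named fields."""
    match = ROUTE_RE.fullmatch(prefix)
    if match is None:
        raise ValueError("unexpected prefix format")
    rtype, esi, etag, mac_len, mac, ip_len, ip, length = match.groups()
    return {
        "route_type": int(rtype), "esi": int(esi), "eth_tag": int(etag),
        "mac_len": int(mac_len), "mac": mac,
        "ip_len": int(ip_len), "ip": ip, "length": int(length),
    }


print(nlri_bits(["RD", "MAC", "IPv4"]))  # 144
print(nlri_bits(FIELD_OCTETS))           # 272
route = parse_type2("[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272")
print(route["mac"], route["ip"])         # 1000.0010.beef 192.168.11.12
```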
Phase 5: BGP AFI L2VPN EVPN MAC Route Import on Remote VTEP
Example 1-21 shows the internal process where the received MAC-IP route is installed into the BGP Adj-RIB-In with RD 192.168.77.101:32777. This route is imported into the BGP Loc-RIB with RD 192.168.77.102:32787 and sent to L2RIB. Note that the example also includes the installation process for the L3RIB.
Leaf-102# sh bgp internal event-history events | i beef
RIB: [L2VPN EVPN]: Send to L2RIB 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144
RIB: [L2VPN EVPN] For 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144, added 1 next hops, suppress 0
RIB: [L2VPN EVPN] Adding 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 via 192.168.100.101 to NH list (flags2: 0x0)
RIB: [L2VPN EVPN] Add/delete 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144, flags=0x200, in_rib: no
IMP: [L2VPN EVPN] Created import destination entry for 192.168.77.102:3:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144
IMP: [L2VPN EVPN] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <default> RD 192.168.77.102:3
IMP: [L2VPN EVPN] Created import destination entry for 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144
IMP: [L2VPN EVPN] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <default> RD 192.168.77.102:32787
IMP: [IPv4 Unicast] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <TENANT77> RD 192.168.77.102:3
RIB: [L2VPN EVPN] Add/delete 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144, flags=0x200, evi_ctx invalid, in_rib: no
BRIB: [L2VPN EVPN] (192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 (192.168.77.11)): returning from bgp_brib_add, reeval=0new_path: 1, change: 1, undelete: 0, history: 0, force: 0, (pflags=0x40002010) rnh_flag_ch
BRIB: [L2VPN EVPN] (192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 (192.168.77.11)): bgp_brib_add: handling nexthop, path->flags2: 0x80000
BRIB: [L2VPN EVPN] Created new path to 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 via 192.168.77.111 (pflags=0x40000000, pflags2=0x0)
BRIB: [L2VPN EVPN] Installing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 (192.168.77.11) via 192.168.100.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 RT:65000:10077 ENC
Example 1-21: sh bgp internal event-history events | i beef
Example 1-22 shows the partial output of the BGP Adj-RIB-In and BGP Loc-RIB tables. The L3VNI routing information is excluded for simplicity. The first part, after Comment#1, includes the information received via the BGP EVPN Route Type 2 MAC-IP route advertisement originated by VTEP switch Leaf-101. The only notable difference compared to the BGP Loc-RIB of VTEP Leaf-101 is that Spine-11 (RR) has added the Originator (Leaf-101) and Cluster List (Spine-11) attributes to the update message. The second part, after Comment#2, shows the BGP Loc-RIB information imported from the BGP Adj-RIB-In. Comparing the information installed into the BGP Adj-RIB-In and the BGP Loc-RIB shows that the only NLRI information that changes during the import from Adj-RIB-In into Loc-RIB is the Route Distinguisher, just as in the MAC-only route import.
Leaf-102# sh bgp l2vpn evpn 192.168.11.12
BGP routing table information for VRF default, address family L2VPN EVPN
< Comment#1 BGP Adj-RIB-In update originated by Leaf-101 >
Route Distinguisher: 192.168.77.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 6
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported to 3 destination(s)
AS-Path: NONE, path sourced internal to AS
192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10000 10077
Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
Originator: 192.168.77.101 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
< Comment#2 – BGP Loc-RIB imported from Adj-RIB >
Route Distinguisher: 192.168.77.102:32787 (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 7
Paths: (1 available, best #1)
Flags: (0x000212) on xmit-list, is in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path, in rib
Imported from 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272
AS-Path: NONE, path sourced internal to AS
192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10000 10077
Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
Originator: 192.168.77.101 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
< Comment#3 - L3VNI 10077 information removed for simplicity >
Example 1-22: sh bgp l2vpn evpn 192.168.11.12
Phase 6: IP VRF on Remote VTEP
Example 1-23 shows the partial MAC-IP learning process.
Leaf-102# sh system internal l2rib event-history mac-ip
L2RIB MAC-IP Object Event Logs:
Rcvd MAC-IP ROUTE BASE msg: obj_type:13 oper_type:1 oper_sbtype: 0 producer: 5
Rcvd MAC-IP ROUTE msg:(20, 1000.0010.beef, 192.168.11.12), l2 vni 0, l3 vni 0,
Rcvd MAC-IP ROUTE msg: flags , admin_dist 0, seq 0, soo 0, peerid 0,
Rcvd MAC-IP ROUTE msg: res 0, esi (F), ifindex 0, nh_count 1, pc-ifindex 0
NH: 192.168.100.101
(20,1000.0010.beef,192.168.11.12):MAC-IP entry created
(20,1000.0010.beef,192.168.11.12,5):MAC-IP route created with flags 0, l3 vni 0, seq 0
(20,1000.0010.beef,192.168.11.12,5): admin dist 20, soo 0, peerid 0, peer ifindex 0
(20,1000.0010.beef,192.168.11.12,5): esi (F), pc-ifindex 0
Example 1-23: sh system internal l2rib event-history mac-ip
Example 1-24 shows that the MAC-IP information in L2RIB is produced by BGP.
Leaf-102# show l2route mac-ip topology 20 detail
<snipped>
Topology Mac Address Prod Flags Seq No Host IP Next-Hops
----------- -------------- ------ ----- ----- ------ ------------
20 1000.0010.beef BGP -- 0 192.168.11.12 192.168.100.101
Example 1-24: show l2route mac-ip topology 20 detail
At this phase, both VTEP switches have the MAC-IP address information of vmBeef.
ARP-Suppression
The previous section explains how the MAC-IP address information is propagated in BGP EVPN VXLAN fabric. This section describes how the VTEP switches use MAC-IP binding information to reduce the unnecessary Broadcast traffic in VXLAN fabric.
We start from the phase where vmBeef comes up and sends a GARP/ARP message to the network. Leaf-101 installs the MAC-IP binding information into the ARP table of VRF TENANT77. Example 1-25 shows the ARP table and figure 1-9 illustrates the overall process.
Figure 1-9: MAC-IP information in ARP table and ARP Suppress Cache.
Leaf-101# sh ip arp vrf TENANT77 | b Address
Address Age MAC Address Interface Flags
192.168.11.12 00:02:01 1000.0010.beef Vlan10
Example 1-25: sh ip arp vrf TENANT77 | b Address
When VNI-based ARP Suppression is enabled on the local VTEP switch, the MAC-IP address binding information is also installed from the ARP table into the local ARP Suppression Cache (Example 1-26).
Leaf-101# sh ip arp suppression-cache detail
<snipped>
Ip Address Age Mac Address Vlan Physical-ifindex Flags Remote Vtep Addrs
192.168.11.12 00:03:06 1000.0010.beef 10 Ethernet1/2 L
Example 1-26: sh ip arp suppression-cache detail
When ARP Suppression is enabled on remote VTEP switches, the ARP Suppression Cache information is taken from the IP VRF of L2RIB. Example 1-27 illustrates this from the perspective of Leaf-102.
Leaf-102# show ip arp suppression-cache detail
<snipped>
Ip Address Age Mac Address Vlan Physical-ifindex Flags Remote Vtep Addrs
192.168.11.12 00:03:33 1000.0010.beef 20 (null) R 192.168.100.101
Example 1-27: show ip arp suppression-cache detail
Figure 1-10 illustrates the ARP operation with and without ARP suppression as well as with Unknown Unicast Suppression.
No Suppression: All ARP requests are flooded towards the multicast group defined for the specific VNI. All VTEP switches that have joined the group receive the ARP request and forward it out of the ports participating in the broadcast domain defined by the VNI Id in the VXLAN header.
ARP Suppression: The local VTEP switch checks whether the requested MAC-IP binding information is stored in the local ARP Suppression Cache. If the check is a hit, the switch sends an ARP reply back to the requester without flooding the actual ARP request to the network. If the ARP Suppression Cache check is a miss, the ARP request is flooded to the network. ARP suppression should be enabled only after initial Intra-VNI reachability testing.
ARP and Unknown Unicast Suppression: Works the same way as ARP Suppression when the ARP Suppression Cache check is a hit, but in case of a miss, the ARP request is dropped. This option requires that there are no silent hosts in the VXLAN Fabric.
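The three modes can be summarized as a small decision function. The Python sketch below is a simplified model with hypothetical names; it only decides what happens to an ARP request and does not model the reply frame or the flooding itself. The address 192.168.11.99 is a made-up silent host used to show the cache miss.

```python
from enum import Enum


class Mode(Enum):
    NO_SUPPRESSION = 1
    ARP_SUPPRESSION = 2
    ARP_AND_UNKNOWN_UNICAST_SUPPRESSION = 3


def handle_arp_request(target_ip, cache, mode):
    """Return (action, mac), where action is 'reply', 'flood', or 'drop'."""
    if mode is Mode.NO_SUPPRESSION:
        return ("flood", None)      # always flooded to the VNI's group
    mac = cache.get(target_ip)
    if mac is not None:
        return ("reply", mac)       # cache hit: the VTEP answers locally
    if mode is Mode.ARP_SUPPRESSION:
        return ("flood", None)      # cache miss: fall back to flooding
    return ("drop", None)           # cache miss: request is dropped


# ARP Suppression Cache as seen on Leaf-102 in Example 1-27.
cache = {"192.168.11.12": "1000.0010.beef"}

print(handle_arp_request("192.168.11.12", cache, Mode.ARP_SUPPRESSION))
print(handle_arp_request("192.168.11.99", cache, Mode.ARP_SUPPRESSION))
print(handle_arp_request(
    "192.168.11.99", cache, Mode.ARP_AND_UNKNOWN_UNICAST_SUPPRESSION))
```

The last call shows why the third mode requires that there are no silent hosts: a host that has never sent traffic is not in the cache, so a request for it is dropped instead of flooded.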
Figure 1-10: ARP operation with and without ARP suppression.
At this phase, the network is able to work as a transparent Layer 2 switch for hosts participating in L2VNI 10000 and switch frames between the hosts connected to it.
Host route and Prefix Advertisement: Inter-VNI routing (L3VNI)
The first two sections explain how the MAC and MAC-IP information of hosts is propagated over the VXLAN Fabric and how the information is used for Intra-VNI switching and MAC address resolution as well as for reducing BUM traffic. This section explains how host routes are imported into the L3RIB and how this information is used for Inter-VNI routing. In addition, this section explains how the MAC address information of silent hosts is resolved by using prefix route advertisement.
Host Route from the Inter-VNI routing perspective
Phase 1. Host Route in Local Routing Information Base (RIB)
Section “MAC-IP Learning Process” describes how the local VTEP switch installs the MAC-IP address binding information into the ARP table and how the HMM component installs the information into the IP VRF. In addition to this process, the HMM component installs the MAC-IP information from the ARP table into the L3RIB.
Phase 2. Host Route BGP Process on Local VTEP
Section “MAC-IP Learning Process” also covers how the MAC-IP information is sent from the IP VRF to the Loc-RIB through the decision process and from there to the Adj-RIB-Out, where it is advertised as a BGP EVPN Route Type 2 Update to remote VTEP switches.
Phase 3. Host Route BGP Process on Remote VTEP
The section “MAC-IP Learning Process” did not explain how the MAC-IP routing information ends up in the L3RIB of the remote VTEP switch. The BGP EVPN Route Type 2 Update concerning the MAC-IP NLRI of vmBeef also includes Route Target 65000:10077 (L3VNI). The received NLRI information is sent through the Import Policy Engine (import is based on RT 65000:10077) and the decision process into the Loc-RIB as an L3VNI entry. During the import policy processing, the original RD 192.168.77.101:32777 is changed to the VRF TENANT77 specific RD 192.168.77.102:3 (3 = VRF Id of VRF TENANT77). The RD is used to differentiate overlapping IP addresses in different VRFs.
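The RT-based import can be sketched in Python as follows. This is a simplified model, not the NX-OS implementation: the class and function names are hypothetical, and the decision process is left out. One received Route Type 2 update carrying both Route Targets is imported into every local table whose import RT matches, and each copy gets the local RD of that table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvpnRoute:
    rd: str
    mac: str
    ip: str
    route_targets: frozenset


@dataclass(frozen=True)
class ImportTarget:
    name: str
    import_rt: str
    local_rd: str


def import_route(route, targets):
    """Return {target name: route copy with local RD} for each matching RT."""
    return {
        target.name: replace(route, rd=target.local_rd)
        for target in targets
        if target.import_rt in route.route_targets
    }


# Route as received by Leaf-102 (Adj-RIB-In), originated by Leaf-101.
received = EvpnRoute(
    rd="192.168.77.101:32777",
    mac="1000.0010.beef",
    ip="192.168.11.12",
    route_targets=frozenset({"65000:10000", "65000:10077"}),
)

# Import targets on Leaf-102: the L2VNI and the tenant VRF (L3VNI).
leaf_102_targets = [
    ImportTarget("L2VNI 10000", "65000:10000", "192.168.77.102:32787"),
    ImportTarget("TENANT77", "65000:10077", "192.168.77.102:3"),
]

imported = import_route(received, leaf_102_targets)
for name, route in imported.items():
    print(name, route.rd)
```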
Phase 4. Installing Host Route into RIB of Remote VTEP
The route is installed into the L3RIB from the BGP Loc-RIB. The RIB entry includes information about the Next Hop address and tunnel Id, encapsulation type (VXLAN), segment Id, and route source (BGP). At this phase, both the local VTEP switch Leaf-101 and the remote VTEP switch Leaf-102 are capable of routing traffic to vmBeef (belonging to L2VNI 10000) from hosts participating in a different L2VNI.
Figure 1-12: Host route propagation over VXLAN Fabric.
Monitoring
Phase 1. Host Route in Local Routing Information Base (RIB)
Example 1-28 shows the RIB of VRF TENANT77 in the local VTEP switch Leaf-101. The route is learned from VLAN 10 and it is installed into the RIB by HMM.
Leaf-101# show ip route 192.168.11.12 vrf TENANT77
IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.11.12/32, ubest/mbest: 1/0, attached
*via 192.168.11.12, Vlan10, [190/0], 03:34:14, hmm
Example 1-28: show ip route 192.168.11.12 vrf TENANT77
Phase 2. Host Route BGP Process on Local VTEP
Example 1-29 shows the BGP Loc-RIB entry concerning the IP address of vmBeef. The same output was explained in detail earlier in Example 1-21.
Leaf-101# sh bgp l2vpn evpn 192.168.11.12
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.101:32777 (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 16
Paths: (1 available, best #1)
Flags: (0x000102) on xmit-list, is not in l2rib/evpn
Advertised path-id 1
Path type: local, path is valid, is best path
AS-Path: NONE, path locally originated
192.168.100.101 (metric 0) from 0.0.0.0 (192.168.77.101)
Origin IGP, MED not set, localpref 100, weight 32768
Received label 10000 10077
Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
Path-id 1 advertised to peers:
192.168.77.11
Example 1-29: sh bgp l2vpn evpn 192.168.11.12
Phase 3. Host Route BGP Process on Remote VTEP
Example 1-30 shows the L3 import process in the remote VTEP switch Leaf-102. The received message is the same MAC/IP route advertisement from which the MAC-IP information was imported into the IP VRF in L2RIB and sent to the ARP Suppression Cache. The import into L2RIB is based on RT 65000:10000, while the import into the L3RIB of VRF TENANT77 is based on RT 65000:10077.
Leaf-102# sh bgp internal event-history events | i beef
IMP: [L2VPN EVPN] Created import destination entry for 192.168.77.102:3:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144
IMP: [L2VPN EVPN] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <default> RD 192.168.77.102:3
IMP: [L2VPN EVPN] Created import destination entry for 192.168.77.102:32787:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144
IMP: [L2VPN EVPN] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <default> RD 192.168.77.102:3
IMP: [IPv4 Unicast] Importing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 to <TENANT77> RD 192.168.77.102:3
BRIB: [L2VPN EVPN] Installing prefix 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/144 (192.168.77.11) via 192.168.100.101 label 10000 (0x0/0x0) into BRIB with extcomm Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8
Example 1-30: sh bgp internal event-history events | i beef
Example 1-31 shows the BGP Adj-RIB-In and Loc-RIB. The section after the first comment is the received NLRI Update in the Adj-RIB-In. The section after the second comment is the same update imported through the Import Policy Engine and the decision process into the Loc-RIB. The import is based on RT 65000:10077. The RD is changed from 192.168.77.101:32777 to 192.168.77.102:3. Example 1-32 shows the VRF Id of VRF TENANT77.
Leaf-102# show bgp l2vpn evpn 192.168.11.12
< Comment-1: BGP Adj-RIB-In >
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 22
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported to 3 destination(s)
AS-Path: NONE, path sourced internal to AS
192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10000 10077
Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
Originator: 192.168.77.101 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
<L2VNI snipped for simplicity>
< Comment-2: BGP Loc-RIB >
Route Distinguisher: 192.168.77.102:3 (L3VNI 10077)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272, version 24
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported from 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272
AS-Path: NONE, path sourced internal to AS
192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10000 10077
Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
Originator: 192.168.77.101 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
|
Example 1-31: show bgp l2vpn evpn 192.168.11.12
Leaf-102# show vrf TENANT77
VRF-Name VRF-ID State Reason
TENANT77 3 Up --
|
Example 1-32: show vrf TENANT77
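The import step shown in examples 1-31 and 1-32 can be sketched in a few lines of Python. This is a simplified model, not the NX-OS implementation: the class EvpnRoute and the function import_into_vrf are hypothetical names, and the only behaviour modelled is the RT match and the RD rewrite to "local RID:VRF Id".

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EvpnRoute:
    """Minimal stand-in for a BGP EVPN path (hypothetical structure)."""
    rd: str
    nlri: str
    route_targets: frozenset
    labels: tuple


def import_into_vrf(route: EvpnRoute, import_rt: str,
                    local_rid: str, vrf_id: int) -> Optional[EvpnRoute]:
    """Return the Loc-RIB copy of the route, or None when the RT does not match."""
    if import_rt not in route.route_targets:
        return None
    # The imported copy keeps the NLRI and attributes; only the RD is rewritten.
    return replace(route, rd=f"{local_rid}:{vrf_id}")


# Values taken from example 1-31 (Adj-RIB-In entry on Leaf-102).
received = EvpnRoute(
    rd="192.168.77.101:32777",
    nlri="[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272",
    route_targets=frozenset({"65000:10000", "65000:10077"}),
    labels=(10000, 10077),
)

imported = import_into_vrf(received, "65000:10077", "192.168.77.102", 3)
print(imported.rd)
```

With the values from example 1-31 the imported copy carries RD 192.168.77.102:3, which matches the second section of the output.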
Phase 4. Installing Host Route into RIB of Remote VTEP
Example 1-33 shows the RIB entry of VRF TENANT77 for the host route 192.168.11.12/32.
Leaf-102# show ip route 192.168.11.12 vrf TENANT77
IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.11.12/32, ubest/mbest: 1/0
*via 192.168.100.101%default, [200/0], 04:20:01, bgp-65000, internal, tag 65000 (evpn) segid: 10077 tunnelid: 0xc0a86465 encap: VXLAN
|
Example 1-33: show ip route 192.168.11.12 vrf TENANT77
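The tunnelid value in the route entry above appears to be the next hop IPv4 address written as a 32-bit hexadecimal number (0xc0a86465 corresponds to 192.168.100.101). The short sketch below only demonstrates that arithmetic; the function names are hypothetical and nothing here is taken from NX-OS itself.

```python
import ipaddress


def tunnel_id_to_ip(tunnel_id: str) -> str:
    """Interpret a hex tunnel id such as '0xc0a86465' as a dotted IPv4 address."""
    return str(ipaddress.IPv4Address(int(tunnel_id, 16)))


def ip_to_tunnel_id(ip: str) -> str:
    """Write an IPv4 address as the 32-bit hex value seen in the RIB output."""
    return hex(int(ipaddress.IPv4Address(ip)))


print(tunnel_id_to_ip("0xc0a86465"))
print(ip_to_tunnel_id("192.168.100.102"))
```

The same relation holds for the routes towards Leaf-102 later in this chapter, where tunnelid 0xc0a86466 accompanies next hop 192.168.100.102.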
Example 1-34 shows the BGP Recursive Next Hop (RNH) database information concerning the Next Hop attached to 192.168.11.12.
Leaf-102# show nve internal bgp rnh database vni 10077
--------------------------------------------
Total peer-vni msgs recvd from bgp: 10
Peer add requests: 6
Peer update requests: 0
Peer delete requests: 4
Peer add/update requests: 6
Peer add ignored (peer exists): 0
Peer update ignored (invalid opc): 0
Peer delete ignored (invalid opc): 0
Peer add/update ignored (malloc error): 0
Peer add/update ignored (vni not cp): 0
Peer delete ignored (vni not cp): 0
--------------------------------------------
Showing BGP RNH Database, size : 2 vni 10077
Flag codes: 0 - ISSU Done/ISSU N/A 1 - ADD_ISSU_PENDING
2 - DEL_ISSU_PENDING 3 - UPD_ISSU_PENDING
VNI Peer-IP Peer-MAC Tunnel-ID Encap (A/S) Flags
10077 192.168.100.101 5e00.0000.0007 0xc0a86465 vxlan (1/0) 0
|
Example 1-34: show nve internal bgp rnh database vni 10077
Example 1-35 shows the status of the connection to NVE peer 192.168.100.101 (Leaf-101).
Leaf-102# show nve peers detail
Details of nve Peers:
----------------------------------------
Peer-Ip: 192.168.100.101
NVE Interface : nve1
Peer State : Up
Peer Uptime : 04:28:50
Router-Mac : 5e00.0000.0007
Peer First VNI : 10000
Time since Create : 04:28:50
Configured VNIs : 10000,10077,20000,30000
Provision State : peer-add-complete
Learnt CP VNIs : 10000,10077
vni assignment mode : SYMMETRIC
Peer Location : N/A
|
Example 1-35: show nve peers detail
Data Plane operation
Figure 1-13 shows the Data Plane operation when vmBebe in L2VNI 30000 sends ICMP Request to vmBeef in L2VNI 10000.
Phase 1. Switching in VNI30000 on VTEP-102
Because the destination IP address is in a different subnet, vmBebe sends the ICMP request message to its default gateway Leaf-102, using the Anycast Gateway MAC (AGM) 0001.0001.0001 as the destination MAC address.
Phase 2. Routing from VNI30000 to VNI 10077 on VTEP-102
Local VTEP switch Leaf-102 receives the frame. The destination IP address is learned via BGP and installed into the RIB with Next Hop IP address 192.168.100.101 (Leaf-101) and additional information used in the Data Plane, such as the L3VNI and the encapsulation type. Leaf-102 makes a recursive routing lookup for the Next Hop address, encapsulates the original packet with a VXLAN header with VN Id 10077 (L3VNI), and routes the packet towards Leaf-101 via Spine-11 (the outer destination MAC address belongs to Spine-11). Because VXLAN is a MAC in IP/UDP tunneling mechanism, there has to be an inner source and destination MAC address. The inner source MAC address is taken from the SVI used in Inter-VNI routing, in our case SVI VLAN 77. The inner destination MAC address is the Router MAC (RMAC) received in the BGP Update as a BGP Extended Community.
Phase 3. Routing from VNI10077 to VNI 10000 on VTEP-101
When the VTEP switch Leaf-101 receives the VXLAN encapsulated packet, it removes the outer headers used in VXLAN tunneling. Since VNI 10077 is attached to VRF TENANT77, the routing decision is based on the RIB of VRF TENANT77. Leaf-101 routes the original ICMP request to VLAN 10 and switches it out of the interface e1/2 with an 802.1Q tag carrying VLAN Id 10.
This process describes the Symmetric Integrated Routing and Bridging (IRB) model, where the packet is first switched by the local VTEP, which then routes it over the VXLAN fabric by using a common VNI in the VXLAN header for all routed traffic of the VRF. The receiving VTEP switch removes the VXLAN encapsulation and makes the routing decision based on the destination IP address of the original IP packet. After the routing decision, the packet is switched to the destination (bridge-route-route-bridge). The return traffic follows the same model.
Using Symmetric IRB gives design flexibility since, unlike in Asymmetric IRB, there is no need to add all VNIs to all VTEP switches. Asymmetric IRB is based on a bridge-route-bridge model where there is no dedicated VNI for Inter-VNI routing. As an example: if we were using Asymmetric IRB in our VXLAN fabric, vmBebe would send the packet to its default gateway (switched), just like in the case of Symmetric IRB. Local VTEP switch Leaf-102 would make the routing decision, but instead of using a common VNI, it would use VNI 10000 in the VXLAN header, which is attached to VLAN 20 (the local VLAN for VNI 10000). This is the “routed” part. The receiving VTEP switch Leaf-101 would remove the VXLAN header and, based on VNI 10000, switch the packet out to VLAN 10 (its local VLAN for VNI 10000).
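The difference between the two IRB models comes down to which VNI the ingress VTEP writes into the VXLAN header and which VNIs each VTEP therefore has to implement. The sketch below restates that rule with the VNI numbers of this chapter; the function names are hypothetical and the model ignores everything except VNI selection.

```python
L3VNI = 10077  # common VNI of VRF TENANT77 in this chapter


def vni_for_routed_packet(model: str, destination_l2vni: int) -> int:
    """VNI placed in the VXLAN header by the ingress VTEP for routed traffic."""
    if model == "symmetric":
        return L3VNI               # bridge-route-route-bridge
    if model == "asymmetric":
        return destination_l2vni   # bridge-route-bridge
    raise ValueError(f"unknown IRB model: {model}")


def vnis_required(model: str, local_l2vnis: set, all_l2vnis: set) -> set:
    """VNIs a VTEP must implement to route between all subnets of the VRF."""
    if model == "symmetric":
        return set(local_l2vnis) | {L3VNI}
    if model == "asymmetric":
        return set(all_l2vnis)
    raise ValueError(f"unknown IRB model: {model}")


# vmBebe (VNI 30000) pings vmBeef (VNI 10000).
print(vni_for_routed_packet("symmetric", 10000))
print(vni_for_routed_packet("asymmetric", 10000))
print(sorted(vnis_required("symmetric", {30000}, {10000, 30000})))
```

In the symmetric case a VTEP with hosts only in VNI 30000 needs VNI 30000 and the L3VNI, whereas in the asymmetric case it needs every L2VNI of the tenant.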
Figure 1-13: Inter-VNI routing process.
|
Capture 1-12 is taken from the link between Spine-11 and Leaf-101 while pinging from vmBebe to vmBeef.
Ethernet II, Src: 5e:00:00:04:00:07 (5e:00:00:04:00:07), Dst: 5e:00:00:01:00:07 (5e:00:00:01:00:07)
Internet Protocol Version 4, Src: 192.168.100.102, Dst: 192.168.100.101
User Datagram Protocol, Src Port: 63384, Dst Port: 4789
Virtual eXtensible Local Area Network
Flags: 0x0800, VXLAN Network ID (VNI)
Group Policy ID: 0
VXLAN Network Identifier (VNI): 10077
Reserved: 0
Ethernet II, Src: 5e:00:00:04:00:07 (5e:00:00:04:00:07), Dst: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
Internet Protocol Version 4, Src: 192.168.30.30, Dst: 192.168.11.12
Internet Control Message Protocol
|
Capture 1-12: ICMP request captured from the link between Leaf-101 and Spine-11.
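The VXLAN header fields in the capture above (Flags 0x0800, Group Policy ID 0, VNI 10077, Reserved 0) fit into eight bytes. The following sketch packs and unpacks the header in the same field layout that the capture displays; the function names are hypothetical, and the layout is an assumption based on how the dissector presents the fields.

```python
import struct

VNI_VALID_FLAG = 0x0800  # "VXLAN Network ID (VNI)" flag as shown in the capture


def build_vxlan_header(vni: int, group_policy_id: int = 0) -> bytes:
    """Pack 16-bit flags, 16-bit Group Policy ID, 24-bit VNI and 8 reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!HHI", VNI_VALID_FLAG, group_policy_id, vni << 8)


def parse_vxlan_header(header: bytes) -> dict:
    """Unpack the first eight bytes of a VXLAN header into its fields."""
    flags, group_policy_id, vni_and_reserved = struct.unpack("!HHI", header[:8])
    return {
        "flags": flags,
        "vni_valid": bool(flags & VNI_VALID_FLAG),
        "group_policy_id": group_policy_id,
        "vni": vni_and_reserved >> 8,
        "reserved": vni_and_reserved & 0xFF,
    }


header = build_vxlan_header(10077)
print(header.hex())
print(parse_vxlan_header(header))
```

Parsing the generated bytes returns VNI 10077 with the VNI flag set, which are the values seen for the routed packet on the L3VNI.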
Summary
This section explains how the IP addresses of hosts are propagated across the VXLAN fabric and how they are installed into the L3RIB.
Prefix Advertisement
Prefix advertisement is a simple process, but why is it needed if all VTEP switches know the MAC addresses and IP addresses of all connected hosts? One reason is, of course, connectivity with networks external to the VXLAN Fabric. The other reason is related to connectivity inside the VXLAN Fabric: there might be silent hosts, which do not generate any traffic unless requested. In some cases, this might lead to a situation where hosts in one L2VNI do not have connectivity to a silent host in another L2VNI.
The first example shows the processes when vmBeef in VNI 10000 connected to Leaf-101 pings the silent host vmBebe in VNI 30000 connected to Leaf-102. In this example, both VTEP switches have VNI 30000. IP prefix redistribution in this example is not needed. Figures 1-14 and 1-15 illustrate the whole process.
Phase 1: vmBeef starts pinging vmBebe
At this stage, vmBeef has resolved the MAC address of its default gateway. It sends the ICMP request towards 192.168.30.30. Since the destination vmBebe is in a different subnet than the sender vmBeef, vmBeef sends the ICMP request to the default gateway. There is no response to the first ICMP request.
Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.30.30
Internet Control Message Protocol
Type: 8 (Echo (ping) request)
Code: 0
Checksum: 0x574b [correct]
[Checksum Status: Good]
Identifier (BE): 0 (0x0000)
Identifier (LE): 0 (0x0000)
Sequence number (BE): 0 (0x0000)
Sequence number (LE): 0 (0x0000)
[No response seen]
Data (72 bytes)
|
Capture 1-13: ICMP request captured from the link between Leaf-101 and vmBeef.
Phase 2: Local VTEP Leaf-101: ARP process
VTEP switch Leaf-101 has both VNI 10000 and VNI 30000 configured locally. Even though there is no host route to vmBebe in the RIB, there is a routing entry for the local subnet 192.168.30.0/24 (VLAN 30 attached to VNI 30000), and the packet is routed from VNI 10000 to VNI 30000. After routing, Leaf-101 tries to resolve the MAC-IP binding information by sending an ARP request to the Mcast group used in VNI 30000. Example 1-36 shows the routing table of Leaf-101 and Capture 1-14 shows the ARP request captured from the link between Leaf-101 and Spine-11.
Leaf-101# show ip route vrf TENANT77
IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.11.0/24, ubest/mbest: 1/0, attached
*via 192.168.11.1, Vlan10, [0/0], 01:09:38, direct, tag 77
192.168.11.1/32, ubest/mbest: 1/0, attached
*via 192.168.11.1, Vlan10, [0/0], 01:09:38, local, tag 77
192.168.11.22/32, ubest/mbest: 1/0
*via 192.168.100.102%default, [200/0], 00:45:30, bgp-65000, internal, tag 65000 (evpn) segid: 10077 tunnelid: 0xc0a86466 encap: VXLAN
192.168.30.0/24, ubest/mbest: 1/0, attached
*via 192.168.30.1, Vlan30, [0/0], 00:02:36, direct
192.168.30.1/32, ubest/mbest: 1/0, attached
*via 192.168.30.1, Vlan30, [0/0], 00:02:36, local
|
Example 1-36: show ip route vrf TENANT77
The ARP process is explained in the ARP request/reply section (page 14). Because this is a switched packet inside L2VNI 30000, the source MAC address of the inner Ethernet header is the Anycast Gateway MAC (AGM) address of VLAN 30, which is used commonly in every host SVI (not in SVI 77, which is used for routing). By using the AGM, hosts do not have to resolve the MAC address of the gateway again when moving from one VTEP to another. The destination MAC address of the outer Ethernet header is derived from the Mcast Group IP address.
Ethernet II, Src: 5e:00:00:00:00:07 (5e:00:00:00:00:07), Dst: IPv4mcast_0a (01:00:5e:00:00:0a)
Internet Protocol Version 4, Src: 192.168.100.101, Dst: 238.0.0.10
User Datagram Protocol, Src Port: 57522, Dst Port: 4789
Virtual eXtensible Local Area Network
Flags: 0x0800, VXLAN Network ID (VNI)
Group Policy ID: 0
VXLAN Network Identifier (VNI): 30000
Reserved: 0
Ethernet II, Src: EquipTra_01:00:01 (00:01:00:01:00:01), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (1)
Sender MAC address: EquipTra_01:00:01 (00:01:00:01:00:01)
Sender IP address: 192.168.30.1
Target MAC address: Broadcast (ff:ff:ff:ff:ff:ff)
Target IP address: 192.168.30.30
|
Capture 1-14: ARP request captured from the link between Leaf-101 and Spine-11.
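The outer destination MAC address in the capture above (01:00:5e:00:00:0a) follows from the multicast group address 238.0.0.10 through the standard IPv4 multicast mapping: the fixed prefix 01:00:5e followed by the low-order 23 bits of the group address. A minimal sketch, with a hypothetical function name:

```python
import ipaddress


def multicast_ip_to_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its Ethernet multicast MAC address."""
    address = ipaddress.IPv4Address(group)
    if not address.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low_23_bits = int(address) & 0x7FFFFF       # the top 9 bits of the IP are dropped
    mac = (0x01005E << 24) | low_23_bits        # 01:00:5e + 0 bit + 23 address bits
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


print(multicast_ip_to_mac("238.0.0.10"))
```

Because only 23 bits are copied, several group addresses share one MAC address, for example 238.0.0.10 and 239.128.0.10.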
Phase 3: Remote VTEP Leaf-102: ARP process - Request
The remote VTEP switch Leaf-102 receives the ARP request. Based on VNI 30000 in the VXLAN header, it knows that this packet belongs to VLAN 30. It removes the VXLAN encapsulation and forwards the ARP request out of all interfaces participating in VLAN 30. Leaf-102 inserts an 802.1Q tag with VLAN Id 30 into the frame and sends it out of interface e1/2.
Ethernet II, Src: EquipTra_01:00:01 (00:01:00:01:00:01), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0001 1110 = ID: 30
Type: ARP (0x0806)
Padding: 0000000000000000000000000000
Trailer: 00000000
Address Resolution Protocol (request)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (1)
Sender MAC address: EquipTra_01:00:01 (00:01:00:01:00:01)
Sender IP address: 192.168.30.1
Target MAC address: Broadcast (ff:ff:ff:ff:ff:ff)
Target IP address: 192.168.30.30
|
Capture 1-15: ARP request sent to vmBebe.
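The egress VTEP decides what to do with a decapsulated packet from the VNI alone: an L2VNI selects a VLAN for bridging, while the L3VNI selects a VRF for routing. The lookup can be pictured as two small tables. The mappings below use only the values named in this chapter for Leaf-102, and the function name is hypothetical.

```python
# Illustrative mappings for Leaf-102, limited to values mentioned in the text.
L2VNI_TO_VLAN = {30000: 30}
L3VNI_TO_VRF = {10077: "TENANT77"}


def classify_decapsulated_packet(vni: int) -> tuple:
    """Return ('bridge', vlan) for an L2VNI or ('route', vrf) for an L3VNI."""
    if vni in L2VNI_TO_VLAN:
        return ("bridge", L2VNI_TO_VLAN[vni])
    if vni in L3VNI_TO_VRF:
        return ("route", L3VNI_TO_VRF[vni])
    return ("drop", None)  # VNI not implemented on this VTEP


print(classify_decapsulated_packet(30000))
print(classify_decapsulated_packet(10077))
```

The ARP request in VNI 30000 is bridged into VLAN 30, whereas a packet arriving with VNI 10077 is handed to the RIB of VRF TENANT77.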
Phase 4: vmBebe: ARP process - Reply
The ARP request reaches vmBebe, and since the target IP address of the ARP request belongs to it, vmBebe responds by sending an ARP reply. The source MAC address in the received ARP request is the AGM, which is also used by Leaf-102. When vmBebe sends the ARP reply as a unicast message using MAC 0001.0001.0001 (AGM) as the destination, the message stops at Leaf-102. This means that Leaf-102 never forwards the ARP reply to Leaf-101.
Ethernet II, Src: 30:00:00:30:be:be (30:00:00:30:be:be), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0001 1110 = ID: 30
Type: ARP (0x0806)
Padding: 0000000000000000000000000000
Trailer: 00000000
Address Resolution Protocol (reply)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: reply (2)
Sender MAC address: 30:00:00:30:be:be (30:00:00:30:be:be)
Sender IP address: 192.168.30.30
Target MAC address: EquipTra_01:00:01 (00:01:00:01:00:01)
Target IP address: 192.168.30.1
|
Capture 1-16: ARP reply sent by vmBebe.
Phase 5: Remote VTEP switch Leaf-102: BGP Update
When the remote VTEP switch Leaf-102 receives the ARP reply, it learns the MAC-IP information of vmBebe from the ARP payload and generates two BGP EVPN route type 2 (MAC Advertisement Route) updates: one carries the MAC address and the other carries the MAC-IP address information of vmBebe.
Ethernet II, Src: 5e:00:00:01:00:07 (5e:00:00:01:00:07), Dst: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
Internet Protocol Version 4, Src: 192.168.77.11, Dst: 192.168.77.101
Transmission Control Protocol, Src Port: 179, Dst Port: 54583, Seq: 1, Ack: 232, Len: 141
Border Gateway Protocol - UPDATE Message
Marker: ffffffffffffffffffffffffffffffff
Length: 141
Type: UPDATE Message (2)
Withdrawn Routes Length: 0
Total Path Attribute Length: 118
Path attributes
Path Attribute - ORIGIN: IGP
Path Attribute - AS_PATH: empty
Path Attribute - LOCAL_PREF: 100
Path Attribute - EXTENDED_COMMUNITIES
Flags: 0xc0, Optional, Transitive, Complete
Type Code: EXTENDED_COMMUNITIES (16)
Length: 32
Carried extended communities: (4 communities)
Route Target: 65000:10077
Route Target: 65000:30000
Encapsulation: VXLAN Encapsulation
Unknown subtype 0x03: 0x5e00 0x0004 0x0007
Path Attribute - ORIGINATOR_ID: 192.168.77.102
Path Attribute - CLUSTER_LIST: 192.168.77.111
Path Attribute - MP_REACH_NLRI
Type Code: MP_REACH_NLRI (14)
Length: 51
Address family identifier (AFI): Layer-2 VPN (25)
Subsequent address family identifier (SAFI): EVPN (70)
Next hop network address (4 bytes)
Number of Subnetwork points of attachment (SNPA): 0
Network layer reachability information (42 bytes)
EVPN NLRI: MAC Advertisement Route
Route Type: MAC Advertisement Route (2)
Length: 40
Route Distinguisher: 0001c0a84d66801d (192.168.77.102:32797)
ESI: 00 00 00 00 00 00 00 00 00
Ethernet Tag ID: 0
MAC Address Length: 48
MAC Address: 30:00:00:30:be:be (30:00:00:30:be:be)
IP Address Length: 32
IPv4 address: 192.168.30.30
MPLS Label Stack 1: 1875, (BOGUS: Bottom of Stack NOT set!)
MPLS Label Stack 2: 629 (bottom)
|
Capture 1-17: BGP EVPN Update (route type 2, MAC-IP) captured from the link between Spine-11 and Leaf-101.
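The two "MPLS Label Stack" values in the capture above (1875 and 629) look unrelated to the VNIs, but they appear to be the result of the dissector reading each 24-bit VNI field as an MPLS label entry: 20 bits of label, 3 traffic class bits and 1 bottom-of-stack bit. Under that assumption the VNIs can be recovered with simple bit arithmetic. The function names are hypothetical, and the traffic class value is derived from the arithmetic, not read from the capture.

```python
def mpls_fields_to_vni(label: int, traffic_class: int = 0,
                       bottom_of_stack: bool = False) -> int:
    """Reassemble the 24-bit field from the way it is displayed as an MPLS entry."""
    return (label << 4) | (traffic_class << 1) | int(bottom_of_stack)


def vni_to_mpls_fields(vni: int) -> tuple:
    """Split a 24-bit VNI into (label, traffic class, bottom-of-stack)."""
    return (vni >> 4, (vni >> 1) & 0x7, bool(vni & 1))


print(vni_to_mpls_fields(30000))   # L2VNI of VLAN 30
print(vni_to_mpls_fields(10077))   # L3VNI of VRF TENANT77
print(mpls_fields_to_vni(1875))
```

VNI 30000 splits into label 1875 with the bottom-of-stack bit clear, and VNI 10077 splits into label 629 with the bit set, which is consistent with the "(BOGUS: Bottom of Stack NOT set!)" and "(bottom)" remarks in the capture.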
Phase 6: Local VTEP switch Leaf-101: BGP Update
Local VTEP switch Leaf-101 receives the BGP EVPN Updates and installs the routing information into the MAC and IP VRF tables in the L2RIB of VNI 30000. This is explained in section “MAC/IP address learning process”. Right after the L2RIB updates, Leaf-101 is able to route the packets sent by vmBeef to vmBebe.
Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.30.30
Internet Control Message Protocol
Type: 8 (Echo (ping) request)
Code: 0
Checksum: 0x574b [correct]
[Checksum Status: Good]
Identifier (BE): 0 (0x0000)
Identifier (LE): 0 (0x0000)
Sequence number (BE): 0 (0x0000)
Sequence number (LE): 0 (0x0000)
[No response seen]
Data (72 bytes)
|
Capture 1-18: ICMP request captured from the link between vmBeef and Leaf-101.
Figure 1-14: Silent host discovery process, Phases 1-3
|
Figure 1-15: Silent host discovery process, Phases 4-6
|
What if not all VNIs are implemented in each VTEP switch? In the scenario where the VTEP switch Leaf-101 has only VNI 10000, it does not have any L2/L3 address information about the silent host vmBebe, which means that Leaf-101 is not able to switch or route packets to any host in network 192.168.30.0/24. The solution for this is prefix advertisement in Leaf-102.
As a starting point, VTEP switch Leaf-102 redistributes the local network 192.168.30.0/24 into BGP via a route-map. The update is sent as BGP EVPN route type 5. Example 1-37 shows the BGP RIB (both Adj-RIB-In and Loc-RIB) of Leaf-101 concerning the NLRI for 192.168.30.0/24. The BGP EVPN Route Type 5 update carries only RT 65000:10077, which is used for importing the route from the Adj-RIB-In into the Loc-RIB of VRF TENANT77. The Received label field defines the L3VNI. The original RD carried in the NLRI is generated based on the BGP RID and the VRF Id.
Leaf-101# show bgp l2vpn evpn 192.168.30.0
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:3
BGP routing table entry for [5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224, version 505
Paths: (1 available, best #1)
Flags: (0x000002) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported to 2 destination(s)
AS-Path: NONE, path sourced internal to AS
192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin incomplete, MED 0, localpref 100, weight 0
Received label 10077
Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0004.0007
Originator: 192.168.77.102 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
Route Distinguisher: 192.168.77.101:3 (L3VNI 10077)
BGP routing table entry for [5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224, version 506
Paths: (1 available, best #1)
Flags: (0x000002) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported from 192.168.77.102:3:[5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224
AS-Path: NONE, path sourced internal to AS
192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin incomplete, MED 0, localpref 100, weight 0
Received label 10077
Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0004.0007
Originator: 192.168.77.102 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
|
Example 1-37: show bgp l2vpn evpn 192.168.30.0
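The Route Distinguishers in this chapter are Type 1 RDs: a 2-byte type field, a 4-byte IPv4 address (here the BGP RID) and a 2-byte assigned number. The raw value 0001c0a84d66801d seen in the route type 2 capture earlier in this section can be decoded as sketched below; the function name is hypothetical.

```python
import ipaddress
import struct


def decode_type1_rd(rd_hex: str) -> str:
    """Decode an 8-byte Type 1 Route Distinguisher given as a hex string."""
    rd_type, ip_as_int, assigned_number = struct.unpack("!HIH", bytes.fromhex(rd_hex))
    if rd_type != 1:
        raise ValueError(f"not a Type 1 RD (type {rd_type})")
    return f"{ipaddress.IPv4Address(ip_as_int)}:{assigned_number}"


print(decode_type1_rd("0001c0a84d66801d"))
```

The decoded value is 192.168.77.102:32797, the L2VNI RD of Leaf-102, whereas the route type 5 entry in example 1-37 uses the VRF-based RD 192.168.77.102:3.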
Capture 1-19 shows the BGP EVPN Prefix Advertisement (route type 5). Note that Extended Community Unknown Subtype 0x03 defines the RMAC.
Ethernet II, Src: 5e:00:00:01:00:07, Dst: 5e:00:00:00:00:07
Internet Protocol Version 4, Src: 192.168.77.11, Dst: 192.168.77.101
Transmission Control Protocol, Src Port: 179, Dst Port: 54583, Seq: 294, Ack: 246, Len: 134
Border Gateway Protocol - UPDATE Message
Marker: ffffffffffffffffffffffffffffffff
Length: 134
Type: UPDATE Message (2)
Withdrawn Routes Length: 0
Total Path Attribute Length: 111
Path attributes
Path Attribute - ORIGIN: INCOMPLETE
Path Attribute - AS_PATH: empty
Path Attribute - MULTI_EXIT_DISC: 0 0
Path Attribute - LOCAL_PREF: 100
Path Attribute - EXTENDED_COMMUNITIES
Flags: 0xc0, Optional, Transitive, Complete
Type Code: EXTENDED_COMMUNITIES (16)
Length: 24
Carried extended communities: (3 communities)
Route Target: 65000:10077
Encapsulation: VXLAN
Unknown subtype 0x03: 0x5e00 0x0004 0x0007
Path Attribute - ORIGINATOR_ID: 192.168.77.102
Path Attribute - CLUSTER_LIST: 192.168.77.111
Path Attribute - MP_REACH_NLRI
Flags: 0x90, Optional, Extended-Length, Non-transitive, Complete
Type Code: MP_REACH_NLRI (14)
Length: 45
Address family identifier (AFI): Layer-2 VPN (25)
Subsequent address family identifier (SAFI): EVPN (70)
Next hop network address (4 bytes)
Number of Subnetwork points of attachment (SNPA): 0
Network layer reachability information (36 bytes)
EVPN NLRI: IP Prefix route
Route Type: IP Prefix route (5)
Length: 34
Route Distinguisher: 192.168.77.102:3
ESI: 00 00 00 00 00 00 00 00 00
Ethernet Tag ID: 0
IP prefix length: 24
IPv4 address: 192.168.30.0
IPv4 Gateway address: 0.0.0.0
MPLS Label Stack: 629 (bottom)
|
Capture 1-19: BGP EVPN Update (route type 5) captured from the link between Leaf-101 and Spine-11.
Leaf-101 verifies the reachability of the Next Hop reported in MP_REACH_NLRI. Leaf-101 has an entry for the reported Next Hop in its BGP RNH database, and it installs the route into the RIB from the BGP Loc-RIB (example 1-38). Example 1-34 shows an example of the BGP RNH database output.
Leaf-101# show ip route 192.168.30.0 vrf TENANT77
IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.30.0/24, ubest/mbest: 1/0
*via 192.168.100.102%default, [200/0], 00:10:27, bgp-65000, internal, tag 65000 (evpn) segid: 10077 tunnelid: 0xc0a86466 encap: VXLAN
|
Example 1-38: show ip route 192.168.30.0 vrf TENANT77
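The dependency between the BGP RNH database and the RIB can be pictured as a simple gate: the prefix is installed only if the (L3VNI, next hop) pair is a known peer. The sketch below is a toy model of that idea and not a description of NX-OS internals; the dictionary layout and the function name are assumptions, and the values come from examples 1-37 and 1-38.

```python
def install_route(rib: dict, rnh_db: dict, prefix: str,
                  next_hop: str, l3vni: int) -> bool:
    """Install the prefix only when (l3vni, next_hop) exists in the RNH database."""
    peer = rnh_db.get((l3vni, next_hop))
    if peer is None:
        return False  # next hop not resolvable as an NVE peer: route stays out of the RIB
    rib[prefix] = {
        "next_hop": next_hop,
        "segid": l3vni,
        "tunnel_id": peer["tunnel_id"],
        "router_mac": peer["router_mac"],
        "encap": "VXLAN",
    }
    return True


# RNH entry for Leaf-102 as seen from Leaf-101 (illustrative values).
rnh_db = {
    (10077, "192.168.100.102"): {"router_mac": "5e00.0004.0007",
                                 "tunnel_id": "0xc0a86466"},
}
rib = {}

print(install_route(rib, rnh_db, "192.168.30.0/24", "192.168.100.102", 10077))
print(install_route(rib, rnh_db, "192.168.30.0/24", "192.168.100.199", 10077))
print(rib["192.168.30.0/24"]["segid"])
```

The Router MAC stored with the entry is the value that ends up as the inner destination MAC address of routed packets sent towards that peer.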
Figure 1-16: BGP EVPN Route type 5 – Prefix advertisement.
|
Data Plane testing
Phase 1: vmBeef starts pinging vmBebe
At this stage, vmBeef has resolved the MAC address of its default gateway. It sends an ICMP request to 192.168.30.30. Since the destination is in a different subnet than vmBeef, it sends the packet to its default gateway.
Ethernet II, Src: Private_10:be:ef (10:00:00:10:be:ef), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.30.30
Internet Control Message Protocol
Type: 8 (Echo (ping) request)
Code: 0
Checksum: 0x574b [correct]
[Checksum Status: Good]
Identifier (BE): 0 (0x0000)
Identifier (LE): 0 (0x0000)
Sequence number (BE): 0 (0x0000)
Sequence number (LE): 0 (0x0000)
[No response seen]
Data (72 bytes)
|
Capture 1-19: ICMP request sent by vmBeef: capture from the link vmBeef-Leaf-101.
Phase 2: Local VTEP Leaf-101: Routing
VTEP switch Leaf-101 receives the ICMP packet from vmBeef with the destination IP address 192.168.30.30. In the previous example, Leaf-101 had both VNI 10000 (subnet 192.168.11.0/24) and VNI 30000 (subnet 192.168.30.0/24) implemented, which is why it started the address resolution process by sending an ARP request to the Mcast group specific to VNI 30000. In this scenario, VNI 30000 is not implemented in Leaf-101. Instead of starting the ARP process, Leaf-101 now routes the packet based on the longest match 192.168.30.0/24 found in its RIB, towards the next hop address 192.168.100.102 (Leaf-102). The real next hop is resolved through the recursive route lookup. Leaf-101 encapsulates the ICMP request with a VXLAN header with L3VNI Id 10077. Capture 1-20 shows the VXLAN encapsulated packet taken from the link between Leaf-101 and Spine-11.
Ethernet II, Src: 5e:00:00:00:00:07 (5e:00:00:00:00:07), Dst: 5e:00:00:01:00:07 (5e:00:00:01:00:07)
Internet Protocol Version 4, Src: 192.168.100.101, Dst: 192.168.100.102
User Datagram Protocol, Src Port: 58173, Dst Port: 4789
Virtual eXtensible Local Area Network
Flags: 0x0800, VXLAN Network ID (VNI)
Group Policy ID: 0
VXLAN Network Identifier (VNI): 10077
Reserved: 0
Ethernet II, Src: 5e:00:00:00:00:07 (5e:00:00:00:00:07), Dst: 5e:00:00:04:00:07 (5e:00:00:04:00:07)
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.30.30
Internet Control Message Protocol
Type: 8 (Echo (ping) request)
Code: 0
Checksum: 0x2861 [correct]
[Checksum Status: Good]
Identifier (BE): 5 (0x0005)
Identifier (LE): 1280 (0x0500)
Sequence number (BE): 0 (0x0000)
Sequence number (LE): 0 (0x0000)
[No response seen]
Data (72 bytes)
|
Capture 1-20: ICMP request captured from the link between the Leaf-101 and Spine-11.
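The forwarding decision of Leaf-101 in both scenarios is a longest prefix match against the RIB of VRF TENANT77; only the matching entry differs. The sketch below compares three simplified RIB states. The function name and the textual route descriptions are illustrative, and the prefixes are those shown in examples 1-36 and 1-38.

```python
import ipaddress


def longest_prefix_match(rib: dict, destination: str):
    """Return (prefix, description) of the most specific matching route, or None."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, description in rib.items():
        network = ipaddress.ip_network(prefix)
        if address in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, description)
    return None if best is None else (str(best[0]), best[1])


# Leaf-101 with VNI 30000 implemented locally (first scenario).
rib_local_vni = {"192.168.11.0/24": "direct, Vlan10",
                 "192.168.30.0/24": "direct, Vlan30"}
# Leaf-101 with VNI 10000 only, before and after the route type 5 update.
rib_no_prefix = {"192.168.11.0/24": "direct, Vlan10"}
rib_with_prefix = {"192.168.11.0/24": "direct, Vlan10",
                   "192.168.30.0/24": "bgp, via 192.168.100.102, segid 10077"}

print(longest_prefix_match(rib_local_vni, "192.168.30.30"))
print(longest_prefix_match(rib_no_prefix, "192.168.30.30"))
print(longest_prefix_match(rib_with_prefix, "192.168.30.30"))
```

A direct match leads to the local ARP process, no match means the packet cannot be routed, and the BGP-learned prefix leads to VXLAN encapsulation with L3VNI 10077. Once the host route 192.168.30.30/32 is learned, it wins over the /24 because it is more specific.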
Phase 3-4: Remote VTEP Leaf-102: ARP request
Remote VTEP switch Leaf-102 receives the ICMP request. Based on VNI 10077 in the VXLAN header, it knows that this packet belongs to VRF TENANT77 and has to be routed based on the RIB of that VRF. It removes the VXLAN header and does a routing lookup. The packet is routed based on the longest prefix match 192.168.30.0/24 (local VLAN 30). Because Leaf-102 does not have MAC-IP binding information for IP 192.168.30.30, it proceeds with an ARP request that it sends out to VLAN 30 (attached to network 192.168.30.0/24). Capture 1-21 is from the trunk link between Leaf-102 and vmBebe.
Ethernet II, Src: EquipTra_01:00:01 (00:01:00:01:00:01), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0001 1110 = ID: 30
Type: ARP (0x0806)
Padding: 0000000000000000000000000000
Trailer: 00000000
Address Resolution Protocol (request)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (1)
Sender MAC address: EquipTra_01:00:01 (00:01:00:01:00:01)
Sender IP address: 192.168.30.1
Target MAC address: Broadcast (ff:ff:ff:ff:ff:ff)
Target IP address: 192.168.30.30
|
Capture 1-21: ARP request captured from the trunk link between vmBebe and Leaf-102.
Phase 5: vmBebe: ARP Reply
VmBebe receives the ARP request and responds to it by sending ARP reply message as a unicast to VTEP switch Leaf-102.
Ethernet II, Src: 30:00:00:30:be:be (30:00:00:30:be:be), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
000. .... .... .... = Priority: Best Effort (default) (0)
...0 .... .... .... = DEI: Ineligible
.... 0000 0001 1110 = ID: 30
Type: ARP (0x0806)
Padding: 0000000000000000000000000000
Trailer: 00000000
Address Resolution Protocol (reply)
Hardware type: Ethernet (1)
Protocol type: IPv4 (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: reply (2)
Sender MAC address: 30:00:00:30:be:be (30:00:00:30:be:be)
Sender IP address: 192.168.30.30
Target MAC address: EquipTra_01:00:01 (00:01:00:01:00:01)
Target IP address: 192.168.30.1
|
Capture 1-22: ARP reply captured from the link between vmBebe and Leaf-102.
Phase 6: Remote VTEP Leaf-102: ICMP Request forwarding
Now Leaf-102 is able to forward the ICMP request to vmBebe.
Ethernet II, Src: 5e:00:00:04:00:07 (5e:00:00:04:00:07), Dst: 30:00:00:30:be:be (30:00:00:30:be:be)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
Internet Protocol Version 4, Src: 192.168.11.12, Dst: 192.168.30.30
Internet Control Message Protocol
Type: 8 (Echo (ping) request)
|
Capture 1-23: ICMP request captured from the link between Leaf-102 and vmBebe.
Phase 7: vmBebe: ICMP reply
VmBebe receives the ICMP Request and sends an ICMP reply back to vmBeef.
Ethernet II, Src: 30:00:00:30:be:be (30:00:00:30:be:be), Dst: EquipTra_01:00:01 (00:01:00:01:00:01)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 30
Internet Protocol Version 4, Src: 192.168.30.30, Dst: 192.168.11.12
Internet Control Message Protocol
Type: 0 (Echo (ping) reply)
|
Capture 1-24: ICMP reply captured from the link between vmBebe and Leaf-102.
Phase 8-9: Remote VTEP Leaf-102: Routing decision and ICMP reply
The ICMP reply is sent to Leaf-101 by Leaf-102 over VNI 10077.
Ethernet II, Src: 5e:00:00:01:00:07 (5e:00:00:01:00:07), Dst: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
Internet Protocol Version 4, Src: 192.168.100.102, Dst: 192.168.100.101
User Datagram Protocol, Src Port: 60112, Dst Port: 4789
Virtual eXtensible Local Area Network
Flags: 0x0800, VXLAN Network ID (VNI)
Group Policy ID: 0
VXLAN Network Identifier (VNI): 10077
Reserved: 0
Ethernet II, Src: 5e:00:00:04:00:07 (5e:00:00:04:00:07), Dst: 5e:00:00:00:00:07 (5e:00:00:00:00:07)
Internet Protocol Version 4, Src: 192.168.30.30, Dst: 192.168.11.12
Internet Control Message Protocol
Type: 0 (Echo (ping) reply)
|
Capture 1-25: ICMP Reply captured from the link between the Leaf-101 and Spine-11.
Phase 10-11: Local VTEP Leaf-101: Routing decision and ICMP reply
VTEP switch Leaf-101 receives the ICMP reply packet and removes the VXLAN encapsulation. Based on VNI 10077, it knows that the packet belongs to VRF TENANT77 and that the route lookup has to be done based on the RIB of VRF TENANT77. The destination IP address 192.168.11.12 belongs to VLAN 10. Leaf-101 has the MAC-IP binding information for 192.168.11.12, so it switches the packet out of the interface e1/2.
Ethernet II, Src: 5e:00:00:00:00:07 (5e:00:00:00:00:07), Dst: Private_10:be:ef (10:00:00:10:be:ef)
802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 10
Internet Protocol Version 4, Src: 192.168.30.30, Dst: 192.168.11.12
Internet Control Message Protocol
Type: 0 (Echo (ping) reply)
|
Capture 1-25: ICMP reply captured from the link between Leaf-101 and vmBeef.
Figure 1-17: Silent host discovery process, Phases 1-4.
|
Figure 1-18: Silent host discovery process, Phases 5-11.
|
Just like in the previous example where Leaf-101 has both VNIs 10000 and 30000 implemented locally, we are using Symmetric IRB model in this scenario. The packet is switched in local VLAN 10, and then it is routed over the VXLAN Fabric with VNI 10077 (L3VNI). In remote VTEP switch Leaf-102, the packet is first routed based on RIB of VRF TENANT77 and then switched in local VLAN 30.
During the process, Leaf-102 learns the MAC-IP information of vmBebe. This information is advertised to VTEP switch Leaf-101, which in turn installs the routing information in its BGP RIB.
Example 1-39 shows the BGP table of Leaf-101, including the entries stored in the Adj-RIB-In. The entries concerning host route 192.168.30.30/32 and subnet 192.168.30.0/24 with RD 192.168.77.101:3 are the routes that are actually imported into the BGP Loc-RIB of Leaf-101.
Leaf-101# sh bgp l2vpn evpn
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 56, Local Router ID is 192.168.77.101
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 192.168.77.101:32777 (L2VNI 10000)
*>l[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216
192.168.100.101 100 32768 i
*>i[2]:[0]:[0]:[48]:[1000.0020.abba]:[0]:[0.0.0.0]/216
192.168.100.102 100 0 i
*>l[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272
192.168.100.101 100 32768 i
Route Distinguisher: 192.168.77.102:3
*>i[5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224
192.168.100.102 0 100 0 ?
Route Distinguisher: 192.168.77.102:32787
*>i[2]:[0]:[0]:[48]:[1000.0020.abba]:[0]:[0.0.0.0]/216
192.168.100.102 100 0 i
Route Distinguisher: 192.168.77.102:32797
*>i[2]:[0]:[0]:[48]:[3000.0030.bebe]:[0]:[0.0.0.0]/216
192.168.100.102 100 0 i
*>i[2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272
192.168.100.102 100 0 i
Route Distinguisher: 192.168.77.101:3 (L3VNI 10077)
*>i[2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272
192.168.100.102 100 0 i
*>i[5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224
192.168.100.102 0 100 0 ?
|
Example 1-39: sh bgp l2vpn evpn
Example 1-40 shows that host route 192.168.30.30 is installed from the BGP Adj-RIB-In to Loc-RIB based on RT 65000:10077. During the process, the Input Policy engine changes the RD 192.168.77.102:32797 (L2VNI) to 192.168.77.101:3 (3 = VRF Id of VRF TENANT77).
Leaf-101# sh bgp l2vpn evpn 192.168.30.30
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:32797
BGP routing table entry for [2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272, version 65
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported to 2 destination(s)
AS-Path: NONE, path sourced internal to AS
192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin IGP, MED not set, localpref 100, weight 0
Received label 30000 10077
Extcommunity: RT:65000:10077 RT:65000:30000 ENCAP:8 Router MAC:5e00.0004.0007
Originator: 192.168.77.102 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
Route Distinguisher: 192.168.77.101:3 (L3VNI 10077)
BGP routing table entry for [2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272, version 46
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported from 192.168.77.102:32797:[2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272
AS-Path: NONE, path sourced internal to AS
192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin IGP, MED not set, localpref 100, weight 0
Received label 30000 10077
Extcommunity: RT:65000:10077 RT:65000:30000 ENCAP:8 Router MAC:5e00.0004.0007
Originator: 192.168.77.102 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
Example 1-40: sh bgp l2vpn evpn 192.168.30.30
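The import step seen in Example 1-40 can be modeled in a few lines. This is a simplified sketch of RT-based import with RD rewrite, not the NX-OS implementation. The class and function names are hypothetical, and the import RT 65000:10000 for L2VNI 10000 is an assumption that does not appear in the captured output.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvpnRoute:
    rd: str
    key: str
    route_targets: frozenset


@dataclass(frozen=True)
class LocalVrf:
    name: str
    rd: str
    import_rts: frozenset


def import_route(route, vrfs):
    """Copy the route into every VRF whose import RT matches one of the
    route's RTs, rewriting the RD to the importing VRF's own RD."""
    return [
        replace(route, rd=vrf.rd)
        for vrf in vrfs
        if route.route_targets & vrf.import_rts
    ]


# Route as received on Leaf-101 (values taken from Example 1-40).
received = EvpnRoute(
    rd="192.168.77.102:32797",
    key="[2]:[0]:[0]:[48]:[3000.0030.bebe]:[32]:[192.168.30.30]/272",
    route_targets=frozenset({"65000:10077", "65000:30000"}),
)

# Local import configuration on Leaf-101.
local_vrfs = [
    # Assumed import RT for L2VNI 10000; not shown in the output.
    LocalVrf("L2VNI 10000", "192.168.77.101:32777", frozenset({"65000:10000"})),
    LocalVrf("L3VNI 10077", "192.168.77.101:3", frozenset({"65000:10077"})),
]

imported = import_route(received, local_vrfs)
for copy in imported:
    print(copy.rd, copy.key)
```

Under these assumptions only RT 65000:10077 matches, so the route is copied once, under RD 192.168.77.101:3. RT 65000:30000 matches nothing because Leaf-101 has no local VNI 30000.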
Example 1-41 shows that the BGP EVPN Route Type 5 (IP Prefix Route) is likewise installed from the BGP Adj-RIB-In into the Loc-RIB.
Leaf-101# sh bgp l2vpn evpn 192.168.30.0
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:3
BGP routing table entry for [5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224, version 63
Paths: (1 available, best #1)
Flags: (0x000002) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported to 2 destination(s)
AS-Path: NONE, path sourced internal to AS
192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin incomplete, MED 0, localpref 100, weight 0
Received label 10077
Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0004.0007
Originator: 192.168.77.102 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
Route Distinguisher: 192.168.77.101:3 (L3VNI 10077)
BGP routing table entry for [5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224, version 5
Paths: (1 available, best #1)
Flags: (0x000002) on xmit-list, is not in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path
Imported from 192.168.77.102:3:[5]:[0]:[0]:[24]:[192.168.30.0]:[0.0.0.0]/224
AS-Path: NONE, path sourced internal to AS
192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
Origin incomplete, MED 0, localpref 100, weight 0
Received label 10077
Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0004.0007
Originator: 192.168.77.102 Cluster list: 192.168.77.111
Path-id 1 not advertised to any peer
Example 1-41: sh bgp l2vpn evpn 192.168.30.0
Example 1-42 shows that both the host route 192.168.30.30/32 and the prefix route 192.168.30.0/24 are installed from the BGP Loc-RIB into the L3RIB of VRF TENANT77.
Leaf-101# show ip route vrf TENANT77
IP Route Table for VRF "TENANT77"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.11.0/24, ubest/mbest: 1/0, attached
*via 192.168.11.1, Vlan10, [0/0], 01:05:03, direct, tag 77
192.168.11.1/32, ubest/mbest: 1/0, attached
*via 192.168.11.1, Vlan10, [0/0], 01:05:03, local, tag 77
192.168.11.12/32, ubest/mbest: 1/0, attached
*via 192.168.11.12, Vlan10, [190/0], 00:17:15, hmm
192.168.30.0/24, ubest/mbest: 1/0
*via 192.168.100.102%default, [200/0], 01:02:54, bgp-65000, internal, tag 65000 (evpn) segid: 10077 tunnelid: 0xc0a86466 encap: VXLAN
192.168.30.30/32, ubest/mbest: 1/0
*via 192.168.100.102%default, [200/0], 00:17:10, bgp-65000, internal, tag 65000 (evpn) segid: 10077 tunnelid: 0xc0a86466 encap: VXLAN
Example 1-42: show ip route vrf TENANT77
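The tunnelid value in Example 1-42 can be cross-checked against the next hop. The sketch below converts the hexadecimal value to dotted-decimal notation. Reading the tunnelid as the 32-bit form of the remote VTEP address is an interpretation of the output, and the function name tunnel_id_to_ip is hypothetical.

```python
import ipaddress


def tunnel_id_to_ip(tunnel_id):
    """Render a 32-bit tunnel ID as a dotted-decimal IPv4 address."""
    return str(ipaddress.IPv4Address(tunnel_id))


print(tunnel_id_to_ip(0xC0A86466))  # 192.168.100.102
```

The result matches the next-hop address 192.168.100.102 shown for both BGP routes in the VRF TENANT77 routing table.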
Summary
This chapter described the Layer 2 (switching) and Layer 3 (routing) operation of the BGP EVPN Control and Data Plane. It also explained the components used in a BGP EVPN VXLAN fabric (L2RIB, MAC table, MAC VRF, IP VRF, L3RIB, ARP table, ARP Suppression Cache, BGP Adj-RIB-In, Loc-RIB, and Adj-RIB-Out) and how these components interact.
References
draft-ietf-bess-evpn-inter-subnet-forwarding-05 - Integrated Routing and Bridging in EVPN: https://tools.ietf.org/html/draft-ietf-bess-evpn-inter-subnet-forwarding-05
RFC 7432 - BGP MPLS-Based Ethernet VPN: https://tools.ietf.org/html/rfc7432
Building Data Centers with VXLAN BGP EVPN: A Cisco NX-OS Perspective, by Lukas Krattiger, Shyam Kapadia, and David Jansen. ISBN-10: 1-58714-467-0
Hi Toni,
Please allow me to ask about the Prefix Advertisement part. You actually discussed two scenarios:
1. Two VTEPs, and each VTEP has all VNIs
2. Two VTEPs, and they have different VNIs
In the first case the packet goes vlan10-----vlan30---------------vlan30
In the second case the packet goes vlan10-----vlan77-------vlan77---vlan30
I remember that VLAN 77 is created for routing only and does not have an IP address.
If our VXLAN network consists of VTEPs that each have all VNIs, then I believe there is no need to configure VLAN 77. Am I correct?
All the Best
Michael
Hi Michael,
In theory, you do not need a separate "routing VLAN" if all VLANs are implemented on every VTEP. In practice, this is probably not the case, because there are also external connections and service segments whose L3 interfaces are implemented on a service/external leaf.
Cheers - Toni
Why do you show the IP-VRF in the L2RIB? The MAC-VRF has both the MAC-only and MAC-IP Type 2 routes.