My VXLAN book is now also available for download from Leanpub.com:
"Virtual Extensible LAN VXLAN - A Practical guide to VXLAN Solution Part 1" (373 pages)
Does it really matter if the NVE1 interface of a VTEP switch and BGP EVPN use the same Loopback interface IP address as a source or should there be a dedicated Loopback interface for BGP EVPN? In this post, I am trying to give an answer by showing the difference in BGP EVPN convergence process for both of these design options.
Loopback addressing
Figure 10-1 shows the example topology and the Loopback addresses used in it. Loopback 0 is used on the Inter-Switch links between the Spine and Leaf switches (unnumbered physical links). The NVE1 interfaces on the vPC peer switches Leaf-102 and Leaf-103 use the primary IP address of their Loopback 100 interface as a Physical IP (PIP) and the secondary IP address as a Virtual/Anycast IP (VIP). BGP EVPN peering uses the Loopback 77 IP addresses. All of these Loopback IP addresses are advertised by OSPF.
vPC domain
Leaf-102 and Leaf-103 are vPC peer switches in vPC domain 23. The vPC Peer-Link is established over PortChannel 23, and the vPC Peer-Keepalive Link is a Layer 3 link between the switches. Both Leaf switches have one vPC Member Port belonging to PortChannel 10.
Graceful Insertion and Removal (GIR)
GIR is a method that helps to maintain network availability during device-specific software or hardware maintenance tasks. In the first demonstration, the BGP EVPN peering between Spine-11 and Leaf-103 is established between their Loopback 77 interfaces.
Now we take Leaf-103 out of service by using the command system mode maintenance (example 10-1).
Leaf-103(config)# system mode maintenance
Following configuration will be applied:
ip pim isolate
router bgp 65000
isolate
router ospf UNDERLAY-NET
isolate
vpc domain 23
shutdown
NOTE: If you have vPC orphan interfaces, please ensure 'vpc orphan-port suspend' is configured under them, before proceeding further
Do you want to continue (yes/no)? [no] yes
Generating before_maintenance snapshot before going into maintenance mode
Starting to apply commands...
Applying : ip pim isolate
Applying : router bgp 65000
Applying : isolate
Applying : router ospf UNDERLAY-NET
Applying : isolate
Applying : vpc domain 23
Applying : shutdown
2018 Aug 24 10:31:21 Leaf-103 %$ VDC-1 %$ %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going down, suspending all vPCs on secondary. If vfc is bound to vPC, then only ethernet vlans of that VPC shall be down.
2018 Aug 24 10:31:21 Leaf-103 %$ VDC-1 %$ %VPC-2-VPC_SHUTDOWN: vPC shutdown status is ON
Maintenance mode operation successful.
Leaf-103(maint-mode)(config)# 2018 Aug 24 10:31:25 Leaf-103 %$ VDC-1 %$ %MMODE-2-MODE_CHANGED: System changed to "maintenance" mode.
Leaf-103(maint-mode)(config)# 2018 Aug 24 10:31:51 Leaf-103 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: NVE: send reinit to bring down nve1 - nve
Leaf-103(maint-mode)(config)#
Example 10-1: Removing Leaf-103 by using GIR
Next, we verify the result. As example 10-2 shows, the vPC-related PortChannels and their physical member interfaces are suspended, the Loopback 100 interface is disabled, and the interface NVE1 is down.
Leaf-103(maint-mode)(config)# sh int statu | i Po10|Po23|Lo0|Lo77|Lo100|nve1
Eth1/4 ** Po23 member - v suspndByV trunk full auto 10g
Eth1/5 ** Po23 member - v suspndByV trunk full auto 10g
Po10 -- suspndByV trunk full auto --
Po23 -- suspndByV trunk full auto --
Lo0 ** RID/Underlay ** connected routed auto auto --
Lo77 ** BGP peering ** connected routed auto auto --
Lo100 ** VTEP/Overlay ** disabled routed auto auto --
nve1 -- down -- auto auto --
Example 10-2: Interface state verification (Leaf-103)
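The verification above can also be done programmatically. The following is a minimal Python sketch, not an NX-OS tool: it assumes the text of the `show interface status` output has already been collected, and the names `KNOWN_STATES` and `interface_states` are made up for this illustration. It simply picks the first known status keyword on each line, so a description that itself contains one of the keywords would confuse it.

```python
# Status keywords seen in the output above; other releases may print more.
KNOWN_STATES = {"connected", "disabled", "down", "suspndByV"}


def interface_states(show_output: str) -> dict[str, str]:
    """Map interface name -> first known status keyword found on its line."""
    states = {}
    for line in show_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        name = tokens[0]
        # Skip the name itself, then look for the first status keyword.
        state = next((t for t in tokens[1:] if t in KNOWN_STATES), None)
        if state is not None:
            states[name] = state
    return states


# A few lines copied from example 10-2.
SAMPLE = """\
Po10 -- suspndByV trunk full auto --
Lo77 ** BGP peering ** connected routed auto auto --
Lo100 ** VTEP/Overlay ** disabled routed auto auto --
nve1 -- down -- auto auto --
"""

if __name__ == "__main__":
    for name, state in interface_states(SAMPLE).items():
        print(f"{name}: {state}")
```

The point of interest is that Lo77 stays connected while Lo100 and nve1 go down, which is exactly what the rest of the post builds on.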
The OSPF neighbor relationship remains up, but Leaf-103 advertises its point-to-point link with the maximum metric of 65535.
Spine-11# sh ip ospf neighbors
OSPF Process ID UNDERLAY-NET VRF default
Total number of neighbors: 3
Neighbor ID Pri State Up Time Address Interface
Leaf-101 1 FULL/ - 04:26:45 192.168.0.101 Eth1/1
Leaf-102 1 FULL/ - 04:25:45 192.168.0.102 Eth1/2
Leaf-103 1 FULL/ - 04:25:46 192.168.0.103 Eth1/3
Spine-11# show ip ospf database router 192.168.0.103 detail
OSPF Router with ID (192.168.0.11) (Process ID UNDERLAY-NET VRF default)
Router Link States (Area 0.0.0.0)
LS age: 1708
Options: 0x2 (No TOS-capability, No DC)
LS Type: Router Links
Link State ID: 192.168.0.103
Advertising Router: Leaf-103
LS Seq Number: 0x8000000d
Checksum: 0xdd42
Length: 60
Number of links: 3
Link connected to: a Stub Network
(Link ID) Network/Subnet Number: 192.168.0.103
(Link Data) Network Mask: 255.255.255.255
Number of TOS metrics: 0
TOS 0 Metric: 1
Link connected to: a Router (point-to-point)
(Link ID) Neighboring Router ID: 192.168.0.11
(Link Data) Router Interface address: 0.0.0.2
Number of TOS metrics: 0
TOS 0 Metric: 65535
Link connected to: a Stub Network
(Link ID) Network/Subnet Number: 192.168.77.103
(Link Data) Network Mask: 255.255.255.255
Number of TOS metrics: 0
TOS 0 Metric: 1
Example 10-3: OSPF reaction to GIR
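To spot the isolated link without reading the whole LSA, the output can be reduced to (link type, metric) pairs. This is a hedged Python sketch that assumes the output format shown above; `parse_router_lsa_links` and `isolated_links` are hypothetical helper names, and 65535 is taken from the captured output rather than from any configuration.

```python
import re

MAX_METRIC = 65535  # metric seen on the point-to-point link in the output above


def parse_router_lsa_links(text: str) -> list[tuple[str, int]]:
    """Return (link type, TOS 0 metric) for each link in a Router LSA dump."""
    links = []
    link_type = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Link connected to: (.+)", line)
        if m:
            link_type = m.group(1)
            continue
        m = re.match(r"TOS 0 Metric: (\d+)", line)
        if m and link_type is not None:
            links.append((link_type, int(m.group(1))))
            link_type = None
    return links


def isolated_links(links: list[tuple[str, int]]) -> list[str]:
    """Link types that are advertised with the maximum metric."""
    return [kind for kind, metric in links if metric == MAX_METRIC]


# Trimmed copy of the LSA from the example above.
SAMPLE_LSA = """\
Link connected to: a Stub Network
(Link ID) Network/Subnet Number: 192.168.0.103
TOS 0 Metric: 1
Link connected to: a Router (point-to-point)
(Link ID) Neighboring Router ID: 192.168.0.11
TOS 0 Metric: 65535
Link connected to: a Stub Network
(Link ID) Network/Subnet Number: 192.168.77.103
TOS 0 Metric: 1
"""

if __name__ == "__main__":
    print(isolated_links(parse_router_lsa_links(SAMPLE_LSA)))
```

Only the transit (point-to-point) link carries the maximum metric; the stub networks for Loopback 0 and Loopback 77 keep metric 1, which is why the BGP peering address stays reachable.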
The BGP peering between Spine-11 and Leaf-103 stays up, but Leaf-103 has withdrawn all of its routes: in the following output, Spine-11 has received zero prefixes from Leaf-103.
Spine-11# sh bgp l2vpn evpn summary
<snipped>
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
192.168.77.101 4 65000 372 331 249 0 0 04:27:15 2
192.168.77.102 4 65000 5351 5325 249 0 0 04:26:18 4
192.168.77.103 4 65000 5347 5328 249 0 0 04:26:17 0
Example 10-3: BGP reaction to GIR
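The last column of the summary tells two different stories: a number means the session is Established and shows the received prefix count, while a word is the FSM state of a session that is not up. Below is a small Python sketch of that reading, assuming the ten-column layout shown above; `parse_bgp_summary` and `session_up` are names invented for this example, and multi-word states such as "Shut (Admin)" are deliberately not handled.

```python
def parse_bgp_summary(text: str) -> dict[str, object]:
    """Map neighbor address -> prefix count (int) or state name (str)."""
    peers = {}
    for line in text.splitlines():
        fields = line.split()
        # Neighbor rows have exactly ten columns; skip header and other lines.
        if len(fields) != 10 or fields[0] == "Neighbor":
            continue
        last = fields[-1]
        peers[fields[0]] = int(last) if last.isdigit() else last
    return peers


def session_up(value: object) -> bool:
    """A numeric State/PfxRcd column means the session is Established."""
    return isinstance(value, int)


# Rows copied from the example above.
SAMPLE_SUMMARY = """\
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
192.168.77.101 4 65000 372 331 249 0 0 04:27:15 2
192.168.77.102 4 65000 5351 5325 249 0 0 04:26:18 4
192.168.77.103 4 65000 5347 5328 249 0 0 04:26:17 0
"""

if __name__ == "__main__":
    for peer, value in parse_bgp_summary(SAMPLE_SUMMARY).items():
        print(peer, "up" if session_up(value) else "down", value)
```

For Leaf-103 the result is "up with 0 prefixes", which is the state GIR is expected to produce when the peering uses a dedicated Loopback.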
From the routing perspective, the OSPF and BGP peerings remain up and the protocols only manipulate their routing updates. Recovery is therefore simple: OSPF and BGP just generate new routing updates. From the vPC domain perspective, all related interfaces are brought back up.
Now I am going to do the “Insertion” process by using the command no system mode maintenance, which brings Leaf-103 back into service (example 10-4).
Leaf-103(maint-mode)(config)# no system mode maintenance
Following configuration will be applied:
vpc domain 23
no shutdown
router ospf UNDERLAY-NET
no isolate
router bgp 65000
no isolate
no ip pim isolate
Do you want to continue (yes/no)? [no] yes
Starting to apply commands...
Applying : vpc domain 23
Applying : no shutdown
2018 Aug 24 11:37:40 Leaf-103 %$ VDC-1 %$ %VPC-2-VPC_SHUTDOWN: vPC shutdown status is OFF
Applying : router ospf UNDERLAY-NET
Applying : no isolate
Applying : router bgp 65000
Applying : no isolate
Applying : no ip pim isolate
Maintenance mode operation successful.
The after_maintenance snapshot will be generated in 120 seconds
After that time, please use 'show snapshots compare before_maintenance after_maintenance' to check the health of the system
Leaf-103(config)# 2018 Aug 24 11:37:54 Leaf-103 %$ VDC-1 %$ %MMODE-2-MODE_CHANGED: System changed to "normal" mode.
Example 10-4: Bringing Leaf-103 back to service.
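Comparing examples 10-1 and 10-4, the insertion steps are the removal steps in reverse order, each one negated. The Python sketch below only illustrates that observation from the captured output; it does not claim to describe how NX-OS builds its maintenance profiles internally, and `REMOVAL`, `negate` and `mirror` are names made up here.

```python
# Removal steps as (configuration context, command), copied from example 10-1.
# A context of None means the command is applied in global configuration mode.
REMOVAL = [
    (None, "ip pim isolate"),
    ("router bgp 65000", "isolate"),
    ("router ospf UNDERLAY-NET", "isolate"),
    ("vpc domain 23", "shutdown"),
]


def negate(command: str) -> str:
    """Toggle the leading 'no ' of a command."""
    return command[3:] if command.startswith("no ") else "no " + command


def mirror(profile):
    """Reverse the step order and negate every command."""
    return [(ctx, negate(cmd)) for ctx, cmd in reversed(profile)]


if __name__ == "__main__":
    for ctx, cmd in mirror(REMOVAL):
        print(f"{ctx or '(global)'}: {cmd}")
```

The mirrored list matches what the switch printed in example 10-4: the vPC domain is re-enabled first, and the routing protocols leave the isolated state afterwards.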
Example-1 summary: BGP EVPN peering with dedicated Loopback addresses
The main point of the previous example is that the BGP peering remains up while Leaf-103 is removed from service by using GIR. There is no need to re-establish the BGP peering before exchanging routing updates, which speeds up the recovery process.
Now I am going to change the BGP EVPN peering. Instead of using a dedicated Loopback Interface for BGP, I am going to use Loopback 100, the same Loopback Interface that is used by the NVE1 interface (Figure 10-3).
Example 10-5 shows the BGP-related configuration of Leaf-103. Now we are using Loopback 100 instead of Loopback 77.
router bgp 65000
router-id 192.168.77.103
timers bgp 3 9
address-family ipv4 unicast
address-family l2vpn evpn
advertise-pip
neighbor 192.168.77.11
remote-as 65000
description ** Spine-11 BGP-RR **
update-source loopback100
address-family l2vpn evpn
send-community extended
Example 10-5: BGP peering using Loopback 100.
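Whether a switch is in this "shared Loopback" design can be checked from its configuration text. The following Python sketch is only an illustration: `shares_loopback` and its helpers are hypothetical, and the `interface nve1` stanza in the sample is an assumption, since the post states that NVE1 uses Loopback 100 but does not show that part of the configuration.

```python
import re


def bgp_update_sources(config: str) -> set[str]:
    """All interfaces named in 'update-source' lines."""
    found = re.findall(r"^\s*update-source\s+(\S+)", config, re.MULTILINE)
    return {name.lower() for name in found}


def nve_source(config: str):
    """Interface named in the first 'source-interface' line, if any."""
    m = re.search(r"^\s*source-interface\s+(\S+)", config, re.MULTILINE)
    return m.group(1).lower() if m else None


def shares_loopback(config: str) -> bool:
    """True when BGP peers from the same interface that NVE uses as source."""
    src = nve_source(config)
    return src is not None and src in bgp_update_sources(config)


# BGP part taken from example 10-5; the nve1 stanza is assumed, not quoted.
SHARED = """\
interface nve1
  source-interface loopback100
router bgp 65000
  neighbor 192.168.77.11
    update-source loopback100
"""

DEDICATED = SHARED.replace("update-source loopback100",
                           "update-source loopback77")

if __name__ == "__main__":
    print("shared:", shares_loopback(SHARED))
    print("dedicated:", shares_loopback(DEDICATED))
```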
In Spine-11, the BGP peering is changed towards 192.168.100.103 (Loopback 100 in Leaf-103).
router bgp 65000
router-id 192.168.77.111
address-family ipv4 unicast
address-family l2vpn evpn
<snipped>
neighbor 192.168.100.103
remote-as 65000
update-source loopback77
address-family l2vpn evpn
send-community
send-community extended
route-reflector-client
Example 10-6: configuring BGP peering using Loopback 100.
As the output taken from Spine-11 shows (example 10-7), the new peering is now up and five routes have been received from Leaf-103. The old peering with 192.168.77.103 is in the Idle state.
Spine-11# sh bgp l2 evpn summ
<snipped>
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
192.168.77.101 4 65000 449 398 318 0 0 05:25:15 2
192.168.77.102 4 65000 6514 6480 318 0 0 05:24:17 4
192.168.77.103 4 65000 6318 6304 0 0 0 00:09:54 Idle
192.168.100.103 4 65000 166 147 318 0 0 00:00:13 5
Example 10-7: BGP peering using Loopback 100 - verification.
Now we repeat the GIR process on Leaf-103 and check whether there are any major changes in the process.
At the end of the output, we can see that the interface NVE1 is brought down.
Leaf-103(config)# system mode maintenance
Following configuration will be applied:
ip pim isolate
router bgp 65000
isolate
router ospf UNDERLAY-NET
isolate
vpc domain 23
shutdown
NOTE: If you have vPC orphan interfaces, please ensure 'vpc orphan-port suspend' is configured under them, before proceeding further
Do you want to continue (yes/no)? [no] yes
Generating before_maintenance snapshot before going into maintenance mode
Starting to apply commands...
Applying : ip pim isolate
Applying : router bgp 65000
Applying : isolate
Applying : router ospf UNDERLAY-NET
Applying : isolate
Applying : vpc domain 23
Applying : shutdown
2018 Aug 24 12:15:46 Leaf-103 %$ VDC-1 %$ %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going down, suspending all vPCs on secondary. If vfc is bound to vPC, then only ethernet vlans of that VPC shall be down.
2018 Aug 24 12:15:46 Leaf-103 %$ VDC-1 %$ %VPC-2-VPC_SHUTDOWN: vPC shutdown status is ON
Maintenance mode operation successful.
Leaf-103(maint-mode)(config)# 2018 Aug 24 12:15:50 Leaf-103 %$ VDC-1 %$ %MMODE-2-MODE_CHANGED: System changed to "maintenance" mode.
2018 Aug 24 12:16:16 Leaf-103 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: NVE: send reinit to bring down nve1 - nve
Example 10-8: GIR in Leaf-103.
The Loopback 100 interface is also disabled.
Leaf-103(maint-mode)(config)# sh int statu | i Lo100
Lo100 ** VTEP/Overlay ** disabled routed auto auto --
Example 10-9: Loopback 100 state during GIR (Leaf-103).
This causes the BGP neighbor state to go to Idle, which means that the BGP neighbor relationship between Spine-11 and Leaf-103 is down.
Spine-11# sh bgp l2 evpn summ | i 192.168.100.103
192.168.100.103 4 65000 262 250 0 0 0 00:05:30 Idle
Example 10-10: BGP peering with Leaf-103 changes to Idle.
Now the BGP recovery process has to go through the BGP neighbor negotiation first, which increases the recovery time. The complexity of the BGP neighbor negotiation process is shown in figure 10-4 by using the BGP-FSM.
The BGP-FSM is explained in my post “Border Gateway Protocol – Finite State Machine (BGP-FSM)”, published in July 2017.
Figure 10-4: BGP-FSM
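To make the extra work concrete, here is a deliberately simplified Python sketch of the FSM "happy path" only. State and event names follow RFC 4271, but the Active state, timers and all error transitions are left out, so this is an assumption-laden illustration rather than a full FSM; `TRANSITIONS` and `steps_to_established` are names invented for it.

```python
# Happy-path transitions only: (current state, event) -> next state.
TRANSITIONS = {
    ("Idle", "AutomaticStart"): "Connect",
    ("Connect", "TcpConnectionConfirmed"): "OpenSent",
    ("OpenSent", "BGPOpen"): "OpenConfirm",
    ("OpenConfirm", "KeepAliveMsg"): "Established",
}


def steps_to_established(state: str) -> int:
    """Count happy-path transitions needed to reach Established."""
    steps = 0
    while state != "Established":
        candidates = [dst for (src, _event), dst in TRANSITIONS.items()
                      if src == state]
        if not candidates:
            raise ValueError(f"no happy-path transition from {state!r}")
        state = candidates[0]
        steps += 1
    return steps


if __name__ == "__main__":
    # Dedicated Loopback: the session never left Established during GIR.
    print("dedicated:", steps_to_established("Established"))
    # Shared Loopback: the session dropped to Idle and has to start over.
    print("shared:", steps_to_established("Idle"))
```

With a dedicated Loopback the session needs no transitions before routes can be exchanged again. With the shared Loopback it has to walk through every state first, and each step waits on a TCP handshake or a BGP message from the peer.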
Example-2 summary: BGP EVPN peering and NVE1 using the same Loopback interface.
The answer to the question presented at the beginning of the post:
“Does it really matter if the NVE1 interface of a VTEP switch and BGP EVPN use the same Loopback interface IP address as a source or should there be a dedicated Loopback interface for BGP EVPN?”
And the answer is: YES. With a dedicated Loopback interface, the BGP peering remains up during the GIR process, which speeds up the recovery process.
One IMPORTANT thing related to Loopback Interface selection! When the device boots up, it enables Loopback Interfaces in numerical order, starting from Loopback 0. Looking back at our example lab, there is one thing that should have been done slightly differently if we want to tune the convergence. To speed up the BGP recovery process, the Loopback Interface number used by NVE1 should be smaller than the Loopback Interface number used for BGP peering. This is because the NVE1 IP address is used as the next-hop address in BGP EVPN Update messages sent by VTEP switches, and BGP cannot advertise routes until the next hop of the route (meaning the NVE1 source Loopback Interface) is reachable.
Figure 10-5: Loopback interface “enabling” order during device boot
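The numbering rule can be expressed as a one-line check. This Python sketch takes the statement above at face value (interfaces are enabled strictly in numerical order) and models nothing else about the boot process; `enable_order` and `nexthop_ready_first` are hypothetical names, and Loopback 50 is a made-up number used only to show a compliant plan.

```python
def enable_order(loopbacks: set[int]) -> list[int]:
    """Order in which Loopbacks come up, assuming strict numerical order."""
    return sorted(loopbacks)


def nexthop_ready_first(nve_source: int, bgp_source: int) -> bool:
    """True when the NVE source Loopback is enabled before the BGP one."""
    order = enable_order({nve_source, bgp_source})
    return order.index(nve_source) < order.index(bgp_source)


if __name__ == "__main__":
    # Lab numbering from this post: NVE1 on Loopback 100, BGP on Loopback 77.
    print("lab plan:", nexthop_ready_first(nve_source=100, bgp_source=77))
    # Hypothetical renumbering: NVE1 on Loopback 50, BGP stays on Loopback 77.
    print("renumbered:", nexthop_ready_first(nve_source=50, bgp_source=77))
```

With the lab numbering the check fails, because Loopback 77 comes up before the next-hop address on Loopback 100. This is the detail that should have been done differently.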
One last thing about Loopback addresses and their roles in a VXLAN BGP EVPN fabric: the Loopback address used as the BGP RID is also used as part of the Route Distinguisher (RD) in BGP EVPN Updates. The process is explained in my post “VXLAN Part VII: VXLAN BGP EVPN –Control Plane operation”, posted in May 2018.
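As a rough illustration of how an IP address ends up inside an RD, the Python sketch below encodes a Type 1 RD as defined in RFC 4364: a 2-byte type field with value 1, a 4-byte IPv4 address and a 2-byte assigned number. It assumes the RID-based RD uses this format. The router ID is the one from example 10-5, the assigned number 32777 is a made-up value, and `rd_type1` is a hypothetical helper name.

```python
import ipaddress
import struct


def rd_type1(router_id: str, assigned_number: int) -> tuple[str, bytes]:
    """Return the text form and the 8-byte encoding of a Type 1 RD."""
    ip = ipaddress.IPv4Address(router_id)  # raises on an invalid address
    if not 0 <= assigned_number <= 0xFFFF:
        raise ValueError("assigned number must fit in 16 bits")
    # Network byte order: type (2 bytes), IPv4 address (4), number (2).
    encoded = struct.pack("!H4sH", 1, ip.packed, assigned_number)
    return f"{ip}:{assigned_number}", encoded


if __name__ == "__main__":
    # 32777 is an illustrative number, not taken from the lab.
    text, raw = rd_type1("192.168.77.103", 32777)
    print(text, raw.hex())
```

Because the RID is baked into the RD, changing the Loopback that supplies the RID also changes the RD values seen in BGP EVPN updates.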
Conclusion
Even though the impact of Loopback Interface numbering and usage on convergence time in a VXLAN BGP EVPN fabric is minor, the relationship between them is good to understand.
---------------------------------------------------------
Author: Toni Pasanen CCIE#28158
Published: 24-August 2018
Edited: 25-August 2018 | Toni Pasanen
---------------------------------------------------------