Friday, 24 August 2018

VXLAN Part X: Recovery issue when BGP EVPN peering and the VXLAN NVE1 interface use the same Loopback interface as a source


Does it really matter if the NVE1 interface of a VTEP switch and BGP EVPN use the same Loopback interface IP address as a source, or should there be a dedicated Loopback interface for BGP EVPN? In this post, I try to answer the question by showing how the BGP EVPN convergence process differs between these two design options.

  Figure 10-1: VXLAN BGP EVPN Example Topology and IP addressing


Loopback addressing

Figure 10-1 shows the example topology and the Loopback addresses used in it. Loopback 0 is used on the Inter-Switch links between the Spine and Leaf switches (unnumbered physical links). The NVE1 interfaces of the vPC peer switches Leaf-102 and Leaf-103 use the primary IP address of their Loopback 100 interface as a Physical IP (PIP) and the secondary IP address as a Virtual/Anycast IP (VIP). BGP EVPN peering uses the Loopback 77 IP addresses. All of these Loopback IP addresses are advertised by OSPF.


vPC domain

Leaf-102 and Leaf-103 are vPC peer switches in vPC domain 23. The vPC Peer-Link is established over PortChannel 23, and the vPC Peer-Keepalive Link is a Layer 3 link between the switches. Both Leaf switches have one vPC Member Port belonging to PortChannel 10.

Graceful Insertion and Removal (GIR)

GIR is a method that helps to maintain network availability during device-specific software or hardware maintenance tasks. In the first demonstration, BGP EVPN peering between Spine-11 and Leaf-103 is established between the Loopback 77 interfaces.
Now we take Leaf-103 out of service by using the command system mode maintenance (example 10-1).

Leaf-103(config)# system mode maintenance

Following configuration will be applied:

ip pim isolate
router bgp 65000
  isolate
router ospf UNDERLAY-NET
  isolate
vpc domain 23
  shutdown

NOTE: If you have vPC orphan interfaces, please ensure 'vpc orphan-port suspend' is configured under them, before proceeding further
Do you want to continue (yes/no)? [no] yes

Generating before_maintenance snapshot before going into maintenance mode

Starting to apply commands...

Applying : ip pim isolate
Applying : router bgp 65000
Applying :   isolate
Applying : router ospf UNDERLAY-NET
Applying :   isolate
Applying : vpc domain 23
Applying :   shutdown
2018 Aug 24 10:31:21 Leaf-103 %$ VDC-1 %$ %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going down, suspending all vPCs on secondary. If vfc is bound to vPC, then only ethernet vlans of that VPC shall be down.
2018 Aug 24 10:31:21 Leaf-103 %$ VDC-1 %$ %VPC-2-VPC_SHUTDOWN: vPC shutdown status is ON


Maintenance mode operation successful.
Leaf-103(maint-mode)(config)# 2018 Aug 24 10:31:25 Leaf-103 %$ VDC-1 %$ %MMODE-2-MODE_CHANGED: System changed to "maintenance" mode.

Leaf-103(maint-mode)(config)# 2018 Aug 24 10:31:51 Leaf-103 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: NVE: send reinit to bring down nve1 - nve

Leaf-103(maint-mode)(config)#
Example 10-1: Removing Leaf-103 by using GIR

Figure 10-2 shows the reaction of Leaf-103.



 Figure 10-2: Graceful Insertion and Removal (GIR) in Leaf-103.

Now we are going to do some verifications.

As example 10-2 shows, the vPC-related PortChannels and their physical member interfaces are suspended, the Loopback 100 interface is disabled, and the interface NVE1 is down.

Leaf-103(maint-mode)(config)# sh int statu | i Po10|Po23|Lo0|Lo77|Lo100|nve1
Eth1/4        ** Po23 member - v suspndByV trunk     full    auto    10g       
Eth1/5        ** Po23 member - v suspndByV trunk     full    auto    10g       
Po10          --                 suspndByV trunk     full    auto    --        
Po23          --                 suspndByV trunk     full    auto    --        
Lo0           ** RID/Underlay ** connected routed    auto    auto    --        
Lo77          ** BGP peering **  connected routed    auto    auto    --        
Lo100         ** VTEP/Overlay ** disabled  routed    auto    auto    --        
nve1          --                 down      --        auto    auto    --      
Example 10-2: Interface state verification (Leaf-103)

The OSPF neighbor relationships remain up, but Leaf-103 advertises its point-to-point link with the maximum metric of 65535.

Spine-11# sh ip ospf neighbors
 OSPF Process ID UNDERLAY-NET VRF default
 Total number of neighbors: 3
 Neighbor ID     Pri State            Up Time  Address         Interface
 Leaf-101          1 FULL/ -          04:26:45 192.168.0.101   Eth1/1
 Leaf-102          1 FULL/ -          04:25:45 192.168.0.102   Eth1/2
 Leaf-103          1 FULL/ -          04:25:46 192.168.0.103   Eth1/3


Spine-11# show ip ospf database router 192.168.0.103 detail
        OSPF Router with ID (192.168.0.11) (Process ID UNDERLAY-NET VRF default)

                Router Link States (Area 0.0.0.0)

   LS age: 1708
   Options: 0x2 (No TOS-capability, No DC)
   LS Type: Router Links
   Link State ID: 192.168.0.103
   Advertising Router: Leaf-103
   LS Seq Number: 0x8000000d
   Checksum: 0xdd42
   Length: 60
    Number of links: 3

     Link connected to: a Stub Network
      (Link ID) Network/Subnet Number: 192.168.0.103
      (Link Data) Network Mask: 255.255.255.255
       Number of TOS metrics: 0
         TOS   0 Metric: 1

     Link connected to: a Router (point-to-point)
     (Link ID) Neighboring Router ID: 192.168.0.11
     (Link Data) Router Interface address: 0.0.0.2
       Number of TOS metrics: 0
         TOS   0 Metric: 65535

     Link connected to: a Stub Network
      (Link ID) Network/Subnet Number: 192.168.77.103
      (Link Data) Network Mask: 255.255.255.255
       Number of TOS metrics: 0
         TOS   0 Metric: 1
Example 10-3: OSPF reaction to GIR
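
The effect of the metric change can be sketched with a small shortest-path calculation. This is only an illustration, not how NX-OS implements SPF: the link cost of 40 is an assumed value, and the graph contains just Spine-11, Leaf-103, and the Loopback 77 stub network from the output above. OSPF link costs are directional, so raising the metric that Leaf-103 advertises makes paths through Leaf-103 unattractive, while the stub network, which is still advertised with metric 1, stays reachable from Spine-11 at the same cost.

```python
import heapq

MAX_METRIC = 65535       # metric advertised by Leaf-103 while isolated
ASSUMED_LINK_COST = 40   # hypothetical OSPF cost, not taken from the lab


def build_graph(leaf_to_spine_metric):
    """Directed graph: each router lists the costs it advertises itself."""
    return {
        "Spine-11": {"Leaf-103": ASSUMED_LINK_COST},
        "Leaf-103": {
            "Spine-11": leaf_to_spine_metric,
            "192.168.77.103/32": 1,  # stub network, metric stays 1
        },
    }


def spf(graph, source):
    """Plain Dijkstra: lowest total cost from source to every node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, metric in graph.get(node, {}).items():
            new_cost = cost + metric
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


normal = build_graph(ASSUMED_LINK_COST)
isolated = build_graph(MAX_METRIC)

# Cost from Spine-11 to the Loopback 77 stub is the same in both cases.
print(spf(normal, "Spine-11")["192.168.77.103/32"])
print(spf(isolated, "Spine-11")["192.168.77.103/32"])
# Only the direction advertised by Leaf-103 becomes expensive.
print(spf(isolated, "Leaf-103")["Spine-11"])
```

Under these assumptions, Spine-11 still reaches 192.168.77.103 at an unchanged cost, which is consistent with the BGP session towards Loopback 77 staying up during GIR.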

The BGP peering between Spine-11 and Leaf-103 stays up, but Leaf-103 has withdrawn all of its routes, as the output below shows (zero prefixes received from Leaf-103).

Spine-11# sh bgp l2vpn evpn summary
<snipped>
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.77.101  4 65000     372     331      249    0    0 04:27:15 2        
192.168.77.102  4 65000    5351    5325      249    0    0 04:26:18 4        
192.168.77.103  4 65000    5347    5328      249    0    0 04:26:17 0        
Example 10-3: BGP reaction to GIR

From the routing perspective, the OSPF and BGP peerings remain up and the protocols only manipulate their routing updates. Recovery is therefore simple: OSPF and BGP just generate new routing updates. From the vPC domain perspective, all related interfaces are brought back up.

Now I am going to do the “Insertion” process by using the command no system mode maintenance, which brings Leaf-103 back into service (example 10-4).

Leaf-103(maint-mode)(config)# no system mode maintenance

Following configuration will be applied:

vpc domain 23
  no shutdown
router ospf UNDERLAY-NET
  no isolate
router bgp 65000
  no isolate
no ip pim isolate

Do you want to continue (yes/no)? [no] yes

Starting to apply commands...

Applying : vpc domain 23
Applying :   no shutdown
2018 Aug 24 11:37:40 Leaf-103 %$ VDC-1 %$ %VPC-2-VPC_SHUTDOWN: vPC shutdown status is OFF

Applying : router ospf UNDERLAY-NET
Applying :   no isolate
Applying : router bgp 65000
Applying :   no isolate
Applying : no ip pim isolate

Maintenance mode operation successful.

The after_maintenance snapshot will be generated in 120 seconds
After that time, please use 'show snapshots compare before_maintenance after_maintenance' to check the health of the system
Leaf-103(config)# 2018 Aug 24 11:37:54 Leaf-103 %$ VDC-1 %$ %MMODE-2-MODE_CHANGED: System changed to "normal" mode.

Example 10-4: Bringing Leaf-103 back to service.

Example-1 summary: BGP EVPN peering with dedicated Loopback addresses

The main point of the previous example is that the BGP peering remains up while Leaf-103 is removed from service by using GIR. There is no need to re-establish the BGP peering before exchanging routing updates, which speeds up the recovery process.

Now I am going to change the BGP EVPN peering. Instead of using a dedicated Loopback interface for BGP, I am going to use Loopback 100, the same Loopback interface that the NVE1 interface uses (Figure 10-3).


Figure 10-3: BGP EVPN peering and the NVE1 interface are using the same Loopback interface

Example 10-5 shows the configuration of Leaf-103 related to BGP. Now we are using Loopback 100 instead of Loopback 77.

router bgp 65000
  router-id 192.168.77.103
  timers bgp 3 9
  address-family ipv4 unicast
  address-family l2vpn evpn
    advertise-pip
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback100
    address-family l2vpn evpn
      send-community extended
Example 10-5: BGP peering using Loopback 100.

In Spine-11, the BGP peering is changed towards 192.168.100.103 (Loopback 100 in Leaf-103).

router bgp 65000
  router-id 192.168.77.111
  address-family ipv4 unicast
  address-family l2vpn evpn
  <snipped>
  neighbor 192.168.100.103
    remote-as 65000
    update-source loopback77
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
Example 10-6: configuring BGP peering using Loopback 100.

As the output taken from Spine-11 shows (example 10-7), the new peering is now up and five routes are received from Leaf-103. The old peering with 192.168.77.103 is in the Idle state.

Spine-11# sh bgp l2 evpn summ
<snipped>
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.77.101  4 65000     449     398      318    0    0 05:25:15 2        
192.168.77.102  4 65000    6514    6480      318    0    0 05:24:17 4        
192.168.77.103  4 65000    6318    6304        0    0    0 00:09:54 Idle    
192.168.100.103 4 65000     166     147      318    0    0 00:00:13 5       
Example 10-7: BGP peering using Loopback 100 - verification.

Now we repeat the GIR process in Leaf-103 and check if there are any major changes in the process.

At the end of the output, we can see that the interface NVE1 is brought down.

Leaf-103(config)# system mode maintenance

Following configuration will be applied:

ip pim isolate
router bgp 65000
  isolate
router ospf UNDERLAY-NET
  isolate
vpc domain 23
  shutdown

NOTE: If you have vPC orphan interfaces, please ensure 'vpc orphan-port suspend' is configured under them, before proceeding further
Do you want to continue (yes/no)? [no] yes

Generating before_maintenance snapshot before going into maintenance mode

Starting to apply commands...

Applying : ip pim isolate
Applying : router bgp 65000
Applying :   isolate
Applying : router ospf UNDERLAY-NET
Applying :   isolate
Applying : vpc domain 23
Applying :   shutdown
2018 Aug 24 12:15:46 Leaf-103 %$ VDC-1 %$ %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going down, suspending all vPCs on secondary. If vfc is bound to vPC, then only ethernet vlans of that VPC shall be down.
2018 Aug 24 12:15:46 Leaf-103 %$ VDC-1 %$ %VPC-2-VPC_SHUTDOWN: vPC shutdown status is ON


Maintenance mode operation successful.
Leaf-103(maint-mode)(config)# 2018 Aug 24 12:15:50 Leaf-103 %$ VDC-1 %$ %MMODE-2-MODE_CHANGED: System changed to "maintenance" mode.
2018 Aug 24 12:16:16 Leaf-103 %$ VDC-1 %$ %USER-2-SYSTEM_MSG: NVE: send reinit to bring down nve1 - nve
Example 10-8: GIR in Leaf-103.

The Loopback 100 interface is also disabled.

Leaf-103(maint-mode)(config)# sh int statu | i Lo100
Lo100         ** VTEP/Overlay ** disabled  routed    auto    auto    --     
Example 10-9: Loopback 100 state verification (Leaf-103).

This causes the BGP neighbor state to change to Idle, which means that the BGP neighbor relationship between Spine-11 and Leaf-103 is down.

Spine-11# sh bgp l2 evpn summ | i 192.168.100.103
192.168.100.103 4 65000     262     250        0    0    0 00:05:30 Idle  
Example 10-10: BGP peering with Leaf-103 changes to Idle.
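
The difference between the two GIR runs is visible in the last column (State/PfxRcd) of the summary output: a number means that the session is established and shows the count of received prefixes, while a state name means that the session is not established. The following Python sketch classifies neighbor rows in that way. The function name parse_bgp_summary is my own, the parsing assumes the ten-column layout shown in the examples above, and the two sample rows are copied from the outputs taken during the first and the second GIR run.

```python
def parse_bgp_summary(text):
    """Map each neighbor to (session state, received prefixes or None)."""
    peers = {}
    for line in text.splitlines():
        fields = line.split()
        # Neighbor rows have ten columns and start with an IPv4 address;
        # this skips the header row and lines such as <snipped>.
        if len(fields) != 10 or fields[0].count(".") != 3:
            continue
        neighbor, last = fields[0], fields[-1]
        if last.isdigit():
            peers[neighbor] = ("Established", int(last))
        else:
            peers[neighbor] = (last, None)
    return peers


SAMPLE = """\
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.77.103  4 65000    5347    5328      249    0    0 04:26:17 0
192.168.100.103 4 65000     262     250        0    0    0 00:05:30 Idle
"""

for neighbor, (state, prefixes) in parse_bgp_summary(SAMPLE).items():
    print(neighbor, state, prefixes)
```

With the dedicated Loopback 77 peering the row reads as established with zero prefixes, whereas the Loopback 100 peering shows up as Idle.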

Now the BGP recovery process first has to go through the BGP neighbor negotiation process, which increases the recovery time. The complexity of the BGP neighbor negotiation process is shown in figure 10-4 by using the BGP-FSM.

The BGP-FSM is explained in my post “Border Gateway Protocol – Finite State Machine (BGP-FSM)” published in July 2017.

Figure 10-4: BGP-FSM
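
To illustrate the extra work, here is a minimal Python sketch of the error-free path through the BGP-FSM. The state and event names follow RFC 4271, but the table is heavily simplified: it leaves out the Active state, timers, and all error handling, and the function name steps_to_established is my own. It only counts the transitions a session has to pass before routing updates can be exchanged.

```python
# Simplified error-free transitions of the BGP-FSM (RFC 4271 names).
HAPPY_PATH = {
    ("Idle", "ManualStart"): "Connect",
    ("Connect", "TcpConnectionConfirmed"): "OpenSent",   # OPEN is sent
    ("OpenSent", "BGPOpen"): "OpenConfirm",              # KEEPALIVE is sent
    ("OpenConfirm", "KeepAliveMsg"): "Established",
}


def steps_to_established(state):
    """Return the (state, event, next state) steps needed from 'state'."""
    steps = []
    while state != "Established":
        for (src, event), dst in HAPPY_PATH.items():
            if src == state:
                steps.append((src, event, dst))
                state = dst
                break
        else:
            raise ValueError(f"no happy-path transition from {state}")
    return steps


# Shared Loopback 100: the session dropped to Idle during GIR.
for src, event, dst in steps_to_established("Idle"):
    print(f"{src} --{event}--> {dst}")

# Dedicated Loopback 77: the session stayed Established, nothing to redo.
print(steps_to_established("Established"))
```

Even on this simplified path, a session that fell back to Idle needs four transitions before the first Update can be sent, while a session that stayed Established needs none.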

Example-2 summary: BGP EVPN peering and NVE1 using the same Loopback interface.

The answer to the question presented at the beginning of the post:

“Does it really matter if the NVE1 interface of a VTEP switch and BGP EVPN use the same Loopback interface IP address as a source or should there be a dedicated Loopback interface for BGP EVPN?”

And the answer is: YES. With a dedicated Loopback interface, the BGP peering remains up during the GIR process, which speeds up the recovery process.

One IMPORTANT thing related to Loopback interface selection! When the device boots up, it enables Loopback interfaces in numerical order, starting from Loopback 0. Looking back at our example lab, there is one thing that should have been done slightly differently if we want to tune the convergence. To speed up the BGP recovery process, the number of the Loopback interface used by NVE1 should be smaller than the number of the Loopback interface used for BGP peering. This is because the NVE1 IP address is used as the next-hop address in the BGP EVPN Update messages sent by VTEP switches, and BGP is not able to advertise routes until the next hop of the route (meaning the NVE1 source Loopback interface) is reachable.

Figure 10-5: Loopback interface “enabling” order during device boot
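
The numbering rule can be expressed as a small check. The Python sketch below takes the numerical enabling order described above as given and simply compares interface numbers. The function names are hypothetical and the interface names are the ones used in the lab.

```python
import re


def loopback_number(name):
    """Extract the number from names such as 'loopback100' or 'Lo77'."""
    match = re.fullmatch(r"(?:loopback|lo)(\d+)", name.strip().lower())
    if match is None:
        raise ValueError(f"not a Loopback interface name: {name!r}")
    return int(match.group(1))


def boot_order(names):
    """Order in which the interfaces are enabled (lowest number first)."""
    return sorted(names, key=loopback_number)


def nve_source_comes_up_first(nve_source, bgp_source):
    """True when the NVE1 source Loopback is enabled before the BGP one."""
    return loopback_number(nve_source) < loopback_number(bgp_source)


print(boot_order(["loopback100", "loopback0", "loopback77"]))
# Lab design: NVE1 uses Loopback 100, BGP peering uses Loopback 77.
print(nve_source_comes_up_first("loopback100", "loopback77"))
```

For the lab numbering the check returns False, which is the detail that should have been done differently.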

One last thing about Loopback addresses and their roles in a VXLAN BGP EVPN fabric: the Loopback address used as the BGP RID is also used as a part of the Route Distinguisher (RD) in BGP EVPN Updates (the process is explained in my post “VXLAN Part VII: VXLAN BGP EVPN –Control Plane operation”, posted in May 2018).
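
As a rough illustration of how the RID ends up inside an RD, the Python sketch below builds a Type 1 Route Distinguisher as defined in RFC 4364 (a 2-byte type field, a 4-byte IPv4 administrator field, and a 2-byte assigned number). The RID 192.168.77.103 is taken from the Leaf-103 configuration above. The assigned number is an assumption: it uses VLAN 10 as a hypothetical example together with the 32767 + VLAN ID rule that NX-OS is documented to apply to auto-derived L2VNI RDs, so verify it against your own software release.

```python
import ipaddress
import struct

RD_TYPE_1 = 1                 # administrator field is an IPv4 address
ASSUMED_L2VNI_OFFSET = 32767  # assumption: NX-OS auto-RD offset for L2VNIs


def type1_rd_bytes(router_id, assigned_number):
    """Encode an 8-byte Type 1 RD: type (2) + IPv4 (4) + number (2)."""
    admin = ipaddress.IPv4Address(router_id).packed
    return struct.pack("!H4sH", RD_TYPE_1, admin, assigned_number)


def rd_text(router_id, assigned_number):
    """Human-readable form, as shown in BGP outputs (RID:number)."""
    return f"{router_id}:{assigned_number}"


vlan_id = 10  # hypothetical VLAN, not taken from the lab
number = ASSUMED_L2VNI_OFFSET + vlan_id

print(rd_text("192.168.77.103", number))
print(type1_rd_bytes("192.168.77.103", number).hex())
```

The point of the sketch is only that a change of the BGP RID also changes the RD of every route the switch originates.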

Conclusion

Even though the impact of Loopback interface numbering and usage on convergence time in a VXLAN BGP EVPN fabric is minor, the relationship between them is good to understand.


---------------------------------------------------------
Author: Toni Pasanen CCIE#28158
Published: 24-August 2018
Edited: 25-August 2018 | Toni Pasanen
