Sunday, 19 August 2018

VXLAN Part IX: VXLAN BGP EVPN - vPC

This post describes how Multi-Chassis Link Aggregation Group (MC-LAG) technology using virtual PortChannel (vPC) works in a VXLAN BGP EVPN fabric. I will first go through the vPC configuration with a short explanation, and then I'll show the Control and Data Plane operation from a VXLAN BGP EVPN perspective by using various show commands and packet captures. I am also going to explain the “Advertising VIP/PIP” options using the external connection. The example topology is shown in Figure 9-1. The complete configurations of the vPC peer switches Leaf-102 and Leaf-103 (the Leaf-101 and Spine-11 configurations are the same as in the previous post) can be found in Appendix 1 at the end of the post.




Figure 9-1: VXLAN BGP EVPN vPC Example Topology and IP addressing


Virtual Port Channel

I’ll start with the vPC configuration. We have two vPC VTEP switches, Leaf-102 and Leaf-103. The inter-switch links, the vPC-related link terms, and the IP addressing and PortChannel numbering are shown in Figure 9-2.

Figure 9-2: vPC domain

Step 1: Enable vPC and LACP features on both vPC VTEP switches.

feature vpc
feature lacp
Example 9-1: vPC and LACP features

Step 2: Configure Peer-Keepalive link.

The vPC Peer-Keepalive link is used as a heartbeat between the vPC peers to make sure that both peers are alive. I am using a dedicated VRF “VPC-Peer-Keepalive” for the vPC Peer-Keepalive link. Example 9-2 shows the configuration of vPC VTEP Leaf-102.

vrf context VPC-Peer-Keepalive
!
interface Ethernet1/6
  no switchport
  vrf member VPC-Peer-Keepalive
  ip address 10.102.103.102/24
  no shutdown
!
vpc domain 23
  peer-keepalive destination 10.102.103.103 source 10.102.103.102 vrf VPC-Peer-Keepalive
Example 9-2: vPC Peer-Keepalive (Leaf-102)

Step 2.1: Verify Peer-Keepalive link operation

Leaf-102# show vpc peer-keepalive

vPC keep-alive status             : peer is alive                
--Peer is alive for             : (685) seconds, (480) msec
--Send status                   : Success
--Last send at                  : 2018.08.11 09:38:44 791 ms
--Sent on interface             : Eth1/6
--Receive status                : Success
--Last receive at               : 2018.08.11 09:38:45 314 ms
--Received on interface         : Eth1/6
--Last update from peer         : (0) seconds, (293) msec

vPC Keep-alive parameters
--Destination                   : 10.102.103.103
--Keepalive interval            : 1000 msec
--Keepalive timeout             : 5 seconds
--Keepalive hold timeout        : 3 seconds
--Keepalive vrf                 : VPC-Peer-Keepalive
--Keepalive udp port            : 3200
--Keepalive tos                 : 192
Example 9-3: vPC Peer-Keepalive (Leaf-102) status check

Note! We created vPC domain 23 in Step 2. The vPC peer switches automatically create a unique vPC system MAC address. The vPC system MAC address has a fixed part, 0023.04ee.bexx, where the last two digits (xx) are taken from the vPC domain ID. Our example vPC domain has ID 23, which is 17 in hex, so the vPC system MAC address in our example is 0023.04ee.be17. This can be verified on both switches. As can be seen from Examples 9-4 and 9-5, there is also a vPC local system MAC. The vPC system MAC is common to both vPC peer switches and is used when the vPC system, formed by the two vPC peer switches, represents itself as a single unit. The vPC local system MAC is unique per vPC peer switch and is used when the switch presents itself as an individual switch, not as a vPC system. This is the case with orphan ports, for example.


Leaf-102# sh vpc role

vPC Role status
----------------------------------------------------
vPC role                        : primary                      
Dual Active Detection Status    : 0
vPC system-mac                  : 00:23:04:ee:be:17            
vPC system-priority             : 32667
vPC local system-mac            : 5e:00:00:01:00:07            
vPC local role-priority         : 32667
vPC local config role-priority  : 32667
vPC peer system-mac             : 5e:00:00:06:00:07            
vPC peer role-priority          : 32667
vPC peer config role-priority   : 32667
Example 9-4: vPC system MAC Leaf-102

Leaf-103# sh vpc role

vPC Role status
----------------------------------------------------
vPC role                        : secondary                    
Dual Active Detection Status    : 0
vPC system-mac                  : 00:23:04:ee:be:17            
vPC system-priority             : 32667
vPC local system-mac            : 5e:00:00:06:00:07            
vPC local role-priority         : 32667
vPC local config role-priority  : 32667
vPC peer system-mac             : 5e:00:00:01:00:07            
vPC peer role-priority          : 32667
vPC peer config role-priority   : 32667
Example 9-5: vPC system MAC Leaf-103

Step 3: Create vPC Peer-Link

The vPC Peer-Link is an 802.1Q trunk link that carries vPC and non-vPC VLANs, Cisco Fabric Services messages (consistency checks, MAC address synchronization, advertisement of vPC member port status, STP management, and synchronization of HSRP and IGMP snooping), flooded traffic from the vPC peer, STP BPDUs, HSRP Hello messages, and IGMP updates. In our example, we create Port-Channel 23 for the vPC Peer-Link. We are going to use LACP as the channel protocol.


interface port-channel23
  switchport mode trunk
  spanning-tree port type network
  vpc peer-link
!
interface Ethernet1/4
  description ** Po23 member - vPC PEER-link **
  switchport mode trunk
  channel-group 23 mode active
!
interface Ethernet1/5
  description ** Po23 member - vPC PEER-link **
  switchport mode trunk
  channel-group 23 mode active
Example 9-6: vPC Peer-Link on switch Leaf-102

Note! If the vPC Peer-Link goes down while the vPC Peer-Keepalive link is still up, the secondary switch suspends its vPC member ports and shuts down the SVIs associated with the vPC VLANs. When this failure happens, orphan hosts connected to the secondary switch are isolated. That is the reason for the recommendation to connect orphan hosts to the primary switch.
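
To see which local ports would be affected, the orphan ports can be listed on each peer. The sketch below is hedged: the commands exist on NX-OS, the outputs are omitted, and Ethernet1/7 is just a hypothetical orphan port. The optional vpc orphan-port suspend command makes the switch suspend the selected orphan port as well when the vPC Peer-Link fails, which helps hosts using active/standby NIC teaming fail over.

Leaf-102# show vpc orphan-ports
!
interface Ethernet1/7
  vpc orphan-port suspend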

Step 4: Configure vPC Member Ports

From the access device perspective (an Ethernet switch in our example), the uplink port is a classical EtherChannel, while from the vPC VTEP point of view the link to the access device is a vPC Member Port. It is recommended to use the Link Aggregation Control Protocol (LACP) as the channel protocol because it is a standard protocol and it has built-in misconfiguration protection and a fast failure detection mechanism.

Note! I am using Cisco VIRL, where the access device OS is vios_l2 with Experimental Version 15.2(20170321:233949). I did not manage to bring up the uplink Port-Channel on the vios_l2 switch. While trying to form a channel, the switch generates the syslog message “%EC-5-L3DONTBNDL2: Gi0/2 suspended: LACP currently not enabled on the remote port.” This message might be related to the bug documented in CSCva22545 (https://bst.cloudapps.cisco.com/bugsearch/bug/CSCva22545), which lists two affected releases, one of them being 15.2(3.7.4)PIH19. That is why I am using manual channel mode on both switches.

interface port-channel10
  switchport mode trunk
  vpc 10
!
interface Ethernet1/3
  description ** Link to Ethernet SW **
  switchport mode trunk
  channel-group 10
Example 9-7: vPC Member Port on Leaf-102 and Leaf-103
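
Once the channel comes up, the bundle state and the per-interface vPC consistency can be checked with the commands below. This is a hedged sketch; outputs are omitted.

Leaf-102# show port-channel summary
Leaf-102# show vpc consistency-parameters interface port-channel 10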

Step 5: Verification of vPC operational status.

Leaf-102# sh vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 23 
Peer status                       : peer adjacency formed ok     
vPC keep-alive status             : peer is alive                
Configuration consistency status  : success
Per-vlan consistency status       : success                      
Type-2 consistency status         : success
vPC role                          : primary                      
Number of vPCs configured         : 1  
Peer Gateway                      : Enabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Disabled
Delay-restore status              : Timer is off.(timeout = 30s)
Delay-restore SVI status          : Timer is off.(timeout = 10s)
Operational Layer3 Peer-router    : Disabled

vPC Peer-link status
---------------------------------------------------------------------
id    Port   Status Active vlans   
--    ----   ------ -------------------------------------------------
1     Po23   up     1,10,20,77                                                  
        

vPC status
----------------------------------------------------------------------------
Id    Port          Status Consistency Reason                Active vlans
--    ------------  ------ ----------- ------                ---------------
10    Po10          up     success     success               1,10,20,77   
Example 9-8: vPC verification
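
If the consistency status above showed a failure instead of success, the parameter-by-parameter comparison between the peers can be inspected. A hedged sketch of the checks; outputs are omitted.

Leaf-102# show vpc consistency-parameters global
Leaf-102# show vpc consistency-parameters vlans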

Step 6: Configure vPC peer-gateway under vpc domain configuration

Some devices, such as NAS devices and load balancers, might not perform a standard ARP request for the IP address of the default gateway during the boot process. Instead, they take the first source MAC address that they hear on the wire and then bind that MAC address to the IP address of the default gateway. This kind of behavior might cause forwarding problems.

In Figure 9-3, we have two hosts, Host-A in VLAN 10 and Host-B in VLAN 20. Let's say that Host-B is a NAS device that binds the first source MAC it hears to the default gateway IP. (1) vPC VTEP Leaf-102 sends some data towards Host-B. (2) Host-B has just booted up; it receives the frame sent by Leaf-102 and binds the source MAC address of the frame to the default gateway IP. (3) Host-B then starts sending data to Host-A, which is in VLAN 10, so the IP packet is sent to the default gateway. (4) The Ethernet switch where Host-B is connected receives the IP packet, runs the channel hash algorithm, and chooses the link towards vPC VTEP Leaf-103. (5) Leaf-103 receives the IP packet, and since the destination MAC address in the Ethernet header belongs to Leaf-102, the IP packet is sent over the vPC Peer-Link to the vPC VTEP peer Leaf-102. (6) Now the loop prevention mechanism kicks in: data received from a vPC member port that crosses the vPC Peer-Link is not allowed to be sent out of any vPC member port. So in our case, Leaf-102 drops the data packet.


Note! There is one exception to the loop prevention rule “frames received from a vPC member port that cross the vPC Peer-Link are not allowed to egress from a vPC member port”: if the vPC member port between Leaf-103 and Host-A is down, the frame is allowed to egress from Leaf-102 port e1/3.


By using the vPC Peer-Gateway option, Leaf-103 is allowed to act as an active default gateway in VLAN 10 (and of course in VLAN 20) even in a situation where an IP packet received over a vPC member port has a destination MAC address that belongs to the vPC peer Leaf-102. So in our example, Leaf-103 is allowed to send the data packet straight to Host-A without sending it to the vPC peer Leaf-102.

Figure 9-3: vPC peer-gateway


vpc domain 23
  peer-gateway
Example 9-9: vPC peer-gateway configuration
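
The Peer Gateway state is visible in the show vpc output (it already showed Enabled in Example 9-8); as a quick sketch of a filtered check:

Leaf-102# show vpc | include Gateway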

Step 7: Configure ARP sync under vpc domain configuration

ARP sync is used to synchronize the ARP table information between the vPC peers in the recovery situation after the vPC Peer-Link has failed. The synchronization is done by using the Cisco Fabric Services (CFS) protocol. The direction is from the primary vPC peer (Leaf-102 in our lab) to the secondary vPC peer (Leaf-103).

vpc domain 23
  ip arp synchronize
Example 9-10: vPC ARP sync configuration
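
The synchronized entries can be verified from the ARP table of the tenant VRF; in the show ip arp output legend, the flag “+” marks adjacencies synced via CFSoE. A hedged sketch (output omitted):

Leaf-102# show ip arp vrf TENANT77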

Step 8: Tune vPC Delay Restore timers (optional)

By using vPC delay restore, the vPC peer switch holds down the vPC links and SVIs after a reload until the routing protocols have converged. This feature is enabled by default with timer values of 30 and 10 seconds (vPC links/SVIs). We are using timers of 240/80 seconds; suitable values depend on the size of the network.


vpc domain 23
  delay restore 240
  delay restore interface-vlan 80
Example 9-11: vPC Delay Restore configuration

Some other considerations for vPC:

Since the primary subject of this post is to show how VXLAN BGP EVPN works with vPC, I am not going to show each and every vPC feature in detail, but here are some other considerations when implementing vPC.

The vPC role priority should be statically defined on both the primary and the secondary vPC peer switch. This way we know which one is the primary switch. Orphan hosts should be connected to the primary vPC peer switch; by doing this, they are not isolated from the network in case of a vPC Peer-Link failure. If a First Hop Redundancy Protocol such as HSRP is used, set the STP root and the HSRP primary on the vPC primary switch.
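
A minimal sketch of pinning the roles, assuming we want Leaf-102 to be the primary: the lower role priority wins the primary role, and the values below are illustrative only.

! Leaf-102 (preferred primary)
vpc domain 23
  role priority 8192
! Leaf-103 (preferred secondary)
vpc domain 23
  role priority 16384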

At this point, we have done the following vPC-related configuration on switches Leaf-102 and Leaf-103.

feature vpc
!
vpc domain 23
  peer-switch
  peer-keepalive destination 10.102.103.103 source 10.102.103.102 vrf VPC-Peer-Keepalive
  delay restore 240
  peer-gateway
  delay restore interface-vlan 80
  ip arp synchronize
!
interface port-channel10
  vpc 10
!
interface port-channel23
  vpc peer-link
!
interface Ethernet1/6
  no switchport
  vrf member VPC-Peer-Keepalive
  ip address 10.102.103.102/24
  no shutdown
!
interface Ethernet1/4
  description ** Po23 member - vPC PEER-link **
  switchport mode trunk
  channel-group 23 mode active
!
interface Ethernet1/5
  description ** Po23 member - vPC PEER-link **
  switchport mode trunk
  channel-group 23 mode active
!
interface Ethernet1/3
  description ** Link to Ethernet SW **
  switchport mode trunk
  channel-group 10
Example 9-12: vPC configuration summary

VTEP redundancy with vPC

When vPC is implemented in a VXLAN fabric, both vPC VTEP peers start using a Virtual IP address (VIP) as the source address instead of their physical IP address (PIP). This also means that BGP EVPN starts advertising both Route Type 2 (MAC/IP Advertisement) and Route Type 5 (IP Prefix Route) with the VIP as the next-hop (default behavior). In our example lab, there are two IP addresses configured on the Loopback 100 interface: the primary IP 192.168.100.102/32 (PIP) and the secondary IP 192.168.100.23/32 (VIP) (Figure 9-4).

Figure 9-4: vPC PIP and VIP addressing

First, I will configure the same secondary IP address 192.168.100.23 under the Loopback 100 interface on both vPC VTEP switches. The example is taken from Leaf-102. At this point, we are not going to do any other configuration.

interface loopback100
  description ** VTEP/Overlay **
  ip address 192.168.100.102/32
  ip address 192.168.100.23/32 secondary
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
Example 9-13: Secondary IP address on Loopback 100
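
The addresses the NVE interface actually uses can be confirmed on both peers; a hedged sketch (output omitted), where the source interface should show the PIP as the primary address and the VIP as the secondary:

Leaf-102# show nve interface nve1 detail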

Now we will attach host Cafe to the Ethernet switch and verify the Control Plane operation by capturing traffic from the wire to see the BGP EVPN MAC/IP Advertisements. Then we will verify the Data Plane operation by pinging from Cafe to Beef.

Phase-1: Host Cafe boots up and sends a Gratuitous ARP message to verify the uniqueness of its IP address and to announce the location of its MAC address. The channel hash algorithm selects interface g0/1, and the broadcast messages are sent out of it.

Phase-2: Leaf-102 receives the first broadcast frame. Its L2FWDER component notices the incoming frame on interface Po10 with source MAC address 1000.0010.cafe. After the MAC address table update, L2FWDER installs the MAC address into the L2RIB, from where it is sent to the BGP EVPN process. Since Leaf-102 has a vPC peer switch, Leaf-103, it synchronizes the MAC address table over the vPC Peer-Link with CFS. This way Leaf-103 learns that MAC address 1000.0010.cafe is located behind PortChannel 10. Since the destination of the frame is a broadcast address, it is also flooded over the vPC Peer-Link. These broadcast messages are also sent to the corresponding multicast group of VNI 10000.
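
As a hedged verification sketch, the same MAC entry should now appear on both vPC peers (outputs omitted); on Leaf-103, which learned it via CFS, the entry still points to Po10:

Leaf-102# show mac address-table vlan 10
Leaf-103# show mac address-table vlan 10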

Note! The detailed descriptions about update process of MAC-address table, L2RIB, BGP-table, and RIB as well as BGP EVPN process can be found from my previous post “VXLAN Part VII: VXLAN BGP EVPN –Control Plane operation”.
BUM (Broadcast, Unknown Unicast and Multicast) traffic processes in VXLAN are explained in VXLAN Part V: Flood and Learn

Phase-3: At this phase, the L2FWDER component has sent the MAC address information from the L2RIB to the BGP EVPN process on both switches. They both send two BGP EVPN Route Type 2 Updates to the spine switch Spine-11: the first one carries the host Cafe MAC address and the second one the MAC/IP information. For simplicity, we concentrate only on the MAC address advertisements sent by the vPC peer switches Leaf-102 and Leaf-103. The BGP EVPN Update messages can be seen in Captures 9-1 (Leaf-102) and 9-2 (Leaf-103) right after Figure 9-5. From these captures, we can see that the MP_REACH_NLRI Path Attribute Next Hop is set to 192.168.100.23. This information is not decoded in the capture, but it can be found in the hex part, where we can see the hex value c0 a8 64 17 (192.168.100.23 in dotted decimal). Note that the EVPN NLRI Route Distinguisher includes the original sender's RID, which is how the Spine switch can differentiate these updates.
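
The same next-hop information can also be checked from the BGP table without a packet capture; a sketch, assuming the MAC-based lookup is available on this NX-OS release (output omitted):

Leaf-101# show bgp l2vpn evpn 1000.0010.cafe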

Phase-4: Spine-11 sends these BGP EVPN Updates to Leaf-101 without modifying the Path Attributes; it just adds a Cluster List Path Attribute, which is used as a loop prevention mechanism (Spine-11 is a Route Reflector). If we compare the updates received from Leaf-102 and Leaf-103, the only notable difference in the BGP EVPN Update is the Route Distinguisher (RD). By checking the RD value, Spine-11 knows that the updates are from different leaf switches (a detailed explanation can be found in VXLAN Part VII: VXLAN BGP EVPN – Control Plane operation).

Phase-5: From Capture 9-3, taken from Leaf-103 interface e1/1, we can see that the RR Spine-11 sends the BGP EVPN Update about the host Cafe MAC, originated by Leaf-102, to Leaf-103. This Update is blocked by Leaf-103 based on the Site of Origin (SoO) Extended Community Attribute 192.168.100.23:0 (the Route Origin field in the capture).

Figure 9-5: BGP EVPN Update

Capture 9-1: BGP EVPN Update from Leaf-102 to Spine-11


Capture 9-2: BGP EVPN Update from Leaf-103 to Spine-11

Capture 9-3: BGP EVPN Update originated by Leaf-102 and forwarded to Leaf-103 by Spine-11.

From the output of Example 9-14, we can see that the host Cafe MAC and IP information is produced into the L2RIB by BGP with the next-hop address of the VIP/Anycast address of the vPC domain 23 switches Leaf-102 and Leaf-103. The same output also shows that the MAC-IP binding is sent to the ARP cache, or more precisely to the ARP suppression-cache.

Leaf-101# sh l2route evpn mac-ip evi 10 detail
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated
Topology    Mac Address    Prod   Flags         Seq No     Host IP         Next-Hops     
----------- -------------- ------ ---------- --------------- ---------------
10          1000.0010.cafe BGP    --            0          192.168.11.11  192.168.100.23
            Sent To: ARP
            SOO: 775043377     
10          1000.0010.beef HMM    --            0          192.168.11.12  Local         
            Sent To: BGP
            L3-Info: 10077
Example 9-14: Leaf-101 L2RIB

Example 9-15 shows the ARP suppression-cache.

Leaf-101# sh ip arp suppression-cache detail

Flags: + - Adjacencies synced via CFSoE
       L - Local Adjacency
       R - Remote Adjacency
       L2 - Learnt over L2 interface
       PS - Added via L2RIB, Peer Sync
       RO - Dervied from L2RIB Peer Sync Entry

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

192.168.11.12   00:04:30 1000.0010.beef   10 Ethernet1/4         L
192.168.11.11   01:28:40 1000.0010.cafe   10 (null)              R        192.168.100.23
Example 9-15: ARP suppression-cache on Leaf-101.

A ping shows that we have IP connectivity between Cafe and Beef in VLAN 10:
Beef#ping 192.168.11.11
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.11.11, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 10/15/21 ms
Example 9-16: Ping from host Beef to host Cafe

From Capture 9-4, we can see that the destination IP address in the outer IP header of the VXLAN-encapsulated ICMP messages is correctly set to 192.168.100.23, as it should be.

Capture 9-4: Ping from Beef to Cafe.

This is the basic operation when vPC is implemented in a VXLAN BGP EVPN fabric.

Advertising Primary IP address

In Figure 9-6, there is an external network behind the vPC peer VTEP Leaf-103. This kind of setup might lead to a situation where the spine switch Spine-11 sends a data flow destined to the external network to the vPC peer switch Leaf-102, which has no route to the destination, and the data flow is black-holed. This might happen because, in a basic setup, both vPC peers send their BGP EVPN Updates to Spine-11 with the VIP/Anycast address as the next-hop. Note that vPC peer switches do not have a mechanism to synchronize Layer 3 prefix information.

The process is shown in Figure 9-6. (1) Router Ext-Ro02 sends the BGP Update about network 172.16.77.0/24 to Leaf-103, (2) which in turn forwards the Update to Spine-11 using its VIP/Anycast IP 192.168.100.23 as the next-hop. (3) Spine-11 sends the Update to its BGP RR clients Leaf-101 and Leaf-102. Leaf-102 ignores the Update (same SoO) and Leaf-101 installs the received information into the TENANT77-specific tables (BGP, RIB). That is the simplified Control Plane operation. Then the Data Plane: (4) Host Beef sends data to a host located in the network 172.16.77.0/24. It sends the IP packets to its default gateway Leaf-101, which knows that the destination network is reachable through the next-hop address 192.168.100.23 (the VIP/Anycast address of vPC peers Leaf-102 and Leaf-103). (5) Leaf-101 sends the packet to Spine-11. Spine-11 has two possible paths towards the next-hop 192.168.100.23: either via Leaf-102 or via Leaf-103. It might select the path to Leaf-102, which does not know how to route the packet to the destination network 172.16.77.0/24.

Figure 9-6: BGP EVPN Update
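
Whether a given vPC peer actually holds the external route can be verified from its tenant routing table; a hedged sketch (outputs omitted), run on both peers, where only Leaf-103 is expected to have the route via Ext-Ro02:

Leaf-102# show ip route 172.16.77.0/24 vrf TENANT77
Leaf-103# show ip route 172.16.77.0/24 vrf TENANT77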

From Example 9-17, we can see that the external network is advertised with the vPC peer switches' VIP/Anycast address 192.168.100.23 as the next-hop.

Leaf-101# sh ip bgp vrf TENANT77 172.16.77.0
BGP routing table information for VRF TENANT77, address family IPv4 Unicast
BGP routing table entry for 172.16.77.0/24, version 6
Paths: (1 available, best #1)
Flags: (0x8008041a) on xmit-list, is in urib, is best urib route, is in HW
  vpn: version 6, (0x100002) on xmit-list

  Advertised path-id 1, VPN AF advertised path-id 1
  Path type: internal, path is valid, is best path, in rib
             Imported from 192.168.77.103:3:[5]:[0]:[0]:[24]:[172.16.77.0]:[0.0.0.0]/224
  AS-Path: 64577 , path sourced external to AS
    192.168.100.23 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED 0, localpref 100, weight 0
      Received label 10077
      Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0006.0007
      Originator: 192.168.77.103 Cluster list: 192.168.77.111

<snipped>
Example 9-17: BGP table on Leaf-101

This behavior can be changed so that, instead of advertising the VIP as the next-hop for external prefixes, the PIP (Primary/Physical IP) is used. This is achieved with the command advertise-pip under the BGP L2VPN EVPN address family, together with advertise virtual-rmac under the NVE interface, which together let BGP use the Primary IP address as the next-hop when advertising prefix-routes. These commands are enabled on both vPC peer switches.

router bgp 65000
  address-family l2vpn evpn
    advertise-pip
!
interface nve1
    advertise virtual-rmac
Example 9-18: Advertise-pip and advertise virtual-rmac configuration
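
After the change, the Type-2 and Type-5 next-hops can be compared directly from the BGP table; a sketch, assuming the route-type filter is available on this NX-OS release (outputs omitted):

Leaf-101# show bgp l2vpn evpn route-type 2
Leaf-101# show bgp l2vpn evpn route-type 5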

After the changes, we can see on Leaf-101 that the next-hop is set to the Primary IP (PIP) of Leaf-103.

Leaf-101# sh ip bgp vrf TENANT77 172.16.77.0
BGP routing table information for VRF TENANT77, address family IPv4 Unicast
BGP routing table entry for 172.16.77.0/24, version 19
Paths: (1 available, best #1)
Flags: (0x8008041a) on xmit-list, is in urib, is best urib route, is in HW
  vpn: version 21, (0x100002) on xmit-list

  Advertised path-id 1, VPN AF advertised path-id 1
  Path type: internal, path is valid, is best path, in rib
             Imported from 192.168.77.103:3:[5]:[0]:[0]:[24]:[172.16.77.0]:[0.0.0.0]/224
  AS-Path: 64577 , path sourced external to AS
    192.168.100.103 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED 0, localpref 100, weight 0
      Received label 10077
      Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0006.0007
      Originator: 192.168.77.103 Cluster list: 192.168.77.111

  VRF advertise information:
  Path-id 1 not advertised to any peer

  VPN AF advertise information:
  Path-id 1 not advertised to any peer
Example 9-19: BGP table on Leaf-101 after advertise-pip and advertise virtual-rmac

By using the advertise-pip and advertise virtual-rmac commands, the next-hop operation changes a little bit. From Capture 9-5, we can see that the MAC Advertisement Route (Type-2) still uses the VIP as the next-hop (hex 17 = decimal 23).

Capture 9-5: Route Type-2

The IP Prefix Route (Type-5), in contrast, uses the PIP as the next-hop address (hex 67 = decimal 103).

Capture 9-6: Route Type-5

We can verify this also from the BGP table of Leaf-101. The host Cafe MAC and MAC/IP routes have 192.168.100.23 (VIP) as the next-hop, while the external network 172.16.77.0/24 has 192.168.100.103 (PIP) as the next-hop address.

Leaf-101# show bgp l2vpn evpn
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 346, Local Router ID is 192.168.77.101
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 192.168.77.101:32777    (L2VNI 10000)
*>l[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216
                      192.168.100.101                   100      32768 i
* i[2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/216
                      192.168.100.23                    100          0 i
*>i                   192.168.100.23                    100          0 i
*>l[2]:[0]:[0]:[48]:[1000.0010.beef]:[32]:[192.168.11.12]/272
                      192.168.100.101                   100      32768 i
* i[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[192.168.11.11]/272
                      192.168.100.23                    100          0 i
*>i                   192.168.100.23                    100          0 i

Route Distinguisher: 192.168.77.102:3
*>i[5]:[0]:[0]:[24]:[192.168.11.0]:[0.0.0.0]/224
                      192.168.100.102                   100          0 i

Route Distinguisher: 192.168.77.102:32777
*>i[2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/216
                      192.168.100.23                    100          0 i
*>i[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[192.168.11.11]/272
                      192.168.100.23                    100          0 i

Route Distinguisher: 192.168.77.103:3
*>i[5]:[0]:[0]:[24]:[172.16.77.0]:[0.0.0.0]/224
                      192.168.100.103          0        100          0 64577 i
*>i[5]:[0]:[0]:[24]:[192.168.11.0]:[0.0.0.0]/224
                      192.168.100.103                   100          0 i

Route Distinguisher: 192.168.77.103:32777
*>i[2]:[0]:[0]:[48]:[1000.0010.cafe]:[0]:[0.0.0.0]/216
                      192.168.100.23                    100          0 i
*>i[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[192.168.11.11]/272
                      192.168.100.23                    100          0 i

Route Distinguisher: 192.168.77.101:3    (L3VNI 10077)
* i[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[192.168.11.11]/272
                      192.168.100.23                    100          0 i
*>i                   192.168.100.23                    100          0 i
*>i[5]:[0]:[0]:[24]:[172.16.77.0]:[0.0.0.0]/224
                      192.168.100.103          0        100          0 64577 i
* i[5]:[0]:[0]:[24]:[192.168.11.0]:[0.0.0.0]/224
                      192.168.100.103                   100          0 i
*>i                   192.168.100.102                   100          0 i
Example 9-20: BGP L2VPN EVPN table on Leaf-101

One more thing: if we take a look at the TENANT77 BGP table in Example 9-21, we see that local prefixes are also advertised using the PIP.

Leaf-101# sh ip bgp vrf TENANT77
BGP routing table information for VRF TENANT77, address family IPv4 Unicast
BGP table version is 25, Local Router ID is 192.168.11.1
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup

   Network            Next Hop            Metric     LocPrf     Weight Path
*>i172.16.77.0/24     192.168.100.103          0        100          0 64577 i
* i192.168.11.0/24    192.168.100.103                   100          0 i
*>i                   192.168.100.102                   100          0 i
* i192.168.11.11/32   192.168.100.23                    100          0 i
*>i                   192.168.100.23                    100          0 i
Example 9-21: TENANT77 BGP table


Author: Toni Pasanen CCIE#28158
Published: 19-August 2018
Edited: 19-August 2018 | Toni Pasanen

-------------------------------------------------
References:

Building Data Center with VXLAN BGP EVPN – A Cisco NX-OS Perspective
ISBN-10: 1-58714-467-0 – Lukas Krattiger, Shyam Kapadia, and David Jansen

NX-OS and Cisco Nexus Switching – Next-Generation Data Center Architectures
Second Edition
ISBN-10: 1-58714-304-6 – Ron Fuller, David Jansen, and Matthew McPherson

Design and Configuration Guide: Best Practices for Virtual Port Channels (vPC) on Cisco Nexus 7000 Series Switches – Revised: June 2016

List of vPC Best Practices – Peter Welcher
https://www.netcraftsmen.com/vpc-best-practices-checklist/



Appendix 1.

Configuration of Leaf-102
Leaf-102# sh run

!Command: show running-config
!Time: Sat Aug 18 12:34:00 2018

version 7.0(3)I7(1)
hostname Leaf-102
vdc Leaf-102 id 1
  limit-resource vlan minimum 16 maximum 4094
  limit-resource vrf minimum 2 maximum 4096
  limit-resource port-channel minimum 0 maximum 511
  limit-resource u4route-mem minimum 128 maximum 128
  limit-resource u6route-mem minimum 96 maximum 96
  limit-resource m4route-mem minimum 58 maximum 58
  limit-resource m6route-mem minimum 8 maximum 8

cfs eth distribute
nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature lacp
feature vpc
feature nv overlay

username admin password 5 $5$r25DfmPc$EvUgSVebL3gCPQ8e1ngSTxeKYIk4yuuPIomJKa5Lp/3  role network-admin
ip domain-lookup
snmp-server user admin network-admin auth md5 0x713961e592dd5c2401317a7e674464ac priv 0x713961e592dd5c2401317a7e674464ac localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO

fabric forwarding anycast-gateway-mac 0001.0001.0001
ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24 bidir
ip pim ssm range 232.0.0.0/8
vlan 1,10,20,77
vlan 10
  name L2VNI-for-VLAN10
  vn-segment 10000
vlan 20
  name L2VNI-for-VLAN20
  vn-segment 20000
vlan 77
  name TENANT77
  vn-segment 10077

spanning-tree vlan 1-3967 priority 4096
vrf context TENANT77
  vni 10077
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
vrf context VPC-Peer-Keepalive
vrf context management
hardware access-list tcam region racl 512
hardware access-list tcam region arp-ether 256 double-wide
vpc domain 23
  peer-switch
  peer-keepalive destination 10.102.103.103 source 10.102.103.102 vrf VPC-Peer-Keepalive
  delay restore 240
  peer-gateway
  delay restore interface-vlan 80
  ip arp synchronize


interface Vlan1
  no shutdown
  no ip redirects
  no ipv6 redirects

interface Vlan10
  no shutdown
  vrf member TENANT77
  no ip redirects
  ip address 192.168.11.1/24
  no ipv6 redirects
  fabric forwarding mode anycast-gateway

interface Vlan20
  no shutdown
  vrf member TENANT77
  no ip redirects
  ip address 192.168.12.1/24
  no ipv6 redirects
  fabric forwarding mode anycast-gateway

interface Vlan77
  no shutdown
  vrf member TENANT77
  no ip redirects
  ip forward
  no ipv6 redirects

interface port-channel10
  switchport mode trunk
  vpc 10

interface port-channel23
  switchport mode trunk
  spanning-tree port type network
  vpc peer-link

interface nve1
  no shutdown
  host-reachability protocol bgp
  advertise virtual-rmac
  source-interface loopback100
  member vni 10000
    suppress-arp
    mcast-group 238.0.0.10
  member vni 10077 associate-vrf
  member vni 20000
    suppress-arp
    mcast-group 238.0.0.10

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  description ** Link to Ethernet SW **
  switchport mode trunk
  channel-group 10

interface Ethernet1/4
  description ** Po23 member - vPC PEER-link **
  switchport mode trunk
  channel-group 23 mode active

interface Ethernet1/5
  description ** Po23 member - vPC PEER-link **
  switchport mode trunk
  channel-group 23 mode active

interface Ethernet1/6
  no switchport
  vrf member VPC-Peer-Keepalive
  ip address 10.102.103.102/24
  no shutdown


interface mgmt0
  vrf member management

interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.102/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback77
  description ** BGP peering **
  ip address 192.168.77.102/32
  ip router ospf UNDERLAY-NET area 0.0.0.0

interface loopback100
  description ** VTEP/Overlay **
  ip address 192.168.100.102/32
  ip address 192.168.100.23/32 secondary
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
line console
line vty
router ospf UNDERLAY-NET
  router-id 192.168.0.102
  name-lookup
router bgp 65000
  router-id 192.168.77.102
  timers bgp 3 9
  address-family ipv4 unicast
  address-family l2vpn evpn
    advertise-pip
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
      aggregate-address 192.168.11.0/24 summary-only
    neighbor 10.102.77.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.77
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map INCOMING_POLICIES_FROM_ExtRo01 in
        route-map OUTGOING_POLICIES out
    neighbor 10.102.78.2
      remote-as 64577
      description ** External Network - Ext-Ro02 **
      update-source Ethernet1/3.78
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map INCOMING_POLICIES_FROM_ExtRo02 in
        route-map OUTGOING_POLICIES out
evpn
  vni 10000 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 20000 l2
    rd auto
    route-target import auto
    route-target export auto

Configuration of Leaf-103
Leaf-103# sh run

!Command: show running-config
!Time: Sat Aug 18 12:35:18 2018

version 7.0(3)I7(1)
hostname Leaf-103
vdc Leaf-103 id 1
  limit-resource vlan minimum 16 maximum 4094
  limit-resource vrf minimum 2 maximum 4096
  limit-resource port-channel minimum 0 maximum 511
  limit-resource u4route-mem minimum 248 maximum 248
  limit-resource u6route-mem minimum 96 maximum 96
  limit-resource m4route-mem minimum 58 maximum 58
  limit-resource m6route-mem minimum 8 maximum 8

cfs eth distribute
nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature lacp
feature vpc
feature nv overlay

no password strength-check
username admin password 5 $5$.82HC6Bt$QEpUIVi292elRGmwWNLciK2xa2z13xVwsGhdp2BMU0D  role network-admin
ip domain-lookup
snmp-server user admin network-admin auth md5 0x7f693b750cc7550144b8410e07eecf1d priv 0x7f693b750cc7550144b8410e07eecf1d localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO

fabric forwarding anycast-gateway-mac 0001.0001.0001
ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24 bidir
ip pim ssm range 232.0.0.0/8
vlan 1,10,20,77
vlan 10
  name L2VNI-for-VLAN10
  vn-segment 10000
vlan 20
  name L2VNI-for-VLAN20
  vn-segment 20000
vlan 77
  name TENANT77
  vn-segment 10077

spanning-tree vlan 1-3967 priority 4096

vrf context TENANT77
  vni 10077
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
vrf context VPC-Peer-Keepalive
vrf context management
hardware access-list tcam region racl 512
hardware access-list tcam region arp-ether 256 double-wide
vpc domain 23
  peer-switch
  peer-keepalive destination 10.102.103.102 source 10.102.103.103 vrf VPC-Peer-Keepalive
  delay restore 240
  peer-gateway
  delay restore interface-vlan 80
  ip arp synchronize


interface Vlan1
  no shutdown
  no ip redirects
  no ipv6 redirects

interface Vlan10
  no shutdown
  vrf member TENANT77
  no ip redirects
  ip address 192.168.11.1/24
  no ipv6 redirects
  fabric forwarding mode anycast-gateway

interface Vlan20
  no shutdown
  vrf member TENANT77
  no ip redirects
  ip address 192.168.12.1/24
  no ipv6 redirects
  fabric forwarding mode anycast-gateway

interface Vlan77
  no shutdown
  vrf member TENANT77
  no ip redirects
  ip forward
  no ipv6 redirects

interface port-channel10
  switchport mode trunk
  vpc 10

interface port-channel23
  switchport mode trunk
  spanning-tree port type network
  vpc peer-link

interface nve1
  no shutdown
  host-reachability protocol bgp
  advertise virtual-rmac
  source-interface loopback100
  member vni 10000
    suppress-arp
    mcast-group 238.0.0.10
  member vni 10077 associate-vrf
  member vni 20000
    suppress-arp
    mcast-group 238.0.0.10

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  description ** Link to Ethernet SW **
  switchport mode trunk
  channel-group 10

interface Ethernet1/4
  description ** Po23 member - vPC PEER-link **
  switchport mode trunk
  channel-group 23 mode active

interface Ethernet1/5
  description ** Po23 member - vPC PEER-link **
  switchport mode trunk
  channel-group 23 mode active

interface Ethernet1/6
  no switchport
  vrf member VPC-Peer-Keepalive
  ip address 10.102.103.103/24
  no shutdown

interface Ethernet1/7
  description ** to Ext-Ro02 **
  no switchport
  vrf member TENANT77
  ip address 10.103.77.103/24
  no shutdown


interface mgmt0
  vrf member management

interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.103/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback77
  description ** BGP peering **
  ip address 192.168.77.103/32
  ip router ospf UNDERLAY-NET area 0.0.0.0

interface loopback100
  description ** VTEP/Overlay **
  ip address 192.168.100.103/32
  ip address 192.168.100.23/32 secondary
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
line console
line vty
router ospf UNDERLAY-NET
  router-id 192.168.0.103
  name-lookup
router bgp 65000
  router-id 192.168.77.103
  timers bgp 3 9
  address-family ipv4 unicast
  address-family l2vpn evpn
    advertise-pip
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
      aggregate-address 192.168.11.0/24 summary-only
    neighbor 10.103.77.2
      remote-as 64577
      description ** External Network - Ext-Ro02 **
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map INCOMING_POLICIES_FROM_ExtRo02 in
        route-map OUTGOING_POLICIES out
    neighbor 10.103.78.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.78
      address-family ipv4 unicast
        send-community
        route-map INCOMING_POLICIES_FROM_ExtRo01 in
        route-map OUTGOING_POLICIES out
evpn
  vni 10000 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 20000 l2
    rd auto
    route-target import auto
    route-target export auto


Comments:

  1. You are doing some beautiful work man! Fantastic!

  2. Hi!
    Thanks. Could you post the configuration of route-maps? What are you matching?

    /Mohammed

    Replies
    1. Hi Mohammed! The route-maps actually belong to the "VXLAN Part VIII: VXLAN BGP EVPN – External Connection" configuration. The route-maps do not have any role in this chapter.

  3. Hi Toni!

    Thanks for your response! This helped me a lot to understand VXLAN BGP EVPN and I just want to thank you again. Thanks. Thanks.

    I do have another question about how the traffic goes on the overlay:

    Imagine you are on the Leaf-101 router (not the Beef machine) and want to ping 192.168.11.11; how will the traffic go, and do you get any response back from 192.168.11.11?

    Second question:

    Imagine you are on Leaf-103 (not the Cafe machine) and the link e1/3 towards Cafe's switch is down and you want to ping 192.168.11.11; what will happen there? Could you get any response from 192.168.11.11, and how?

    The last question:

    If I don't need to use multicast (PIM), are there advantages or disadvantages there? I also see you are using "hardware access-list tcam region racl 512"; do I need to free up TCAM if I am using a 93180YC-EX, or can this platform handle that without any TCAM allocation?

    Thanks again.

    Replies
    1. Hi Mohammed!
      Thanks for the excellent questions!
      If we ping from Leaf-101 by using the VLAN 10 Anycast Gateway (AGW) address 192.168.11.1 as the source address to host Café 192.168.11.11 in VLAN 10, which is located behind Leaf-102 and Leaf-103 (which both have the same IP 192.168.11.1 as the VLAN 10 AGW), the source IP address used in the ICMP request packets is 192.168.11.1, and when host Café sends the ICMP reply, it will send it towards 192.168.11.1. The Ethernet switch may send the packet either to Leaf-102 or Leaf-103. Since both leaf switches "own" the IP address 192.168.11.1, they will not send the ICMP reply back to the querier Leaf-101. So we will not get an ICMP reply for the ICMP request.
      In the second scenario, where link e1/3 on Leaf-103 is down and we ping from it by using the AGW 192.168.11.1 as the source to host Café 192.168.11.11, the ICMP request will be sent over the peer-link to Leaf-102, which in turn forwards the packets to host Café over link 1/3. Host Café sends the ICMP reply towards Leaf-102 (the only possible path) by using the destination IP 192.168.11.1. When Leaf-102 receives the packet, it will not forward it to Leaf-103 since it "owns" the destination IP 192.168.11.1. By the way, even though the link between Leaf-103 and the Ethernet switch is up, we might end up in the same situation; this depends on which path is selected by the channel hash algorithm. So in the case of vPC, we are not able to predict whether the ping works or not.
      If you only have a couple of switches, you could use ingress replication instead of Multicast in the Underlay network for BUM traffic.

    2. Hi Toni!

      Thanks for your response!

      If I understand you right, it means the ping from the leafs doesn't work at all. Good. Is there any way to test a ping from the switch to some host which is connected to the remote leafs?

      "If you only have a couple of switches" — is there any limitation on the number of switches? I have leaf-1/2 vPC and leaf-3/4 vPC and two spine switches.

      I am trying to use a SPINE border; is there any disadvantage or advantage to using the SPINEs as a border?

      Thanks

    3. First, there is an Operations, Administration, and Maintenance (OAM) model where we have advanced tools for monitoring and troubleshooting (NVE ping, path trace, among other things). I will write a post about OAM later (I have one topic before that).
      Second, if the SPINE switch is used as a border node, it will become a VTEP (in addition to being a SPINE). There will naturally be external peers, and we might have additional Control Plane protocols such as MPLS LDP, some dynamic routing protocol, etc. required on the SPINE/border switch. This adds complexity to the SPINE (increases OPEX). In turn, by using the SPINE as a border node, you can have savings from a CAPEX point of view.

    4. Hi Toni!

      Ok, thanks. I'm just waiting for OAM; it is fantastic to hear that you are planning this. I am very thankful for your work, thank you very much.
      About the comment "If you only have a couple of switches, you could use ingress replication instead of Multicast in the Underlay network for BUM traffic": is there any limitation on the number of switches? I have leaf-1/2 vPC and leaf-3/4 vPC and two spine switches.

      Thank you very much for your time.

  4. A little update: I was missing ip pim rp-address [anycast_rp_addr] on Spines.
    All is working perfectly now.

  5. I have faced an interesting problem: when uplinks from Leaf-102 OR Leaf-103 (vPC pair) go down, spanning-tree on the switch with Po10 blocks the port-channel with the message: "Desg BLK 4 128.67 P2p Dispute"

    Have anyone seen this before?

  6. Great posts Toni, really detailed and educational. Quick question: on VIRL, what kind of host are you using? I am using the Ubuntu server but for some reason I cannot ping my default gateway. Am I missing something here? I have configured my dgw and IP address on the server.

    Thanks,

    George

    Replies
    1. Hi George,
      I am using a router as a client. For a quick check, you can verify that the STP root is on the VTEP switch.
      Toni

  7. Hi Toni, thanks for your reply. I have managed to add the Ubuntu hosts into the fabric and now I can see that I have learnt both the MAC and IP host addresses. Quick one: did you notice while on VIRL that the BGP sessions between the leafs and spines sometimes remain idle, and I had to clear the sessions? Anyway, seriously, you have done a really great job here. I liked your explanation of ESI and the algorithm regarding the DF election.

    Thanks again.

    Replies
    1. Thanks George for your kind comments. And yes, virtual devices on VIRL sometimes have to be rebooted to make things work. In my case the problems are related to the L2IOS vlan database.
      Cheers-Toni

  8. Hi Toni,
    Your blog is the lifespring providing me the Cisco VXLAN knowledge I need!!!
    I actually read your blog over and over again in order to understand it better.

    For vPC I noticed you are using a secondary IP (192.168.100.23) on loopback 100 on both vPC devices.
    I believe the spine switch points at 192.168.100.23 as its BGP peer.
    I have some questions here:
    1. Is this a Cisco recommended config, I mean using a secondary IP?
    2. What MAC address is the spine switch using in order to forward packets to 192.168.100.23?
    3. If it were me, I would use HSRP on vPC as it can be active/active as well.
    4. I am actually confused by this config. Since eth1 is not a port channel and loopback 100 is not HSRP, won't this config cause an IP conflict?

    Yours sincerely
    Michael

    Replies
    1. Hi Michael,

      Excellent questions once again!

      1) I am using a couple of Cisco documents as a source in this chapter, but since vPC+ (no peer/keepalive link required) is now available, I am not 100% sure what their current best practice is for doing this.

      2) The MAC address that Spine-11 uses depends on the result of the ECMP hash. If the result points to Leaf-101, the MAC of its core interface is used, and if Leaf-102 is selected, then its core interface MAC is used. The Router MAC BGP Extended Community Path Attribute (PA) carried in the BGP Update depends on the vPC configuration. If we are using "advertise virtual-rmac", then the virtual MAC is used instead of the vPC system MAC.

      vPC peers use the same VIP. This way they are seen by remote leafs as one unit. This is the same kind of model as what HSRP shows to the LAN side. There is no IP conflict.
      The BGP peering model does not change here; the BGP L2VPN EVPN AFI peering is still between the Loopback 77 interfaces (192.168.77.sw-id/32).

      If I ever rewrite this chapter, I will include these in it.

      Thanks - Toni

  9. Hi Toni,
    Really appreciate your patience.
    Sorry, I missed the time when you wrote this blog; at the time, vPC+ was not launched yet.
    I have to sigh here. Technology advances at such a rapid speed. Moore's Law works in networking as well.

    BTW, what software are you using to draw these diagrams? They look awesome!!!

    Yours sincerely
    Michael

    Replies
    1. Hi Toni,
      another question just came to my mind (the further you extend your knowledge scope, the more questions you will have!), and please allow me to trouble you again.

      In this blog, you are using the same secondary IP on Leaf-102 and Leaf-103 to form a BGP peer with the spine, and when Cafe joins the network both Leaf-102 and Leaf-103 will forward a Type 2 including Cafe's MAC and IP to the spine.
      Let us assume we are using vPC+ here; Leaf-102 and Leaf-103 will form an active/active HSRP. So in this case, when Cafe comes into the network, will both Leaf-102 and Leaf-103 send a Type 2 to the spine, OR does only one of them send the update?
      As far as I can understand, since vPC+ will use the VIP and VMAC for HSRP and include them in the Type 2, one update should be enough. And if this is true, using vPC+ in this scenario should be better as it avoids a second Type 2 being sent.

      Regards

      Michael

    2. Hi Michael,
      Knowledge sharing is the reason for this blog, so please feel welcome to ask questions (for sure I cannot answer them all, but I do my best). I haven't been able to test vPC+ yet with a real physical device, and it is not supported in NX-OSv 9.3.1 that I am currently using. I have my assumptions about how it works, but as long as I do not have exact information I'll be quiet :)

    3. Hi Michael,
      The colored figures are made with PowerPoint, and the black-and-white figures I am using in newer posts are made with MS Visio. The icons are self-made.
      Cheers - Toni

    4. Thanks for posting this lab; I'm about to deploy VXLAN. Coming from a vPC+ environment, I'd like to vPC 2 of my 4 leafs so I can connect downstream L2 switches for port density. My question is...
      1. Will HSRP bring any advantage vs the anycast gateway?
      2. Is the vPC primary responding to any ARP request from the cafe/beef clients while the vPC secondary is just standing by?

  10. Great posts Toni, thanks for sharing your awesome knowledge.

    Replies
    1. Thanks, I have not had time to write new posts because I have been quite busy with my VXLAN book project. Now that it is finally complete, I'll try to find time to start writing again :)

  11. Hello Toni,
    That was very educational. I am trying to implement a similar topology. I use 2x 9396 in a vPC pair. I have an orphan host on LEAF_A and the external connectivity on LEAF_B. Between the vPC pair I have L3 underlay connectivity using a physical L3 port. I am trying to ping the external network from a host on LEAF_A, but I am getting a "ttl expired in transit" message. The default route is installed on LEAF_B and is propagated to LEAF_A. Can you provide any insights on it?

    Replies
    1. Hi, it looks like you have a routing loop in your environment. Have you checked this blog post https://nwktimes.blogspot.com/2018/09/vxlan-part-xi-using-vpc-peer-link-as.html that describes how the vPC peer-link is used as an Underlay Network backup path? Note that Cisco recommends that orphan hosts should be connected to the primary vPC peer.

    2. Hi, thank you for your answer. I actually read this article. In my case (using Cisco 9396 switches) I cannot find the command 'system nve infra-vlans'. Do you know if a similar command exists for these switches? Also, I don't use the vPC peer link for L3 connectivity; I am using physical interfaces. If you want, I can send you command outputs and configuration sections. Thank you!

  12. Hello Toni,

    I am Jignesh; I work at mobitv as a Sr. Network Engineer. I recently started reading your blog and I really like it (https://nwktimes.blogspot.com/). I also ordered the book. I don't have any other way to ask you questions. Do you have any info or answers for the questions below? If you already have the answers in your blog, then I have not read it all yet; and if not, I would love to know all of the following things, or if you could write articles about them, that would be great.

    The questions follow, about ARP suppression mode under VXLAN:
    A - Does the unicast ARP reply keep-alive mechanism to check host liveness break (Windows uses it)?
    B - How does DHCP relay work in VXLAN, and does it learn the MAC and IP mapping in L2FWD VXLAN?
    C - How will a duplicate IP for an end host be checked, in static and dynamic (DHCP) addressing?
    D - Do you think proxy ARP will be disabled automatically if you enable ARP suppression?
    d - Or is it never needed because of the anycast gateway and spine-leaf architecture?
    E - Do you think Layer-2 protocols like CDP and LLDP will work without any issue? Do they require any special Route Type?

    I have already dropped a message on your LinkedIn. I am sorry if I ask questions which look dumb.
    Thank you! - Jignesh

  13. Hi!
    Great post. I am just wondering whether Leaf-102 and Leaf-103 need to peer with BGP, as they are in a vPC configuration (BGP backup session on SVI): https://www.cisco.com/c/en/us/support/docs/switches/nexus-9000-series-switches/214624-configure-system-nve-infra-vlans-in-vxla.html
    Do you have an idea?

    Replies
    1. In vPC I don't think they need any BGP peering; they will act like a single switch to the other VTEPs.

    2. That's right, a BGP peering is not configured between vPC members. In a design where the peer link is used as an underlay backup link, the routing protocol used in the underlay should be enabled on it (IGP or BGP). If we have a Multicast-enabled Underlay network, PIM should also be enabled.

  14. Toni,

    You are doing great work; I bought your books also. Thanks for your work. I have a question. I built a Spine-Leaf network, but my requirement is to have only L2VNIs (because I have a Cisco ASA firewall as the gateway for all my VLANs in the network). In that case, can I enable "suppress-arp" for ARP suppression? Cisco says you can only enable ARP suppression with an L3VNI where you have an Anycast Gateway. Is that true?

    Replies
    1. I'm not sure why they are saying that, because even though there is no AGW configured for the L2VN and the fabric is used only as an L2 transit network, there is still a MAC-IP NLRI carried in EVPN RT 2 (MAC Advertisement). I checked this from my book, on page 234, where there is a BGP entry having both the MAC and IP address information about host abba in VLAN 30, which uses the fabric as L2 transit. However, I am not sure if that information is actually installed in the ARP suppression cache if there is no AGW for that L2VN. That might be the reason for not enabling ARP suppression for an L2 transit network. You could verify that by using the command "show ip arp suppression-cache detail" and checking if there is an ARP cache entry. Check also from the BGP table whether there are both MAC-only and MAC-IP entries for your hosts.

  15. Hi Toni, if the vPC border leafs also work as border gateways in a multisite setup with point-to-point connections between the border gateways in the two DCs, then if the DCI link on one of the border leafs goes down, how will an orphan port on the DCI-down border leaf communicate with the rest of the fabric?

    Thank you
