Wednesday, 29 May 2019

EVPN ESI Multihoming - Part I: EVPN Ethernet Segment (ES)



This chapter introduces the standards-based EVPN ESI multihoming solution in a BGP EVPN VXLAN fabric. It starts by explaining how a CE device (an access switch or a host) can be attached to two or more independent PE devices (leaf switches) by using a Port-Channel. This section discusses the concepts of Ethernet Segment and Port-Channel. Next, the chapter explains how the BGP EVPN Route-Type 4 (Ethernet Segment Route) is used for creating the redundancy group between the switches that share the ES, and introduces the Route-Type 4 NLRI address format. In addition, the chapter shows how switches belonging to the same redundancy group select the Designated Forwarder (DF) for BUM traffic among themselves. The chapter also introduces the VLAN Consistency Check that uses Cisco Fabric Service over IP (CFSoIP). The last two sections explain the Layer 2 Gateway Spanning-Tree (L2G-STP) mechanism and the Core-Link Tracking system.

Part II introduces the BGP EVPN Route-Type 1 (Ethernet Auto-Discovery) and how it is used for convergence. Part III discusses the data flows between the hosts in normal and failure situations. Parts II and III will be published later.



Figure 1-1: The VXLAN EVPN Multi-homing topology and addressing scheme.



Introduction

In figure 1-1 above, ASW-104 is connected to Leaf-102 and Leaf-103 via the logical port-channel 234, which is bundled from interfaces E1/1 and E1/2 by using Link Aggregation Control Protocol (LACP). Leaf-102 and Leaf-103 are both connected to ASW-104 via their interface E1/2, which is defined to be part of port-channel 234. However, Leaf-102 and Leaf-103 are standalone switches without a Multi-chassis Ether-Channel Trunk (MCT) between them. To be able to introduce themselves to ASW-104 as a single switch, Leaf-102 and Leaf-103 first have to know that they belong to the same redundancy group, and second, they have to present the same system-MAC address to ASW-104 so that it is able to bundle its uplinks into a port-channel. The leaf switches also have to decide which one is allowed to forward BUM traffic (per VLAN) to and from the ES. In addition, the Spanning-Tree root has to be in the leaf switches. To protect against packet loss caused by an uplink failure on either leaf switch (ASW-104 does not recognize these failure events), Core-Link Tracking should be enabled on the uplink ports of the leaf switches. To protect against VLAN misconfiguration, Cisco Fabric Service over IP (CFSoIP) should be used on the leaf switches that share the ES. Finally, in order for leaf and spine switches to do Equal Cost Multi-Pathing (ECMP) for VXLAN encapsulated frames, the maximum-paths setting for iBGP has to be adjusted.


Ethernet Segment Identifier (ESI) and Port-Channel

EVPN ESI multihoming is enabled using the evpn esi multihoming global configuration command. Interfaces E1/1 and E1/2 in ASW-104 are bundled into port-channel 234, while interface E1/2 in both Leaf-102 and Leaf-103 participates in port-channel 234 even though the leaf interfaces are not bundled together, because the leaf switches are stand-alone devices. From the ASW-104 perspective, port-channel 234 is just a normal port-channel. From the Leaf-102 and Leaf-103 perspective, the port-channel represents an EVPN Ethernet Segment (ES).

The ES is activated using the ethernet-segment <Id> command with the system-mac <mac> sub-command under interface port-channel 234. Even though it looks like the ethernet-segment command defines the Ethernet Segment Identifier (ESI), it only defines a part of it, called the ES Local Discriminator (ES LD). The actual 10-octet ESI consists of three parts. The first octet defines the type of the ESI, which in the case of Cisco NX-OS is the MAC-based ESI (0x03). The next six octets are taken from the system-MAC configuration. The last three octets carry the ES LD value defined under the interface port-channel. Thus, the Ethernet Segment Identifier in this example scenario is 0301.0201.0302.3400.04d2: type 03, system-MAC 0102.0103.0234, and ES LD 0004d2 (decimal 1234). In addition to being a part of the ESI value, the system-MAC is also used in the Actor System field of LACP messages to represent the local system-MAC address. Since both Leaf-102 and Leaf-103 use the same system-MAC, ASW-104 sees them as one switch and is able to bring up the port-channel interface.

Example 1-1 shows the configuration used in both Leaf-102 and Leaf-103. Figure 1-2 illustrates the physical topology and addressing scheme as well as the LACP message exchanges between the switches. Note that the source and destination MAC addresses used in the Ethernet header are the real system MAC addresses.
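The way the three parts combine into the ESI can be sketched in a few lines of Python. This is only an illustration of the octet layout; the function name build_mac_based_esi is hypothetical and not part of any Cisco tool. The result can be compared against the ESI printed in the show command outputs later in this chapter.

```python
def build_mac_based_esi(system_mac: str, local_discriminator: int) -> str:
    """Return a type-3 (MAC-based) ESI in dotted notation.

    Layout: 1 octet type (0x03) + 6 octets system MAC + 3 octets ES LD.
    """
    mac_hex = system_mac.replace(".", "").replace(":", "").lower()
    if len(mac_hex) != 12:
        raise ValueError("system MAC must be exactly 6 octets")
    int(mac_hex, 16)  # raises ValueError if the MAC is not valid hex
    if not 0 <= local_discriminator < 2 ** 24:
        raise ValueError("ES local discriminator must fit in 3 octets")

    raw = "03" + mac_hex + f"{local_discriminator:06x}"
    # Group the 10 octets (20 hex digits) into dotted 2-octet blocks.
    return ".".join(raw[i:i + 4] for i in range(0, len(raw), 4))


# Values from the configuration in this chapter:
# ethernet-segment 1234 / system-mac 0102.0103.0234
print(build_mac_based_esi("0102.0103.0234", 1234))
```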


evpn esi multihoming
!
interface port-channel234
  switchport mode trunk
  switchport trunk allowed vlan 10-11
  ethernet-segment 1234
    system-mac 0102.0103.0234
!
interface Ethernet1/2
  switchport mode trunk
  switchport trunk allowed vlan 10,11
  channel-group 234 mode active
Example 1-1: Enabling EVPN multi-homing on Leaf switches




Figure 1-2: EVPN Multihoming

Example 1-2 shows that both interfaces E1/1 and E1/2 on switch ASW-104 are participating in Port-Channel 234.


ASW-104# show port-channel summary | b Group
Group Port-       Type     Protocol  Member Ports
      Channel
-------------------------------------------------------------------------
234   Po234(SU)   Eth      LACP      Eth1/1(P)    Eth1/2(P)
Example 1-2: Port-channel 234 state on ASW-104.



Redundancy Group

Switches belonging to the same Ethernet Segment (ES) need to introduce themselves to each other as ES members. In addition, they have to decide which one will be the Designated Forwarder (DF) for the given ES, i.e. which one is responsible for forwarding BUM traffic (Broadcast, Unknown Unicast, and Multicast traffic) to and from the ES. The introduction is made by using a BGP EVPN Route-Type 4 (Ethernet Segment Route) BGP Update message.

Enabling EVPN ESI multihoming generates the BGP EVPN Route-Type 4 (Ethernet Segment Route) NLRI into the BGP Loc-RIB table, from which it is advertised to the BGP L2VPN EVPN peers. Example 1-3 illustrates the Route-Type 4 NLRI entry originated by Leaf-102. Note that Leaf-103 is down at this stage, so the BGP table contains only the NLRI originated by Leaf-102, which keeps the output simpler.


Leaf-102# sh bgp l2vp evpn route-type 4
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:27233   (ES [0301.0201.0302.3400.04d2 0])
BGP routing table entry for [4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.102]/136, version    3
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on xmit-list, is not in l2rib/evpn
Multipath: iBGP

  Advertised path-id 1
  Path type: local, path is valid, is best path, no labeled nexthop
  AS-Path: NONE, path locally originated
    192.168.100.102 (metric 0) from 0.0.0.0 (192.168.77.102)
      Origin IGP, MED not set, localpref 100, weight 32768
      Extcommunity: ENCAP:8 RT:0102.0103.0234

  Path-id 1 advertised to peers:
    192.168.77.11

Example 1-3: sh bgp l2vp evpn route-type 4.

The BGP EVPN Route-Type 4 Update message carries the ES-Import Route-Target Extended Community Path Attribute, whose value is the same as the system-MAC used with the ES. This way each ES member switch is able to import the NLRI carried in the BGP Update into its BGP Loc-RIB.

The Route Distinguisher is formed from the BGP router ID and from the base value 26999 plus the ES Port-Channel ID (26999 + 234 = 27233). The format of the NLRI carried in the BGP Update is illustrated in figure 1-3.


[4] = describes the BGP EVPN Route-Type 4 - Ethernet Segment Route.

[0301.0201.0302.3400.04d2] = the first octet, 03, defines that the ESI is a MAC-based ESI. The remaining octets are a combination of the system-MAC (0102.0103.0234) and the ES Local Discriminator (0004d2), which together form the actual ESI value. Note that hex 04d2 is 1234 in decimal.

[32] = the length, in bits, of the IP address of the advertising switch

[192.168.100.102] = IP address attached to NVE interface (Loopback 100)


Figure 1-3: Ethernet Segment Route NLRI address format.
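As a rough illustration of the fields described above, the following Python sketch splits the Route-Type 4 prefix string as it appears in the show bgp output and rebuilds the Route Distinguisher from the router ID and the Port-Channel ID. The function names are hypothetical, and the parser only handles the textual format shown in this chapter.

```python
import re

RD_BASE = 26999  # base value added to the ES Port-Channel ID, as stated in the text


def es_route_distinguisher(router_id: str, port_channel_id: int) -> str:
    """Build the RD used for the Ethernet Segment Route."""
    return f"{router_id}:{RD_BASE + port_channel_id}"


def parse_es_route(prefix: str) -> dict:
    """Split '[4]:[ESI]:[len]:[IP]/mask' into its fields."""
    pattern = r"\[(\d+)\]:\[([0-9a-f.]+)\]:\[(\d+)\]:\[([0-9.]+)\]/(\d+)"
    match = re.fullmatch(pattern, prefix)
    if match is None:
        raise ValueError("not an Ethernet Segment Route prefix")
    route_type, esi, ip_len, ip_addr, _mask = match.groups()
    raw = esi.replace(".", "")  # 20 hex digits = 10 octets
    return {
        "route_type": int(route_type),
        "esi_type": int(raw[:2], 16),              # 3 = MAC-based ESI
        "system_mac": raw[2:14],                   # 6 octets
        "local_discriminator": int(raw[14:], 16),  # 3 octets
        "ip_length": int(ip_len),
        "originator_ip": ip_addr,
    }


route = parse_es_route("[4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.102]/136")
print(route)
print(es_route_distinguisher("192.168.77.102", 234))
```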


Capture 1-1 shows the BGP EVPN Update sent by Leaf-102 to Spine-11.

Frame 208: 160 bytes on wire (1280 bits), 160 bytes captured (1280 bits) on interface 0
Ethernet II, Src: 1e:af:01:02:1e:11, Dst: c0:8e:00:11:1e:12
Internet Protocol Version 4, Src: 192.168.77.102, Dst: 192.168.77.11
Transmission Control Protocol, Src Port: 179, Dst Port: 29824, Seq: 153, Ack: 153, Len: 94
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 94
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 71
    Path attributes
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: empty
        Path Attribute - LOCAL_PREF: 100
        Path Attribute - EXTENDED_COMMUNITIES
            Flags: 0xc0, Optional, Transitive, Complete
            Type Code: EXTENDED_COMMUNITIES (16)
            Length: 16
            Carried extended communities: (2 communities)
                Encapsulation: VXLAN Encapsulation [Transitive Opaque]
                    Type: Transitive Opaque (0x03)
                    Subtype (Opaque): Encapsulation (0x0c)
                    Tunnel type: VXLAN Encapsulation (8)
                ES Import: RT: 01:02:01:03:02:34 [Transitive EVPN]
                    Type: Transitive EVPN (0x06)
                    Subtype (EVPN): ES Import (0x02)
                    ES-Import Route Target: 01:02:01:03:02:34
        Path Attribute - MP_REACH_NLRI
            Flags: 0x90, Optional, Extended-Length, Non-transitive, Complete
            Type Code: MP_REACH_NLRI (14)
            Length: 34
            Address family identifier (AFI): Layer-2 VPN (25)
            Subsequent address family identifier (SAFI): EVPN (70)
            Next hop network address (4 bytes)
            Number of Subnetwork points of attachment (SNPA): 0
            Network layer reachability information (25 bytes)
                EVPN NLRI: Ethernet Segment Route
                    Route Type: Ethernet Segment Route (4)
                    Length: 23
                    Route Distinguisher: 192.168.77.102:27233
                    ESI: 01:02:01:03:02:34, Discriminator: 00 04
                        ESI Type: ESI MAC address defined (3)
                        ESI system MAC: 01:02:01:03:02:34
                        ESI system mac discriminator: 00 04
                        Remaining bytes: d2
                    IP Address Length: 32
                    IPv4 address: 192.168.100.102
Capture 1-1: BGP Update concerning ESI sent by Leaf-102

Now Leaf-103 is up and running. Example 1-4 shows the BGP table of Leaf-103 concerning the BGP EVPN Ethernet Segment Route. Leaf-103 has received the BGP Update from Leaf-102 and installed the route into its BGP Loc-RIB.


Leaf-103# sh bgp l2vpn evpn route-type 4
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:27233
BGP routing table entry for [4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.102]/136, version 14
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Multipath: iBGP

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported to 1 destination(s)
  AS-Path: NONE, path sourced internal to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Extcommunity: ENCAP:8 RT:0102.0103.0234
      Originator: 192.168.77.102 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

Route Distinguisher: 192.168.77.103:27233   (ES [0301.0201.0302.3400.04d2 0])
BGP routing table entry for [4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.102]/136, version 15
Paths: (1 available, best #1)
Flags: (0x000012) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW
Multipath: iBGP

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported from 192.168.77.102:27233:[4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.102]/136
  AS-Path: NONE, path sourced internal to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Extcommunity: ENCAP:8 RT:0102.0103.0234
      Originator: 192.168.77.102 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer
BGP routing table entry for [4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.103]/136, version 9
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on xmit-list, is not in l2rib/evpn
Multipath: iBGP

  Advertised path-id 1
  Path type: local, path is valid, is best path, no labeled nexthop
  AS-Path: NONE, path locally originated
    192.168.100.103 (metric 0) from 0.0.0.0 (192.168.77.103)
      Origin IGP, MED not set, localpref 100, weight 32768
      Extcommunity: ENCAP:8 RT:0102.0103.0234

  Path-id 1 advertised to peers:
    192.168.77.11
Example 1-4: BGP table of Leaf-103 concerning Ethernet Segment Route.

Now both Leaf-102 and Leaf-103 know that they share the same Ethernet Segment.


Designated Forwarder (DF)

ASW-104 sees switches Leaf-102 and Leaf-103 as one switch that is connected through Port-Channel 234. The leaf switches choose a Designated Forwarder (DF) among themselves to forward BUM (Broadcast, Unknown Unicast and Multicast) traffic to and from the ES. If the ES has more than one VLAN, the DF roles are load-balanced between the leaf nodes, e.g. the DF for VLAN 10 is Leaf-102 and the DF for VLAN 11 is Leaf-103. The selection process uses the formula “i = V mod N”, where V represents the VLAN Id and N represents the number of leaf switches in the redundancy group. The “i” is the ordinal of a leaf switch in the redundancy group. When Leaf-102 and Leaf-103 exchange BGP L2VPN EVPN Route-Type 4 (Ethernet Segment Route) updates, their IP addresses are included in the NLRI. Each switch sorts the IP addresses learned from the BGP Updates, together with its own, in numerical order from lowest to highest. In the case of Leaf-102 and Leaf-103, the order is 192.168.100.102, 192.168.100.103. The lowest IP, i.e. 192.168.100.102, gets ordinal zero (0), the next one gets ordinal one (1), and so on.
The formula to calculate the DF for VLAN 10 is:

V mod N = i
V = 10 (VLAN Id)
N = 2 (number of leaf switches)
10 mod 2 = 0 > Leaf-102
(The remainder is zero (0) when 10 is divided by 2)
Ordinal zero is used by Leaf-102, so it will be the DF for VLAN 10.

The formula to calculate the DF for VLAN 11 is:

V mod N = i
V = 11 (VLAN Id)
N = 2 (number of leaf switches)
11 mod 2 = 1 > Leaf-103
(The remainder is one (1) when 11 is divided by 2)
Ordinal one is used by Leaf-103, so it will be the DF for VLAN 11.

If VLAN 12 is also attached to the ES, its DF will be Leaf-102:

V mod N = i
V = 12 (VLAN Id)
N = 2 (number of leaf switches)
12 mod 2 = 0 > Leaf-102
(The remainder is zero (0) when 12 is divided by 2)
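The election described above is easy to reproduce. The sketch below is a minimal Python illustration of the "V mod N" rule, assuming the candidate list is simply the set of NVE IP addresses learned from the Ethernet Segment Routes; the function name elect_df is hypothetical.

```python
import ipaddress


def elect_df(vlan_id: int, member_ips: list) -> str:
    """Return the IP address of the Designated Forwarder for a VLAN.

    Members are sorted numerically (lowest IP gets ordinal 0), and the
    member whose ordinal equals 'VLAN Id mod number of members' wins.
    """
    ordered = sorted(member_ips, key=ipaddress.IPv4Address)
    ordinal = vlan_id % len(ordered)
    return ordered[ordinal]


# DF candidate list of the example redundancy group (order does not matter).
members = ["192.168.100.103", "192.168.100.102"]

for vlan in (10, 11, 12):
    print(f"VLAN {vlan}: DF is {elect_df(vlan, members)}")
```

With the two leaf switches of this chapter, even VLAN Ids land on Leaf-102 (192.168.100.102) and odd VLAN Ids on Leaf-103 (192.168.100.103). Adding a third member to the list changes the distribution, because N becomes 3.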

Example 1-5 shows that the DF candidate list of Leaf-102 includes the IP addresses 192.168.100.102 (Leaf-102) and 192.168.100.103 (Leaf-103). It also shows that there are two active VLANs (10 and 11) in this redundancy group and that Leaf-102 is the DF for VLAN 10.

Leaf-102# sh nve ethernet-segment

ESI: 0301.0201.0302.3400.04d2
   Parent interface: port-channel234
  ES State: Up
  Port-channel state: Up
  NVE Interface: nve1
   NVE State: Up
   Host Learning Mode: control-plane
  Active Vlans: 10-11
   DF Vlans: 10
   Active VNIs: 10000-10001
  CC failed for VLANs:
  VLAN CC timer: 0
  Number of ES members: 2
  My ordinal: 0
  DF timer start time: 00:00:00
  Config State: config-applied
  DF List: 192.168.100.102 192.168.100.103
  ES route added to L2RIB: True
  EAD/ES routes added to L2RIB: True
  EAD/EVI route timer age: not running
Example 1-5: show nve Ethernet-segment on Leaf-102.

Example 1-6 shows that Leaf-103 is DF for VLAN 11.

Leaf-103# sh nve ethernet-segment

ESI: 0301.0201.0302.3400.04d2
   Parent interface: port-channel234
  ES State: Up
  Port-channel state: Up
  NVE Interface: nve1
   NVE State: Up
   Host Learning Mode: control-plane
  Active Vlans: 10-11
   DF Vlans: 11
   Active VNIs: 10000-10001
  CC failed for VLANs:
  VLAN CC timer: 0
  Number of ES members: 2
  My ordinal: 1
  DF timer start time: 00:00:00
  Config State: config-applied
  DF List: 192.168.100.102 192.168.100.103
  ES route added to L2RIB: True
  EAD/ES routes added to L2RIB: True
  EAD/EVI route timer age: not running
Example 1-6: show nve Ethernet-segment on Leaf-103.

Figure 1-4 shows the Designated Forwarder per VLAN in Ethernet Segment 1234.


Figure 1-4: DF per VLAN.

VLAN Consistency Check


Though Leaf-102 and Leaf-103 share the same Ethernet Segment, they are still two individual switches without a shared Control Plane (or Data Plane). This means that a switch does not know which VLANs are allowed on the shared Port-Channel in the other ES member switches. One solution is to use the Cisco Fabric Service over IP (CFSoIP) service. It is used for discovering CFS-capable devices and then for VLAN Consistency Checking (CC) on the Port-Channel. Figure 1-5 illustrates the addressing used in this section. Note that CFSoIP works only over the management interface.





Figure 1-5: CFSoIP addressing Scheme.

Note! CFSoIP in EVPN multihoming implementation verifies the Port-Channel VLAN list, not the content of VLAN database.

Capture 1-2 shows the operation of CFSoIP. Leaf-102 starts by introducing itself to other CFS-capable devices on the shared ES by sending an IP/UDP packet to the multicast group 239.102.103.10 using the reserved UDP source and destination port 7546. This group is manually defined for ES 1234. Leaf-103, which has already joined the same multicast group, receives the message and opens a TCP connection with Leaf-102. The connection is used for exchanging information about the VLANs allowed on Port-Channel 234.


Frame 99: 166 bytes on wire (1328 bits), 166 bytes captured (1328 bits) on interface 0
Ethernet II, Src: 50:00:00:03:00:00 (50:00:00:03:00:00), Dst: IPv4mcast_66:67:0a (01:00:5e:66:67:0a)
Internet Protocol Version 4, Src: 10.0.0.102, Dst: 239.102.103.10
User Datagram Protocol, Src Port: 7546, Dst Port: 7546

Frame 100: 74 bytes on wire (592 bits), 74 bytes captured (592 bits) on interface 0
Ethernet II, Src: 50:00:00:04:00:00 (50:00:00:04:00:00), Dst: 50:00:00:03:00:00 (50:00:00:03:00:00)
Internet Protocol Version 4, Src: 10.0.0.103, Dst: 10.0.0.102
Transmission Control Protocol, Src Port: 7546, Dst Port: 62414, Seq: 3182, Ack: 4062, Len: 8
Capture 1-2: CFSoIP session opening between Leaf-102 and Leaf-103.
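The destination MAC address in Frame 99 follows from the standard IPv4 multicast mapping, where the low 23 bits of the group address are copied behind the fixed prefix 01:00:5e. A small Python sketch of that mapping is shown below; the function name multicast_mac is only illustrative.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its Ethernet MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF       # keep the low 23 bits of the group
    mac = (0x01005E << 24) | low23     # prepend the fixed 01:00:5e prefix
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


# CFSoIP group configured in this chapter.
print(multicast_mac("239.102.103.10"))
```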

The CFSoIP configuration is simple. CFSoIP is enabled with the cfs ipv4 distribute command. The default multicast group address is overridden with cfs ipv4 mcast-address <Mcast Group address>. The EVPN ESI multihoming specific VLAN Consistency Check is enabled with the vlan-consistency-check command under the evpn esi multihoming command. The CFSoIP related configuration of Leaf-102 is shown in example 1-7.


cfs ipv4 mcast-address 239.102.103.10
cfs ipv4 distribute
!
evpn esi multihoming
  vlan-consistency-check
!
interface mgmt0
  vrf member management
  ip address 10.0.0.102/24
Example 1-7: CFSoIP configuration in Leaf-102.

Example 1-8 shows that Leaf-102 and Leaf-103 are now CFS peers. Note that the name resolution of Leaf-103 is based on the global command ip host Leaf-103 10.0.0.103.


Leaf-102# show cfs peers name nve

Scope      : Physical-ip
-------------------------------------------------------------------------
 Switch WWN              IP Address
-------------------------------------------------------------------------
 20:00:50:00:00:03:00:07 10.0.0.102                          [Local]
                         Leaf-102
 20:00:50:00:00:04:00:07 10.0.0.103                          [Not Merged]
                         Leaf-103

Total number of entries = 2
Example 1-8: CFSoIP peer verification on Leaf-102.

At this moment, the lists of VLANs allowed on Port-Channel 234 are the same in both leaf switches, which can be seen from example 1-9 (there are no failed VLANs).


Leaf-102# sh nve ethernet-segment

ESI: 0301.0201.0302.3400.04d2
   Parent interface: port-channel234
  ES State: Up
  Port-channel state: Up
  NVE Interface: nve1
   NVE State: Up
   Host Learning Mode: control-plane
  Active Vlans: 10-11
   DF Vlans: 10
   Active VNIs: 10000-10001
  CC failed for VLANs:
  VLAN CC timer: 0
  Number of ES members: 2
  My ordinal: 0
  DF timer start time: 00:00:00
  Config State: config-applied
  DF List: 192.168.100.102 192.168.100.103
  ES route added to L2RIB: True
  EAD/ES routes added to L2RIB: True
  EAD/EVI route timer age: not running
----------------------------------------
Example 1-9: VLAN consistency verification on Leaf-102.

If a VLAN (VLAN 111 in this example) is added only to Port-Channel 234 on Leaf-103, the VLAN consistency check fails, as can be seen from examples 1-10 and 1-11.


Leaf-102# sh nve ethernet-segment | i CC
  CC failed for VLANs: 10-11
  VLAN CC timer: 0
Example 1-10: VLAN consistency verification on Leaf-102.


Leaf-103# sh nve ethernet-segment | i CC
  CC failed for VLANs: 111
  VLAN CC timer: 0
Example 1-11: VLAN consistency verification on Leaf-103.


Layer 2 Gateway Spanning-Tree Protocol (L2G-STP)

Switches participating in a Spanning-Tree domain select the STP root switch based on the lowest STP Bridge ID (priority value + (VLAN Id or MST0) + system-MAC). If all access switches and leaf switches use the same STP priority value, the switch with the smallest system-MAC address will be selected as the STP root switch. This means that the STP root might be located in an access switch. This, in turn, might lead to a situation where STP blocks some of the high-speed uplinks (port-channels) from the access switch to the VXLAN fabric leaf switches. This is why the STP root should be placed in the VXLAN fabric and, to be more specific, the VXLAN fabric should be seen as one switch acting as the STP root. This way the STP blocked port can be pushed to the inter-switch links in the access layer.

For the EVPN ESI multihoming enabled VXLAN fabric to be seen as one STP root from the point of view of the LAN access switches, all leaf switches have to use the same Bridge ID in BPDUs. When the Layer 2 Gateway Spanning-Tree Protocol (L2G-STP) mechanism is enabled on a switch, the switch automatically starts using the MAC address c84c.75fa.6000 as a part of its Bridge ID. This is crucial since leaf switches that share the same ES do not have a shared Control Plane, and without L2G-STP they would use different system MAC addresses in their Bridge IDs. When L2G-STP is implemented in all leaf switches, the whole VXLAN fabric introduces itself to the access switches as a pseudo STP root switch. However, the L2G-STP mechanism does not affect the STP priority value, which is why you need to manually change it to a lower value than what is used in the access switches. The L2G-STP mechanism also terminates the STP domain.
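The root selection rule can be illustrated with a short Python sketch that compares Bridge IDs as (priority, MAC) pairs. The MAC addresses and the priority 8192 are taken from the examples in this chapter, while the equal priority of 32768 used in the first call is an assumed value chosen only to show the tie-break on the MAC address. The class and function names are hypothetical.

```python
from typing import List, NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int   # configured priority, including any system ID extension
    mac: str        # bridge MAC in dotted hex notation


def bridge_id(bridge: Bridge) -> tuple:
    """Priority is compared first, then the MAC address as a number."""
    return (bridge.priority, int(bridge.mac.replace(".", ""), 16))


def elect_root(bridges: List[Bridge]) -> Bridge:
    """The bridge with the numerically lowest Bridge ID becomes the root."""
    return min(bridges, key=bridge_id)


# Same priority everywhere (assumed value): the lowest MAC wins,
# so the access switch ends up as the root.
same_priority = [
    Bridge("ASW-104", 32768, "5000.0006.0007"),
    Bridge("VXLAN-Fabric", 32768, "c84c.75fa.6000"),
]
print(elect_root(same_priority).name)

# Lower priority configured on the fabric side: the fabric becomes the root.
tuned = [
    Bridge("ASW-104", 61440, "5000.0006.0007"),
    Bridge("VXLAN-Fabric", 8192, "c84c.75fa.6000"),
]
print(elect_root(tuned).name)
```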


The L2G-STP mechanism is disabled by default and can be enabled with the global configuration command spanning-tree domain enable. The configuration shown in example 1-12 is already in place in Leaf-102 and Leaf-103. The Spanning-Tree mode is set to MST, and both VLANs 10 and 11 are mapped to instance 1. MST instances 0 and 1 both have priority 8192, which is lower than the priority in ASW-104, which is set to the maximum value 61440. Figure 1-6 shows an example topology.




Figure 1-6: L2G-STP addressing Scheme.

spanning-tree mode mst
spanning-tree mst 0-1 priority 8192
spanning-tree mst configuration
  name VXLAN-Fabric
  instance 1 vlan 10-11
 Example 1-12: STP pre-configuration on Leaf-102 and Leaf-103.

After L2G-STP is enabled on the leaf switches, the system assigns the MAC address c84c.75fa.6000 to them. As a result, ASW-104 sees the VXLAN fabric as one pseudo-switch that is the root for all VLANs.
Example 1-13, taken from ASW-104, shows that the STP root MAC address is c84c.75fa.6000, the STP port role of Port-Channel 234 is Root, and the state is Forwarding.

ASW-104# sh spanning-tree vlan 10

MST0001
  Spanning tree enabled protocol mstp
  Root ID    Priority    8193
             Address     c84c.75fa.6000
             Cost        10000
             Port        4329 (port-channel234)
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    61441  (priority 61440 sys-id-ext 1)
             Address     5000.0006.0007
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec

Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- ----------------------------
Po234            Root FWD 10000     128.4329 P2p
Eth1/3           Desg FWD 20000     128.3    P2p
Example 1-13: show spanning-tree vlan 10 on ASW-104.

Examples 1-14 and 1-15 verify that both leaf switches introduce themselves as an STP root to ASW-104.

Leaf-102# sh spanning-tree summary
Switch is in mst mode (IEEE Standard)
Root bridge for: MST0000-MST0001
L2 Gateway STP bridge for: MST0001
Port Type Default                        is disable
Edge Port [PortFast] BPDU Guard Default  is disabled
Edge Port [PortFast] BPDU Filter Default is disabled
Bridge Assurance                         is enabled
Loopguard Default                        is disabled
Pathcost method used                     is long
PVST Simulation                          is enabled
STP-Lite                                 is disabled

Name                   Blocking Listening Learning Forwarding STP Active
---------------------- -------- --------- -------- ---------- ----------
MST0000                      0         0        0          6          6
MST0001                      0         0        0          2          2
---------------------- -------- --------- -------- ---------- ----------
2 msts                       0         0        0          8          8
Example 1-14: show spanning-tree summary on Leaf-102.

Leaf-103# sh spanning-tree summary
Switch is in mst mode (IEEE Standard)
Root bridge for: MST0000-MST0001
L2 Gateway STP bridge for: MST0001
Port Type Default                        is disable
Edge Port [PortFast] BPDU Guard Default  is disabled
Edge Port [PortFast] BPDU Filter Default is disabled
Bridge Assurance                         is enabled
Loopguard Default                        is disabled
Pathcost method used                     is long
PVST Simulation                          is enabled
STP-Lite                                 is disabled

Name                   Blocking Listening Learning Forwarding STP Active
---------------------- -------- --------- -------- ---------- ----------
MST0000                      0         0        0          5          5
MST0001                      0         0        0          1          1
---------------------- -------- --------- -------- ---------- ----------
2 msts                       0         0        0          6          6
Example 1-15: show spanning-tree summary on Leaf-103.

Now, if the STP priority is changed to zero (lower than 8192) in ASW-104, both leaf switches will place the interface Port-Channel 234 in the *L2GW_Inc/Blocking state (example 1-16). This way the VXLAN fabric is protected from failure events caused by possible forwarding loops.


Leaf-102# sh spanning-tree interface po 234

Mst Instance     Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- ----------------------------
MST0000          Root FWD 20000     128.4329 P2p

MST0001          Desg BKN*20000     128.4329 P2p *L2GW_Inc
Example 1-16: show spanning-tree interface po 234 on Leaf-102.


Core Link Tracking

In a normal situation, the default LACP hashing algorithm on ASW-104 might choose the interface g0/1 for data flow from Abba to Cafe.  In a situation where Leaf-102 loses its connectivity to Spine-11, ASW-104 continues sending data to Leaf-102 because it is not aware of the indirect link failure. The Core-Link Tracking mechanism protects against sudden packet loss caused by this kind of failure event where upstream leaf switch loses its connection to the core (all leaf-to-spine Inter-Switch links are down).

The Core-Link Tracking mechanism shuts down the links attached to the Ethernet Segment if all links to the spine are down. Core-Link Tracking is enabled under the uplink interface configuration by using the command evpn multihoming core-tracking (example 1-17).


interface Ethernet1/1
  evpn multihoming core-tracking
Example 1-17: Core-Link tracking on Leaf-102.

Figure 1-7 illustrates the situation where interface E1/1 goes down on Leaf-102. As a reaction to this event, the Core-Link Tracking mechanism sets all local interfaces attached to any EVPN Ethernet Segment to the down state. ASW-104 notices this direct link failure and changes the state of its interface E1/1 to down. Now the only operational interface in Port-Channel 234 is interface E1/2, and all traffic is sent over it.

Figure 1-7: Core-Link Tracking.
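The decision logic behind Core-Link Tracking can be summarized with the following Python sketch. It is a simplified model of the behaviour described in this section, not the NX-OS implementation; the function name and the dictionary-based interface state are assumptions made for the example.

```python
from typing import Dict, List


def apply_core_tracking(core_links: Dict[str, bool], es_interfaces: List[str]) -> Dict[str, str]:
    """Return the resulting state of every ES-facing interface.

    core_links maps each tracked uplink to True (up) or False (down).
    ES interfaces stay up as long as at least one tracked uplink is up.
    """
    core_reachable = any(core_links.values())
    state = "up" if core_reachable else "down (NVE core link down)"
    return {interface: state for interface in es_interfaces}


# Leaf-102 in this chapter: a single tracked uplink E1/1, ES members E1/2 and Po234.
print(apply_core_tracking({"Ethernet1/1": True}, ["Ethernet1/2", "port-channel234"]))
print(apply_core_tracking({"Ethernet1/1": False}, ["Ethernet1/2", "port-channel234"]))
```

With more than one tracked uplink, the ES-facing interfaces are taken down only when the last uplink fails, which matches the "all links to the spine are down" condition.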


Leaf-102(config-if)# sh int e1/2
Ethernet1/2 is down (NVE core link down)
admin state is up, Dedicated Interface
  Belongs to Po234
Example 1-18: Core-Link down on Leaf-102.


Leaf-102# sh int po 234 | i 234
port-channel234 is down (NVE core link down)
Leaf-102# sh port-channel summ | b Group
Group Port-       Type     Protocol  Member Ports
      Channel
-----------------------------------------------------------------
234   Po234(SD)   Eth      LACP      Eth1/2(D)
Example 1-19: Port-Channel 234 down on Leaf-102.


ASW-104# sh port-channel summary | b Group
Group Port-       Type     Protocol  Member Ports
      Channel
-----------------------------------------------------------------
234   Po234(SU)   Eth      LACP      Eth1/1(D)    Eth1/2(P)
Example 1-20: Interface-E1/1 down on ASW-104.

  
Author: Toni Pasanen CCIE#28158
Published: 29.5.2019
Updated: 

21 comments:

  1. Hi Toni,
    I went through your excellent VXLAN series and I have to say it is well organised and informative.
    this resolve many of my confusion about Vxlan and EVPN.

    for this multihoming part, would you please inform:
    1. I believe it can be achieved via vPC, so can I say this is an alternative for vPC solution.
    2. could multihoming gateways be located in different physical addresses,say two different DC.

    All the Best
    Michael

    1. Hi Michael,

      Nice to hear that you have found answers to your questions here.
      Q1: Yes, both are used for multihoming purposes. EVPN ESI Multihoming is a standard solution and vPC is Cisco proprietary.
      Q2: Yes.

  2. HI Toni,
    I found another point I could not understand, in the stp part, you mention when the stp priority changed to lower than 8912 then portchannel on ES will be blocked. do you mean if other switches has a lower priority or one of the ES peer has a lower priority?
    As long as I can understand ES peer should just be the root, why not just set their priority to the lowest?

    Cheers
    Michael

    1. Hi Michael
      I use STP priority 8192 for demonstrating the failure event where the root is not in the VXLAN fabric. STP priority 0 for root is good practice. Thanks for pointing out the confusing STP text. I changed it to this: ” Now if the STP priority is changed to zero (lower than 8192 ) in ASW-104”.
      Cheers -Toni

    2. I should thank you for your explanation!
      you are the selfless kind of people who will share knowledge with others

      and another thing I noticed is that you would like to admit your mistake and correct in time in your blog,I can see from your previous conversation with others.

      All the best

      Michael

    3. I really appreciate your kind words.
      I think that mistakes help us memorize complex things, but the same mistake is allowed only once :-)

  3. Clear explanations, as always!

    While reading about vPC and ESI in the config guide, I found this:
    • EVPN Multihoming is supported on the Cisco Nexus 9300 platform switches only and it is not supported
    on the Cisco Nexus 9200, 9300-EX/-FX/-FXP/-FX2 and 9500 platform switches. The Cisco Nexus 9500
    platform switches can be used as Spine switches, but they cannot be used as VTEPs.
    But 9300-EX/-FX/-FXP/-FX2 are the most widely used platforms. I checked on live boxes, and there is no 'evpn esi multihoming' command. So is vPC the only way to get LACP from two different leafs on these platforms?

    Replies
    1. Either with a traditional vPC or with Fabric Peering
      https://blogs.cisco.com/datacenter/change-is-the-only-constant-vpc-with-fabric-peering-for-vxlan-evpn

    2. I agree, this has been announced, but it seems to work only from version 9.2.3, while at the moment Cisco recommends 7.x for production use: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/recommended_release/b_Minimum_and_Recommended_Cisco_NX-OS_Releases_for_Cisco_Nexus_9000_Series_Switches.html

    3. This comment has been removed by the author.

    4. Hi guys, ESI is supported on FX2 N9K switches only. I tried a similar ESI setup on VIRL. I guess, Toni, you did the same here. The problem I encountered, which probably has to do with VIRL, is that I cannot ping my default gateway (SVI) on both VTEP switches from my hosts. I tried a switch with an SVI and an Ubuntu server, but no luck. The problem seems to be in the ARP reply from the VTEP side, which uses the VNI ID instead of the VLAN ID when sending the message. Am I doing something wrong? Please let me know if you can ping your DG. I am not talking about directly connected users, but about either a device in vPC or a host behind a vPC.

      Thanks,

      George

    5. vPC or vPC+ or NX-OSv 9.3.1 :-)

  4. This comment has been removed by a blog administrator.

  5. This is the best article explaining EVPN multihoming!!!

    One tiny typo, I think:
    DF for VLAN 10 is Leaf-102 and DF for VLAN 11 is Leaf-104.

    should be
    DF for VLAN 10 is Leaf-102 and DF for VLAN 11 is Leaf-103.

    Replies
    1. Thanks for letting me know, I just fixed it. I also need to fix it in my book :)

  6. Hi, Toni.

    Congratulations on doing such a good job. You're very kind sharing with us all your experiences with this technology. It really helps a lot!

    Just one question. Do you think using ESI for multihoming is a mature option for a production environment, or is it still better to be cautious and stay with vPC?

    Thanks again!

    Regards

    Replies
    1. Hi Fran,
      I'd rather use vPC dual-homing with Cisco devices for the sake of simplicity (it keeps the BGP table cleaner :). This, however, is not a recommendation, only my personal opinion.

  7. Wow, marvelous blog layout! How long have you been blogging for? You make blogging look easy. The overall look of your web site is great, let alone the content!

  8. I published my first blog post in March 2017, so it was three and a half years ago. Thank you for your positive feedback. I am currently working on a new book, but I will continue with this blog.

  9. Just Amazing. Thank you Toni

  10. Thanks for the amazing blog post!
    I have two questions. On the Cisco site (here: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/93x/vxlan/configuration/guide/b-cisco-nexus-9000-series-nx-os-vxlan-configuration-guide-93x/b-cisco-nexus-9000-series-nx-os-vxlan-configuration-guide-93x_appendix_011001.html#task_scx_2pq_zfb) it seems that the spanning-tree domain must be disabled at the end of the configuration of the domain itself. Isn't that a mistake in the documentation?
    Also, can you confirm that it is possible to use RSTP instead of MST?
    Again, thanks for the amazing content here.
