Now you can also download my VXLAN book from Leanpub.com:
"Virtual Extensible LAN VXLAN - A Practical guide to VXLAN Solution Part 1" (373 pages).
This chapter introduces the standards-based EVPN ESI multihoming solution in a BGP EVPN VXLAN fabric. It starts by explaining how a CE device (an access switch or a host) can be attached to two or more independent PE devices (leaf switches) by using a Port-Channel. This section discusses the concepts of Ethernet Segment and Port-Channel. Next, the chapter explains how the BGP EVPN Route-Type 4 (Ethernet Segment Route) is used for creating the redundancy group between the switches that share the ES, and introduces the Route-Type 4 NLRI address format. In addition, the chapter shows how switches belonging to the same redundancy group select the Designated Forwarder (DF) for BUM traffic among themselves. It also introduces the VLAN Consistency Check that uses Cisco Fabric Service over IP (CFSoIP). The last two sections explain the Layer 2 Gateway Spanning-Tree (L2G-STP) mechanism and the Core-Link Tracking system.
Part II introduces the BGP EVPN Route-Type 1 (Ethernet Auto-Discovery) and how it is used for convergence. Part III discusses the data flows between the hosts in normal and failure situations. Parts II and III will be published later.
Figure 1-1: The VXLAN EVPN Multi-homing topology and addressing scheme.
Introduction
In figure 1-1 above, ASW-104 is connected to Leaf-102 and Leaf-103 via the logical port-channel 234, which bundles interfaces E1/1 and E1/2 by using the Link Aggregation Control Protocol (LACP). Leaf-102 and Leaf-103 are each connected to ASW-104 via their interface E1/2, which is defined as a member of port-channel 234. However, Leaf-102 and Leaf-103 are standalone switches without a Multi-chassis Ether-Channel Trunk (MCT) between them. To be able to introduce themselves to ASW-104 as a single switch, Leaf-102 and Leaf-103 first have to know that they belong to the same redundancy group and second, have to introduce the same system MAC address to ASW-104 so that it is able to bundle its uplinks into a port-channel. The leaf switches also have to decide which one is allowed to forward BUM traffic (per VLAN) to and from the ES. In addition, the Spanning-Tree root has to be in the leaf switches. To protect against packet loss caused by an uplink failure on either leaf switch (ASW-104 does not recognize these failure events), Core-Link Tracking should be enabled on the uplink ports of the leaf switches. To protect against VLAN misconfiguration on the leaf switches, Cisco Fabric Service over IP (CFSoIP) should be used on the leaf switches that share the ES. In order for leaf and spine switches to do Equal Cost Multi-Pathing (ECMP) for VXLAN-encapsulated frames, the maximum-paths setting for iBGP has to be adjusted.
Ethernet Segment Identifier (ESI) and Port-Channel
EVPN ESI multihoming is enabled with the global configuration command evpn esi multihoming. Interfaces E1/1 and E1/2 in ASW-104 are bundled into port-channel 234, while interface E1/2 in both Leaf-102 and Leaf-103 participates in port-channel 234 even though the two interfaces are not bundled together, because the leaf switches are standalone devices. From the ASW-104 perspective, port-channel 234 is just a normal port-channel. From the Leaf-102 and Leaf-103 perspective, the port-channel represents an EVPN Ethernet Segment (ES).
The ES is activated with the ethernet-segment <Id> command and its system-mac <mac> sub-command under the interface port-channel 234. Even though it looks like the ethernet-segment command defines the Ethernet Segment Identifier (ESI), it only defines a part of it called the ES Local Discriminator (ES LD). The actual ESI is ten octets long and consists of three parts. The first octet defines the type of the ESI, which in the case of Cisco NX-OS is the MAC-based ESI (0x03). The next six octets are taken from the system-MAC configuration. The last three octets carry the ES LD value defined under the interface port-channel. Thus, the Ethernet Segment Identifier in this example scenario is 03 + 0102.0103.0234 + 0004d2 (ES LD 1234 in hex), which NX-OS displays as 0301.0201.0302.3400.04d2. In addition to being a part of the ESI value, the system MAC is also used in the Actor System field of LACP messages to represent the local system MAC address. Since both Leaf-102 and Leaf-103 use the same system MAC, ASW-104 sees them as one switch and is able to bring up the port-channel interface.
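The way the ten ESI octets are put together can be sketched in a few lines of Python. This is only an illustration of the layout described above (type octet, six-octet system MAC, three-octet local discriminator); the function name build_mac_based_esi is hypothetical and is not an NX-OS or library API.

```python
def build_mac_based_esi(system_mac: str, local_discriminator: int) -> str:
    """Return a 10-octet MAC-based (type 0x03) ESI in dotted notation.

    Hypothetical helper for illustration only.
    """
    # Strip the separators so that only the 12 hex digits of the MAC remain.
    mac_hex = system_mac.replace(".", "").replace(":", "").lower()
    if len(mac_hex) != 12:
        raise ValueError("system MAC must be 6 octets")
    int(mac_hex, 16)  # raises ValueError if the MAC is not valid hex
    if not 0 <= local_discriminator < 2 ** 24:
        raise ValueError("local discriminator must fit in 3 octets")
    # 1 octet ESI type + 6 octets system MAC + 3 octets local discriminator.
    esi_hex = "03" + mac_hex + format(local_discriminator, "06x")
    # Group the 20 hex digits in blocks of four, as the CLI output does.
    return ".".join(esi_hex[i:i + 4] for i in range(0, 20, 4))


# Values from the configuration in this chapter.
print(build_mac_based_esi("0102.0103.0234", 1234))
# 0301.0201.0302.3400.04d2
```

The result matches the ESI that the leaf switches show in their BGP and NVE outputs. Note that the decimal ES LD 1234 appears as hex 04d2 at the end of the ESI.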
Example 1-1 shows the configuration used in both Leaf-102 and Leaf-103. Figure 1-2 illustrates the physical topology and addressing scheme as well as the LACP message exchanges between the switches. Note that the source and destination MAC addresses used in the Ethernet header are the real system MAC addresses.
evpn esi multihoming
!
interface port-channel234
  switchport mode trunk
  switchport trunk allowed vlan 10-11
  ethernet-segment 1234
    system-mac 0102.0103.0234
!
interface Ethernet1/2
  switchport mode trunk
  switchport trunk allowed vlan 10,11
  channel-group 234 mode active
Example 1-1: Enabling EVPN multi-homing on Leaf switches
Figure 1-2: EVPN Multihoming
Example 1-2 shows that both interface E1/1 and E1/2 on switch ASW-104 are participating in Port-Channel 234.
ASW-104# show port-channel summary | b Group
Group Port-       Type     Protocol  Member Ports
      Channel
-------------------------------------------------------------------------
234   Po234(SU)   Eth      LACP      Eth1/1(P)    Eth1/2(P)
Example 1-2: Port-channel 234 state on ASW-104.
Redundancy Group
Switches belonging to the same Ethernet Segment (ES) need to introduce themselves to each other as ES members. In addition, they have to decide who will be the Designated Forwarder (DF) for the given ES, i.e. who is responsible for forwarding BUM traffic (Broadcast, Unknown Unicast, and Multicast traffic) to and from the ES. The introduction is made by using the BGP EVPN Route-Type 4 (Ethernet Segment Route) BGP Update message.
Enabling EVPN ESI multihoming generates the BGP EVPN Route-Type 4 (Ethernet Segment Route) NLRI into the BGP Loc-RIB table, from which it is advertised to the BGP EVPN address family peers. Example 1-3 illustrates the Route-Type 4 NLRI entry originated by Leaf-102. Note that Leaf-103 is down at this stage, so the table holds only the NLRI originated by Leaf-102, which keeps the output of the BGP table simpler.
Leaf-102# sh bgp l2vp evpn route-type 4
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:27233   (ES [0301.0201.0302.3400.04d2 0])
BGP routing table entry for [4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.102]/136, version 3
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on xmit-list, is not in l2rib/evpn
Multipath: iBGP
  Advertised path-id 1
  Path type: local, path is valid, is best path, no labeled nexthop
  AS-Path: NONE, path locally originated
    192.168.100.102 (metric 0) from 0.0.0.0 (192.168.77.102)
      Origin IGP, MED not set, localpref 100, weight 32768
      Extcommunity: ENCAP:8 RT:0102.0103.0234
  Path-id 1 advertised to peers:
    192.168.77.11
Example 1-3: sh bgp l2vp evpn route-type 4.
Route Distinguisher is formed from BGP router Id, and from base value 26999 + ES port-Channel Id. The format of NLRI carried in BGP Update is illustrated in figure 1-3.
[4] = the BGP EVPN Route-Type 4 (Ethernet Segment Route).
[0301.0201.0302.3400.04d2] = the ESI. The first octet, 03, indicates that the ESI is a MAC-based ESI. The rest is a combination of the system MAC and the Ethernet Segment local discriminator, which together form the actual ESI value. Note that hex 04d2 is 1234 in decimal.
[32] = the length of the IP address of the advertising switch.
[192.168.100.102] = the IP address attached to the NVE interface (Loopback 100).
Figure 1-3: Ethernet Segment Route NLRI address format.
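The Route Distinguisher rule and the NLRI layout above can be reproduced with a short Python sketch. The base value 26999 is the one stated in this chapter; the function names are hypothetical helpers written for this illustration, not part of any Cisco tool.

```python
import ipaddress

ES_RD_BASE = 26999  # base value quoted in the text for Ethernet Segment RDs


def es_route_distinguisher(router_id: str, port_channel_id: int) -> str:
    """RD = <BGP router Id>:<26999 + ES port-channel Id>."""
    return f"{router_id}:{ES_RD_BASE + port_channel_id}"


def es_route_key(esi: str, vtep_ip: str) -> str:
    """Format the NLRI as the BGP table prints it: [4]:[ESI]:[len]:[IP]."""
    # max_prefixlen is 32 for an IPv4 address and 128 for an IPv6 address.
    ip_len = ipaddress.ip_address(vtep_ip).max_prefixlen
    return f"[4]:[{esi}]:[{ip_len}]:[{vtep_ip}]"


print(es_route_distinguisher("192.168.77.102", 234))
# 192.168.77.102:27233
print(es_route_key("0301.0201.0302.3400.04d2", "192.168.100.102"))
# [4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.102]
```

Both printed values match the Route Distinguisher and the routing table entry shown in Example 1-3.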
Capture 1-1 shows the BGP EVPN Update sent by Leaf-102 to Spine-11.
Frame 208: 160 bytes on wire (1280 bits), 160 bytes captured (1280 bits) on interface 0
Ethernet II, Src: 1e:af:01:02:1e:11, Dst: c0:8e:00:11:1e:12
Internet Protocol Version 4, Src: 192.168.77.102, Dst: 192.168.77.11
Transmission Control Protocol, Src Port: 179, Dst Port: 29824, Seq: 153, Ack: 153, Len: 94
Border Gateway Protocol - UPDATE Message
    Marker: ffffffffffffffffffffffffffffffff
    Length: 94
    Type: UPDATE Message (2)
    Withdrawn Routes Length: 0
    Total Path Attribute Length: 71
    Path attributes
        Path Attribute - ORIGIN: IGP
        Path Attribute - AS_PATH: empty
        Path Attribute - LOCAL_PREF: 100
        Path Attribute - EXTENDED_COMMUNITIES
            Flags: 0xc0, Optional, Transitive, Complete
            Type Code: EXTENDED_COMMUNITIES (16)
            Length: 16
            Carried extended communities: (2 communities)
                Encapsulation: VXLAN Encapsulation [Transitive Opaque]
                    Type: Transitive Opaque (0x03)
                    Subtype (Opaque): Encapsulation (0x0c)
                    Tunnel type: VXLAN Encapsulation (8)
                ES Import: RT: 01:02:01:03:02:34 [Transitive EVPN]
                    Type: Transitive EVPN (0x06)
                    Subtype (EVPN): ES Import (0x02)
                    ES-Import Route Target: 01:02:01:03:02:34
        Path Attribute - MP_REACH_NLRI
            Flags: 0x90, Optional, Extended-Length, Non-transitive, Complete
            Type Code: MP_REACH_NLRI (14)
            Length: 34
            Address family identifier (AFI): Layer-2 VPN (25)
            Subsequent address family identifier (SAFI): EVPN (70)
            Next hop network address (4 bytes)
            Number of Subnetwork points of attachment (SNPA): 0
            Network layer reachability information (25 bytes)
                EVPN NLRI: Ethernet Segment Route
                    Route Type: Ethernet Segment Route (4)
                    Length: 23
                    Route Distinguisher: 192.168.77.102:27233
                    ESI: 01:02:01:03:02:34, Discriminator: 00 04
                        ESI Type: ESI MAC address defined (3)
                        ESI system MAC: 01:02:01:03:02:34
                        ESI system mac discriminator: 00 04
                        Remaining bytes: d2
                    IP Address Length: 32
                    IPv4 address: 192.168.100.102
Capture 1-1: BGP Update concerning ESI sent by Leaf-102
Leaf-103# sh bgp l2vpn evpn route-type 4
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:27233
BGP routing table entry for [4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.102]/136, version 14
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Multipath: iBGP
  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported to 1 destination(s)
  AS-Path: NONE, path sourced internal to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Extcommunity: ENCAP:8 RT:0102.0103.0234
      Originator: 192.168.77.102 Cluster list: 192.168.77.111
  Path-id 1 not advertised to any peer
Route Distinguisher: 192.168.77.103:27233   (ES [0301.0201.0302.3400.04d2 0])
BGP routing table entry for [4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.102]/136, version 15
Paths: (1 available, best #1)
Flags: (0x000012) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW
Multipath: iBGP
  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported from 192.168.77.102:27233:[4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.102]/136
  AS-Path: NONE, path sourced internal to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Extcommunity: ENCAP:8 RT:0102.0103.0234
      Originator: 192.168.77.102 Cluster list: 192.168.77.111
  Path-id 1 not advertised to any peer
BGP routing table entry for [4]:[0301.0201.0302.3400.04d2]:[32]:[192.168.100.103]/136, version 9
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on xmit-list, is not in l2rib/evpn
Multipath: iBGP
  Advertised path-id 1
  Path type: local, path is valid, is best path, no labeled nexthop
  AS-Path: NONE, path locally originated
    192.168.100.103 (metric 0) from 0.0.0.0 (192.168.77.103)
      Origin IGP, MED not set, localpref 100, weight 32768
      Extcommunity: ENCAP:8 RT:0102.0103.0234
  Path-id 1 advertised to peers:
    192.168.77.11
Example 1-4: BGP table of Leaf-103 concerning Ethernet Segment Route.
Designated Forwarder (DF)
ASW-104 sees Leaf-102 and Leaf-103 as one switch that is connected through Port-Channel 234. The leaf switches choose a Designated Forwarder (DF) among themselves to forward BUM (Broadcast, Unknown Unicast and Multicast) traffic to and from the ES. If the ES has more than one VLAN, the DF roles are load-balanced between the leaf nodes, e.g. the DF for VLAN 10 is Leaf-102 and the DF for VLAN 11 is Leaf-103. The selection process uses the formula "i = V mod N", where V represents the VLAN Id and N represents the number of leaf switches in the redundancy group. The "i" is the ordinal of a leaf switch in the redundancy group. When Leaf-102 and Leaf-103 exchange BGP L2VPN EVPN Route-Type 4 (Ethernet Segment Route) updates, their IP addresses are included in the NLRI. Each switch sorts the IP addresses learned from the BGP Updates in numerical order from lowest to highest. In the case of Leaf-102 and Leaf-103, the order is 192.168.100.102, 192.168.100.103. The lowest IP, i.e. 192.168.100.102, gets ordinal zero (0), the next one gets ordinal one (1), and so on.
The formula to calculate the DF for VLAN 10 is
V mod N = i
V = 10 (VLAN Id)
N = 2 (number of leaf switches)
10 mod 2 = 0 > Leaf-102
(The remainder is zero (0) when 10 is divided by 2)
Ordinal zero is used by Leaf-102, so it will be the DF for VLAN 10.
The formula to calculate the DF for VLAN 11 is
V mod N = i
V = 11 (VLAN Id)
N = 2 (number of leaf switches)
11 mod 2 = 1 > Leaf-103
(The remainder is one (1) when 11 is divided by 2)
Ordinal one is used by Leaf-103, so it will be the DF for VLAN 11.
If VLAN 12 is also attached to the ES, its DF will be Leaf-102
V mod N = i
V = 12 (VLAN Id)
N = 2 (number of leaf switches)
12 mod 2 = 0 > Leaf-102
(The remainder is zero (0) when 12 is divided by 2)
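The same calculation can be written as a small Python sketch. It only models the "V mod N" rule and the lowest-to-highest ordering of the IP addresses described above; the function name elect_df is a hypothetical helper, not a command or API of the switch.

```python
import ipaddress


def elect_df(vlan_id: int, member_ips: list) -> str:
    """Return the IP address of the DF for a VLAN using i = V mod N."""
    # Order the ES members numerically from lowest to highest IP address.
    # The position in this list is the ordinal of each switch.
    ordered = sorted(member_ips, key=ipaddress.ip_address)
    ordinal = vlan_id % len(ordered)
    return ordered[ordinal]


# The order in which the addresses are learned does not matter.
es_members = ["192.168.100.103", "192.168.100.102"]

for vlan in (10, 11, 12):
    print(vlan, elect_df(vlan, es_members))
# 10 192.168.100.102
# 11 192.168.100.103
# 12 192.168.100.102
```

Because every member of the redundancy group sorts the same list and applies the same formula, all members reach the same result without any extra negotiation.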
Example 1-5 shows that the DF candidate list on Leaf-102 includes the IP addresses 192.168.100.102 (Leaf-102) and 192.168.100.103 (Leaf-103). It also shows that there are two active VLANs (10 and 11) in this redundancy group and that Leaf-102 is the DF for VLAN 10.
Leaf-102# sh nve ethernet-segment
ESI: 0301.0201.0302.3400.04d2
Parent interface: port-channel234
ES State: Up
Port-channel state: Up
NVE Interface: nve1
NVE State: Up
Host Learning Mode: control-plane
Active Vlans: 10-11
DF Vlans: 10
Active VNIs: 10000-10001
CC failed for VLANs:
VLAN CC timer: 0
Number of ES members: 2
My ordinal: 0
DF timer start time: 00:00:00
Config State: config-applied
DF List: 192.168.100.102 192.168.100.103
ES route added to L2RIB: True
EAD/ES routes added to L2RIB: True
EAD/EVI route timer age: not running
Example 1-5: show nve Ethernet-segment on Leaf-102.
Example 1-6 shows that Leaf-103 is DF for VLAN 11.
Leaf-103# sh nve ethernet-segment
ESI: 0301.0201.0302.3400.04d2
Parent interface: port-channel234
ES State: Up
Port-channel state: Up
NVE Interface: nve1
NVE State: Up
Host Learning Mode: control-plane
Active Vlans: 10-11
DF Vlans: 11
Active VNIs: 10000-10001
CC failed for VLANs:
VLAN CC timer: 0
Number of ES members: 2
My ordinal: 1
DF timer start time: 00:00:00
Config State: config-applied
DF List: 192.168.100.102 192.168.100.103
ES route added to L2RIB: True
EAD/ES routes added to L2RIB: True
EAD/EVI route timer age: not running
Example 1-6: show nve Ethernet-segment on Leaf-103.
Figure 1-4: DF per VLAN.
Though Leaf-102 and Leaf-103 share the same Ethernet Segment, they are still two individual switches without a shared Control Plane (or Data Plane). This means that neither switch knows which VLANs are allowed on the shared Port-Channel in the other ES member switches. One solution is to use the Cisco Fabric Service over IP (CFSoIP) service. It is used for discovering CFS-capable devices and then for VLAN Consistency Checking (CC) on the Port-Channel. Figure 1-5 illustrates the addressing used in this section. Note that CFSoIP works only over the management interface.
Figure 1-5: CFSoIP addressing Scheme.
Note! CFSoIP in EVPN multihoming implementation verifies the Port-Channel VLAN list, not the content of VLAN database.
Capture 1-2 shows the operation of CFSoIP. Leaf-102 starts by introducing itself to other CFS-capable devices on the shared ES by sending an IP/UDP packet to the multicast group 239.102.103.10 using the reserved UDP source and destination port 7546. This group is manually defined for ES 1234. Leaf-103, which has already joined the same multicast group, receives the message. It then opens a TCP connection with Leaf-102, which is used for exchanging information about the VLANs allowed on Port-Channel 234.
Frame 99: 166 bytes on wire (1328 bits), 166 bytes captured (1328 bits) on interface 0
Ethernet II, Src: 50:00:00:03:00:00 (50:00:00:03:00:00), Dst: IPv4mcast_66:67:0a (01:00:5e:66:67:0a)
Internet Protocol Version 4, Src: 10.0.0.102, Dst: 239.102.103.10
User Datagram Protocol, Src Port: 7546, Dst Port: 7546
Frame 100: 74 bytes on wire (592 bits), 74 bytes captured (592 bits) on interface 0
Ethernet II, Src: 50:00:00:04:00:00 (50:00:00:04:00:00), Dst: 50:00:00:03:00:00 (50:00:00:03:00:00)
Internet Protocol Version 4, Src: 10.0.0.103, Dst: 10.0.0.102
Transmission Control Protocol, Src Port: 7546, Dst Port: 62414, Seq: 3182, Ack: 4062, Len: 8
Capture 1-2: CFSoIP session opening between Leaf-102 and Leaf-103.
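The destination MAC address 01:00:5e:66:67:0a in Frame 99 follows from the standard mapping of an IPv4 multicast group to an Ethernet address: the fixed prefix 01:00:5e followed by the low 23 bits of the group address. The sketch below shows that mapping for the CFSoIP group used in this chapter; the function name ipv4_multicast_mac is a hypothetical helper.

```python
import ipaddress


def ipv4_multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet destination MAC."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    # Keep the low 23 bits of the group and prepend the 01:00:5e prefix.
    low23 = int(addr) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    # Print the 48-bit value one octet at a time, highest octet first.
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


print(ipv4_multicast_mac("239.102.103.10"))
# 01:00:5e:66:67:0a
```

The last three octets 66:67:0a are simply 102, 103 and 10 in hex, which is why the capture labels the destination as IPv4mcast_66:67:0a.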
cfs ipv4 mcast-address 239.102.103.10
cfs ipv4 distribute
!
evpn esi multihoming
  vlan-consistency-check
!
interface mgmt0
  vrf member management
  ip address 10.0.0.102/24
Example 1-7: CFSoIP configuration in Leaf-102.
Leaf-102# show cfs peers name nve
Scope      : Physical-ip
-------------------------------------------------------------------------
 Switch WWN              IP Address
-------------------------------------------------------------------------
 20:00:50:00:00:03:00:07 10.0.0.102                          [Local]
                         Leaf-102
 20:00:50:00:00:04:00:07 10.0.0.103                          [Not Merged]
                         Leaf-103
Total number of entries = 2
Example 1-8: CFSoIP peer verification on Leaf-102.
Leaf-102# sh nve ethernet-segment
ESI: 0301.0201.0302.3400.04d2
Parent interface: port-channel234
ES State: Up
Port-channel state: Up
NVE Interface: nve1
NVE State: Up
Host Learning Mode: control-plane
Active Vlans: 10-11
DF Vlans: 10
Active VNIs: 10000-10001
CC failed for VLANs:
VLAN CC timer: 0
Number of ES members: 2
My ordinal: 0
DF timer start time: 00:00:00
Config State: config-applied
DF List: 192.168.100.102 192.168.100.103
ES route added to L2RIB: True
EAD/ES routes added to L2RIB: True
EAD/EVI route timer age: not running
----------------------------------------
Example 1-9: VLAN consistency verification on Leaf-102.
Leaf-102# sh nve ethernet-segment | i CC
CC failed for VLANs: 10-11
VLAN CC timer: 0
Example 1-10: VLAN consistency verification on Leaf-102.
Leaf-103# sh nve ethernet-segment | i CC
CC failed for VLANs: 111
VLAN CC timer: 0
Example 1-11: VLAN consistency verification on Leaf-103.
Layer 2 Gateway Spanning-Tree Protocol (L2G-STP)
For the EVPN ESI multihoming enabled VXLAN fabric to be seen as a single STP root from the point of view of the LAN access switches, all leaf switches have to use the same Bridge Id in their BPDUs. When the Layer 2 Gateway Spanning-Tree Protocol (L2G-STP) mechanism is enabled on a switch, the switch automatically starts using the MAC address c84c.75fa.6000 as a part of its Bridge Id. This is crucial since leaf switches that share the same ES do not have a shared Control Plane, and without L2G-STP they would use different system MAC addresses in their BPDUs. When L2G-STP is implemented in all leaf switches, the whole VXLAN fabric introduces itself to the access switches as a pseudo STP root switch. However, the L2G-STP mechanism does not affect the STP priority value, which is why you need to manually change it to a lower value than the one used in the access switches. The L2G-STP mechanism also terminates the STP domain.
The L2G-STP mechanism is disabled by default and can be enabled with the global configuration command spanning-tree domain enable. The configuration shown in example 1-12 is already in place in Leaf-102 and Leaf-103. Spanning-Tree mode is set to MST, and both VLANs 10 and 11 are mapped to instance 1. MST instances 0 and 1 both have priority 8192, which is lower than the priority in ASW-104, which is set to the maximum value 61440. Figure 1-6 shows an example topology.
Figure 1-6: L2G-STP addressing Scheme.
spanning-tree mode mst
spanning-tree mst 0-1 priority 8192
spanning-tree mst configuration
  name VXLAN-Fabric
  instance 1 vlan 10-11
Example 1-12: STP pre-configuration on Leaf-102 and Leaf-103.
After enabling L2G-STP on leaf switches, the system assigns
the MAC address c84c.75fa.6000 to leaf switches. By doing this, ASW-104 sees
the VXLAN fabric as one Pseudo-Switch that is the root for all VLANs.
Example 1-13, taken from ASW-104, shows that the STP root MAC address is c84c.75fa.6000, the STP port role of Port-Channel 234 is Root and its state is Forwarding.
ASW-104# sh spanning-tree vlan 10
MST0001
  Spanning tree enabled protocol mstp
  Root ID    Priority    8193
             Address     c84c.75fa.6000
             Cost        10000
             Port        4329 (port-channel234)
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec
  Bridge ID  Priority    61441  (priority 61440 sys-id-ext 1)
             Address     5000.0006.0007
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec
Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- ----------------------------
Po234            Root FWD 10000     128.4329 P2p
Eth1/3           Desg FWD 20000     128.3    P2p
Example 1-13: show spanning-tree vlan 10 on ASW-104.
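The root election behind this output can be illustrated with a short Python sketch. Spanning Tree selects the bridge with the lowest Bridge Id, comparing the priority (configured priority plus the system Id extension, here the MST instance number 1) first and the MAC address second. The function name bridge_id is a hypothetical helper and the sketch covers only this comparison, not the rest of the protocol.

```python
def bridge_id(priority: int, sys_id_ext: int, mac: str) -> tuple:
    """Return a Bridge Id as a comparable (priority, MAC) tuple."""
    # The advertised priority is the configured value plus the sys-id-ext.
    return (priority + sys_id_ext, int(mac.replace(".", ""), 16))


# Values taken from Example 1-12 and Example 1-13 (MST instance 1).
fabric = bridge_id(8192, 1, "c84c.75fa.6000")   # L2G-STP pseudo root
asw104 = bridge_id(61440, 1, "5000.0006.0007")  # access switch

# The lowest Bridge Id wins the root election.
root = min(fabric, asw104)
print(root[0], "fabric is root" if root == fabric else "ASW-104 is root")
# 8193 fabric is root
```

Note that ASW-104 has the numerically lower MAC address, so the fabric is root only because its priority is lower. This is why the priority on the leaf switches has to be set manually.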
Examples 1-14 and 1-15 verify that both leaf switches introduce themselves to ASW-104 as an STP root.
Leaf-102# sh spanning-tree summary
Switch is in mst mode (IEEE Standard)
Root bridge for: MST0000-MST0001
L2 Gateway STP bridge for: MST0001
Port Type Default                        is disable
Edge Port [PortFast] BPDU Guard Default  is disabled
Edge Port [PortFast] BPDU Filter Default is disabled
Bridge Assurance                         is enabled
Loopguard Default                        is disabled
Pathcost method used                     is long
PVST Simulation                          is enabled
STP-Lite                                 is disabled
Name                   Blocking Listening Learning Forwarding STP Active
---------------------- -------- --------- -------- ---------- ----------
MST0000                       0         0        0          6          6
MST0001                       0         0        0          2          2
---------------------- -------- --------- -------- ---------- ----------
2 msts                        0         0        0          8          8
Example 1-14: show spanning-tree summary on Leaf-102.
Leaf-103# sh spanning-tree summary
Switch is in mst mode (IEEE Standard)
Root bridge for: MST0000-MST0001
L2 Gateway STP bridge for: MST0001
Port Type Default                        is disable
Edge Port [PortFast] BPDU Guard Default  is disabled
Edge Port [PortFast] BPDU Filter Default is disabled
Bridge Assurance                         is enabled
Loopguard Default                        is disabled
Pathcost method used                     is long
PVST Simulation                          is enabled
STP-Lite                                 is disabled
Name                   Blocking Listening Learning Forwarding STP Active
---------------------- -------- --------- -------- ---------- ----------
MST0000                       0         0        0          5          5
MST0001                       0         0        0          1          1
---------------------- -------- --------- -------- ---------- ----------
2 msts                        0         0        0          6          6
Example 1-15: show spanning-tree summary on Leaf-103.
Leaf-102# sh spanning-tree interface po 234
Mst Instance     Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- ----------------------------
MST0000          Root FWD 20000     128.4329 P2p
MST0001          Desg BKN*20000     128.4329 P2p *L2GW_Inc
Example 1-16: show spanning-tree interface po 234 on Leaf-103.
Core Link Tracking
In a normal situation, the default LACP hashing algorithm on ASW-104 might choose the interface g0/1 for data flow from Abba to Cafe. In a situation where Leaf-102 loses its connectivity to Spine-11, ASW-104 continues sending data to Leaf-102 because it is not aware of the indirect link failure. The Core-Link Tracking mechanism protects against sudden packet loss caused by this kind of failure event where upstream leaf switch loses its connection to the core (all leaf-to-spine Inter-Switch links are down).
The Core-Link Tracking mechanism shuts down the links attached to the Ethernet Segment if all links to the spine are down. Core-Link Tracking is enabled under the uplink interface configuration with the command evpn multihoming core-tracking (example 1-17).
interface Ethernet1/1
  evpn multihoming core-tracking
Example 1-17: Core-Link tracking on Leaf-102.
Figure 1-7 illustrates the situation where interface E1/1 goes down on Leaf-102. As a reaction to this event, the Core-Link Tracking mechanism sets all local interfaces attached to any EVPN Ethernet Segment to the down state. ASW-104 notices this direct link failure and changes the state of its interface E1/1 to down. Now the only operational interface in Port-Channel 234 is E1/2, and all traffic is sent over it.
Figure 1-7: Core-Link Tracking.
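The decision that Core-Link Tracking makes can be modelled with a very small Python sketch. It is a simplified model of the behavior described above, not the actual NX-OS implementation, and the function name es_interfaces_up is a hypothetical helper.

```python
def es_interfaces_up(core_links: dict) -> bool:
    """ES-facing interfaces stay up while any tracked core link is up."""
    # core_links maps a tracked uplink name to its state (True = up).
    return any(core_links.values())


# Leaf-102 tracks a single uplink, Ethernet1/1, in this chapter.
print(es_interfaces_up({"Ethernet1/1": True}))
# True
print(es_interfaces_up({"Ethernet1/1": False}))
# False

# With two tracked uplinks (an assumed example), losing one is not enough
# to bring the Ethernet Segment down.
print(es_interfaces_up({"Ethernet1/1": False, "Ethernet1/3": True}))
# True
```

When the function returns False, the leaf switch brings down its Ethernet Segment interfaces, which is the state shown in the following examples.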
Leaf-102(config-if)# sh int e1/2
Ethernet1/2 is down (NVE core link down)
admin state is up, Dedicated Interface
  Belongs to Po234
Example 1-18: Core-Link down on Leaf-102.
Leaf-102# sh int po 234 | i 234
port-channel234 is down (NVE core link down)
Leaf-102# sh port-channel summ | b Group
Group Port-       Type     Protocol  Member Ports
      Channel
-----------------------------------------------------------------
234   Po234(SD)   Eth      LACP      Eth1/2(D)
Example 1-19: Port-Channel 234 down on Leaf-102.
ASW-104# sh port-channel summary | b Group
Group Port-       Type     Protocol  Member Ports
      Channel
-----------------------------------------------------------------
234   Po234(SU)   Eth      LACP      Eth1/1(D)    Eth1/2(P)
Example 1-20: Interface-E1/1 down on ASW-104.
Author: Toni Pasanen
CCIE#28158
Published: 29.5.2019
Updated:
-------------------------------------------------
References:
RFC 7432: BGP MPLS-Based Ethernet VPN
RFC 8214: Virtual Private Wire Service Support in Ethernet VPN
RFC 8584: Framework for Ethernet VPN Designated Forwarder Election Extensibility
EVPN Multihoming: https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/mp_l2_vpns/configuration/xe-16-10/mp-l2-vpns-xe-16-10-book/evpn-multihoming.pdf
United States Patent No. US 8,559,341 B2: System and Method for Providing a Loop Free Topology in a Network Environment
Hi Toni,
I went through your excellent VXLAN series and I have to say it is well organised and informative.
this resolve many of my confusion about Vxlan and EVPN.
for this multihoming part, would you please inform:
1. I believe it can be achieved via vPC, so can I say this is an alternative for vPC solution.
2. could multihoming gateways be located in different physical addresses,say two different DC.
All the Best
Michael
Hi Michael,
Nice to hear that you have found answers to your questions here.
Q1: Yes, both are used for multihoming purposes. EVPN ESI Multihoming is a standard solution and vPC is Cisco proprietary.
Q2: Yes.
HI Toni,
I found another point I could not understand, in the stp part, you mention when the stp priority changed to lower than 8912 then portchannel on ES will be blocked. do you mean if other switches has a lower priority or one of the ES peer has a lower priority?
As long as I can understand ES peer should just be the root, why not just set their priority to the lowest?
Cheers
Michael
Hi Michael
I use STP priority 8192 for demonstrating the failure event where the root is not in the VXLAN fabric. STP priority 0 for root is good practice. Thanks for pointing out the confusing STP text. I changed it to this: "Now if the STP priority is changed to zero (lower than 8192) in ASW-104".
Cheers -Toni
I should thank you for your explanation!
you are the selfless kind of people who will share knowledge with others
and another thing I noticed is that you would like to admit your mistake and correct in time in your blog,I can see from your previous conversation with others.
All the best
Michael
I really appreciate your kind words.
I think that mistakes will help to memorize complex things, but the same mistake is allowed only once :-)
Clear explanations like always!
While reading vPC and ESI in config guide, I found this thing:
• EVPN Multihoming is supported on the Cisco Nexus 9300 platform switches only and it is not supported
on the Cisco Nexus 9200, 9300-EX/-FX/-FXP/-FX2 and 9500 platform switches. The Cisco Nexus 9500
platform switches can be used as Spine switches, but they cannot be used as VTEPs.
But 9300-EX/-FX/-FXP/-FX2 are the most widely used platforms, I checked on live boxes, there is no 'evpn esi multihoming' command. So the only way to get LACP from two different leafs on these platforms is vPC?
Either with a traditional vPC or with Fabric Peering
https://blogs.cisco.com/datacenter/change-is-the-only-constant-vpc-with-fabric-peering-for-vxlan-evpn
I agree, this is announced, but seems working only from version 9.2.3, while at this moment Cisco recommends 7.x for production use https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/recommended_release/b_Minimum_and_Recommended_Cisco_NX-OS_Releases_for_Cisco_Nexus_9000_Series_Switches.html
Hi guys, ESI is supported on FX2 N9K switches only. I tried a similar ESI set-up on VIRL.I guess Toni you did the same here. The problem I encountered and this probably has to do with VIRL is that I cannot ping my default gateway(SVI) on both VTEP switches from my hosts. I tried a switch with an SVI and an Ubuntu server but no luck. The problem I experienced has to do with the ARP reply from the VTEP side which does select the VNI id to send the message instead the vlan id. Am I doing something wrong? Please let me know if you can ping your DG. I am not talking about directly connected users but either a device in vPC or a host behind a vPC.
Thanks,
George
vPC or vPC+ or NX-OSv 9.3.1 :-)
This is the best artical to explain evpn multhoming !!!
one tiny typo I think,
DF for VLAN 10 is Leaf-102 and DF for VLAN 11 is Leaf-104.
should be
DF for VLAN 10 is Leaf-102 and DF for VLAN 11 is Leaf-103.
Thanks for notifying, I just fixed it. I need to fix it also in my book :)
Hi, Toni.
Congratulations on doing such a good job. You're very kind sharing with us all your experiences with this technology. It really helps a lot!
Just one question. Dou you think using ESi for multihoming is a mature option for a production environment or still it's better be cautious and stay with vPC?
Thanks again!
Regards
Hi Fran,
I'd rather use a vPC dual-homing with Cisco devices for the sake of simplicity (It keeps BGP table cleaner:). This however is not recommendation, only my personal opinion.
Wow, marvelous blog layout! How long have you been blogging for? you make blogging look easy. The overall look of your web site is great, let alone the content!
I published my first blog post on March 2017, so it was three and a half year ago. Thank you for your positive feedback. I am currently working on a new book but I will continue with this blog.
Just Amazing. Thank you Toni
Thanks for the amazing blog post!
I have two questions: on the Cisco site (here: https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/93x/vxlan/configuration/guide/b-cisco-nexus-9000-series-nx-os-vxlan-configuration-guide-93x/b-cisco-nexus-9000-series-nx-os-vxlan-configuration-guide-93x_appendix_011001.html#task_scx_2pq_zfb) it seems that the spanning-tree domain must be disabled at the end of the configuration of the domain itself. Isn't it something like a mistake in the documentations?
Can you confirm that it is possible to use RSTP instead of MST?
Again, thanks for the amazing content here.