My VXLAN book, "Virtual Extensible LAN VXLAN - A Practical Guide to VXLAN Solution Part 1" (373 pages), is now also available for download from Leanpub.com.
This chapter introduces the VXLAN EVPN Multi-Site (EVPN-MS) architecture for interconnecting EVPN Domains. The first section discusses the limitations of flat VXLAN EVPN fabric and the improvements that can be achieved with EVPN-MS. The second section focuses on the technical details of EVPN-MS solutions by using various configuration examples and packet captures.
Figure 1-1: Characteristics of Super-Spine VXLAN fabric.
Shared EVPN domain limitations
Figure 1-1 depicts an example BGP EVPN implementation that includes three datacenters in three different locations. Each DC has seven Leaf switches and two Spine switches. For the DC interconnect, there is a pair of Super-Spine switches. All VLANs/VNIs have to be available on every Leaf switch regardless of location. This means that a full mesh of NVE peerings between all Leaf switches is required.
Even though the physical Underlay Network in this solution is hierarchical, the Overlay Network on top of it is flat, i.e. there is one shared, geographically dispersed EVPN domain (one L2 flooding domain). From the Underlay Network perspective, this means that the routing design and the routing protocol choice should be consistent throughout the EVPN domain; otherwise there will be complex and hard-to-manage IP prefix redistribution from one protocol to another. The same design requirement also applies to multi-destination traffic: BUM traffic forwarding has to be based on the same solution throughout the EVPN domain. Ingress-Replication (IR) does not scale well in a large-scale VXLAN EVPN fabric. In this example network, there are 21 Leaf switches. Each switch has 20 NVE peers, so if IR is used for BUM traffic forwarding, a copy of each multi-destination frame/packet has to be sent individually to all 20 NVE peers. This might lead to a situation where BUM traffic flows disturb the actual application data traffic on the uplinks of the sending switch. This is why a Multicast-enabled Underlay Network is preferred in large-scale solutions. In summary, a large-scale VXLAN EVPN fabric can’t rely on an IP-only Underlay Network.
From the Overlay Network perspective, the number of bi-directional VXLAN tunnels in a large-scale solution also has its challenges. Even though the example here consists of only 21 Leaf switches, there are 20 NVE peerings per switch and 210 bi-directional VXLAN tunnels [n x (n-1)/2]. If the count of Leaf switches is doubled from 21 to 42 (41 NVE peers per Leaf), the bi-directional tunnel count rises from 210 to 861. If each switch has 41 NVE peers, it also means 41 possible next-hop addresses per Leaf switch. When a VM moves inside one location, every single Leaf has to update its next-hop table.
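The full-mesh arithmetic can be checked with a few lines of Python. This is only a sketch of the n x (n-1)/2 formula quoted in the text; the function names are made up for illustration.

```python
def nve_peers_per_leaf(leaf_count: int) -> int:
    """In a flat EVPN domain every Leaf peers with every other Leaf."""
    return leaf_count - 1


def bidirectional_tunnels(leaf_count: int) -> int:
    """Full-mesh tunnel count: n x (n - 1) / 2."""
    return leaf_count * (leaf_count - 1) // 2


for leaves in (21, 42):
    print(leaves, nve_peers_per_leaf(leaves), bidirectional_tunnels(leaves))
# 21 20 210
# 42 41 861
```

Doubling the Leaf count roughly quadruples the tunnel count, which is the scaling problem the flat design runs into.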
There are no real plug-and-play capabilities in this solution. When either devices or a whole new site are added to the infrastructure, each existing Leaf switch builds an NVE peering with the added device(s). The opposite happens when devices are removed from the infrastructure: each remaining Leaf switch tears down the tunnels.
From the administrative perspective, this solution is managed as one entity. This excludes designs where, for example, the customer wants to manage one DC while a service provider manages another DC owned by the same customer.
EVPN Multi-Site Architecture
Introduction
Figure 1-2 includes the same physical topology used in the previous example, with an additional pair of Border Gateways (BGWs) in each site. The one big VXLAN EVPN fabric is now divided into a set of smaller fabrics, which are connected through the BGWs to DC Core routers/switches in a shared Common EVPN domain. This brings the hierarchy back into the Overlay Network.
Each fabric forms an individual management domain that has a dedicated underlay routing architecture (routing protocol, interface IP addressing and so on). In addition, either Multicast or Ingress-Replication (IR) can be used independently: one fabric can use a Multicast-based solution while another fabric uses IR. Site-local Leaf switches form bi-directional NVE peerings (VXLAN tunnels) only with intra-site Leaf switches and with the local BGW switches. In addition, local BGW switches form NVE peerings with each other and with site-external BGW switches.
The intra-site Underlay Network can use any IGP or BGP for route exchange, while the Overlay Network routing uses BGP (L2VPN EVPN afi). The Underlay Network routing protocol in the Common EVPN domain between the BGW switches and the DC Core switch/router is eBGP (IPv4 unicast afi), while eBGP (L2VPN EVPN afi) is used in the Overlay Network.
Because each site operates as an individual fabric, there are no Control Plane relationship requirements between sites. Connecting a new site to the DC Core routers does not generate any major Control Plane changes from the protocol perspective (such as new NVE or routing protocol peerings) in the intra-site Leaf switches of a remote site. The new BGWs only establish Underlay and Overlay Network eBGP peerings with the DC Core routers and form NVE peerings with the existing BGW switches. After that, the BGW switches can exchange routing information. In this manner, the EVPN Multi-Site solution is plug-and-play capable.
There are two BGWs per site in figure 1-2, but this is not a limitation. NX-OS 9.3.x supports a maximum of six BGWs per site.
The next sections introduce the VXLAN EVPN Multi-Site solution in detail.
Figure 1-2: Characteristics of Super-Spine VXLAN fabric.
Intra-Site EVPN Domain (Fabric)
This section briefly introduces the intra-site example solution used in sites 12 and 34 (figure 1-3). Both sites use the same Underlay and Overlay Network design. OSPF (RID 192.168.0.dev-number/32) is used for IP connectivity between nodes, and all Loopback address information is advertised internally. PIM BiDir is enabled in the Underlay Network and the Spine switches are defined as a Pseudo Rendezvous Point. BGP L2VPN EVPN peering is done between Loopback 77 interfaces (192.168.77.dev-number/32). NVE interfaces use the IP address of Loopback 100 (192.168.100.dev-number/32), and NVE peering is established between these addresses. The BGW-1 system MAC address is 5000.0002.0007, the BGW-2 system MAC address is 5000.0003.0007 and the BGW-3 system MAC address is 5000.0004.0007. The complete device configuration can be found in Appendix A at the end of this chapter. The left-hand site uses BGP AS 65012 and Site-Id 12. The right-hand site uses BGP AS 65034 and Site-Id 34. For the sake of simplicity, the device count is minimized in this example network. Host Abba in VLAN 10 (IP: 172.16.10.101/MAC: 1000.0010.abba) is connected to Leaf-101 and host Beef (IP: 172.16.10.102/MAC: 1000.0010.beef) is connected to Leaf-102. VLAN 10 is mapped to VNI 10000. In addition to a unique Physical IP (PIP) per switch, the intra-site BGW switches BGW-1 and BGW-2 share the same Virtual IP (VIP), which is taken from Loopback 88 (192.168.88.12 on both devices).
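The addressing convention can be sketched as a small helper. It assumes that "dev-number" is the numeric suffix of the host name (BGW-1 is 1, Spine-11 is 11, Leaf-101 is 101), which matches the addresses seen in the outputs of this chapter; the role names and the function are hypothetical.

```python
from ipaddress import IPv4Interface

# Third octet per Loopback role, as described for the example fabric.
LOOPBACK_PLAN = {
    "ospf_rid": 0,      # OSPF router ID
    "bgp_peering": 77,  # Loopback 77, BGP L2VPN EVPN peering
    "nve_pip": 100,     # Loopback 100, NVE source address (PIP)
}


def loopbacks(dev_number: int) -> dict:
    """Return the /32 Loopback addresses of a device (assumed numbering)."""
    if not 1 <= dev_number <= 254:
        raise ValueError("dev_number must fit into the last octet")
    return {
        role: IPv4Interface(f"192.168.{octet}.{dev_number}/32")
        for role, octet in LOOPBACK_PLAN.items()
    }


print(loopbacks(101)["nve_pip"])  # 192.168.100.101/32
```

The shared VIP (192.168.88.12) is deliberately left out because it is per site, not per device.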
Figure 1-3: Example EVPN Multi-Site topology.
Intra-Site NVE peering and VXLAN tunnels
This section explains the intra-site architecture. Example 1-1 shows that Leaf-101 has three NVE peers: one with the shared Virtual IP (VIP) of the BGWs and two with the Physical IPs (PIP) of the BGW switches. BGW switches advertise the VIP address as the next-hop address for all Route-Type 2 and 5 updates received from a remote BGW. The Physical IP address is used for three purposes. First, if a BGW switch has directly connected hosts (only the routing model is supported), the host prefix is advertised with the PIP as the next hop. Second, if a BGW switch is connected to an external network, the networks received from the external site are advertised with the PIP as the next hop. Third, BGW switches use the PIP for BUM traffic replication. This means that the Ingress-Replication (IR) tunnel end-point address advertised within BGP L2VPN EVPN Route-Type 3 (Inclusive Multicast Route) is the PIP.
Leaf-101# sh nve peers detail
Details of nve Peers:
----------------------------------------
Peer-Ip: 192.168.88.12
NVE Interface : nve1
Peer State : Up
Peer Uptime : 01:29:16
Router-Mac : n/a
Peer First VNI : 10000
Time since Create : 01:29:16
Configured VNIs : 10000,10077
Provision State : peer-add-complete
Learnt CP VNIs : 10000
vni assignment mode : SYMMETRIC
Peer Location : N/A
Peer-Ip: 192.168.100.1
NVE Interface : nve1
Peer State : Up
Peer Uptime : 01:07:26
Router-Mac : n/a
Peer First VNI : 10000
Time since Create : 01:07:26
Configured VNIs : 10000,10077
Provision State : peer-add-complete
Learnt CP VNIs : 10000
vni assignment mode : SYMMETRIC
Peer Location : N/A
Peer-Ip: 192.168.100.2
NVE Interface : nve1
Peer State : Up
Peer Uptime : 01:50:10
Router-Mac : n/a
Peer First VNI : 10000
Time since Create : 01:50:11
Configured VNIs : 10000,10077
Provision State : peer-add-complete
Learnt CP VNIs : 10000
vni assignment mode : SYMMETRIC
Peer Location : N/A
Example 1-1: show nve peers detail.
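The PIP/VIP next-hop rules described before Example 1-1 can be summarised as a small decision function. This is a simplified model of the behaviour as described in the text, not an NX-OS implementation; the enum and function names are invented for illustration.

```python
from enum import Enum


class Origin(Enum):
    REMOTE_SITE = "Route-Type 2/5 received from a remote BGW"
    LOCAL_HOST = "host directly connected to the BGW"
    EXTERNAL_NETWORK = "network received from an external site"
    BUM_REPLICATION = "Route-Type 3 Ingress-Replication end-point"


def advertised_next_hop(origin: Origin, pip: str, vip: str) -> str:
    """Pick the next hop a BGW advertises into its own fabric.

    Only routes re-advertised from a remote BGW use the shared VIP;
    the three other cases use the switch's own PIP.
    """
    return vip if origin is Origin.REMOTE_SITE else pip


PIP, VIP = "192.168.100.1", "192.168.88.12"  # BGW-1 addresses from the example
for origin in Origin:
    print(f"{origin.name:17} -> {advertised_next_hop(origin, PIP, VIP)}")
```

This also explains the three peers of Leaf-101: one VIP peer for remote-site routes and one PIP peer per local BGW.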
BGW switches generate BGP L2VPN EVPN Route-Type 2 (MAC Advertisement Route) advertisements for their system MAC address with the NVE interface address (PIP) as the next hop. Example 1-3 shows that Leaf-101 has received updates from Spine-11 concerning all three BGW switches used in this example. Note that the next-hop address towards the system MAC of the intra-site BGW switches is the Physical IP (PIP) of the BGW switch, while the next-hop address towards the system MAC of the inter-site BGW-3 switch is the shared Virtual IP address (VIP) of the intra-site BGW switches BGW-1 and BGW-2. Note that both the VIP and the PIP have to be advertised by the Underlay Network routing protocol. Even though the route origin is not visible in example 1-3, the BGP RID of the advertising BGW can be seen from the Route-Distinguisher.
Leaf-101# sh bgp l2vpn evpn
<snipped>
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 192.168.77.1:32777
*>i[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216
192.168.100.1 100 0 i
Route Distinguisher: 192.168.77.2:32777
*>i[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
192.168.100.2 100 0 i
Route Distinguisher: 192.168.77.3:32777
*>i[2]:[0]:[0]:[48]:[5000.0004.0007]:[0]:[0.0.0.0]/216
192.168.88.12 100 0 65088 65034 i
Route Distinguisher: 192.168.77.101:32777 (L2VNI 10000)
*>i[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216
192.168.100.1 100 0 i
*>i[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
192.168.100.2 100 0 i
*>i[2]:[0]:[0]:[48]:[5000.0004.0007]:[0]:[0.0.0.0]/216
192.168.88.12 100 0 65088 65034 i
Example 1-3: show bgp l2vpn evpn.
The system MAC attached to the NVE interface can be verified by using the show nve interface command.
BGW-1# sh nve interface
Interface: nve1, State: Up, encapsulation: VXLAN
VPC Capability: VPC-VIP-Only [not-notified]
Local Router MAC: 5000.0002.0007
Host Learning Mode: Control-Plane
Source-Interface: loopback100 (primary: 192.168.100.1, secondary: 0.0.0.0)
Example 1-4: show nve interface.
Example 1-5 shows the BGP table entry on Leaf-101 for the system MAC address of BGW-1. The route is imported into the BGP table based on Route-Target 65012:10000. The encapsulation type is VXLAN (type 8) and the advertised next-hop address is 192.168.100.1 (PIP). Based on both the VXLAN encapsulation type and the next-hop IP address, Leaf-101 knows that the switch with IP address 192.168.100.1 has to be a VXLAN tunnel end-point. Note that the system MAC address is advertised as a sticky MAC address with a MAC Mobility Extended Community where the static flag is set to one (1) and the sequence number is set to zero. The captured packet is shown in partial Capture 1-1.
Leaf-101# sh bgp l2vpn evpn 5000.0002.0007
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.1:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216, version 19
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
  Imported to 1 destination(s)
  AS-Path: NONE, path sourced internal to AS
    192.168.100.1 (metric 81) from 192.168.77.11 (192.168.77.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65012:10000 SOO:192.168.77.1:512 ENCAP:8
        MAC Mobility Sequence:01:0
      Originator: 192.168.77.1 Cluster list: 192.168.77.11
  Path-id 1 not advertised to any peer
Route Distinguisher: 192.168.77.101:32777 (L2VNI 10000)
BGP routing table entry for [2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216, version 20
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
  Imported from 192.168.77.1:32777:[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216
  AS-Path: NONE, path sourced internal to AS
    192.168.100.1 (metric 81) from 192.168.77.11 (192.168.77.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65012:10000 SOO:192.168.77.1:512 ENCAP:8
        MAC Mobility Sequence:01:0
      Originator: 192.168.77.1 Cluster list: 192.168.77.11
  Path-id 1 not advertised to any peer
Example 1-5: sh bgp l2vpn evpn 5000.0002.0007 on Leaf-101.
Before bringing up the tunnel, Leaf-101 has to verify that the IP address 192.168.100.1 is reachable through the Underlay Network.
Leaf-101# sh ip route 192.168.100.1
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.100.1/32, ubest/mbest: 1/0
*via 10.101.11.11, Eth1/1, [110/81], 02:58:17, ospf-UNDERLAY-NET, intra
Example 1-6: show ip route 192.168.100.1
Internet Protocol Version 4, Src: 192.168.77.11, Dst: 192.168.77.101
Transmission Control Protocol, Src Port: 57069, Dst Port: 179, Seq: 110, Ack: 39, Len: 242
Border Gateway Protocol - UPDATE Message
    <snipped>
    Path Attribute - EXTENDED_COMMUNITIES
        <snipped>
        Type Code: EXTENDED_COMMUNITIES (16)
        Length: 32
        Carried extended communities: (4 communities)
            Route Target: 65012:10000 [Transitive 2-Octet AS-Specific]
            Route Origin: 192.168.77.1:512 [Transitive IPv4-Address-Specific]
                Type: Transitive IPv4-Address-Specific (0x01)
                Subtype (IPv4): Route Origin (0x03)
                IPv4 address: 192.168.77.1
                2-Octet AN: 512
            Encapsulation: VXLAN Encapsulation [Transitive Opaque]
                Type: Transitive Opaque (0x03)
                Subtype (Opaque): Encapsulation (0x0c)
                Tunnel type: VXLAN Encapsulation (8)
            MAC Mobility: Sticky MAC [Transitive EVPN]
                Type: Transitive EVPN (0x06)
                Subtype (EVPN): MAC Mobility (0x00)
                Flags: 0x01
                    .... ...1 = Sticky/Static MAC: Yes
                Sequence number: 0
    Path Attribute - ORIGINATOR_ID: 192.168.77.1
    Path Attribute - CLUSTER_LIST: 192.168.77.11
    Path Attribute - MP_REACH_NLRI
        Flags: 0x90, Optional, Extended-Length, Non-transitive, Complete
        Type Code: MP_REACH_NLRI (14)
        Length: 44
        Address family identifier (AFI): Layer-2 VPN (25)
        Subsequent address family identifier (SAFI): EVPN (70)
        Next hop network address (4 bytes)
        Number of Subnetwork points of attachment (SNPA): 0
        Network layer reachability information (35 bytes)
            EVPN NLRI: MAC Advertisement Route
                Route Type: MAC Advertisement Route (2)
                Length: 33
                Route Distinguisher: 0001c0a84d018009 (192.168.77.1:32777)
                ESI: 00 00 00 00 00 00 00 00 00
                Ethernet Tag ID: 0
                MAC Address Length: 48
                MAC Address: 50:00:00:02:00:07 (50:00:00:02:00:07)
                IP Address Length: 0
                IP Address: NOT INCLUDED
                MPLS Label Stack 1: 625, (BOGUS: Bottom of Stack NOT set!)
Capture 1-1: BGP L2VPN EVPN Update sent by Spine-11 to Leaf-101 (system MAC of BGW-1).
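Two fields in the capture are easier to read once decoded by hand. The sketch below decodes the Route Distinguisher hex string and shows why a generic MPLS decoder would print "625" with a missing bottom-of-stack bit for a field that carries VNI 10000 (the value shown as "Received label 10000" in Example 1-5). The assumption is that the dissector reads the 24-bit field as a 20-bit label, 3 traffic-class bits and 1 bottom-of-stack bit; the function names are hypothetical.

```python
import ipaddress


def decode_type1_rd(rd_hex: str) -> str:
    """Decode an 8-byte Type 1 Route Distinguisher (IPv4 address : number)."""
    raw = bytes.fromhex(rd_hex)
    if len(raw) != 8:
        raise ValueError("a Route Distinguisher is 8 bytes long")
    if int.from_bytes(raw[0:2], "big") != 1:
        raise ValueError("only Type 1 (IPv4-address:number) is handled here")
    admin = ipaddress.IPv4Address(raw[2:6])
    assigned = int.from_bytes(raw[6:8], "big")
    return f"{admin}:{assigned}"


def mpls_view_of_vni(vni: int) -> tuple:
    """Return (label, bottom_of_stack) as an MPLS decoder would read a 24-bit VNI."""
    label = vni >> 4                  # upper 20 bits
    bottom_of_stack = bool(vni & 0x1)  # lowest bit
    return label, bottom_of_stack


print(decode_type1_rd("0001c0a84d018009"))  # 192.168.77.1:32777
print(mpls_view_of_vni(10000))              # (625, False)
```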
Figure 1-4 summarizes the NVE peering from the Leaf-101 perspective. BGW-1 sends a BGP L2VPN EVPN Update that includes its system MAC address. This way the intra-site Leaf-101 learns the information needed for NVE peering. The shared NVE Anycast BGW address 192.168.88.12 is learned from the BGP L2VPN EVPN MAC Advertisement Route originated by BGW-3 and forwarded by both intra-site BGW switches. When the DC Core switch (Route Server) receives the Update message, it changes the AS part of the Route-Target (RT) to its own AS due to the rt-rewrite definition. When BGW-1 receives this update, it also modifies the RT Extended Community to its own AS and imports the NLRI information. When sending an Update to Leaf-101, BGW-1 sets the Next-Hop to 192.168.88.12, which is the shared Anycast BGW address. Based on RT 65012:10000, Leaf-101 is able to import the BGP L2VPN EVPN MAC Advertisement Route originated by BGW-3 and learn the IP address of the intra-site Anycast BGW from the Next-Hop field. This learning process is Control Plane learning and is also used for establishing NVE peering between intra-site BGW switches.
Figure 1-4: NVE peer learning process.
Example 1-7 shows that even though Leaf-101 has established an NVE peering with BGW-1, the tunnel is still unidirectional. BGW-1 only has NVE peerings with the fabric-internal BGW-2 and with BGW-3, but not with Leaf-101. This is because only BGWs advertise their system MAC addresses as Route-Type 2 MAC Advertisement Routes. A Leaf is a normal VTEP switch, so it does not advertise its system MAC.
BGW-1# sh nve peer control-plane
Interface Peer-IP         State LearnType Uptime   Router-Mac
--------- --------------- ----- --------- -------- -----------------
nve1      192.168.100.2   Up    CP        03:05:49 n/a
nve1      192.168.100.3   Up    CP        02:44:54 n/a
Example 1-7: show nve peers control-plane.
Now host Abba joins the network. It pings the Anycast GW used in VLAN 10. This way Leaf-101 learns the MAC address of host Abba and sends a BGP L2VPN EVPN Route-Type 2 advertisement to Route-Reflector Spine-11, which in turn forwards the message to the BGW switches (example 1-8). Example 1-9 shows that after receiving the BGP Update related to host Abba, BGW-1 has also established an NVE peering with Leaf-101. Now there is a bi-directional VXLAN tunnel between these two switches and data can flow over it.
BGW-1# sh bgp l2vpn evpn
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 29, Local Router ID is 192.168.77.1
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2
   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 192.168.77.1:27001 (ES [0300.0000.0000.0c00.0309 0])
*>l[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.1]/136
                      192.168.100.1                     100      32768 i
*>i[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136
                      192.168.100.2                     100          0 i
Route Distinguisher: 192.168.77.1:32777 (L2VNI 10000)
*>i[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
                      192.168.100.101                   100          0 i
*>l[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216
                      192.168.100.1                     100      32768 i
*>i[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
                      192.168.100.2                     100          0 i
*>e[2]:[0]:[0]:[48]:[5000.0004.0007]:[0]:[0.0.0.0]/216
                      192.168.100.3                                  0 65088 65034 i
*>i[2]:[0]:[0]:[48]:[1000.0010.abba]:[32]:[172.16.10.101]/272
                      192.168.100.101                   100          0 i
*>l[3]:[0]:[32]:[192.168.100.1]/88
                      192.168.100.1                     100      32768 i
*>e[3]:[0]:[32]:[192.168.100.3]/88
                      192.168.100.3                                  0 65088 65034 i
Route Distinguisher: 192.168.77.2:27001
*>i[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136
                      192.168.100.2                     100          0 i
Route Distinguisher: 192.168.77.2:32777
*>i[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
                      192.168.100.2                     100          0 i
Route Distinguisher: 192.168.77.3:32777
*>e[2]:[0]:[0]:[48]:[5000.0004.0007]:[0]:[0.0.0.0]/216
                      192.168.100.3                                  0 65088 65034 i
*>e[3]:[0]:[32]:[192.168.100.3]/88
                      192.168.100.3                                  0 65088 65034 i
Route Distinguisher: 192.168.77.101:32777
*>i[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
                      192.168.100.101                   100          0 i
*>i[2]:[0]:[0]:[48]:[1000.0010.abba]:[32]:[172.16.10.101]/272
                      192.168.100.101                   100          0 i
Example 1-8: sh bgp l2vpn evpn
BGW-1# sh nve peers control-plane
Interface Peer-IP         State LearnType Uptime   Router-Mac
--------- --------------- ----- --------- -------- -----------------
nve1      192.168.100.2   Up    CP        03:14:42 n/a
nve1      192.168.100.3   Up    CP        02:53:47 n/a
nve1      192.168.100.101 Up    CP        00:01:07 n/a
Example 1-9: show nve peers control-plane.
Summary
In conclusion, the Leaf-to-BGW NVE peering within a site is based on the information carried within the auto-generated Route-Type 2 advertisement that describes the system MAC address of the BGW. The NVE peering from BGW to Leaf is based on the information carried within the first Route-Type 2 MAC Advertisement Route that describes one of the hosts behind the Leaf switch. The result is a bi-directional VXLAN tunnel.
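The two-step peer learning summarised above can be modelled in a few lines. This is a deliberately simplified sketch, assuming that a VTEP adds the next hop of any imported Route-Type 2 as an NVE peer; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vtep:
    name: str
    pip: str
    nve_peers: set = field(default_factory=set)

    def receive_type2(self, next_hop: str) -> None:
        """Control Plane learning: the next hop of a Route-Type 2 becomes a peer."""
        if next_hop != self.pip:
            self.nve_peers.add(next_hop)


leaf = Vtep("Leaf-101", "192.168.100.101")
bgw = Vtep("BGW-1", "192.168.100.1")

# Step 1: BGW-1 advertises its system MAC with its PIP as the next hop.
leaf.receive_type2(bgw.pip)
unidirectional = bgw.pip in leaf.nve_peers and leaf.pip not in bgw.nve_peers

# Step 2: host Abba comes up and Leaf-101 advertises the host MAC.
bgw.receive_type2(leaf.pip)
bidirectional = bgw.pip in leaf.nve_peers and leaf.pip in bgw.nve_peers

print(unidirectional, bidirectional)  # True True
```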
Shared Common EVPN Domain Connections
Figure 1-5 illustrates the overall topology and the eBGP peering between the Border Gateways and the DC Core switch. For simplicity, only one DC Core switch is used. The DC Core switch has its own dedicated BGP AS 65088, meaning that external BGP peering is used. The DC Core switch also has the role of Route Server (RS). In this example, the RS is in the data path, but in real-life scenarios it does not have to be. The complete configuration of the RS can be found in Appendix A at the end of this chapter. The DC Core switch and all three BGW switches belong to the Common EVPN Domain used for Datacenter Interconnect (DCI). This means that each BGW belongs to both the intra-site EVPN Domain and the Common EVPN Domain. All Unicast and Multicast traffic from one site to another goes through the BGWs.
The eBGP IPv4 unicast afi is used for IPv4 NLRI exchange in the Underlay Network. BGW switches advertise their unique NVE interface IP address (PIP) and the shared Virtual IP address (VIP), as well as the IP address of the external interface connected to the DC Core switch, which in turn forwards these BGP Updates to the other site. PIP addresses are used as the outer IP header source and destination addresses when BUM traffic is sent over an Ingress-Replication tunnel. The VIP address, in turn, is used in the outer IP header for Unicast traffic. Physical interface IP addresses (Underlay Network) are used for the recursive route lookup to find the next hop for the PIP/VIP address.
The eBGP L2VPN EVPN afi is used for exchanging EVPN NLRI in the Overlay Network. The information includes the NLRI of intra-site host MAC and MAC/IP information (Route-Type 2) and IP Prefix information (Route-Type 5). Note that BGW switches also exchange their system MAC address information by using Route-Type 2 MAC Advertisement Routes for NVE peering. In addition, BGW switches advertise Inclusive Multicast Routes (Route-Type 3) to exchange NLRI information concerning the Ingress-Replication tunnel. BGW switches also advertise Ethernet Segment Routes (Route-Type 4), used for intra-site BGW DF election, over the Common EVPN Domain, but those are ignored by the remote BGW switches because they do not match the route-target import policy.
Figure 1-5: Common EVPN Domain Underlay and Overlay eBGP peering.
Border Gateway setup
This section explains the Border Gateway configuration.
Define Site-Id
Configure the device role as an EVPN Multi-Site Border Gateway and assign a Site-Id to it. The Site-Id has to be identical on all BGWs belonging to the local site. Optionally, the advertisement of the shared Virtual IP address can be delayed after recovery. This way the Underlay and Overlay Network Control Plane protocols of the BGW switch have sufficient time to do their job, such as building BGP peerings and establishing both VXLAN and Ingress-Replication tunnels, before the switch introduces itself as a possible next hop for inter-site destinations by advertising the Virtual IP address.
evpn multisite border-gateway 12
  delay-restore time 300
Example 1-10: enabling EVPN MS BGW on BGW1.
Define source IP for VIP under NVE Interface and BUM method for DCI
The NVE interface of BGW-1 uses IP address 192.168.100.1 (Loopback 100) as the Physical IP (PIP) and IP address 192.168.88.12 (Loopback 88) as the Virtual IP address (VIP). Ingress-Replication is used for inter-site BUM traffic for VNI 10000, while intra-site BUM traffic uses Multicast.
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback100
  multisite border-gateway interface loopback88
  member vni 10000
    multisite ingress-replication
    mcast-group 238.0.0.10
  member vni 10077 associate-vrf
Example 1-11: configuring NVE interface on BGW1.
Configure BGP Peering and redistribution
Configure eBGP IPv4 Unicast afi peering for the Underlay Network between the physical link interface IP addresses. Advertise the BGW switch Loopback interface IP addresses used by the NVE interface (VIP/PIP) and by the BGP peering to the DC Core switch. Configure eBGP L2VPN EVPN afi peering for the Overlay Network between the Loopback 77 IP addresses and define the peer-type as fabric-external. When eBGP peering is configured between Loopback interfaces, the TTL value also needs to be adjusted by using the “ebgp-multihop <value>” command. In this example scenario, the value is set to five.
BGP L2VPN EVPN Updates sent by BGW-1 carry a site-specific Route-Target Extended Community per VNI. This community uses the format BGP-AS:VNI-Id. This is why the command “rewrite-evpn-rt-asn” is configured under the L2VPN EVPN address-family. It rewrites the AS part of the Route-Target from the received AS number to the local AS number. When BGW-1 sends an eBGP L2VPN EVPN Update to the DC Core switch, the original RT for VNI 10000 is 65012:10000. When the DC Core switch receives the update message, it changes the RT to 65088:10000. It uses this RT value when sending the update message to BGW-3 on the other site. When BGW-3 receives the update message, it changes the RT to 65034:10000 before installing it into the Adj-RIB-In. This way it is able to import the NLRI information carried in the update originated by the remote-site BGW. Also adjust the BGP maximum paths for load balancing.
router bgp 65012
  router-id 192.168.77.1
  no enforce-first-as
  address-family ipv4 unicast
    redistribute direct route-map REDIST-TO-SITE-EXT-DCI
  address-family l2vpn evpn
    maximum-paths 2
    maximum-paths ibgp 2
  neighbor 10.1.88.88
    remote-as 65088
    update-source Ethernet1/2
    address-family ipv4 unicast
  neighbor 192.168.77.11
    remote-as 65012
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  neighbor 192.168.77.88
    remote-as 65088
    update-source loopback77
    ebgp-multihop 5
    peer-type fabric-external
    address-family l2vpn evpn
      send-community
      send-community extended
      rewrite-evpn-rt-asn
!
route-map REDIST-TO-SITE-EXT-DCI permit 10
  match tag 1234
!
interface loopback88
  description ** VIP for DCI-Inter-connect **
  ip address 192.168.88.12/32 tag 1234
  ip router ospf UNDERLAY-NET area 0.0.0.0
Example 1-12: configuring eBGP IPv4 Unicast and L2VPN EVPN peering and route redistribution on BGW1.
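The Route-Target rewrite chain described above (65012:10000 to 65088:10000 to 65034:10000) can be traced with a short sketch. The function only mirrors the behaviour described in the text for the “rewrite-evpn-rt-asn” command; it is a hypothetical model, not the NX-OS implementation.

```python
def rewrite_evpn_rt_asn(route_target: str, local_as: int) -> str:
    """Replace the AS part of an 'AS:VNI' Route-Target with the local AS."""
    asn, _, value = route_target.partition(":")
    if not asn.isdigit() or not value:
        raise ValueError(f"unexpected Route-Target format: {route_target!r}")
    return f"{local_as}:{value}"


rt = "65012:10000"  # RT attached by BGW-1 for VNI 10000
seen = []
for receiver_as in (65088, 65034):  # DC Core switch, then BGW-3
    rt = rewrite_evpn_rt_asn(rt, receiver_as)
    seen.append(rt)

print(seen)  # ['65088:10000', '65034:10000']
```

After the second rewrite the RT matches the import policy of site 34, which is why BGW-3 can import the route.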
Configure DCI and Fabric Interface Tracking
The BGW sits on the border between the intra-site EVPN Domain (Fabric EVPN) and the Common EVPN Domain (DCI). All inter-site traffic goes through the BGW switches, so it is extremely important to have a mechanism for tracking the state of both Fabric and DCI interfaces. The configuration is shown in the example below. Link failure events are discussed in detail in the “Failure Scenario” section.
interface Ethernet1/1
  description **Fabric Internal **
  no switchport
  mtu 9216
  mac-address b063.0001.1e11
  medium p2p
  ip address 10.1.11.1/24
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  evpn multisite fabric-tracking
  no shutdown
!
interface Ethernet1/2
  description ** DCI Interface **
  no switchport
  mtu 9216
  mac-address b063.0001.1e12
  medium p2p
  ip address 10.1.88.1/24 tag 1234
  ip pim sparse-mode
  evpn multisite dci-tracking
  no shutdown
Example 1-13: DCI and fabric interface tracking on BGW1.
This is the basic EVPN Multi-Site related configuration. The complete configuration of all BGW switches and the DC Core switch can be found in Appendix A at the end of this chapter.
BGP peering Verification on BGW
Example 1-14 shows that BGW-1 has established an iBGP L2VPN EVPN session with 192.168.77.11 (Spine-11), from which it has received three Route-Type 2 routes (MAC Advertisement Route) and one Route-Type 4 route (Ethernet Segment Route). It has also established an eBGP L2VPN EVPN session with 192.168.77.88 (DC Core switch), from which it has received one Route-Type 2 route (MAC Advertisement Route) and one Route-Type 3 route (Inclusive Multicast Ethernet Tag Route).
BGW-1# sh bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 192.168.77.1, local AS number 65012
BGP table version is 251, L2VPN EVPN config peers 2, capable peers 2
15 network entries and 15 paths using 2616 bytes of memory
BGP attribute entries [11/1804], BGP AS path entries [1/10]
BGP community entries [0/0], BGP clusterlist entries [2/8]
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.77.11   4 65012     609     486      251    0    0 07:14:11 4
192.168.77.88   4 65088     487     531      251    0    0 07:12:39 2
Neighbor        T    AS PfxRcd     Type-2     Type-3     Type-4     Type-5
192.168.77.11   I 65012 4          3          0          1          0
192.168.77.88   E 65088 2          1          1          0          0
Example 1-14: sh bgp l2vpn evpn summary on BGW1.
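When reading such summaries from a script, a quick sanity check is that the per-route-type counters add up to the received prefix count. The sketch below parses the two rows of the second table of Example 1-14; the assumption that the four type columns always sum to PfxRcd is taken from this example only, and the function names are hypothetical.

```python
def parse_type_row(line: str) -> dict:
    """Split one row of the per-route-type table into named fields."""
    neighbor, peer_type, asn, pfx_rcd, *per_type = line.split()
    if len(per_type) != 4:
        raise ValueError(f"expected four route-type columns: {line!r}")
    counters = dict(zip(("type2", "type3", "type4", "type5"), map(int, per_type)))
    return {"neighbor": neighbor, "peer_type": peer_type,
            "as": int(asn), "pfx_rcd": int(pfx_rcd), **counters}


def counters_consistent(row: dict) -> bool:
    """True when Type-2..Type-5 add up to PfxRcd."""
    return row["pfx_rcd"] == sum(row[f"type{n}"] for n in (2, 3, 4, 5))


ROWS = [
    "192.168.77.11 I 65012 4 3 0 1 0",  # iBGP session to Spine-11
    "192.168.77.88 E 65088 2 1 1 0 0",  # eBGP session to the DC Core switch
]
for line in ROWS:
    row = parse_type_row(line)
    print(row["neighbor"], counters_consistent(row))
# 192.168.77.11 True
# 192.168.77.88 True
```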
NVE peering Verification on BGW1
Example 1-15 shows that BGW-1 has established NVE peerings with the intra-site BGW-2 and Leaf-101. The Peer Location shown in the output verifies that these are fabric peers. In addition, BGW-1 has established an NVE peering with BGW-3, whose location is described as DCI. The NVE peering process between inter-site BGW switches uses the same mechanism as the NVE peering between intra-site BGW switches. The trigger for the NVE peer learning process is the auto-generated system MAC address advertisement (Route-Type 2, MAC Advertisement Route).
BGW-1# sh nve peers detail
Details of nve Peers:
----------------------------------------
Peer-Ip: 192.168.100.2
NVE Interface : nve1
Peer State : Up
Peer Uptime : 08:13:09
Router-Mac : n/a
Peer First VNI : 10000
Time since Create : 08:13:09
Configured VNIs : 10000,10077
Provision State : peer-add-complete
Learnt CP VNIs : 10000
vni assignment mode : SYMMETRIC
Peer Location : FABRIC
Peer-Ip: 192.168.100.3
NVE Interface : nve1
Peer State : Up
Peer Uptime : 07:52:14
Router-Mac : n/a
Peer First VNI : 10000
Time since Create : 07:52:14
Configured VNIs : 10000,10077
Provision State : peer-add-complete
Learnt CP VNIs : 10000
vni assignment mode : SYMMETRIC
Peer Location : DCI
Peer-Ip: 192.168.100.101
NVE Interface : nve1
Peer State : Up
Peer Uptime : 04:59:34
Router-Mac : n/a
Peer First VNI : 10000
Time since Create : 04:59:34
Configured VNIs : 10000,10077
Provision State : peer-add-complete
Learnt CP VNIs : 10000
vni assignment mode : SYMMETRIC
Peer Location : FABRIC
Example 1-15: sh nve peers detail on BGW1.
Example 1-16, taken from BGW-1, shows the BGW NVE related information. Among other things, the output shows the NVE source interface, whose address is the Physical IP address (PIP), and the shared Virtual IP address (VIP) used as a next hop for ingress and egress inter-site traffic. Note that the operational state of the VIP interface (Loopback88) is “down”. This is because the output was taken when there was neither IP connectivity nor NVE peering between BGW-1 and BGW-2 (Spine-11 was turned off).
BGW-1# sh nve interface nve1 detail
Interface: nve1, State: Up, encapsulation: VXLAN
VPC Capability: VPC-VIP-Only [not-notified]
Local Router MAC: 5000.0002.0007
Host Learning Mode: Control-Plane
Source-Interface: loopback100 (primary: 192.168.100.1, secondary: 0.0.0.0)
Source Interface State: Up
Virtual RMAC Advertisement: No
NVE Flags:
Interface Handle: 0x49000001
Source Interface hold-down-time: 180
Source Interface hold-up-time: 30
Remaining hold-down time: 0 seconds
Virtual Router MAC: N/A
Virtual Router MAC Re-origination: 0200.c0a8.580c
Interface state: nve-intf-add-complete
Multisite delay-restore time: 300 seconds
Multisite delay-restore time left: 22 seconds
Multisite bgw-if: loopback88 (ip: 192.168.88.12, admin: Up, oper: Down)
Multisite bgw-if oper down reason:
Example 1-16: sh nve interface nve1 detail on BGW1.
When Spine-11 boots up and IP connectivity and NVE peering are established
between BGW-1 and BGW-2, the operational state of the Loopback88 interface on
BGW-1 changes to Up.
BGW-1# sh nve interface nve1 detail | i bgw-if
Multisite bgw-if: loopback88 (ip: 192.168.88.12, admin: Up, oper: Up)
Multisite bgw-if oper down reason:
Example 1-17: sh nve interface nve1 detail on BGW-1.
BGP NLRI information verification.
Host Abba, connected to Leaf-101, joins the network. It pings the Anycast-GW
IP address 172.16.10.1 (SVI for VLAN 10). Leaf-101 learns the MAC address from
the ingress frame and stores it in the MAC address table and in the L2RIB of
the MAC-VRF. From the L2RIB, the MAC address information is exported into the
BGP Loc-RIB and sent through the Adj-RIB-Out to Spine-11. The BGP
Route-Reflector Spine-11 forwards the BGP Update to both BGW-1 and BGW-2. After
local processing, the BGW switches forward the BGP Update to the DC Core
switch. The example below shows that the DC Core switch has learned the MAC
address 1000.0010.abba of the host from both BGW-1 and BGW-2 with the same
next-hop address 192.168.88.12 (VIP).
Route Distinguisher:
192.168.77.101:32777
*>e[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
192.168.88.12 2000 0 65012 i
* e 192.168.88.12 2000 0 65012 i
e[2]:[0]:[0]:[48]:[1000.0010.abba]:[32]:[172.16.10.101]/248
192.168.88.12 2000 0 65012 i
e
192.168.88.12 2000 0 65012 i
Example 1-18: sh bgp l2vpn evpn summary on DC Core switch (RouteServer).
The example below shows that the DC Core switch has installed a route to
192.168.88.12 into the RIB from the BGP Loc-RIB with two equal-cost next-hop IP
addresses (Underlay Network addresses). It uses both of them for ECMP
load-balancing toward the destination.
RouteServer-1# sh ip route 192.168.88.12
<snipped>
192.168.88.12/32, ubest/mbest: 2/0
*via 10.1.88.1, [20/0], 00:00:04, bgp-65088, external, tag 65012
*via 10.2.88.2, [20/0], 00:00:04, bgp-65088, external, tag 65012
Example 1-19: sh ip route 192.168.88.12 on DC Core switch.
The example below shows that BGW-3 has received the BGP Update carrying the
MAC address information of host Abba from the DC Core switch. BGW-3 has changed
the AS part of the Route-Target to its own BGP AS number before importing the
route from the Adj-RIB-In (pre) into the Adj-RIB-In (post). From the
Adj-RIB-In (post), the route is imported into the Loc-RIB.
BGW-3# show bgp l2vpn evpn 1000.0010.abba
BGP routing table information for VRF
default, address family L2VPN EVPN
!----------------< COMMENT: This
entry is in BGP Loc-RIB >-------------
Route
Distinguisher: 192.168.77.3:32777
(L2VNI 10000)
BGP routing
table entry for [2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216, version
289
Paths: (1 available, best #1)
Flags: (0x000212) (high32 00000000) on
xmit-list, is in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: external, path is
valid, is best path, no labeled nexthop, in rib
Imported from
192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
AS-Path: 65088 65012 , path sourced external to AS
192.168.88.12 (metric 0) from 192.168.77.88
(192.168.77.88)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10000
Extcommunity: RT:65034:10000 ENCAP:8
Path-id 1 not advertised to any peer
!----------------< COMMENT: This
entry is in BGP Adj-RIB-In >-------------
Route
Distinguisher: 192.168.77.101:32777
BGP routing
table entry for [2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216, version
288
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on
xmit-list, is not in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: external, path is
valid, is best path, no labeled nexthop
Imported to 1 destination(s)
AS-Path: 65088 65012 , path sourced external to AS
192.168.88.12 (metric 0) from 192.168.77.88
(192.168.77.88)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10000
Extcommunity: RT:65034:10000 ENCAP:8
Path-id 1 not advertised to any peer
Example 1-20: show bgp l2vpn evpn 1000.0010.abba on BGW-3.
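The Route-Target rewrite seen in Example 1-20 can be illustrated with a short Python sketch. It only mimics the visible effect on the "RT:<AS>:<VNI>" string and is not how NX-OS implements the rewrite. The function name is hypothetical, and the input value RT:65012:10000 assumes that the originating site (AS 65012) attached an RT built from its own AS number and the VNI.

```python
def rewrite_route_target(rt: str, local_as: int) -> str:
    """Return the route-target with its AS part replaced by local_as.

    Expects the textual form used in the CLI output: 'RT:<as>:<vni>'.
    """
    parts = rt.split(":")
    if len(parts) != 3 or parts[0] != "RT":
        raise ValueError(f"not an RT:<as>:<vni> string: {rt!r}")
    _prefix, _remote_as, vni = parts
    return f"RT:{local_as}:{vni}"


# Assumed input: RT attached in AS 65012; BGW-3 sits in AS 65034.
rewritten = rewrite_route_target("RT:65012:10000", 65034)
print(rewritten)
```

With these inputs the result matches the Extcommunity value RT:65034:10000 shown on BGW-3.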
L2RIB Verification on remote BGW
BGW-3 has installed the MAC address information from the BGP Loc-RIB into the
L2RIB.
BGW-3# show l2route mac all
Flags -(Rmac):Router MAC (Stt):Static (L):Local
(R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv
(AD):Auto-Delete (D):Del Pending
(S):Stale
(C):Clear, (Ps):Peer Sync (O):Re-Originated
(Nho):NH-Override
(Pf):Permanently-Frozen,
(Orp): Orphan
Topology Mac Address Prod
Flags Seq No Next-Hops
----------- -------------- ------
------------- ---------- ----------------
10 1000.0010.abba BGP Rcv 0 192.168.88.12
10 5000.0004.0007 VXLAN Stt,Nho,
0 192.168.100.3
Example 1-21: show l2route mac all on BGW3.
MAC Address-Table Verification on remote BGW switch
BGW-3 has also installed the MAC information into the MAC address table. The
L2RIB and the MAC address table hold almost identical information. The
difference between the two tables lies in their usage: the Data Plane uses the
MAC address table for switching, while the Control Plane uses the L2RIB for
exporting and importing information to and from the BGP process.
BGW-3# show system internal l2fwder mac
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC
VLAN MAC Address Type
age Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
G
- b063:0003:1e12 static
- F F
sup-eth1(R)
G
- b063:0003:1e11 static
- F F
sup-eth1(R)
* 10
1000.0010.abba static -
F F nve-peer2 192.168.88.12
G
- b063:0003:1e14 static
- F F
sup-eth1(R)
G
- b063:0003:1e13 static
- F F
sup-eth1(R)
G
- 0200:c0a8:5822 static
- F F
sup-eth1(R)
1 1 -00:01:00:01:00:01 - 1
Example 1-22: show system internal l2fwder mac on BGW3.
BGP NLRI Next-Hop verification on remote BGW
When BGW-3 forwards a BGP Update message received from the Common EVPN Domain
to intra-site devices, it changes the next-hop IP address to its VIP address
(even though it is the only BGW on the site).
Leaf-102# sh bgp l2vpn evpn
Route
Distinguisher: 192.168.77.101:32777
*>i[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
192.168.88.34 100 0 65088 65012 i
Route
Distinguisher: 192.168.77.102:32777
(L2VNI 10000)
*>i[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
192.168.88.34 100 0 65088 65012 i
Leaf-102#
Example 1-23: show bgp l2vpn evpn on Leaf-102.
BGW-1 and BGW-2 do the same. They both change the next-hop address to the
shared VIP address when sending BGP L2VPN EVPN Updates received from the
Common EVPN Domain to intra-site devices.
Leaf-101# sh bgp l2vpn evpn
<snipped>
Route Distinguisher:
192.168.77.102:32777
*>i[2]:[0]:[0]:[48]:[1000.0010.beef]:[0]:[0.0.0.0]/216
192.168.88.12 100 0 65088 65034 i
Leaf-101#
Example 1-24: show bgp l2vpn evpn on Leaf-101.
A simple ping test verifies that there is connectivity between host Beef
connected to Leaf-102 and host Abba connected to Leaf-101.
Beef#ping 172.16.10.101
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to
172.16.10.101, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5),
round-trip min/avg/max = 96/137/224 ms
Example 1-25: ping from host Beef to host Abba within VLAN 10.
Multi-Destination traffic forwarding
There are two important considerations related to inter-site BUM traffic.
First, if a site has more than one BGW switch, the role of Designated
Forwarder (DF) is elected per VLAN/VNI among all intra-site BGW switches. The
DF is the switch that is responsible for forwarding inter-site ingress/egress
BUM traffic. Second, once the DF election is done, the BGW switches need to
know to whom they should forward inter-site BUM traffic over the Common EVPN
Domain. This means that the BGWs in each location need to build a
Multi-Destination Tree between themselves. The next two sections explain the
DF election process, which uses BGP L2VPN EVPN Route-Type 4 (Ethernet Segment
Route), and the Multi-Destination Tree building process, which uses BGP L2VPN
EVPN Route-Type 3 (Inclusive Multicast Route) for finding Ingress-Replication
peers.
Designated Forwarder
BGW switches send a BGP L2VPN EVPN Route-Type 4 (Ethernet Segment Route)
update to all of their BGP L2VPN EVPN peers. Switches use this information for
selecting the DF per VLAN/VNI. The first part of the NLRI, [4], describes the
EVPN Route-Type. The second part, [0300.0000.0000.0c00.0309], is the ESI. It
consists of the ESI Type (03 = MAC-based ESI), the ESI system MAC
0000.0000.000c (formed from Site-Id 12 = hex 0c), and the auto-generated ESI
local discriminator 000309. The value [32] is the length of the IP address that
follows. The IP address [192.168.100.1] is the sender's IP address, which is
used in the DF election process (explained later). In addition, the Update
message carries an ES-Import Route-Target Extended Community BGP Path
Attribute that is generated automatically based on the local Site-Id
(0c = 12). Only the intra-site BGW switches import these Updates.
All BGP L2VPN EVPN peers receive the Route-Type 4 BGP Update. Leaf-101
(through the Route-Reflector Spine-11) and BGW-3 also receive it, but they
ignore it because they do not have a matching import clause for the RT.
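The ESI layout described above can be checked with a small Python sketch. It builds the 10-byte ESI from the Site-Id and splits a dotted ESI string back into its parts. The function names are hypothetical. The sketch assumes that the Site-Id fills the whole 6-byte system MAC field, and it takes the auto-generated discriminator as a given input because the text does not describe how it is derived.

```python
def format_dotted(raw: bytes) -> str:
    """Format bytes as dot-separated groups of four hex digits."""
    hex_str = raw.hex()
    return ".".join(hex_str[i:i + 4] for i in range(0, len(hex_str), 4))


def build_multisite_esi(site_id: int, discriminator: bytes) -> bytes:
    """Type 3 (MAC-based) ESI: 1-byte type, 6-byte system MAC, 3-byte discriminator."""
    if not 0 <= site_id < 2 ** 48:
        raise ValueError("site_id must fit into 6 bytes")
    if len(discriminator) != 3:
        raise ValueError("discriminator must be exactly 3 bytes")
    return bytes([0x03]) + site_id.to_bytes(6, "big") + discriminator


def parse_esi(dotted: str) -> dict:
    """Split a dotted 10-byte ESI into type, system MAC and discriminator."""
    raw = bytes.fromhex(dotted.replace(".", ""))
    if len(raw) != 10:
        raise ValueError("an ESI is exactly 10 bytes long")
    return {
        "esi_type": raw[0],
        "system_mac": format_dotted(raw[1:7]),
        "site_id": int.from_bytes(raw[1:7], "big"),
        "discriminator": raw[7:].hex(),
    }


# Site-Id 12 and the discriminator 000309 seen in the NLRI above.
esi = build_multisite_esi(12, bytes.fromhex("000309"))
print(format_dotted(esi))
print(parse_esi("0300.0000.0000.0c00.0309"))
```

The system MAC printed by parse_esi is the same value that appears in the ES-Import Route-Target (RT:0000.0000.000c).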
Figure 1-6: Route-Type 4 sent by BGW-1 and BGW-2.
Example 1-26 shows that BGW-1 has installed the Ethernet Segment Route learned
from its site-local peer BGW-2 into the BGP table. The Route-Target Extended
Community is based on the Site-Id, which means that only intra-site BGW
switches are able to import Ethernet Segment Routes from each other.
BGW-1# sh bgp l2vpn evpn route-type 4
BGP routing table information for VRF
default, address family L2VPN EVPN
Route Distinguisher:
192.168.77.1:27001 (ES [0300.0000.0000.0c00.0309 0])
BGP routing
table entry for [4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.1]/136,
version 7
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on
xmit-list, is not in l2rib/evpn
Multipath: eBGP iBGP
Advertised path-id 1
Path type: local, path is valid, is best
path, no labeled nexthop
AS-Path: NONE, path locally originated
192.168.100.1 (metric 0) from 0.0.0.0 (192.168.77.1)
Origin IGP, MED not set, localpref 100, weight 32768
Extcommunity: ENCAP:8 RT:0000.0000.000c
Path-id 1 advertised to peers:
192.168.77.11 192.168.77.88
BGP routing
table entry for [4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136,
version 9
Paths: (1 available, best #1)
Flags: (0x000012) (high32 00000000) on
xmit-list, is in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: internal, path is
valid, is best path, no labeled nexthop
Imported from
192.168.77.2:27001:[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136
AS-Path: NONE, path sourced internal to AS
192.168.100.2 (metric 81) from 192.168.77.11 (192.168.77.11)
Origin IGP, MED not set, localpref 100, weight 0
Extcommunity: ENCAP:8 RT:0000.0000.000c
Originator: 192.168.77.2 Cluster list: 192.168.77.11
Path-id 1 not advertised to any peer
Route Distinguisher: 192.168.77.2:27001
BGP routing table entry for
[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136, version 8
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on
xmit-list, is not in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: internal, path is
valid, is best path, no labeled nexthop
Imported to 1 destination(s)
AS-Path: NONE, path sourced internal
to AS
192.168.100.2 (metric 81) from 192.168.77.11 (192.168.77.11)
Origin IGP, MED not set, localpref 100, weight 0
Extcommunity: ENCAP:8 RT:0000.0000.000c
Originator: 192.168.77.2 Cluster list: 192.168.77.11
Path-id 1 advertised to peers:
192.168.77.88
Example 1-26: BGP L2VPN EVPN Ethernet Segment Route in BGW-1 BGP table.
Example 1-27 shows that BGW-2 has installed the Ethernet Segment Route learned
from its site-local peer BGW-1 into the BGP table.
BGW-2# sh bgp l2vpn evpn route-type 4
BGP routing table information for VRF
default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.1:27001
BGP routing
table entry for [4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.1]/136,
version 8
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on
xmit-list, is not in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: internal, path is
valid, is best path, no labeled nexthop
Imported to 1 destination(s)
AS-Path: NONE, path sourced internal
to AS
192.168.100.1 (metric 81) from 192.168.77.11 (192.168.77.11)
Origin IGP, MED not set, localpref 100, weight 0
Extcommunity: ENCAP:8 RT:0000.0000.000c
Originator: 192.168.77.1 Cluster list: 192.168.77.11
Path-id 1 advertised to peers:
192.168.77.88
Route Distinguisher:
192.168.77.2:27001 (ES
[0300.0000.0000.0c00.0309 0])
BGP routing
table entry for [4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.1]/136,
version 9
Paths: (1 available, best #1)
Flags: (0x000012) (high32 00000000) on
xmit-list, is in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: internal, path is
valid, is best path, no labeled nexthop
Imported from
192.168.77.1:27001:[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.1]/136
AS-Path: NONE, path sourced internal to AS
192.168.100.1 (metric 81) from 192.168.77.11 (192.168.77.11)
Origin IGP, MED not set, localpref 100, weight 0
Extcommunity: ENCAP:8 RT:0000.0000.000c
Originator: 192.168.77.1 Cluster list: 192.168.77.11
Path-id 1 not advertised to any peer
BGP routing table entry for
[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136, version 7
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on
xmit-list, is not in l2rib/evpn
Multipath: eBGP iBGP
Advertised path-id 1
Path type: local, path is valid, is best
path, no labeled nexthop
AS-Path: NONE, path locally originated
192.168.100.2 (metric 0) from 0.0.0.0 (192.168.77.2)
Origin IGP, MED not set, localpref 100, weight 32768
Extcommunity: ENCAP:8 RT:0000.0000.000c
Path-id 1 advertised to peers:
192.168.77.11 192.168.77.88
Example 1-27: BGP L2VPN EVPN Ethernet Segment Route in BGW-2 BGP table.
The capture below shows the BGP L2VPN EVPN Route-Type 4 (Ethernet
Segment Route) sent by BGW-1.
Border Gateway Protocol - UPDATE Message
Marker: ffffffffffffffffffffffffffffffff
Length: 93
Type: UPDATE Message (2)
Withdrawn Routes Length: 0
Total Path Attribute Length: 70
Path attributes
Path Attribute - ORIGIN: IGP
Path Attribute - AS_PATH: 65012
Path Attribute - EXTENDED_COMMUNITIES
Flags: 0xc0, Optional, Transitive,
Complete
Type Code: EXTENDED_COMMUNITIES
(16)
Length: 16
Carried extended
communities: (2 communities)
Encapsulation: VXLAN
Encapsulation [Transitive Opaque]
Type: Transitive Opaque
(0x03)
Subtype (Opaque):
Encapsulation (0x0c)
Tunnel type: VXLAN
Encapsulation (8)
ES Import: RT:
00:00:00:00:00:0c [Transitive EVPN]
Type: Transitive EVPN
(0x06)
Subtype (EVPN): ES Import
(0x02)
ES-Import Route Target:
00:00:00_00:00:0c (00:00:00:00:00:0c)
Path Attribute - MP_REACH_NLRI
Flags: 0x90, Optional,
Extended-Length, Non-transitive, Complete
Type Code: MP_REACH_NLRI (14)
Length: 34
Address family identifier (AFI):
Layer-2 VPN (25)
Subsequent address family
identifier (SAFI): EVPN (70)
Next hop network address (4 bytes)
Number of Subnetwork points of
attachment (SNPA): 0
Network layer reachability
information (25 bytes)
EVPN NLRI: Ethernet Segment
Route
Route Type: Ethernet
Segment Route (4)
Length: 23
Route Distinguisher:
0001c0a84d016979 (192.168.77.1:27001)
ESI: 00:00:00:00:00:0c,
Discriminator: 00 03
ESI Type: ESI MAC
address defined (3)
ESI system MAC:
00:00:00_00:00:0c (00:00:00:00:00:0c)
ESI system mac discriminator:
00 03
Remaining bytes: 09
IP Address Length: 32
IPv4 address: 192.168.100.1
Capture 1-2: BGP L2VPN EVPN Route-Type 4
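The Route Distinguisher in Capture 1-2 is shown both as raw hex (0001c0a84d016979) and in decoded form. The following Python sketch shows how the eight bytes map to the decoded value, assuming the Type 1 RD layout (2-byte type, 4-byte IPv4 address, 2-byte assigned number). The function name is hypothetical.

```python
import ipaddress
import struct


def decode_rd_type1(rd_hex: str) -> str:
    """Decode an 8-byte Type 1 Route Distinguisher into 'IPv4:number'."""
    raw = bytes.fromhex(rd_hex)
    if len(raw) != 8:
        raise ValueError("a Route Distinguisher is exactly 8 bytes long")
    # Network byte order: 2-byte type, 4-byte IPv4 address, 2-byte number.
    rd_type, ip_raw, number = struct.unpack("!H4sH", raw)
    if rd_type != 1:
        raise ValueError(f"unsupported RD type {rd_type}")
    return f"{ipaddress.IPv4Address(ip_raw)}:{number}"


print(decode_rd_type1("0001c0a84d016979"))
```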
BGW switches choose a Designated Forwarder (DF) among themselves to forward
BUM (Broadcast, Unknown Unicast, and Multicast) traffic to and from the
intra-site EVPN Domain. If the site has more than one VLAN, the DF roles are
load-balanced between the BGW nodes, i.e. the DF for VLAN 10 is BGW-1 and the
DF for VLANs 1 and 77 is BGW-2. The selection process uses the formula
“i = V mod N”, where V represents the VLAN Id and N represents the number of
BGW switches in the redundancy group. The “i” is the ordinal of a BGW switch
in the redundancy group. When BGW-1 and BGW-2 exchange BGP L2VPN EVPN
Route-Type 4 (Ethernet Segment Route) updates, their IP addresses are included
in the NLRI. Each switch sorts the IP addresses learned from the BGP Updates
in numerical order from lowest to highest. In the case of BGW-1 and BGW-2, the
order is 192.168.100.1, 192.168.100.2. The lowest IP address, 192.168.100.1,
gets ordinal zero (0), the next one gets ordinal one (1), and so on.
Formula to calculate the DF for VLAN 10:
V mod N = i
V = 10 (VLAN Id)
N = 2 (number of BGW switches)
10 mod 2 = 0 > BGW-1
(The remainder is zero (0) when 10 is divided by 2)
Ordinal zero is used by BGW-1, so it will be the DF for VLAN 10.
Formula to calculate the DF for VLAN 77:
V mod N = i
V = 77 (VLAN Id)
N = 2 (number of BGW switches)
77 mod 2 = 1 > BGW-2
(The remainder is one (1) when 77 is divided by 2)
Ordinal one is used by BGW-2, so it will be the DF for VLAN 77.
This procedure is the same as the one introduced in “EVPN ESI Multihoming -
Part I: EVPN Ethernet Segment (ES) DF election section”.
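The election described above can be sketched in a few lines of Python. The function name and the data layout are hypothetical; the sketch only reproduces the “i = V mod N” rule with the BGW addresses ordered numerically from lowest to highest, as described in the text.

```python
import ipaddress


def elect_df(vlan_id: int, bgw_ips: list[str]) -> str:
    """Return the IP address of the DF for vlan_id among bgw_ips."""
    if not bgw_ips:
        raise ValueError("at least one BGW address is required")
    # Sort numerically (not as strings); the lowest address gets ordinal 0.
    ordered = sorted(bgw_ips, key=ipaddress.IPv4Address)
    ordinal = vlan_id % len(ordered)
    return ordered[ordinal]


# Addresses learned from the Route-Type 4 NLRIs of BGW-1 and BGW-2.
bgws = ["192.168.100.2", "192.168.100.1"]
for vlan in (1, 10, 77):
    print(vlan, elect_df(vlan, bgws))
```

Because every BGW sorts the same address list and applies the same formula, each switch reaches the same result independently, without any additional negotiation.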
Example 1-28 shows that BGW-1 is the DF for VLAN 10, and Example 1-29 shows
that BGW-2 is the DF for VLANs 1 and 77.
BGW-1# sh nve ethernet-segment
ESI: 0300.0000.0000.0c00.0309
Parent interface: nve1
ES State: Up
Port-channel state: N/A
NVE Interface: nve1
NVE State: Up
Host Learning Mode: control-plane
Active Vlans: 1,10,77
DF Vlans: 10
Active VNIs: 10000
CC failed for VLANs:
VLAN CC timer: 0
Number of ES members: 2
My ordinal: 0
DF timer start time: 00:00:00
Config State: N/A
DF List: 192.168.100.1 192.168.100.2
ES route added to L2RIB: True
EAD/ES routes added to L2RIB: False
EAD/EVI route timer age: not running
Example 1-28: DF election verification on BGW-1.
BGW-2# sh nve ethernet-segment
ESI: 0300.0000.0000.0c00.0309
Parent interface: nve1
ES State: Up
Port-channel state: N/A
NVE Interface: nve1
NVE State: Up
Host Learning Mode: control-plane
Active Vlans: 1,10,77
DF Vlans: 1,77
Active VNIs: 10000
CC failed for VLANs:
VLAN CC timer: 0
Number of ES members: 2
My ordinal: 1
DF timer start time: 00:00:00
Config State: N/A
DF List: 192.168.100.1 192.168.100.2
ES route added to L2RIB: True
EAD/ES routes added to L2RIB: False
EAD/EVI route timer age: not running
----------------------------------------
Example 1-29: DF election verification on BGW-2.
Ingress-Replication
In order to forward inter-site Multi-Destination traffic, BGW switches form a
Multi-Destination tree with the remote-site BGW switches. Switches use BGP
L2VPN EVPN Route-Type 3 (Inclusive Multicast Route) to advertise their
Tunnel-Id together with the VNI and the tunnel type, which is
Ingress-Replication. By using this information, switches are able to form the
Multi-Destination tree over a Unicast-only Underlay Network.
In figure 1-6, BGW-1 sends a BGP L2VPN EVPN Update to BGW-3. The EVPN NLRI
describes the Route-Type (Inclusive Multicast Route) and the sender IP address
(192.168.100.1). The PMSI Tunnel Attribute describes the tunnel type
(Ingress-Replication), the VNI whose BUM traffic should be sent over the
tunnel, and the Tunnel-Id used by BGW-1. This attribute is discussed in a
later section. The Route-Target Extended Community is set based on local
values (65012:10000), which the receiving switches change to correspond to
their own AS:VNI. The same process applies to BGW-2 and BGW-3.
Note that the DC Core switch does not forward the BGP L2VPN EVPN Inclusive
Multicast Route sent by BGW-1 to BGW-2 because both use the same AS number.
This is the normal BGP loop prevention mechanism. Also, the BGP L2VPN EVPN
Inclusive Multicast Route is only sent out of the DCI interface, and a
receiving BGW switch does not forward it to local BGP speakers.
Figure 1-6: BGP L2VPN EVPN Route-Type 3 (Inclusive Multicast Ethernet Tag).
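How a BGW could turn the received Route-Type 3 information into a per-VNI list of Ingress-Replication peers can be sketched as follows. This is a minimal illustration under stated assumptions, not the NX-OS implementation: the function name and the dictionary keys (vni, tunnel_type, tunnel_id) are hypothetical, and the input lists simply mirror what BGW-1 and BGW-3 hold according to the text.

```python
from collections import defaultdict


def build_flood_lists(imet_routes: list[dict], local_ip: str) -> dict[int, list[str]]:
    """Map each VNI to the remote tunnel endpoints that receive BUM copies."""
    flood = defaultdict(set)
    for route in imet_routes:
        if route["tunnel_type"] != "ingress-replication":
            continue  # other tunnel types are outside this sketch
        if route["tunnel_id"] == local_ip:
            continue  # the locally originated route is not a peer
        flood[route["vni"]].add(route["tunnel_id"])
    return {vni: sorted(peers) for vni, peers in flood.items()}


def imet(tunnel_id: str) -> dict:
    """Helper that builds one Route-Type 3 record for VNI 10000."""
    return {"vni": 10000, "tunnel_type": "ingress-replication", "tunnel_id": tunnel_id}


# BGW-1 holds its own route and the one from BGW-3 (not the one from BGW-2).
bgw1_view = [imet("192.168.100.1"), imet("192.168.100.3")]
# BGW-3 holds its own route and the routes from BGW-1 and BGW-2.
bgw3_view = [imet("192.168.100.1"), imet("192.168.100.2"), imet("192.168.100.3")]

print(build_flood_lists(bgw1_view, "192.168.100.1"))
print(build_flood_lists(bgw3_view, "192.168.100.3"))
```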
Example 1-30 shows that BGW-1 has received the BGP L2VPN EVPN Route-Type 3
NLRI information originated by BGW-3.
BGW-1# sh bgp l2vpn evpn route-type 3
!---------------< Comment: This is
the local information advertise to BGW-3 >---------------
BGP routing table information for VRF
default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.1:32777 (L2VNI 10000)
BGP routing table entry for
[3]:[0]:[32]:[192.168.100.1]/88, version 3
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on
xmit-list, is not in l2rib/evpn
Multipath: eBGP iBGP
Advertised path-id 1
Path type: local, path is valid, is best
path, no labeled nexthop
AS-Path: NONE, path locally originated
192.168.100.1 (metric 0) from 0.0.0.0 (192.168.77.1)
Origin IGP, MED not set, localpref 100, weight 32768
Origin flag 0x2
Extcommunity: RT:65012:10000 ENCAP:8
PMSI Tunnel Attribute:
flags: 0x00, Tunnel type: Ingress Replication
Label: 10000, Tunnel Id: 192.168.100.1
Path-id 1 advertised to peers:
192.168.77.88
!---------------< Comment: This is
the information installed into BGP Loc-RIB ---------->
BGP routing
table entry for [3]:[0]:[32]:[192.168.100.3]/88, version 26
Paths: (1 available, best #1)
Flags: (0x000012) (high32 00000000) on
xmit-list, is in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: external, path is
valid, is best path, no labeled nexthop
Imported from
192.168.77.3:32777:[3]:[0]:[32]:[192.168.100.3]/88
AS-Path: 65088 65034 , path sourced external to AS
192.168.100.3 (metric 0) from 192.168.77.88 (192.168.77.88)
Origin IGP, MED not set, localpref 100, weight 0
Extcommunity: RT:65012:10000 ENCAP:8
PMSI Tunnel Attribute:
flags: 0x00, Tunnel type: Ingress
Replication
Label: 10000, Tunnel Id: 192.168.100.3
Path-id 1 not advertised to any peer
!-----------< Comment: This is the
information installed into Adj-RIB-In >-------
Route Distinguisher: 192.168.77.3:32777
BGP routing table entry for
[3]:[0]:[32]:[192.168.100.3]/88, version 24
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on
xmit-list, is not in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: external, path is
valid, is best path, no labeled nexthop
Imported to 1 destination(s)
AS-Path: 65088 65034 , path sourced external to AS
192.168.100.3 (metric 0) from 192.168.77.88 (192.168.77.88)
Origin IGP, MED not set, localpref 100, weight 0
Extcommunity: RT:65012:10000 ENCAP:8
PMSI Tunnel Attribute:
flags: 0x00, Tunnel type: Ingress Replication
Label: 10000, Tunnel Id: 192.168.100.3
Path-id 1 not advertised to any peer
Example 1-30: sh bgp l2vpn evpn route-type 3 on BGW-1.
Example 1-31 shows that BGW-2 has also received the BGP L2VPN EVPN Route-Type 3
NLRI information originated by BGW-3.
BGW-2# sh bgp l2vpn evpn route-type 3
BGP routing table information for VRF
default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.2:32777 (L2VNI 10000)
BGP routing table entry for
[3]:[0]:[32]:[192.168.100.2]/88, version 3
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on
xmit-list, is not in l2rib/evpn
Multipath: eBGP iBGP
Advertised path-id 1
Path type: local, path is valid, is best
path, no labeled nexthop
AS-Path: NONE, path locally originated
192.168.100.2 (metric 0) from 0.0.0.0 (192.168.77.2)
Origin IGP, MED not set, localpref 100, weight 32768
Origin flag 0x2
Extcommunity: RT:65012:10000 ENCAP:8
PMSI Tunnel Attribute:
flags: 0x00, Tunnel type: Ingress Replication
Label: 10000, Tunnel Id: 192.168.100.2
Path-id 1 advertised to peers:
192.168.77.88
BGP routing table entry for [3]:[0]:[32]:[192.168.100.3]/88,
version 26
Paths: (1 available, best #1)
Flags: (0x000012) (high32 00000000) on
xmit-list, is in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: external, path is
valid, is best path, no labeled nexthop
Imported from
192.168.77.3:32777:[3]:[0]:[32]:[192.168.100.3]/88
AS-Path: 65088 65034 , path sourced external to AS
192.168.100.3 (metric 0) from 192.168.77.88 (192.168.77.88)
Origin IGP, MED not set, localpref 100, weight 0
Extcommunity: RT:65012:10000 ENCAP:8
PMSI Tunnel Attribute:
flags: 0x00, Tunnel type: Ingress Replication
Label: 10000, Tunnel Id: 192.168.100.3
Path-id 1 not advertised to any peer
Route Distinguisher: 192.168.77.3:32777
BGP routing table entry for
[3]:[0]:[32]:[192.168.100.3]/88, version 24
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on
xmit-list, is not in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: external, path is
valid, is best path, no labeled nexthop
Imported to 1 destination(s)
AS-Path: 65088 65034 , path sourced external to AS
192.168.100.3 (metric 0) from 192.168.77.88 (192.168.77.88)
Origin IGP, MED not set, localpref 100, weight 0
Extcommunity: RT:65012:10000 ENCAP:8
PMSI Tunnel Attribute:
flags: 0x00, Tunnel type: Ingress Replication
Label: 10000, Tunnel Id: 192.168.100.3
Path-id 1 not advertised to any peer
Example 1-31: sh bgp l2vpn evpn route-type 3 on BGW-2.
Example 1-32 shows that BGW-3 has received the BGP L2VPN EVPN Route-Type 3
NLRI information originated by BGW-1 and BGW-2.
BGW-3# sh bgp l2vpn evpn route-type 3
BGP routing table information for VRF
default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.1:32777
BGP routing table entry for
[3]:[0]:[32]:[192.168.100.1]/88, version 7
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on
xmit-list, is not in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: external, path is
valid, is best path, no labeled nexthop
Imported to 1 destination(s)
AS-Path: 65088 65012 , path sourced external to AS
192.168.100.1 (metric 0) from 192.168.77.88 (192.168.77.88)
Origin IGP, MED not set,
localpref 100, weight 0
Extcommunity: RT:65034:10000 ENCAP:8
PMSI Tunnel Attribute:
flags: 0x00, Tunnel type: Ingress Replication
Label: 10000, Tunnel Id: 192.168.100.1
Path-id 1 not advertised to any peer
Route Distinguisher: 192.168.77.2:32777
BGP routing table entry for
[3]:[0]:[32]:[192.168.100.2]/88, version 15
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on
xmit-list, is not in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: external, path is
valid, is best path, no labeled nexthop
Imported to 1 destination(s)
AS-Path: 65088 65012 , path sourced external to AS
192.168.100.2 (metric 0) from 192.168.77.88 (192.168.77.88)
Origin IGP, MED not set, localpref 100, weight 0
Extcommunity: RT:65034:10000 ENCAP:8
PMSI Tunnel Attribute:
flags: 0x00, Tunnel type: Ingress Replication
Label: 10000, Tunnel Id: 192.168.100.2
Path-id 1 not advertised to any peer
Route Distinguisher:
192.168.77.3:32777 (L2VNI 10000)
BGP routing table entry for
[3]:[0]:[32]:[192.168.100.1]/88, version 11
Paths: (1 available, best #1)
Flags: (0x000012) (high32 00000000) on
xmit-list, is in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: external, path is
valid, is best path, no labeled nexthop
Imported from
192.168.77.1:32777:[3]:[0]:[32]:[192.168.100.1]/88
AS-Path: 65088 65012 , path sourced external to AS
192.168.100.1 (metric 0) from 192.168.77.88 (192.168.77.88)
Origin IGP, MED not set, localpref 100, weight 0
Extcommunity: RT:65034:10000 ENCAP:8
PMSI Tunnel Attribute:
flags: 0x00, Tunnel type: Ingress Replication
Label: 10000, Tunnel Id: 192.168.100.1
Path-id 1 not advertised to any peer
BGP routing table entry for
[3]:[0]:[32]:[192.168.100.2]/88, version 17
Paths: (1 available, best #1)
Flags: (0x000012) (high32 00000000) on
xmit-list, is in l2rib/evpn, is not in HW
Multipath: eBGP iBGP
Advertised path-id 1
Path type: external, path is
valid, is best path, no labeled nexthop
Imported from
192.168.77.2:32777:[3]:[0]:[32]:[192.168.100.2]/88
AS-Path: 65088 65012 , path sourced external to AS
192.168.100.2 (metric 0) from 192.168.77.88 (192.168.77.88)
Origin IGP, MED not set, localpref 100, weight 0
Extcommunity: RT:65034:10000 ENCAP:8
PMSI Tunnel Attribute:
flags: 0x00, Tunnel type: Ingress Replication
Label: 10000, Tunnel Id: 192.168.100.2
Path-id 1 not advertised to any peer
BGP routing table entry for
[3]:[0]:[32]:[192.168.100.3]/88, version 3
Paths: (1 available, best #1)
Flags: (0x000002) (high32 00000000) on
xmit-list, is not in l2rib/evpn
Multipath: eBGP iBGP
Advertised path-id 1
Path type: local, path is valid, is best
path, no labeled nexthop
AS-Path: NONE, path locally originated
192.168.100.3 (metric 0) from 0.0.0.0 (192.168.77.3)
Origin IGP, MED not set, localpref 100, weight 32768
Origin flag 0x2
Extcommunity: RT:65034:10000 ENCAP:8
PMSI Tunnel Attribute:
flags: 0x00, Tunnel type: Ingress Replication
Label: 10000, Tunnel Id: 192.168.100.3
Path-id 1 advertised to peers:
192.168.77.88
Example 1-32: sh bgp l2vpn evpn route-type 3 on BGW-3.
The P-Multicast Service Instance (PMSI) Tunnel Path Attribute shown in capture
1-3 describes the PMSI tunnel end-point of the Multi-Destination tree over the
Common EVPN Domain for VNI 10000. The BGW acts as a kind of PE device that
offers the PMSI service to site-local devices. This means that the BGW switch
has to be able to forward Multi-Destination traffic received from the CE
devices, which from the intra-site perspective are the Leaf and Spine
switches, over the Common EVPN Domain to the BGW switches located on the
remote site, and the other way around. The field that the capture decodes as
“MPLS Label” carries the Virtual Network Identifier (VNI) for this
Multi-Destination tree. The capture interprets only the first 20 bits of the
24-bit field as an MPLS label, which is why it shows the value 625. The
complete 24-bit value 0000 0000 0010 0111 0001 0000 (hex 0x002710) is 10000 in
decimal notation (the VNI used with VLAN 10).
Border Gateway Protocol - UPDATE Message
Marker: ffffffffffffffffffffffffffffffff
Length: 99
Type: UPDATE Message (2)
Withdrawn Routes Length: 0
Total Path Attribute Length: 76
Path attributes
Path Attribute - ORIGIN: IGP
Path Attribute - AS_PATH: 65012
Path Attribute - EXTENDED_COMMUNITIES
Flags: 0xc0, Optional, Transitive,
Complete
Type Code: EXTENDED_COMMUNITIES
(16)
Length: 16
Carried extended communities: (2
communities)
Route Target: 65012:10000
[Transitive 2-Octet AS-Specific]
Encapsulation: VXLAN
Encapsulation [Transitive Opaque]
Path Attribute - PMSI_TUNNEL_ATTRIBUTE
Flags: 0xc0, Optional, Transitive,
Complete
Type Code: PMSI_TUNNEL_ATTRIBUTE
(22)
Length: 9
Flags: 0
Tunnel Type: Ingress Replication
(6)
0000 0000 0010 0111 0001 .... =
MPLS Label: 625
Tunnel ID: tunnel end point ->
192.168.100.1
Path Attribute - MP_REACH_NLRI
Flags: 0x90, Optional,
Extended-Length, Non-transitive, Complete
Type Code: MP_REACH_NLRI (14)
Length: 28
Address family identifier (AFI):
Layer-2 VPN (25)
Subsequent address family
identifier (SAFI): EVPN (70)
Next hop network address (4 bytes)
Number of Subnetwork points of
attachment (SNPA): 0
Network layer reachability
information (19 bytes)
EVPN NLRI: Inclusive Multicast
Route
Route Type: Inclusive
Multicast Route (3)
Length: 17
Route Distinguisher:
0001c0a84d018009 (192.168.77.1:32777)
Ethernet Tag ID: 0
IP Address Length: 32
IPv4 address: 192.168.100.1
Capture 1-3: BGP L2VPN EVPN Route-Type 3 – Inclusive Multicast Route (captured from
BGW-1).
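The difference between the “MPLS Label: 625” value in Capture 1-3 and the “Label: 10000” value in the CLI output can be reproduced with a short Python sketch. It assumes that the 3-byte label field of the PMSI Tunnel Attribute holds 0x002710, that is, the 20 bits shown in the capture followed by four zero bits. The function name is hypothetical.

```python
def decode_pmsi_label_field(field: bytes) -> dict:
    """Interpret the 3-byte PMSI label field as a VNI and as an MPLS label."""
    if len(field) != 3:
        raise ValueError("the label field is exactly 3 bytes long")
    value = int.from_bytes(field, "big")
    return {
        "vni": value,             # all 24 bits, as used with VXLAN
        "mpls_label": value >> 4  # upper 20 bits, as an MPLS decoder reads it
    }


decoded = decode_pmsi_label_field(bytes.fromhex("002710"))
print(decoded)
```

The two readings differ by a factor of 16 because the MPLS interpretation drops the four low-order bits of the field.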
Figure 1-7 illustrates the Multi-Destination forwarding path. PIM BiDir is
used to build a bidirectional Multicast tree in both sites. The Spine switches
are defined as the Pseudo Rendezvous Point (Pseudo-RP) for the Multicast tree.
The site-local Leaf switches and BGW switches join the Multicast tree. On the
Common EVPN Domain side, BGW switches located in different sites form an
Ingress-Replication path between each other. The BGP L2VPN EVPN Route-Type 3
(Inclusive Multicast Route) NLRI is used for signaling.
When Leaf-101 receives an ingress L2 BUM frame from its connected host in VLAN
10 (VNI 10000), it checks the Multicast Group attached to VNI 10000
(238.0.0.10) and sends the frame to Spine-11, which is the Pseudo-RP for group
238.0.0.10. Spine-11 forwards the L2 BUM frame out of the interfaces found in
the Outgoing Interface List (OIL) for the Multicast Group 238.0.0.10. The OIL
is built based on received PIM Join messages. Both BGW-1 and BGW-2 have joined
the Multicast Group, so Spine-11 forwards the L2 BUM frame to them. BGW-1 is
elected as the Designated Forwarder (DF) for VLAN 10, so it sends the L2 BUM
frame with VXLAN encapsulation to BGW-3 over the Ingress-Replication tunnel
formed over the Common EVPN Domain. BGW-2 does not forward the frame. The
source IP address used in the outer tunnel IP header is the PIP of BGW-1. When
BGW-3 receives the frame, it checks the VNI found in the VXLAN header and
decapsulates the frame. It forwards the frame to the Multicast Group
238.0.0.10 (the group for VNI 10000 in this site as well), whose RP is
Spine-12. Spine-12 checks the OIL and forwards the frame to Leaf-102.
Figure 1-7: Overall Multi-Destination delivery path.
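The delivery path in Figure 1-7 can be traced with a small model. This is a sketch, not a description of how NX-OS implements it: the dictionaries and the function name are hypothetical, the device names and the group address come from the walkthrough above, and the model assumes a single BGW in the remote site and ignores other site-local receivers.

```python
# Hypothetical data model; names are taken from Figure 1-7.
VLAN_TO_VNI = {10: 10000}
MCAST_GROUP = {10000: "238.0.0.10"}          # VNI -> multicast group
SITES = {
    "Site-12": {"rp": "Spine-11", "bgws": ["BGW-1", "BGW-2"], "leafs": ["Leaf-101"]},
    "Site-34": {"rp": "Spine-12", "bgws": ["BGW-3"], "leafs": ["Leaf-102"]},
}
DESIGNATED_FORWARDER = {("Site-12", 10): "BGW-1"}


def bum_hops(src_site, src_leaf, vlan, dst_site):
    """Return the (from, to, transport) hops of one BUM frame between two sites."""
    group = MCAST_GROUP[VLAN_TO_VNI[vlan]]
    src, dst = SITES[src_site], SITES[dst_site]
    hops = [(src_leaf, src["rp"], f"multicast {group}")]
    # The Pseudo-RP replicates the frame to every BGW that joined the group.
    for bgw in src["bgws"]:
        hops.append((src["rp"], bgw, f"multicast {group}"))
    # Only the Designated Forwarder sends the frame over the DCI.
    df = DESIGNATED_FORWARDER[(src_site, vlan)]
    for remote_bgw in dst["bgws"]:
        hops.append((df, remote_bgw, "ingress replication"))
        # The remote BGW decapsulates and re-injects the frame into its own tree.
        hops.append((remote_bgw, dst["rp"], f"multicast {group}"))
    for leaf in dst["leafs"]:
        hops.append((dst["rp"], leaf, f"multicast {group}"))
    return hops


for hop in bum_hops("Site-12", "Leaf-101", 10, "Site-34"):
    print(hop)
```

BGW-2 appears only as a receiver in the result; no hop originates from it, which mirrors the role of the Designated Forwarder.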
Fabric Link Failure
When a BGW switch loses all of its intra-site links, it stops advertising the Shared Virtual IP (VIP) to its DCI Underlay Network BGP IPv4 Unicast peer. This ensures that other switches do not consider it a valid next hop in the ECMP decision process. In addition, it withdraws all intra-site host-related MAC address information (Route-Type 2). It also stops advertising itself as an Ingress-Replication tunnel endpoint by withdrawing the Inclusive Multicast Route (Route-Type 3), and it withdraws the Ethernet Segment Routes (Route-Type 4) even though they are not used outside the local site. Finally, it withdraws learned IP prefix routes (Route-Type 5), excluding locally connected prefixes learned either from a connected host or from an external IPv4 peer.
Figure 1-8: Intra-site fabric-link failure on BGW-1.
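The reaction to fabric isolation described above can be summarised as a small state model. The class and field names below are hypothetical and the model is a simplification: it only captures which advertisements survive once the last tracked fabric link goes down, and it leaves the recovery behaviour out.

```python
from dataclasses import dataclass, field


@dataclass
class EvpnRoute:
    route_type: int          # 2, 3, 4 or 5
    locally_connected: bool  # True for prefixes from an attached host or external peer


@dataclass
class BorderGateway:
    tracked_fabric_links: dict                  # interface name -> True when up
    routes: list = field(default_factory=list)  # routes advertised towards the DCI
    vip_advertised: bool = True

    def set_link(self, interface: str, up: bool) -> None:
        self.tracked_fabric_links[interface] = up
        # Isolation is declared only when no tracked fabric link is left.
        if not any(self.tracked_fabric_links.values()):
            self._isolate_from_fabric()

    def _isolate_from_fabric(self) -> None:
        # Stop attracting inter-site traffic to the shared VIP.
        self.vip_advertised = False
        # Withdraw Route-Types 2, 3 and 4, and every Route-Type 5 prefix
        # that is not locally connected.
        self.routes = [
            r for r in self.routes
            if r.route_type == 5 and r.locally_connected
        ]


bgw1 = BorderGateway(
    {"Ethernet1/1": True},
    [EvpnRoute(2, False), EvpnRoute(3, False), EvpnRoute(4, False),
     EvpnRoute(5, False), EvpnRoute(5, True)],
)
bgw1.set_link("Ethernet1/1", False)
print(bgw1.vip_advertised, bgw1.routes)
```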
Normal State
Example 1-33 shows the BGP IPv4 Unicast entries installed into the BGP table of the DC Core switch before the fabric-link failure. The DC Core switch has learned the Shared VIP address used in Site-12 (192.168.88.12/32) from both of its BGP IPv4 Unicast peers, BGW-1 and BGW-2.
RouteServer-1# sh ip bgp
BGP routing table information for VRF
default, address family IPv4 Unicast
BGP table version is 21, Local Router ID
is 192.168.77.88
Status: s-suppressed, x-deleted,
S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external,
c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? -
incomplete, | - multipath, & - backup, 2 - best2
Network Next Hop Metric LocPrf
Weight Path
*>e10.1.88.0/24 10.1.88.1 0 0 65012 ?
*>e10.2.88.0/24 10.2.88.2 0 0 65012 ?
*>e10.3.88.0/24 10.3.88.3 0 0 65034 ?
*>e10.88.1.0/24 10.1.88.1 0 0 65012 ?
*>e10.88.2.0/24 10.2.88.2 0 0 65012 ?
*>e10.88.3.0/24 10.3.88.3 0 0 65034 ?
*>e192.168.0.1/32 10.1.88.1 0 0 65012 ?
*>e192.168.0.2/32 10.2.88.2 0 0 65012 ?
*>e192.168.0.3/32 10.3.88.3 0 0 65034 ?
*>e192.168.77.1/32 10.1.88.1 0 0 65012 ?
*>e192.168.77.2/32 10.2.88.2 0 0 65012 ?
*>e192.168.77.3/32 10.3.88.3 0 0 65034 ?
*>r192.168.77.88/32 0.0.0.0 0 100
32768 ?
*>e192.168.88.12/32 10.1.88.1 0 0 65012 ?
*|e 10.2.88.2 0 0 65012 ?
*>e192.168.88.34/32 10.3.88.3 0 0 65034 ?
*>r192.168.88.88/32 0.0.0.0 0 100
32768 ?
*>e192.168.100.1/32 10.1.88.1 0 0 65012 ?
*>e192.168.100.2/32 10.2.88.2 0 0 65012 ?
*>e192.168.100.3/32 10.3.88.3 0 0 65034 ?
Example 1-33 BGP IPv4 Unicast entries in DC Core switch.
Example 1-34 shows the BGP L2VPN EVPN entries installed into the BGP table of BGW-3 before the fabric-link failure. There is one Route-Type 4 entry (Ethernet Segment Route), one Route-Type 3 entry (Inclusive Multicast Route) and two Route-Type 2 entries (MAC Advertisement Route): the first for the System MAC and the second for host Abba.
BGW-3# sh bgp l2vpn evpn
BGP routing table information for VRF
default, address family L2VPN EVPN
BGP table version is 47, Local Router ID
is 192.168.77.3
Status: s-suppressed, x-deleted,
S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external,
c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? -
incomplete, | - multipath, & - backup, 2 - best2
Network Next Hop Metric LocPrf
Weight Path
Route Distinguisher: 192.168.77.1:27001
*>e[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.1]/136
192.168.100.1 0 65088 65012
i
Route Distinguisher: 192.168.77.1:32777
*>e[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216
192.168.100.1 0 65088 65012
i
*>e[3]:[0]:[32]:[192.168.100.1]/88
192.168.100.1 0 65088 65012
i
Route Distinguisher: 192.168.77.2:27001
*>e[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136
192.168.77.1 0 65088
65012 i
Route Distinguisher: 192.168.77.2:32777
*>e[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
192.168.100.2 0 65088 65012
i
*>e[3]:[0]:[32]:[192.168.100.2]/88
192.168.100.2 0 65088 65012
i
Route Distinguisher:
192.168.77.3:27001 (ES
[0300.0000.0000.0c00.0309 0])
*>e[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.1]/136
192.168.100.1 0 65088 65012
i
*>e[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136
192.168.77.1 0 65088 65012 i
*>l[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.3]/136
192.168.100.3 100 32768 i
Route Distinguisher:
192.168.77.3:32777 (L2VNI 10000)
*>e[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
192.168.88.12 0 65088 65012
i
*>e[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216
192.168.100.1 0 65088 65012
i
*>e[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
192.168.100.2 0 65088 65012
i
*>l[2]:[0]:[0]:[48]:[5000.0004.0007]:[0]:[0.0.0.0]/216
192.168.100.3 100 32768 i
*>e[3]:[0]:[32]:[192.168.100.1]/88
192.168.100.1 0 65088 65012
i
*>e[3]:[0]:[32]:[192.168.100.2]/88
192.168.100.2 0 65088 65012
i
*>l[3]:[0]:[32]:[192.168.100.3]/88
192.168.100.3 100 32768 i
Route Distinguisher:
192.168.77.101:32777
*>e[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
192.168.88.12 0 65088 65012
i
Example 1-34 BGP L2VPN EVPN entries in BGW-3.
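The bracketed keys in Example 1-34 follow a fixed field order per Route-Type. The sketch below splits a key into named fields. The field names are an assumption based on how the entries are read in this chapter (ESI, Ethernet Tag, MAC and IP for Route-Type 2, and so on), and the helper names are hypothetical; only Route-Types 2, 3 and 4 are handled.

```python
import re
from collections import Counter

# Assumed field order per Route-Type, after the leading [type] field.
FIELD_NAMES = {
    2: ["esi", "ethernet_tag", "mac_length", "mac", "ip_length", "ip"],
    3: ["ethernet_tag", "ip_length", "originator_ip"],
    4: ["esi", "ip_length", "originator_ip"],
}


def parse_evpn_key(key: str) -> dict:
    """Split an EVPN table key such as '[3]:[0]:[32]:[192.168.100.1]/88'."""
    body, _, prefix_len = key.rpartition("/")
    values = re.findall(r"\[([^\]]*)\]", body)
    route_type = int(values[0])
    names = FIELD_NAMES[route_type]
    if len(values) - 1 != len(names):
        raise ValueError(f"unexpected field count in {key!r}")
    parsed = {"route_type": route_type, "prefix_length": int(prefix_len)}
    parsed.update(zip(names, values[1:]))
    return parsed


def count_route_types(keys) -> Counter:
    """Count how many entries of each Route-Type a list of keys contains."""
    return Counter(parse_evpn_key(k)["route_type"] for k in keys)


# Entries originated by BGW-1, copied from Example 1-34.
KEYS = [
    "[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.1]/136",
    "[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216",
    "[3]:[0]:[32]:[192.168.100.1]/88",
]
print(parse_evpn_key(KEYS[1]))
print(count_route_types(KEYS))
```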
The
Protocol, Link and Admin status of Loopback 88 (VIP) is UP.
BGW-1(config-if)# sh ip int bri
IP Interface Status for VRF
"default"(1)
Interface IP Address Interface Status
Lo0 192.168.0.1 protocol-up/link-up/admin-up
Lo77 192.168.77.1 protocol-up/link-up/admin-up
Lo88 192.168.88.12 protocol-up/link-up/admin-up
Lo100 192.168.100.1 protocol-up/link-up/admin-up
Eth1/1 10.1.11.1 protocol-up/link-up/admin-up
Eth1/2 10.1.88.1 protocol-up/link-up/admin-up
Eth1/3 10.11.1.1 protocol-up/link-up/admin-up
Eth1/4 10.88.1.1 protocol-up/link-up/admin-up
Example 1-35 Interface Loopback 88 UP on BGW-1.
Fabric-Link Failure
The fabric-link failure is simulated by shutting down the fabric-link interface e1/1. When BGW-1 notices this, it changes the link state of interface Loopback 88 to down.
BGW-1(config-if)# sh ip int bri
IP Interface Status for VRF
"default"(1)
Interface IP Address Interface Status
Lo0 192.168.0.1 protocol-up/link-up/admin-up
Lo77 192.168.77.1 protocol-up/link-up/admin-up
Lo88 192.168.88.12 protocol-down/link-down/admin-up
Lo100 192.168.100.1 protocol-up/link-up/admin-up
Eth1/1 10.1.11.1 protocol-down/link-down/admin-down
Eth1/2 10.1.88.1 protocol-up/link-up/admin-up
Eth1/3 10.11.1.1 protocol-up/link-up/admin-up
Eth1/4 10.88.1.1 protocol-up/link-up/admin-up
Example 1-36 Interface Loopback 88 DOWN on BGW-1.
Example
1-37 verifies that the Fabric-Link is also down.
BGW-1# sh nve multisite fabric-links
Interface State
--------- -----
Ethernet1/1 Down
Example 1-37 sh nve multisite fabric-links on
BGW-1.
Capture
1-4 shows that BGW-1 sends MP_Unreach_NLRI concerning the IP address of
Loopback 88 to DC Core switch over the BGP IPv4 Unicast peering.
Internet Protocol Version 4, Src: 10.1.88.1, Dst: 10.1.88.88
Border Gateway Protocol - UPDATE Message
Marker: ffffffffffffffffffffffffffffffff
Length: 35
Type: UPDATE Message (2)
Withdrawn Routes Length: 0
Total Path Attribute Length: 12
Path attributes
Path Attribute - MP_UNREACH_NLRI
Flags: 0x90, Optional, Extended-Length, Non-transitive, Complete
Type Code: MP_UNREACH_NLRI (15)
Length: 8
Address family identifier (AFI): IPv4 (1)
Subsequent address family identifier (SAFI): Unicast (1)
Withdrawn routes (5 bytes)
192.168.88.12/32
Capture 1-4: BGP IPv4 Unicast UPDATE with MP_UNREACH_NLRI for 192.168.88.12/32 (captured from BGW-1).
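The length values in Capture 1-4 can be cross-checked by building the same attribute by hand. The sketch below is only an arithmetic illustration with a hypothetical function name: it encodes an MP_UNREACH_NLRI attribute for one IPv4 Unicast prefix and adds the fixed BGP UPDATE overhead.

```python
import ipaddress
import struct

BGP_HEADER_LEN = 19  # 16-byte marker + 2-byte length + 1-byte type


def mp_unreach_ipv4_unicast(prefix: str) -> bytes:
    """Encode an MP_UNREACH_NLRI path attribute withdrawing one IPv4 prefix."""
    net = ipaddress.ip_network(prefix)
    prefix_bytes = (net.prefixlen + 7) // 8
    # Withdrawn route: 1-byte prefix length + the significant prefix bytes.
    withdrawn = bytes([net.prefixlen]) + net.network_address.packed[:prefix_bytes]
    # AFI 1 (IPv4, 2 bytes) + SAFI 1 (Unicast, 1 byte) + withdrawn routes.
    value = struct.pack("!HB", 1, 1) + withdrawn
    # Flags 0x90 = Optional + Extended-Length, type code 15, 2-byte length.
    return struct.pack("!BBH", 0x90, 15, len(value)) + value


attribute = mp_unreach_ipv4_unicast("192.168.88.12/32")
# UPDATE = header + Withdrawn Routes Length (2) + Total Path Attribute Length (2)
update_length = BGP_HEADER_LEN + 2 + 2 + len(attribute)

print(attribute.hex())                 # 900f000800010120c0a8580c
print(len(attribute), update_length)   # 12 35
```

The attribute value is 8 bytes, the whole attribute 12 bytes and the UPDATE message 35 bytes, matching the Length fields in the capture.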
As a result, the DC Core switch removes the path via BGW-1 (next hop 10.1.88.1) from its BGP IPv4 table, so 192.168.88.12/32 is reachable only via BGW-2 (example 1-38).
RouteServer-1# sh ip bgp
BGP routing table information for VRF
default, address family IPv4 Unicast
BGP table version is 22, Local Router ID
is 192.168.77.88
Status: s-suppressed, x-deleted,
S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external,
c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? -
incomplete, | - multipath, & - backup, 2 - best2
Network Next Hop Metric LocPrf
Weight Path
*>e10.1.88.0/24 10.1.88.1 0 0 65012 ?
*>e10.2.88.0/24 10.2.88.2 0 0 65012 ?
*>e10.3.88.0/24 10.3.88.3 0 0 65034 ?
*>e10.88.1.0/24 10.1.88.1 0 0 65012 ?
*>e10.88.2.0/24 10.2.88.2 0 0 65012 ?
*>e10.88.3.0/24 10.3.88.3 0 0 65034 ?
*>e192.168.0.1/32 10.1.88.1 0 0 65012 ?
*>e192.168.0.2/32 10.2.88.2 0 0 65012 ?
*>e192.168.0.3/32 10.3.88.3 0 0 65034 ?
*>e192.168.77.1/32 10.1.88.1 0 0 65012 ?
*>e192.168.77.2/32 10.2.88.2 0 0 65012 ?
*>e192.168.77.3/32 10.3.88.3 0 0 65034 ?
*>r192.168.77.88/32 0.0.0.0 0
100 32768 ?
*>e192.168.88.12/32 10.2.88.2 0 0 65012 ?
*>e192.168.88.34/32 10.3.88.3 0 0 65034 ?
*>r192.168.88.88/32 0.0.0.0 0 100
32768 ?
*>e192.168.100.1/32 10.1.88.1 0 0 65012 ?
*>e192.168.100.2/32 10.2.88.2 0 0 65012 ?
*>e192.168.100.3/32 10.3.88.3 0 0 65034 ?
Example 1-38 Loopback88 of BGW-1 removed from the BGP IPv4 table of DC Core switch.
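Comparing Examples 1-33 and 1-38 by eye is error-prone, so the relevant rows can also be parsed. The following sketch assumes the row layout seen in the examples (status flags glued to the prefix, additional paths on continuation rows without a prefix). The regular expression and the function name are hypothetical and only cover that layout.

```python
import re

# Status '*', optional best/multipath marker, path-type letter,
# optional prefix, then the next hop.
ROW = re.compile(
    r"^\*(?P<best>[>|&]?)(?P<ptype>[a-zA-Z])\s*"
    r"(?P<prefix>\d+\.\d+\.\d+\.\d+/\d+)?\s+"
    r"(?P<next_hop>\d+\.\d+\.\d+\.\d+)\b"
)


def next_hops_by_prefix(table_text: str) -> dict:
    """Map each prefix to the list of next hops listed for it."""
    result = {}
    current = None
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header lines or wrapped remainders
        if match.group("prefix"):
            current = match.group("prefix")
        if current is not None:
            result.setdefault(current, []).append(match.group("next_hop"))
    return result


BEFORE = """\
*>e192.168.88.12/32 10.1.88.1 0 0 65012 ?
*|e 10.2.88.2 0 0 65012 ?
*>e192.168.88.34/32 10.3.88.3 0 0 65034 ?
"""
AFTER = """\
*>e192.168.88.12/32 10.2.88.2 0 0 65012 ?
*>e192.168.88.34/32 10.3.88.3 0 0 65034 ?
"""

print(next_hops_by_prefix(BEFORE)["192.168.88.12/32"])  # ['10.1.88.1', '10.2.88.2']
print(next_hops_by_prefix(AFTER)["192.168.88.12/32"])   # ['10.2.88.2']
```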
BGW-1 has also withdrawn its Route-Type 2-5 routes. Example 1-39 shows that the routes with next hop 192.168.100.1 have been removed from the BGP L2VPN EVPN table of BGW-3.
BGW-3# sh bgp l2vpn evpn
BGP routing table information for VRF
default, address family L2VPN EVPN
BGP table version is 87, Local Router ID
is 192.168.77.3
Status: s-suppressed, x-deleted,
S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external,
c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? -
incomplete, | - multipath, & - backup, 2 - best2
Network Next Hop Metric LocPrf
Weight Path
Route Distinguisher: 192.168.77.2:27001
*>e[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136
192.168.100.2 0 65088 65012
i
Route Distinguisher: 192.168.77.2:32777
*>e[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
192.168.100.2 0 65088 65012
i
*>e[3]:[0]:[32]:[192.168.100.2]/88
192.168.100.2 0 65088 65012
i
Route Distinguisher:
192.168.77.3:27001 (ES [0300.0000.0000.0c00.0309
0])
*>e[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136
192.168.100.2 0 65088 65012
i
*>l[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.3]/136
192.168.100.3 100 32768 i
Route Distinguisher:
192.168.77.3:32777 (L2VNI 10000)
*>e[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
192.168.88.12 0 65088 65012
i
*>e[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
192.168.100.2 0 65088 65012
i
*>l[2]:[0]:[0]:[48]:[5000.0004.0007]:[0]:[0.0.0.0]/216
192.168.100.3 100 32768 i
*>e[3]:[0]:[32]:[192.168.100.2]/88
192.168.100.2 0 65088 65012
i
*>l[3]:[0]:[32]:[192.168.100.3]/88
192.168.100.3 100 32768 i
Route Distinguisher: 192.168.77.101:32777
*>e[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
192.168.88.12 0 65088 65012
i
Example 1-39 BGP table of BGW-3 after fabric-link failure in BGW-1.
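The difference between Examples 1-34 and 1-39 can be expressed as a set difference. The entries below are the (key, next hop) pairs listed under RD 192.168.77.3:32777 (L2VNI 10000) in both examples; the variable and function names are hypothetical.

```python
BEFORE = {
    ("[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216", "192.168.88.12"),
    ("[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216", "192.168.100.1"),
    ("[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216", "192.168.100.2"),
    ("[2]:[0]:[0]:[48]:[5000.0004.0007]:[0]:[0.0.0.0]/216", "192.168.100.3"),
    ("[3]:[0]:[32]:[192.168.100.1]/88", "192.168.100.1"),
    ("[3]:[0]:[32]:[192.168.100.2]/88", "192.168.100.2"),
    ("[3]:[0]:[32]:[192.168.100.3]/88", "192.168.100.3"),
}
AFTER = {
    ("[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216", "192.168.88.12"),
    ("[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216", "192.168.100.2"),
    ("[2]:[0]:[0]:[48]:[5000.0004.0007]:[0]:[0.0.0.0]/216", "192.168.100.3"),
    ("[3]:[0]:[32]:[192.168.100.2]/88", "192.168.100.2"),
    ("[3]:[0]:[32]:[192.168.100.3]/88", "192.168.100.3"),
}


def withdrawn(before: set, after: set) -> list:
    """Return the entries present before the failure but missing afterwards."""
    return sorted(before - after)


for key, next_hop in withdrawn(BEFORE, AFTER):
    print(next_hop, key)
```

Both missing entries point to 192.168.100.1. The entry for host Abba is unchanged because its next hop is the shared VIP 192.168.88.12.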
Fabric-Link Recovery
When the fabric-link is brought back up on BGW-1, the bgw-if (Loopback 88) shows Admin state Up while its Operational state is still kept Down. BGW-1 starts the Delay-Restore timer, as can be seen in examples 1-40 and 1-41.
BGW-1# show nve interface nve 1 detail
Interface: nve1, State: Up, encapsulation: VXLAN
VPC Capability: VPC-VIP-Only [not-notified]
Local Router MAC: 5000.0002.0007
Host Learning Mode: Control-Plane
Source-Interface: loopback100 (primary: 192.168.100.1, secondary: 0.0.0.0)
Source Interface State: Up
Virtual RMAC Advertisement: No
NVE Flags:
Interface Handle: 0x49000001
Source Interface hold-down-time: 180
Source Interface hold-up-time: 30
Remaining hold-down time: 0 seconds
Virtual Router MAC: N/A
Virtual Router MAC Re-origination: 0200.c0a8.580c
Interface state: nve-intf-add-complete
Multisite delay-restore time: 300 seconds
Multisite delay-restore time left: 236 seconds
Multisite bgw-if: loopback88 (ip: 192.168.88.12, admin: Up, oper: Down)
Multisite bgw-if oper down reason:
Example 1-40 Delay Restore Timer on BGW-1.
BGW-1# show nve interface nve 1 detail | i Multisite
Multisite delay-restore time: 300 seconds
Multisite delay-restore time left: 20 seconds
Multisite bgw-if: loopback88 (ip: 192.168.88.12, admin: Up, oper: Down)
Multisite bgw-if oper down reason:
Example 1-41 Delay Restore Timer on BGW-1.
After 300 seconds, BGW-1 changes the Operational state of interface Loopback 88 to UP, as shown in examples 1-42 and 1-43.
BGW-1# show nve interface nve 1 detail | i Multisite
Multisite delay-restore time: 300 seconds
Multisite delay-restore time left: 0 seconds
Multisite bgw-if: loopback88 (ip: 192.168.88.12, admin: Up, oper: Up)
Multisite bgw-if oper down reason:
Example 1-42 Delay Restore Timer on BGW-1.
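When the recovery is monitored from a script, the Multisite lines can be reduced to a few values. The sketch below parses the output format shown in Examples 1-40 to 1-42. The function name and the dictionary keys are hypothetical, and the elapsed time is derived on the assumption that the timer counts down from the configured value.

```python
import re


def parse_multisite_status(output: str) -> dict:
    """Extract the delay-restore timer and the bgw-if state from CLI output."""
    text = " ".join(output.split())  # undo any line wrapping
    total = int(re.search(r"delay-restore time: (\d+) seconds", text).group(1))
    left = int(re.search(r"delay-restore time left: (\d+) seconds", text).group(1))
    bgw_if = re.search(
        r"bgw-if: (\S+) \(ip: ([\d.]+), admin: (\w+), oper: (\w+)\)", text
    )
    return {
        "interface": bgw_if.group(1),
        "ip": bgw_if.group(2),
        "admin": bgw_if.group(3),
        "oper": bgw_if.group(4),
        "timer": total,
        "left": left,
        "elapsed": total - left,
    }


SAMPLE = """\
Multisite delay-restore time: 300 seconds
Multisite delay-restore time left: 236 seconds
Multisite bgw-if: loopback88 (ip:
192.168.88.12, admin: Up, oper: Down)
Multisite bgw-if oper down reason:
"""

status = parse_multisite_status(SAMPLE)
print(status["oper"], status["left"], status["elapsed"])  # Down 236 64
```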
BGW-1# sh ip int bri
IP Interface Status for VRF
"default"(1)
Interface IP Address Interface Status
Lo0 192.168.0.1 protocol-up/link-up/admin-up
Lo77 192.168.77.1 protocol-up/link-up/admin-up
Lo88 192.168.88.12 protocol-up/link-up/admin-up
Lo100 192.168.100.1 protocol-up/link-up/admin-up
Eth1/1 10.1.11.1 protocol-up/link-up/admin-up
Eth1/2 10.1.88.1 protocol-up/link-up/admin-up
Eth1/3 10.11.1.1 protocol-up/link-up/admin-up
Eth1/4 10.88.1.1 protocol-up/link-up/admin-up
Example 1-43 Loopback 88 status after recovery on BGW-1.
The network has recovered, as can be seen in examples 1-44 and 1-45.
RouteServer-1# sh ip bgp
BGP routing table information for VRF
default, address family IPv4 Unicast
BGP table version is 23, Local Router ID
is 192.168.77.88
Status: s-suppressed, x-deleted,
S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external,
c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? -
incomplete, | - multipath, & - backup, 2 - best2
Network Next Hop Metric LocPrf
Weight Path
*>e10.1.88.0/24 10.1.88.1 0 0 65012 ?
*>e10.2.88.0/24 10.2.88.2 0 0 65012 ?
*>e10.3.88.0/24 10.3.88.3 0 0 65034 ?
*>e10.88.1.0/24 10.1.88.1 0 0 65012 ?
*>e10.88.2.0/24 10.2.88.2 0 0 65012 ?
*>e10.88.3.0/24 10.3.88.3 0 0 65034 ?
*>e192.168.0.1/32 10.1.88.1 0 0 65012 ?
*>e192.168.0.2/32 10.2.88.2 0 0 65012 ?
*>e192.168.0.3/32 10.3.88.3 0 0 65034 ?
*>e192.168.77.1/32 10.1.88.1 0 0 65012 ?
*>e192.168.77.2/32 10.2.88.2 0 0 65012 ?
*>e192.168.77.3/32 10.3.88.3 0 0 65034 ?
*>r192.168.77.88/32 0.0.0.0 0 100
32768 ?
*|e192.168.88.12/32 10.1.88.1 0 0 65012 ?
*>e 10.2.88.2 0 0 65012 ?
*>e192.168.88.34/32 10.3.88.3 0 0 65034 ?
*>r192.168.88.88/32 0.0.0.0 0 100
32768 ?
*>e192.168.100.1/32 10.1.88.1 0 0 65012 ?
*>e192.168.100.2/32 10.2.88.2 0 0 65012 ?
*>e192.168.100.3/32 10.3.88.3 0 0 65034 ?
Example 1-44: BGP IPv4 table on DC Core Switch after recovery.
BGW-3# sh bgp l2vpn evpn
BGP routing table information for VRF
default, address family L2VPN EVPN
BGP table version is 103, Local Router
ID is 192.168.77.3
Status: s-suppressed, x-deleted,
S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external,
c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? -
incomplete, | - multipath, & - backup, 2 - best2
Network Next Hop Metric LocPrf
Weight Path
Route Distinguisher: 192.168.77.1:27001
*>e[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.1]/136
192.168.100.1 0 65088 65012
i
Route Distinguisher: 192.168.77.1:32777
*>e[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216
192.168.100.1 0 65088 65012
i
*>e[3]:[0]:[32]:[192.168.100.1]/88
192.168.100.1 0 65088 65012
i
Route Distinguisher: 192.168.77.2:27001
*>e[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136
192.168.100.2 0 65088 65012
i
Route Distinguisher: 192.168.77.2:32777
*>e[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
192.168.100.2 0 65088 65012
i
*>e[3]:[0]:[32]:[192.168.100.2]/88
192.168.100.2 0 65088 65012
i
Route Distinguisher:
192.168.77.3:27001 (ES [0300.0000.0000.0c00.0309
0])
*>e[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.1]/136
192.168.100.1 0 65088 65012
i
*>e[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136
192.168.100.2 0 65088 65012
i
*>l[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.3]/136
192.168.100.3 100 32768 i
Route Distinguisher:
192.168.77.3:32777 (L2VNI 10000)
*>e[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
192.168.88.12 0 65088 65012
i
*>e[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216
192.168.100.1 0 65088 65012 i
*>e[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
192.168.100.2 0 65088 65012
i
*>l[2]:[0]:[0]:[48]:[5000.0004.0007]:[0]:[0.0.0.0]/216
192.168.100.3 100 32768 i
*>e[3]:[0]:[32]:[192.168.100.1]/88
192.168.100.1 0 65088 65012
i
*>e[3]:[0]:[32]:[192.168.100.2]/88
192.168.100.2 0 65088 65012
i
*>l[3]:[0]:[32]:[192.168.100.3]/88
192.168.100.3 100 32768 i
Route Distinguisher:
192.168.77.101:32777
*>e[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
192.168.88.12 0 65088 65012
i
Example 1-45: BGP L2VPN EVPN table on BGW-3 after recovery.
DCI-Link Failure
When all DCI links of a BGW are down, it stops advertising the VIP address to its Intra-Site peers, just as in the previously discussed Fabric-Link failure. Naturally, it also stops advertising routes learned via the DCI links. It does, however, continue to act as a regular Leaf switch: if it has connected hosts or external peers, it continues to advertise the prefixes attached to or learned from them.
Figure 1-9: Inter-Site DCI-link failure on BGW-1.
Normal
State
Example
1-46 shows that Spine-11 has learned Site-12 Shared VIP from both Intra-Site
BGW switches via OSPF (Underlay Network).
Spine-11# sh ip route 192.168.88.12
IP Route Table for VRF
"default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes
VRF <string>
192.168.88.12/32, ubest/mbest:
2/0
*via 10.1.11.1, Eth1/1, [110/41], 01:28:36, ospf-UNDERLAY-NET, intra
*via 10.2.11.2, Eth1/2, [110/41], 01:28:36, ospf-UNDERLAY-NET, intra
Example 1-46: RIB on Spine-11 in normal situation.
Example 1-47 shows that Spine-11 uses both BGW-1 and BGW-2 for load sharing data towards the remote site. Note that the MAC address 5000.0004.0007 is the System MAC address of BGW-3 on Site-34.
Spine-11# sh bgp l2vpn evpn
BGP routing table information for VRF
default, address family L2VPN EVPN
BGP table version is 122, Local Router
ID is 192.168.77.11
Status: s-suppressed, x-deleted,
S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external,
c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? -
incomplete, | - multipath, & - backup, 2 - best2
Network Next Hop Metric LocPrf
Weight Path
Route Distinguisher: 192.168.77.1:27001
*>i[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.1]/136
192.168.100.1 100 0 i
Route Distinguisher: 192.168.77.1:32777
*>i[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216
192.168.100.1 100 0 i
Route Distinguisher: 192.168.77.2:27001
*>i[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136
192.168.100.2 100 0 i
Route Distinguisher: 192.168.77.2:32777
*>i[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
192.168.100.2 100 0 i
Route
Distinguisher: 192.168.77.3:32777
*>i[2]:[0]:[0]:[48]:[5000.0004.0007]:[0]:[0.0.0.0]/216
192.168.88.12 100 0 65088 65034 i
* i 192.168.88.12 100 0 65088 65034 i
Route Distinguisher:
192.168.77.101:32777
*>i[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
192.168.100.101 100 0 i
*>i[2]:[0]:[0]:[48]:[1000.0010.abba]:[32]:[172.16.10.101]/272
192.168.100.101 100 0 i
Example 1-47: BGP L2VPN EVPN table on Spine-11 in normal situation.
DCI Link Failure
The DCI link failure is demonstrated by shutting down the DCI interface e1/2. The state of the link is verified in examples 1-48 and 1-49.
BGW-1# sh nve multisite dci-links
Interface State
--------- -----
Ethernet1/2 Down
Example 1-48: sh nve multisite dci-links on BGW-1.
BGW-1# sh ip int bri
IP Interface Status for VRF
"default"(1)
Interface IP Address Interface Status
Lo0 192.168.0.1 protocol-up/link-up/admin-up
Lo77 192.168.77.1 protocol-up/link-up/admin-up
Lo88 192.168.88.12 protocol-down/link-down/admin-up
Lo100 192.168.100.1 protocol-up/link-up/admin-up
Eth1/1 10.1.11.1 protocol-up/link-up/admin-up
Eth1/2 10.1.88.1 protocol-down/link-down/admin-down
Eth1/3 10.11.1.1 protocol-up/link-up/admin-up
Eth1/4 10.88.1.1 protocol-up/link-up/admin-up
Example 1-49: sh ip int bri on BGW-1.
Example 1-50 below shows that the reason for the Down state is “DCI isolated”.
BGW-1# show nve interface nve 1 detail
Interface: nve1, State: Up, encapsulation: VXLAN
VPC Capability: VPC-VIP-Only [not-notified]
Local Router MAC: 5000.0002.0007
Host Learning Mode: Control-Plane
Source-Interface: loopback100 (primary: 192.168.100.1, secondary: 0.0.0.0)
Source Interface State: Up
Virtual RMAC Advertisement: No
NVE Flags:
Interface Handle: 0x49000001
Source Interface hold-down-time: 180
Source Interface hold-up-time: 30
Remaining hold-down time: 0 seconds
Virtual Router MAC: N/A
Virtual Router MAC Re-origination: 0200.c0a8.580c
Interface state: nve-intf-add-complete
Multisite delay-restore time: 300 seconds
Multisite delay-restore time left: 0 seconds
Multisite bgw-if: loopback88 (ip: 192.168.88.12, admin: Up, oper: Down)
Multisite bgw-if oper down reason: DCI isolated.
Example 1-50: show nve interface nve 1 detail on BGW-1.
BGW-1 has withdrawn the VIP, and Spine-11 now has only one path to the Intra-Site VIP address, via BGW-2.
Spine-11# sh ip route 192.168.88.12
IP Route Table for VRF
"default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes
VRF <string>
192.168.88.12/32, ubest/mbest:
1/0
*via 10.2.11.2, Eth1/2, [110/41], 00:00:45, ospf-UNDERLAY-NET, intra
Example 1-51: show ip route 192.168.88.12 on Spine-11.
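The effect on load sharing can be illustrated with a toy ECMP selector. This is a sketch under stated assumptions: real switches use a hardware-specific hash, CRC32 is only a stand-in, and the function name and the flow tuple layout are hypothetical. The next-hop addresses are the ones from Examples 1-46 and 1-51.

```python
import zlib


def pick_next_hop(flow: tuple, next_hops: list) -> str:
    """Pick one next hop per flow; the same flow always maps to the same hop."""
    if not next_hops:
        raise ValueError("no usable next hop")
    key = "|".join(str(item) for item in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# (source VTEP, destination VIP, protocol, UDP source port, UDP destination port)
flows = [("192.168.100.101", "192.168.88.12", 17, 49152 + i, 4789) for i in range(8)]

normal = ["10.1.11.1", "10.2.11.2"]     # BGW-1 and BGW-2 (Example 1-46)
dci_failure = ["10.2.11.2"]             # only BGW-2 left (Example 1-51)

for flow in flows:
    print(flow[3], pick_next_hop(flow, normal), pick_next_hop(flow, dci_failure))
```

With a single remaining next hop, every flow is sent to BGW-2 regardless of the hash value.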
BGW-1 has also withdrawn all Route-Type 2-5 routes received via the DCI link. Spine-11 now learns Inter-Site routes only via BGW-2.
Spine-11# sh bgp l2vpn evpn
BGP routing table information for VRF
default, address family L2VPN EVPN
BGP table version is 95, Local Router ID
is 192.168.77.11
Status: s-suppressed, x-deleted,
S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external,
c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? -
incomplete, | - multipath, & - backup, 2 - best2
Network Next Hop Metric LocPrf Weight Path
Route Distinguisher: 192.168.77.1:32777
*>i[2]:[0]:[0]:[48]:[5000.0002.0007]:[0]:[0.0.0.0]/216
192.168.100.1 100 0 i
Route Distinguisher: 192.168.77.2:27001
*>i[4]:[0300.0000.0000.0c00.0309]:[32]:[192.168.100.2]/136
192.168.100.2 100 0 i
Route Distinguisher: 192.168.77.2:32777
*>i[2]:[0]:[0]:[48]:[5000.0003.0007]:[0]:[0.0.0.0]/216
192.168.100.2 100 0 i
Route
Distinguisher: 192.168.77.3:32777
*>i[2]:[0]:[0]:[48]:[5000.0004.0007]:[0]:[0.0.0.0]/216
192.168.88.12 100 0 65088 65034 i
Route Distinguisher:
192.168.77.101:32777
*>i[2]:[0]:[0]:[48]:[1000.0010.abba]:[0]:[0.0.0.0]/216
192.168.100.101 100 0 i
*>i[2]:[0]:[0]:[48]:[1000.0010.abba]:[32]:[172.16.10.101]/272
192.168.100.101 100 0 i
Example 1-52: sh bgp l2vpn evpn on Spine-11.
DCI Link Recovery
The recovery process is the same as in the case of the Fabric-Link failure. BGW-1 starts a Delay-Restore timer that is set to 300 seconds.
BGW-1# show nve interface nve 1 detail
Interface: nve1, State: Up, encapsulation: VXLAN
VPC Capability: VPC-VIP-Only [not-notified]
Local Router MAC: 5000.0002.0007
Host Learning Mode: Control-Plane
Source-Interface: loopback100 (primary: 192.168.100.1, secondary: 0.0.0.0)
Source Interface State: Up
Virtual RMAC Advertisement: No
NVE Flags:
Interface Handle: 0x49000001
Source Interface hold-down-time: 180
Source Interface hold-up-time: 30
Remaining hold-down time: 0 seconds
Virtual Router MAC: N/A
Virtual Router MAC Re-origination: 0200.c0a8.580c
Interface state: nve-intf-add-complete
Multisite delay-restore time: 300 seconds
Multisite delay-restore time left: 295 seconds
Multisite bgw-if: loopback88 (ip: 192.168.88.12, admin: Up, oper: Down)
Multisite bgw-if oper down reason:
Example 1-53: Delay-restore timer start on BGW-1.
BGW-1 changes the status of interface Loopback 88 to UP after 300 seconds and resumes normal operation.
BGW-1# show nve interface nve 1 detail | i Multisite
Multisite delay-restore time: 300 seconds
Multisite delay-restore time left: 0 seconds
Multisite bgw-if: loopback88 (ip: 192.168.88.12, admin: Up, oper: Up)
Multisite bgw-if oper down reason:
Example 1-54: Delay-restore timer stop on BGW-1.
Note that during the failure, BGW-2 takes over the Designated Forwarder role for all Intra-Site VNIs.
Author: Toni Pasanen CCIE#28158
Published: 6 August 2019
References
Building Data Center with VXLAN BGP EVPN – A Cisco NX-OS Perspective. ISBN-10: 1-58714-467-0 – Krattiger Lukas, Shyam Kapadia, and Jansen Davis
Internet Engineering Task Force (IETF): Multicast in MPLS/BGP IP VPNs. 2012
Internet Engineering Task Force (IETF): BGP Encodings and Procedures for Multicast in MPLS/BGP IP VPNs. 2012
Internet Engineering Task Force (IETF): BGP MPLS-Based Ethernet VPN. 2015
BESS Working Group: Multi-site EVPN based VXLAN using Border Gateways. 2018
Cisco.com: VXLAN EVPN Multi-Site Design and Deployment:
Appendix A. Configuration files for BGW switches and DC Core switch
BGW-1 Configuration
BGW-1# sh run
!Command: show running-config
!Running configuration last done at: Wed Aug 7 09:19:25 2019
!Time: Wed Aug 7 09:21:43 2019
version 9.2(3) Bios:version
hostname BGW-1
vdc BGW-1 id 1
limit-resource vlan minimum 16 maximum 4094
limit-resource vrf minimum 2 maximum 4096
limit-resource port-channel minimum 0 maximum 511
limit-resource u4route-mem minimum 248 maximum 248
limit-resource u6route-mem minimum 96 maximum 96
limit-resource m4route-mem minimum 58 maximum 58
limit-resource m6route-mem minimum 8 maximum 8
nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature lacp
feature nv overlay
username admin password 5 $5$YTfyrnCx$D0BEzwcJJWm/PRjj/ykdkAySBr/9B6dsou/NWEAm6D4 role network-admin
ip domain-lookup
copp profile strict
evpn multisite border-gateway 12
delay-restore time 300
snmp-server user admin network-admin auth md5 0x42cd35684f49b26fca133253a1e0519d priv 0x42cd35684f49b26fca133253a1e0519d localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO
fabric forwarding anycast-gateway-mac 0001.0001.0001
ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24 bidir
ip pim ssm range 232.0.0.0/8
vlan 1,10,30,40,50,77
vlan 10
name L2VNI-for-VLAN10
vn-segment 10000
vlan 30
vn-segment 30000
vlan 40
vn-segment 40000
vlan 50
vn-segment 50000
vlan 77
name TENANT77
vn-segment 10077
route-map REDIST-TO-SITE-EXT-DCI permit 10
match tag 1234
vrf context TENANT77
vni 10077
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
vrf context management
hardware access-list tcam region racl 512
hardware access-list tcam region vpc-convergence 256
hardware access-list tcam region arp-ether 256 double-wide
interface Vlan1
interface nve1
no shutdown
host-reachability protocol bgp
source-interface loopback100
multisite border-gateway interface loopback88
member vni 10000
multisite ingress-replication
mcast-group 238.0.0.10
member vni 10077 associate-vrf
member vni 30000
mcast-group 238.0.0.10
member vni 40000
mcast-group 238.0.0.10
member vni 50000
mcast-group 238.0.0.10
interface Ethernet1/1
description **Fabric Internal **
no switchport
mac-address b063.0001.1e11
medium p2p
ip address 10.1.11.1/24
ip ospf network point-to-point
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
evpn multisite fabric-tracking
no shutdown
interface Ethernet1/2
description ** DCI Interface **
no switchport
mac-address b063.0001.1e12
medium p2p
ip address 10.1.88.1/24 tag 1234
ip pim sparse-mode
evpn multisite dci-tracking
no shutdown
interface Ethernet1/3
description **Fabric Internal **
no switchport
mac-address b063.0001.1e13
medium p2p
ip address 10.11.1.1/24
ip ospf network point-to-point
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
no shutdown
interface Ethernet1/4
description ** DCI Interface **
no switchport
mac-address b063.0001.1e14
medium p2p
ip address 10.88.1.1/24 tag 1234
no shutdown
interface mgmt0
vrf member management
interface loopback0
description ** RID/Underlay **
ip address 192.168.0.1/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
interface loopback77
description ** BGP peering **
ip address 192.168.77.1/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
interface loopback88
description ** VIP for DCI-Inter-connect **
ip address 192.168.88.12/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
interface loopback100
description ** VTEP/Overlay **
ip address 192.168.100.1/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
line console
line vty
boot nxos bootflash:/nxos.9.2.3.bin
router ospf UNDERLAY-NET
router-id 192.168.0.1
router bgp 65012
router-id 192.168.77.1
no enforce-first-as
address-family ipv4 unicast
redistribute direct route-map REDIST-TO-SITE-EXT-DCI
address-family l2vpn evpn
neighbor 10.1.88.88
remote-as 65088
update-source Ethernet1/2
address-family ipv4 unicast
neighbor 192.168.77.11
remote-as 65012
description ** Spine-11 BGP-RR **
update-source loopback77
address-family l2vpn evpn
send-community extended
neighbor 192.168.77.88
remote-as 65088
update-source loopback77
ebgp-multihop 5
peer-type fabric-external
address-family l2vpn evpn
send-community
send-community extended
rewrite-evpn-rt-asn
vrf TENANT77
address-family ipv4 unicast
advertise l2vpn evpn
evpn
vni 10000 l2
rd auto
route-target import auto
route-target export auto
vni 30000 l2
rd auto
route-target import auto
route-target export auto
vni 40000 l2
rd auto
route-target import auto
route-target export auto
vni 50000 l2
rd auto
route-target import auto
route-target export auto
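The `rd auto` and `route-target ... auto` statements in the configuration above leave the actual values to the switch. The sketch below shows the derivation assumed here, namely Router-ID:(32767 + VLAN ID) for the L2VNI Route Distinguisher and ASN:VNI for the Route Target. Treat this as an assumption rather than a specification: it is consistent with the RD 192.168.77.1:32777 and the Route Target 65012:10000 seen in Capture 1-3, and the function names are hypothetical.

```python
def auto_l2vni_rd(router_id: str, vlan_id: int) -> str:
    """Assumed auto-derived RD of an L2VNI: Router-ID:(32767 + VLAN ID)."""
    return f"{router_id}:{32767 + vlan_id}"


def auto_route_target(asn: int, vni: int) -> str:
    """Assumed auto-derived Route Target: ASN:VNI."""
    return f"{asn}:{vni}"


# BGW-1: BGP router-id 192.168.77.1, AS 65012, VLAN 10 mapped to VNI 10000.
print(auto_l2vni_rd("192.168.77.1", 10))   # 192.168.77.1:32777
print(auto_route_target(65012, 10000))     # 65012:10000
```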
BGW-2 Configuration
BGW-2# sh run
!Command: show running-config
!Running configuration last done at: Wed Aug 7 09:19:31 2019
!Time: Wed Aug 7 09:24:10 2019
version 9.2(3) Bios:version
hostname BGW-2
vdc BGW-2 id 1
limit-resource vlan minimum 16 maximum 4094
limit-resource vrf minimum 2 maximum 4096
limit-resource port-channel minimum 0 maximum 511
limit-resource u4route-mem minimum 248 maximum 248
limit-resource u6route-mem minimum 96 maximum 96
limit-resource m4route-mem minimum 58 maximum 58
limit-resource m6route-mem minimum 8 maximum 8
nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature lacp
feature nv overlay
username admin password 5 $5$6O5Ozded$6G9z9ZYJnto10KgJSqYou0dZilxI2abRLQOgpBTzu8A role network-admin
ip domain-lookup
copp profile strict
evpn multisite border-gateway 12
delay-restore time 300
snmp-server user admin network-admin auth md5 0x9bcc18427d4176f2aec8419a200a8bbf priv 0x9bcc18427d4176f2aec8419a200a8bbf localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO
fabric forwarding anycast-gateway-mac 0001.0001.0001
ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24 bidir
ip pim ssm range 232.0.0.0/8
vlan 1,10,30,40,50,77
vlan 10
name L2VNI-for-VLAN10
vn-segment 10000
vlan 30
vn-segment 30000
vlan 40
vn-segment 40000
vlan 50
vn-segment 50000
vlan 77
name TENANT77
vn-segment 10077
route-map REDIST-TO-SITE-EXT-DCI permit 10
match tag 1234
vrf context TENANT77
vni 10077
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
vrf context management
hardware access-list tcam region racl 512
hardware access-list tcam region vpc-convergence 256
hardware access-list tcam region arp-ether 256 double-wide
interface Vlan1
interface nve1
no shutdown
host-reachability protocol bgp
source-interface loopback100
multisite border-gateway interface loopback88
member vni 10000
multisite ingress-replication
mcast-group 238.0.0.10
member vni 10077 associate-vrf
member vni 30000
mcast-group 238.0.0.10
member vni 40000
mcast-group 238.0.0.10
member vni 50000
mcast-group 238.0.0.10
interface Ethernet1/1
description **Fabric Internal **
no switchport
mac-address b063.0002.1e11
medium p2p
ip address 10.2.11.2/24
ip ospf network point-to-point
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
evpn multisite fabric-tracking
no shutdown
interface Ethernet1/2
description ** DCI Interface **
no switchport
mac-address b063.0002.1e12
medium p2p
ip address 10.2.88.2/24 tag 1234
ip ospf network point-to-point
ip pim sparse-mode
evpn multisite dci-tracking
no shutdown
interface Ethernet1/3
description **Fabric Internal **
no switchport
mac-address b063.0002.1e13
medium p2p
ip address 10.11.2.2/24
ip ospf network point-to-point
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
no shutdown
interface Ethernet1/4
description ** DCI Interface **
no switchport
mac-address b063.0002.1e14
medium p2p
ip address 10.88.2.2/24 tag 1234
no shutdown
interface mgmt0
vrf member management
interface loopback0
description ** RID/Underlay **
ip address 192.168.0.2/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
interface loopback77
description ** BGP peering **
ip address 192.168.77.2/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
interface loopback88
description ** VIP for DCI-Inter-connect **
ip address 192.168.88.12/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
interface loopback100
description ** VTEP/Overlay **
ip address 192.168.100.2/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
line console
line vty
boot nxos bootflash:/nxos.9.2.3.bin
router ospf UNDERLAY-NET
router-id 192.168.0.2
router bgp 65012
router-id 192.168.77.2
no enforce-first-as
address-family ipv4 unicast
redistribute direct route-map REDIST-TO-SITE-EXT-DCI
address-family l2vpn evpn
neighbor 10.2.88.88
remote-as 65088
update-source Ethernet1/2
peer-type fabric-external
address-family ipv4 unicast
neighbor 192.168.77.11
remote-as 65012
description ** Spine-11 BGP-RR **
update-source loopback77
address-family l2vpn evpn
send-community extended
neighbor 192.168.77.88
remote-as 65088
update-source loopback77
ebgp-multihop 5
peer-type fabric-external
address-family l2vpn evpn
send-community
send-community extended
rewrite-evpn-rt-asn
vrf TENANT77
address-family ipv4 unicast
advertise l2vpn evpn
evpn
vni 10000 l2
rd auto
route-target import auto
route-target export auto
vni 30000 l2
rd auto
route-target import auto
route-target export auto
vni 40000 l2
rd auto
route-target import auto
route-target export auto
vni 50000 l2
rd auto
route-target import auto
route-target export auto
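The BGW-2 configuration above redistributes directly connected networks into BGP through the route-map REDIST-TO-SITE-EXT-DCI, which matches tag 1234. The following Python sketch lists which of BGW-2's connected prefixes that filter would select. It is a simplified model that treats the route-map as a plain tag filter; the data structure and the function name are hypothetical and only mirror the interface addresses and tags shown in the configuration.

```python
import ipaddress

# (interface, address, tag) copied from the BGW-2 configuration above.
# None means that the "ip address" line carries no tag.
BGW2_INTERFACES = [
    ("Ethernet1/1", "10.2.11.2/24", None),       # Fabric Internal
    ("Ethernet1/2", "10.2.88.2/24", 1234),       # DCI Interface
    ("Ethernet1/3", "10.11.2.2/24", None),       # Fabric Internal
    ("Ethernet1/4", "10.88.2.2/24", 1234),       # DCI Interface
    ("loopback0", "192.168.0.2/32", 1234),       # RID/Underlay
    ("loopback77", "192.168.77.2/32", 1234),     # BGP peering
    ("loopback88", "192.168.88.12/32", 1234),    # VIP for DCI-Inter-connect
    ("loopback100", "192.168.100.2/32", 1234),   # VTEP/Overlay
]


def redistributed_prefixes(interfaces, match_tag=1234):
    """Return (interface, network) pairs whose tag equals match_tag."""
    selected = []
    for name, address, tag in interfaces:
        if tag == match_tag:
            network = ipaddress.ip_interface(address).network
            selected.append((name, str(network)))
    return selected


for name, network in redistributed_prefixes(BGW2_INTERFACES):
    print(f"{name:12} {network}")
```

In this model the two fabric-internal links are left out because they carry no tag, while the DCI links and all four loopbacks are selected for advertisement towards the DCI.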
BGW-3 Configuration
BGW-3# sh run
!Command: show running-config
!No configuration change since last restart
!Time: Wed Aug 7 09:36:25 2019
version 9.2(3) Bios:version
hostname BGW-3
vdc BGW-3 id 1
limit-resource vlan minimum 16 maximum 4094
limit-resource vrf minimum 2 maximum 4096
limit-resource port-channel minimum 0 maximum 511
limit-resource u4route-mem minimum 248 maximum 248
limit-resource u6route-mem minimum 96 maximum 96
limit-resource m4route-mem minimum 58 maximum 58
limit-resource m6route-mem minimum 8 maximum 8
nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature lacp
feature nv overlay
username admin password 5 $5$O9jHouJ4$gMMf.hMYXJRamUNys17VtdztzLMNq1PdMQDIc1xPZu9 role network-admin
ip domain-lookup
copp profile strict
evpn multisite border-gateway 12
delay-restore time 300
snmp-server user admin network-admin auth md5 0x423cb9002003f0f3c3acb917bba00bf8 priv 0x423cb9002003f0f3c3acb917bba00bf8 localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO
fabric forwarding anycast-gateway-mac 0001.0001.0001
ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24 bidir
ip pim ssm range 232.0.0.0/8
vlan 1,10,77
vlan 10
name L2VNI-for-VLAN10
vn-segment 10000
vlan 77
name TENANT77
vn-segment 10077
route-map REDIST-TO-SITE-EXT-DCI permit 10
match tag 1234
vrf context TENANT77
vni 10077
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
vrf context management
hardware access-list tcam region racl 512
hardware access-list tcam region vpc-convergence 256
hardware access-list tcam region arp-ether 256 double-wide
interface Vlan1
interface nve1
no shutdown
host-reachability protocol bgp
source-interface loopback100
multisite border-gateway interface loopback88
member vni 10000
multisite ingress-replication
mcast-group 238.0.0.10
member vni 10077 associate-vrf
interface Ethernet1/1
description **Fabric Internal **
no switchport
mac-address b063.0003.1e11
medium p2p
ip address 10.3.12.3/24
ip ospf network point-to-point
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
evpn multisite fabric-tracking
no shutdown
interface Ethernet1/2
description ** DCI Interface **
no switchport
mac-address b063.0003.1e12
medium p2p
ip address 10.3.88.3/24 tag 1234
evpn multisite dci-tracking
no shutdown
interface Ethernet1/3
description **Fabric Internal **
no switchport
mac-address b063.0003.1e13
medium p2p
ip address 10.12.3.3/24
ip ospf network point-to-point
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
no shutdown
interface Ethernet1/4
description ** DCI Interface **
no switchport
mac-address b063.0003.1e14
medium p2p
ip address 10.88.3.3/24 tag 1234
no shutdown
interface mgmt0
vrf member management
interface loopback0
description ** RID/Underlay **
ip address 192.168.0.3/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
interface loopback77
description ** BGP peering **
ip address 192.168.77.3/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
interface loopback88
description ** VIP for DCI-Inter-connect **
ip address 192.168.88.34/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
interface loopback100
description ** VTEP/Overlay **
ip address 192.168.100.3/32 tag 1234
ip router ospf UNDERLAY-NET area 0.0.0.0
ip pim sparse-mode
line console
line vty
boot nxos bootflash:/nxos.9.2.3.bin
router ospf UNDERLAY-NET
router-id 192.168.0.3
router bgp 65034
router-id 192.168.77.3
no enforce-first-as
address-family ipv4 unicast
redistribute direct route-map REDIST-TO-SITE-EXT-DCI
maximum-paths 5
maximum-paths ibgp 5
address-family l2vpn evpn
neighbor 10.3.88.88
remote-as 65088
update-source Ethernet1/2
address-family ipv4 unicast
neighbor 10.88.3.88
remote-as 65088
update-source Ethernet1/4
address-family ipv4 unicast
neighbor 192.168.77.12
remote-as 65034
description ** Spine-11 BGP-RR **
update-source loopback77
address-family l2vpn evpn
send-community extended
neighbor 192.168.77.88
remote-as 65088
update-source loopback77
ebgp-multihop 5
peer-type fabric-external
address-family l2vpn evpn
send-community
send-community extended
rewrite-evpn-rt-asn
vrf TENANT77
address-family ipv4 unicast
advertise l2vpn evpn
evpn
vni 10000 l2
rd auto
route-target import auto
route-target export auto
DC Core switch (RouteServer) Configuration
RouteServer-1# sh run
!Command: show running-config
!No configuration change since last restart
!Time: Wed Aug 7 09:38:18 2019
version 9.2(3) Bios:version
hostname RouteServer-1
vdc RouteServer-1 id 1
limit-resource vlan minimum 16 maximum 4094
limit-resource vrf minimum 2 maximum 4096
limit-resource port-channel minimum 0 maximum 511
limit-resource u4route-mem minimum 128 maximum 128
limit-resource u6route-mem minimum 96 maximum 96
limit-resource m4route-mem minimum 58 maximum 58
limit-resource m6route-mem minimum 8 maximum 8
nv overlay evpn
feature bgp
feature nv overlay
username admin password 5 $5$SAAwN66P$OSzsu5lztjirsP.UM0bkhSXhjkAqAnymcN0jNUwNc38 role network-admin
ip domain-lookup
copp profile strict
snmp-server user admin network-admin auth md5 0x842c130e837d0182abbfc3c8010e25f1 priv 0x842c130e837d0182abbfc3c8010e25f1 localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO
vlan 1
route-map REDIST-TO-SITE-EXT-DCI permit 10
match tag 1234
route-map RETAIN-NEXT-HOP permit 10
set ip next-hop unchanged
vrf context abba
address-family ipv4 unicast
route-target import 65088:1
route-target export 65088:1
route-target both auto
vrf context beef
address-family ipv4 unicast
route-target import 65088:2
route-target export 65088:2
vrf context management
hardware access-list tcam region racl 512
hardware access-list tcam region vpc-convergence 256
hardware access-list tcam region arp-ether 256 double-wide
interface Ethernet1/1
description ** to BGW-1 **
no switchport
ip address 10.1.88.88/24
no shutdown
interface Ethernet1/2
description ** to BGW-2 **
no switchport
ip address 10.2.88.88/24
no shutdown
interface Ethernet1/3
description ** to BGW-3 **
no switchport
ip address 10.3.88.88/24
no shutdown
interface Ethernet1/4
description ** to BGW-4 **
no switchport
ip address 10.4.88.88/24
no shutdown
interface mgmt0
vrf member management
interface loopback77
ip address 192.168.77.88/32 tag 1234
interface loopback88
ip address 192.168.88.88/32 tag 1234
line console
line vty
boot nxos bootflash:/nxos.9.2.3.bin
router bgp 65088
router-id 192.168.77.88
address-family ipv4 unicast
redistribute direct route-map REDIST-TO-SITE-EXT-DCI
maximum-paths 2
address-family l2vpn evpn
maximum-paths 2
maximum-paths ibgp 2
retain route-target all
template peer MULTI-SITE-OVERLAY-PEERING
update-source loopback77
ebgp-multihop 5
address-family l2vpn evpn
send-community
send-community extended
route-map RETAIN-NEXT-HOP out
neighbor 10.1.88.1
remote-as 65012
address-family ipv4 unicast
neighbor 10.2.88.2
remote-as 65012
address-family ipv4 unicast
neighbor 10.3.88.3
remote-as 65034
address-family ipv4 unicast
neighbor 10.4.88.4
remote-as 65034
address-family ipv4 unicast
neighbor 192.168.77.1
inherit peer MULTI-SITE-OVERLAY-PEERING
remote-as 65012
address-family l2vpn evpn
rewrite-evpn-rt-asn
neighbor 192.168.77.2
inherit peer MULTI-SITE-OVERLAY-PEERING
remote-as 65012
address-family l2vpn evpn
rewrite-evpn-rt-asn
neighbor 192.168.77.3
inherit peer MULTI-SITE-OVERLAY-PEERING
remote-as 65034
address-family l2vpn evpn
rewrite-evpn-rt-asn
neighbor 192.168.77.4
inherit peer MULTI-SITE-OVERLAY-PEERING
remote-as 65034
address-family l2vpn evpn
rewrite-evpn-rt-asn
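The configurations above derive the L2VNI Route-Targets automatically (route-target import auto / export auto) and enable rewrite-evpn-rt-asn on every eBGP EVPN peering. The Python sketch below illustrates why the two are used together when each site has its own AS number. It rests on two assumptions that are not shown in the configurations themselves: that an auto-derived Route-Target has the form ASN:VNI, and that the rewrite replaces the ASN portion with the receiver's own AS number when it equals the AS number of the sending peer. The function names are hypothetical, and the model is a simplification rather than a description of the NX-OS implementation.

```python
def auto_rt(asn: int, vni: int) -> str:
    """Auto-derived Route-Target, assumed here to have the form ASN:VNI."""
    return f"{asn}:{vni}"


def rewrite_rt_asn(rt: str, peer_asn: int, local_asn: int) -> str:
    """Assumed model of rewrite-evpn-rt-asn applied to a received route.

    If the ASN portion of the Route-Target equals the sending peer's ASN,
    it is replaced with the local ASN; otherwise the RT is left unchanged.
    """
    asn, vni = rt.split(":")
    if int(asn) == peer_asn:
        return f"{local_asn}:{vni}"
    return rt


# L2VNI 10000 advertised from the site in AS 65012, through the
# route server (AS 65088), to the site in AS 65034.
rt_exported = auto_rt(65012, 10000)
rt_at_route_server = rewrite_rt_asn(rt_exported, peer_asn=65012, local_asn=65088)
rt_at_remote_bgw = rewrite_rt_asn(rt_at_route_server, peer_asn=65088, local_asn=65034)

print(rt_exported, rt_at_route_server, rt_at_remote_bgw)
# The rewritten value equals the RT that "route-target import auto"
# would derive in AS 65034 under the same ASN:VNI assumption.
print(rt_at_remote_bgw == auto_rt(65034, 10000))
```

Without the rewrite, the value exported in AS 65012 would not equal the auto-derived import value in AS 65034 in this model, so the remote site would need manually configured Route-Targets instead.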
Hi Toni,
welcome back!
Michael
Thanks Michael!
Toni
These days I have been thinking about some questions, and I hope you can help answer them.
1. Apart from Layer 2 extension and real physical redundancy, VXLAN actually helps to get rid of the L3 routing process, which I believe makes VXLAN faster than a traditional three-level network. But how much faster can it be compared to traditional routing? Or what kind of delay or traffic volume requirements push us to move from a traditional network to VXLAN?
2. I went through some reading and noticed someone mention that if a VXLAN network has an MTU of 1500 plus the VXLAN header, the actual throughput will be half of what a VLAN can achieve, regardless of the physical redundancy VXLAN has, while by increasing the MTU to 9000, VXLAN shows its power and can reach almost twice what a VLAN can achieve. In your experience, have you noticed this?
3. VXLAN is really CPU-consuming. I notice you are using N7K, so what is the CPU consumption of each VTEP tunnel on N7K, and how many can it support? Let us leave EVPN out and consider only the values for pure VXLAN.
Really glad to see you again in the topic.
Yours Sincerely
Michael
Hi Michael,
I think that the driver for moving from a traditional network solution (Spanning Tree as an L2 Control Plane protocol) to a VXLAN solution is more related to the flexibility and reliability offered by VXLAN than to speed. Here are just a few reasons why I like VXLAN:
A) No need for virtualized devices with common Control-/Data Plane for gateway function.
B) Multi-destination traffic over IP-Only Underlay
C) VLAN available where needed (by stitching not stretching)
D) Could be also the future solution for SD-WAN???
But as a tradeoff, from the Control Plane perspective, the BGP EVPN VXLAN solution is more complex than an STP-based solution (my personal opinion).
Unfortunately, I do not have any figures concerning forwarding rate or CPU usage. However, after the Control Plane has converged (Underlay/Overlay/NVE peering...) and the forwarding tables are up to date, the actual data forwarding is easy: just a forwarding table lookup and encapsulation.
In this post, I am using NX-OSv 9.2(3).
And by the way, it is really nice to see that you are still reading my posts :-)
Cheers - Toni
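To put the MTU question above into numbers, the following Python sketch computes the VXLAN encapsulation overhead, the underlay MTU needed for a given inner IP MTU, and the share of each encapsulated frame that is original traffic. It assumes an IPv4 underlay without an outer 802.1Q tag, so the added headers are outer Ethernet (14 bytes), outer IPv4 (20), UDP (8) and VXLAN (8). The function names are illustrative only, and the figures describe header overhead, not the forwarding performance of any switch.

```python
# Header sizes in bytes (assumption: IPv4 underlay, no outer 802.1Q tag)
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # the original Ethernet header travels inside the tunnel

# Bytes added in front of the original frame on the wire
VXLAN_OVERHEAD = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


def required_underlay_mtu(inner_ip_mtu: int) -> int:
    """Underlay IP MTU needed to carry an inner IP packet unfragmented."""
    return inner_ip_mtu + INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4


def efficiency(inner_frame_bytes: int) -> float:
    """Fraction of the encapsulated frame occupied by the original frame."""
    return inner_frame_bytes / (inner_frame_bytes + VXLAN_OVERHEAD)


print("overhead:", VXLAN_OVERHEAD, "bytes")
for inner_ip_mtu in (1500, 8950):
    inner_frame = inner_ip_mtu + INNER_ETHERNET
    print(
        f"inner IP MTU {inner_ip_mtu}: "
        f"underlay MTU {required_underlay_mtu(inner_ip_mtu)}, "
        f"efficiency {efficiency(inner_frame):.1%}"
    )
```

Under these assumptions the encapsulation costs only a few percent of the link capacity at a 1500-byte inner MTU, so the header overhead alone would not explain a halving of throughput.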
Another great post, thanks. (yes, still reading it :))
I wonder how many "super-spine" architectures are actually implemented in production networks. It has a lot of boxes :)
The Multi-site BGW function can run on the spine switches effectively collapsing the two layers. Also, N7k, N3600-R and N9500-R support MPLS hand-off, a neat way to marry VxLAN/EVPN with MPLS/VPNv4 in a single box. This way the Multi-Site architecture can have only three layers - Leafs, Spines (with BGW) and BorderPEs.
Hi Toni, thanks for your excellent and in-depth posts on this blog.
I labbed this with Nexus 9000v switches. In my lab, I replaced the core switch/route server with a small MPLS L3VPN and direct eBGP peerings between the BGWs for EVPN.
An issue I see is that even though the shared VIP of BGW1 and BGW2 is seen as the next hop in BGW3's "show l2route mac all" output for MACs on the BGW1/2 site, the traffic is replicated and duplicate packets are sent to the BGW1 and BGW2 PIPs. It is being forwarded as though it were BUM traffic, and BGW3 is forwarding it as an unknown unicast. This also results in duplicate packets arriving at the devices connected to the BGW1/2 leaf switches. This seems to be the same for both the 9.3.1 and 9.2.3 images.
This only impacts intra-VLAN traffic, inter-VLAN traffic sent between sites is forwarded towards the VIP as expected.
I'm wondering if you, or any of your readers have seen similar behaviour when capturing on the DCI interface of BGW3? I'm trying to figure out if this is misconfiguration on my part or a limitation of the virtual switches.
I am also doing a similar lab: basically no RS, just four back-to-back BGWs in a square configuration. I am seeing the same problem, except that for me packets are blocked at a BGW when it is not the DF for that VLAN, so ping has only a 25% success rate. On the DCI link of the DF BGW, which did not block the unicast traffic, I can see that the source and destination IP addresses are PIPs instead of VIPs.
As I answered Matt, this might be related to NX-OSv. If I remember correctly, the preferred back-to-back design also requires cross-connections between the BGWs.
I tested with 9.3.3 images and saw the same issue. From the packet capture, I found that unicast traffic is sent to both BGW1 and BGW2 because they use the same VIP. The underlay uses OSPF equal-cost paths, so it sends one packet to BGW1 and the other to BGW2. Because BGW1 is primary, BGW2 discards the packet, which is why some packets get through and some do not.
The workaround I tried was setting the OSPF cost of the BGW2 VIP loopback to 2 using "ip ospf cost 2", so that the underlay always chooses BGW1 as the primary path.
I found that changing the OSPF cost is not the solution. This is a bug in the Nexus 9000v simulator: because the Nexus 9000v does not have an ASIC chipset, the DF and split-horizon rules do not take effect. Refer to the link below:
https://www.reddit.com/r/networking/comments/cruk37/nexus_9000v_vxlan_evpn_multisite_duplicate_looped/
Thanks Andy for sharing the link.
Hi Matt, the problem might be related to NX-OSv. Have you tested it with physical devices?
ReplyDeleteHi Toni,
Just came across your blog recently. A lot of VXLAN-EVPN concepts became clearer after going through it.
Do you have any plans to do a write-up on TRM?
I am happy that you found this blog informative. I am working on a TRM document, and it is 95% ready. It should be out within a couple of days. I usually announce new posts on LinkedIn, so if we are not already connected, just send me an invitation.
I have added a short description of TRM to this blog. The whole chapter (36 pages) is included in the VXLAN book, available via Leanpub.com and soon also via Amazon (eBook and hard copy).
Thanks a lot Toni.
I was expecting to get a notification of your reply. Just got back here to see you had replied a while back.
I've sent you a LinkedIn request and just got a copy of your book as well.
Hi Guys,
Can I use the distributed anycast feature (SVI, port mode access/trunk) on an Anycast BGW switch?
Thank you,
Daniel Lima
I really learned a lot from your posts, thanks Toni. Got a copy of your book to support your great work.
I'm still new to the Multi-Site configuration. After having tested your scenario with N9Kv (and it worked great), I tried to lab another scenario with a simple fabric: Leaf --- Spine --- BL --- ext router, where I want to configure BGW on the BL. I noticed that as soon as I configure "evpn multisite border-gateway ___" on the BL, I lose connectivity to the external networks (I can no longer ping from the Leaf to the ext router). I don't know whether it is technically impossible to configure BGW and BL on the same node, whether I missed a specific configuration to make it work, or whether it is simply an N9Kv bug. I don't have any real switches to test with and I probably should learn more, but if you or any readers have already tested this scenario and can share some experience, it would be great. Thanks a lot.
Hi Toni,
How is the RD generated for RT 4, especially the value after the colon?
192.168.77.1:27001? Where does 27001 come from?