Wednesday, 19 June 2019

EVPN ESI Multihoming Part III: Data Flows and link failures


Now you can also download my VXLAN book from Leanpub.com

"Virtual Extensible LAN VXLAN - A Practical guide to VXLAN Solution Part 1. (373 pages)

This chapter explains the EVPN ESI Multihoming data flows. The first section explains Intra-VNI (L2VNI) unicast traffic flows, and the second section introduces BUM traffic. Figure 1-1 shows the topology and addressing scheme used in this chapter. Complete configurations of Leaf-102 and Leaf-103 can be found at the end of the document.



Figure 1-1: Topology and addressing scheme.

Saturday, 8 June 2019

EVPN ESI Multihoming - Part II: Fast Convergence and Load Balancing




This chapter introduces the BGP EVPN Route Type 1 - Ethernet Auto-Discovery (Ethernet A-D) routes. The first section explains the Ethernet A-D per Ethernet Segment (ES) route, which is mainly used for Fast Convergence. The second section discusses the Ethernet A-D per EVI/ES route, which in turn is used for Load Balancing (also called Aliasing/Backup Path).



Figure 1-1: Ethernet A-D per Ethernet Segment (ES) route.
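The interplay of the two Ethernet A-D route types can be sketched in Python. This is a simplified model, not the NX-OS implementation: the class name, the ESI label "ESI-1", and the MAC addresses are hypothetical. A remote VTEP treats every VTEP that advertised an A-D per EVI route for the ESI as a valid next hop (aliasing), and a single withdrawal of an A-D per ES route removes that VTEP for all MAC addresses behind the segment at once (fast convergence).

```python
from collections import defaultdict


class RemoteVtep:
    """Hypothetical remote-VTEP view of one EVI (one L2VNI)."""

    def __init__(self):
        self.ad_per_es = defaultdict(set)   # ESI -> VTEPs with an active A-D per ES route
        self.ad_per_evi = defaultdict(set)  # ESI -> VTEPs with an A-D per EVI route
        self.mac_to_esi = {}                # MAC -> ESI, learned from Route Type 2

    def next_hops(self, mac):
        """Return the set of VTEPs that traffic destined to `mac` may be sent to."""
        esi = self.mac_to_esi.get(mac)
        if esi is None:
            return set()
        # Aliasing: every VTEP that advertised A-D per EVI for this ESI is a valid
        # next hop, provided its A-D per ES route is still present.
        return self.ad_per_evi[esi] & self.ad_per_es[esi]

    def withdraw_ad_per_es(self, esi, vtep):
        """Fast convergence: one withdrawal removes `vtep` for every MAC on the ES."""
        self.ad_per_es[esi].discard(vtep)
```

For example, if Leaf-102 and Leaf-103 share ESI-1 and Leaf-102 withdraws its A-D per ES route, every MAC address behind ESI-1 resolves to Leaf-103 only, without waiting for per-MAC Route Type 2 withdrawals.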

Wednesday, 29 May 2019

EVPN ESI Multihoming - Part I: EVPN Ethernet Segment (ES)



This chapter introduces the standards-based EVPN ESI Multihoming solution in a BGP EVPN VXLAN Fabric. It starts by explaining how a CE device (access switch or host) can be attached to two or more independent PE devices (Leaf switches) by using a Port-Channel. This section discusses the concepts of Ethernet Segment and Port-Channel. Next, this chapter explains how the BGP EVPN Route-Type 4 (Ethernet Segment Route) is used for creating the redundancy group between the switches that share the ES. This section introduces the BGP EVPN Route-Type 4 NLRI address format. In addition, this chapter shows how switches belonging to the same redundancy group select the Designated Forwarder (DF) for BUM traffic among themselves. This chapter also introduces the VLAN Consistency Check by using Cisco Fabric Service over IP (CFSoIP). The last two sections explain the Layer 2 Gateway Spanning-Tree (L2G-STP) mechanism and the Core-Link Tracking system.

Part II introduces the BGP EVPN Route-Type 1 (Ethernet Auto-Discovery) and how it is used for convergence. Part III discusses the data flows between the hosts in normal and failure situations. Parts II and III will be published later.



Figure 1-1: The VXLAN EVPN Multi-homing topology and addressing scheme.
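As a point of reference for the DF selection mentioned above, the default DF election procedure defined in RFC 7432 (modulo-based service carving) can be sketched as follows. Whether a given platform uses exactly this procedure is an assumption here, and the VTEP addresses are hypothetical.

```python
import ipaddress


def elect_df(vtep_ips, vlan_id):
    """Default DF election (RFC 7432, section 8.5), as a sketch.

    The VTEPs sharing the Ethernet Segment are ordered by numeric IP address,
    and the DF for VLAN V is the VTEP at index V mod N.
    """
    ordered = sorted(vtep_ips, key=ipaddress.ip_address)
    return ordered[vlan_id % len(ordered)]


peers = ["192.168.100.103", "192.168.100.102"]  # hypothetical ES peers
print(elect_df(peers, 10))  # 192.168.100.102
print(elect_df(peers, 11))  # 192.168.100.103
```

Sorting with `ipaddress.ip_address` matters: a plain string sort would place 10.0.0.10 before 10.0.0.9.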

Thursday, 9 May 2019

VXLAN Underlay Routing - Part V: Multi-AS eBGP


eBGP as an Underlay Network Routing Protocol: Multi-AS eBGP

This post introduces the Multi-AS eBGP solution in a VXLAN Fabric. In this solution, a single AS number is assigned to all spine switches, while each leaf switch (or pair of leaf switches) has a unique BGP AS number. This solution requires neither the “allowas-in” command on the leaf switches nor the “disable-peer-as-check” command on the spine switches, both of which are required in the Two-AS solution. However, the “retain-route-target all” command and a BGP L2VPN EVPN address-family peer-specific route-map with the option “set ip next-hop unchanged” are needed on the spine switch. This post also explains the requirements and processes for an L2 EVPN VNI-specific route import policy when automated derivation of Route-Targets is used. This chapter uses the same IP/MAC address scheme as the previous post “VXLAN Underlay Routing - Part IV: Two-AS eBGP”, but Leaf-102 now belongs to BGP AS 65001.


Figure 1-1: The MAC/IP addressing scheme and eBGP peering model.
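A minimal sketch of why the import policy needs attention in the Multi-AS design, assuming the NX-OS-style auto-derived Route-Target format ASN:VNI and assuming Leaf-101 remains in AS 65000. The helper names are hypothetical. Because the auto RT embeds the local AS number, an RT exported from one AS does not match the auto import RT of a leaf in another AS.

```python
def auto_rt(asn, vni):
    """Auto-derived Route-Target, assuming the NX-OS-style ASN:VNI format."""
    return f"{asn}:{vni}"


def is_imported(route_rts, import_rts):
    """A route is imported if it carries at least one RT from the local import set."""
    return bool(set(route_rts) & set(import_rts))


# Leaf-101 (AS 65000) exports VNI 10000 with its own auto RT ...
exported = [auto_rt(65000, 10000)]
# ... but Leaf-102 (AS 65001) with "route-target import auto" only expects its own ASN.
print(is_imported(exported, [auto_rt(65001, 10000)]))                 # False
print(is_imported(exported, [auto_rt(65001, 10000), "65000:10000"]))  # True
```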

Sunday, 5 May 2019

VXLAN Underlay Routing - Part IV: Two-AS eBGP



eBGP as an Underlay Network Routing Protocol: Two-AS eBGP

This post explains the Two-AS eBGP solution in a VXLAN Fabric, where all Leaf switches share one AS and all Spine switches share another AS. It also discusses how the default eBGP peering behavior has to be modified to achieve the routing solution required by a VXLAN Fabric. These modifications are mainly related to the BGP loop prevention model and BGP next-hop path-attribute processing.

Figure 1-1 illustrates the topology used in this chapter. Leaf-101 and Leaf-102 both belong to BGP AS 65000, while Spine-11 belongs to BGP AS 65099. The loopback interfaces used for Overlay Network BGP peering (L100) and for NVE peering (L50) are advertised over the BGP AFI IPv4 peering (Underlay Network Control Plane). Host MAC/IP address information is advertised over the BGP AFI L2VPN EVPN peering (Overlay Network Control Plane). Ethernet frames between hosts Café and Abba are encapsulated with a VXLAN tunnel header, where the source and destination IP addresses in the outer IP header are taken from the NVE1 interfaces.





Figure 1-1: High-Level operation of VXLAN Fabric
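The two default eBGP behaviors that the Two-AS design has to modify can be modelled in a few lines. This is a simplified sketch based on our reading of NX-OS behavior (`disable-peer-as-check` on the spine, `allowas-in` on the leaf, and an unchanged next hop towards the leaves); the function names and IP addresses are hypothetical.

```python
def spine_advertises(peer_asn, as_path, disable_peer_as_check=False):
    """Sender-side check: by default the spine does not send a route to a peer
    whose AS number already appears in the route's AS_PATH."""
    return disable_peer_as_check or peer_asn not in as_path


def leaf_accepts(local_asn, as_path, allowas_in=0):
    """Receiver-side eBGP loop prevention: reject the route if the local AS appears
    in the AS_PATH more times than allowas-in permits (0 = default)."""
    return as_path.count(local_asn) <= allowas_in


def next_hop_sent(own_ip, received_next_hop, next_hop_unchanged=False):
    """eBGP normally rewrites the next hop to the sender's own address; the
    'unchanged' option keeps the originating VTEP's NVE address instead."""
    return received_next_hop if next_hop_unchanged else own_ip


# Route originated by Leaf-101 (AS 65000) towards Leaf-102 (AS 65000) via Spine-11 (AS 65099).
path_at_spine = [65000]
path_at_leaf102 = [65099] + path_at_spine
print(spine_advertises(65000, path_at_spine))            # False
print(leaf_accepts(65000, path_at_leaf102))              # False
print(leaf_accepts(65000, path_at_leaf102, allowas_in=1))  # True
```

Without the next-hop change, the VXLAN tunnel on Leaf-102 would point at the spine instead of at Leaf-101's NVE address.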

Thursday, 11 April 2019

VXLAN Underlay Routing - Part III: Internal BGP


BGP as an Underlay Network Routing Protocol


Using BGP instead of OSPF or IS-IS for Underlay Network routing in a BGP EVPN VXLAN fabric simplifies the Control Plane operation because only one routing protocol runs on the fabric switches. However, there are trade-offs too. The BGP-only solution requires at least two BGP Address-Families (AFIs) per switch: one for the Underlay (IPv4 Unicast) and one for the Overlay (L2VPN EVPN). In addition, if Border Leaf switches are connected to an MPLS network, there is a third BGP AFI for VPNv4. In some cases, multi-AFI BGP makes troubleshooting a bit more complex compared to a single-AFI solution where BGP is used only in the Overlay Network. The focus of this chapter is the VXLAN fabric Underlay Network with iBGP routing.


Figure 1-1: High-Level operation of VXLAN Fabric

Sunday, 24 March 2019

VXLAN Underlay Routing - Part II: OSPF and IS-IS from the VXLAN network perspective



This chapter discusses the differences between OSPF and IS-IS from the Network Virtualization Overlay (NVO) perspective, especially from the VXLAN network perspective. First, this chapter briefly introduces some of the differences between these two protocols (terminology, timers, and LSAs). Next, it explains the default behavior of Shortest Path First (SPF), starting with the IS-IS reaction when a Stub Network goes down. Then the same event is explained from the OSPF perspective. This chapter also introduces the OSPF reaction when Incremental SPF (iSPF) is enabled and the interface on a link that does not belong to the Shortest-Path Tree (SPT) goes down. The same event is also discussed for IS-IS, with and without iSPF.

Figure 1-1: Comparison of OSPF and IS-IS.

Sunday, 3 March 2019

VXLAN Underlay Routing - Part I: OSPF and Dijkstra/SPF algorithm


The role of the Underlay Network


From the EVPN VXLAN Network Virtualization Overlay (NVO) solution perspective, the main job of the Underlay Network is to offer resilient IP connectivity between the Network Virtualization Edge (NVE) interfaces on VXLAN Tunnel End Point (VTEP) devices. In addition, the Underlay Network can be used for forwarding BUM traffic (Broadcast, Unknown Unicast, and Multicast), though this solution requires Multicast Routing to be enabled in the Underlay Network. The common routing protocol choices for the VXLAN Underlay Network are OSPF and IS-IS, which are Link State Protocols, and BGP, which in turn is a Path Vector Protocol. The focus of this chapter is the Dijkstra/Shortest Path First (SPF) algorithm that Link State Protocols use for calculating the Shortest-Path Tree. Figure 1-1 shows the Link Type-1 (point-to-point) and Link Type-3 (Stub Network) Router LSAs originated by Leaf-101, Leaf-102, Spine-11, and Spine-12. In addition, Figure 1-1 illustrates how routers form a topology based on received LSAs.

Figure 1-1: Examples of Link type-1 (p2p) and Link-Type 3 (Stub) Router LSAs.
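The SPF calculation can be illustrated with a small Dijkstra implementation over the four-switch topology. This is a sketch, not the OSPF or IS-IS implementation: every p2p link is assumed to have cost 1, the stub prefix and its cost are hypothetical, and equal-cost predecessors are kept to show ECMP. Stub networks (Link Type-3) do not carry transit traffic, so they are attached as leaves after the Shortest-Path Tree has been built.

```python
import heapq

# Assumed topology: every leaf has one p2p link (cost 1) to each spine.
LINKS = {
    "Leaf-101": {"Spine-11": 1, "Spine-12": 1},
    "Leaf-102": {"Spine-11": 1, "Spine-12": 1},
    "Spine-11": {"Leaf-101": 1, "Leaf-102": 1},
    "Spine-12": {"Leaf-101": 1, "Leaf-102": 1},
}

# Hypothetical Link Type-3 (stub) entries: router -> {prefix: cost}.
STUBS = {"Leaf-102": {"10.102.102.102/32": 1}}


def spf(graph, root):
    """Dijkstra SPF from `root`; keeps all equal-cost parents (ECMP)."""
    cost = {root: 0}
    parents = {root: []}
    tree = set()                 # nodes already moved onto the SPT
    candidates = [(0, root)]     # tentative list as a min-heap
    while candidates:
        dist, node = heapq.heappop(candidates)
        if node in tree:
            continue             # stale entry for a node already on the tree
        tree.add(node)
        for neighbor, link_cost in graph[node].items():
            new_cost = dist + link_cost
            if neighbor not in cost or new_cost < cost[neighbor]:
                cost[neighbor] = new_cost
                parents[neighbor] = [node]
                heapq.heappush(candidates, (new_cost, neighbor))
            elif new_cost == cost[neighbor] and node not in parents[neighbor]:
                parents[neighbor].append(node)   # equal-cost path
    return cost, parents


def stub_routes(cost, stubs):
    """Attach stub networks as leaves of the SPT: router cost + stub cost."""
    return {
        prefix: cost[router] + stub_cost
        for router, prefixes in stubs.items() if router in cost
        for prefix, stub_cost in prefixes.items()
    }
```

Running `spf(LINKS, "Leaf-101")` gives Leaf-102 a cost of 2 with two equal-cost parents, Spine-11 and Spine-12, and the hypothetical stub prefix behind Leaf-102 a cost of 3.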

Monday, 11 February 2019

Considerations when connecting an MSTP Region with another MSTP Region or with a Rapid PVST+ Domain


Multiple Spanning Tree (MST) maps sets of VLANs into MST instances (MSTIs), each of which has its own instance-specific STP root switch. In addition, there is a region Internal Spanning Tree (IST), aka MSTI0, that is used for exchanging MSTP BPDUs for all MSTIs. IST BPDUs (capture 1-1) carry all the STP information inside an MSTP Region.

First, the MSTP BPDU includes information related to the IST, such as the switch Bridge ID, the Root Bridge ID of the Common and Internal Spanning Tree Root (CIST Root), and the timer values (Max Age, Hello Time, and Forward Delay). These timer values are used by all MST instances.

Second, the MSTP BPDU carries an MST extension header that includes the name of the MST Region, its configuration revision number, and a hash value. The hash value is derived from the VLAN-to-MSTI mapping information; the actual VLAN-to-MSTI mapping is not carried within BPDU packets. There is also information about the CIST Regional (Internal) Root switch. The difference between the CIST Root and the CIST Regional Root is that the CIST Root is used as the STP Root for all regions when multiple MSTP regions are connected to each other, whereas the CIST Regional Root is used as the IST root of an MST Region. The MST extension header also carries M-records, which contain MST instance-specific information such as the MSTI Regional Root that is used to create an instance-specific loop-free Layer 2 path inside a region. The root election process is based on the Proposal/Agreement messages just like in Rapid PVST+/RSTP.
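The hash value mentioned above is the MST Configuration Digest. The following sketch computes it as HMAC-MD5 over a 4096-entry VLAN-to-MSTI table, using the signature key defined in IEEE 802.1Q as we understand it (verify against the standard before relying on it); the region name and mappings in the usage example are hypothetical. It also shows why a VLAN mapping mismatch splits a region: switches belong to the same MST Region only if the region name, revision number, and digest all match.

```python
import hashlib
import hmac
import struct

# Configuration Digest Signature Key from IEEE 802.1Q (as we understand it).
DIGEST_KEY = bytes.fromhex("13AC06A62E47FD51F95D2BA243CD0346")


def config_digest(vlan_to_msti):
    """HMAC-MD5 over 4096 two-octet MSTIDs, one per VID 0..4095.

    Unmapped VLANs belong to the IST (MSTID 0); VIDs 0 and 4095 are always 0.
    """
    table = b"".join(
        struct.pack("!H", vlan_to_msti.get(vid, 0) if 1 <= vid <= 4094 else 0)
        for vid in range(4096)
    )
    return hmac.new(DIGEST_KEY, table, hashlib.md5).digest()


def same_region(a, b):
    """Same MST Region only if name, revision and digest all match."""
    return (a["name"], a["revision"], config_digest(a["vlans"])) == (
        b["name"], b["revision"], config_digest(b["vlans"]))


# Hypothetical switches: identical name/revision, different VLAN 20 mapping.
sw1 = {"name": "REGION-1", "revision": 1, "vlans": {10: 1, 20: 2}}
sw2 = {"name": "REGION-1", "revision": 1, "vlans": {10: 1, 20: 2}}
sw3 = {"name": "REGION-1", "revision": 1, "vlans": {10: 1, 20: 1}}
print(same_region(sw1, sw2))  # True
print(same_region(sw1, sw3))  # False
```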

Friday, 28 December 2018

VXLAN Part XV: Analysis of the BGP EVPN Control Plane Operation

Document Status: Unfinished
Edited: Monday, 7 January 2019

This chapter covers the following topics:

MAC address learning process (Intra-VNI switching): This section describes how the local VTEP switch learns the MAC addresses of its directly connected hosts from ingress frames and how the L2 forwarding component (L2FWDER) installs the information into the MAC VRF in the Layer 2 Routing Information Base (L2RIB). This section also shows how the local VTEP switch advertises the MAC address information to the remote VTEP switch by using a BGP EVPN Route Type 2 advertisement (MAC Advertisement Route) and how the remote VTEP switch installs the information into the MAC VRF in L2RIB and from there into the MAC address table. The Intra-L2VNI (switching) Data Plane operation is explained at the end of the section with various frame capture examples. The white “MAC line” represents these processes in figure 7-1.

MAC-IP address learning process (ARP for Intra-VNI switching and Inter-VNI routing): This section gives a detailed description of how the local VTEP switch learns the IP addresses of its locally connected hosts from ARP messages generated by the hosts and how the Host Mobility Manager component (HMM) installs the information into the IP VRF. This section also shows how the local VTEP switch advertises the IP address information to the remote VTEP switch by using a BGP EVPN Route Type 2 (MAC Advertisement Route) advertisement and how the remote VTEP switch installs this information into the IP VRF in L2RIB as well as into the L3RIB of VRF TENANT77. In addition, this section explains how the ARP Suppression mechanism uses MAC-IP binding information to reduce BUM (Broadcast, Unknown Unicast, and Multicast) traffic in the VXLAN Fabric. The grey “IP line” represents these processes in figure 7-1.


Prefix advertisement: This section covers how the local VTEP switch redistributes its Anycast Gateway (AGW) subnets into BGP and advertises this information to the remote VTEP switch by using a BGP EVPN Route Type 5 (IP Prefix Route) advertisement. It also explains how this information is used to discover silent hosts and describes how the remote VTEP installs the route from BGP into its local L3RIB. The black “Prefix line” represents these processes in figure 7-1.

Figure 1-1: BGP EVPN Control Plane Operational Overview.
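The ARP Suppression behavior described in the MAC-IP section can be sketched as a lookup table keyed by VNI and IP address. This is a simplified model, not the NX-OS implementation; the class name, IP address, and MAC address are hypothetical.

```python
class ArpSuppressionCache:
    """Hypothetical per-VTEP MAC-IP binding table used for ARP Suppression."""

    def __init__(self):
        self.bindings = {}  # (L2VNI, IP) -> MAC

    def learn(self, vni, ip, mac):
        """Store a binding learned locally or from a Route Type 2 MAC-IP route."""
        self.bindings[(vni, ip)] = mac

    def handle_arp_request(self, vni, target_ip):
        """Reply locally if the binding is known; otherwise flood the request."""
        mac = self.bindings.get((vni, target_ip))
        if mac is not None:
            return ("reply", mac)   # answered by the VTEP, no BUM traffic
        return ("flood", None)      # unknown target: forwarded as BUM traffic


cache = ArpSuppressionCache()
cache.learn(10000, "192.168.11.12", "1000.0010.beef")
print(cache.handle_arp_request(10000, "192.168.11.12"))  # ('reply', '1000.0010.beef')
print(cache.handle_arp_request(10000, "192.168.11.99"))  # ('flood', None)
```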


Monday, 19 November 2018

VXLAN Part XIV: Control Plane Operation in BGP EVPN VXLAN Fabric


The focus of this post is the Control Plane operation in a VXLAN fabric. First, we are going to see how the local switch Leaf-101 learns and installs the MAC address and IP address information of host Beef into its databases. Then, we are going to see how Leaf-101 advertises the information to the remote switch Leaf-102 by using BGP EVPN. After that, we are going to see how Leaf-102 receives the BGP EVPN Update and imports the routes into the MAC-VRF and from there into its databases. Note that on Leaf-101, VLAN 10 is attached to VNI 10000, while on Leaf-102, VLAN 20 is attached to the same vn-segment.


Figure 14-1: IP- and MAC addressing and VLAN-to-VN-segment mapping.
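Because the VLAN is only locally significant and the VNI is what travels in the BGP EVPN update, the mapping described above can be sketched as follows. The function names and the MAC address of host Beef are hypothetical, and the next hop is shown as a switch name instead of an NVE IP address for readability.

```python
# From the post: VLAN 10 on Leaf-101 and VLAN 20 on Leaf-102 both map to VNI 10000.
VLAN_TO_VNI = {
    "Leaf-101": {10: 10000},
    "Leaf-102": {20: 10000},
}


def advertise(switch, vlan, mac):
    """Local learning: the MAC route is advertised with the VNI, not the VLAN."""
    return {"mac": mac, "vni": VLAN_TO_VNI[switch][vlan], "next_hop": switch}


def install(switch, route):
    """Remote side: map the VNI back to the receiving switch's own VLAN."""
    vni_to_vlan = {vni: vlan for vlan, vni in VLAN_TO_VNI[switch].items()}
    vlan = vni_to_vlan.get(route["vni"])
    if vlan is None:
        return None  # VNI not configured locally: nothing to install
    return (vlan, route["mac"], route["next_hop"])


route = advertise("Leaf-101", 10, "1000.0010.beef")
print(install("Leaf-102", route))  # (20, '1000.0010.beef', 'Leaf-101')
```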

Thursday, 18 October 2018

VXLAN Part XIII: Firewall Implementation to VXLAN Fabric


In this post, I am going to show how to implement an Active/Standby FW Cluster in a VXLAN Fabric. Figure 13-1 shows the logical view of the example setup, where we have two server networks: 192.168.30.0/24 (VLAN30 - protected) and 192.168.11.0/24 (VLAN10 - non-protected). We also have an Active/Standby FW Cluster connected to a dedicated Service Leaf vPC Cluster (Leaf-102 and Leaf-103). The Anycast Gateway (AGW) for the network 192.168.11.0/24 resides in the Server Leaf-101, while the Gateway for the protected network 192.168.30.0/24 resides in the Firewall (Inside Zone). Protected hosts in VLAN 30 use the VXLAN Fabric only as an L2 transport network. For simplicity, the Spine switch is not shown in Figure 13-1.

Figure 13-1: Example Topology and IP addressing

Tuesday, 25 September 2018

VXLAN Part XII: Routing Exchange: intra/inter-L2VNI, EVPN-to-IP, EVPN-to-VPNv4

Edited: 25.9.2018 | Toni Pasanen

We are using BGP EVPN (MP-BGP AFI 25/SAFI 70 - EVPN) to exchange MAC-IP (Type-2) and Prefix (Type-5) reachability information between the VTEPs inside the VXLAN fabric. Each BGP UPDATE message sent by a VTEP includes an L2VNI/L3VNI-specific Route-Target (RT) Extended Community Path-Attribute. Based on these RTs, routes are imported into the correct L2VNIs/L3VNIs. Each L2VNI has a VNI-specific RT, which is used for intra-VNI communication. Inside the Tenant, there is a common Tenant-specific RT used for inter-L2VNI communication.
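The RT-based import decision described above can be sketched as a set intersection. The RT values and table names below are hypothetical; a Route Type 2 MAC-IP route that carries both the L2VNI-specific RT and the tenant RT is imported into both the L2VNI and the tenant's L3VNI.

```python
# Hypothetical RT values; real values depend on the configuration.
IMPORT_POLICY = {
    "L2VNI 10000": {"65000:10000"},
    "L2VNI 20000": {"65000:20000"},
    "TENANT L3VNI": {"65000:10077"},
}


def import_targets(route_rts, policy=IMPORT_POLICY):
    """Return every L2VNI/L3VNI whose import RTs intersect the route's RTs."""
    return sorted(name for name, rts in policy.items() if rts & set(route_rts))


print(import_targets(["65000:10000", "65000:10077"]))  # ['L2VNI 10000', 'TENANT L3VNI']
print(import_targets(["65000:20000"]))                 # ['L2VNI 20000']
```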

Routing information exchange with external networks cannot rely only on Route-Targets. We could have an external connection over an IPv4 network by using eBGP, or a connection over an MPLS network by using MP-BGP (AFI 1 - IPv4/SAFI 128 - VPNv4). These three BGP flavors (BGP, BGP EVPN, and BGP VPNv4) use different address representation formats in BGP updates. Let’s use the IPv4 address 192.168.100.1/24 as an example.

IPv4:    192.168.100.1/24
VPNv4: [RD]:192.168.100.1/11/112
EVPN:   [RD]:[Route-Type]:[ESI]:[MAC length]:[MAC]:[IP length]:192.168.100.1/272

Because of the different representation mode for the same address, we need to change the address format while exchanging the routing updates between BGP domains over the VXLAN Border-PE.

I am going to use the topology shown in Figure 12-1 to take a deep dive into this subject.

Figure 12-1: Example Topology and IP addressing

Tuesday, 4 September 2018

VXLAN Part XI: Using vPC Peer Link as an Underlay Backup Path

Edited: Wednesday, 5 September 2018 | Toni Pasanen


This short post shows how a VTEP Leaf switch can use the vPC peer link as a backup path to the Spine switch in a situation where the Leaf switch loses its connection to the Spine switch. This is the recommended redundancy model when using vPC in a VXLAN BGP EVPN fabric. Just like in my previous posts, I am using only one Spine switch to keep things as simple as possible.


Figure 11-1: Example Topology and IP addressing
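The backup-path behavior can be sketched as lowest-cost path selection over links that are still up. The switch names, link names, and costs are hypothetical, and the sketch assumes an underlay routing adjacency between the vPC peers across the peer link.

```python
# Hypothetical underlay paths from Leaf-102 towards Spine-11, with assumed costs.
PATHS = {
    "uplink": {"links": {"Leaf-102<->Spine-11"}, "cost": 1},
    "peer-link": {"links": {"Leaf-102<->Leaf-103", "Leaf-103<->Spine-11"}, "cost": 2},
}


def best_path(paths, failed_links):
    """Pick the lowest-cost path whose links are all still up; None if none is left."""
    usable = {name: p for name, p in paths.items() if not p["links"] & failed_links}
    if not usable:
        return None
    return min(usable, key=lambda name: usable[name]["cost"])


print(best_path(PATHS, set()))                    # uplink
print(best_path(PATHS, {"Leaf-102<->Spine-11"}))  # peer-link
```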

Friday, 24 August 2018

VXLAN Part X: Recovery issue when BGP EVPN peering and the VXLAN NVE1 interface use the same source loopback interface


Does it really matter if the NVE1 interface of a VTEP switch and BGP EVPN use the same Loopback interface IP address as a source or should there be a dedicated Loopback interface for BGP EVPN? In this post, I am trying to give an answer by showing the difference in BGP EVPN convergence process for both of these design options.

Figure 10-1: VXLAN BGP EVPN Example Topology and IP addressing

Sunday, 19 August 2018

VXLAN Part IX: VXLAN BGP EVPN - vPC

This post describes how the Multi-Chassis Link Aggregation Group (MC-LAG) technology virtual PortChannel (vPC) works in a VXLAN BGP EVPN fabric. I will first go through the vPC configuration with a short explanation, and then I’ll show the Control and Data Plane operation from the VXLAN BGP EVPN perspective by using various show commands and packet captures. I am also going to explain the “Advertising VIP/PIP” options using the external connection. The example topology is shown in Figure 9-1. Complete configurations of the vPC peer switches Leaf-102 and Leaf-103 can be found in Appendix 1 at the end of the post (the Leaf-101 and Spine-11 configurations are the same as in the previous post).




Figure 9-1: VXLAN BGP EVPN vPC Example Topology and IP addressing
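The VIP/PIP choice can be sketched as a next-hop selection per EVPN route type. This reflects our understanding of NX-OS `advertise-pip` behavior (Route Type 2 routes keep the shared VIP as next hop, while Route Type 5 routes use the switch's own PIP when advertise-pip is enabled) and should be verified against Cisco documentation; the function name and addresses are hypothetical.

```python
def evpn_next_hop(route_type, vip, pip, advertise_pip):
    """Next hop used by a vPC member in its BGP EVPN advertisements (sketch)."""
    if route_type == 5 and advertise_pip:
        return pip   # IP Prefix route: switch-specific PIP
    return vip       # MAC/IP route, or advertise-pip disabled: shared vPC VIP


# Hypothetical addresses for one vPC member.
VIP, PIP = "192.168.50.23", "192.168.50.102"
print(evpn_next_hop(2, VIP, PIP, advertise_pip=True))  # 192.168.50.23
print(evpn_next_hop(5, VIP, PIP, advertise_pip=True))  # 192.168.50.102
```

Advertising external prefixes with the PIP lets remote VTEPs reach the vPC member that actually holds the external route, instead of hashing to a peer that may not have it.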

Tuesday, 5 June 2018

VXLAN Part VIII: VXLAN BGP EVPN – External Connection

This post shows how to connect an external network to our existing VXLAN fabric. Of the two models, Border Leaf and Border Spine, I am going to use the Border Leaf model, since I do not want to install additional services on the Spine switches, which already host both the Multicast Rendezvous Point (RP) and the BGP Route Reflector (BGP RR). We could, of course, implement the Border function on the Spine switches without any performance issue, but then the Spine switches become VTEP switches, which means that they will do VXLAN encapsulation and decapsulation. Keep in mind that if we scale out the Spine layer by adding a new Spine switch, we also need to scale out the external connection. With the Border Leaf solution, we get a dedicated border zone.
I am using the full-mesh BGP model instead of the U-shaped model for a few reasons: it is the most resilient option, there is no black-holing in the event of a single link failure, and there is no need for iBGP peering between the Border Leaf switches.


Figure 8-1 shows the topology which we are going to build.

Figure 8-1: VXLAN Fabric external connection basic setup.

Sunday, 6 May 2018

VXLAN Part VII: VXLAN BGP EVPN –Control Plane operation


In my previous post “VXLAN Part VI: VXLAN BGP EVPN – Basic Configurations”, I showed how to configure VXLAN BGP EVPN on Nexus 9000v. This post is about the BGP EVPN Control Plane operation.

Figure 7-1 represents the logical structure of the example VXLAN fabric. BGP peering is established between the VTEP Leaf switches and the Spine-11 switch, which is a BGP Route Reflector (not shown in Figure 7-1). Both VTEP Leaf switches have a local VRF context TENANT77 with VNI 10077 (L3VNI) attached to it, which is used for routing between hosts in different VLANs/vn-segments. Hosts Café and Beef are connected to VLAN 10 (192.168.11.0/24), which in turn is attached to vn-segment 10000 (L2VNI). Hosts Abba and Babe are connected to VLAN 20 (192.168.12.0/24), which in turn is attached to vn-segment 20000 (L2VNI). We are using auto-generated RD/RT values and ARP suppression in both L2VNIs. The physical topology and the configurations of the switches are presented in Appendix 1 at the end of the document. For simplicity, I have used only one uplink on each VTEP switch.




Figure 7-1: VXLAN Fabric logical structure
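A sketch of the auto-generated values, assuming the NX-OS conventions <router-id>:<32767 + VLAN ID> for an L2VNI RD and <ASN>:<VNI> for the RT. The router ID and AS number in the example are hypothetical. The design point is that the RD differs per VTEP (it contains the router ID), while the RT is identical on every VTEP in the same AS, so routes for the same VNI are imported everywhere.

```python
def auto_rd_l2vni(router_id, vlan_id):
    """Auto RD for an L2VNI, assuming the NX-OS convention <router-id>:<32767 + VLAN ID>."""
    return f"{router_id}:{32767 + vlan_id}"


def auto_rt(asn, vni):
    """Auto RT, assuming the NX-OS convention <2-byte ASN>:<VNI>."""
    return f"{asn}:{vni}"


print(auto_rd_l2vni("192.168.0.101", 10))  # 192.168.0.101:32777
print(auto_rt(65000, 10000))               # 65000:10000 (L2VNI)
print(auto_rt(65000, 10077))               # 65000:10077 (L3VNI of TENANT77)
```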

Tuesday, 17 April 2018

VXLAN Part VI: VXLAN BGP EVPN – Basic Configurations

In my previous post “VXLAN Part V: Flood and Learn”, I showed how VXLAN works without a Control Plane protocol. In this post, I am going to show how to configure BGP EVPN on a VXLAN fabric.

In Figure 1, you can see a high-level overview of our example VXLAN fabric design. We have one VRF context (= tenant), TENANT77, spread over the two VTEPs. We also have two VLANs: VLAN 10 (attached to L2VNI 10000) and VLAN 20 (attached to L2VNI 20000). Each VTEP has two connected hosts (Cafe and Abba on VTEP-101; Beef and Babe on VTEP-102). Inter-VLAN flows between hosts on different VTEPs are routed over L3VNI 10077. The reason why I start with the configurations is that I want to use show commands as well as Wireshark captures while explaining the theory in my next post.


Note! I am using Cisco VIRL with Nexus 9000v (nxos.7.0.3.I7.1.bin).


Figure 1: VXLAN BGP EVPN
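The forwarding decision in this design can be sketched as follows, assuming symmetric IRB (inter-VLAN traffic between VTEPs uses the L3VNI). The function name is hypothetical; the VLAN, L2VNI, and L3VNI numbers come from the example above.

```python
# From the example: VLAN 10 -> L2VNI 10000, VLAN 20 -> L2VNI 20000,
# both in VRF TENANT77, whose L3VNI is 10077.
VLAN_TO_L2VNI = {10: 10000, 20: 20000}
TENANT77_L3VNI = 10077


def vxlan_forwarding(src_vlan, dst_vlan, dst_on_remote_vtep):
    """Return (action, VNI) for a flow, assuming symmetric IRB (sketch)."""
    if not dst_on_remote_vtep:
        return ("local", None)                       # no VXLAN encapsulation needed
    if src_vlan == dst_vlan:
        return ("bridge", VLAN_TO_L2VNI[src_vlan])   # intra-VLAN: L2VNI
    return ("route", TENANT77_L3VNI)                 # inter-VLAN: L3VNI


print(vxlan_forwarding(10, 10, True))   # Cafe -> Beef: ('bridge', 10000)
print(vxlan_forwarding(10, 20, True))   # Cafe -> Babe: ('route', 10077)
```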

Updated: February 21.4.2018 | Toni Pasanen

Monday, 26 March 2018

VXLAN Part V: Flood and Learn

In this chapter, I am going to show how the VXLAN Flood & Learn MAC learning process works. I am going to ping from Host-1 to Host-2 and then walk through the Flood and Learn process, starting from the ARP request. I am using the same lab that was used in VXLAN Part IV. Configurations can be found in VXLAN Part I and Part IV.

Figure 1: VXLAN Flood & Learn topology
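The Flood and Learn behavior can be sketched as a per-VNI table that binds each inner source MAC address to the outer source IP address of the VTEP that sent the frame. This is a simplified model; the class name, multicast group, and addresses are hypothetical. Frames to broadcast or unknown destinations, such as the initial ARP request, are sent to the multicast group, and learned destinations are sent directly to the remote VTEP.

```python
class FloodAndLearnVtep:
    """Hypothetical data-plane-only VTEP state for one VNI."""

    def __init__(self, vni, mcast_group):
        self.vni = vni
        self.mcast_group = mcast_group  # underlay group used for BUM traffic
        self.remote_macs = {}           # inner source MAC -> remote VTEP IP

    def receive(self, outer_src_ip, inner_src_mac):
        """Decapsulation side: learn which VTEP the inner source MAC sits behind."""
        self.remote_macs[inner_src_mac] = outer_src_ip

    def outer_destination(self, inner_dst_mac):
        """Known unicast goes to the learned VTEP; broadcast/unknown is flooded."""
        return self.remote_macs.get(inner_dst_mac, self.mcast_group)


vtep = FloodAndLearnVtep(10000, "238.0.0.10")
print(vtep.outer_destination("ffff.ffff.ffff"))  # 238.0.0.10 (ARP request flooded)
vtep.receive("192.168.100.2", "1000.0000.0002")   # ARP reply from Host-2's VTEP
print(vtep.outer_destination("1000.0000.0002"))  # 192.168.100.2
```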