Friday 28 December 2018

VXLAN Part XV: Analysis of the BGP EVPN Control Plane Operation

Document Status: Unfinished
Edited: Monday, 7 January 2019

This chapter covers the following topics:

MAC address learning process (Intra-VNI switching): This section describes how the local VTEP switch learns the MAC addresses of its directly connected hosts from the ingress frame, and how the L2 forwarding component (L2FWDER) installs the information into the MAC VRF of the Layer 2 Routing Information Base (L2RIB). This section also shows how the local VTEP switch advertises the MAC address information to the remote VTEP switch by using a BGP EVPN Route Type 2 advertisement (MAC Advertisement Route), and how the remote VTEP switch installs the information into the MAC VRF in its L2RIB and from there into the MAC address table. The Intra-L2VNI (Switching) Data Plane operation is explained at the end of the section with various frame capture examples. The white “MAC line” represents these processes in figure 7-1.

MAC-IP address learning process (ARP for Intra-VNI switching and Inter-VNI routing): This section gives a detailed description of how the local VTEP switch learns the IP addresses of its locally connected hosts from ARP messages generated by the hosts, and how the Host Mobility Manager component (HMM) installs the information into the IP VRF. This section also shows how the local VTEP switch advertises the IP address information to the remote VTEP switch by using a BGP EVPN Route Type 2 (MAC Advertisement Route) advertisement, and how the remote VTEP switch installs this information into the IP VRF in the L2RIB as well as into the L3RIB of VRF TENANT77. In addition, this section explains how the ARP Suppression mechanism uses MAC-IP binding information to reduce BUM (Broadcast, Unknown Unicast, and Multicast) traffic in the VXLAN Fabric. The grey “IP line” represents these processes in figure 7-1.


Prefix advertisement: This section covers how the local VTEP switch redistributes its Anycast Gateway (AGW) subnets into BGP and advertises this information to the remote VTEP switch by using a BGP EVPN Route Type 5 (IP Prefix Route) advertisement. It also explains how this information is used to discover silent hosts, and describes how the remote VTEP installs the route from BGP into its local L3RIB. The black “Prefix line” represents these processes in figure 7-1 (see the configuration sketch after this list).
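The knobs behind these three processes, as far as I can reconstruct them for the Nexus 9000v, look roughly like the sketch below. The VNI and VRF values follow figure 7-1; the BGP AS 65000 and the route-map name AGW-SUBNETS are my own placeholders.

evpn
  vni 10000 l2                        ! L2VNI: source of Route Type 2 (MAC / MAC-IP)
    rd auto
    route-target import auto
    route-target export auto
interface nve1
  member vni 10000
    suppress-arp                      ! ARP Suppression from learned MAC-IP bindings
router bgp 65000
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn            ! AGW subnets advertised as Route Type 5
      redistribute direct route-map AGW-SUBNETS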

Figure 7-1: BGP EVPN Control Plane Operational Overview.


Monday 19 November 2018

VXLAN Part XIV: Control Plane Operation in BGP EVPN VXLAN Fabric

Now you can also download my VXLAN book from Leanpub.com:

"Virtual Extensible LAN VXLAN - A Practical guide to VXLAN Solution Part 1. (373 pages)

The focus of this post is the Control Plane operation in a VXLAN fabric. First, we are going to see how the local switch Leaf-101 learns the MAC and IP address information of host Beef and installs it into its databases. Then, we are going to see how Leaf-101 advertises the information to the remote Leaf-102 by using BGP EVPN. After that, we are going to see how the remote switch Leaf-102 receives the BGP EVPN Update and imports the routes into the MAC-VRF and from there into its databases. Note that on Leaf-101 VLAN 10 is attached to VNI 10000, while on Leaf-102 VLAN 20 is attached to the same vn-segment.
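As a reminder, the asymmetric VLAN-to-VNI mapping mentioned above boils down to these two snippets (Nexus 9000v syntax):

Leaf-101:
  vlan 10
    vn-segment 10000
Leaf-102:
  vlan 20
    vn-segment 10000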


Figure 14-1: IP and MAC addressing and VLAN-to-VN-segment mapping.

Thursday 18 October 2018

VXLAN Part XIII: Firewall Implementation to VXLAN Fabric

Now you can also download my VXLAN book from Leanpub.com:
"Virtual Extensible LAN VXLAN - A Practical guide to VXLAN Solution Part 1. (373 pages)

In this post, I am going to show how to implement an Active/Standby FW Cluster in a VXLAN Fabric. Figure 13-1 shows the logical view of the example setup, where we have two server networks: 192.168.30.0/24 (VLAN 30 - protected) and 192.168.11.0/24 (VLAN 10 - non-protected). We also have an Active/Standby FW Cluster connected to a dedicated Service Leaf vPC Cluster (Leaf-102 and Leaf-103). The Anycast Gateway (AGW) for the network 192.168.11.0/24 resides in the Server Leaf-101, while the Gateway for the protected network 192.168.30.0/24 resides in the Firewall (Inside Zone). Protected hosts in VLAN 30 use the VXLAN Fabric only as an L2 transport network, as the sketch below illustrates. For simplicity, the Spine switch is not shown in figure 13-1.
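Because the protected VLAN 30 uses the fabric purely as L2 transport, its leaf configuration needs only the L2VNI pieces and no SVI/AGW. A minimal sketch; the VNI value 30000 and the multicast group are my assumptions:

vlan 30
  vn-segment 30000                    ! no SVI / Anycast Gateway for this VLAN
interface nve1
  member vni 30000
    mcast-group 238.0.0.30            ! BUM handling for the L2-only segment
evpn
  vni 30000 l2
    rd auto
    route-target import auto
    route-target export auto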

Figure 13-1: Example Topology and IP addressing

Tuesday 25 September 2018

VXLAN Part XII: Routing Exchange: intra/inter-L2VNI, EVPN-to-IP, EVPN-to-VPNv4

Edited: 25.9.2018 | Toni Pasanen

We are using BGP EVPN (MP-BGP AFI 25/SAFI 70 - EVPN) to exchange MAC-IP (Type 2) and Prefix (Type 5) reachability information inside the VXLAN fabric between the VTEPs. Each BGP UPDATE message sent by a VTEP includes an L2VNI/L3VNI-specific Route-Target (RT) Extended Community path attribute. Based on these RTs, routes are imported into the correct L2VNIs/L3VNIs. Each L2VNI has a VNI-specific RT, which is used for intra-VNI communication. Inside the tenant, there is a common, tenant-specific RT used for inter-L2VNI communication.
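With auto-generated values, NX-OS derives the RT as ASN:VNI. Assuming BGP AS 65000, the RTs in this setup would therefore be:

L2VNI 10000 (intra-VNI)           -> RT 65000:10000
L2VNI 20000 (intra-VNI)           -> RT 65000:20000
L3VNI 10077 (inter-L2VNI, tenant) -> RT 65000:10077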

Routing exchange with external networks cannot rely on Route-Targets alone. We could have an external connection over an IPv4 network by using eBGP, or a connection over an MPLS network by using MP-BGP (AFI 1 - IPv4/SAFI 128 - VPNv4). All three of these BGP flavors (BGP, BGP EVPN, and BGP VPNv4) use dissimilar address representation formats in their BGP updates. Let’s use the IPv4 address 192.168.100.1/24 as an example.

IPv4:    192.168.100.1/24
VPNv4: [RD]:192.168.100.1/112
EVPN:   [RD]:[Route-Type]:[ESI]:[MAC length]:[MAC]:[IP length]:192.168.100.1/272
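Where do those prefix lengths come from? My reading of the NLRI encodings, assuming the /24 prefix of the example:

VPNv4 (/112):       Label (24) + RD (64) + IPv4 prefix (24)          = 112 bits
EVPN Type-2 (/272): RD (64) + ESI (80) + Eth Tag (32) + MAC len (8)
                    + MAC (48) + IP len (8) + IPv4 (32)              = 272 bits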

Because the same address is represented differently in each of these formats, we need to change the address format when exchanging routing updates between BGP domains over the VXLAN Border-PE.

I am going to use the topology shown in figure 12-1 to take a deep dive into this subject.

Figure 12-1: Example Topology and IP addressing

Tuesday 4 September 2018

VXLAN Part XI: Using vPC Peer Link as an Underlay Backup Path

Edited: Wednesday, 5 September 2018 | Toni Pasanen


This short post shows how a VTEP Leaf switch can use the vPC peer link as a backup path to the Spine switch in a situation where the Leaf switch loses its connection to the Spine switch. This is the recommended redundancy model when using vPC in a VXLAN BGP EVPN fabric. Just like in my previous posts, I am using only one Spine switch to keep things as simple as possible.
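The trick is to let the underlay routing protocol run over a dedicated VLAN on the peer link, and to tell the switch that this VLAN carries VXLAN-related underlay traffic. A minimal sketch, assuming an OSPF underlay; VLAN 999 and the addressing are my own picks:

system nve infra-vlans 999            ! peer-link VLAN allowed to carry underlay traffic
vlan 999
interface Vlan999
  no shutdown
  ip address 10.254.254.1/30          ! .2/30 on the vPC peer
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode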


Figure 11-1: Example Topology and IP addressing

Friday 24 August 2018

VXLAN Part X: Recovery issue when BGP EVPN peering uses the same source loopback interface as the VXLAN NVE1 interface

Now you can also download my VXLAN book from Leanpub.com:

"Virtual Extensible LAN VXLAN - A Practical guide to VXLAN Solution Part 1. (373 pages)

Does it really matter if the NVE1 interface of a VTEP switch and BGP EVPN use the same Loopback interface IP address as a source, or should there be a dedicated Loopback interface for BGP EVPN? In this post, I am trying to give an answer by showing the difference in the BGP EVPN convergence process for both of these design options.
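For reference, the dedicated-loopback design being compared looks roughly like this; the interface numbers, addresses, and AS number are example values:

interface loopback0                   ! dedicated BGP EVPN peering source
  ip address 192.168.0.101/32
interface loopback1                   ! VXLAN tunnel (NVE1) source
  ip address 192.168.100.101/32
interface nve1
  source-interface loopback1
router bgp 65000
  neighbor 192.168.0.11
    update-source loopback0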

Figure 10-1: VXLAN BGP EVPN Example Topology and IP addressing

Sunday 19 August 2018

VXLAN Part IX: VXLAN BGP EVPN - vPC

This post describes how Multi-Chassis Link Aggregation Group (MC-LAG) technology using virtual PortChannel (vPC) works in a VXLAN BGP EVPN fabric. I will first go through the vPC configuration with a short explanation, and then I’ll show the Control and Data Plane operation from a VXLAN BGP EVPN perspective by using various show commands and packet captures. I am also going to explain the “Advertising VIP/PIP” options used with the external connection; the sketch below shows the related knobs. The example topology is shown in Figure 9-1. Complete configurations of the vPC peer switches Leaf-102 and Leaf-103 (the Leaf-101 and Spine-11 configurations are the same as in the previous post) can be found in Appendix 1 at the end of the post.
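On a vPC VTEP the shared Virtual IP (VIP) is a secondary address on the NVE source loopback, while the primary address is the switch-specific PIP. A minimal sketch; the addresses and AS number are example values:

interface loopback1
  ip address 192.168.100.102/32              ! PIP (per switch)
  ip address 192.168.100.23/32 secondary     ! VIP (shared by the vPC pair)
interface nve1
  source-interface loopback1
  advertise virtual-rmac                     ! use the virtual MAC with the VIP
router bgp 65000
  address-family l2vpn evpn
    advertise-pip                            ! advertise prefix routes with the PIP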




Figure 9-1: VXLAN BGP EVPN vPC Example Topology and IP addressing

Tuesday 5 June 2018

VXLAN Part VIII: VXLAN BGP EVPN – External Connection

This post shows how to connect an external network to our existing VXLAN fabric. Of the two models, Border Leaf and Border Spine, I am going to use the Border Leaf model, since I do not want to add services to the Spine switches, which already host both the Multicast Rendezvous Point (RP) and the BGP Route Reflector (BGP RR). We could, of course, implement the Border function in the Spine switches without any performance issues, but then the Spine switches become VTEP switches, which means that they will do VXLAN encapsulation and decapsulation. Keep in mind that if we scale out the Spine layer by adding a new Spine switch, we also need to scale out the external connection. With the Border Leaf solution, we get a dedicated border zone.
I am using a full-mesh BGP model instead of a U-shaped model for a couple of reasons: it is the most resilient option, there will be no black-holing in the event of a single link failure, and there is no need for iBGP peering between the Border Leaf switches.
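A minimal sketch of one Border Leaf's side of the full-mesh peering; the neighbor addresses and AS numbers are my assumptions, and the per-tenant eBGP sessions sit under the VRF:

router bgp 65000
  vrf TENANT77
    neighbor 172.16.77.1                ! Ext-Router-1
      remote-as 64577
      address-family ipv4 unicast
    neighbor 172.16.78.1                ! Ext-Router-2
      remote-as 64577
      address-family ipv4 unicast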


Figure 8-1 shows the topology which we are going to build.

Figure 8-1: VXLAN Fabric external connection basic setup.

Sunday 6 May 2018

VXLAN Part VII: VXLAN BGP EVPN – Control Plane Operation

Now you can also download my VXLAN book from Leanpub.com:

"Virtual Extensible LAN VXLAN - A Practical guide to VXLAN Solution Part 1. (373 pages)

In my previous post, “VXLAN Part VI: VXLAN BGP EVPN – Basic Configurations”, I showed how to configure VXLAN BGP EVPN on the Nexus 9000v. This post is about the BGP EVPN Control Plane operation.

Figure 7-1 represents the logical structure of the example VXLAN fabric. BGP peering is established between the VTEP Leaf switches and the Spine-11 switch, which acts as a BGP Route Reflector (not shown in figure 7-1). Both VTEP Leaf switches have a local VRF context TENANT77 that has VNI 10077 (L3VNI) attached to it, used for routing between hosts in different vlans/vn-segments. Hosts Café and Beef are connected to vlan 10 (192.168.11.0/24), which in turn is attached to vn-segment 10000 (L2VNI). Hosts Abba and Babe are connected to vlan 20 (192.168.12.0/24), which in turn is attached to vn-segment 20000 (L2VNI). We are using auto-generated RD/RT values and ARP suppression in both L2VNIs. The physical topology and the configurations of the switches are presented in Appendix 1 at the end of the document. For simplicity, I have used only one uplink in each VTEP switch.
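The TENANT77 / L3VNI 10077 constructs referenced above map to roughly the following configuration on each VTEP; VLAN 77 for the L3VNI SVI is my assumption:

vlan 77
  vn-segment 10077
vrf context TENANT77
  vni 10077
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
interface Vlan77
  no shutdown
  vrf member TENANT77
  ip forward                            ! routing-only SVI for the L3VNI
interface nve1
  member vni 10077 associate-vrf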




Figure 7-1: VXLAN Fabric logical structure

Tuesday 17 April 2018

VXLAN Part VI: VXLAN BGP EVPN – Basic Configurations

In my previous post, “VXLAN Part V: Flood and Learn”, I showed how VXLAN works without a Control Plane protocol. In this post, I am going to show how to configure BGP EVPN on a VXLAN fabric.

In Figure 1, you can see the high-level overview of our example VXLAN fabric design. We have one vrf context (= tenant), TENANT77, spread over the two VTEPs. We also have two VLANs: VLAN 10 (attached to L2VNI 10000) and VLAN 20 (attached to L2VNI 20000). On each VTEP there are two connected hosts (Cafe and Abba on VTEP-101, Beef and Babe on VTEP-102). Cross-VLAN flows between hosts on different VTEPs are routed over the L3VNI 10077. The reason why I start with the configurations is that I want to use show commands as well as Wireshark captures while explaining the theory in my next post.


Note! I am using Cisco VIRL with Nexus 9000v (nxos.7.0.3.I7.1.bin).
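Before any of the VXLAN BGP EVPN configuration is accepted, a set of features has to be enabled. A minimal sketch for the Nexus 9000v used here:

nv overlay evpn                         ! EVPN control plane
feature bgp
feature pim
feature interface-vlan
feature vn-segment-vlan-based           ! VLAN-to-VNI mapping
feature nv overlay                      ! NVE (VTEP) interface
feature fabric forwarding               ! Anycast Gateway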


Figure 1: VXLAN BGP EVPN

Updated: 21.4.2018 | Toni Pasanen

Monday 26 March 2018

VXLAN Part V: Flood and Learn

In this chapter, I am going to show how the VXLAN Flood & Learn MAC learning process works. I am going to ping from Host-1 to Host-2 and then walk through the Flood and Learn process, starting from the ARP request. I am using the same lab that was used in VXLAN Part IV. The configurations can be found in VXLAN Part I and Part IV.
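In the Flood & Learn model the NVE interface has no BGP-based host reachability; BUM traffic is simply flooded to the multicast group of the VNI. A minimal sketch of the VTEP side; the group address follows my lab values:

interface nve1
  no shutdown
  source-interface loopback1
  member vni 10000 mcast-group 238.0.0.10   ! flood domain for the L2VNI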

Figure 1: VXLAN Flood & Learn topology

Tuesday 20 March 2018

VXLAN Part IV: The Underlay Network – Multidestination Traffic: PIM BiDir

My last post, VXLAN Part III, introduced the VXLAN Fabric L2VNI service with Anycast-RP with PIM (RFC 4610 and RFC 7761). In this chapter, I will show how PIM BiDir (RFC 5015) with a Phantom-RP can be used for the same purpose. I will use configurations, show commands, and Wireshark captures to explain the theory.
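The Phantom-RP idea in a nutshell: the RP address belongs to a subnet that no device actually owns, and each Spine advertises that subnet with a different mask length, so the longest prefix determines the active path towards the RP. A hedged sketch; all addresses and the group range are my assumptions:

Spine-11:
  interface loopback238
    ip address 192.168.238.2/30         ! longest prefix -> primary path to the RP
    ip pim sparse-mode
Spine-12:
  interface loopback238
    ip address 192.168.238.2/29         ! shorter prefix -> backup
    ip pim sparse-mode
All switches:
  ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24 bidir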

Figure 1: Example VIRL topology

Sunday 18 March 2018

VXLAN Part III: The Underlay Network – Multidestination Traffic: Anycast-RP with PIM

The role of the Underlay Network, related to BUM traffic in the VXLAN fabric, is to transport ARP, ND, DHCP, and other Layer 2 BUM (Broadcast, Unknown Unicast, and Multicast) traffic between the hosts connected to different VTEPs. For Layer 3 Multicast traffic between hosts, there should be a separate overlay Multicast routing design. This chapter shows how Anycast-RP with PIM can be used in a VXLAN fabric. In figure 1, we can see the example topology used in this chapter. There are two Spine switches, which share the same Anycast-RP IP address and belong to the same “Anycast-RP set” group (Loopback 238). In addition, there is another loopback interface, which must be unique on each Spine (Loopback 511 and 512). These addresses are used as Anycast-RP set member IDs. Both addresses, shared and unique, need to be reachable by all switches. The complete configuration can be found in Appendix 1 at the end of the document.

Note! I am using Cisco VIRL with nxos.7.0.3.I7.1
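A minimal sketch of the Anycast-RP with PIM configuration on Spine-11; the addresses are examples that follow the loopback numbering above, and Spine-12 mirrors this with its own member ID:

interface loopback238
  ip address 192.168.238.1/32           ! shared Anycast-RP address
  ip pim sparse-mode
interface loopback511
  ip address 192.168.50.11/32           ! unique Anycast-RP set member ID
  ip pim sparse-mode
ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24
ip pim anycast-rp 192.168.238.1 192.168.50.11
ip pim anycast-rp 192.168.238.1 192.168.50.12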



Figure 1: Example topology with Anycast-RP IP addresses.

Saturday 10 March 2018

VXLAN Part II. The Underlay network – Unicast Routing

Introduction


VXLAN is a MAC-in-IP/UDP tunneling mechanism that allows Layer 2 segments to be stretched over a Layer 3 network (Underlay/Transport). In this chapter, I will show one possible design of the Underlay network. I will also show basic configurations and monitoring commands. At the end of this article, you can find a mindmap as a memory builder.

Our example network consists of four Cisco Nexus 9000 switches. The edge switches Leaf-101 and Leaf-102 work as VTEP (VXLAN Tunnel Endpoint) devices. VTEPs are responsible for encapsulating Ethernet frames received from directly connected hosts with a VXLAN header, as well as removing the VXLAN header from packets received from another VTEP switch. Spine-11 and Spine-12 are the core switches. These switches are not aware of the hosts/VMs behind the VTEP Leaf switches; the Spine switches only route packets between the VTEPs.
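A minimal sketch of one Leaf's underlay routing configuration, assuming point-to-point routed links and OSPF; the interface, router-id, and addressing are example values:

feature ospf
router ospf UNDERLAY
  router-id 192.168.0.101
interface Ethernet1/1                   ! uplink to Spine-11
  no switchport
  ip address 10.101.11.101/30
  ip ospf network point-to-point
  ip router ospf UNDERLAY area 0.0.0.0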


Figure 1: Example topology


Friday 23 February 2018

VXLAN Part I. Why do we need VXLAN?

Introduction


This section examines the challenges that server virtualization causes for Datacenter networks with a traditional three-layer architecture, and how VXLAN can respond to these challenges. At the end of this article, you can find a mindmap as a memory builder.

Challenges for existing Datacenter networks
Figure 1-1 shows a hypothetical 3-tier Cloud Service Provider DC network consisting of the following components.

  • Access layer (L2): Twenty 48-port switches. Access-to-Distribution links are 2 x 10 Gbps MEC (Multichassis EtherChannel).
  • Distribution layer (L2/L3): Two distribution switches, which together form a virtualized switch. The default gateway for the server segments is in the distribution switches. Distribution-to-Core links are L3.
  • Core layer (L3): Two Core switches.



Figure 1-1: The hypothetical Cloud SP Datacenter network.