Tuesday, 7 July 2020

BGP EVPN Underlay Network with OSPF

Introduction


The foundation of a modern datacenter fabric is the Underlay Network, and it is crucial to understand the operation of the Control-Plane protocol used in it. The focus of this chapter is OSPF. The first section introduces the network topology and AS numbering scheme used throughout this book. The second section explains how OSPF speakers connected to the same segment become fully adjacent. The third section discusses how OSPF speakers exchange Link State information and build the Link-State Database (LSDB), which is used as the information source for calculating the Shortest Path Tree (SPT) towards each destination using the Dijkstra algorithm. The fourth section focuses on the OSPF LSA flooding process. It starts by explaining how a local OSPF speaker sends Link State Advertisements wrapped inside a Link-State Update message to its adjacent routers, and how a receiving OSPF speaker (a) installs the information into its LSDB, (b) acknowledges the packet, and (c) floods it out of its OSPF interfaces. The fifth section discusses LSA and SPF timers. At the end of this chapter, there are OSPF-related configurations from every device.
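As a toy illustration of steps (a)-(c), the sketch below models flooding between three routers. All class and variable names are hypothetical; real OSPF adds sequence-number aging, retransmission lists, and interface state checks:

```python
# Toy model of OSPF LSA flooding: install, acknowledge, flood onward.
# Names (Router, lsdb, receive_lsa) are illustrative, not from any real stack.

class Router:
    def __init__(self, name):
        self.name = name
        self.lsdb = {}          # (adv_router, lsa_id) -> sequence number
        self.neighbors = []     # adjacent Router objects
        self.acks = []          # LSAck messages "sent" back to senders

    def receive_lsa(self, lsa, from_router):
        key = (lsa["adv_router"], lsa["lsa_id"])
        # (a) install only if the LSA is new or has a higher sequence number
        if self.lsdb.get(key, -1) < lsa["seq"]:
            self.lsdb[key] = lsa["seq"]
            # (b) acknowledge back to the sender
            self.acks.append((from_router.name, key))
            # (c) flood out of every OSPF interface except the receiving one
            for nbr in self.neighbors:
                if nbr is not from_router:
                    nbr.receive_lsa(lsa, self)

leaf, spine1, spine2 = Router("Leaf-101"), Router("Spine-11"), Router("Spine-12")
leaf.neighbors = [spine1, spine2]
spine1.neighbors = [leaf, spine2]
spine2.neighbors = [leaf, spine1]

# Leaf-101 originates an LSA; the duplicate check stops the flood from looping.
leaf.receive_lsa({"adv_router": "10.0.0.101", "lsa_id": "10.0.0.101", "seq": 1},
                 from_router=Router("origin"))
print(spine2.lsdb)   # the LSA reached every router exactly once
```

Note how the sequence-number comparison alone is enough to make the flood terminate even in this triangle topology.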

Infrastructure AS Numbering and IP Addressing Scheme


Figure 1-1 illustrates the AS numbering and IP addressing scheme used throughout this book. Each Leaf switch has a dedicated private BGP AS number, while Spine switches in the same cluster share a common AS number. Inter-switch links use IP unnumbered addressing (borrowing interface Loopback 0), and the Loopback 0 address is also used as the OSPF Router ID. Loopback 0 is not advertised by any device. The OSPF network type on inter-switch links is point-to-point, so there is no DR/BDR election process. Leaf switches also have interface Loopback 30, which is used as the VTEP (VXLAN Tunnel End Point) address. Loopback 30 IP addresses are advertised by the Leaf switches. All Loopback interfaces are in OSPF passive-interface mode. At this stage, all switches belong to OSPF Area 0.0.0.0.
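Under these assumptions, the relevant Leaf configuration might look roughly like the NX-OS-style sketch below (interface names, the OSPF process tag, and IP addresses are illustrative guesses, not taken from the book's lab):

```
! Hypothetical NX-OS-style sketch for a Leaf switch (names/addresses invented)
feature ospf

router ospf UNDERLAY
  router-id 192.168.0.101

interface loopback0
  ! borrowed by ip unnumbered and used as the OSPF Router ID; NOT advertised
  ip address 192.168.0.101/32

interface loopback30
  ! VTEP address: advertised into OSPF, passive
  ip address 192.168.30.101/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip ospf passive-interface

interface Ethernet1/1
  ! inter-switch link towards a Spine: unnumbered, point-to-point, no DR/BDR
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY area 0.0.0.0
```

Verify the exact syntax against your platform; `medium p2p` is typically required before `ip unnumbered` is accepted on a routed Ethernet interface.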


Figure 1-1: AS Numbering and IP Addressing Scheme.

Wednesday, 25 March 2020

Comparing Internet Connection used in AWS and LISP Based Networks


Forewords

This post starts by discussing the Internet connection from the AWS VPC Control-Plane operation perspective. The public AWS documentation only describes the basic components, such as the Internet Gateway (IGW) and the subnet-specific Implicit Routers (IMRs). It does not, however, describe the Control-Plane operation related to distributing the default route from IGWs to IMRs. The AWS VPC Control-Plane part of this post is therefore based on my assumptions, so be critical of what you read. The second part of this post briefly explains the Control-Plane operation of the Internet connection used in a LISP-based network. By comparing AWS VPC to a LISP-based network, I want to point out that even though some might think cloud-based networking is much simpler than traditional on-premises networking, it is not. People tend to trust the network solutions used in clouds (AWS, Azure, etc.), and there is little debate about (a) what hardware is used, (b) how redundancy works, (c) whether the solutions are standards-based, and so on. The attitude now is more like: I do not care how it works as long as it works. Whether that is good or bad, I do not know.

Thursday, 12 March 2020

Intra-Subnet Communication: AWS VPC versus LISP Based Campus Fabric


Forewords


This article introduces the principles of the Amazon Web Services Virtual Private Cloud (AWS VPC) Control-Plane operation and Data-Plane encapsulation. It also explains how the same kind of forwarding model can be achieved using standard protocols. Amazon has not published the details of its VPC networking solution, so this document relies on publicly available information and the author's own studies. My motivation for writing it was to point out that no matter how simple and easy to manage Cloud Networking looks and feels, these networks are still as complex as any other large-scale networks.

Example Environment


Figure 1-1 illustrates an example AWS VPC environment running an imaginary application on two Elastic Compute Cloud (EC2) instances, EC2-A and EC2-B. Instance EC2-A will be launched on physical server Host-A, while instance EC2-B will later be launched on physical server Host-B. The VPC vpc-1a2b3c4d is created in the Stockholm (eu-north-1) Region in Availability Zone (AZ) eu-north-1c. The subnet 172.16.31.0/20 can be used in AZ eu-north-1c. The subnet for the instances is 172.31.10.0/24. Elastic Network Interface 1 (ENI1) with IP address 172.31.10.10 will be attached to instance EC2-A, and ENI2 with IP address 172.31.10.20 will be attached to instance EC2-B. For simplicity, the same Security Group (SG) "sg-nwktimes" (allowing all data traffic between EC2-A and EC2-B) is attached to both instances.

Inside both physical servers there is a software router: Router-1 in Host-A and Router-2 in Host-B. The servers use offload NICs for the connection to the AZ Underlay Network, and data traffic from the instances is sent out of the server straight through the offload NIC, bypassing the hypervisor. The AZ backbone includes three routers: Router-3, Router-4, and Router-5. There is also a Mapping Service that represents the centralized Control Plane. It holds an Instance-to-Location Mapping Database with information about every EC2 instance running in a given VPC. The routers, servers, and Mapping Service use IPv6 addressing.
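If the Mapping Service is essentially a lookup table, its role can be caricatured in a few lines of Python. This is purely the author's guess at the concept; the schema, names, and addresses below are invented, since AWS's real implementation is not public:

```python
# Toy Instance-to-Location Mapping Database: (VPC, ENI IP) -> physical host
# locator. All identifiers and addresses are illustrative.

mapping_db = {
    ("vpc-1a2b3c4d", "172.31.10.10"): {"host": "Host-A", "locator": "2001:db8::a"},
    ("vpc-1a2b3c4d", "172.31.10.20"): {"host": "Host-B", "locator": "2001:db8::b"},
}

def resolve(vpc, dst_ip):
    """Return the IPv6 locator of the physical server hosting dst_ip, if any."""
    entry = mapping_db.get((vpc, dst_ip))
    return entry["locator"] if entry else None

# Router-1 asks where to tunnel a packet destined to EC2-B
print(resolve("vpc-1a2b3c4d", "172.31.10.20"))   # 2001:db8::b
```

The key point is that the lookup is scoped per VPC, so overlapping tenant address space never collides in the database.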

Figure 1-1: Overall example topology and IP addressing scheme.

Monday, 2 March 2020

Similarities Between AWS VPC and Cisco SDA – Intra-Subnet Communication


Update March 6, 2020: This post will soon be made obsolete by a new version.


Forewords


This article explains the similarities between a LISP/VXLAN-based Campus Fabric and AWS Virtual Private Cloud (VPC) from the Intra-Subnet Control-Plane and Data-Plane operation perspective. The details of the AWS VPC solution are not publicly available, and the information in this article is based on the author's own study of publicly available AWS VPC documentation.

There are two main reasons for writing this document: 

First, Cisco SDA is an on-prem LAN model while AWS VPC is an off-prem DC solution. I wanted to point out that these two solutions, even though used for very different purposes, use the same kind of Control-Plane operation and Data-Plane encapsulation and are managed via a GUI. This is my answer, of sorts, to the ongoing discussion about whether there are DC networks, Campus networks, and so on, or whether there are just networks.

Second, my own curiosity to understand the operation of AWS VPC.

I usually start by introducing the example environment, then explain the configuration, move on to the Control-Plane operation, and finish with the Data-Plane operation. This time I take a different approach: the article first introduces the example environment, but then discusses the Data-Plane operation before the Control-Plane operation. This makes it easier to understand what information is needed and how that information is gathered.

Thursday, 30 January 2020

LISP Control-Plane in Campus Fabric: Table of Contents

This is the table of contents of my book "LISP Control-Plane in Campus Fabric". The book is available at https://leanpub.com/lispcontrol-planeincampusfabric
The book is now complete. It will soon also be available on Amazon.


Sunday, 12 January 2020

VXLAN Book Errata 12-January 2020


Edits made on 12 January 2020: These updates have been made to both the pdf book (available at Leanpub.com) and the paperback version (available at Amazon.com).


Monday, 2 December 2019

VXLAN Book errata updates 30-Nov, 2019


Edits made on 30 November 2019: These updates have been made to both the pdf book (available at Leanpub.com) and the paperback version (available at Amazon).

Wednesday, 6 November 2019

Virtual Extensible LAN – VXLAN: Book Updates and Errata


This is the errata for the book "Virtual Extensible LAN - Practical Guide to VXLAN Solution".
The book is available as a pdf eBook at leanpub.com and as a paperback at Amazon.com.
The book is constantly updated, and changes are announced here.

Upload date: 5 November 2019:

Saturday, 19 October 2019

Tenant Routed Multicast in VXLAN Fabric


This chapter introduces the "Tenant Routed Multicast" (TRM) solution in a BGP EVPN VXLAN fabric. TRM relies on the standards-based BGP IPv4 MVPN Address-Family [RFC 6513] and [RFC 6514]. Figure 19-1 illustrates the basic idea of TRM operation. (1) Leaf switches establish a Multicast tunnel per tenant, which they use for forwarding tenant-specific Intra/Inter-VN Multicast traffic. (2) When Leaf-101 starts receiving a Multicast flow from host Cafe to group 239.77.77.77, it updates its tenant-specific MRIB table and generates an MVPN route-type 5 "Source Active Auto-Discovery (SA A-D)" route, where the MP-REACH-NLRI carries information about the Source-Specific group (S, G). This route type is used for discovering whether there are any Multicast receivers behind remote leaf switches. When Leaf-102 receives the BGP Update message, it imports the information into its BGP table. (3) Next, host Bebe sends an IGMP join message. (4) Leaf-102 updates its MRIB and then generates an MVPN route-type 7 "Source-Tree Join" route. By doing this, it informs the source that it has local receivers for Multicast group 239.77.77.77. (5) Leaf-101 installs the route into its BGP table and updates its MRIB by adding the NVE interface to the group-specific OIL. It then starts forwarding the Multicast flow received from host Cafe into the core over a Source-Specific Multicast delivery tree, which is in fact tunneled over the tenant-specific Multicast tunnel. In other words, the destination IP address in the outer IP header is the Multicast tunnel group address 238.101.102.103 and the source IP address is taken from interface NVE1. This way, the actual tenant-specific Inter-VNI Multicast flows are completely transparent to the Spine switch.
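The MRIB/OIL bookkeeping in steps (2)-(5) can be caricatured with a toy model. The function names and data structures below are invented simplifications; real MVPN routes carry RDs, Route-Targets, and PMSI attributes:

```python
# Toy model of TRM MVPN signalling between two leaf switches.
# All structures are illustrative; nothing here is real NX-OS state.

mrib_leaf101 = {}   # (source, group) -> outgoing interface list (OIL) as a set

def rt5_source_active(source, group):
    """Leaf-101 detects a local multicast source and originates SA A-D (RT-5)."""
    mrib_leaf101.setdefault((source, group), set())
    return {"type": 5, "nlri": (source, group)}

def rt7_source_tree_join(source, group):
    """Leaf-102 has a local IGMP receiver and originates Source-Tree Join (RT-7)."""
    return {"type": 7, "nlri": (source, group)}

def leaf101_receives(route):
    """On RT-7 receipt, add the tenant multicast tunnel (nve1) to the group OIL."""
    if route["type"] == 7:
        mrib_leaf101[route["nlri"]].add("nve1")

rt5 = rt5_source_active("10.10.10.10", "239.77.77.77")   # step 2
rt7 = rt7_source_tree_join(*rt5["nlri"])                 # steps 3-4
leaf101_receives(rt7)                                    # step 5
print(mrib_leaf101)
```

Until the RT-7 arrives, Leaf-101's OIL for the group stays empty, which is exactly why no multicast leaves the source leaf before a remote receiver joins.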

This chapter starts by explaining how the Multicast tunnels used for Intra-VN (L2) and Inter-VN (L3) traffic are established and how the MRIB is constructed. Then it introduces the configuration required for TRM. The last two sections discuss the BGP MVPN Control-Plane operation and the Multicast Data-Plane forwarding operation.


Figure-19-1: Tenant Routed Multicast (TRM) Topology.


The remaining 35 pages can be read in my VXLAN book "VXLAN - A Practical Guide to VXLAN", published at Leanpub.com.

Wednesday, 7 August 2019

VXLAN EVPN Multi-Site



Now you can also download my VXLAN book from Leanpub.com:
"Virtual Extensible LAN VXLAN - A Practical Guide to VXLAN Solution, Part 1" (373 pages).

This chapter introduces the VXLAN EVPN Multi-Site (EVPN-MS) architecture for interconnecting EVPN domains. The first section discusses the limitations of a flat VXLAN EVPN fabric and the improvements that can be achieved with EVPN-MS. The second section focuses on the technical details of the EVPN-MS solution using various configuration examples and packet captures.


Figure 1-1: Characteristics of Super-Spine VXLAN fabric.

Wednesday, 19 June 2019

EVPN ESI Multihoming Part III: Data Flows and link failures



This chapter explains the EVPN ESI Multihoming data flows. The first section explains Intra-VNI (L2VNI) Unicast traffic and the second section introduces BUM traffic. Figure 1-1 shows the topology and addressing scheme used in this chapter. The complete configurations of Leaf-102 and Leaf-103 can be found at the end of the document.



Figure 1-1: Topology and addressing scheme.

Saturday, 8 June 2019

EVPN ESI Multihoming- Part II: Fast Convergence and Load Balancing




This chapter introduces BGP EVPN Route Type 1 - Ethernet Auto-Discovery (Ethernet A-D) routes. The first section explains the Ethernet A-D per Ethernet Segment (ES) route, which is mainly used for Fast Convergence. The second section discusses the Ethernet A-D per EVI/ES route, which in turn is used for Load Balancing (also called Aliasing/Backup Path).



Figure 1-1: Ethernet A-D per Ethernet Segment (ES) route.

Wednesday, 29 May 2019

EVPN ESI Multihoming - Part I: EVPN Ethernet Segment (ES)



This chapter introduces the standards-based EVPN ESI Multihoming solution in a BGP EVPN VXLAN Fabric. It starts by explaining how a CE device (access switch or host) can be attached to two or more independent PE devices (Leaf switches) by using a Port-Channel. This section discusses the concepts of the Ethernet Segment and the Port-Channel. Next, the chapter explains how BGP EVPN Route Type 4 (Ethernet Segment Route) is used for creating a redundancy group between the switches that share the ES. This section also introduces the BGP EVPN Route Type 4 NLRI address format. In addition, the chapter shows how switches belonging to the same redundancy group select the Designated Forwarder (DF) for BUM traffic among themselves. The chapter also introduces the VLAN Consistency Check, which uses Cisco Fabric Services over IP (CFSoIP). The last two sections explain the Layer 2 Gateway Spanning-Tree (L2G-STP) mechanism and the Core-Link Tracking system.

Part II introduces BGP EVPN Route Type 1 (Ethernet Auto-Discovery) and how it is used for fast convergence. Part III discusses the data flows between hosts in normal and failure situations. Parts II and III will be published later.



Figure 1-1: The VXLAN EVPN Multi-homing topology and addressing scheme.

Thursday, 9 May 2019

VXLAN Underlay Routing - Part V: Multi-AS eBGP


eBGP as an Underlay Network Routing Protocol: Multi-AS eBGP

This post introduces the Multi-AS eBGP solution in a VXLAN Fabric. In this solution, a single AS number is assigned to all Spine switches while each Leaf switch (or pair of Leaf switches) has a unique BGP AS number. This solution requires neither the "allowas-in" command on the Leaf switches nor the "disable-peer-as-check" command on the Spine switches, both of which are required in the Two-AS solution. The "retain route-target all" command and, under the BGP L2VPN EVPN address family, a peer-specific route-map with the "set ip next-hop unchanged" option are needed on the Spine switch. This post also explains the requirements and process for the L2 EVPN VNI-specific route import policy when automated derivation of Route-Targets is used. The same IP/MAC addressing scheme is used in this chapter as in the previous post "VXLAN Underlay Routing - Part IV: Two-AS eBGP", but Leaf-102 now belongs to BGP AS 65001.
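Put together, the Spine-side pieces described above might look roughly like this NX-OS-style sketch (the AS numbers match the post, but the route-map name and neighbor address are illustrative; verify the exact syntax against your platform):

```
! Hypothetical NX-OS-style sketch for a Spine in a Multi-AS eBGP design
route-map NH-UNCHANGED permit 10
  set ip next-hop unchanged        ! preserve the leaf VTEP as BGP next-hop

router bgp 65099
  address-family l2vpn evpn
    retain route-target all        ! keep EVPN routes with no local VRF import
  neighbor 192.168.0.101
    remote-as 65001                ! eBGP towards Leaf-101
    update-source loopback0
    ebgp-multihop 2
    address-family l2vpn evpn
      send-community extended
      route-map NH-UNCHANGED out
```

Without `retain route-target all` a Spine with no VRFs would silently drop the EVPN routes, and without the route-map eBGP would rewrite the next-hop to the Spine itself, breaking VXLAN tunnel termination.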


Figure 1-1: The MAC/IP addressing scheme and eBGP peering model.

Sunday, 5 May 2019

VXLAN Underlay Routing - Part IV: Two-AS eBGP



eBGP as an Underlay Network Routing Protocol: Two-AS eBGP

This post explains the Two-AS eBGP solution in a VXLAN Fabric, where there is a single AS for all Leaf switches and another AS for all Spine switches. It also discusses how the default eBGP peering behavior has to be modified in order to achieve the routing solution required by a VXLAN Fabric. These modifications are mainly related to the BGP loop-prevention model and BGP next-hop path-attribute processing.

Figure 1-1 illustrates the topology used in this chapter. Leaf-101 and Leaf-102 both belong to BGP AS 65000, while Spine-11 belongs to BGP AS 65099. The Loopback interfaces used for Overlay Network BGP peering (L100) and for NVE peering (L50) are advertised over the BGP AFI IPv4 peering (Underlay Network Control Plane). Host MAC/IP address information is advertised over the BGP AFI L2VPN EVPN peering (Overlay Network Control Plane). Ethernet frames between hosts Café and Abba are encapsulated with a VXLAN tunnel header, where the source and destination IP addresses used in the outer IP header are taken from the NVE1 interfaces.
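The VXLAN tunnel header mentioned above is small: 8 bytes, with the 24-bit VNI as its main field. A minimal sketch of building and parsing it (outer MAC/IP/UDP headers omitted; the VNI value is illustrative):

```python
import struct

# Minimal VXLAN header sketch (RFC 7348): 8 bits of flags, 24 reserved bits,
# a 24-bit VNI, and a final reserved byte.

def vxlan_header(vni: int) -> bytes:
    flags = 0x08000000              # I-bit set: the VNI field is valid
    return struct.pack("!II", flags, vni << 8)

def vxlan_vni(header: bytes) -> int:
    _, word = struct.unpack("!II", header)
    return word >> 8

hdr = vxlan_header(10000)           # L2VNI carrying Café<->Abba frames
print(len(hdr), hex(hdr[0]), vxlan_vni(hdr))   # 8 0x8 10000
```

The whole overlay identity of the frame rides in those three VNI bytes; everything else in the outer headers belongs to the Underlay.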





Figure 1-1: High-Level operation of VXLAN Fabric

Thursday, 11 April 2019

VXLAN Underlay Routing - Part III: Internal BGP


BGP as an Underlay Network Routing Protocol


Using BGP instead of OSPF or IS-IS for Underlay Network routing in a BGP VXLAN fabric simplifies the Control-Plane operation because only one routing protocol runs on the fabric switches. However, there are some tradeoffs too. A BGP-only solution requires at least two BGP Address Families (AFIs) per switch: one for the Underlay (IPv4 Unicast) and one for the Overlay (L2VPN EVPN). In addition, if Border Leaf switches are connected to an MPLS network, there is a third BGP AFI for VPNv4. In some cases, multi-AFI BGP makes troubleshooting a bit more complex compared to a single-AFI solution where BGP is used only in the Overlay Network. The focus of this chapter is the VXLAN fabric Underlay Network with iBGP routing.


Figure 1-1: High-Level operation of VXLAN Fabric

Sunday, 24 March 2019

VXLAN Underlay Routing - Part II: OSPF and IS-IS from the VXLAN network perspective



This chapter discusses the differences between OSPF and IS-IS from the Network Virtualization Overlay (NVO) solution perspective, especially from the VXLAN network perspective. First, it briefly introduces some of the differences between the two protocols (terminology, timers, and LSAs). Next, it explains the default Shortest Path First (SPF) behavior by first describing the IS-IS reaction when a Stub Network goes down, and then the same event from the OSPF perspective. The chapter also introduces the OSPF reaction when Incremental SPF (iSPF) is enabled and an interface on a link that does not belong to the Shortest-Path Tree (SPT) goes down. The same event is also discussed for IS-IS, with and without iSPF.

Figure 1-1: Comparison of OSPF and IS-IS.

Sunday, 3 March 2019

VXLAN Underlay Routing - Part I: OSPF and Dijkstra/SPF algorithm


The role of the Underlay Network


The main job of the Underlay Network, from the EVPN VXLAN Network Virtualization Overlay (NVO) perspective, is to offer resilient IP connectivity between the Network Virtualization Edge (NVE) interfaces on VXLAN Tunnel End Point (VTEP) devices. In addition, the Underlay Network can be used for forwarding BUM traffic (Broadcast, Unknown Unicast, and Multicast), although this solution requires Multicast routing to be enabled in the Underlay Network. The common routing protocol choices for a VXLAN Underlay Network are OSPF and IS-IS, which are Link State protocols, and BGP, which in turn is a Path Vector protocol. The focus of this chapter is the Dijkstra/Shortest Path First (SPF) algorithm that Link State protocols use for calculating the Shortest-Path Tree. Figure 1-1 shows the Link type 1 (point-to-point) and Link type 3 (Stub Network) Router LSAs originated by Leaf-101, Leaf-102, Spine-11, and Spine-12. In addition, figure 1-1 illustrates how the routers form a topology based on the received LSAs.
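As a refresher, the SPF computation itself is just Dijkstra's algorithm run over the graph derived from the LSDB. A minimal sketch using the four switches of figure 1-1 (the adjacency map and the equal link cost of 40 are illustrative):

```python
import heapq

def spf(lsdb, root):
    """Dijkstra's shortest-path-first over an adjacency map {node: {nbr: cost}}."""
    dist, visited = {root: 0}, set()
    pq = [(0, root)]
    while pq:
        cost, node = heapq.heappop(pq)
        if node in visited:
            continue
        visited.add(node)
        for nbr, link_cost in lsdb[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                heapq.heappush(pq, (new_cost, nbr))
    return dist

# Point-to-point links reconstructed from the Router LSAs, equal cost 40
lsdb = {
    "Leaf-101": {"Spine-11": 40, "Spine-12": 40},
    "Leaf-102": {"Spine-11": 40, "Spine-12": 40},
    "Spine-11": {"Leaf-101": 40, "Leaf-102": 40},
    "Spine-12": {"Leaf-101": 40, "Leaf-102": 40},
}
print(spf(lsdb, "Leaf-101"))    # Leaf-102 reachable at cost 80 via either spine
```

Because both spines offer cost 80 to the remote leaf, a real implementation would keep both as Equal-Cost Multi-Path next-hops; this sketch records only the cost.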

Figure 1-1: Examples of Link type 1 (p2p) and Link type 3 (Stub) Router LSAs.

Monday, 11 February 2019

Considerations when Connecting an MSTP Region to Another MSTP Region or to a Rapid PVST+ Domain


Multiple Spanning Tree maps sets of VLANs into MST instances (MSTIs), each of which has an instance-specific STP root switch. In addition, there is a region-internal spanning tree, the Internal Spanning Tree (IST), also known as MSTI0, which is used for exchanging MSTP BPDUs for all MSTIs. IST BPDUs (capture 1-1) carry all the STP information inside an MSTP Region.

First, the MSTP BPDU includes information related to the IST, such as the switch Bridge ID, the Root Bridge ID for the Common and Internal Spanning Tree (CIST Root), and the timer values (Max Age, Hello Time, and Forward Delay). The timer values are used in every MSTP instance.

Second, the MSTP BPDU carries an MST extension header that includes the name of the MST Region, its configuration revision number, and a hash value. The hash value is derived from the VLAN-to-MSTI mapping information; the actual 1:1 VLAN-to-MSTI mapping is not carried within the BPDU packets. There is also information about the CIST Regional (Internal) Root switch. The difference between the CIST Root and the CIST Regional Root is that the CIST Root is used as the STP Root for all regions when multiple MSTP regions are connected to each other, while the CIST Regional Root is used as the MST Region's IST root. The MST extension header also carries M-records, which contain the MST instance-specific information, such as the MSTI Regional Root, used to create an instance-specific loop-free Layer 2 path inside a region. The root election process is based on Proposal/Agreement messages, just like in Rapid PVST+/RSTP.
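The hash value mentioned above is an HMAC-MD5 digest computed over the full VLAN-to-MSTI mapping table with a fixed key defined in IEEE 802.1s. The sketch below shows the idea; the exact table layout and key should be verified against the standard before relying on them:

```python
import hmac, hashlib, struct

# Fixed signature key published in IEEE 802.1s (verify against the standard)
MST_KEY = bytes.fromhex("13AC06A62E47FD51F95D2BA243CD0346")

def mst_config_digest(vlan_to_msti):
    """Digest over the VLAN-to-MSTI table: a 2-byte MSTID per VID 0..4095.
    Simplified sketch; consult IEEE 802.1Q for the authoritative layout."""
    table = b"".join(struct.pack("!H", vlan_to_msti.get(vid, 0))
                     for vid in range(4096))
    return hmac.new(MST_KEY, table, hashlib.md5).hexdigest()

# Two switches join the same region only if name, revision, AND digest match
a = mst_config_digest({10: 1, 20: 2})
b = mst_config_digest({10: 1, 20: 2})
c = mst_config_digest({10: 2, 20: 1})
print(a == b, a == c)   # True False
```

This is why exchanging only the 16-byte digest is enough: any difference anywhere in the roughly 8 KB mapping table changes the digest, and the mismatched switch is treated as being in a different region.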

Friday, 28 December 2018

VXLAN Part XV: Analysis of the BGP EVPN Control Plane Operation

Document Status: Unfinished
Edited: Monday, 7 January 2019

This chapter covers the following topics:

MAC address learning process (Intra-VNI switching): This section describes how the local VTEP switch learns the MAC addresses of its directly connected hosts from ingress frames and how the Layer 2 forwarding component (L2FWDER) installs this information into the MAC VRF of the Layer 2 Routing Information Base (L2RIB). It also shows how the local VTEP switch advertises the MAC address information to the remote VTEP switch by using the BGP EVPN Route Type 2 advertisement (MAC Advertisement Route), and how the remote VTEP switch installs the information into the MAC VRF of its L2RIB and from there into its MAC address table. The Intra-L2VNI (switching) Data-Plane operation is explained at the end of the section with various frame capture examples. The white "MAC line" represents these processes in figure 7-1.

MAC-IP address learning process (ARP for Intra-VNI switching): This section gives a detailed description of how the local VTEP switch learns the IP addresses of its locally connected hosts from the ARP messages generated by the hosts, and how the Host Mobility Manager (HMM) component installs this information into the IP VRF. It also shows how the local VTEP switch advertises the IP address information to the remote VTEP switch by using the BGP EVPN Route Type 2 (MAC Advertisement Route) advertisement, and how the remote VTEP switch installs this information into the IP VRF of its L2RIB as well as into the L3RIB of VRF TENANT77. In addition, this section explains how the ARP Suppression mechanism uses the MAC-IP binding information to reduce BUM (Broadcast, Unknown Unicast, and Multicast) traffic in the VXLAN Fabric. The grey "IP line" represents these processes in figure 7-1.


Prefix advertisement: This section covers how the local VTEP switch redistributes its Anycast Gateway (AGW) subnets into BGP and advertises this information to the remote VTEP switch by using the BGP EVPN Route Type 5 (IP Prefix Route) advertisement. It also explains how this information is used to discover silent hosts, and how the remote VTEP installs the route from BGP into its local L3RIB. The black "Prefix line" represents these processes in figure 7-1.
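The ARP Suppression mechanism from the MAC-IP section can be sketched as a local cache lookup (a toy model with invented names; the real feature lives in the switch data plane and consults the L2RIB):

```python
# Toy ARP suppression: if the VTEP already knows the MAC-IP binding learned
# via BGP EVPN Route Type 2, it answers the ARP request locally instead of
# flooding the broadcast into the VXLAN fabric. Addresses are illustrative.

mac_ip_bindings = {                 # learned from EVPN RT-2 advertisements
    "172.16.77.10": "1000.0010.beef",
}

flooded = []                        # ARP requests that had to go out as BUM

def handle_arp_request(target_ip):
    mac = mac_ip_bindings.get(target_ip)
    if mac:
        # suppressed: reply locally, no broadcast enters the fabric
        return f"ARP reply: {target_ip} is-at {mac}"
    flooded.append(target_ip)       # unknown binding: flood as BUM traffic
    return None

print(handle_arp_request("172.16.77.10"))
print(handle_arp_request("172.16.77.99"), flooded)
```

The more complete the EVPN-learned binding table, the fewer ARP broadcasts ever leave the local leaf, which is the whole point of the feature.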

Figure 7-1: BGP EVPN Control Plane Operational Overview.