Saturday, 10 March 2018

VXLAN Part II. The Underlay network – Unicast Routing

Introduction


VXLAN is a MAC-over-IP/UDP tunneling mechanism that allows Layer 2 segments to be stretched over a Layer 3 network (the Underlay, or Transport, network). In this chapter, I will show one possible design for the Underlay network, along with basic configurations and monitoring commands. At the end of this article, you can find a mind map that serves as a memory aid.

Our example network consists of four Cisco Nexus 9000 switches. The edge switches Leaf-101 and Leaf-102 work as VTEP (VXLAN Tunnel Endpoint) devices. A VTEP encapsulates Ethernet frames received from directly connected hosts with a VXLAN header and removes the VXLAN header from packets received from another VTEP switch. Spine-11 and Spine-12 are the core switches. They are not aware of the hosts/VMs behind the VTEP Leaf switches; they only route packets between the VTEP switches.
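
To make the encapsulation step more concrete, here is a minimal Python sketch of the 8-byte VXLAN header as laid out in RFC 7348 (flags, reserved bits, and a 24-bit VNI). It is only an illustration: the function names and the VNI value 10000 are my own assumptions and do not appear in the example network, and the outer Ethernet/IP/UDP headers that a real VTEP also adds are left out.

```python
import struct
from typing import Tuple

VXLAN_UDP_PORT = 4789        # IANA-assigned destination UDP port (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header to the original Ethernet frame."""
    return build_vxlan_header(vni) + inner_frame


def decapsulate(payload: bytes) -> Tuple[int, bytes]:
    """Strip the VXLAN header and return (vni, inner_frame)."""
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8, payload[8:]


if __name__ == "__main__":
    frame = b"\x00" * 14 + b"payload"   # hypothetical inner Ethernet frame
    vni, inner = decapsulate(encapsulate(frame, 10000))
    print(vni, inner == frame)
```

The ingress VTEP performs the `encapsulate` step and the egress VTEP the `decapsulate` step; the Spine switches in between only see the outer IP/UDP headers.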


Figure-1: Example topology



Routing protocols:


Routing protocols can be divided into three main groups: 1) Hop-count (RIP), 2) Link-State (IS-IS, OSPF), and 3) Vector-based protocols (EIGRP: distance-vector, BGP: path-vector). Link-State protocols calculate the best loop-free path through the network by using the SPF algorithm. Their metrics can reflect link speed (OSPF, for example, derives its default interface cost from bandwidth), so link speed is taken into account when the best path is calculated. Link-State protocols also support load sharing over equal-cost paths (ECMP). With a Link-State protocol, every router in a routing area has an identical view of the network topology, while EIGRP and RIP rely on what their neighbor routers tell them (routing by rumor). BGP is often used in Underlay networks, but unlike Link-State protocols, its route selection is based on path attributes such as AS-path length, and it does not consider link speed when selecting the best path. For these reasons, I have chosen OSPF for Underlay routing (and I know it better than IS-IS).
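
As a small worked example of how OSPF takes link speed into account, the sketch below computes the interface cost as reference bandwidth divided by interface bandwidth. The 40 Gbps reference bandwidth is an assumption on my part (I believe it is the NX-OS default, but it is not stated in this article), and the function name is hypothetical.

```python
def ospf_cost(interface_mbps: int, reference_mbps: int = 40_000) -> int:
    """OSPF interface cost: reference bandwidth / interface bandwidth.

    The result is truncated to an integer and can never be lower than 1.
    reference_mbps=40_000 is an assumed default, not taken from the article.
    """
    if interface_mbps <= 0:
        raise ValueError("interface bandwidth must be positive")
    return max(1, reference_mbps // interface_mbps)


if __name__ == "__main__":
    for mbps in (1_000, 10_000, 40_000, 100_000):
        print(f"{mbps:>7} Mbps -> cost {ospf_cost(mbps)}")
```

Under that assumption a 1 Gbps link costs 40 and a 10 Gbps link costs 4, so SPF prefers the faster path. Links at or above the reference bandwidth all collapse to cost 1, which is why the reference bandwidth is usually raised in networks with faster links.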

IP addressing 


Inter-switch link:
All links between switches are Point-to-Point (P2P) links. It is common practice to use a /30 or /31 network mask on P2P links. Instead of using a dedicated subnet for each inter-switch link, I am going to use an unnumbered IP addressing scheme where the link addresses are borrowed from the Loopback 0 interface.
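
To put the addressing choice into numbers, the following sketch counts how many addresses the four inter-switch links would consume with /30 or /31 subnets. The 10.0.0.0/28 block is purely hypothetical (the example network does not use it); with the unnumbered scheme no such block is needed at all.

```python
import ipaddress

# The four inter-switch links of the example topology.
LINKS = [
    ("Leaf-101", "Spine-11"),
    ("Leaf-101", "Spine-12"),
    ("Leaf-102", "Spine-11"),
    ("Leaf-102", "Spine-12"),
]

# Hypothetical block reserved for numbered links; not used in the article.
LINK_BLOCK = ipaddress.ip_network("10.0.0.0/28")


def plan_numbered_links(prefix_len):
    """Assign one subnet of the given prefix length to each link."""
    subnets = LINK_BLOCK.subnets(new_prefix=prefix_len)
    return {link: subnet for link, subnet in zip(LINKS, subnets)}


def addresses_consumed(prefix_len):
    """Total number of addresses tied up by the link subnets."""
    plan = plan_numbered_links(prefix_len)
    return sum(subnet.num_addresses for subnet in plan.values())


if __name__ == "__main__":
    for prefix_len in (30, 31):
        total = addresses_consumed(prefix_len)
        print(f"/{prefix_len}: {total} addresses for {len(LINKS)} links")
    print("unnumbered: 0 extra addresses (links borrow Loopback 0)")
```

Four links are trivial either way, but the number of links grows with leaves multiplied by spines, so the saving in addresses and in per-link configuration grows with the fabric.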

Loopback 0:
As already mentioned, the inter-switch links borrow the Loopback 0 IP address. Loopback 0 is also used for Underlay routing and as the OSPF RID.

Loopback 100:
Loopback 100 is used as the VTEP address. We could use the Loopback 0 address as both the RID and the VTEP address, but with a dedicated VTEP IP address we can remove a Leaf switch from the VXLAN domain by shutting down Loopback 100. In this way, the switch leaves the VXLAN domain but stays in the Underlay network, and we can investigate possible problems in the Underlay network without disturbing server traffic.

Configuration examples


Note that the “ip host” commands near the top of each configuration are optional, as is the last line, “name-lookup”, under the OSPF configuration. With these optional commands, “show ip ospf neighbors” displays the switch names instead of the RID IP addresses.

Configuration example 1: Leaf-101.
hostname Leaf-101
feature ospf
!
ip host Leaf-101 192.168.0.101
ip host Leaf-102 192.168.0.102
ip host Spine-11 192.168.0.11
ip host Spine-12 192.168.0.12
!
interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  no shutdown
interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  no shutdown
interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.101/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
!
interface loopback100
  description ** VTEP/Overlay **
  ip address 192.168.100.101/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
!
router ospf UNDERLAY-NET
  router-id 192.168.0.101
  name-lookup

Configuration example 2: Spine-11.
hostname Spine-11
feature ospf
ip host Leaf-101 192.168.0.101
ip host Spine-12 192.168.0.12
ip host Spine-11 192.168.0.11
ip host Leaf-102 192.168.0.102
!
interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  no shutdown
interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  no shutdown
interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.11/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
!
router ospf UNDERLAY-NET
  router-id 192.168.0.11
  name-lookup

Monitoring

Show command example 1: Leaf-101 – show ip ospf neighbors.
Leaf-101# sh ip ospf neighbors
 OSPF Process ID UNDERLAY-NET VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 Spine-11          1 FULL/ -          00:04:34 192.168.0.11    Eth1/1
 Spine-12          1 FULL/ -          00:03:24 192.168.0.12    Eth1/2

Show command example 2: Spine-11 – show ip ospf neighbors.
Spine-11# sh ip ospf neighbors
 OSPF Process ID UNDERLAY-NET VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 Leaf-101          1 FULL/ -          00:05:18 192.168.0.101   Eth1/1
 Leaf-102          1 FULL/ -          00:04:32 192.168.0.102   Eth1/2
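
If you want to check the adjacencies from a script rather than by eye, the output above is regular enough to parse. The sketch below is one possible way to do it in Python; the regular expression and function names are my own, and the sample text is simply the Leaf-101 output from example 1 pasted in as a string (how you collect the output from the switch is left open).

```python
import re

SAMPLE_OUTPUT = """\
 OSPF Process ID UNDERLAY-NET VRF default
 Total number of neighbors: 2
 Neighbor ID     Pri State            Up Time  Address         Interface
 Spine-11          1 FULL/ -          00:04:34 192.168.0.11    Eth1/1
 Spine-12          1 FULL/ -          00:03:24 192.168.0.12    Eth1/2
"""

# Neighbor ID, priority, state (e.g. "FULL/ -"), up time, address, interface.
NEIGHBOR_LINE = re.compile(
    r"^\s*(\S+)\s+(\d+)\s+(\S+)/\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)


def parse_neighbors(output):
    """Return one dict per neighbor row; header lines are skipped."""
    neighbors = []
    for line in output.splitlines():
        match = NEIGHBOR_LINE.match(line)
        if match:
            name, _pri, state, _role, uptime, address, interface = match.groups()
            neighbors.append({
                "neighbor": name,
                "state": state,
                "uptime": uptime,
                "address": address,
                "interface": interface,
            })
    return neighbors


def neighbors_not_full(output):
    """Names of neighbors whose adjacency state is anything other than FULL."""
    return [n["neighbor"] for n in parse_neighbors(output) if n["state"] != "FULL"]


if __name__ == "__main__":
    for entry in parse_neighbors(SAMPLE_OUTPUT):
        print(entry["neighbor"], entry["state"], entry["interface"])
    print("not FULL:", neighbors_not_full(SAMPLE_OUTPUT))
```

On P2P links there is no DR/BDR election, which is why the role after the slash is shown as “-”.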

There are two equal-cost paths between the Leaf switches, one through each Spine switch, and OSPF will use both of them. Note! ECMP load sharing is based on the 5-tuple (src/dst IP, transport protocol, and src/dst ports of the transport protocol). In VXLAN-encapsulated traffic between two VTEPs, the only value in the outer headers that changes is the source UDP port number, which is calculated from the inner frame. This way the traffic flows from hosts/VMs can be differentiated and sent over different physical links.
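
The idea can be sketched in a few lines of Python. The hash function (CRC32), the way the inner headers are represented, and the VTEP address 192.168.100.102 for Leaf-102 are all assumptions made for illustration; real switches use their own hardware hash. The source port range 49152-65535 is the one recommended in RFC 7348.

```python
import zlib

PORT_LOW, PORT_HIGH = 49152, 65535   # dynamic/private range (RFC 7348)
VXLAN_UDP_PORT = 4789
UPLINKS = ["Eth1/1", "Eth1/2"]       # Leaf uplinks towards Spine-11 / Spine-12


def vxlan_source_port(inner_headers: bytes) -> int:
    """Derive the outer UDP source port from a hash of the inner frame headers."""
    span = PORT_HIGH - PORT_LOW + 1
    return PORT_LOW + zlib.crc32(inner_headers) % span


def pick_uplink(src_ip, dst_ip, proto, src_port, dst_port, uplinks=UPLINKS):
    """Choose one equal-cost uplink from the outer 5-tuple (illustrative hash)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return uplinks[zlib.crc32(key) % len(uplinks)]


if __name__ == "__main__":
    # Hypothetical inner flows between hosts behind the two Leaf switches.
    flows = [
        b"host-A>host-B tcp 1025>80",
        b"host-A>host-B tcp 1026>80",
        b"host-C>host-D udp 5000>53",
    ]
    for inner in flows:
        sport = vxlan_source_port(inner)
        # Outer addresses and destination port are identical for every flow;
        # only the source port differs.
        link = pick_uplink("192.168.100.101", "192.168.100.102",
                           "udp", sport, VXLAN_UDP_PORT)
        print(inner.decode(), "-> sport", sport, "via", link)
```

Packets of one inner flow always produce the same source port and therefore stay on the same link, which avoids reordering, while different flows have a chance of landing on different links.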

Show command example 3: Leaf-102 – show ip route ospf.
Leaf-102# sh ip route ospf
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.0.11/32, ubest/mbest: 1/0
    *via 192.168.0.11, Eth1/1, [110/41], 00:05:39, ospf-UNDERLAY-NET, intra
192.168.0.12/32, ubest/mbest: 1/0
    *via 192.168.0.12, Eth1/2, [110/41], 00:05:16, ospf-UNDERLAY-NET, intra
192.168.0.101/32, ubest/mbest: 2/0
    *via 192.168.0.11, Eth1/1, [110/81], 00:05:16, ospf-UNDERLAY-NET, intra
    *via 192.168.0.12, Eth1/2, [110/81], 00:05:16, ospf-UNDERLAY-NET, intra
192.168.100.101/32, ubest/mbest: 2/0
    *via 192.168.0.11, Eth1/1, [110/81], 00:05:16, ospf-UNDERLAY-NET, intra
    *via 192.168.0.12, Eth1/2, [110/81], 00:05:16, ospf-UNDERLAY-NET, intra
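
The metrics in this output can be reproduced with a small SPF calculation. The Python sketch below runs Dijkstra's algorithm from Leaf-102 over the four-switch topology and keeps every equal-cost first hop. The link cost of 40 and the loopback cost of 1 are inferred from the metrics shown above (41 = 40 + 1), not taken from any configuration in this article, and the code is a simplified model rather than how NX-OS implements SPF.

```python
import heapq

LINK_COST = 40      # inferred from the routing table: 41 = 40 + 1
LOOPBACK_COST = 1   # cost of the advertised loopback prefix

LINKS = [
    ("Leaf-101", "Spine-11"),
    ("Leaf-101", "Spine-12"),
    ("Leaf-102", "Spine-11"),
    ("Leaf-102", "Spine-12"),
]


def spf(source):
    """Return {switch: (metric to its loopback, sorted first hops)}."""
    graph = {}
    for a, b in LINKS:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    dist = {source: 0}
    first_hops = {source: set()}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale heap entry
        for neighbor in graph[node]:
            new_cost = cost + LINK_COST
            # Directly connected neighbors are their own first hop;
            # everything further away inherits the first hops of its parent.
            hops = {neighbor} if node == source else first_hops[node]
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                first_hops[neighbor] = set(hops)
                heapq.heappush(heap, (new_cost, neighbor))
            elif new_cost == dist[neighbor]:
                first_hops[neighbor] |= hops  # equal-cost path: keep both
    return {
        node: (dist[node] + LOOPBACK_COST, sorted(first_hops[node]))
        for node in dist if node != source
    }


if __name__ == "__main__":
    for node, (metric, hops) in sorted(spf("Leaf-102").items()):
        print(f"{node}: metric {metric} via {', '.join(hops)}")
```

From Leaf-102 this gives metric 41 to each Spine loopback through a single next hop, and metric 81 to the Leaf-101 loopbacks through both Spine-11 and Spine-12, which matches the “ubest/mbest: 2/0” entries in the routing table.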

VXLAN Unicast Routing Mind Map

Figure-2: Mind Map
Edited: February 9.3.2018 | Toni Pasanen CCIE#28158
Next part: VXLAN Part III. The Underlay network – Multidestination Traffic: Anycast-RP with PIM
References:
RFC 7348: Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks.
Building Data Centers with VXLAN BGP EVPN – A Cisco NX-OS Perspective. ISBN-10: 1-58714-467-0.

13 comments:

  1. Amazing.. loving this articles!!

  2. Excellent document. Thank you again. I read each and every line. Interesting notes

    Replies
    1. The newest article on this site is about "VXLAN Underlay Routing - Part I: OSPF and Dijkstra/SPF algorithm". You might want to check that out too. I am currently writing a document that describes the differences between the OSPF and the IS-IS protocols from the VTEP switches perspective.

    2. Toni Pasanen
      Congratulations for your excellent articles, the high level design are so helpful and explanation is so clear!
      I am planning to follow every lesson including the labs,
      This is a "work of a life" Thanks a lot for sharing!!!

    3. I really appreciate your comment, big thanks!

  3. Fantastic article. I really appreciate your detailed explanations. I hope you will allow one question on one point which is quite interesting, but not clear to me. You state that different loopbacks for the underlay and the overlay allows to shut nve interface and isolate the vtep from the vxlan network without disturbing host traffic, but wouldn't that isolate all hosts attached to that vtep from the rest of the vxlan network?

    Replies
    1. Hi, I agree shutting down the NVE will restrict connected hosts from the fabric. But what it does not disturb is the underlay network. I have explained Loopback interface numbering scheme and recover process in greater detail in this post: https://nwktimes.blogspot.com/2018/08/vxlan-part-x-vpc-and-gir-bgp-evpn.html

  4. Crystal clear explanation!
