Friday 23 February 2018

VXLAN Part I. Why do we need VXLAN?

Introduction


This article examines the challenges that server virtualization poses for Datacenter networks built on the traditional three-tier architecture, and how VXLAN responds to them. At the end of the article, you can find a mind map that works as a memory aid.

Challenges for existing Datacenter networks
Figure 1-1 shows a hypothetical three-tier Cloud Service Provider DC network consisting of the following components:

  • Access layer (L2): Twenty 48-port switches. Each access switch connects to the distribution layer over 2 x 10 Gbps MEC (Multichassis EtherChannel) links.
  • Distribution layer (L2/L3): Two distribution switches that together form one virtualized switch. The default gateways for the server segments are on the distribution switches. Distribution-to-core links are L3.
  • Core layer (L3): Two core switches.



Figure 1-1: The hypothetical Cloud SP Datacenter network.


Assume that 48 physical servers are connected to each access switch. Each server hosts five different tenants (dedicated virtualized environments), each with its own virtual routing and forwarding instance (VRF). One tenant consists of three broadcast domains: Presentation, Application and Database, each with two virtual machines backing each other up. Customers manage their own tenants and can define the VLAN IDs, the MAC addresses of their virtual machines and their IP addressing scheme. Virtual machines can move freely between any physical hosts. Based on this information, the network can theoretically contain:

  • Physical servers: 960 => 20 (ToRs) x 48 (ports per ToR)
  • VMs / MAC addresses / ARP entries: 28,800 => 960 (hosts) x 30 (VMs per host: 5 tenants x 3 segments x 2 VMs)
  • Broadcast domains: 14,400 => 5 (tenants per host) x 3 (VLANs per tenant) x 960 (hosts)
  • Tenants / VRFs: 4,800 => 960 (hosts) x 5 (tenants per host)
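The arithmetic behind these figures can be reproduced with a few lines of Python. This is only a sketch: the constant names are illustrative, and every value comes from the hypothetical example above rather than from a real network.

```python
# Hypothetical figures from the example DC network; nothing here is measured.
TOR_SWITCHES = 20         # access (ToR) switches
PORTS_PER_TOR = 48        # one physical server per access port
TENANTS_PER_HOST = 5      # tenants (VRFs) per physical server
SEGMENTS_PER_TENANT = 3   # Presentation, Application, Database
VMS_PER_SEGMENT = 2       # two VMs backing each other up


def example_scale() -> dict[str, int]:
    """Return the theoretical maximums of the example DC network."""
    hosts = TOR_SWITCHES * PORTS_PER_TOR
    vms_per_host = TENANTS_PER_HOST * SEGMENTS_PER_TENANT * VMS_PER_SEGMENT
    return {
        "physical_servers": hosts,
        "vms_macs_arp_entries": hosts * vms_per_host,
        "broadcast_domains": hosts * TENANTS_PER_HOST * SEGMENTS_PER_TENANT,
        "tenants_vrfs": hosts * TENANTS_PER_HOST,
    }


if __name__ == "__main__":
    for name, value in example_scale().items():
        print(f"{name:>22}: {value:,}")
```

Changing any of the constants shows how quickly the MAC, ARP and broadcast domain counts grow with the number of VMs per host.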



Although our example is purely theoretical, it illustrates the challenges that today's Datacenter service providers face:

VLAN ID limitation: the 12-bit VLAN ID field allows only 4096 values, of which 4094 are usable (IDs 0 and 4095 are reserved). In small and medium-sized data centers this is more than enough, but in massive Public Cloud Service Provider data centers it may not be: our example network alone has 14,400 broadcast domains.
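A small Python sketch of this limit. It assumes, as in a traditional switched network, that VLAN IDs are globally significant, so every broadcast domain needs its own ID; the reserved IDs 0 and 4095 follow IEEE 802.1Q, and 14,400 is the broadcast domain count of the example network.

```python
VLAN_ID_BITS = 12                             # 802.1Q VLAN ID field width
RESERVED_VLAN_IDS = {0, 2**VLAN_ID_BITS - 1}  # IDs 0 and 4095 are reserved


def usable_vlan_ids() -> int:
    """Number of VLAN IDs available for ordinary broadcast domains."""
    return 2**VLAN_ID_BITS - len(RESERVED_VLAN_IDS)


def fits_in_vlan_space(broadcast_domains: int) -> bool:
    """True if every broadcast domain can get its own globally unique VLAN ID."""
    return broadcast_domains <= usable_vlan_ids()


if __name__ == "__main__":
    print(usable_vlan_ids(), fits_in_vlan_space(14_400))
```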

Multi-tenancy: in multi-tenant environments where customers can define both the VLAN IDs and the MAC addresses of their virtual machines, these values may overlap between tenants.

MAC table size: 28,800 virtual machines are connected to our example network, which means that switches might have to hold up to 28,800 MAC addresses in their MAC address tables. Our example demonstrates that server virtualization can make the number of MAC entries on switches considerably large. If there are more MAC addresses than the switch MAC table can store, the switch may not be able to learn new MAC addresses until unused entries age out. This could lead to unnecessary flooding of frames destined to unknown MAC addresses.
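The effect can be sketched with a toy MAC table in Python. This is a simplified model for illustration, not how switch hardware works: the capacity, the 300-second aging time and the string return values are assumptions of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class MacTable:
    """Toy MAC address table with a hard capacity and age-based expiry.

    Simplified model: real switches use hardware tables with hashing,
    per-VLAN keys and their own aging settings.
    """

    capacity: int
    aging_time: float = 300.0  # seconds; assumed value for this sketch
    entries: dict[str, tuple[int, float]] = field(default_factory=dict)

    def _age_out(self, now: float) -> None:
        """Drop entries that have not been refreshed within aging_time."""
        expired = [mac for mac, (_, last_seen) in self.entries.items()
                   if now - last_seen >= self.aging_time]
        for mac in expired:
            del self.entries[mac]

    def learn(self, src_mac: str, port: int, now: float) -> bool:
        """Learn or refresh src_mac on port; return False if the table is full."""
        self._age_out(now)
        if src_mac not in self.entries and len(self.entries) >= self.capacity:
            return False  # no room until an existing entry ages out
        self.entries[src_mac] = (port, now)
        return True

    def forward(self, dst_mac: str, now: float) -> str:
        """Return the egress port, or 'flood' for an unknown unicast destination."""
        self._age_out(now)
        entry = self.entries.get(dst_mac)
        return f"port {entry[0]}" if entry else "flood"
```

With capacity=2, a third MAC address arriving before the first two age out is not learned, so every frame sent to it is flooded until space frees up.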

Note! Cisco Nexus 9500/9300 Series Switches have tested support for 90,000 MAC addresses.

ARP table size: in our example network, the gateway function is on the distribution layer switches. Server virtualization also increases the number of IP-MAC entries stored in their ARP tables: our distribution switches may have to store up to 28,800 IP-MAC entries.

Note! Cisco Nexus 9500 Series Switches have tested support for 60,000 IPv4 ARPs and 30,000 IPv6 NDs. The corresponding figures for Nexus 9300 series switches are 45,000 (IPv4 ARP) and 20,000 (IPv6 ND).
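To put the example against the tested figures quoted in the notes, here is a small Python sketch that computes the share of each tested limit consumed by one entry per VM. The IPv6 ND row is relevant only under an extra assumption that is not part of the original example: that every VM also has one IPv6 address.

```python
# Tested scale figures quoted in the notes above (Cisco Nexus 9000 Series).
TESTED_LIMITS: dict[str, dict[str, int]] = {
    "Nexus 9500": {"mac": 90_000, "ipv4_arp": 60_000, "ipv6_nd": 30_000},
    "Nexus 9300": {"mac": 90_000, "ipv4_arp": 45_000, "ipv6_nd": 20_000},
}


def utilization(required: int) -> dict[tuple[str, str], float]:
    """Share of each tested table limit consumed by `required` entries."""
    return {
        (platform, table): required / limit
        for platform, tables in TESTED_LIMITS.items()
        for table, limit in tables.items()
    }


if __name__ == "__main__":
    # 28,800 = one entry per VM in the example, all held by the gateway switches.
    for (platform, table), share in utilization(28_800).items():
        flag = "  <-- exceeds tested limit" if share > 1 else ""
        print(f"{platform} {table:>8}: {share:6.0%}{flag}")
```

With a centralized gateway, the example uses 32% of the tested MAC limit and 48% (Nexus 9500) or 64% (Nexus 9300) of the IPv4 ARP limit. Under the dual-stack assumption, the 28,800 ND entries would exceed the tested Nexus 9300 figure (144%).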

Spanning Tree (STP): in traditional Layer 2 networks, the control plane protocol is STP, which provides a loop-free L2 topology for the hosts. The data plane follows the "Flood and Learn" principle: switches learn MAC addresses from the source addresses of received Ethernet frames and flood BUM traffic (Broadcast, Unknown unicast and Multicast). Without an STP-like loop prevention mechanism, the network could be choked by excessive broadcast traffic. Because STP blocks redundant links to break loops, some links may carry no traffic at all. Some load balancing can still be achieved through proper STP design, where the root switches of different VLANs or MST instances are placed on different switches, or by using Multichassis EtherChannel. In principle, the network could also be built using a routed-access model (Layer 3 links between the access and distribution switches), but this would prevent virtual machines from moving between physical hosts connected to different access switches while keeping their IP addresses.
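To illustrate why a loop-free topology leaves links idle, here is a deliberately simplified Python sketch. It is not the IEEE 802.1D algorithm: a breadth-first search from the chosen root stands in for the root path cost election (all links are treated as equal cost), and ties are broken by switch name as a loose stand-in for the lowest bridge ID. The switch names are hypothetical, and the triangle assumes no MEC.

```python
from collections import deque


def spanning_tree(
    links: list[tuple[str, str]], root: str
) -> tuple[set[frozenset[str]], set[frozenset[str]]]:
    """Split undirected links into (forwarding, blocked) for a tree rooted at root."""
    neighbors: dict[str, list[str]] = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    visited = {root}
    forwarding: set[frozenset[str]] = set()
    queue = deque([root])
    while queue:
        switch = queue.popleft()
        # Sorted order gives a deterministic tie-break (stand-in for bridge ID).
        for peer in sorted(neighbors.get(switch, [])):
            if peer not in visited:
                visited.add(peer)
                forwarding.add(frozenset((switch, peer)))
                queue.append(peer)

    # Every link outside the tree would close a loop, so it is blocked.
    blocked = {frozenset(link) for link in links} - forwarding
    return forwarding, blocked


if __name__ == "__main__":
    # Classic triangle: one access switch dual-homed to two distribution switches.
    triangle = [("dist1", "dist2"), ("access1", "dist1"), ("access1", "dist2")]
    forwarding, blocked = spanning_tree(triangle, root="dist1")
    print("blocked:", [sorted(link) for link in blocked])
```

In the triangle, the access1-dist2 uplink ends up blocked and carries no traffic, which is exactly the unused capacity that MEC or per-VLAN root placement tries to recover.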

How VXLAN responds to the challenges

VXLAN is a MAC-over-IP/UDP tunneling mechanism that allows Layer 2 network segments to be "stretched" over a Layer 3 network. Each stretched Layer 2 network is represented as a VXLAN segment identified by a 24-bit segment ID, the VXLAN Network Identifier (VNI). A 24-bit VNI can identify about 16 million (2^24 = 16,777,216) VXLAN segments. Virtual machines belonging to different VXLAN segments may have overlapping MAC addresses or VLANs, since only hosts inside the same VXLAN segment have Layer 2 connectivity with each other.
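To make the 24-bit VNI concrete, here is a minimal Python sketch that builds and parses the 8-byte VXLAN header laid out in RFC 7348: 8 bits of flags with the I flag set, 24 reserved bits, the 24-bit VNI and 8 more reserved bits. It covers only the VXLAN header itself; the outer Ethernet/IP/UDP headers (UDP destination port 4789) and the inner frame are left out, and the VNI 10010 used in the tests is an arbitrary example.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP destination port (RFC 7348)
VXLAN_FLAG_I = 0x08    # "I" flag: the VNI field is valid
VNI_BITS = 24


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header: flags | reserved | VNI | reserved."""
    if not 0 <= vni < 2**VNI_BITS:
        raise ValueError(f"VNI must fit in {VNI_BITS} bits, got {vni}")
    # First 32-bit word: flags in the top byte, the 24 reserved bits zero.
    # Second 32-bit word: VNI in the top 24 bits, the last byte reserved.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    first, second = struct.unpack("!II", header[:8])
    if not (first >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set; the VNI is not valid")
    return second >> 8
```

Compared with the 12-bit VLAN ID, the extra 12 bits multiply the identifier space by 4096.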

Because VXLAN segments are tunneled over the Layer 3 network, the underlay network needs no Spanning Tree Protocol. In a VXLAN-based DC, VLANs no longer have global significance: they are locally significant to a switch, or even to a switch port. For example, host A in subnet 192.168.10.0/24 on Leaf switch 101 may belong to VLAN 200, while host B in the same subnet on Leaf switch 102 belongs to VLAN 201.
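A minimal sketch of this local significance, assuming hypothetical Leaf names and VNI values (10010 and 20010): each Leaf keeps its own VLAN-to-VNI table, and two hosts share a Layer 2 segment when their local VLANs map to the same VNI, whatever the VLAN numbers themselves are.

```python
# Hypothetical per-Leaf tables: the key is (leaf, locally significant VLAN ID).
VLAN_TO_VNI: dict[tuple[str, int], int] = {
    ("leaf101", 200): 10010,  # host A, subnet 192.168.10.0/24
    ("leaf102", 201): 10010,  # host B, same subnet, different local VLAN
    ("leaf102", 200): 20010,  # another tenant reusing VLAN 200 on leaf102
}


def same_segment(host_a: tuple[str, int], host_b: tuple[str, int]) -> bool:
    """True if both (leaf, VLAN) attachments map to the same VXLAN segment."""
    return VLAN_TO_VNI[host_a] == VLAN_TO_VNI[host_b]


if __name__ == "__main__":
    print(same_segment(("leaf101", 200), ("leaf102", 201)))
    print(same_segment(("leaf101", 200), ("leaf102", 200)))
```

The same VLAN number (200) can therefore mean different segments on different Leaf switches, and different VLAN numbers can mean the same segment.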

In a VXLAN-based DC network, the links between the Leaf (access) switches and the Spine switches (distribution + core) are Layer 3 connections, so only the Leaf switches are aware of the virtual machines' MAC addresses.
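The sketch below models what a Spine switch works with, under simplifying assumptions: the packet is a plain Python record rather than bytes, the VTEP (VXLAN Tunnel End Point) addresses and route names are hypothetical, and the Spine's forwarding decision is a longest-prefix match on the outer destination IP only. The inner MAC addresses ride along as payload and never enter the lookup.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class VxlanPacket:
    outer_src_ip: str   # source VTEP (ingress Leaf) address
    outer_dst_ip: str   # destination VTEP (egress Leaf) address
    vni: int            # VXLAN segment the inner frame belongs to
    inner_src_mac: str  # VM MAC, opaque payload for the Spine
    inner_dst_mac: str  # VM MAC, opaque payload for the Spine


def spine_next_hop(packet: VxlanPacket, routes: dict[str, str]) -> str:
    """Pick a next hop by longest-prefix match on the outer destination IP."""
    dst = ipaddress.ip_address(packet.outer_dst_ip)
    candidates = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes.items()
        if dst in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        raise LookupError(f"no route to {dst}")
    return max(candidates, key=lambda c: c[0].prefixlen)[1]
```

Two packets from different tenants with completely different inner MAC addresses get the same forwarding decision as long as they are headed to the same Leaf.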

VXLAN enables the use of a distributed anycast gateway, where routing for the client networks is distributed across the Leaf switches. This means that the gateway address of network 192.168.10.0/24 (192.168.10.1) is found on every Leaf switch. When a virtual machine moves to a new host connected to a different Leaf switch, its gateway is still directly connected. The distributed anycast gateway also reduces the number of IP-MAC entries in individual switches' ARP tables: instead of one pair of distribution switches resolving ARP for every virtual machine in the DC, each Leaf switch handles ARP for the hosts attached to it.
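A minimal model of the anycast gateway idea, assuming hypothetical Leaf names and a hypothetical shared virtual gateway MAC (0000.2222.3333): every Leaf holds the same gateway IP and MAC for 192.168.10.0/24, and a VM's ARP request for its gateway is answered by whichever Leaf it is attached to.

```python
# Hypothetical values: the Leaf names and the shared virtual MAC are examples.
ANYCAST_GW_IP = "192.168.10.1"
ANYCAST_GW_MAC = "0000.2222.3333"
LEAVES = ("leaf101", "leaf102", "leaf103")

# Every Leaf carries an identical gateway definition for the subnet.
GATEWAY_CONFIG: dict[str, dict[str, str]] = {
    leaf: {"subnet": "192.168.10.0/24", "ip": ANYCAST_GW_IP, "mac": ANYCAST_GW_MAC}
    for leaf in LEAVES
}


def resolve_gateway(attached_leaf: str) -> tuple[str, str, str]:
    """Answer a VM's ARP request for its gateway.

    Returns (answering Leaf, gateway IP, gateway MAC). The Leaf the VM is
    attached to answers locally, so the IP/MAC pair is the same everywhere.
    """
    gw = GATEWAY_CONFIG[attached_leaf]
    return attached_leaf, gw["ip"], gw["mac"]


if __name__ == "__main__":
    before = resolve_gateway("leaf101")  # VM runs on a host behind leaf101
    after = resolve_gateway("leaf103")   # same VM after moving behind leaf103
    print(before, after, before[1:] == after[1:])
```

Because the IP/MAC pair is identical on every Leaf, the VM's cached ARP entry for its gateway stays valid after a move, and the new local Leaf simply takes over the routing.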


Why VXLAN - Mindmap





Figure 1-2: The Mind Map.


Edited 9 March 2018 | Toni Pasanen CCIE#28158

Next part: VXLAN Part II. The Underlay network – Unicast Routing

Sources:
RFC 7348: Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks.

36 comments:

  1. Very useful stuff, you make VXLAN easy

    1. Thanks for the comment. It is nice to hear that you found it useful!

    2. When will the next post, VXLAN Part VIII: BGP EVPN - External Connection, be published? I am eagerly waiting for it!

    3. I am trying to get it ready during the next week, hopefully before Friday.

    4. Will you cover "Asymmetric and Symmetric IRB routing"? This subject is important.

    5. There will be a couple of words about it in my next post about external connections. I might write a short post about Asymmetric and Symmetric IRB later. As you said, it is an important subject.

  2. This comment has been removed by the author.

  3. Toni, very nice postings. I want to ask if you have configured Tenant Routed Multicast for this. It would help me understand how to set it up. Thank you.

    1. Hi VPZ, Tenant Routed Multicast (TRM) will be the subject of a future VXLAN post, but before that I am going to write a couple of other posts.

    2. Thank you Toni.. you do great work.. you are a great engineer.. keep going higher

  4. Best introduction I ever read!
    One thing should be noted: even though the 24-bit VXLAN ID offers 16,000,000+ IDs, a single switch can in practice still use only 4000+ of them, because configuration is still VLAN-based, that is to say, each VXLAN must be mapped to a VLAN, and a switch cannot accept a pure VXLAN-encapsulated frame (at least at the time of this comment I do not know of any switch model that can accept a VXLAN-encapsulated frame on an 'access' interface and process it accordingly in the data plane).

    1. UPD: except when we use per-port per-VLAN mapping

    2. Thanks for the comment DN3D. I am currently writing an article about the differences between OSPF and IS-IS from the VXLAN fabric perspective, but I will write an article that includes a VLAN-to-VNI mapping section in the near future :)

  5. After a long search I found this article of yours. My doubt about "Why VXLAN and what problem it solves" got clarified. Thank you so much

    1. I am really glad that you got some answers to your questions. Comments like this encourage me to continue writing.

  6. Thank you Toni for taking the time to write these blogs, and explaining the topics so clear. It's so relevant in networking today!

  7. I am truly inspired by this online journal! Extremely clear clarification of issues is given and it is open to every living soul. I have perused your post, truly you have given this extraordinary informative data about it.

    1. I have got so much from the network community so now I try to give something back. I appreciate your comment, thanks!

  8. This blog resolved all my queries I had in my mind. Really helpful and supportive subject matter written in all the points. Hard to find such kind of blogs as descriptive and accountable to your doubts.

  9. This blog aware me about different programs which can become very useful for our friends and kids. Few websites provide combined courses and few of the are separately for single subject. Glad to get this information.

  10. Somewhere the content of the blog surrounded by little arguments. Yes it is healthy for readers. They can include this kind of language in their writing skill as well as while group discussion in college.

  11. I am grateful to you on the grounds that your article is exceptionally useful for me to continue with my exploration in same region. Your cited illustrations are all that much significant to my exploration field.This is extraordinary! It really exhibits to me where to broaden my online diary

    1. One of the nicest comments that I have ever received, thanks.

  12. I gained new knowledge from well written content of this blog. It is showing some different kind of strategy to keep work better and improve with every new assignment. Gracefully written blog

  13. Nice post Toni. It cleared some of my doubts.

  14. very good explanation!

  15. Good article. I will be facing many of these issues as well..

  16. Hi Toni,
    Could you kindly elaborate on the following..

    Each of these servers includes five different tenants (dedicated virtualized environments) with their own virtual routing (VRF). One tenant consists of three broadcast domains; Presentation, Application and Database, each with two virtual machines backing up each other. The customer manages his own tenant and can define the VLANs IDs, the mac addresses of virtual machines and the IP address architecture. The mobility of virtual machines is unlimited. Based on this information, theoretically, there can be:

    1. Hi Toni, well explained for a cloud provider DC. But we see VXLAN becoming a popular choice for campus networks too, where there is no multi-tenancy, the VLAN scale is not large, and there are fewer apps but more users. What could be the driving factors for VXLAN there?

    2. Using BGP EVPN/VXLAN, we get a Spanning Tree-free routed underlay network. In addition, there is no need for core virtualization (VSS or similar), which makes OS upgrades and horizontal scaling easier. The tradeoff is that the implementation and Layer 3 network virtualization technologies might be considered more complex than traditional Layer 2 switched networks. My opinion is that a routed underlay with a BGP EVPN/LISP overlay and VXLAN encapsulation is more predictable and simpler than Spanning Tree with its various versions (PVST, Rapid PVST, MSTP...). If subnets can be restricted to one access switch without stretching them anywhere, a routed network without any L2 virtualization is the simplest solution (in my mind :)

