Monday 26 March 2018

VXLAN Part V: Flood and Learn

In this chapter, I am going to show how the VXLAN Flood & Learn MAC learning process works. I am going to ping from Host-1 to Host-2 and then walk through the Flood and Learn process, starting from the ARP request. I am using the same lab that was used in VXLAN Part IV. Configurations can be found in VXLAN Part I and Part IV.

Figure 1: VXLAN Flood & Learn topology


Step 1: Host-1 starts pinging Host-2. Based on the IP address of Host-2, Host-1 knows that they are in the same Layer 2 broadcast domain. To be able to send a ping request to a target in the same Layer 2 domain, Host-1 has to resolve the MAC address of Host-2 first, so it sends an ARP request where the destination MAC is the broadcast address (ff:ff:ff:ff:ff:ff) and the source is Host-1's own MAC address (fa:16:3e:2d:98:c5). When VTEP-101 receives the frame, it learns the Host-1 MAC address from the source MAC field. This is the traditional MAC learning process.
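The ARP request from Step 1 can be sketched as raw bytes in Python. This is only an illustration: the helper names and the two IP addresses (10.0.0.1 and 10.0.0.2) are placeholders of mine, because the host IP addresses are not listed in this post. Only the MAC addresses come from the text.

```python
import socket
import struct

BROADCAST_MAC = bytes.fromhex("ffffffffffff")

def mac_to_bytes(mac: str) -> bytes:
    """Convert 'fa:16:3e:2d:98:c5' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def build_arp_request(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame that carries an ARP request (hypothetical helper)."""
    # Ethernet header: broadcast destination, sender as source, EtherType 0x0806 (ARP)
    eth = BROADCAST_MAC + mac_to_bytes(src_mac) + struct.pack("!H", 0x0806)
    # ARP body: Ethernet/IPv4, opcode 1 (request), target MAC unknown (all zeros)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, 1,
        mac_to_bytes(src_mac), socket.inet_aton(src_ip),
        b"\x00" * 6, socket.inet_aton(target_ip),
    )
    return eth + arp

# Host-1 MAC is from the lab; both IP addresses are assumed placeholders.
frame = build_arp_request("fa:16:3e:2d:98:c5", "10.0.0.1", "10.0.0.2")
print(frame[:6].hex(), frame[6:12].hex(), len(frame))
```

The first six bytes are the broadcast destination and the next six are the source MAC that VTEP-101 learns.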

Step 2: Since the destination MAC address of the received Ethernet frame is a broadcast address, VTEP-101 first checks the VLAN/VNI (VXLAN Network Identifier = VXLAN segment) association and then sends the broadcast frame towards the RP of the Multicast group attached to the VNI. In our example, VLAN 10 belongs to VNI 10000, which uses Multicast group 238.0.0.10 for BUM traffic.

Through the traditional MAC learning process and the VLAN to VNI mapping, VTEP-101 creates a Layer 2 entry [mac fa:16:3e:2d:98:c5 <=> VNI 10000 <=> E1/3].

The related VTEP-101 configuration is shown below.
vlan 10
  vn-segment 10000
!
interface nve1
  source-interface loopback100
  member vni 10000
    mcast-group 238.0.0.10
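
As a rough mental model, the local learning step can be sketched in Python. This is not how NX-OS implements it; the dictionary names and the learn_local function are my own illustrative choices, and only the values (VLAN 10, VNI 10000, group 238.0.0.10, port E1/3 and the Host-1 MAC) come from the lab.

```python
# VLAN to VNI mapping, as in "vlan 10 / vn-segment 10000"
VLAN_TO_VNI = {10: 10000}
# VNI to BUM multicast group, as in "member vni 10000 / mcast-group 238.0.0.10"
VNI_TO_MCAST_GROUP = {10000: "238.0.0.10"}

# Layer 2 table: (VNI, MAC) -> local port
l2_table = {}

def learn_local(vlan: int, src_mac: str, port: str) -> int:
    """Learn a source MAC seen on a local port and return the VNI of the VLAN."""
    vni = VLAN_TO_VNI[vlan]
    l2_table[(vni, src_mac.lower())] = port
    return vni

# ARP request from Host-1 arrives on E1/3 in VLAN 10
vni = learn_local(10, "fa:16:3e:2d:98:c5", "E1/3")
print(vni, VNI_TO_MCAST_GROUP[vni], l2_table)
```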

Step 3: From Capture 1, we can see how VTEP-101 handles the broadcast frame. First, it adds a VXLAN header with VNI 10000 in front of the original Ethernet frame. Then it adds a UDP header (VXLAN is MAC over IP/UDP) where the destination port is the well-known port 4789. Then it adds an outer IP header (tunnel header) with the destination IP 238.0.0.10 and the source IP 192.168.100.101 (Loopback 100 used by NVE 1), and finally a new Ethernet header. Based on the OIL (Outgoing Interface List) of Multicast group 238.0.0.10, VTEP-101 sends the encapsulated ARP request out of Ethernet 1/1.
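
The VXLAN header itself is only eight bytes, so it is easy to sketch in Python. The layout below follows RFC 7348 (8 flag bits with the I flag set, 24 reserved bits, a 24-bit VNI and 8 reserved bits). The function names are illustrative, and the outer UDP, IP and Ethernet headers are left out for brevity.

```python
import struct

VXLAN_UDP_PORT = 4789   # well-known destination port
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag

def vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits); word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend the VXLAN header to the original Ethernet frame."""
    return vxlan_header(vni) + inner_frame

header = vxlan_header(10000)
print(header.hex())  # 0800000000271000
```

The value 0x002710 in bytes five to seven is VNI 10000 in hexadecimal.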

Leaf-101 MRIB
(*, 238.0.0.10/32), bidir, uptime: 00:25:52, nve ip pim
  Incoming interface: Ethernet1/1, RPF nbr: 192.168.0.11
  Outgoing interface list: (count: 2)
    Ethernet1/1, uptime: 00:07:51, pim, (RPF)
    nve1, uptime: 00:25:52, nve

Figure 2: ARP from Host-1 to Host-2


Capture 1: ARP request from Host-1

When Spine-11 receives the Multicast packet from VTEP-101, it routes the packet towards VTEP-102 based on the OIL (Outgoing Interface List) of group 238.0.0.10.

Spine-11 MRIB
(*, 238.0.0.10/32), bidir, uptime: 00:54:51, pim ip
  Incoming interface: loopback238, RPF nbr: 192.168.238.1
  Outgoing interface list: (count: 3)
    Ethernet1/2, uptime: 00:08:10, pim
    Ethernet1/1, uptime: 00:08:10, pim
    loopback238, uptime: 00:08:11, pim, (RPF)

VTEP-102 receives the packet (Figure 3) and processes it because it has joined Multicast group 238.0.0.10 (it also has group 238.0.0.10 assigned to VNI 10000). It first removes the outer IP header and, based on the UDP destination port number 4789, it knows that the next header is a VXLAN header. Then, based on the VNI field in the VXLAN header, it notices that the packet belongs to VLAN 10 (VLAN 10 is assigned to VNI 10000). As the last two steps, it removes the VXLAN header and sends the original Ethernet frame out of all ports that belong to VLAN 10. It also learns that the source MAC fa:16:3e:2d:98:c5 (Host-1) belongs to VNI 10000 and is located behind the IP 192.168.100.101 (NVE 1 IP of VTEP-101).

Now VTEP-102 has an L2 entry [mac fa:16:3e:2d:98:c5 <=> VNI 10000 <=> 192.168.100.101].
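
The decapsulation and remote learning on VTEP-102 can be sketched as follows. This is a simplified model rather than the switch's actual data path: the function names are my own, and the input is assumed to be the UDP payload (VXLAN header plus inner frame) together with the source IP taken from the outer IP header.

```python
import struct

def decapsulate(udp_payload: bytes):
    """Split a VXLAN payload into (VNI, inner Ethernet frame)."""
    flags_word, vni_word = struct.unpack("!II", udp_payload[:8])
    if not (flags_word >> 24) & 0x08:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8, udp_payload[8:]

def learn_remote(table: dict, outer_src_ip: str, udp_payload: bytes):
    """Learn the inner source MAC behind the sending VTEP's IP address."""
    vni, inner = decapsulate(udp_payload)
    # Bytes 6..11 of the inner frame hold the source MAC
    src_mac = ":".join(f"{b:02x}" for b in inner[6:12])
    table[(vni, src_mac)] = outer_src_ip
    return vni, inner

# Inner frame: broadcast destination, Host-1 source, EtherType ARP (ARP body omitted)
inner_frame = bytes.fromhex("ffffffffffff" "fa163e2d98c5" "0806")
payload = bytes.fromhex("0800000000271000") + inner_frame

vtep102_table = {}
vni, frame = learn_remote(vtep102_table, "192.168.100.101", payload)
print(vni, vtep102_table)
```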

Figure 3: VTEP-102 operation

Host-2 receives the ARP request and sends an ARP reply as a Layer 2 unicast to the MAC address fa:16:3e:2d:98:c5 learned from the source field of the ARP request. When VTEP-102 receives this frame, it sends it as a unicast to VTEP-101 based on the Layer 2 entry [mac fa:16:3e:2d:98:c5 <=> VNI 10000 <=> 192.168.100.101]. It also learns the MAC address of Host-2 and updates the Layer 2 forwarding table. Now it has two entries:

[mac fa:16:3e:2d:98:c5 <=> VNI 10000 <=> 192.168.100.101]
[mac fa:16:3e:0c:3d:b2 <=> VNI 10000 <=> E1/3]
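
The forwarding decision that VTEP-102 makes with these two entries can be sketched like this. It is a simplified model under my own assumptions: the table stores a ("remote", IP) or ("local", port) tuple per entry, only broadcast and unknown destinations are treated as BUM traffic, and the function name is hypothetical.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
BUM_GROUP = {10000: "238.0.0.10"}  # VNI -> multicast group for BUM traffic

def forwarding_decision(table: dict, vni: int, dst_mac: str):
    """Decide how a frame received from a local host is forwarded."""
    entry = table.get((vni, dst_mac))
    if dst_mac == BROADCAST or entry is None:
        # Flood: encapsulate towards the multicast group of the VNI
        return ("vxlan-multicast", BUM_GROUP[vni])
    kind, where = entry
    if kind == "remote":
        # Known MAC behind another VTEP: encapsulate as unicast
        return ("vxlan-unicast", where)
    return ("local-port", where)

# VTEP-102 table after it has seen the ARP request and the ARP reply
vtep102 = {
    (10000, "fa:16:3e:2d:98:c5"): ("remote", "192.168.100.101"),
    (10000, "fa:16:3e:0c:3d:b2"): ("local", "E1/3"),
}

arp_reply = forwarding_decision(vtep102, 10000, "fa:16:3e:2d:98:c5")
broadcast = forwarding_decision(vtep102, 10000, BROADCAST)
print(arp_reply, broadcast)
```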

From Capture 2, we can see that the ARP reply is encapsulated with a VXLAN header (VNI 10000) and that the destination IP address is now the VTEP-101 NVE 1 address 192.168.100.101 instead of the Multicast address 238.0.0.10. When Spine-11 receives the packet, it just routes it to VTEP-101 based on the destination IP address in the outer IP header. VTEP-101 receives the packet and notices that the destination address is its own IP address, so it removes the outer IP header first. Then, based on the UDP destination port number 4789, it knows that the next header is a VXLAN header. Based on the VNI field in the VXLAN header, it also knows that the inner frame belongs to its local VLAN 10 (VLAN to VNI mapping). It removes the VXLAN header and switches the frame out of E1/3 based on the MAC table entry [mac fa:16:3e:2d:98:c5 <=> VNI 10000 <=> E1/3]. VTEP-101 also checks the source MAC address of the original frame and creates a new entry in its MAC table. Now it has two entries:

[mac fa:16:3e:2d:98:c5 <=> VNI 10000 <=> E1/3]
[mac fa:16:3e:0c:3d:b2 <=> VNI 10000 <=> 192.168.100.102]
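
To tie the steps together, here is a small Python simulation of the whole exchange. It is a toy model under my own assumptions: the Vtep class and its method names are invented for illustration, multicast delivery is reduced to "hand the frame to every peer", and only the source MAC and the outer source IP are passed over the fabric.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
HOST1 = "fa:16:3e:2d:98:c5"
HOST2 = "fa:16:3e:0c:3d:b2"

class Vtep:
    def __init__(self, name: str, nve_ip: str):
        self.name = name
        self.nve_ip = nve_ip
        self.table = {}   # (VNI, MAC) -> local port or remote NVE IP
        self.peers = []   # VTEPs that joined the same multicast group

    def from_host(self, vni, port, src_mac, dst_mac):
        """Frame received from a local host: learn, then flood or unicast."""
        self.table[(vni, src_mac)] = port
        known = self.table.get((vni, dst_mac))
        if dst_mac == BROADCAST or known is None:
            targets = self.peers                      # flood via multicast group
        else:
            targets = [p for p in self.peers if p.nve_ip == known]
        for peer in targets:
            peer.from_fabric(vni, self.nve_ip, src_mac)

    def from_fabric(self, vni, outer_src_ip, inner_src_mac):
        """VXLAN packet received: learn the inner source MAC behind the sender."""
        self.table[(vni, inner_src_mac)] = outer_src_ip

vtep101 = Vtep("VTEP-101", "192.168.100.101")
vtep102 = Vtep("VTEP-102", "192.168.100.102")
vtep101.peers, vtep102.peers = [vtep102], [vtep101]

vtep101.from_host(10000, "E1/3", HOST1, BROADCAST)  # ARP request (flooded)
vtep102.from_host(10000, "E1/3", HOST2, HOST1)      # ARP reply (unicast)
print(vtep101.table)
print(vtep102.table)
```

After the two calls, both tables match the entries listed for VTEP-101 and VTEP-102 in the text.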

Figure 4

Capture 2: ARP reply from Host 2

From the last two captured frames, we can see that the ICMP echo request and reply are sent successfully over the VXLAN fabric. Captures are taken from the link between VTEP-101 and Spine-11.

Capture 4: ICMP echo request from Host 1 (connected to VTEP-101).

Capture 5: ICMP echo reply from Host 2 (connected to VTEP-102).

And now we are ready.

Edited: February 21.3.2018 | Toni Pasanen CCIE#28158
