Sunday 18 March 2018

VXLAN Part III: The Underlay Network – Multidestination Traffic: Anycast-RP with PIM

The role of the Underlay Network, related to BUM traffic in the VXLAN fabric, is to transport ARP, ND, DHCP and other Layer 2 BUM (Broadcast, Unknown Unicast, and Multicast) traffic between hosts connected to different VTEPs. (Layer 3 multicast traffic between hosts requires a separate overlay multicast routing design.) This chapter shows how Anycast-RP with PIM can be used in a VXLAN fabric. Figure 1 shows the example topology used in this chapter. There are two Spine switches, which share the same Anycast-RP IP address and belong to the same “Anycast-RP set” (Loopback 238). In addition, each Spine has another loopback interface whose address must be unique (Loopback 511 and 512); these addresses are used as the Anycast-RP set member IDs. Both addresses, shared and unique, need to be reachable by all switches. The complete configuration can be found in Appendix 1 at the end of the document.

Note! I am using Cisco VIRL with nxos.7.0.3.I7.1



Figure 1: Example topology with Anycast-RP IP addresses.


I am going to build the Underlay Network multicast routing using Anycast-RP with PIM. During the implementation, I am also going to explain the relevant theory.

Figure 2: Anycast-RP IP addresses.

Configuring the Anycast-RP cluster

Step-1: Enable PIM on both Spine switches.

feature pim

Step-2: Configure a loopback interface for the Anycast-RP address shared between the cluster members. This configuration is identical on both Spine switches. Since this interface is used as the RP address, it has to be reachable by all switches. We enable both PIM-SM and OSPF on the new loopback interface.

!
interface loopback238
  description ** Anycast-RP address **
  ip address 192.168.238.238/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

Step-3: Configure a unique IP address for each Anycast-RP cluster member and enable PIM-SM and OSPF on it. This address is used as the cluster member ID. Also, define the other Anycast-RP cluster members. The example configuration below is taken from Spine-11.

Figure 3: Unique IP addressing for Anycast-RP cluster member

interface loopback511
  description ** Unique Address for Anycast-RP **
  ip address 192.168.238.11/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
!
ip pim anycast-rp 192.168.238.238 192.168.238.11
ip pim anycast-rp 192.168.238.238 192.168.238.12

Step-4: Configure the IP address of the RP on all switches and, optionally, define the multicast groups this RP serves. Also, make sure that PIM-SM and OSPF are enabled on each loopback interface and on each inter-switch link shown in figure 4 (an example interface configuration is shown below).

ip pim rp-address 192.168.238.238 group-list 224.0.0.0/4
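
For reference, the same OSPF and PIM-SM enablement is applied on every loopback interface and inter-switch link; the snippet below is the Ethernet1/1 configuration of Leaf-101, taken verbatim from Appendix 1:

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown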


Figure 4: PIM enabled interfaces

We can verify that both Spines belong to the Anycast-RP cluster.

Spine-11# sh ip pim rp vrf default
PIM RP Status Information for VRF "default"
BSR disabled
Auto-RP disabled
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None

Anycast-RP 192.168.238.238 members:
  192.168.238.11*  192.168.238.12 

RP: 192.168.238.238*, (0),
 uptime: 02:28:30   priority: 255,
 RP-source: (local), 
 group ranges:
 224.0.0.0/4  

Spine-12# sh ip pim rp vrf default
PIM RP Status Information for VRF "default"
BSR disabled
Auto-RP disabled
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None

Anycast-RP 192.168.238.238 members:
  192.168.238.11  192.168.238.12* 

RP: 192.168.238.238*, (0),
 uptime: 02:09:53   priority: 255,
 RP-source: (local), 
 group ranges:
 224.0.0.0/4  

Now we should have a complete Anycast-RP configuration, and our VXLAN fabric should be able to provide the L2VNI service to clients. But how does this setup actually work? Before continuing, we will configure the NVE 1 (Network Virtualization Edge) interface on the VTEP switches. In a VXLAN fabric, a VTEP switch uses this interface address as the source and destination of all VXLAN-encapsulated packets; in other words, the NVE interface is the termination point of the VXLAN tunnels. So we can say that the VTEP (VXLAN Tunnel End Point) is the physical edge/border device between the Ethernet and VXLAN domains, and the NVE interface is the logical border inside the VTEP.


Figure 5: NVE, VNI, VLAN configuration

Configuring the NVE interface

Step 1: Configure the NVE interface and use Loopback 100 as its source. Attach VNI 10000 to the NVE interface and define the multicast group to which the Layer 2 BUM traffic (ARP, DHCP, ND and so on) will be sent. The VNI identifies the VXLAN segment the traffic belongs to, and it is carried inside the VXLAN header. The last step is to map the Layer 2 VLAN to the VNI.

interface nve1
  no shutdown
  source-interface loopback100
  member vni 10000
    mcast-group 238.0.0.10
!
vlan 10
  vn-segment 10000
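
Note that the NVE configuration above requires these features, which are enabled in the leaf configurations of Appendix 1:

feature nv overlay
feature vn-segment-vlan-based
feature interface-vlan

Once the NVE interface is up, the VNI state and its multicast group mapping can be verified on the VTEP; the VNI should be listed as Up with multicast group 238.0.0.10 (output omitted here):

Leaf-101# show nve interface nve 1
Leaf-101# show nve vni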

Now we are ready to see how this really works. I am going to shut down the NVE interfaces on both VTEPs and turn on packet captures on all inter-switch links. Then I will bring the NVE interfaces back up, first on VTEP-101 and then on VTEP-102. This way we can see the multicast Join > Register > Register-Stop process from the VTEPs towards the Anycast-RP.
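
Shutting down the NVE interface is done under the interface itself, for example on VTEP-101 (Leaf-101):

Leaf-101(config)# interface nve1
Leaf-101(config-if)# shutdown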

At this moment the NVE interfaces are down. Here is the MRIB from both Spine switches.

Spine-11# sh ip mroute
IP Multicast Routing Table for VRF "default"

(*, 232.0.0.0/8), uptime: 00:00:01, pim ip
  Incoming interface: Null, RPF nbr: 0.0.0.0
  Outgoing interface list: (count: 0)

Spine-12# sh ip mroute
IP Multicast Routing Table for VRF "default"

(*, 232.0.0.0/8), uptime: 00:00:01, pim ip
  Incoming interface: Null, RPF nbr: 0.0.0.0
  Outgoing interface list: (count: 0)

Now I bring up the NVE 1 interface on VTEP-101. When NVE 1 comes up, the VTEP starts the join process for group 238.0.0.10. The PIM Join message (for group 238.0.0.10) is sent only towards Spine-11, based on a hash across the two equal-cost paths (Figure 6 and Capture 1). The source address is VTEP-101's Underlay Network IP address 192.168.0.101 and the destination address is 224.0.0.13 (All PIM Routers). This way VTEP-101 joins the RPT (Rendezvous Point Tree, i.e. the shared tree).
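
How the VTEP chooses between the two equal-cost paths is governed by its multicast multipath configuration; as the comment discussion at the end of this post points out, the default behavior on the N9K is an (S,G) hash, which can be checked with:

Leaf-101# show ip multicast vrf default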



Figure 6: PIM Join from VTEP-101.


Capture 1. PIM Join from VTEP-101 to RP.

Now the VTEP has joined the RPT of group 238.0.0.10. Next it sends a PIM Register message, and this time VTEP-101 chooses the other link, towards Spine-12. Why is that? In figure 7 and capture 2 we can see that the PIM Register packet is sent as unicast towards the Anycast-RP address 192.168.238.238 (not to 224.0.0.13), using the NVE 1 IP address as the source. This way the hash input is different, and the packet may resolve to a different link (as it does in our example).
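
Because the Register is forwarded as an ordinary unicast packet, its path follows the unicast ECMP decision; the equal-cost routes towards the RP address can be listed on the VTEP (output omitted; it should show one next hop via each Spine):

Leaf-101# show ip route 192.168.238.238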

Figure 7: PIM Register message from VTEP-101 to Spine-12

Capture 2. PIM register message from VTEP-101.

Next, Spine-12 instructs VTEP-101 to stop encapsulating the multicast traffic for the group by sending a Register-Stop message. As can be seen from figure 8 and capture 3, it uses the Anycast-RP address as the source and the VTEP-101 NVE address as the destination. This is the reason why all the previously configured loopback addresses have to be reachable by all switches.

Figure 8: PIM Register –Stop from Spine-12 to VTEP-101

Capture 3: PIM Register-Stop from Spine-12 to VTEP-101

From the VTEP's point of view, we are done. But how does Spine-11 know that VTEP-101 is registered to group 238.0.0.10, since the Register was sent only to Spine-12? The answer can be found in RFC 4610 (section 3). When Spine-12 receives the PIM Register message from VTEP-101, it forwards it to its Anycast-RP cluster member Spine-11. This can be seen in figure 9 and captures 4 and 5. It uses its unique Anycast-RP address 192.168.238.12 as the source, and the destination is Spine-11's unique Anycast-RP address 192.168.238.11. The receiving end (Spine-11) can verify from the message's source address that it was received from a valid Anycast-RP cluster peer (we have configured the peers statically). This also explains why the unique addresses have to be known by every switch.
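
This relay behavior is enabled by the ip pim anycast-rp statements configured in Step-3; each Spine learns its peers in the Anycast-RP set from them:

ip pim anycast-rp 192.168.238.238 192.168.238.11
ip pim anycast-rp 192.168.238.238 192.168.238.12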

Figure 9: VTEP-101 PIM register message relayed by Spine-12 to Spine-11.

Capture 4: VTEP-101 PIM register message relayed by Spine-12 to Spine-11 (Part 1).


Capture 5: VTEP-101 PIM register message relayed by Spine-12 to Spine-11 (Part 2).

And now the PIM registration process is complete. We can see that VTEP-101 is known as a source for group 238.0.0.10.

Spine-11# sh ip mroute
IP Multicast Routing Table for VRF "default"

(*, 232.0.0.0/8), uptime: 00:03:57, pim ip
  Incoming interface: Null, RPF nbr: 0.0.0.0
  Outgoing interface list: (count: 0)


(*, 238.0.0.10/32), uptime: 00:00:06, pim ip
  Incoming interface: loopback238, RPF nbr: 192.168.238.238
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 00:00:06, pim


(192.168.100.101/32, 238.0.0.10/32), uptime: 00:03:07, pim ip
  Incoming interface: Ethernet1/1, RPF nbr: 192.168.0.101, internal
  Outgoing interface list: (count: 0)

Spine-12# sh ip mroute
IP Multicast Routing Table for VRF "default"

(*, 232.0.0.0/8), uptime: 00:04:24, pim ip
  Incoming interface: Null, RPF nbr: 0.0.0.0
  Outgoing interface list: (count: 0)


(192.168.100.101/32, 238.0.0.10/32), uptime: 00:03:46, pim ip
  Incoming interface: Ethernet1/1, RPF nbr: 192.168.0.101, internal
  Outgoing interface list: (count: 0)

When interface NVE 1 is enabled on VTEP-102, the same Join > Register > Register-Stop process takes place, and both VTEPs have joined group 238.0.0.10. At this point both VTEPs are also known as sources for the group 238.0.0.10.

Spine-11# sh ip mroute
IP Multicast Routing Table for VRF "default"

(*, 232.0.0.0/8), uptime: 00:11:36, pim ip
  Incoming interface: Null, RPF nbr: 0.0.0.0
  Outgoing interface list: (count: 0)


(*, 238.0.0.10/32), uptime: 00:07:45, pim ip
  Incoming interface: loopback238, RPF nbr: 192.168.238.238
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 00:07:45, pim


(192.168.100.101/32, 238.0.0.10/32), uptime: 00:10:46, pim ip
  Incoming interface: Ethernet1/1, RPF nbr: 192.168.0.101, internal
  Outgoing interface list: (count: 0)


(192.168.100.102/32, 238.0.0.10/32), uptime: 00:00:45, pim mrib ip
  Incoming interface: Ethernet1/2, RPF nbr: 192.168.0.102, internal
  Outgoing interface list: (count: 1)
    Ethernet1/1, uptime: 00:00:45, pim

Spine-12# sh ip mroute
IP Multicast Routing Table for VRF "default"

(*, 232.0.0.0/8), uptime: 00:11:59, pim ip
  Incoming interface: Null, RPF nbr: 0.0.0.0
  Outgoing interface list: (count: 0)


(*, 238.0.0.10/32), uptime: 00:02:17, pim ip
  Incoming interface: loopback238, RPF nbr: 192.168.238.238
  Outgoing interface list: (count: 1)
    Ethernet1/2, uptime: 00:02:17, pim


(192.168.100.101/32, 238.0.0.10/32), uptime: 00:11:20, pim ip
  Incoming interface: Ethernet1/1, RPF nbr: 192.168.0.101, internal
  Outgoing interface list: (count: 1)
    Ethernet1/2, uptime: 00:02:17, pim


(192.168.100.102/32, 238.0.0.10/32), uptime: 00:01:19, pim mrib ip
  Incoming interface: Ethernet1/2, RPF nbr: 192.168.0.102, internal
  Outgoing interface list: (count: 0)


Even though not shown in the previous pictures, there is a Host-1 with IP 192.168.11.11/24 connected to interface eth1/3 on VTEP-101, and a Host-2 with IP 192.168.11.12/24 connected to interface eth1/3 on VTEP-102. To verify that our L2VNI is up and also capable of transporting L2 BUM traffic (ARP in this case), we ping from Host-1 to Host-2 (a minimal sketch of the host configuration is shown below). As can be seen, we lose a few ping packets at the beginning because of the ARP process, but then the traffic starts to flow.
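
The host configuration itself is not included in Appendix 1. A minimal sketch of the Host-1 side, assuming the hosts are plain IOS routers in VIRL (the interface name is an assumption), would be:

! Hypothetical IOS host configuration for Host-1
interface GigabitEthernet0/1
 ip address 192.168.11.11 255.255.255.0
 no shutdown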

Host-1#ping 192.168.11.12
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.11.12, timeout is 2 seconds:
...!!
Success rate is 40 percent (2/5), round-trip min/avg/max = 18/22/26 ms
Host-1#

And just for verification, here are a couple of captures where we can see that the ARP messages are encapsulated with a VXLAN header by VTEP-101 and sent to group 238.0.0.10 with the interface NVE 1 address as the source.


Capture 6: ARP request sent by Host 192.168.11.11 encapsulated by VTEP-101.

Here we can see the response via VTEP-102. The ARP reply is sent as a unicast message to the VTEP-101 NVE interface address. Do not get confused by the fact that the frame numbers are the same (Frame 8) in both captures: the ARP Request was sent over a different link than the one on which the ARP Reply was received (ECMP).

Capture 7: ARP reply from Host 192.168.11.12 to Host 192.168.11.11 sent by VTEP-102.

I will describe the Flood & Learn process in my upcoming posts. But before that, I will write articles about PIM BiDir and Ingress Replication.

Edited: 17.3.2018 | Toni Pasanen CCIE#28158
Next part: VXLAN Part IV. The Underlay network – Multicast Routing (PIM BiDir)

References:
RFC 4610: Anycast-RP Using Protocol Independent Multicast (PIM)

Building Data Center with VXLAN BGP EVPN – A Cisco NX-OS Perspective
ISBN-10: 1-58714-467-0

Appendix 1. Configurations

Leaf-101
Leaf-101# sh run

!Command: show running-config
!Time: Fri Mar 16 12:01:47 2018

version 7.0(3)I7(1)
hostname Leaf-101
vdc Leaf-101 id 1
  limit-resource vlan minimum 16 maximum 4094
  limit-resource vrf minimum 2 maximum 4096
  limit-resource port-channel minimum 0 maximum 511
  limit-resource u4route-mem minimum 128 maximum 128
  limit-resource u6route-mem minimum 96 maximum 96
  limit-resource m4route-mem minimum 58 maximum 58
  limit-resource m6route-mem minimum 8 maximum 8

nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay

no password strength-check
username admin password 5 $5$4dqvZbsf$hSqYx5Vb6kNO/UFBzuK2CfAVzDYW7iJMisF3GboHwn4  role network-admin
ip domain-lookup
ip host Leaf-101 192.168.0.101
ip host Leaf-102 192.168.0.102
ip host Spine-11 192.168.0.11
ip host Spine-12 192.168.0.12
snmp-server user admin network-admin auth md5 0x223cfb63ca87c5b4856c960235329cff priv 0x223cfb63ca87c5b4856c960235329cff localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO

ip pim rp-address 192.168.238.238 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
vlan 1,10
vlan 10
  vn-segment 10000

vrf context management

interface nve1
  no shutdown
  source-interface loopback100
  member vni 10000
    mcast-group 238.0.0.10

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  switchport access vlan 10
!
interface mgmt0
  vrf member management

interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.101/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback100
  description ** VTEP/Overlay **
  ip address 192.168.100.101/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
line console
line vty
router ospf UNDERLAY-NET
  router-id 192.168.0.101
  name-lookup


Leaf-101#  

Leaf-102
Leaf-102# sh run

!Command: show running-config
!Time: Fri Mar 16 12:02:33 2018

version 7.0(3)I7(1)
hostname Leaf-102
vdc Leaf-102 id 1
  limit-resource vlan minimum 16 maximum 4094
  limit-resource vrf minimum 2 maximum 4096
  limit-resource port-channel minimum 0 maximum 511
  limit-resource u4route-mem minimum 128 maximum 128
  limit-resource u6route-mem minimum 96 maximum 96
  limit-resource m4route-mem minimum 58 maximum 58
  limit-resource m6route-mem minimum 8 maximum 8

nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay

username admin password 5 $5$r25DfmPc$EvUgSVebL3gCPQ8e1ngSTxeKYIk4yuuPIomJKa5Lp/3  role network-admin
ip domain-lookup
ip host Leaf-101 192.168.0.101
ip host Leaf-102 192.168.0.102
ip host Spine-11 192.168.0.11
ip host Spine-12 192.168.0.12
snmp-server user admin network-admin auth md5 0x713961e592dd5c2401317a7e674464ac priv 0x713961e592dd5c2401317a7e674464ac localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO

ip pim rp-address 192.168.238.238 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
vlan 1,10
vlan 10
  vn-segment 10000

vrf context management

interface nve1
  no shutdown
  source-interface loopback100
  member vni 10000
    mcast-group 238.0.0.10

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  switchport access vlan 10
!
interface mgmt0
  vrf member management

interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.102/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback100
  description ** VTEP/Overlay **
  ip address 192.168.100.102/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
line console
line vty
router ospf UNDERLAY-NET
  router-id 192.168.0.102
  name-lookup


Leaf-102#

Spine-11
Spine-11# sh run

!Command: show running-config
!Time: Fri Mar 16 12:03:18 2018

version 7.0(3)I7(1)
hostname Spine-11
vdc Spine-11 id 1
  limit-resource vlan minimum 16 maximum 4094
  limit-resource vrf minimum 2 maximum 4096
  limit-resource port-channel minimum 0 maximum 511
  limit-resource u4route-mem minimum 128 maximum 128
  limit-resource u6route-mem minimum 96 maximum 96
  limit-resource m4route-mem minimum 58 maximum 58
  limit-resource m6route-mem minimum 8 maximum 8

feature ospf
feature pim

no password strength-check
username admin password 5 $5$60DVUPIV$uZWPu6ufHQOJSG18SK5b9/5kpZnV5E4/EFapzQP5CI/  role network-admin
ip domain-lookup
ip host Leaf-101 192.168.0.101
ip host Spine-12 192.168.0.12
ip host Leaf-102 192.168.0.102
snmp-server user admin network-admin auth md5 0xd177fd3448eab21dd2feb16d54938469 priv 0xd177fd3448eab21dd2feb16d54938469 localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO

ip pim rp-address 192.168.238.238 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
ip pim anycast-rp 192.168.238.238 192.168.238.11
ip pim anycast-rp 192.168.238.238 192.168.238.12
vlan 1

vrf context management

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown
!
interface mgmt0
  vrf member management

interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.11/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback238
  description ** Anycast-RP address **
  ip address 192.168.238.238/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback511
  description ** Unique Address for Anycast-RP **
  ip address 192.168.238.11/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
line console
line vty
router ospf UNDERLAY-NET
  router-id 192.168.0.11
  name-lookup


Spine-11#

Spine-12
Spine-12# sh run

!Command: show running-config
!Time: Fri Mar 16 12:04:04 2018

version 7.0(3)I7(1)
hostname Spine-12
vdc Spine-12 id 1
  limit-resource vlan minimum 16 maximum 4094
  limit-resource vrf minimum 2 maximum 4096
  limit-resource port-channel minimum 0 maximum 511
  limit-resource u4route-mem minimum 128 maximum 128
  limit-resource u6route-mem minimum 96 maximum 96
  limit-resource m4route-mem minimum 58 maximum 58
  limit-resource m6route-mem minimum 8 maximum 8

feature ospf
feature pim

no password strength-check
username admin password 5 $5$CnfXhejK$UE7azuRSVXBSEVTPYeW4fI1.UTH3x69GU22CBnVhOA8  role network-admin
ip domain-lookup
ip host Leaf-101 192.168.0.101
ip host Spine-12 192.168.0.12
ip host Spine-11 192.168.0.11
ip host Leaf-102 192.168.0.102
snmp-server user admin network-admin auth md5 0x40c5b687ff82eb6f487bbafc8a2cf722 priv 0x40c5b687ff82eb6f487bbafc8a2cf722 localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO

ip pim rp-address 192.168.238.238 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
ip pim anycast-rp 192.168.238.238 192.168.238.11
ip pim anycast-rp 192.168.238.238 192.168.238.12
vlan 1

vrf context management

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown
!
interface mgmt0
  vrf member management

interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.12/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback238
  description ** Anycast-RP address **
  ip address 192.168.238.238/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback512
  description ** Unique Address for Anycast-RP **
  ip address 192.168.238.12/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
line console
line vty
router ospf UNDERLAY-NET
  router-id 192.168.0.12
  name-lookup


Spine-12#


23 comments:

  2. Hi Toni,
    I'm working on this configuration and everything is okay, but I noticed that we need to enable the features below, which are not mentioned in the post:
    feature interface-vlan
    feature vn-segment-vlan-based
    feature nv overlay

    Also, I can't see the configuration for the end hosts that you've used for testing. I tried to do it myself but still can't ping the interface VLAN from the VPC; my configuration on the Nexus is as below:

    interface Vlan10
    no shutdown
    ip address 192.168.10.10/24

    interface Ethernet1/3
    switchport access vlan 10

    and the VPC is: ip 192.168.10.1 192.168.10.10 255.255.255.0
    What am I missing, my friend, to be able to ping from the VPC to the Nexus leaf switch and vice versa? Kindly advise, because I can't do the testing until now.

    1. Hi Mahmoud,

      I have added the complete configuration at the end of the post. You can compare your configuration to it; hopefully you can find the answer there. Let me know if there are still some issues.

  3. Hi Toni, I really appreciate your blog!

    Can you just confirm that you have been running the above setup in VIRL? - What nx9000v version are you using on your nodes?

    1. Hi Simon, and thanks for visiting. I am using Cisco VIRL with nxos.7.0.3.I7.1. Cheers - Toni

    2. Ok thanks for the info!

      Have you also been able to set up a working (pingable) vPC leaf setup in VIRL?

    3. Yes I have, and I have also written three articles about it:
      VXLAN Part IX: VXLAN BGP EVPN - vPC
      VXLAN Part X: Recovery issue when BGP EVPN peering uses the same loopback interface as a source than VXLAN NVE1 interface
      VXLAN Part XI: Using vPC Peer Link as an Underlay Backup Path
      I have used only VIRL in each article.

  4. Hi Toni,

    In Figure 5, the mcast group under VTEP-102 has a typo:
    mcast-group 239.0.0.10 is mentioned instead of mcast-group 238.0.0.10.

  5. Thanks for notifying. I will fix it asap.

  6. Great work sir...

  7. Hi Toni,
    Thanks a lot for your hard work and for posting all the details. I just noticed the VTEP IP (loopback IP address) is mentioned wrong in Figures 6, 7, 8 and 9.
    Can you cross-check and update the VTEP NVE loopback IP address?

  8. Thank you Toni for the wonderful detailed working of VXLAN. Why are the underlay loopbacks (192.168.0.101/102) required on Leaf-101 and Leaf-102? It looks like only the TEP loopbacks should suffice.

    1. Hi Kedar,
      OSPF is used in the Underlay Network and its "job" is to offer IP connectivity between VTEPs. In the OSPF LSDB each router is identified by its unique OSPF RID. If you have to troubleshoot OSPF, you might want to check the LSDBs and verify that the advertising routers are reachable. If you do not advertise the loopbacks used as OSPF RIDs, you are not able to test the reachability of the router. You could use the same RID for all routing protocols, but I'd rather use different RIDs for IGP and BGP.

  9. Hi Toni,

    First of all thank you for the excellent article. I really appreciate the effort you have put into making this.

    I have a question which is really bugging me. Going back to your example of how the PIM control plane is built: after you no shut the nve1 interface, you say that VTEP-101 will first send a PIM join for group 238.0.0.10 and then register itself as a source for that group - that's all good. What I am struggling with is your statement that the PIM join will be sent as determined by a 5-tuple hash - this does not make sense to me. To my understanding, when a PIM router wants to join the shared tree / RPT, it checks to see who is the RP for the group, which in this case is the anycast address shared by both spines, and then it consults the unicast route table to find the interface closest to that anycast address and sends the PIM join out that interface. Now here we would have 2 x ECMP routes to the two spines, and to my knowledge the join should be sent to the RPF neighbor with the higher IP address - no 5-tuple hash. Could you please comment? And sorry about the long question. 🙂

    1. Hi, very good question. PIM Join messages are sent using the multicast group address 224.0.0.13 (All PIM Routers) as the destination, based on the MRIB RPF. However, PIM Register messages are sent as unicast to the RP address, which in our case is the Anycast address shared by the two Spines. In that sense, the device has to make a decision about which ECMP link to use.

    2. Hi Toni,

      Thank you for the reply. I agree that PIM register messages are unicast to the RP address and PIM join messages are multicast to the 224.0.0.13 address. However, my question really is why do you think that a hash is used to select the RPF interface in case of ECMP? I have tried, but failed, to find this information in the N9K PIM config guide. I have to say that based on your tests it does look like the switches do not use the highest next-hop IP as the tie-breaker to determine the RPF interface, since in your case we see the PIM join sent to the lower IP. If a hash is used, do you know what the input of this hash is and how we can determine in advance which link will be used to forward the PIM join?

      Sorry for bothering you and thank you again :)

      Petko

    3. To anyone who is interested in the same question: I found the answer.

      There's a hidden CLI on N9Ks, and indeed the default behavior is a hash of the (S,G); see below:
      N9K# show ip multicast vrf default
      Multicast Routing VRFs (1 VRFs)
      VRF Name   VRF  Table       Route  Group  Source  (*,G)  State
                 ID   ID          Count  Count  Count   Count
      default    1    0x00000001  1      0      0       0      Up

      Multipath configuration (1): s-g-hash  <<<<<<<<
      Resilient configuration: Disabled

  10. Excellent article. Thanks a bunch.

