Tuesday, 17 April 2018

VXLAN Part VI: VXLAN BGP EVPN – Basic Configurations

In my previous post, “VXLAN Part V: Flood and Learn”, I showed how VXLAN works without a Control Plane protocol. In this post, I am going to show how to configure BGP EVPN on a VXLAN fabric.

In Figure 1, you can see a high-level overview of our example VXLAN fabric design. We have one vrf context (= tenant), TENANT77, spread over the two VTEPs. We also have two VLANs: VLAN 10 (attached to L2VNI 10000) and VLAN 20 (attached to L2VNI 20000). Each VTEP has two connected hosts (Cafe and Abba on VTEP-101, Beef and Babe on VTEP-102). Cross-VLAN flows between hosts behind different VTEPs are routed over the L3VNI 10077. The reason why I start with the configurations is that I want to use show commands as well as Wireshark captures while explaining the theory in my next post.


Note! I am using Cisco VIRL with Nexus 9000v (nxos.7.0.3.I7.1.bin).


Figure 1: VXLAN BGP EVPN

Updated: February 21.4.2018 | Toni Pasanen


 Configuration

The Underlay Network IP connectivity configuration can be found from my previous posts:


You will find the complete configurations of all devices in Appendix 1 at the end of this document, as well as a diagram of the building blocks and their relationships.


Enabling features

First, we need to enable VXLAN and related features as well as the routing protocols needed for the underlay and overlay:

feature nv overlay: enables VXLAN
nv overlay evpn: enables the EVPN Control Plane
feature fabric forwarding: enables Host Mobility Manager
feature vn-segment-vlan-based: enables VLAN-based VXLAN


nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay

The rest of the configurations are divided into two main parts:

Control Plane and tenant configuration (BGP, VRF Context, and EVPN)
Adding a customer network to the tenant

Configuring BGP

In our example, all switches belong to AS 65000. Spine-11 is the BGP Route Reflector (RR) and the VTEPs are RR clients. I am going to use dedicated loopback IP addresses for the BGP peering even though we could also use the address used as the OSPF RID. The reason for using different IP addresses for BGP and OSPF is that I want to draw a clear line between the protocols used in the Underlay and Overlay networks. In this way, we can simplify the troubleshooting process.

In leaf switch VTEP-101, we use IP address 192.168.77.101 (loopback 77) as the BGP router ID, and we also use it as the source address in the iBGP peering with Spine-11 (192.168.77.11).

We want to send and receive BGP EVPN NLRIs (Network Layer Reachability Information = routing updates), which is why “address-family l2vpn evpn” is needed in addition to the ipv4 unicast afi. What is an address-family, actually? It describes the type of information carried inside the NLRI (IPv4, IPv6, vpnv4, evpn…). The Address-Family Identifier (AFI) number for Layer 2 VPN NLRI is 25 and the Subsequent AFI (SAFI) for EVPN is 70. Under the neighbor's l2vpn evpn afi, we define the BGP community types that we want to carry in BGP Update messages. We are going to use Route-Targets (RT) for importing/exporting routes to and from the BGP process. Since RTs are extended communities, and extended communities are not sent to a neighbor unless explicitly enabled, we need to add “send-community extended” to the neighbor's address-family l2vpn evpn configuration.


router bgp 65000
  router-id 192.168.77.101
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
!
interface loopback77
  description ** BGP peering **
  ip address 192.168.77.101/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
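
To make the AFI/SAFI numbers more concrete, the following minimal Python sketch encodes the Multiprotocol Extensions capability that a BGP speaker announces for each address family. The function name is an assumption made for this illustration; the byte layout follows RFC 4760 (capability code 1, length 4, 2-byte AFI, 1 reserved byte, 1-byte SAFI).

```python
import struct

AFI_L2VPN = 25   # Address Family Identifier for Layer 2 VPN information
SAFI_EVPN = 70   # Subsequent AFI for EVPN


def mp_capability(afi: int, safi: int) -> bytes:
    """Encode one Multiprotocol Extensions capability (RFC 4760).

    Layout: capability code (1), capability length (4),
    AFI (2 bytes), reserved (1 byte), SAFI (1 byte).
    """
    return struct.pack("!BBHBB", 1, 4, afi, 0, safi)


# The two address families configured under "router bgp 65000".
l2vpn_evpn = mp_capability(AFI_L2VPN, SAFI_EVPN)
ipv4_unicast = mp_capability(1, 1)

print(l2vpn_evpn.hex())    # 010400190046
print(ipv4_unicast.hex())  # 010400010001
```

In a capture of the BGP OPEN message, the value 0x0019 (25) followed by 0x46 (70) identifies the l2vpn evpn address family.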

A couple of words about the IP addressing and IP connectivity. In Figure 2, we can see that there are three logical Loopback interfaces in each VTEP switch.

Loopback 0: Instead of configuring a dedicated IP address on each inter-switch link, I have used the “ip unnumbered loopback 0” configuration. This saves IP addresses compared to using a dedicated subnet on each inter-switch link.

Loopback 100: Is used for VXLAN tunnel addressing. The NVE1 interface uses Loopback 100 as its source interface.

Loopback 77: Is used for BGP peering. Note, however, that the “MP_REACH_NLRI” Path Attribute in a BGP Update message carries the IP address of the NVE1 interface (Loopback 100), not the peering address, in the “Next Hop Address” field. The tunnel address has to be the next-hop address of all NLRIs, and if eBGP is used, the Spine switches have to retain the original next-hop address while forwarding the routing update. Note that a BGP RR does not change the next hop of a reflected route (it only adds the Originator ID and Cluster List attributes), so the original next-hop address is retained automatically in our case.
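
The next-hop behaviour can be illustrated with a small Python model of route reflection. This is only a sketch under stated assumptions: the class and function names are hypothetical, the NLRI string is a placeholder, and the cluster ID is assumed to be the Spine-11 BGP router ID (192.168.77.111 in the complete configuration). The addresses for Leaf-101 are taken from this post.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class EvpnRoute:
    nlri: str                               # placeholder text, not a real NLRI encoding
    next_hop: str                           # VTEP tunnel address (NVE1 source, Loopback 100)
    originator_id: Optional[str] = None     # set by the route reflector
    cluster_list: Tuple[str, ...] = ()      # extended by each reflecting cluster


def reflect(route: EvpnRoute, client_router_id: str, cluster_id: str) -> EvpnRoute:
    """Reflect a route received from an RR client (behaviour per RFC 4456).

    The next hop is copied unchanged. Only the Originator ID (if not
    already present) and the Cluster List are added.
    """
    return replace(
        route,
        originator_id=route.originator_id or client_router_id,
        cluster_list=(cluster_id,) + route.cluster_list,
    )


# Leaf-101 advertises a host route with its NVE1 address as next hop.
advertised = EvpnRoute(nlri="host Cafe (placeholder)", next_hop="192.168.100.101")
reflected = reflect(advertised,
                    client_router_id="192.168.77.101",
                    cluster_id="192.168.77.111")

print(reflected.next_hop)       # 192.168.100.101
print(reflected.originator_id)  # 192.168.77.101
print(reflected.cluster_list)   # ('192.168.77.111',)
```

The reflected route still points to the tunnel address of Leaf-101, which is what the remote VTEP needs as the outer destination IP address.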


I have written the article “VXLAN Part X: Recovery issue when BGP EVPN peering uses the same loopback interface as a source than VXLAN NVE1 interface” in which the meaning of Loopback addresses is analyzed in more detail.

Figure 2: BGP and IP addressing

We can verify the BGP peering with show bgp l2vpn evpn summary.

Leaf-101# sh bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 192.168.77.101, local AS number 65000
BGP table version is 181, L2VPN EVPN config peers 1, capable peers 1
2 network entries and 2 paths using 332 bytes of memory
BGP attribute entries [1/160], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [1/4]

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.77.11   4 65000     356     327      181    0    0 04:58:48 1
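
If you want to check the peering state from a script, the neighbor table can be parsed with a few lines of Python. This is a minimal sketch that assumes the ten-column layout shown above and assumes that the last column holds either a prefix count (session established) or a single-word state name; the function name and dictionary keys are made up for this example.

```python
SUMMARY = """\
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.77.11   4 65000     356     327      181    0    0 04:58:48 1
"""


def parse_neighbors(text: str) -> dict:
    """Return {neighbor: details} for the rows below the 'Neighbor' header."""
    neighbors = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Neighbor":
            in_table = True          # rows after this header are neighbors
            continue
        if in_table and len(fields) == 10:
            last = fields[-1]
            established = last.isdigit()   # a number means prefixes received
            neighbors[fields[0]] = {
                "remote_as": int(fields[2]),
                "up_down": fields[8],
                "established": established,
                "prefixes_received": int(last) if established else None,
                "state": None if established else last,
            }
    return neighbors


peers = parse_neighbors(SUMMARY)
print(peers["192.168.77.11"])
```

For the output above, the RR peering is established and one prefix has been received from Spine-11.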



Configuring VRF Context

A VRF context in a VXLAN fabric has a dedicated Virtual Network Identifier (VNI). When traffic is routed between two hosts that sit behind different VTEPs and in different subnets, packets are routed over the L3VNI (Figure 3). The VXLAN header of these routed packets carries the L3VNI instead of an L2VNI. We are using the symmetric Integrated Routing and Bridging (IRB) model, where all routed traffic inside a tenant uses the same L3VNI.
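
To show where the VNI sits in the packet, here is a minimal Python sketch that builds the 8-byte VXLAN header defined in RFC 7348 (flags with the I bit set, 24 reserved bits, 24-bit VNI, 8 reserved bits). The function names are assumptions for this illustration; the VNI values are the ones used in this post.

```python
import struct


def vxlan_header(vni: int) -> bytes:
    """Build a VXLAN header: flags(8) + reserved(24) + VNI(24) + reserved(8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08                       # I flag: the VNI field is valid
    return struct.pack("!II", flags << 24, vni << 8)


def vni_of(header: bytes) -> int:
    """Extract the VNI from an 8-byte VXLAN header."""
    _, second_word = struct.unpack("!II", header)
    return second_word >> 8


routed = vxlan_header(10077)    # inter-subnet traffic uses the L3VNI
bridged = vxlan_header(10000)   # traffic inside VLAN 10 uses its L2VNI

print(routed.hex())             # 0800000000275d00
print(vni_of(bridged))          # 10000
```

The header format is identical in both cases; only the VNI value tells the egress VTEP whether the frame belongs to an L2 segment or to the tenant's routing instance.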

Note! I am using the term “vrf” for virtual routing inside a single box (local). I am using the term “tenant” when speaking about the virtual L2/L3 domain spread over the fabric.

Figure 3: Routing between different subnets.

We will set up the vrf context TENANT77 and attach L3VNI 10077 to it (Figure 4). Since we use MP-BGP, we also need to define a Route Distinguisher (RD), as well as Route Targets (RT) specified under the ipv4 unicast afi (routed traffic is Unicast).

From the VXLAN perspective, the RD is an extension prepended to the IPv4 address. The BGP Route Reflector uses it to differentiate possibly overlapping networks in different VRFs/tenants (the Spine BGP RR is not VRF-aware). We are going to use automatic RD mode, where the RD is formed from the BGP RID and the VRF ID.

Address-family IPv4 unicast in the vrf context is used for exporting/importing routes to and from the BGP process. To be able to do that, we also need to attach RT values to each BGP NLRI update. Since RTs are used for the import/export policy, they have to be consistent on every VTEP switch. We will use the RT auto format, which generates the RT values by combining the BGP AS number and the L3VNI. Since we are using iBGP peering (all switches belong to the same AS), we can use the auto-generation mode. If each VTEP is in its own AS (eBGP), manual mode has to be used; otherwise we end up in a situation where each VTEP has a different RT value and, even though routes are successfully exported to BGP, no one imports them.
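
The iBGP versus eBGP argument can be checked with a few lines of Python. This is a sketch of the ASN:VNI rule described above, not of the switch implementation; the function names and the eBGP AS numbers 65101 and 65102 are hypothetical values chosen for the example.

```python
def rt_auto(asn: int, vni: int) -> str:
    """Auto-generated Route-Target as described above: BGP ASN combined with VNI."""
    return f"{asn}:{vni}"


def will_import(exporter_asn: int, importer_asn: int, vni: int) -> bool:
    """A route is imported only if the exported RT equals the import RT."""
    return rt_auto(exporter_asn, vni) == rt_auto(importer_asn, vni)


L3VNI = 10077

# iBGP: both VTEPs are in AS 65000, so the auto RTs match.
print(rt_auto(65000, L3VNI))             # 65000:10077
print(will_import(65000, 65000, L3VNI))  # True

# eBGP with one AS per VTEP (hypothetical ASNs): the auto RTs differ.
print(will_import(65101, 65102, L3VNI))  # False
```

With one AS per VTEP, the exporter tags routes with its own ASN, the importer expects its own ASN, and the comparison fails. That is why manually defined RTs are needed in that design.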

After creating the vrf context, we are going to attach it to BGP process.


vrf context TENANT77
  vni 10077
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
!
router bgp 65000
  router-id 192.168.77.101
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn

Figure 4: VRF Context

As can be seen from the output below, the BGP RID for Leaf-101 is 192.168.77.101 and the TENANT77 VRF_ID is 3. These together give us auto-generated RD value 192.168.77.101:3.

Leaf-101# sh vrf
VRF-Name                           VRF-ID State   Reason                       
TENANT77                                3 Up      --      

Leaf-101# sh run bgp | i router
router bgp 65000
  router-id 192.168.77.101                          

Leaf-101# show bgp l2vpn evpn vni-id 10077 | i 10077
Route Distinguisher: 192.168.77.101:3    (L3VNI 10077)
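
The same derivation can be written down as a short Python sketch. The first function reproduces the RID:VRF-ID rule from the text; the second shows how such a value would be packed as a Type 1 Route Distinguisher (2-byte type, 4-byte IPv4 administrator, 2-byte assigned number, as defined in RFC 4364). The function names are assumptions for this illustration.

```python
import ipaddress
import struct


def rd_auto_l3(router_id: str, vrf_id: int) -> str:
    """Auto RD for a VRF context: BGP router ID combined with the VRF ID."""
    return f"{router_id}:{vrf_id}"


def encode_rd_type1(router_id: str, number: int) -> bytes:
    """Pack a Type 1 RD: type (2 bytes) + IPv4 address (4 bytes) + number (2 bytes)."""
    return struct.pack("!H4sH", 1, ipaddress.IPv4Address(router_id).packed, number)


rd = rd_auto_l3("192.168.77.101", 3)
wire = encode_rd_type1("192.168.77.101", 3)

print(rd)          # 192.168.77.101:3
print(wire.hex())  # 0001c0a84d650003
```

The 8 bytes are what precedes the prefix inside the NLRI, which is how the route reflector can keep identical prefixes from different tenants apart.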


Configuring L2 vlan and L3 vlan interface for L3VNI service

For a routed packet, we need a layer 3 interface and layer 2 vlan. First, we create layer 2 vlan (in our case with id 77) and assign it to vn-segment 10077. Next, we create a layer 3 interface for the vlan and attach it to the vrf context TENANT77. Layer 3 interface does not have an ip address and we are going to use the command “ip forward”, which allows ipv4 traffic on an interface that has no ip address.

Figure 5: L2/L3 VLAN for inter-tenant routing

Configuration examples are taken from VTEP-101.
vlan 77
  name TENANT77
  vn-segment 10077
!
interface Vlan77
  no shutdown
  mtu 9216
  vrf member TENANT77
  ip forward
!
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback100
  member vni 10077 associate-vrf
!
evpn


Adding customer vlan to EVPN instance

As the last configuration step, I am going to add two customer subnets in our example VXLAN fabric. We are going to create two VLANs 10 and 20. First, we create layer 2 vlan and attach it to vn-segment (vlan 10 = VNI 10000 and vlan 20 = VNI 20000). We are using anycast-gateway ip address (AGW IP), where the gateway ip for the specific subnet is the same in all VTEPs (vlan 10 = 192.168.11.1 and vlan 20 = 192.168.12.1). Anycast gateway in VXLAN fabric uses AGW MAC address, which is the same across all VTEPs and all of the subnets. We are going to use AGW MAC 0001.0001.0001. Customer layer 3 interfaces are attached to vrf context TENANT77.

To be able to export/import host MAC/IP reachability information to/from the BGP process, we need to add the specific vn-segment (VNI) to the EVPN instance with RD and RT values. For the uniqueness of routes, we need an RD (as we did with the L3VNI), and for the routing policy, we need a dedicated RT for the VNI (the same on each VTEP). In the EVPN instance, the RD is formed from the BGP RID and the value 32767 + VLAN ID, which gives us RD 192.168.77.101:32777 for VLAN 10. The RT is derived from the BGP ASN and the VNI, which gives us RT 65000:10000.
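
As a worked example of these two rules, the following Python sketch computes the auto-generated RD and RT for both customer VLANs on Leaf-101. It only restates the arithmetic described above; the function names and the dictionary layout are assumptions for this illustration.

```python
ROUTER_ID = "192.168.77.101"   # BGP RID of Leaf-101
ASN = 65000
VLAN_TO_VNI = {10: 10000, 20: 20000}


def rd_auto_l2(router_id: str, vlan_id: int) -> str:
    """Auto RD for an L2VNI: BGP router ID combined with 32767 + VLAN ID."""
    return f"{router_id}:{32767 + vlan_id}"


def rt_auto(asn: int, vni: int) -> str:
    """Auto RT: BGP ASN combined with the VNI."""
    return f"{asn}:{vni}"


evi = {
    vni: {"rd": rd_auto_l2(ROUTER_ID, vlan), "rt": rt_auto(ASN, vni)}
    for vlan, vni in VLAN_TO_VNI.items()
}

for vni, values in evi.items():
    print(vni, values["rd"], values["rt"])
# 10000 192.168.77.101:32777 65000:10000
# 20000 192.168.77.101:32787 65000:20000
```

Note that the RD is unique per VTEP (it contains the local RID), while the RT is identical on every VTEP in the same AS. This is exactly the split between uniqueness and policy described above.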

The last thing to do is to attach the VNIs associated with the VLANs to the NVE interface. Note that we are using the same mcast group for the BUM traffic of both VLANs. We are also using ARP suppression to prevent unnecessary ARP flooding. Even though it is not shown in the configuration, we also need to assign the host-facing interfaces to the correct VLAN.


Note! When a host joins the network, it might use an Address Conflict Detection mechanism to prevent duplicate IP addresses. This can be done with Gratuitous ARP, where a host sends an ARP request using its own IP address in both the Sender and Target IP address fields (see Figure 10 in Appendix 1). Based on the normal MAC learning process, the VTEP switch learns the MAC/IP addresses of the connected host and then sends a BGP EVPN update to the other VTEPs. Note also that ARP suppression is L2VNI specific.
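
The Gratuitous ARP check described in the note can be sketched in Python as follows. The code builds the 28-byte ARP payload for Ethernet/IPv4 and tests whether the Sender and Target IP fields are equal. The function names are hypothetical, and the host IP address 192.168.11.10 is an assumption made only for this example (the MAC 1000.0010.cafe is the one mentioned for host Cafe in the comments of this post).

```python
import ipaddress
import struct

# htype, ptype, hlen, plen, oper, sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an ARP request payload (Ethernet/IPv4, opcode 1)."""
    return struct.pack(
        ARP_FORMAT,
        1, 0x0800, 6, 4, 1,
        sender_mac,
        ipaddress.IPv4Address(sender_ip).packed,
        bytes(6),                                  # target MAC is unknown
        ipaddress.IPv4Address(target_ip).packed,
    )


def is_gratuitous(arp: bytes) -> bool:
    """True when the Sender IP equals the Target IP."""
    _htype, _ptype, _hlen, _plen, _oper, _sha, spa, _tha, tpa = struct.unpack(ARP_FORMAT, arp)
    return spa == tpa


cafe_mac = bytes.fromhex("10000010cafe")
garp = build_arp_request(cafe_mac, "192.168.11.10", "192.168.11.10")     # assumed host IP
normal = build_arp_request(cafe_mac, "192.168.11.10", "192.168.11.1")    # ARP for the gateway

print(is_gratuitous(garp))    # True
print(is_gratuitous(normal))  # False
```

In both cases the VTEP sees the sender MAC and IP address of the host, which is the information it needs for the BGP EVPN update.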


Figure 6: EVPN instance (EVI)

Template configuration for all VTEPs
fabric forwarding anycast-gateway-mac 0001.0001.0001
!
vlan 10
  name L2VNI-for-VLAN10
  vn-segment 10000
!
interface Vlan10
  no shutdown
  vrf member TENANT77
  ip address 192.168.11.1/24
  fabric forwarding mode anycast-gateway
!
evpn
  vni 10000 l2
    rd auto
    route-target import auto
    route-target export auto
!
interface nve1
  member vni 10000
    suppress-arp
    mcast-group 238.0.0.10
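
Because the same stanzas are repeated for every customer VLAN on every VTEP, the template lends itself to simple generation. The following Python function is a minimal sketch that renders the per-VLAN part of the template above; the function name and its parameters are assumptions for this example, and the generated lines are copied from the configuration shown in this post.

```python
def render_l2vni(vlan: int, vni: int, gateway: str,
                 vrf: str = "TENANT77", mcast_group: str = "238.0.0.10") -> str:
    """Render the VLAN, SVI, EVPN and NVE stanzas for one customer VLAN."""
    lines = [
        f"vlan {vlan}",
        f"  name L2VNI-for-VLAN{vlan}",
        f"  vn-segment {vni}",
        "!",
        f"interface Vlan{vlan}",
        "  no shutdown",
        f"  vrf member {vrf}",
        f"  ip address {gateway}",
        "  fabric forwarding mode anycast-gateway",
        "!",
        "evpn",
        f"  vni {vni} l2",
        "    rd auto",
        "    route-target import auto",
        "    route-target export auto",
        "!",
        "interface nve1",
        f"  member vni {vni}",
        "    suppress-arp",
        f"    mcast-group {mcast_group}",
    ]
    return "\n".join(lines)


# VLAN 20 uses L2VNI 20000 and anycast gateway 192.168.12.1/24.
vlan20 = render_l2vni(20, 20000, "192.168.12.1/24")
print(vlan20)
```

The global command “fabric forwarding anycast-gateway-mac 0001.0001.0001” is left out on purpose, since it is configured once per switch and not per VLAN.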




Basic connectivity test

We are going to test basic connectivity between the hosts with ping.

Ping from Café to Beef (L2VNI service over VXLAN fabric)
Figure 7: ping Café to Beef

Cafe#ping 192.168.11.11
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.11.11, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/2 ms




Ping from Café to Abba (Local routing)
Figure 8: ping Café to Abba

Cafe#ping 192.168.12.11
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.12.11, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 2/8/13 ms

Ping from Café to Babe (L3VNI service over VXLAN fabric)
Figure 9: ping Café to Babe

Cafe#ping 192.168.12.12
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.12.12, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/23/29 ms
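
The three tests exercise three different forwarding paths. The following Python sketch summarises the decision, using the host placement and VNI mapping from this post; the function name and the text labels are made up for this illustration, and the logic is a simplification of the symmetric IRB model, not the switch forwarding code.

```python
L2VNI = {10: 10000, 20: 20000}   # VLAN -> L2VNI
L3VNI = 10077                    # VNI of tenant TENANT77

# host -> (VTEP, VLAN)
HOSTS = {
    "Cafe": ("VTEP-101", 10),
    "Abba": ("VTEP-101", 20),
    "Beef": ("VTEP-102", 10),
    "Babe": ("VTEP-102", 20),
}


def forwarding_path(src: str, dst: str):
    """Return (description, VNI used in the VXLAN header or None)."""
    src_vtep, src_vlan = HOSTS[src]
    dst_vtep, dst_vlan = HOSTS[dst]
    if src_vtep == dst_vtep:
        # Both hosts are behind the same VTEP: no VXLAN encapsulation.
        kind = "local bridging" if src_vlan == dst_vlan else "local routing"
        return kind, None
    if src_vlan == dst_vlan:
        return "bridged over VXLAN", L2VNI[src_vlan]
    return "routed over VXLAN", L3VNI


for dst in ("Beef", "Abba", "Babe"):
    print("Cafe ->", dst, forwarding_path("Cafe", dst))
# Cafe -> Beef ('bridged over VXLAN', 10000)
# Cafe -> Abba ('local routing', None)
# Cafe -> Babe ('routed over VXLAN', 10077)
```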

That’s it. I will go through the operation and theory of the VXLAN BGP EVPN from both Control and Data Plane in my next post.


Author: Toni Pasanen CCIE#28158
Published: 17.4.2018
Updated: 24-May 2018 by Toni Pasanen




APPENDIX 1.

Gratuitous ARP

This Wireshark capture was taken when host Cafe joined the network for the very first time.


Figure 10: Gratuitous ARP sent by host Cafe when joining the network.


Building blocks and relationships in VXLAN.




Figure 11: VXLAN BGP EVPN building blocks.





Complete Configurations

Leaf-101
Leaf-101# sh run

!Command: show running-config
!Time: Mon Apr 16 12:48:51 2018

version 7.0(3)I7(1)
hostname Leaf-101
vdc Leaf-101 id 1
  limit-resource vlan minimum 16 maximum 4094
  limit-resource vrf minimum 2 maximum 4096
  limit-resource port-channel minimum 0 maximum 511
  limit-resource u4route-mem minimum 128 maximum 128
  limit-resource u6route-mem minimum 96 maximum 96
  limit-resource m4route-mem minimum 58 maximum 58
  limit-resource m6route-mem minimum 8 maximum 8

nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay

no password strength-check
username admin password 5 $5$aV2kcO97$7ioNn2XTmsfuFj62MLL/wcMnEoJE9ifSY/AFfWPY2//  role network-admin
ip domain-lookup
ip host Spine-12 192.168.0.12
snmp-server user admin network-admin auth md5 0x223cfb63ca87c5b4856c960235329cff priv 0x223cfb63ca87c5b4856c960235329cff localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO

fabric forwarding anycast-gateway-mac 0001.0001.0001
ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24 bidir
ip pim ssm range 232.0.0.0/8
vlan 1,10,20,77
vlan 10
  name L2VNI-for-VLAN10
  vn-segment 10000
vlan 20
  name L2VNI-for-VLAN20
  vn-segment 20000
vlan 77
  name TENANT77
  vn-segment 10077

vrf context TENANT77
  vni 10077
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
vrf context management
hardware access-list tcam region racl 512
hardware access-list tcam region arp-ether 256 double-wide


interface Vlan1
  no shutdown

interface Vlan10
  no shutdown
  vrf member TENANT77
  ip address 192.168.11.1/24
  fabric forwarding mode anycast-gateway

interface Vlan20
  no shutdown
  vrf member TENANT77
  ip address 192.168.12.1/24
  fabric forwarding mode anycast-gateway

interface Vlan77
  no shutdown
  vrf member TENANT77
  ip forward

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback100
  member vni 10000
    suppress-arp
    mcast-group 238.0.0.10
  member vni 10077 associate-vrf
  member vni 20000
    suppress-arp
    mcast-group 238.0.0.10

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  switchport access vlan 10

interface Ethernet1/4
  switchport access vlan 20

<empty interfaces removed from configuration output>

interface mgmt0
  vrf member management

interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.101/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback77
  description ** BGP peering **
  ip address 192.168.77.101/32
  ip router ospf UNDERLAY-NET area 0.0.0.0

interface loopback100
  description ** VTEP/Overlay **
  ip address 192.168.100.101/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
line console
line vty
router ospf UNDERLAY-NET
  router-id 192.168.0.101
  name-lookup
router bgp 65000
  router-id 192.168.77.101
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
evpn
  vni 10000 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 20000 l2
    rd auto
    route-target import auto
    route-target export auto


Leaf-101#  

Leaf-102
Leaf-102# sh run

!Command: show running-config
!Time: Mon Apr 16 12:51:04 2018

version 7.0(3)I7(1)
hostname Leaf-102
vdc Leaf-102 id 1
  limit-resource vlan minimum 16 maximum 4094
  limit-resource vrf minimum 2 maximum 4096
  limit-resource port-channel minimum 0 maximum 511
  limit-resource u4route-mem minimum 128 maximum 128
  limit-resource u6route-mem minimum 96 maximum 96
  limit-resource m4route-mem minimum 58 maximum 58
  limit-resource m6route-mem minimum 8 maximum 8

nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay

username admin password 5 $5$r25DfmPc$EvUgSVebL3gCPQ8e1ngSTxeKYIk4yuuPIomJKa5Lp/3  role network-admin
ip domain-lookup
ip host Leaf-102 192.168.0.102
ip host Spine-11 192.168.0.11
snmp-server user admin network-admin auth md5 0x713961e592dd5c2401317a7e674464ac priv 0x713961e592dd5c2401317a7e674464ac localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO

fabric forwarding anycast-gateway-mac 0001.0001.0001
ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24 bidir
ip pim ssm range 232.0.0.0/8
vlan 1,10,20,77
vlan 10
  name L2VNI-for-VLAN10
  vn-segment 10000
vlan 20
  name L2VNI-for-VLAN20
  vn-segment 20000
vlan 77
  name TENANT77
  vn-segment 10077

vrf context TENANT77
  vni 10077
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
vrf context management
hardware access-list tcam region racl 512
hardware access-list tcam region arp-ether 256 double-wide


interface Vlan1
  no shutdown

interface Vlan10
  no shutdown
  vrf member TENANT77
  ip address 192.168.11.1/24
  fabric forwarding mode anycast-gateway

interface Vlan20
  no shutdown
  vrf member TENANT77
  ip address 192.168.12.1/24
  fabric forwarding mode anycast-gateway

interface Vlan77
  no shutdown
  vrf member TENANT77
  ip forward

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback100
  member vni 10000
    suppress-arp
    mcast-group 238.0.0.10
  member vni 10077 associate-vrf
  member vni 20000
    suppress-arp
    mcast-group 238.0.0.10

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  switchport access vlan 10

interface Ethernet1/4
  switchport access vlan 20

<empty interfaces removed from configuration output>

interface mgmt0
  vrf member management

interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.102/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback77
  description ** BGP peering **
  ip address 192.168.77.102/32
  ip router ospf UNDERLAY-NET area 0.0.0.0

interface loopback100
  description ** VTEP/Overlay **
  ip address 192.168.100.102/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
line console
line vty
router ospf UNDERLAY-NET
  router-id 192.168.0.102
  name-lookup
router bgp 65000
  router-id 192.168.77.102
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
evpn
  vni 10000 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 20000 l2
    rd auto
    route-target import auto
    route-target export auto


Leaf-102#


Spine-11
Spine-11# sh run

!Command: show running-config
!Time: Mon Apr 16 12:53:17 2018

version 7.0(3)I7(1)
hostname Spine-11
vdc Spine-11 id 1
  limit-resource vlan minimum 16 maximum 4094
  limit-resource vrf minimum 2 maximum 4096
  limit-resource port-channel minimum 0 maximum 511
  limit-resource u4route-mem minimum 128 maximum 128
  limit-resource u6route-mem minimum 96 maximum 96
  limit-resource m4route-mem minimum 58 maximum 58
  limit-resource m6route-mem minimum 8 maximum 8

nv overlay evpn
feature ospf
feature bgp
feature pim
feature vn-segment-vlan-based
feature nv overlay

no password strength-check
username admin password 5 $5$60DVUPIV$uZWPu6ufHQOJSG18SK5b9/5kpZnV5E4/EFapzQP5CI/  role network-admin
ip domain-lookup
ip host Spine-12 192.168.0.12
ip host Leaf-102 192.168.0.102
snmp-server user admin network-admin auth md5 0xd177fd3448eab21dd2feb16d54938469 priv 0xd177fd3448eab21dd2feb16d54938469 localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO

ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24 bidir
ip pim ssm range 232.0.0.0/8
vlan 1

vrf context management

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

<empty interfaces removed from configuration output>

interface mgmt0
  vrf member management

interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.11/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback77
  description ** BGP peering **
  ip address 192.168.77.11/32
  ip router ospf UNDERLAY-NET area 0.0.0.0

interface loopback238
  description ** Anycast-RP address **
  ip address 192.168.238.6/29
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
line console
line vty
router ospf UNDERLAY-NET
  router-id 192.168.0.11
  name-lookup
router bgp 65000
  router-id 192.168.77.111
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.101
    remote-as 65000
    update-source loopback77
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
  neighbor 192.168.77.102
    remote-as 65000
    update-source loopback77
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client


Spine-11#  

30 comments:

  1. Thanks man , no doubt you've spent a lot of effort on this VXLAN posts , we appreciate that

    1. It took time, but I also learned a lot during the writing process. Thanks for the comment!

  2. Hi Toni ,
    I was studying this post , but i need simple and detailed explanation about the below points :

    1. What is BGP extended communities and standard BGP communities, and what role each one play in VXLAN EVPN BGP configuration ?

    2. What is Route Distinguisher (RD), as well as Route Targets (RT), and what role each one play in VXLAN EVPN BGP configuration ?

    Please explain those two point along with the commands that you've used in the configuration , thanks a lot.

    1. The configuration snippet is taken from the VTEP Leaf-101
      evpn
      vni 10000 l2
      rd auto
      route-target import auto
      route-target export auto
      vni 20000 l2
      rd auto
      route-target import auto
      route-target export auto

      First, the Layer 2 stuff: The "rd auto" generates the route-distinguisher value BGP RID + (VLAN id mapped to VNI + 32767). So in our case the value will be 192.168.77.101:(10 + 32767) => 192.168.77.101:32777. This "prefix" is associated with each MAC route advertisement in the VNI. We also have VLAN 20 attached to VNI 20000. The RD value for VNI 20000 is 192.168.77.101:32787. So we have a unique RD "prefix" for each VNI.

      192.168.77.101:32777 for the VNI 10000
      192.168.77.101:32787 for the VNI 20000

      Now we can use overlapping MAC addresses between VLANs since the mac 1000:cafe:beef looks like:

      192.168.77.101:32777:1000:cafe:beef in VNI 10000
      and
      192.168.77.101:32787:1000:cafe:beef in VNI 20000

      As a summary: the RD makes it possible to use overlapping MAC addresses. This is especially important from the BGP route reflector's point of view, since it has no idea about EVPN instances or VRFs.
      If you look at one of the BGP captures, you can see that the RD is part of the EVPN NLRI information = part of the advertised address.
      Route-Targets are generated from the BGP AS number and the L2VNI. So for VNI 10000 we get the RT value 65000:10000, and for VNI 20000 we get the RT value 65000:20000. When an ingress VTEP Leaf-101 notices that host Cafe (mac 1000:0010:cafe) in vlan 10 (mapped to VNI 10000) joins the network, it will send a BGP Update with the Extended Community RT 65000:10000 (route-target export clause). When another host in the same vlan joins the network, Leaf-101 again sends a BGP Update tagged with the extended community 65000:10000. In this way, both MAC addresses have a common "tag". Remote Leaf switches that belong to the same VNI and have the same route-target values defined under the EVPN vni instance will import these routes based on the RT. If the import clause is missing, the route will not be imported by the remote VTEP.

      Second, the Layer3 stuff:
      This is configuration snippet again from Leaf-101:

      vrf context TENANT77
      vni 10077
      rd auto
      address-family ipv4 unicast
      route-target both auto
      route-target both auto evpn

      The RD is derived from BGP RID + VRF Id. In our case, VRF TENANT77 gets the RD 192.168.77.101:3. The RD is used in the same way as with MAC addresses: it adds a "prefix" in front of the IP address. This makes it possible to use overlapping IP addresses between VRF contexts (inside a VRF we still have to have unique IP addresses). If we have two VRFs, RED (id 4) and Blue (id 5), and we use the IP address 192.168.11.11/32 in both VRFs, routers are able to differentiate them since they have different RD values:

      192.168.77.101:4:192.168.11.11 (RED)
      192.168.77.101:5:192.168.11.11 (Blue)

      The same concept is used in MPLS network where IP addresses extended with RD are called VPNV4 addresses.

      The RT under the VRF context configuration is used in the same way as with MAC addresses: by using an RT, we group a set of IP addresses into one group that uses one common tag. That tag is then used for the import policy.

      Here are a couple of RFC documents that you may find useful.

      RFC 4360 - BGP Extended Communities Attribute
      RFC 1997 - BGP Communities Attribute

    2. Thank you so much , very simple and very clear explanation , I'm lucky to know you

    3. Thank you Mahmoud for the very kind comment.

  3. This comment has been removed by the author.

  4. Hi,

    Would you be able to say what the difference is between these two commands under vrf config:
    1) route-target both auto
    2) route-target both auto evpn

    I do understand the purpose of RT, but I have not found what exactly "evpn" keyword does

    I also managed to make it work (L3 routing in VXLAN) with this exact config for vrf:
    vrf context t77
    vni 770000
    rd auto
    address-family ipv4 unicast
    !
    so no "route-target" commands for vrf at all. I am running nxos.9.2.1 on EVE-NG

    1. Hi Yuriu,
      Great question! You just gave me a subject for the next post. I will try to answer your question with examples in my next post. By the way, I assume that your ping was taken between the two hosts located in different VTEP but belonging to the same L2VNI? I also assume that you have "route-target import/export auto" commands configured under the evpn => VNI l2 section? What happens if you try to ping between the hosts in different VNI inside a Tenant? But as said, I will get back to your question on my next post, which hopefully is ready during next week.

    2. Hello Toni,

      "your ping was taken between the two hosts located in different VTEP but belonging to the same L2VNI?" - no, ping works between hosts that are in different L2 VNI; hosts are in different subnets as well

      "I also assume that you have "route-target import/export auto" commands configured under the evpn => VNI l2 section?" - correct

    3. Hello Toni,

      I think I got it.

      "route-target both auto" - defines RTs for VPNv4 prefixes
      "route-target both auto evpn" - defines "RT"s, "ENCAP" and "Router MAC" extended communities for L2VPN EVPN prefixes Type-2 host prefixes and Type-5 prefixes

      router bgp 1
      vrf t77
      address-family ipv4 unicast
      advertise l2vpn evpn - redistributes routes from between VPNv4 L2VPN EVPN

      Routes from VPNv4 will be injected in L2VPN EVPN only if RT for VPNv4 and L2VPN EVPN matches. This is valid only for nx-os 7.

      In nx-os 9 there is no "advertise l2vpn evpn" command, and RTs under vrf are not necessary to define

    4. I guess this is part of "VXLAN CLI Simplification—Support added for the reduction of CLI commands." from release notes - https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/9-x/release/notes/921_9000_nxos_rn.html

    5. You really got it! I am going to verify this with a simple test. If you take a look at the topology on VXLAN Part IX (figure 9-1), there is an external router Ext-Ro02, which has connected network 172.16.77.0/24. First, we have the following configuration in Leaf-101:
      -------------------------------------
      vrf context TENANT77
      vni 10077
      rd auto
      address-family ipv4 unicast
      route-target both auto
      route-target both auto evpn

      And Leaf-101 has learned the route to network 172.16.77.0 (Route Type-5)
      ------------------------------------
      BGP routing table entry for [5]:[0]:[0]:[24]:[172.16.77.0]:[0.0.0.0]/224, version 192
      Paths: (1 available, best #1)
      Flags: (0x000002) on xmit-list, is not in l2rib/evpn, is not in HW

      Advertised path-id 1
      Path type: internal, path is valid, is best path
      Imported to 2 destination(s)
      AS-Path: 64577 , path sourced external to AS
      192.168.100.103 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED 0, localpref 100, weight 0
      Received label 10077
      Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0006.0007
      Originator: 192.168.77.103 Cluster list: 192.168.77.111
      ------------------------
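The route key in the output above can be read field by field. Below is a minimal Python sketch that splits such a key into named fields. The field labels (esi, eth_tag, and so on) are my own shorthand for how I read the NX-OS output, so treat them as an assumption rather than official terminology; the function name parse_evpn_key is hypothetical as well.

```python
import re

# Field names per EVPN route type, in the order they appear in the key.
# These labels are an assumption (my own shorthand), not NX-OS terminology.
FIELDS = {
    2: ["esi", "eth_tag", "mac_len", "mac", "ip_len", "ip"],
    3: ["eth_tag", "ip_len", "originator_ip"],
    5: ["esi", "eth_tag", "prefix_len", "prefix", "gateway_ip"],
}


def parse_evpn_key(key):
    """Split a bracketed EVPN route key into a dict of named string fields."""
    # The part after the last "/" is the length of the whole key in bits.
    body, _, nlri_bits = key.rpartition("/")
    values = re.findall(r"\[([^\]]*)\]", body)
    route_type = int(values[0])
    names = FIELDS[route_type]
    if len(values) - 1 != len(names):
        raise ValueError("unexpected number of fields in " + key)
    parsed = dict(zip(names, values[1:]))
    parsed["route_type"] = route_type
    parsed["nlri_bits"] = int(nlri_bits)
    return parsed


route = parse_evpn_key("[5]:[0]:[0]:[24]:[172.16.77.0]:[0.0.0.0]/224")
print(route["prefix"] + "/" + route["prefix_len"])  # 172.16.77.0/24
```

The same helper also handles the Type-2 and Type-3 keys that appear in other outputs in this thread, which makes it easier to compare what each leaf has actually learned.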

      And there is also IP connectivity between Ext-Ro02 (172.16.77.1) and Host Beef (192.168.11.12) which is verified with ping.
      -----------------------------
      Ext-Ro02#ping 192.168.11.12 source 172.16.77.1
      Type escape sequence to abort.
      Sending 5, 100-byte ICMP Echos to 192.168.11.12, timeout is 2 seconds:
      Packet sent with a source address of 172.16.77.1
      !!!!!
      Success rate is 100 percent (5/5), round-trip min/avg/max = 17/175/739 ms
      -----------------------------------
      Then I am going to remove the command "route-target both auto evpn" under vrf context TENANT77 from the Leaf-101 configuration. And now the information is gone...
      -------------------------------------
      Leaf-101# sh bgp l2vpn evpn 172.16.77.0
      BGP routing table information for VRF default, address family L2VPN EVPN
      Leaf-101#
      -----------------------------------------
      After removing the command "route-target both" from Leaf-101, we still have the route to 172.16.77.0/24 (not shown here).
      Thank you very much for sharing the information and especially for pointing out the release note.

  5. Hi Toni,
    Thanks for the complete list of articles about VXLAN, I have been following all of them.
    Some deployments can use a lot of VLANs across multiple VTEPs. Have you considered using VLAN-aware bundles to reduce the configuration? Do you foresee any impact on performance from the increased number of MAC-VRFs when using this type of service?

    Thanks for your help.

    Leo Espinosa

  6. I'm still not quite sure what the evpn command really does, e.g.

    evpn
    vni 10000 l2

    etc

    1. The "evpn" part is for MAC-VRF tables, the same principle as in MPLS. Check out this link - https://yurmagccie.wordpress.com/2018/08/21/vxlan-part-2-bgp-evpn-l2-vni/

    2. Hmm still don't get it. If I build a BGP based layer 2 evpn between 2 leaves, client endpoints can ping each other without adding:

      evpn
      vni 999100 l2
      rd auto
      route-target import auto
      route-target export auto


      This is the output from a leaf which does not have the evpn configuration element
      9k-11# show bgp l2vpn evpn vni-id 999100
      BGP routing table information for VRF default, address family L2VPN EVPN
      BGP table version is 9, Local Router ID is 1.1.1.1


         Network            Next Hop            Metric     LocPrf     Weight Path
      Route Distinguisher: 1.1.1.1:32867    (L2VNI 999100)
      *>l[2]:[0]:[0]:[48]:[c40b.21bf.0000]:[0]:[0.0.0.0]/216
                            1.1.1.1                           100      32768 i
      *>i[2]:[0]:[0]:[48]:[c40c.21d7.0000]:[0]:[0.0.0.0]/216
                            2.2.2.2                           100          0 i
      *>l[3]:[0]:[32]:[1.1.1.1]/88
                            1.1.1.1                           100      32768 i
      *>i[3]:[0]:[32]:[2.2.2.2]/88
                            2.2.2.2                           100          0 i



    3. Mark, my guess is that you have not read the link that I shared carefully enough. Regarding the evpn section: starting with NX-OS 9.2(x), RD/RT values are generated by default and can be overridden with user-defined configuration.
      In NX-OS 7.(x), RD/RT configuration for L2 is required.
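
To make "generated by default" a bit more concrete, here is a small Python sketch of how the auto values appear to be built. It assumes a 2-byte AS number, an RT of the form ASN:VNI, and an L2VNI RD of the form Router-ID:(32767 + VLAN ID); please check these formulas against the Cisco documentation for your release. The function names are hypothetical, and VLAN 100 in the example is my guess, since the VLAN behind VNI 999100 is not shown in this thread.

```python
# Assumption: NX-OS adds this offset to the VLAN ID when it builds the
# auto-generated RD for an L2VNI.
L2_RD_OFFSET = 32767


def auto_rt(asn, vni):
    """Auto-derived route target, assuming a 2-byte ASN (format ASN:VNI)."""
    return f"{asn}:{vni}"


def auto_rd_l2(router_id, vlan_id):
    """Auto-derived RD for an L2VNI (format Router-ID:(32767 + VLAN ID))."""
    return f"{router_id}:{L2_RD_OFFSET + vlan_id}"


print(auto_rt(65000, 10077))       # 65000:10077
print(auto_rd_l2("1.1.1.1", 100))  # 1.1.1.1:32867
```

The first value matches the RT:65000:10077 seen for the L3VNI earlier in the comments, and the second matches the RD 1.1.1.1:32867 in Mark's output if VNI 999100 is mapped to VLAN 100.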

  7. Hi
    Sorry, I meant that there is no "evpn" configuration at all

    https://pastebin.com/bgjPkUym

    1. Based on the output, you are using BGP-based ingress replication under the nve interface. Could you add a third leaf with the same configuration to the topology and then test ping again? Check whether the ICMP packets are received by both remote switches, using Wireshark for example. If so, the ICMP messages are sent over the ingress replication tunnels as unknown unicast, not as actual known unicast.
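
The idea behind the test can be sketched in a few lines of Python. This is only a simplified model of the forwarding decision, not how the switch is implemented: if the destination MAC is known from a remote VTEP, the frame goes to that VTEP only, otherwise a copy goes to every ingress replication peer. The function name, the third VTEP 3.3.3.3, and the unknown MAC are hypothetical.

```python
def select_tunnels(dst_mac, remote_macs, ir_peers):
    """Return the list of VTEPs that receive a frame sent to dst_mac."""
    vtep = remote_macs.get(dst_mac)
    if vtep is not None:
        # Known unicast: one copy, sent only to the VTEP behind the MAC.
        return [vtep]
    # Unknown unicast: one copy per ingress replication peer.
    return sorted(ir_peers)


remote_macs = {"c40c.21d7.0000": "2.2.2.2"}  # MAC learned from a remote leaf
ir_peers = {"2.2.2.2", "3.3.3.3"}            # 3.3.3.3 is the added third leaf

print(select_tunnels("c40c.21d7.0000", remote_macs, ir_peers))  # ['2.2.2.2']
print(select_tunnels("aaaa.bbbb.cccc", remote_macs, ir_peers))  # ['2.2.2.2', '3.3.3.3']
```

With only two leaves both cases look the same on the wire, which is why the third leaf is needed to tell them apart.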

    2. Hi Toni, will do. When you say switches, do you mean the PE or CE devices?

    3. Hi Toni
      Using the following diagram https://postimg.cc/qhmvvHgg and configurations https://pastebin.com/mrfTp1ey I ran the test. Wireshark was attached to the link between 9k-23 and WAN-3. I pinged between T3-R4 and R5 and saw no ICMP, and both MAC addresses from R4 and R5 were in the l2vpn table. Only when I pinged R6 did I see ICMP.

    4. As an update I decided to add the following config

      evpn
      vni 999100 l2
      rd 888:888
      route-target import 999:999
      route-target export 999:999


      Looking at Wireshark, I could see BGP withdrawn route refresh messages for vni 999100. Looking at the same MAC address from 9k-21 as the previous paste, the RD and RT have been updated as expected

      https://pastebin.com/q5R6D2nZ

      So, as per the post that Yuriy linked to previously, it seems that the "evpn" commands aren't required in an all-Cisco environment, although the IP info is now missing from the bgp l2vpn evpn output.

  8. Well, that is great news, five lines less configuration :)

  9. Hi Toni,
    I'd like to know where you got all the Visio stencils you use in your diagrams.
    It would be great if you had a site for downloading these stencils.
    Thank you and kind regards
    Willi

  10. Trying to understand your topology here. I see two interfaces, 1/1 and 1/2, on both leafs but only one spine. Which interface goes to the spine?

  11. Hi Toni

    I used Nexus 9000v on PNETLab, but I can't configure "advertise l2vpn evpn" under the vrf in the BGP configuration

  12. Hi Toni Pasanen,
    Thanks a lot for your time and effort in preparing this document to help us learn and understand VXLAN in a simple way :)
    Thanks again for sharing your knowledge :)
