Tuesday 5 June 2018

VXLAN Part VIII: VXLAN BGP EVPN – External Connection

This post shows how to connect an external network to our existing VXLAN fabric. Of the two models, Border Leaf and Border Spine, I am going to use the Border Leaf model, since I do not want to add services to the Spine switches, which already host both the Multicast Rendezvous Point (RP) and the BGP Route Reflector (BGP RR). We could, of course, implement the border function on the Spine switches without any performance issues, but then the Spine switches become VTEPs, which means that they perform VXLAN encapsulation and decapsulation. Keep in mind that in that case, if we scale out the Spine layer by adding a new Spine switch, we also need to scale out the external connection. With the Border Leaf solution, we get a dedicated border zone.
I am using the full-mesh BGP model instead of the U-shaped model for a couple of reasons: it is the most resilient option, there is no black-holing in the event of a single link failure, and there is no need for iBGP peering between the Border Leaf switches.


Figure 8-1 shows the topology which we are going to build.

Figure 8-1: VXLAN Fabric external connection basic setup.

eBGP Configuration between Border Leaf-102 and Ext-Ro01

Before moving to the dual-homed, full-mesh external BGP peering solution, we are going to go through the theory using a simple, single-homed topology.

Figure 8-2 shows the IP addressing and logical structure of the example lab. There is a sub-interface e1/4.77 on Border Leaf-102 and a sub-interface g0/1.77 on Ext-Ro01, and both of these interfaces belong to vrf TENANT77. An eBGP peering is established between these two interfaces. Our VXLAN Fabric belongs to BGP AS65000 and Ext-Ro01 belongs to BGP AS64577.

Figure 8-2: VXLAN Fabric external connection topology

Example 8-1 shows the BGP configuration of Border Leaf-102. The last five lines are related to the peering with Ext-Ro01.


router bgp 65000
  router-id 192.168.77.102
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
    neighbor 10.102.77.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.77
      address-family ipv4 unicast

Example 8-1: BGP configuration of Border Leaf-102


Example 8-2 shows the vrf TENANT77 specific configuration of Ext-Ro01. Note that at this point we are not yet advertising anything between the VXLAN Fabric and the external network.


Ext-Ro01#sh run vrf TENANT77
<snipped>
ip vrf TENANT77
 rd 65077:1
 route-target export 65077:1
 route-target import 65077:1
!
<snipped>
!
interface GigabitEthernet0/1.77
 encapsulation dot1Q 77
 ip vrf forwarding TENANT77
 ip address 10.102.77.1 255.255.255.0
!
interface Loopback161
 description ** This Interface simulates external net 172.16.1.0/24 **
 ip vrf forwarding TENANT77
 ip address 172.16.1.1 255.255.255.0
!
router bgp 64577
 !
 address-family ipv4 vrf TENANT77
  neighbor 10.102.77.102 remote-as 65000
  neighbor 10.102.77.102 description ** VXLAN Fabric Border Leaf-102 **
  neighbor 10.102.77.102 update-source GigabitEthernet0/1.77
  neighbor 10.102.77.102 activate
 exit-address-family
!
end

Example 8-2: VRF TENANT77 configuration on Ext-Ro01


Starting point

Host Cafe 192.168.11.11/32 is not yet connected to Leaf-101, and Ext-Ro01 does not yet advertise network 172.16.1.0/24 to Border Leaf-102.

Step-1: Ext-Ro01 starts advertising network 172.16.1.0/24 to its eBGP peer Border Leaf-102 (Figure 8-3) by generating a BGP Update message.

Figure 8-3: BGP Update from Ext-Ro01

Capture 8-1 shows the BGP Update message captured on interface G0/1.77 of router Ext-Ro01. The BGP Update message includes the Path Attributes Origin, AS_Path, and Next_Hop, and of course the NLRI, which defines the actual network.

Capture 8-1: BGP NLRI update from Ext-Ro01 to Border Leaf-102.

Step-2: Border Leaf-102 receives the BGP Update message on its interface E1/4.77. It creates two BGP routing table entries, one under the IPv4 Unicast AFI (Example 8-3) and the other under the L2VPN EVPN AFI (Example 8-4). Since the BGP Update about 172.16.1.0/24 was received on an interface that belongs to the vrf context TENANT77, Border Leaf-102 attaches RT 65000:10077 (Extended Community Path Attribute) to the entry in the BGP tables of both AFIs.

Leaf-102# sh ip bgp vrf TENANT77 172.16.1.0
BGP routing table information for VRF TENANT77, address family IPv4 Unicast
BGP routing table entry for 172.16.1.0/24, version 6
Paths: (1 available, best #1)
Flags: (0x880c041a) on xmit-list, is in urib, is best urib route, is in HW, exported
  vpn: version 6, (0x100002) on xmit-list
  Advertised path-id 1, VPN AF advertised path-id 1
  Path type: external, path is valid, is best path, in rib
  AS-Path: 64577 , path sourced external to AS
    10.102.77.1 (metric 0) from 10.102.77.1    (172.16.77.77)
      Origin IGP, MED 0, localpref 100, weight 0
      Extcommunity: RT:65000:10077
Example 8-3: BRIB entry for VRF TENANT77 - AFI IPv4 Unicast (Border Leaf-102)

Example 8-4 shows that the BGP table entry under the L2VPN EVPN AFI also includes Route Distinguisher 192.168.77.102:3, taken from the VRF Context TENANT77 (L3VNI ID 10077). In addition to the RT 65000:10077 Extended Community, there is also encapsulation type 8 (ENCAP:8), which means VXLAN encapsulation.

Leaf-102# sh bgp l2vpn evpn 172.16.1.0 vrf TENANT77
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:3    (L3VNI10077)
BGP routing table entry for [5]:[0]:[0]:[24]:[172.16.1.0]:[0.0.0.0]/224, version 12
Paths: (1 available, best #1)
Flags: (0x000002) on xmit-list, is not in l2rib/evpn
  Advertised path-id 1
  Path type: local, path is valid, is best path
  AS-Path: 64577 , path sourced external to AS
    192.168.100.102 (metric 0) from 0.0.0.0 (192.168.77.102)
      Origin IGP, MED 0, localpref 100, weight 0
      Received label 10077
      Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0001.0007
Example 8-4: BRIB entry for VRF TENANT77 - AFI L2VPN EVPN (Border Leaf-102)

Step-3: Border Leaf-102 constructs the BGP EVPN Update (Figure 8-4 and Capture 8-2) and sends it to the BGP Route Reflector Spine-11, which in turn forwards it to its RR client Leaf-101 without modifying the content (although it adds the Originator ID and Cluster List attributes as a routing-loop prevention mechanism). Note that the capture does not show the Next Hop address in dotted-decimal format; the address is visible in hex format: c0:a8:64:66 = 192.168.100.102.

Note that Spine-11 (BGP RR) is unaware of any VRF information. Still, it can handle overlapping IPv4 routing updates, since each VRF Context in our VXLAN fabric has a dedicated, auto-generated RD value that is used only with the networks belonging to that particular VRF Context. We only have one VRF context, TENANT77, but if another one is created, it will get its own unique RD. Just to recap, the VRF Context auto-RD is formed from the BGP RID and the VRF ID (VRF IDs are unique).
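A minimal Python sketch of the auto-RD idea, assuming the RD is simply the BGP router ID followed by the VRF ID (the type-1 RD layout: a 4-byte IPv4 administrator field plus a 2-byte assigned number). The VRF ID 3 for TENANT77 is inferred from RD 192.168.77.102:3 in Example 8-4; the second VRF (ID 4) is hypothetical. It illustrates why the VRF-unaware RR keeps overlapping prefixes apart: the RD becomes part of the route key.

```python
import ipaddress


def auto_rd(bgp_router_id: str, vrf_id: int) -> str:
    """Build an auto-RD as '<BGP RID>:<VRF ID>' (type-1 RD: IPv4 admin + 2-byte number)."""
    ipaddress.IPv4Address(bgp_router_id)  # raises ValueError if the RID is not valid IPv4
    if not 0 <= vrf_id <= 0xFFFF:
        raise ValueError("VRF ID must fit into the 2-byte assigned-number field")
    return f"{bgp_router_id}:{vrf_id}"


# The RR stores routes keyed by (RD, prefix), so the same prefix in two VRFs stays distinct.
rr_table = {}
for vrf_id in (3, 4):  # 3 = TENANT77 (from Example 8-4); 4 = hypothetical second VRF
    rd = auto_rd("192.168.77.102", vrf_id)
    rr_table[(rd, "172.16.1.0/24")] = f"VRF ID {vrf_id}"

for key, owner in rr_table.items():
    print(key, "->", owner)
```

Both entries survive because their RDs differ, even though the IPv4 prefix is identical.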

Figure 8-4: BGP Update from Border Leaf-102 to Leaf-101 via RR Spine-11.


Capture 8-2: BGP NLRI update from Border Leaf-102 to Leaf-101 (via RR Spine-11)
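The hex next-hop value in the capture, and the tunnel ID 0xc0a86466 that shows up later in the Leaf-101 RIB and RNH database, are the same IPv4 address written as four hex bytes. A small Python sketch (`hex_to_ipv4` is a hypothetical helper) that converts either notation back to dotted decimal:

```python
import ipaddress


def hex_to_ipv4(hex_text: str) -> ipaddress.IPv4Address:
    """Convert 'c0:a8:64:66' (capture) or '0xc0a86466' (tunnel ID) into an IPv4 address."""
    cleaned = hex_text.strip().lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    cleaned = cleaned.replace(":", "")
    return ipaddress.IPv4Address(int(cleaned, 16))


print(hex_to_ipv4("c0:a8:64:66"))  # 192.168.100.102
print(hex_to_ipv4("0xc0a86466"))   # 192.168.100.102
```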

Step-4: Leaf-101 receives the BGP Update and imports the route based on RT 65000:10077, which is configured under its vrf context TENANT77. It creates an L3VNI entry for network 172.16.1.0/24.

Leaf-101# sh bgp l2vpn evpn 172.16.1.0 vrf TENANT77
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.168.77.102:3
BGP routing table entry for [5]:[0]:[0]:[24]:[172.16.1.0]:[0.0.0.0]/224, version 14
Paths: (1 available, best #1)
Flags: (0x000002) on xmit-list, is not in l2rib/evpn, is not in HW
  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported to 2 destination(s)
  AS-Path: 64577 , path sourced external to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED 0, localpref 100, weight 0
      Received label 10077
      Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0001.0007
      Originator: 192.168.77.102 Cluster list: 192.168.77.111
  Path-id 1 not advertised to any peer
Route Distinguisher: 192.168.77.101:3    (L3VNI 10077)
BGP routing table entry for [5]:[0]:[0]:[24]:[172.16.1.0]:[0.0.0.0]/224, version 15
Paths: (1 available, best #1)
Flags: (0x000002) on xmit-list, is not in l2rib/evpn, is not in HW
  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported from 192.168.77.102:3:[5]:[0]:[0]:[24]:[172.16.1.0]:[0.0.0.0]/224
  AS-Path: 64577 , path sourced external to AS
    192.168.100.102 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED 0, localpref 100, weight 0
      Received label 10077
      Extcommunity: RT:65000:10077 ENCAP:8 Router MAC:5e00.0001.0007
      Originator: 192.168.77.102 Cluster list: 192.168.77.111
  Path-id 1 not advertised to any peer
Example 8-5: BRIB entry in Leaf-101.


So, what does all of this information tell the receiving Leaf-101? First of all, the route is exported by Border Leaf-102 with RT 65000:10077, which in turn means that there has to be an import clause for that RT value in Leaf-101 under its vrf context TENANT77 (remember that this is an L3 service inside TENANT77); otherwise, the route does not end up in the BGP table (Control Plane operation).

Then, if some of its connected hosts send a packet targeted to network 172.16.1.0/24, the packet needs to be encapsulated with a new header, where the destination IP address is the NVE1 interface address of Border Leaf-102 and the VXLAN Virtual Network Identifier is VNI 10077 (Data Plane operation).
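The two decisions described above can be sketched in Python. This is a simplified model, not how NX-OS implements it: `should_import` is a hypothetical helper that compares the route's RTs with the VRF's import RTs, and `vxlan_header` builds the 8-byte VXLAN header defined in RFC 7348 with the L3VNI in its 24-bit VNI field. The outer IP/UDP headers are only described, not built, and the import RT set is an assumption based on Example 8-5.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN destination port (RFC 7348)


def should_import(route_rts: set, vrf_import_rts: set) -> bool:
    """Control plane: accept the route into the VRF only if it carries an RT the VRF imports."""
    return bool(route_rts & vrf_import_rts)


def vxlan_header(vni: int) -> bytes:
    """Data plane: 8-byte VXLAN header (RFC 7348).

    Byte 0: flags with the I bit (0x08) set, bytes 1-3: reserved,
    bytes 4-6: 24-bit VNI, byte 7: reserved.
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must be a 24-bit value")
    return struct.pack("!B3xI", 0x08, vni << 8)


# Route as received by Leaf-101 (values from Example 8-5)
route = {
    "prefix": "172.16.1.0/24",
    "rts": {"65000:10077"},
    "next_hop": "192.168.100.102",  # NVE1 address of Border Leaf-102
    "l3vni": 10077,
}
tenant77_import_rts = {"65000:10077"}  # assumed import RT under vrf context TENANT77

if should_import(route["rts"], tenant77_import_rts):
    header = vxlan_header(route["l3vni"])
    print(f"encapsulate toward {route['next_hop']} udp/{VXLAN_UDP_PORT}, VXLAN header {header.hex()}")
```

With VNI 10077 (0x00275d), the header bytes come out as 08 00 00 00 00 27 5d 00.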

We could also compare the RIB tables of Border Leaf-102 and Leaf-101. The output below shows that Border Leaf-102 has learned the route via BGP from 10.102.77.1 (Ext-Ro01); the update is external (AD 20) and the remote AS is 64577.

Leaf-102# sh ip route 172.16.1.0 vrf TENANT77 | sec 172.16.1.0
172.16.1.0/24, ubest/mbest: 1/0
    *via 10.102.77.1, [20/0], 00:35:13, bgp-65000, external, tag 64577

If we take a look at the RIB of VTEP Leaf-101 (Example 8-6), we can see additional Data Plane information: VNI segment ID 10077, which is our L3VNI inside TENANT77 and is used in the VNI field of the VXLAN header. There is also information about the tunnel ID (remember that VXLAN is a tunneling technology).

Leaf-101# sh ip route vrf TENANT77 | sec 172.16.1.0
172.16.1.0/24, ubest/mbest: 1/0
    *via 192.168.100.102%default, [200/0], 00:15:15, bgp-65000, internal, tag 64577 (evpn) segid: 10077 tunnelid: 0xc0a86466 encap: VXLAN
Example 8-6: RIB entry in Leaf-101.

The information in Example 8-7 is not directly related to the routing update itself, but it is useful for troubleshooting. We can see that the RNH (Recursive Next Hop) database of VTEP Leaf-101 has information about the IP address of Border Leaf-102 as well as the associated tunnel ID.

Leaf-101# sh nve internal bgp rnh database
--------------------------------------------
Total peer-vni msgs recvd from bgp: 1
Peer add requests: 1
Peer update requests: 0
Peer delete requests: 0
Peer add/update requests: 1
Peer add ignored (peer exists): 0
Peer update ignored (invalid opc): 0
Peer delete ignored (invalid opc): 0
Peer add/update ignored (malloc error): 0
Peer add/update ignored (vni not cp): 0
Peer delete ignored (vni not cp): 0
--------------------------------------------
Showing BGP RNH Database, size : 1 vni 0

Flag codes: 0 - ISSU Done/ISSU N/A        1 - ADD_ISSU_PENDING        
            2 - DEL_ISSU_PENDING          3 - UPD_ISSU_PENDING
       

VNI    Peer-IP            Peer-MAC            Tunnel-ID  Encap     (A/S)  Flags
10077  192.168.100.102    5e00.0001.0007      0xc0a86466 vxlan     (1/0)    0

Leaf-101#
Example 8-7: RNH database on Leaf-101.

Example 8-8 shows that the tunnel between the Leaf switches is up and running. We can also see that we are using Symmetric IRB (Integrated Routing and Bridging) first-hop routing (draft-ietf-bess-evpn-inter-subnet-forwarding-03).
Leaf-101# sh nve peers detail
Details of nve Peers:
----------------------------------------
Peer-Ip: 192.168.100.102
    NVE Interface       : nve1
    Peer State          : Up
    Peer Uptime         : 01:21:51
    Router-Mac          : 5e00.0001.0007
    Peer First VNI      : 10077
    Time since Create   : 01:21:52
    Configured VNIs     : 10000,10077,20000
    Provision State     : peer-add-complete
    Learnt CP VNIs      : 10077
    vni assignment mode : SYMMETRIC
    Peer Location       : N/A
Example 8-8: NVE peer details on Leaf-101.

OK, back on track. Next, we will take a look at how the route to IP address 192.168.11.11/32 of host Cafe ends up in the BGP table and RIB of Ext-Ro01.

Step-4 and 5: Now host Cafe with IP 192.168.11.11/32 joins the network. It sends a Gratuitous ARP (the GARP process is explained in Part VII). VTEP switch Leaf-101 learns both the MAC and IP addresses of host Cafe from the GARP message and sends two BGP Update messages to its BGP EVPN peers. The first message contains only the MAC address information, and the second message includes the IP address information in addition to the MAC address. This process is explained in detail in VXLAN Part VII. Our focus here is the IP address information.

The Host Mobility Manager component of Nexus 9000v installs the route into both the L2RIB and the L3RIB, and from there the route is sent to the BGP VRF AFI process. The BGP process constructs the BGP Update message (Figure 8-5) with the Path Attributes related to the BGP EVPN MAC/IP Advertisement NLRI. The AS_Path is empty since this is an internal BGP Update. Both the L2 and L3 Route-Targets are attached to the Extended Community attribute, as well as the encapsulation type, VXLAN (Type-8).

Figure 8-5: BGP Update from Leaf-101 to Border Leaf-102 via RR Spine-11.

The whole BGP Update message can be seen in Capture 8-3, which is taken from the uplink between Spine-11 and Border Leaf-102. That is why the Originator ID and Cluster List Path Attributes are also included in the BGP Update.


Capture 8-3: BGP NLRI update from Leaf-101 to Border Leaf-102 (via RR Spine-11)

Example 8-9 shows that Border Leaf-102 has received the BGP Update from Leaf-101 and installed it into the BRIB based on RT 65000:10077, which is defined under the VRF Context TENANT77.

Leaf-102# sh bgp l2vpn evpn 192.168.11.11 | beg L3VNI
Route Distinguisher: 192.168.77.102:3    (L3VNI 10077)
BGP routing table entry for [2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[192.168.11.11]/272, version 7
Paths: (1 available, best #1)
Flags: (0x000202) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path
             Imported from 192.168.77.101:32777:[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[192.168.11.11]/272
  AS-Path: NONE, path sourced internal to AS
    192.168.100.101 (metric 81) from 192.168.77.11 (192.168.77.111)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10077
      Extcommunity: RT:65000:10000 RT:65000:10077 ENCAP:8 Router MAC:5e00.0000.0007
      Originator: 192.168.77.101 Cluster list: 192.168.77.111

  Path-id 1 not advertised to any peer

Leaf-102#
Example 8-9: BRIB entry in Border Leaf-102.
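The bracketed EVPN route keys in Examples 8-4 and 8-9 can be read field by field. A minimal Python sketch, assuming the NX-OS display follows the field order of the EVPN specifications (route type, ESI, Ethernet Tag, then the MAC/IP fields for Type-2 or the prefix/gateway fields for Type-5). `parse_evpn_key` is a hypothetical helper, and it simply ignores the trailing /272 or /224 display length.

```python
import ipaddress
import re

# Field order assumed from RFC 7432 (Type-2 MAC/IP) and RFC 9136 (Type-5 IP Prefix)
FIELD_NAMES = {
    2: ["route_type", "esi", "eth_tag", "mac_len", "mac", "ip_len", "ip"],
    5: ["route_type", "esi", "eth_tag", "ip_len", "ip_prefix", "gw_ip"],
}


def parse_evpn_key(key: str) -> dict:
    """Split an NX-OS style EVPN route key into named fields (sketch, types 2 and 5 only)."""
    body = key.rsplit("/", 1)[0]  # drop the trailing display length such as /272 or /224
    fields = re.findall(r"\[([^\]]*)\]", body)
    if not fields:
        raise ValueError(f"no bracketed fields in: {key}")
    names = FIELD_NAMES.get(int(fields[0]))
    if names is None or len(names) != len(fields):
        raise ValueError(f"unsupported or malformed EVPN route key: {key}")
    return dict(zip(names, fields))


mac_ip = parse_evpn_key("[2]:[0]:[0]:[48]:[1000.0010.cafe]:[32]:[192.168.11.11]/272")
print(mac_ip["mac"], mac_ip["ip"])

ip_prefix = parse_evpn_key("[5]:[0]:[0]:[24]:[172.16.1.0]:[0.0.0.0]/224")
print(ipaddress.ip_network(f"{ip_prefix['ip_prefix']}/{ip_prefix['ip_len']}"))
```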

Step-6: Border Leaf-102 sends a BGP Update about the route to Ext-Ro01 (Figure 8-6). Since the peering between Border Leaf-102 and Ext-Ro01 is done under the IPv4 Unicast AFI, only the mandatory Path Attributes related to IPv4 are attached to the BGP Update.

Figure 8-6: BGP Update from Border Leaf-102 to Ext-Ro01.

Capture 8-4, taken from the link between Border Leaf-102 and Ext-Ro01, shows the BGP Update message sent by Border Leaf-102.



Capture 8-4: BGP NLRI update from Border Leaf-102 to Ext-Ro01

From Example 8-10, we can see that Ext-Ro01 has information about 192.168.11.11/32 received from Border Leaf-102. Note that RD 65077:1 and RT 65077:1 are attached to the BGP entry even though they were not included in the BGP Update. So where does that information come from?

Ext-Ro01#sh ip bgp vpnv4 vrf TENANT77 192.168.11.11
BGP routing table entry for 65077:1:192.168.11.11/32, version 5
Paths: (1 available, best #1, table TENANT77)
  Not advertised to any peer
  Refresh Epoch 1
  65000
    10.102.77.102 (via vrf TENANT77) from 10.102.77.102 (192.168.11.1)
      Origin IGP, localpref 100, valid, external, best
      Extended Community: RT:65077:1
      rx pathid: 0, tx pathid: 0x0
Ext-Ro01#
Example 8-10: BRIB entry in Ext-Ro01.

Those were defined under the local vrf configuration in Ext-Ro01.

Ext-Ro01#sh run vrf TENANT77 | sec ip vrf
ip vrf TENANT77
 rd 65077:1
 route-target export 65077:1
 route-target import 65077:1
Example 8-11: VRF information in Ext-Ro01.

The route is then installed from the BRIB into the RIB.

Ext-Ro01#sh ip route vrf TENANT77 bgp | sec 192 
      192.168.11.0/32 is subnetted, 1 subnets
B        192.168.11.11 [20/0] via 10.102.77.102, 00:33:13
Example 8-12: RIB entry in Ext-Ro01.

Let’s verify that the Data Plane is ok and we have IP connectivity between the network 172.16.1.0/24 connected to Ext-Ro01 and host Cafe 192.168.11.11 connected to VTEP Leaf-101.

Ext-Ro01#ping vrf TENANT77 192.168.11.11 source 172.16.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.11.11, timeout is 2 seconds:
Packet sent with a source address of 172.16.1.1
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 24/34/54 ms
Example 8-13: Ping from Ext-Ro01 to host Cafe.

And we are almost done!

Now I am going to change the IP address of host Abba from 192.168.12.11 to 192.168.11.100 and connect it to Leaf-101.

From Example 8-14, we can see that Ext-Ro01 receives the BGP Update just as it should.

Ext-Ro01#debug ip routing
IP routing debugging is on
Ext-Ro01#
*May 27 17:44:13.722: RT(TENANT77): updating bgp 192.168.11.100/32 (0x1)  :
    via 10.102.77.102   0 1048577

*May 27 17:44:13.723: RT(TENANT77): add 192.168.11.100/32 via 10.102.77.102, bgp metric [20/0]
Example 8-14: RIB update in Ext-Ro01.


And now it has separate routes to both hosts, Cafe 192.168.11.11 and Abba 192.168.11.100, as we can see from Example 8-15.

Ext-Ro01#sh ip route vrf TENANT77 bgp | sec 192.
      192.168.11.0/32 is subnetted, 2 subnets
B        192.168.11.11 [20/0] via 10.102.77.102, 00:48:32
B        192.168.11.100 [20/0] via 10.102.77.102, 00:04:53
Example 8-15: RIB in Ext-Ro01.

Even though we now have IP connectivity from the external network to the network in the VXLAN Fabric and vice versa, we do not want to install each and every host route into the Ext-Ro01 RIB. Instead, we are going to aggregate the host routes into one BGP update. We will do this under the ipv4 unicast AFI of vrf TENANT77. Remember that we also have to use the “summary-only” option; otherwise, we will continue to advertise all the host routes in addition to the aggregate address.

router bgp 65000
  router-id 192.168.77.102
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
      aggregate-address 192.168.11.0/24 summary-only
    neighbor 10.102.77.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.77
      address-family ipv4 unicast
Example 8-16: Aggregation in Border Leaf-102.

As we can see, there is now only one routing entry in the Ext-Ro01 RIB.

Ext-Ro01#sh ip route vrf TENANT77 bgp | b 192
B     192.168.11.0/24 [20/0] via 10.102.77.102, 00:06:5
Example 8-17: RIB in Ext-Ro01.
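The effect of `summary-only` can be sketched in Python. This is a simplified model with a hypothetical helper `bgp_advertisements`; it assumes the usual BGP behavior that an aggregate is only generated while at least one more-specific route exists in the BGP table, and it ignores attribute handling such as AS_SET.

```python
import ipaddress


def bgp_advertisements(host_routes, aggregate, summary_only=True):
    """Return the prefixes advertised to the eBGP peer for one aggregate-address statement."""
    agg = ipaddress.ip_network(aggregate)
    routes = [ipaddress.ip_network(r) for r in host_routes]
    contributors = [r for r in routes if r != agg and r.subnet_of(agg)]
    if not contributors:
        # No more-specific route in the table, so no aggregate is generated
        return [str(r) for r in routes]
    suppressed = set(contributors) if summary_only else set()
    return [str(agg)] + [str(r) for r in routes if r not in suppressed]


hosts = ["192.168.11.11/32", "192.168.11.100/32"]
print(bgp_advertisements(hosts, "192.168.11.0/24"))
print(bgp_advertisements(hosts, "192.168.11.0/24", summary_only=False))
```

With `summary_only=True` only 192.168.11.0/24 is sent, matching Example 8-17; without it, both host routes are still advertised next to the aggregate.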

In the previous example, we used a single-homed BGP peering between VXLAN Fabric and External network while explaining the theory.

Now we are going to set up the dual-homed, full-mesh BGP peering. Our goal here is to build a BGP policy model where both the incoming and outgoing paths of specific networks can be controlled via the Border Leaf switches without making any changes to the external routers Ext-Ro01 and Ext-Ro02. I am only going to set up a policy model that affects path selection. I am not going to do any incoming or outgoing route filtering, nor optimize the BGP convergence time by using BFD, object/interface tracking, or by changing the BGP keepalive/hold-down timers. I am also not going to filter out the private networks defined in RFC 1918 (since we are using those) or the default route 0.0.0.0/0. Nor am I going to prevent the external network AS64577 from using our VXLAN Fabric as a transit network between Ext-Ro01 and Ext-Ro02 in case their backbone connection fails.

Since we are using OSPF as the IGP inside AS64577, we are going to redistribute the routes learned via BGP into OSPF. Ext-Ro01 redistributes routes with metric 10, while Ext-Ro02 uses metric 100. This way, Ext-Ro03 prefers the routes learned from Ext-Ro01. I also set “metric-type 1” on both routers to make sure that the internal cost to the ASBR is included in the path cost. Instead of redistributing routes from OSPF into BGP, I am using network statements in Ext-Ro01 and Ext-Ro02.

Figure 8-7 shows the topology used in this example. The complete configurations of all devices can be found in Appendix 1 at the end of the post.
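The effect of `metric-type 1` can be sketched in Python. This is a simplified model of OSPF external route comparison (RFC 2328, section 16.4) that ignores the E1-over-E2 preference and equal-cost paths; the internal OSPF costs from Ext-Ro03 to each ASBR are hypothetical, since the lab's OSPF costs are not shown here.

```python
from dataclasses import dataclass


@dataclass
class ExternalRoute:
    asbr: str
    external_metric: int  # value set by "redistribute bgp 64577 metric N"
    cost_to_asbr: int     # internal OSPF cost from Ext-Ro03 to the ASBR (hypothetical)


def best_e1(routes):
    """metric-type 1: compare external metric + internal cost to the ASBR."""
    return min(routes, key=lambda r: r.external_metric + r.cost_to_asbr)


def best_e2(routes):
    """metric-type 2: compare the external metric first; internal cost only breaks ties."""
    return min(routes, key=lambda r: (r.external_metric, r.cost_to_asbr))


# Equal internal costs: both metric types pick Ext-Ro01 (10 + 1 < 100 + 1)
equal = [ExternalRoute("Ext-Ro01", 10, 1), ExternalRoute("Ext-Ro02", 100, 1)]
print(best_e1(equal).asbr, best_e2(equal).asbr)

# A distant Ext-Ro01: only metric-type 1 takes the internal path cost into account
distant = [ExternalRoute("Ext-Ro01", 10, 200), ExternalRoute("Ext-Ro02", 100, 1)]
print(best_e1(distant).asbr, best_e2(distant).asbr)
```

In the second case, E1 prefers Ext-Ro02 (100 + 1 < 10 + 200), while E2 would still pick Ext-Ro01 based on the external metric alone.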

Figure 8-7: full-mesh external BGP peering topology.

The BGP configurations are shown in Examples 8-18 through 8-21. The configurations of Ext-Ro01 and Ext-Ro02 also include the OSPF configuration.

Leaf-102# sh run | sec bgp
feature bgp
  host-reachability protocol bgp
router bgp 65000
  timer bgp 3 9
  router-id 192.168.77.102
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
      aggregate-address 192.168.11.0/24 summary-only
    neighbor 10.102.77.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.77
      address-family ipv4 unicast
        send-community
        send-community extended
    neighbor 10.102.78.2
      remote-as 64577
      description ** External Network - Ext-Ro02 **
      update-source Ethernet1/3.78
      address-family ipv4 unicast
Example 8-18: Border Leaf-102 BGP configuration.

Leaf-103# sh run | sec bgp
feature bgp
  host-reachability protocol bgp
router bgp 65000
  timer bgp 3 9
  router-id 192.168.77.103
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
      aggregate-address 192.168.11.0/24 summary-only
    neighbor 10.103.77.2
      remote-as 64577
      description ** External Network - Ext-Ro02 **
      update-source Ethernet1/3.77
      address-family ipv4 unicast
    neighbor 10.103.78.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.78
      address-family ipv4 unicast
Example 8-19: Border Leaf-103 BGP configuration.

Ext-Ro01#
router ospf 1 vrf TENANT77
 redistribute bgp 64577 metric 10 metric-type 1 subnets
!
router bgp 64577
 timer bgp 3 9
 bgp router-id 172.16.77.77
 bgp log-neighbor-changes
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv4 vrf TENANT77
  network 172.16.1.0 mask 255.255.255.0
  network 172.16.3.0 mask 255.255.255.0
  neighbor 10.102.77.102 remote-as 65000
  neighbor 10.102.77.102 description ** VXLAN Fabric Border Leaf-102 **
  neighbor 10.102.77.102 update-source GigabitEthernet0/1.77
  neighbor 10.102.77.102 activate
  neighbor 10.103.78.103 remote-as 65000
  neighbor 10.103.78.103 description ** VXLAN Fabric Border Leaf-103 **
  neighbor 10.103.78.103 update-source GigabitEthernet0/3.78
  neighbor 10.103.78.103 activate
 exit-address-family
Ext-Ro01#
Example 8-20: Ext-Ro01 BGP configuration.

Ext-Ro02#
router ospf 1 vrf TENANT77
 redistribute bgp 64577 metric 100 metric-type 1 subnets
!
router bgp 64577
 timer bgp 3 9
 bgp router-id 172.16.77.79
 bgp log-neighbor-changes
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv4 vrf TENANT77
  network 172.16.3.0 mask 255.255.255.0
  neighbor 10.102.78.102 remote-as 65000
  neighbor 10.102.78.102 description ** VXLAN Fabric Border Leaf-102 **
  neighbor 10.102.78.102 update-source GigabitEthernet0/3.78
  neighbor 10.102.78.102 activate
  neighbor 10.103.77.103 remote-as 65000
  neighbor 10.103.77.103 description ** VXLAN Fabric Border Leaf-103 **
  neighbor 10.103.77.103 update-source GigabitEthernet0/1.77
  neighbor 10.103.77.103 activate
 exit-address-family
Ext-Ro02#
Example 8-21: Ext-Ro02 BGP configuration.

Let's see what the routing looks like. Example 8-22 shows that Border Leaf-102 has learned route 172.16.3.0/24 from Ext-Ro01 (best), from Ext-Ro02, and via Spine-11. The best path is selected based on the lower RID of Ext-Ro01 (Ext-Ro01 BGP RID 172.16.77.77 and Ext-Ro02 BGP RID 172.16.77.79).

Leaf-102# sh ip bgp vrf TENANT77
<snipped>
   Network            Next Hop            Metric     LocPrf     Weight Path
* i172.16.1.0/24      192.168.100.103          0        100          0 64577 i
*>e                   10.102.77.1              0                     0 64577 i
* i172.16.3.0/24      192.168.100.103          2        100          0 64577 i
* e                   10.102.78.2              2                     0 64577 i
*>e                   10.102.77.1              2                     0 64577 i
  a192.168.11.0/24    0.0.0.0                           100      32768 i
Example 8-22: Leaf-102 BGP routes.

Example 8-23 shows that Border Leaf-103 has also learned route 172.16.3.0/24 from Ext-Ro01 (best), from Ext-Ro02, and via Spine-11. This decision is also based on the lower RID of Ext-Ro01. Note that both Border Leaf switches also receive a BGP Update about 172.16.3.0/24 from the VXLAN Fabric Spine switch, which is the BGP Route Reflector. Since an eBGP path is preferred over an iBGP path, this route is only the third best.


Leaf-103# sh ip bgp vrf TENANT77
<snipped>
   Network            Next Hop            Metric     LocPrf     Weight Path
* i172.16.1.0/24      192.168.100.102          0        100          0 64577 i
*>e                   10.103.78.1              0                     0 64577 i
* i172.16.3.0/24      192.168.100.102          2        100          0 64577 i
*>e                   10.103.78.1              2                     0 64577 i
* e                   10.103.77.2              2                     0 64577 i
  a192.168.11.0/24    0.0.0.0                           100      32768 i
Example 8-23: Leaf-103 BGP routes.
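A simplified Python sketch of the best-path comparison described above. It only models the steps relevant to this lab (weight, local preference, AS_Path length, MED, eBGP over iBGP, lowest router ID) and skips others such as origin, IGP metric to the next hop, and vendor-specific rules like oldest-path preference, so treat it as an illustration rather than the NX-OS algorithm. The path values are taken from Example 8-22.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Path:
    next_hop: str
    router_id: str  # BGP router ID of the peer (originator for reflected routes)
    ebgp: bool = True
    weight: int = 0
    local_pref: int = 100
    as_path_len: int = 1
    med: int = 0


def rank(p: Path) -> tuple:
    """Lower tuple wins; 'higher is better' attributes are negated."""
    return (
        -p.weight,
        -p.local_pref,
        p.as_path_len,
        p.med,               # all paths come from AS64577 here, so MED is comparable
        0 if p.ebgp else 1,  # prefer eBGP over iBGP
        int(ipaddress.IPv4Address(p.router_id)),  # lowest router ID as final tie-breaker
    )


def best_path(paths):
    return min(paths, key=rank)


# Border Leaf-102 view of 172.16.3.0/24 (Example 8-22)
leaf102_paths = [
    Path("192.168.100.103", "192.168.77.103", ebgp=False, med=2),  # via RR Spine-11
    Path("10.102.78.2", "172.16.77.79", med=2),                    # Ext-Ro02
    Path("10.102.77.1", "172.16.77.77", med=2),                    # Ext-Ro01
]
print(best_path(leaf102_paths).next_hop)
```

The iBGP path loses at the eBGP-over-iBGP step, and Ext-Ro01 wins the router ID tie-break against Ext-Ro02.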

Example 8-24 shows that the border router Ext-Ro01 has learned the aggregate route 192.168.11.0/24 from Leaf-102 (best) and from Leaf-103.

Ext-Ro01#sh ip bgp vpnv4 vrf TENANT77
<snipped>
     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 65077:1 (default for vrf TENANT77)
 *>   172.16.1.0/24    0.0.0.0                  0         32768 i
 *>   172.16.3.0/24    10.1.3.3                 2         32768 i
 *>   192.168.11.0     10.102.77.102                          0 65000 i
 *                     10.103.78.103                          0 65000 i
Example 8-24: Ext-Ro01 BGP routes.

Example 8-25 shows that the border router Ext-Ro02 has learned the aggregate route 192.168.11.0/24 from Leaf-102 (best) and from Leaf-103. Once again, the best path selection is based on the lowest BGP peer RID.

Ext-Ro01#sh ip bgp vpnv4 vrf TENANT77
<snipped>
     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 65077:1 (default for vrf TENANT77)
 *>   172.16.1.0/24    0.0.0.0                  0         32768 i
 *>   172.16.3.0/24    10.1.3.3                 2         32768 i
 *>   192.168.11.0     10.102.77.102                          0 65000 i
 *                     10.103.78.103                          0 65000 i
Example 8-25: Ext-Ro02 BGP routes.

We can see that BGP works just fine. At this point, we have not yet implemented any BGP policy between the eBGP peers. To make sure that we really have IP connectivity between network 192.168.11.0/24 in AS65000 and network 172.16.3.0/24 in AS64577, we are going to ping from host Cafe (192.168.11.11) to address 172.16.3.1 (Loopback 163 on Ext-Ro03) (Example 8-26).

Cafe#ping 172.16.3.1 
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.3.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 10/17/22 ms
Example 8-26: Ping from 192.168.11.11 to 172.16.3.1.
And we have IP connectivity in place.

Now it is time to implement the BGP policy. We are going to do it step by step.

Step-1: Tag the BGP updates about network 192.168.11.0/24 sent by Border Leaf-102 with the Community Path Attribute 64577:999.

Border Leaf-102:
Step-1.1: Define the prefix-list for the VXLAN Fabric internal network 192.168.11.0/24.
Step-1.2: Define the route-map that matches (permits) the previously defined ip prefix-list and sets community 64577:999 for it. Add an explicit permit statement as the last entry of the route-map; otherwise, the implicit deny at the end of the route-map blocks all other routes.
Step-1.3: Implement the outgoing policy towards both external BGP peers Ext-Ro01 and Ext-Ro02.
Step-1.4: Since communities are not sent to BGP peers by default, allow communities to be sent to the BGP peer. Sending just the standard communities would be enough, even though both standard and extended communities are permitted in the example configuration.

ip prefix-list TENANT77_LOCAL seq 10 permit 192.168.11.0/24
!
route-map OUTGOING_POLICIES permit 10
  match ip address prefix-list TENANT77_LOCAL
  set community 64577:999
!
route-map OUTGOING_POLICIES permit 100
!
router bgp 65000
  vrf TENANT77
    address-family ipv4 unicast
    neighbor 10.102.77.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.77
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map OUTGOING_POLICIES out
    neighbor 10.102.78.2
      remote-as 64577
      description ** External Network - Ext-Ro02 **
      update-source Ethernet1/3.78
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map OUTGOING_POLICIES out

Example 8-27: Border Leaf-102 outgoing BGP policy.
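The route-map logic above can be modeled in Python to show why the explicit permit entry matters. This is a simplified sketch, not the NX-OS implementation: `RouteMapEntry`, `prefix_list_permits`, and `apply_route_map` are hypothetical helpers, prefix-list entries are exact-length matches only (no ge/le), and 10.0.0.0/8 is a hypothetical non-matching prefix.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Update:
    prefix: str
    communities: List[str] = field(default_factory=list)


def prefix_list_permits(prefix: str, entries) -> bool:
    """Exact-length prefix-list match; unmatched prefixes hit the implicit deny."""
    net = ipaddress.ip_network(prefix)
    for action, entry in entries:
        if net == ipaddress.ip_network(entry):
            return action == "permit"
    return False


@dataclass
class RouteMapEntry:
    seq: int
    action: str                                    # "permit" or "deny"
    match: Optional[Callable[[str], bool]] = None  # None = match everything
    set_community: Optional[str] = None


def apply_route_map(update: Update, entries) -> Optional[Update]:
    """First matching sequence wins; no match means implicit deny (update is not sent)."""
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry.match is None or entry.match(update.prefix):
            if entry.action == "deny":
                return None
            if entry.set_community:
                update.communities = [entry.set_community]  # "set community" without additive replaces
            return update
    return None


TENANT77_LOCAL = [("permit", "192.168.11.0/24")]
OUTGOING_POLICIES = [
    RouteMapEntry(10, "permit", lambda p: prefix_list_permits(p, TENANT77_LOCAL), "64577:999"),
    RouteMapEntry(100, "permit"),  # explicit permit-all at the end
]

print(apply_route_map(Update("192.168.11.0/24"), OUTGOING_POLICIES))
print(apply_route_map(Update("10.0.0.0/8"), OUTGOING_POLICIES))
print(apply_route_map(Update("10.0.0.0/8"), OUTGOING_POLICIES[:1]))  # without seq 100: None
```

The aggregate leaves with community 64577:999, other prefixes pass unchanged, and without sequence 100 they are silently dropped.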

Step-2: Tag the BGP updates about network 192.168.11.0/24 sent by Border Leaf-103 with the Community Path Attribute 64577:9.

Border Leaf-103:
Step-2.1: Define the prefix-list for the VXLAN Fabric internal network 192.168.11.0/24.
Step-2.2: Define the route-map that matches (permits) the previously defined ip prefix-list and sets community 64577:9 for it. Add an explicit permit statement as the last entry of the route-map.
Step-2.3: Implement the outgoing policy towards both external BGP peers Ext-Ro01 and Ext-Ro02, and allow communities to be sent to them.

ip prefix-list TENANT77_LOCAL seq 10 permit 192.168.11.0/24
!
route-map OUTGOING_POLICIES permit 10
  match ip address prefix-list TENANT77_LOCAL
  set community 64577:9
!
route-map OUTGOING_POLICIES permit 100
!
router bgp 65000
  vrf TENANT77
    address-family ipv4 unicast
    neighbor 10.103.77.2
      remote-as 64577
      description ** External Network - Ext-Ro02 **
      update-source Ethernet1/3.77
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map OUTGOING_POLICIES out
    neighbor 10.103.78.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.78
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map OUTGOING_POLICIES out

Example 8-28: Border Leaf-103 outgoing BGP policy.
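The following is a minimal Python model of how an outbound route-map such as OUTGOING_POLICIES is evaluated. It is not device code, and the class and function names are hypothetical. The model captures four behaviours:

- Entries are checked in sequence order, and the first match wins.
- A prefix-list line without ge/le matches only the exact prefix.
- The entry without a match clause (seq 100) permits everything else.
- A route that matches no entry hits the implicit deny.

All other route-map features are ignored.

```python
from dataclasses import dataclass, replace
from ipaddress import IPv4Network
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: IPv4Network
    communities: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Entry:
    seq: int
    match_prefixes: Optional[frozenset[IPv4Network]] = None  # None: no match clause
    set_community: Optional[str] = None


def apply_route_map(entries: list[Entry], route: Route) -> Optional[Route]:
    """Evaluate permit entries in sequence order; the first matching entry wins."""
    for entry in sorted(entries, key=lambda e: e.seq):
        # A prefix-list line without ge/le matches only the exact prefix,
        # so a /32 inside 192.168.11.0/24 does not match it.
        if entry.match_prefixes is not None and route.prefix not in entry.match_prefixes:
            continue
        if entry.set_community is not None:
            # "set community" without "additive" replaces existing communities.
            return replace(route, communities=frozenset({entry.set_community}))
        return route
    return None  # implicit deny at the end of every route-map


TENANT77_LOCAL = frozenset({IPv4Network("192.168.11.0/24")})

# OUTGOING_POLICIES on Border Leaf-103 (Example 8-28).
OUTGOING_POLICIES = [
    Entry(seq=10, match_prefixes=TENANT77_LOCAL, set_community="64577:9"),
    Entry(seq=100),  # no match clause: permit everything else unchanged
]

summary = apply_route_map(OUTGOING_POLICIES, Route(IPv4Network("192.168.11.0/24")))
other = apply_route_map(OUTGOING_POLICIES, Route(IPv4Network("172.16.1.0/24")))
dropped = apply_route_map(OUTGOING_POLICIES[:1], Route(IPv4Network("172.16.1.0/24")))
print(sorted(summary.communities))  # ['64577:9']
print(sorted(other.communities))    # []
print(dropped)                      # None: without entry 100 the route is denied
```

The last call drops entry 100 on purpose. It shows why the route-map needs that final permit entry.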

Figure 8-7 shows what we have done so far. Border Leaf-102 attaches the BGP Community attribute 64577:999 to the updates it sends to its eBGP peers. Border Leaf-103, in turn, attaches the BGP Community attribute 64577:9 to the updates it sends to its eBGP peers.

Figure 8-7: BGP Update from Border Leaf-102 and 103 to External routers.

Step-2: Define the ingress policy on Ext-Ro01 and Ext-Ro02. The ingress policy selects the best path based on the Community PA in incoming BGP updates. Both border routers in AS64577 set weight 999 for all BGP NLRI updates that include Community PA 64577:999, and weight 9 for all BGP NLRI updates that include Community PA 64577:9. The BGP NLRI updates received from Border Leaf-102 therefore get the higher weight on both Ext-Ro01 and Ext-Ro02. This in turn means that, in a stable state, the best route to network 192.168.11.0/24 is via Border Leaf-102.


Ext-Ro01 and Ext-Ro02:
Step-2.1: Define the community-list that permits community PA 64577:999.
Step-2.2: Define the community-list that permits community PA 64577:9.
Step-2.3: Define the route-map that sets weight 999 for all BGP NLRI updates that carry community PA 64577:999, and weight 9 for all BGP NLRI updates that carry community PA 64577:9.
Step-2.4: Implement the ingress policy towards both Border Leaf-102 and Border Leaf-103.
Step-2.5: Enable bgp-community new-format.
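The last step, ip bgp-community new-format, affects only the notation IOS uses for communities. Per RFC 1997, a standard community is a single 32-bit value whose upper 16 bits conventionally carry an AS number. By default IOS displays that value as one decimal number; with new-format it uses the AA:NN notation used throughout this post.

The helper names below are my own. The sketch shows how the two notations map onto each other:

```python
def community_to_int(text: str) -> int:
    """Convert an AA:NN community string to its 32-bit value (RFC 1997 layout)."""
    asn, value = (int(part) for part in text.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError(f"{text!r} is not a valid AA:NN community")
    return (asn << 16) | value


def int_to_community(number: int) -> str:
    """Render a 32-bit community value in the AA:NN ("new-format") notation."""
    if not 0 <= number <= 0xFFFFFFFF:
        raise ValueError(f"{number} does not fit in 32 bits")
    return f"{number >> 16}:{number & 0xFFFF}"


for community in ("64577:999", "64577:9"):
    raw = community_to_int(community)
    print(community, raw, int_to_community(raw))
# 64577:999 4232119271 64577:999
# 64577:9 4232118281 64577:9
```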

ip bgp-community new-format
!
ip community-list standard SET_WEIGHT_999 permit 64577:999
ip community-list standard SET_WEIGHT_9 permit 64577:9
!
route-map SET_WEIGHT permit 10
 match community SET_WEIGHT_999
 set weight 999
!
route-map SET_WEIGHT permit 100
 match community SET_WEIGHT_9
 set weight 9
!
router bgp 64577
 bgp router-id 172.16.77.77
 bgp log-neighbor-changes
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv4 vrf TENANT77
  network 172.16.1.0 mask 255.255.255.0
  network 172.16.3.0 mask 255.255.255.0
  neighbor 10.102.77.102 remote-as 65000
  neighbor 10.102.77.102 description ** VXLAN Fabric Border Leaf-102 **
  neighbor 10.102.77.102 update-source GigabitEthernet0/1.77
  neighbor 10.102.77.102 activate
  neighbor 10.102.77.102 route-map SET_WEIGHT in
  neighbor 10.103.78.103 remote-as 65000
  neighbor 10.103.78.103 description ** VXLAN Fabric Border Leaf-103 **
  neighbor 10.103.78.103 update-source GigabitEthernet0/3.78
  neighbor 10.103.78.103 activate
  neighbor 10.103.78.103 route-map SET_WEIGHT in
 exit-address-family
Example 8-29: Ext-Ro01 ingress BGP policy.

ip bgp-community new-format
ip community-list standard SET_WEIGHT_999 permit 64577:999
ip community-list standard SET_WEIGHT_9 permit 64577:9
!
route-map SET_WEIGHT permit 10
 match community SET_WEIGHT_999
 set weight 999
!
route-map SET_WEIGHT permit 100
 match community SET_WEIGHT_9
 set weight 9
!
router bgp 64577
 bgp router-id 172.16.77.79
 bgp log-neighbor-changes
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv4 vrf TENANT77
  network 172.16.3.0 mask 255.255.255.0
  neighbor 10.102.78.102 remote-as 65000
  neighbor 10.102.78.102 description ** VXLAN Fabric Border Leaf-102 **
  neighbor 10.102.78.102 update-source GigabitEthernet0/3.78
  neighbor 10.102.78.102 activate
  neighbor 10.102.78.102 route-map SET_WEIGHT in
  neighbor 10.103.77.103 remote-as 65000
  neighbor 10.103.77.103 description ** VXLAN Fabric Border Leaf-103 **
  neighbor 10.103.77.103 update-source GigabitEthernet0/1.77
  neighbor 10.103.77.103 activate
  neighbor 10.103.77.103 route-map SET_WEIGHT in
 exit-address-family

Example 8-30: Ext-Ro02 ingress BGP policy.
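The route-map SET_WEIGHT used on both external routers can be modelled in Python. The model reflects two points:

- A single-value standard community-list matches any path that carries that community, even when other communities are attached as well.
- Unlike OUTGOING_POLICIES on the border leaves, SET_WEIGHT has no permit-all entry. A path that carries neither 64577:999 nor 64577:9 falls through to the route-map's implicit deny and is not accepted.

The function name is hypothetical:

```python
from typing import Optional

# route-map SET_WEIGHT on Ext-Ro01 and Ext-Ro02 (Examples 8-29 and 8-30):
# (sequence, community the standard community-list permits, weight to set).
SET_WEIGHT = [
    (10, "64577:999", 999),  # community-list SET_WEIGHT_999
    (100, "64577:9", 9),     # community-list SET_WEIGHT_9
]


def weight_for(communities: set[str]) -> Optional[int]:
    """Return the weight SET_WEIGHT assigns, or None if the path is denied."""
    for _seq, community, weight in sorted(SET_WEIGHT):
        # A single-value standard community-list matches when the path carries
        # that community, even if other communities are attached as well.
        if community in communities:
            return weight
    return None  # neither entry matched: implicit deny, path is not accepted


print(weight_for({"64577:999"}))  # 999, path from Border Leaf-102
print(weight_for({"64577:9"}))    # 9, path from Border Leaf-103
print(weight_for(set()))          # None
```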

Figure 8-8 shows the outgoing policy for network 192.168.11.0/24 implemented in AS65000 and ingress policy implemented in AS64577.

Figure 8-8: BGP policy for network 192.168.11.0/24.


Now we are going to verify that our policy actually works. Example 8-31 shows that Ext-Ro01 has received a BGP NLRI update about network 192.168.11.0/24 from Border Leaf-102 with community PA 64577:999. Based on that community PA, Ext-Ro01 has set BGP weight 999 for the route received from Border Leaf-102. It has also received a BGP NLRI update from Border Leaf-103, but with community PA 64577:9, which gets weight 9.


Ext-Ro01#sh ip bgp vpnv4 vrf TENANT77 192.168.11.0
BGP routing table entry for 65077:1:192.168.11.0/24, version 18
Paths: (2 available, best #1, table TENANT77)
  Advertised to update-groups:
     4        
  Refresh Epoch 1
  65000, (aggregated by 65000 192.168.11.1)
    10.102.77.102 (via vrf TENANT77) from 10.102.77.102 (192.168.11.1)
      Origin IGP, localpref 100, weight 999, valid, external, atomic-aggregate, best
      Community: 64577:999
      Extended Community: RT:65077:1
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  65000, (aggregated by 65000 192.168.11.1)
    10.103.78.103 (via vrf TENANT77) from 10.103.78.103 (192.168.11.1)
      Origin IGP, localpref 100, weight 9, valid, external, atomic-aggregate
      Community: 64577:9
      Extended Community: RT:65077:1
      rx pathid: 0, tx pathid: 0

Example 8-31: BGP table entry of Ext-Ro01 for network 192.168.11.0/24

As we can see from Example 8-32, Ext-Ro02 has an equivalent BGP table entry.

Ext-Ro02#sh ip bgp vpnv4 vrf TENANT77 192.168.11.0
BGP routing table entry for 65077:1:192.168.11.0/24, version 15
Paths: (2 available, best #1, table TENANT77)
  Advertised to update-groups:
     4        
  Refresh Epoch 1
  65000, (aggregated by 65000 192.168.11.1)
    10.102.78.102 (via vrf TENANT77) from 10.102.78.102 (192.168.11.1)
      Origin IGP, localpref 100, weight 999, valid, external, atomic-aggregate, best
      Community: 64577:999
      Extended Community: RT:65077:1
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  65000, (aggregated by 65000 192.168.11.1)
    10.103.77.103 (via vrf TENANT77) from 10.103.77.103 (192.168.11.1)
      Origin IGP, localpref 100, weight 9, valid, external, atomic-aggregate
      Community: 64577:9
      Extended Community: RT:65077:1
      rx pathid: 0, tx pathid: 0

Example 8-32: BGP table entry of Ext-Ro02 for network 192.168.11.0/24.
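The selection in Examples 8-31 and 8-32 comes from the first attribute the Cisco best-path algorithm compares: the highest weight wins. The Python sketch below models only that step. Its names are hypothetical. It raises an error on a weight tie instead of guessing, because the real decision would continue with local preference, AS-path length and the remaining tie-breakers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    weight: int


def best_by_weight(paths: list[Path]) -> Path:
    """Return the path with the highest weight; refuse to break a weight tie."""
    if not paths:
        raise ValueError("no paths")
    top = max(p.weight for p in paths)
    winners = [p for p in paths if p.weight == top]
    if len(winners) > 1:
        raise ValueError("weight tie: later tie-breakers are not modelled")
    return winners[0]


# The two paths Ext-Ro01 holds for 192.168.11.0/24 (Example 8-31).
paths = [Path("10.102.77.102", 999), Path("10.103.78.103", 9)]
print(best_by_weight(paths).next_hop)      # 10.102.77.102
# If the session to Border Leaf-102 goes down, only the weight-9 path remains.
print(best_by_weight(paths[1:]).next_hop)  # 10.103.78.103
```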

Step-3: Define the incoming BGP policy on the Border Leaf switches. Set weight 999 for network 172.16.3.0/24 when it is received from Ext-Ro01, and weight 9 when it is received from Ext-Ro02.

Step-3.1: Define the prefix-list that permits network 172.16.3.0/24.
Step-3.2: Define the route-maps that match the prefix-list and set weight 999 for routes received from Ext-Ro01 and weight 9 for routes received from Ext-Ro02.

router bgp 65000
  router-id 192.168.77.102
  timers bgp 3 9
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
      aggregate-address 192.168.11.0/24 summary-only
    neighbor 10.102.77.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.77
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map INCOMING_POLICIES_FROM_ExtRo01 in
        route-map OUTGOING_POLICIES out
    neighbor 10.102.78.2
      remote-as 64577
      description ** External Network - Ext-Ro02 **
      update-source Ethernet1/3.78
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map INCOMING_POLICIES_FROM_ExtRo02 in
        route-map OUTGOING_POLICIES out
!
ip prefix-list EXTERNAL_GROUP_1 seq 10 permit 172.16.3.0/24
!
route-map INCOMING_POLICIES_FROM_ExtRo01 permit 10
  match ip address prefix-list EXTERNAL_GROUP_1
  set weight 999
route-map INCOMING_POLICIES_FROM_ExtRo01 permit 100
!
route-map INCOMING_POLICIES_FROM_ExtRo02 permit 10
  match ip address prefix-list EXTERNAL_GROUP_1
  set weight 9
route-map INCOMING_POLICIES_FROM_ExtRo02 permit 100

Example 8-33: Ingress BGP policy on Border Leaf-102.
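The per-neighbour inbound policy on Border Leaf-102 can be sketched in Python. Only 172.16.3.0/24 gets a non-default weight. Every other prefix passes entry 100 unchanged with the default weight 0.

The sketch also reflects why the iBGP path via Border Leaf-103 shows weight 0 in Example 8-35. Weight is local to the router and is never carried in BGP updates. Leaf-103's own weight 999 therefore does not travel with its path. The mapping table and function name are hypothetical.

```python
from ipaddress import IPv4Network

EXTERNAL_GROUP_1 = {IPv4Network("172.16.3.0/24")}

# Weight that entry 10 of each per-neighbour inbound route-map on Border Leaf-102
# assigns to prefixes in EXTERNAL_GROUP_1.
LEAF102_INBOUND_WEIGHT = {
    "10.102.77.1": 999,  # Ext-Ro01, INCOMING_POLICIES_FROM_ExtRo01
    "10.102.78.2": 9,    # Ext-Ro02, INCOMING_POLICIES_FROM_ExtRo02
}


def inbound_weight(neighbor: str, prefix: IPv4Network) -> int:
    """Weight Border Leaf-102 gives a path for `prefix` learned from `neighbor`."""
    if prefix in EXTERNAL_GROUP_1 and neighbor in LEAF102_INBOUND_WEIGHT:
        return LEAF102_INBOUND_WEIGHT[neighbor]
    # Entry 100 permits everything else unchanged; learned paths default to 0.
    # The iBGP path reflected by Spine-11 has no inbound map here, and weight is
    # never carried in BGP updates, so Leaf-103's local weight is not seen.
    return 0


prefix = IPv4Network("172.16.3.0/24")
# Neighbours that send 172.16.3.0/24 to Leaf-102; 192.168.77.11 is the Spine-11 RR
# reflecting Border Leaf-103's path (next hop 192.168.100.103 in Example 8-35).
neighbors = ["192.168.77.11", "10.102.77.1", "10.102.78.2"]
weights = {n: inbound_weight(n, prefix) for n in neighbors}
print(weights)  # {'192.168.77.11': 0, '10.102.77.1': 999, '10.102.78.2': 9}
print(max(weights, key=weights.get))  # 10.102.77.1
```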

The same logic is applied to Border Leaf-103 (Example 8-34).

router bgp 65000
  router-id 192.168.77.103
  timers bgp 3 9
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
      aggregate-address 192.168.11.0/24 summary-only
    neighbor 10.103.77.2
      remote-as 64577
      description ** External Network - Ext-Ro02 **
      update-source Ethernet1/3.77
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map INCOMING_POLICIES_FROM_ExtRo02 in
        route-map OUTGOING_POLICIES out
    neighbor 10.103.78.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.78
      address-family ipv4 unicast
        send-community
        route-map INCOMING_POLICIES_FROM_ExtRo01 in
        route-map OUTGOING_POLICIES out
!
ip prefix-list EXTERNAL_GROUP_1 seq 10 permit 172.16.3.0/24
!
route-map INCOMING_POLICIES_FROM_ExtRo01 permit 10
  match ip address prefix-list EXTERNAL_GROUP_1
  set weight 999
route-map INCOMING_POLICIES_FROM_ExtRo01 permit 100
!
route-map INCOMING_POLICIES_FROM_ExtRo02 permit 10
  match ip address prefix-list EXTERNAL_GROUP_1
  set weight 9
route-map INCOMING_POLICIES_FROM_ExtRo02 permit 100
Example 8-34: Ingress BGP policy on Border Leaf-103.

As Examples 8-35 and 8-36 show, the eBGP update with the highest weight has been chosen as the best path to network 172.16.3.0/24 on both Border Leaf switches.

Leaf-102# sh ip bgp vrf TENANT77
<snipped>
   Network            Next Hop            Metric     LocPrf     Weight Path
* i172.16.1.0/24      192.168.100.103          0        100          0 64577 i
*>e                   10.102.77.1              0                     0 64577 i
* i172.16.3.0/24      192.168.100.103          2        100          0 64577 i
*>e                   10.102.77.1              2                   999 64577 i
* e                   10.102.78.2              2                     9 64577 i
* i192.168.11.0/24    192.168.100.103                   100          0 i
*>a                   0.0.0.0                           100      32768 i
s>i192.168.11.11/32   192.168.100.101                   100          0 i
Example 8-35: BGP table on Border Leaf-102.

Leaf-103# sh ip bgp vrf TENANT77
<snipped>
   Network            Next Hop            Metric     LocPrf     Weight Path
* i172.16.1.0/24      192.168.100.102          0        100          0 64577 i
*>e                   10.103.78.1              0                     0 64577 i
* i172.16.3.0/24      192.168.100.102          2        100          0 64577 i
* e                   10.103.77.2              2                     9 64577 i
*>e                   10.103.78.1              2                   999 64577 i
* i192.168.11.0/24    192.168.100.102                   100          0 i
*>a                   0.0.0.0                           100      32768 i
s>i192.168.11.11/32   192.168.100.101                   100          0 i
Example 8-36: BGP table on Border Leaf-103.

Now our BGP configuration is ready. Let's run a couple of tests. First, I am going to use traceroute to verify that we have IP connectivity between 192.168.11.11 and 172.16.3.1.

Cafe#traceroute 172.16.3.1
Type escape sequence to abort.
Tracing the route to 172.16.3.1
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.11.1 5 msec 5 msec 8 msec
  2 10.102.77.102 17 msec 31 msec 21 msec
  3 10.102.77.1 26 msec 9 msec 21 msec
  4 10.1.3.3 25 msec 18 msec *
Cafe#
Example 8-37: Trace from 192.168.11.11 to 172.16.3.1

Ext-Ro03#traceroute vrf TENANT77 192.168.11.11 source 172.16.3.1
Type escape sequence to abort.
Tracing the route to 192.168.11.11
VRF info: (vrf in name/id, vrf out name/id)
  1 10.1.3.1 4 msec 7 msec 4 msec
  2 10.102.77.102 8 msec 7 msec 3 msec
  3 192.168.11.1 11 msec 16 msec 25 msec
  4 192.168.11.11 27 msec 16 msec *
Ext-Ro03#
Example 8-38: Trace from 172.16.3.1 to 192.168.11.11


As the traceroute examples show, the routing is symmetric and traffic follows the expected path.

Figure 8-9: trace test#1

Now I am going to shut down interface g0/1.77 on Ext-Ro01. Based on our policy, the trace from left to right should then use the path Border Leaf-102 > Ext-Ro02 > Ext-Ro03, and the trace from right to left should use the path Ext-Ro01 > Border Leaf-103. Let's see what happens.

As can be seen from Examples 8-39 and 8-40, the network converges as expected, although the routing is now asymmetric. We could, of course, achieve symmetric routing by tracking the state of interface g0/1.77 on Ext-Ro01 and, if it goes down, stopping the redistribution from BGP into OSPF. As said earlier, however, I am not using interface or any other tracking in this example.

Cafe#traceroute 172.16.3.1 source 192.168.11.11
Type escape sequence to abort.
Tracing the route to 172.16.3.1
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.11.1 13 msec 6 msec 5 msec
  2 10.102.77.102 12 msec 19 msec 14 msec
  3 10.102.78.2 23 msec 22 msec 18 msec
  4 10.2.3.3 26 msec 21 msec *
Cafe#
Example 8-39: Trace from 192.168.11.11 to 172.16.3.1

Ext-Ro03#traceroute vrf TENANT77 192.168.11.11 source 172.16.3.1
Type escape sequence to abort.
Tracing the route to 192.168.11.11
VRF info: (vrf in name/id, vrf out name/id)
  1 10.1.3.1 13 msec 4 msec 4 msec
  2 10.103.78.103 6 msec 4 msec 8 msec
  3 192.168.11.1 35 msec 15 msec 10 msec
  4 192.168.11.11 18 msec 66 msec *
Ext-Ro03#
Example 8-40: Trace from 172.16.3.1 to 192.168.11.11

Figure 8-10: trace test#2

Next, we are going to shut down interface g0/3.78 on Ext-Ro01 as well. After that, traffic in both directions, from left to right and from right to left, uses the path through Border Leaf-102 and Ext-Ro02.


Cafe#traceroute 172.16.3.1 source 192.168.11.11
Type escape sequence to abort.
Tracing the route to 172.16.3.1
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.11.1 3 msec 20 msec 9 msec
  2 10.102.77.102 38 msec 6 msec 8 msec
  3 10.102.78.2 15 msec 14 msec 20 msec
  4 10.2.3.3 43 msec 20 msec *
Cafe#
Example 8-41: Trace from 192.168.11.11 to 172.16.3.1

Ext-Ro03#traceroute vrf TENANT77 192.168.11.11 source 172.16.3.1
Type escape sequence to abort.
Tracing the route to 192.168.11.11
VRF info: (vrf in name/id, vrf out name/id)
  1 10.2.3.2 9 msec 9 msec 2 msec
  2 10.102.78.102 3 msec 36 msec 5 msec
  3 192.168.11.1 34 msec 10 msec 6 msec
  4 192.168.11.11 16 msec 20 msec *
Ext-Ro03#
Example 8-42: Trace from 172.16.3.1 to 192.168.11.11

Figure 8-11: trace test#3


As a final check, I am going to shut down interface g0/3.78 on Ext-Ro02 to make sure that the path moves from Border Leaf-102 to Border Leaf-103.

Cafe#traceroute 172.16.3.1 source 192.168.11.11
Type escape sequence to abort.
Tracing the route to 172.16.3.1
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.11.1 9 msec 14 msec 23 msec
  2 10.103.77.103 28 msec 10 msec 11 msec
  3 10.103.77.2 12 msec 20 msec 17 msec
  4 10.2.3.3 13 msec 15 msec *
Cafe#
Example 8-43: Trace from 192.168.11.11 to 172.16.3.1

Ext-Ro03#traceroute vrf TENANT77 192.168.11.11 source 172.16.3.1
Type escape sequence to abort.
Tracing the route to 192.168.11.11
VRF info: (vrf in name/id, vrf out name/id)
  1 10.2.3.2 8 msec 3 msec 4 msec
  2 10.103.77.103 8 msec 7 msec 3 msec
  3 192.168.11.1 15 msec 11 msec 11 msec
  4 192.168.11.11 29 msec 17 msec *
Ext-Ro03#
Example 8-43: Trace from 172.16.3.1 to 192.168.11.11


Figure 8-12: trace test#4
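The right-to-left results of the four trace tests follow from two facts in the appendix configs:

- Each external router keeps redistributing 192.168.11.0/24 into OSPF as long as it has any BGP path for it. Ext-Ro01 uses seed metric 10 and Ext-Ro02 uses seed metric 100, both with metric-type 1.
- Each external router picks its BGP path by the weight that SET_WEIGHT assigns.

The Python model below combines the two. It assumes, and this is not shown in the post, that Ext-Ro03 has equal internal OSPF cost to Ext-Ro01 and Ext-Ro02, so the lower seed metric decides. The names are hypothetical, and timers and convergence are ignored. Under those assumptions it reproduces the return paths seen in tests #1 to #4.

```python
# Hypothetical model of the return path (right to left) in trace tests #1-#4.
# eBGP sessions per external router, keyed by its subinterface (appendix configs).
SESSIONS = {
    "Ext-Ro01": {"g0/1.77": "Leaf-102", "g0/3.78": "Leaf-103"},
    "Ext-Ro02": {"g0/3.78": "Leaf-102", "g0/1.77": "Leaf-103"},
}
# Weight that route-map SET_WEIGHT gives the path from each border leaf.
WEIGHT = {"Leaf-102": 999, "Leaf-103": 9}
# Seed metric of "redistribute bgp 64577 ... metric-type 1" on each router.
OSPF_SEED = {"Ext-Ro01": 10, "Ext-Ro02": 100}


def return_path(down: set[tuple[str, str]]) -> tuple[str, str]:
    """Return (exit router, border leaf) Ext-Ro03 uses towards 192.168.11.0/24.

    Assumes equal internal OSPF cost from Ext-Ro03 to Ext-Ro01 and Ext-Ro02, so
    the E1 route with the lower seed metric wins.
    """
    candidates = []
    for router, links in SESSIONS.items():
        leaves = [leaf for ifc, leaf in links.items() if (router, ifc) not in down]
        if not leaves:
            continue  # no BGP path, so nothing is redistributed into OSPF
        best_leaf = max(leaves, key=WEIGHT.__getitem__)  # highest weight wins
        candidates.append((OSPF_SEED[router], router, best_leaf))
    if not candidates:
        raise RuntimeError("192.168.11.0/24 is unreachable from Ext-Ro03")
    _, router, leaf = min(candidates)  # lowest OSPF E1 metric wins
    return router, leaf


shut = [
    set(),                                               # test #1
    {("Ext-Ro01", "g0/1.77")},                           # test #2
    {("Ext-Ro01", "g0/1.77"), ("Ext-Ro01", "g0/3.78")},  # test #3
    {("Ext-Ro01", "g0/1.77"), ("Ext-Ro01", "g0/3.78"),
     ("Ext-Ro02", "g0/3.78")},                           # test #4
]
for number, down in enumerate(shut, start=1):
    print(f"trace test#{number}:", *return_path(down))
# trace test#1: Ext-Ro01 Leaf-102
# trace test#2: Ext-Ro01 Leaf-103
# trace test#3: Ext-Ro02 Leaf-102
# trace test#4: Ext-Ro02 Leaf-103
```

Test #2 shows why the routing becomes asymmetric. Ext-Ro01 still holds the weight-9 path via Border Leaf-103, so it keeps advertising the lower OSPF metric.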


As can be seen, we have achieved a predictable routing policy. And we are done!



APPENDIX 1: Configurations.

Leaf-102# sh run

!Command: show running-config
!Time: Sat Jun  2 16:14:43 2018

version 7.0(3)I7(1)
hostname Leaf-102
vdc Leaf-102 id 1
  limit-resource vlan minimum 16 maximum 4094
  limit-resource vrf minimum 2 maximum 4096
  limit-resource port-channel minimum 0 maximum 511
  limit-resource u4route-mem minimum 128 maximum 128
  limit-resource u6route-mem minimum 96 maximum 96
  limit-resource m4route-mem minimum 58 maximum 58
  limit-resource m6route-mem minimum 8 maximum 8

nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay

username admin password 5 $5$r25DfmPc$EvUgSVebL3gCPQ8e1ngSTxeKYIk4yuuPIomJKa5Lp/3  role network-admin
ip domain-lookup
snmp-server user admin network-admin auth md5 0x713961e592dd5c2401317a7e674464ac priv 0x713961e592dd5c2401317a7e674464ac localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO

fabric forwarding anycast-gateway-mac 0001.0001.0001
ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24 bidir
ip pim ssm range 232.0.0.0/8
vlan 1,10,20,77
vlan 10
  name L2VNI-for-VLAN10
  vn-segment 10000
vlan 20
  name L2VNI-for-VLAN20
  vn-segment 20000
vlan 77
  name TENANT77
  vn-segment 10077

ip prefix-list EXTERNAL_GROUP_1 seq 10 permit 172.16.3.0/24
ip prefix-list TENANT77_LOCAL seq 10 permit 192.168.11.0/24
route-map INCOMING_POLICIES_FROM_ExtRo01 permit 10
  match ip address prefix-list EXTERNAL_GROUP_1
  set weight 999
route-map INCOMING_POLICIES_FROM_ExtRo01 permit 100
route-map INCOMING_POLICIES_FROM_ExtRo02 permit 10
  match ip address prefix-list EXTERNAL_GROUP_1
  set weight 9
route-map INCOMING_POLICIES_FROM_ExtRo02 permit 100
route-map OUTGOING_POLICIES permit 10
  match ip address prefix-list TENANT77_LOCAL
  set community 64577:999
route-map OUTGOING_POLICIES permit 100
route-map SET-MED-to-Ext01 permit 10
  set metric 10
route-map SET_MED_to_Ext01 permit 100
vrf context TENANT77
  vni 10077
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
vrf context management
hardware access-list tcam region racl 512
hardware access-list tcam region arp-ether 256 double-wide


interface Vlan1
  no shutdown

interface Vlan10
  no shutdown
  vrf member TENANT77
  ip address 192.168.11.1/24
  fabric forwarding mode anycast-gateway

interface Vlan20
  no shutdown
  vrf member TENANT77
  ip address 192.168.12.1/24
  fabric forwarding mode anycast-gateway

interface Vlan77
  no shutdown
  vrf member TENANT77
  ip forward

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback100
  member vni 10000
    suppress-arp
    mcast-group 238.0.0.10
  member vni 10077 associate-vrf
  member vni 20000
    suppress-arp
    mcast-group 238.0.0.10

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  no switchport
  no shutdown

interface Ethernet1/3.78
  encapsulation dot1q 78
  vrf member TENANT77
  ip address 10.102.78.102/24
  no shutdown

interface Ethernet1/4
  no switchport
  no shutdown

interface Ethernet1/4.77
  encapsulation dot1q 77
  vrf member TENANT77
  ip address 10.102.77.102/24
  no shutdown



interface mgmt0
  vrf member management

interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.102/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback77
  description ** BGP peering **
  ip address 192.168.77.102/32
  ip router ospf UNDERLAY-NET area 0.0.0.0

interface loopback100
  description ** VTEP/Overlay **
  ip address 192.168.100.102/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
line console
line vty
router ospf UNDERLAY-NET
  router-id 192.168.0.102
  name-lookup
router bgp 65000
  router-id 192.168.77.102
  timers bgp 3 9
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
      aggregate-address 192.168.11.0/24 summary-only
    neighbor 10.102.77.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.77
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map INCOMING_POLICIES_FROM_ExtRo01 in
        route-map OUTGOING_POLICIES out
    neighbor 10.102.78.2
      remote-as 64577
      description ** External Network - Ext-Ro02 **
      update-source Ethernet1/3.78
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map INCOMING_POLICIES_FROM_ExtRo02 in
        route-map OUTGOING_POLICIES out
evpn
  vni 10000 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 20000 l2
    rd auto
    route-target import auto
    route-target export auto


Leaf-102# 
Example 8-44: Complete config LEAF-102



Leaf-103# sh run

!Command: show running-config
!Time: Sat Jun  2 16:16:57 2018

version 7.0(3)I7(1)
hostname Leaf-103
vdc Leaf-103 id 1
  limit-resource vlan minimum 16 maximum 4094
  limit-resource vrf minimum 2 maximum 4096
  limit-resource port-channel minimum 0 maximum 511
  limit-resource u4route-mem minimum 248 maximum 248
  limit-resource u6route-mem minimum 96 maximum 96
  limit-resource m4route-mem minimum 58 maximum 58
  limit-resource m6route-mem minimum 8 maximum 8

nv overlay evpn
feature ospf
feature bgp
feature pim
feature fabric forwarding
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay

no password strength-check
username admin password 5 $5$.82HC6Bt$QEpUIVi292elRGmwWNLciK2xa2z13xVwsGhdp2BMU0D  role network-admin
ip domain-lookup
snmp-server user admin network-admin auth md5 0x7f693b750cc7550144b8410e07eecf1d priv 0x7f693b750cc7550144b8410e07eecf1d localizedkey
rmon event 1 description FATAL(1) owner PMON@FATAL
rmon event 2 description CRITICAL(2) owner PMON@CRITICAL
rmon event 3 description ERROR(3) owner PMON@ERROR
rmon event 4 description WARNING(4) owner PMON@WARNING
rmon event 5 description INFORMATION(5) owner PMON@INFO

fabric forwarding anycast-gateway-mac 0001.0001.0001
ip pim rp-address 192.168.238.1 group-list 238.0.0.0/24 bidir
ip pim ssm range 232.0.0.0/8
vlan 1,10,20,77
vlan 10
  name L2VNI-for-VLAN10
  vn-segment 10000
vlan 20
  name L2VNI-for-VLAN20
  vn-segment 20000
vlan 77
  name TENANT77
  vn-segment 10077

ip prefix-list EXTERNAL_GROUP_1 seq 10 permit 172.16.3.0/24
ip prefix-list TENANT77_LOCAL seq 10 permit 192.168.11.0/24
route-map INCOMING_POLICIES_FROM_ExtRo01 permit 10
  match ip address prefix-list EXTERNAL_GROUP_1
  set weight 999
route-map INCOMING_POLICIES_FROM_ExtRo01 permit 100
route-map INCOMING_POLICIES_FROM_ExtRo02 permit 10
  match ip address prefix-list EXTERNAL_GROUP_1
  set weight 9
route-map INCOMING_POLICIES_FROM_ExtRo02 permit 100
route-map OUTGOING_POLICIES permit 10
  match ip address prefix-list TENANT77_LOCAL
  set community 64577:9
route-map OUTGOING_POLICIES permit 100
vrf context TENANT77
  vni 10077
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
vrf context management
hardware access-list tcam region racl 512
hardware access-list tcam region arp-ether 256 double-wide


interface Vlan1
  no shutdown

interface Vlan10
  no shutdown
  vrf member TENANT77
  ip address 192.168.11.1/24
  fabric forwarding mode anycast-gateway

interface Vlan20
  no shutdown
  vrf member TENANT77
  ip address 192.168.12.1/24
  fabric forwarding mode anycast-gateway

interface Vlan77
  no shutdown
  vrf member TENANT77
  ip forward

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback100
  member vni 10000
    suppress-arp
    mcast-group 238.0.0.10
  member vni 10077 associate-vrf
  member vni 20000
    suppress-arp
    mcast-group 238.0.0.10

interface Ethernet1/1
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/2
  no switchport
  medium p2p
  ip unnumbered loopback0
  ip ospf network point-to-point
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
  no shutdown

interface Ethernet1/3
  no switchport
  no shutdown

interface Ethernet1/3.77
  encapsulation dot1q 77
  vrf member TENANT77
  ip address 10.103.77.103/24
  no shutdown

interface Ethernet1/4
  no switchport
  no shutdown

interface Ethernet1/4.78
  encapsulation dot1q 78
  vrf member TENANT77
  ip address 10.103.78.103/24
  no shutdown


interface Ethernet1/64

interface mgmt0
  vrf member management

interface loopback0
  description ** RID/Underlay **
  ip address 192.168.0.103/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode

interface loopback77
  description ** BGP peering **
  ip address 192.168.77.103/32
  ip router ospf UNDERLAY-NET area 0.0.0.0

interface loopback100
  description ** VTEP/Overlay **
  ip address 192.168.100.103/32
  ip router ospf UNDERLAY-NET area 0.0.0.0
  ip pim sparse-mode
line console
line vty
router ospf UNDERLAY-NET
  router-id 192.168.0.103
  name-lookup
router bgp 65000
  router-id 192.168.77.103
  timers bgp 3 9
  address-family ipv4 unicast
  address-family l2vpn evpn
  neighbor 192.168.77.11
    remote-as 65000
    description ** Spine-11 BGP-RR **
    update-source loopback77
    address-family l2vpn evpn
      send-community extended
  vrf TENANT77
    address-family ipv4 unicast
      advertise l2vpn evpn
      aggregate-address 192.168.11.0/24 summary-only
    neighbor 10.103.77.2
      remote-as 64577
      description ** External Network - Ext-Ro02 **
      update-source Ethernet1/3.77
      address-family ipv4 unicast
        send-community
        send-community extended
        route-map INCOMING_POLICIES_FROM_ExtRo02 in
        route-map OUTGOING_POLICIES out
    neighbor 10.103.78.1
      remote-as 64577
      description ** External Network - Ext-Ro01 **
      update-source Ethernet1/4.78
      address-family ipv4 unicast
        send-community
        route-map INCOMING_POLICIES_FROM_ExtRo01 in
        route-map OUTGOING_POLICIES out
evpn
  vni 10000 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 20000 l2
    rd auto
    route-target import auto
    route-target export auto

Leaf-103#   
Example 8-45: Complete config LEAF-103


Ext-Ro01#sh run
Building configuration...

Current configuration : 5229 bytes
!
! Last configuration change at 16:00:23 UTC Sat Jun 2 2018
!
version 15.6
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname Ext-Ro01
!
boot-start-marker
boot-end-marker
!
no aaa new-model
!
!
!
mmi polling-interval 60
no mmi auto-configure
no mmi pvc
mmi snmp-timeout 180
!
!
ip vrf TENANT77
 rd 65077:1
 route-target export 65077:1
 route-target import 65077:1
!
!
!
!
ip cef
no ipv6 cef
!
multilink bundle-name authenticated
!
!
redundancy
!
!
interface Loopback77
 description ** BGP-RID **
 ip address 172.16.77.77 255.255.255.255
!
interface Loopback161
 description ** This Interface simulates external net 172.16.1.0/24 **
 ip vrf forwarding TENANT77
 ip address 172.16.1.1 255.255.255.0
!
interface GigabitEthernet0/0
 ip address 10.255.2.133 255.255.0.0
 shutdown
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/1
 no ip address
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/1.77
 encapsulation dot1Q 77
 ip vrf forwarding TENANT77
 ip address 10.102.77.1 255.255.255.0
 shutdown
!        
interface GigabitEthernet0/2
 no ip address
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/2.13
 encapsulation dot1Q 13
 ip vrf forwarding TENANT77
 ip address 10.1.3.1 255.255.255.0
 ip ospf 1 area 0
!
interface GigabitEthernet0/3
 no ip address
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/3.78
 encapsulation dot1Q 78
 ip vrf forwarding TENANT77
 ip address 10.103.78.1 255.255.255.0
 shutdown
!
router ospf 1 vrf TENANT77
 redistribute bgp 64577 metric 10 metric-type 1 subnets
!
router bgp 64577
 bgp router-id 172.16.77.77
 bgp log-neighbor-changes
 timers bgp 3 9
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv4 vrf TENANT77
  network 172.16.1.0 mask 255.255.255.0
  network 172.16.3.0 mask 255.255.255.0
  neighbor 10.102.77.102 remote-as 65000
  neighbor 10.102.77.102 description ** VXLAN Fabric Border Leaf-102 **
  neighbor 10.102.77.102 update-source GigabitEthernet0/1.77
  neighbor 10.102.77.102 activate
  neighbor 10.102.77.102 route-map SET_WEIGHT in
  neighbor 10.103.78.103 remote-as 65000
  neighbor 10.103.78.103 description ** VXLAN Fabric Border Leaf-103 **
  neighbor 10.103.78.103 update-source GigabitEthernet0/3.78
  neighbor 10.103.78.103 activate
  neighbor 10.103.78.103 route-map SET_WEIGHT in
 exit-address-family
!
ip default-gateway 192.168.12.1
ip forward-protocol nd
!
ip bgp-community new-format
ip community-list standard SET_WEIGHT_999 permit 64577:999
ip community-list standard SET_WEIGHT_9 permit 64577:9
!
no ip http server
no ip http secure-server
ip ssh server algorithm encryption aes128-ctr aes192-ctr aes256-ctr
ip ssh client algorithm encryption aes128-ctr aes192-ctr aes256-ctr
!
ipv6 ioam timestamp
!
route-map ADD_MED_10_to_BGP_UPDATE permit 10
 set metric 10
!
route-map ADD_MED_10_to_BGP_UPDATE permit 100
!        
route-map SET_WEIGHT permit 10
 match community SET_WEIGHT_999
 set weight 999
!
route-map SET_WEIGHT permit 100
 match community SET_WEIGHT_9
 set weight 9
!
route-map ADD_MED_100_to_BGP_UPDATE permit 10
 set metric 100
!
route-map ADD_MED_100_to_BGP_UPDATE permit 100
!
!
!
control-plane
!
!
line con 0
line aux 0
line vty 0 4
 login
 transport input none
!
no scheduler allocate
!
end

Ext-Ro01#
Example 8-46: Complete config Ext-Ro01

Ext-Ro02#sh run
Building configuration...

Current configuration : 5145 bytes
!
! Last configuration change at 16:01:27 UTC Sat Jun 2 2018
!
version 15.6
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname Ext-Ro02
!
boot-start-marker
boot-end-marker
!
!
!
no aaa new-model
!
!
!
mmi polling-interval 60
no mmi auto-configure
no mmi pvc
mmi snmp-timeout 180
!
!
!
!
!
!
!
!
ip vrf TENANT77
 rd 65077:1
 route-target export 65077:1
 route-target import 65077:1
!
!
!
!
ip cef
no ipv6 cef
!
multilink bundle-name authenticated
!
!
interface Loopback77
 description ** BGP-RID **
 ip address 172.16.77.79 255.255.255.255
!
interface GigabitEthernet0/0
 ip address 10.255.2.134 255.255.0.0
 shutdown
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/1
 no ip address
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/1.77
 encapsulation dot1Q 77
 ip vrf forwarding TENANT77
 ip address 10.103.77.2 255.255.255.0
!
interface GigabitEthernet0/2
 no ip address
 duplex auto
 speed auto
 media-type rj45
!        
interface GigabitEthernet0/2.23
 encapsulation dot1Q 23
 ip vrf forwarding TENANT77
 ip address 10.2.3.2 255.255.255.0
 ip ospf 1 area 0
!
interface GigabitEthernet0/3
 no ip address
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/3.78
 encapsulation dot1Q 78
 ip vrf forwarding TENANT77
 ip address 10.102.78.2 255.255.255.0
 shutdown
!
router ospf 1 vrf TENANT77
 redistribute bgp 64577 metric 100 metric-type 1 subnets
!
router bgp 64577
 bgp router-id 172.16.77.79
 bgp log-neighbor-changes
 timers bgp 3 9
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv4 vrf TENANT77
  network 172.16.3.0 mask 255.255.255.0
  neighbor 10.102.78.102 remote-as 65000
  neighbor 10.102.78.102 description ** VXLAN Fabric Border Leaf-102 **
  neighbor 10.102.78.102 update-source GigabitEthernet0/3.78
  neighbor 10.102.78.102 activate
  neighbor 10.102.78.102 route-map SET_WEIGHT in
  neighbor 10.103.77.103 remote-as 65000
  neighbor 10.103.77.103 description ** VXLAN Fabric Border Leaf-103 **
  neighbor 10.103.77.103 update-source GigabitEthernet0/1.77
  neighbor 10.103.77.103 activate
  neighbor 10.103.77.103 route-map SET_WEIGHT in
 exit-address-family
!
ip default-gateway 192.168.11.1
ip forward-protocol nd
!        
ip bgp-community new-format
ip community-list standard SET_WEIGHT_999 permit 64577:999
ip community-list standard SET_WEIGHT_9 permit 64577:9
!
no ip http server
no ip http secure-server
ip ssh server algorithm encryption aes128-ctr aes192-ctr aes256-ctr
ip ssh client algorithm encryption aes128-ctr aes192-ctr aes256-ctr
!
ipv6 ioam timestamp
!
route-map ADD_MED_2000_to_BGP_UPDATE permit 10
 set metric 2000
!
route-map ADD_MED_2000_to_BGP_UPDATE permit 100
!
route-map SET_WEIGHT permit 10
 match community SET_WEIGHT_999
 set weight 999
!
route-map SET_WEIGHT permit 100
 match community SET_WEIGHT_9
 set weight 9
!
route-map ADD_MED_100_to_BGP_UPDATE permit 10
 set metric 100
!
route-map ADD_MED_100_to_BGP_UPDATE permit 100
!
route-map ADD_MED_20000_to_BGP_UPDATE permit 10
 set metric 20000
!
route-map ADD_MED_20000_to_BGP_UPDATE permit 100
!
!
!
control-plane
!
!
line con 0
line aux 0
line vty 0 4
 login
 transport input none
!
no scheduler allocate
!
end

Ext-Ro02#
Example 8-47: Complete config Ext-Ro02
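`ip bgp-community new-format` changes how IOS displays communities. A standard community (RFC 1997) is one 32-bit value, and the AA:NN notation shows its upper 16 bits (conventionally an AS number, here 64577) and its lower 16 bits (a locally defined value, here 999 or 9) separately. The minimal Python sketch below shows the conversion for the values used in SET_WEIGHT_999 and SET_WEIGHT_9; the function names are hypothetical.

```python
def community_to_int(text: str) -> int:
    """Convert a new-format community 'AA:NN' to its 32-bit value."""
    high, low = (int(part) for part in text.split(":"))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError(f"both halves must fit in 16 bits: {text}")
    return (high << 16) | low


def int_to_community(value: int) -> str:
    """Convert a 32-bit community value back to 'AA:NN' notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a standard community is a 32-bit value")
    return f"{value >> 16}:{value & 0xFFFF}"
```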


Ext-Ro03#sh run
Building configuration...

Current configuration : 3523 bytes
!
! Last configuration change at 14:43:58 UTC Sat Jun 2 2018
!
version 15.6
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname Ext-Ro03
!
boot-start-marker
boot-end-marker
!
!
!
no aaa new-model
!
!
!
mmi polling-interval 60
no mmi auto-configure
no mmi pvc
mmi snmp-timeout 180
!
!
ip vrf TENANT77
 rd 65077:1
 route-target export 65077:1
 route-target import 65077:1
!
!
!
!
ip cef
no ipv6 cef
!
multilink bundle-name authenticated
!
!        
!
!
interface Loopback163
 ip vrf forwarding TENANT77
 ip address 172.16.3.1 255.255.255.0
 ip ospf network point-to-point
 ip ospf 1 area 0
!
interface GigabitEthernet0/0
 ip address 10.255.2.135 255.255.0.0
 shutdown
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/1
 no ip address
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/1.13
 encapsulation dot1Q 13
 ip vrf forwarding TENANT77
 ip address 10.1.3.3 255.255.255.0
 ip ospf 1 area 0
!
interface GigabitEthernet0/2
 no ip address
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/2.23
 encapsulation dot1Q 23
 ip vrf forwarding TENANT77
 ip address 10.2.3.3 255.255.255.0
 ip ospf 1 area 0
!
router ospf 1 vrf TENANT77
 capability vrf-lite
!
ip forward-protocol nd
!
!
no ip http server
no ip http secure-server
ip ssh server algorithm encryption aes128-ctr aes192-ctr aes256-ctr
ip ssh client algorithm encryption aes128-ctr aes192-ctr aes256-ctr
!
ipv6 ioam timestamp
!
!        
!
control-plane
!
line con 0
line aux 0
line vty 0 4
 login
 transport input none
!
no scheduler allocate
!
end

Ext-Ro03#
Example 8-48: Complete config Ext-Ro03
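Ext-Ro02 redistributes BGP into OSPF with `metric 100 metric-type 1`, so Ext-Ro03 learns those prefixes as OSPF external type 1 (E1) routes. An E1 cost grows with the internal path cost to the redistributing router (ASBR), while an E2 cost stays at the redistributed value; RFC 2328 (section 16.4) also prefers type 1 over type 2 external paths. The Python sketch below models that comparison. The cost of 1 from Ext-Ro03 to Ext-Ro02 is an assumption (the IOS default for a GigabitEthernet interface with the default 100 Mbps reference bandwidth), and the E2 route in the usage check is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalRoute:
    asbr: str             # router that redistributed the prefix into OSPF
    metric_type: int      # 1 = E1, 2 = E2
    external_metric: int  # value set by "redistribute ... metric N"
    cost_to_asbr: int     # intra-area cost from this router to the ASBR


def route_cost(route: ExternalRoute) -> int:
    """E1 adds the internal cost to the ASBR; E2 uses the external metric only."""
    if route.metric_type == 1:
        return route.external_metric + route.cost_to_asbr
    return route.external_metric


def best_external(routes):
    """E1 beats E2; within a type the lower cost wins, and for E2 routes
    with an equal external metric the cost to the ASBR breaks the tie."""
    return min(routes, key=lambda r: (r.metric_type, route_cost(r), r.cost_to_asbr))


# Assumed values: Ext-Ro02 as ASBR, one GigabitEthernet hop (cost 1) away.
via_ro02 = ExternalRoute("Ext-Ro02", 1, 100, 1)
```

Note that E1 wins even against an E2 route with a lower external metric, because the route type is compared before the cost.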

Appendix 2: BGP MED Path Attribute

Figure 1 shows the BGP Update messages that each router sends for network 172.16.77.0/24. The MED in each update carries the default value 0.


Figure 1: default MED.

Router R3's BGP table shows that the metric (MED) is 0 for both entries.

R3#sh ip bgp
BGP table version is 5, local router ID is 10.3.4.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 * i  172.16.77.0/24   10.1.2.2                 0    100      0 65002 i
 *>                    10.3.4.4                 0             0 65002 i
Example-1: BGP table on R3 with default MED.

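The detailed output shows two different metrics. The value in parentheses after the next hop 10.1.2.2 ("metric 2") is not a BGP path attribute but the IGP metric to the next hop, while the "metric" on the Origin line (like the Metric column in Example-1) is the MED.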

R3#sh ip bgp 172.16.77.0
BGP routing table entry for 172.16.77.0/24, version 5
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     2        
  Refresh Epoch 1
  65002
    10.1.2.2 (metric 2) from 10.1.3.1 (10.1.3.1)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 10.2.4.2, Cluster list: 10.1.3.1
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  65002
    10.3.4.4 from 10.3.4.4 (172.16.77.1)
      Origin IGP, metric 0, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
Example-2: metric to next-hop.
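To make the difference explicit, the Python sketch below extracts both values from the path entries of Example-2: the number in parentheses after the next hop is the IGP metric to the next hop, and the "metric" on the Origin line is the MED. The regular expressions are assumptions that fit only this output format; they are not a general parser for IOS show commands.

```python
import re

SHOW_OUTPUT = """\
  65002
    10.1.2.2 (metric 2) from 10.1.3.1 (10.1.3.1)
      Origin IGP, metric 0, localpref 100, valid, internal
  65002
    10.3.4.4 from 10.3.4.4 (172.16.77.1)
      Origin IGP, metric 0, localpref 100, valid, external, best
"""

# Next-hop line, with an optional "(metric N)" = IGP metric to the next hop.
NEXT_HOP = re.compile(r"^\s+(\d+\.\d+\.\d+\.\d+)(?: \(metric (\d+)\))? from ")
# Origin line, where "metric N" is the BGP MED path attribute.
ORIGIN = re.compile(r"^\s+Origin \w+, metric (\d+)")


def parse_paths(text):
    """Return (next_hop, igp_metric_to_next_hop, med) for each path."""
    paths, current = [], None
    for line in text.splitlines():
        if m := NEXT_HOP.match(line):
            igp = int(m.group(2)) if m.group(2) else None
            current = (m.group(1), igp)
        elif (m := ORIGIN.match(line)) and current is not None:
            paths.append((current[0], current[1], int(m.group(1))))
            current = None
    return paths
```

For the eBGP path the IGP metric is None, because a directly connected eBGP next hop is shown without a metric in parentheses.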

Next, I change the BGP policy on R4 so that it advertises network 172.16.77.0/24 to router R3 with MED 20 (Example-3).
R4#sh run | sec route-map SET-MED
 neighbor 10.3.4.3 route-map SET-MED out
route-map SET-MED permit 10
 set metric 20
Example-3: Set MED to 20.

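Now router R3 selects the path via R2 to network 172.16.77.0/24, because its MED (0) is better (lower) than the MED (20) received from the eBGP peer R4. Both paths are learned from the same neighboring AS 65002, so their MEDs are comparable, and the MED comparison takes place before the eBGP-over-iBGP step of the best-path selection process.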

R3#sh ip bgp
BGP table version is 6, local router ID is 10.3.4.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i  172.16.77.0/24   10.1.2.2                 0    100      0 65002 i
 *                     10.3.4.4                20             0 65002 i
Example-4: BGP table on R3 after BGP policy change

R3#sh ip bgp 172.16.77.0
BGP routing table entry for 172.16.77.0/24, version 6
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     4        
  Refresh Epoch 1
  65002
    10.1.2.2 (metric 2) from 10.1.3.1 (10.1.3.1)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 10.2.4.2, Cluster list: 10.1.3.1
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 1
  65002
    10.3.4.4 from 10.3.4.4 (172.16.77.1)
      Origin IGP, metric 20, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
Example-5: BGP entry for 172.16.77.0/24 on R3 after BGP policy change.
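The order in which R3 reaches its decision can be sketched as a pairwise comparison. This is a simplified subset of the Cisco IOS best-path algorithm written in Python for illustration only: it skips, among others, the locally-originated, origin, IGP-metric and router-ID steps, and the `Path` fields are hypothetical. The two checks reproduce Example-1 (equal MED, so the eBGP path wins) and Example-4 (MED 20 from R4, so the iBGP path with MED 0 wins).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    neighbor_as: int   # first AS in the AS_PATH
    med: int
    ebgp: bool
    weight: int = 0
    local_pref: int = 100
    as_path_len: int = 1


def better(a: Path, b: Path) -> Path:
    """Return the preferred path using a simplified subset of the BGP steps."""
    if a.weight != b.weight:
        return a if a.weight > b.weight else b
    if a.local_pref != b.local_pref:
        return a if a.local_pref > b.local_pref else b
    if a.as_path_len != b.as_path_len:
        return a if a.as_path_len < b.as_path_len else b
    # Origin comparison omitted: both paths are "i" (IGP) in the examples.
    # MED is compared only between paths from the same neighboring AS (IOS default).
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    # RFC 4271 9.1.2.2 (d): prefer eBGP-learned over iBGP-learned paths.
    if a.ebgp != b.ebgp:
        return a if a.ebgp else b
    return a  # remaining tiebreakers omitted in this sketch


ibgp_via_r2 = Path("10.1.2.2", 65002, 0, ebgp=False)
ebgp_med_0 = Path("10.3.4.4", 65002, 0, ebgp=True)
ebgp_med_20 = Path("10.3.4.4", 65002, 20, ebgp=True)
```

Administrative Distance does not appear anywhere in this comparison.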

The Administrative Distance (AD) of a BGP-learned route depends on whether the route is learned from an internal peer (default AD 200) or from an external peer (default AD 20). However, AD is never used as a tiebreaker when BGP compares paths received from internal and external BGP peers; it is only used when the same route is learned from two or more different routing protocols. If the best-path selection process reaches the eBGP-iBGP comparison, the eBGP path wins. Example-6 shows this: I first use the default AD values and then change the AD of eBGP to 200 and of iBGP to 19. This does not affect routing. The route is still installed via the eBGP peer 10.3.4.4; only the AD shown in brackets changes from 20 to 200.

R3#sh ip route | i 172.16.77.0
B        172.16.77.0 [20/0] via 10.3.4.4, 00:00:45

!-------> Change the AD values

R3(config)#router bgp 65001
R3(config-router)#distance bgp 200 19 19
R3(config-router)#exit
R3(config)#exit
R3#clear ip bgp *            

*Oct  1 10:45:13.222: %BGP-5-ADJCHANGE: neighbor 10.1.3.1 Up
*Oct  1 10:45:15.743: %BGP-5-ADJCHANGE: neighbor 10.3.4.4 Up
R3#sh ip route | i 172.16.77.0
B        172.16.77.0 [200/0] via 10.3.4.4, 00:01:00


Example-6: Changing BGP AD.
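Example-6 reflects a two-stage process: BGP first selects its best path without looking at AD, and only the resulting route carries an AD (external or internal) into the routing table, where AD is compared against routes from other protocols. The Python sketch below models that second stage; the function names are hypothetical, and the OSPF route with AD 110 in the checks is not part of the lab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    protocol: str
    next_hop: str
    admin_distance: int


def bgp_rib_candidate(best_is_ebgp: bool, next_hop: str,
                      external_ad: int = 20, internal_ad: int = 200) -> Candidate:
    # The BGP best path is already chosen; the AD is attached afterwards,
    # depending only on whether that best path was learned via eBGP or iBGP.
    ad = external_ad if best_is_ebgp else internal_ad
    return Candidate("bgp", next_hop, ad)


def install(candidates):
    # Across different routing protocols, the route with the lowest AD wins.
    return min(candidates, key=lambda c: c.admin_distance)


# Example-6: best path via eBGP peer 10.3.4.4, before and after
# "distance bgp 200 19 19" (external 200, internal 19, local 19).
default_ad = bgp_rib_candidate(True, "10.3.4.4")
changed_ad = bgp_rib_candidate(True, "10.3.4.4", external_ad=200, internal_ad=19)
```

Changing the distances only matters once a second protocol offers the same prefix.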

Author: Toni Pasanen CCIE#28158
Published: 5-June 2018
Edited: 1-October 2018 | Toni Pasanen



32 comments:

  1. Finally, my friend, I was waiting for that post. You've covered VXLAN and simplified it magnificently. Also, I want to know about your next post: what will it be about? There are other important data center topics, like ACI, that I really hope you can cover too.

    1. Hi Mahmoud, and thanks again for your comment. There are still a couple of topics related to VXLAN that I want to cover before moving on to other topics. The subject of the next post will be Layer 2 external connectivity, where the main focus is EVPN multihoming technology.

    2. I'm planning to take the DC CCIE exam, the new version (v2.1), and your VXLAN posts helped me so much to understand VXLAN in depth. I really appreciate the time that you've spent creating such useful posts, thanks.

    3. I am more than happy if my VXLAN posts help you achieve your goal toward CCIE, my friend!

  2. Tony, thank you so much for your detailed explanations and hard work! This has been a tough topic for me to digest and you have helped immensely! Also, I love that you are using VIRL, as that is another tool that I don't see too many people using. Especially with DC topics, it's hard to get labs that are affordable. The challenge with VIRL is its lack of ASIC support (which DC devices make vast use of). Your explanations of how the Nexus 9000v handles these things, as best it can, are also invaluable. Any chance you would be able to share your .virl files? I would love to try to reproduce what you have done and go through each lesson step by step myself. Also, if you have time, would you be able to do a post on the integration of VXLAN and vPC? Thanks again!

    1. Hi David! I am on a short summer holiday at the moment and I left my laptop at home, but I will try to remember to share the .virl files later. My next topic will be the vPC implementation, just like you wished :) I will try to find the time to start writing next Monday. By the way, thank you for your very kind comments!

    2. I wrote previously that the next subject would be EVPN Multihoming, but first I will write a post about vPC integration...

    3. Thanks Toni! I hope you had a great holiday! I was able to get vPC working between 3 NX-OSv 9000 switches. However, I haven't been able to get it to work between 2 9k switches and a server node. The vpc member interfaces show suspended because they are not receiving LACP PDUs. It could be my lack of knowledge in Ubuntu-land. I'm going to try and see if I can figure a way to import a Windows 10 image into VIRL to see if that works any better. Thanks again!

    4. Hi David! The holiday was short but still great! I have noticed the same kind of problem while trying to form a port-channel from the IOSL2 switch to the vPC N9000v switches on VIRL. The message was "%EC-5-L3DONTBNDL2: Gi0/2 suspended: LACP currently not enabled on the remote port". There is a bug, CSCva22545, that might (or might not) be related to this odd behavior, but it does not explain why it did not work in your VIRL lab. Manual channel configuration works fine.

  3. Hi Toni, I think this must be related to the bug you mentioned. I was able to get a Windows 2016 Eval Server loaded up from within VIRL and saw the same behavior. LACP bonded links did not come up, but non-LACP bonds worked OK. I have been trying to figure out the Ubuntu equivalent, as it's not as much of a resource hog, but haven't been too successful yet... Thanks!

  4. Hello Toni. In this config for the external connection, don't you need maximum-paths 2 to allow multipath load balancing on the VXLAN side? Can you explain, please?

    1. I mean, on Leaf-102 and Leaf-103, don't you also need maximum-paths 2 under the BGP process facing the external routers?

    2. Hi VPZ, you are absolutely right. To implement equal-cost multipath load balancing, the following commands should be added under the BGP configuration:
      --------------------
      router bgp 65000
        vrf TENANT77
          address-family ipv4 unicast
            advertise l2vpn evpn
            maximum-paths ibgp 2
            maximum-paths 2
      ---------------------------

  5. Hi Toni, great job, all of your posts on VXLAN+EVPN+... are very enlightening. I want to point out an inaccuracy in this post. After example 8.22 you write "Since the internal BGP has worse Administrative Distance (200) than an external BGP (20), it is only a third best route". This is not correct. The real reason why the IBGP route is not the best path is that in the BGP best path selection process, after MED comparison (if applicable), BGP prefers EBGP paths over IBGP paths. It is not a matter of Administrative Distance. AD comes into play when you compare the BGP best path with other advertisements of the same network from a different protocol. If the BGP best path comes from an EBGP advertisement, then you use AD=20, otherwise you use 200. I do not know if you are familiar with Junos; in Junos both EBGP and IBGP advertisements have an equal AD of 170 (which Junos calls Preference), so it would be impossible to use AD as a tiebreaker. I thank you for your efforts in explaining a tough topic such as VXLAN+EVPN in so much detail.

    1. I do not know why the comment appears as written by "unknown", I am the author: Tiziano Tofoni (tiziano.tofoni@ssgrr.com)

    2. Hi Tiziano, thanks for visiting and especially for the really good comment about the BGP best path selection process! Unfortunately, I do not have this specific lab running anymore, so I am not able to test MED in the VXLAN environment. Instead, I built a simple four-router sample topology and tested MED there. The topology and captures, as well as show commands, are visible in the new Appendix 2 at the end of the post. The test shows that the route is first selected based on the eBGP-iBGP AD comparison, and after the MED change, the MED is used.

  6. Hi Toni, this is Tiziano again, even if from another Google account. Probably we had a misunderstanding. It is not a problem of MED. Of course, if you change the MED as you have done in Appendix 2, everything works fine. The problem is when the MED attribute is the same, as in your Example-2 of Appendix 2. What I was saying in my previous comment is that the reason why the second path (external) is best does not depend on Administrative Distance, but on the BGP path decision process itself. If you look at RFC 4271, sec. 9.1.2.2, point d) at page 81, it says "d) If at least one of the candidate routes was received via EBGP, remove from consideration all routes that were received via IBGP." That's the real reason why in Example-2 the external path is best. It is not a matter of Administrative Distance. Thanks a lot for your attention.

    1. Hi Tiziano, now I got your point. You are absolutely right: if the path selection process goes up to the eBGP-iBGP tiebreaker, it will prefer eBGP over iBGP. AD plays no role when comparing updates received from internal/external BGP peers. It is used when the same route is learned from different routing protocols (e.g. BGP/OSPF), my mistake!
      I just updated the post. I will also update Appendix 2 as a reminder for myself, since I am also using these posts as personal notes :)
      I really appreciate your comments and corrections, Thanks!

    2. Hi Toni, don't worry, it was a pleasure to read your very well-written posts. And I am happy to have given a (very) small contribution.

  7. Great work, Toni!
    EVPN is a hard topic, but your articles make things much clearer, while even the official configuration guides lack details. Please keep on writing; there is a lack of good material about VXLAN.

    Can you please clarify a few points:

    1. On Ext-Ro01, 02 and 03 you defined RD and RT for VRF TENANT77, but do not use these numbers anywhere, so what are they defined for?
    2. In Example 8-44 you defined route-map SET-MED-to-Ext01, but I could not find any references to this route-map in the rest of the config, so what is it for?


    And one more question slightly related to the topic of this post:

    Let's assume that we have the following design:
    _____________________                  _____________________________
    |                   | <---eBGP--->     |                           |
    |                   |                  |                           |
    |    EVPN fabric    | <---eBGP--->     | Legacy OSPF-only network  |
    |    iBGP, RRs      |                  |                           |
    |                   |                  |                           |
    |___________________| <---eBGP--->     |___________________________|

    The EVPN fabric has 2+ eBGP links from different border leafs to different edge routers in the legacy OSPF-only network. The edge routers do not have any BGP peerings with each other. On every edge router, mutual redistribution between OSPF and BGP is configured, permitting 'everything'. If I leave everything as-is, suboptimal routing takes place: one edge router selects the eBGP path to the EVPN fabric, while the others select OSPF routes, so in fact all traffic to the EVPN fabric flows via one path and that one edge router. After digging a bit, I found that it happens because of eBGP-OSPF-eBGP re-redistribution of the same routing entry (it's hard to provide all the debugs in one message, but if you wish, I can simulate it and provide all the outputs). So what I did to improve this: I set a tag when redistributing eBGP->OSPF and filtered on that tag when redistributing OSPF->eBGP, to avoid re-redistributing routes from the EVPN fabric back there again. This helped, and from that moment each edge router has chosen the eBGP route to the EVPN fabric. My question is: is route-tag-based filtering enough to avoid suboptimal routing and/or routing loops, or should I implement a similar filtering technique on the EVPN side, in BGP, using communities? I read some articles and book excerpts about mutual OSPF-BGP redistribution, and they said that you have to implement route tagging and filtering on the OSPF side 'OR' the same using communities on the BGP side. So I'm in doubt: does 'OR' here really mean that one of the options is enough?

    Thank you for your time!

    1. Sorry for weird topology, could not attach a picture.

    2. Hi Andrei,
      Thanks for your kind words and great questions.
      RD/RT configurations on each Ext-ROs are only part of the VRF configuration and are not used for import/export policy. SET-MED-to-Ext01 is part of the old config that is not used, so you can ignore it.
      If mutual redistribution is required and cannot be avoided, I personally would use both BGP and OSPF routing loop prevention mechanisms just to make sure that routes learned from OSPF and redistributed into BGP (by Leaf-x) are not advertised back into OSPF by another Border Leaf (by Leaf-z) and vice versa.

  8. Hello Toni,
    This is such an awesome explanation. Thank you very much for the great effort on this blog.
    For the advertisement of the fabric VXLAN subnets/hosts to the external BGP neighbor, I have a couple of questions.

    First, I was expecting to receive 2 l2vpn evpn routes over leaf01:
    1. Type2 advertisement: with the MAC+IP of the cafe host
    2. Type5 advertisement: with the subnet (192.168.11.0/24)
    Is it correct?

    Second, for the command "advertise l2vpn evpn", does it control the advertisement of received VXLAN routes to the external BGP neighbor, the advertisement of the external routes to the leafs as Type5 advertisements, or both?
    Is this command required only on border routers?

    1. Hi Mousa,

      Thanks for your kind comments and good questions!

      MAC-Only and MAC-IP (route-type 2) can be received either as a separate update or within one update.
      Type 5 advertisement for the VLAN specific network is only received when it is advertised to BGP (see the last comments regarding VXLAN Part XIII).
      ”Advertise l2vpn evpn” is explained in VXLAN Part example 12-7.

  10. Hi Toni,
    Sorry, it took me some time to digest your explanation and to check some other examples :)

    Here's what I understand so far. Please correct me if I am wrong

    1. By default, VLAN arp table entries (IP-to-MAC bindings) + VLAN learnt mac-address table entries are propagated to all VTEPs as Type2 routes. At the receiving end, Type2 routes are propagated to corresponding ip arp tables & MAC address tables by default.

    2. In order to propagate IP Subnets through the same l2vpn evpn neighborship, we should redistribute these subnets to the BGP corresponding IRB VRF, such information is propagated to all VTEPs as Type5 routes.

    3. At the receiving end, learnt Type5 routes could be installed to the BGP database under the IRB vrf as ipv4 unicast routes using the command "advertise l2vpn evpn" or otherwise it will keep there as l2vpn type5 routes (a little bit strange as I would expect it should be done by default as type2 routes)

    4. By default, routes received from eBGP neighbors over the IRB VRF are advertised back to the VTEPs as Type5 routes (as BGP is used for the external connectivity), and again the receiving VTEPs should be configured to pass these routes to the corresponding BGP IRB VRF IPv4 unicast AF using the command "advertise l2vpn evpn"

    1. Hi Mousa,
      The answer to the first and second questions is yes. I wrote a post last year that describes the L2/L3 control and data plane operation. It is a pretty long document, but if you have time, it might be worth reading. https://nwktimes.blogspot.com/2018/12/vxlan-part-xv-analysis-of-bgp-evpn.html
      Question 3: advertising VTEP needs the command but receiving Leaf import it based on Route-Targets.
      Question 4: If I remember correctly (I am not 100% sure) but the ”advertise l2vpn evpn” is needed when L2VPN EVPN routes are advertised to IPv4 Unicast AFI and another way around. You could test what happens if you remove the command from Leaf switches.

  11. Hi
    Based on your lab, I have a different scenario I need to work with. I have a separate border leaf connected to the external network using static routing and running eBGP with the Spines.
    The eBGP neighborship is up, and the Spine already advertises l2vpn evpn routes to the B-Leaf, but there is nothing in the RIB on the B-Leaf.

    Most importantly, I have no idea what configuration I should put on the B-Leaf in order to communicate with the external network.

    1. If the Border-Leaf imports the route into the BGP table but not into the RIB, the problem might be related to the Next-Hop advertised within the BGP Update by the Spine. You can try to fix this by disabling next-hop path-attribute modification on the Spine.
      This is the link where you can find the example configuration (example 1-24).
      https://nwktimes.blogspot.com/2019/05/vxlan-underlay-routing-part-iv-dual-as.html
