Tuesday, 13 June 2023

NVA Part V: NVA Redundancy with Azure Internal Load Balancer - On-Prem Connection

 Introduction


In Chapter Five, we deployed an internal load balancer (ILB) in the vnet-hub. It was attached to the subnet 10.0.1.0/24, where it obtained the frontend IP (FIP) 10.0.1.6. Next, we created a backend pool and associated our NVAs with it. Finally, we bound the frontend IP 10.0.1.6 to the backend pool to complete the ILB setup.
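
For reference, the ILB setup recapped above could be reproduced with the Azure CLI roughly as shown below. This is a minimal sketch, not the exact commands used in Chapter Five: the resource group (rg-nva), subnet, probe, rule, and ipconfig names are assumptions, while the frontend IP, the Standard SKU, and the HA-ports behavior follow the chapter.

# Internal Standard SKU load balancer with frontend IP 10.0.1.6 and a backend pool
# (resource group and subnet names are assumptions)
az network lb create --resource-group rg-nva --name ilb-nva --sku Standard \
  --vnet-name vnet-hub --subnet snet-nva --frontend-ip-name fip-nva \
  --private-ip-address 10.0.1.6 --backend-pool-name bep-nva

# Health probe for the NVAs (SSH is assumed here)
az network lb probe create --resource-group rg-nva --lb-name ilb-nva \
  --name probe-ssh --protocol Tcp --port 22

# HA-ports rule: protocol All, ports 0/0, so every flow hitting the frontend IP
# is hashed to one of the backend NVAs
az network lb rule create --resource-group rg-nva --lb-name ilb-nva --name ha-ports \
  --protocol All --frontend-port 0 --backend-port 0 \
  --frontend-ip-name fip-nva --backend-pool-name bep-nva --probe-name probe-ssh

# Associate the NVA NICs with the backend pool (ipconfig name is an assumption;
# the vNIC names come from Figures 6-6 and 6-7)
az network nic ip-config address-pool add --resource-group rg-nva --nic-name nva1388 \
  --ip-config-name ipconfig1 --lb-name ilb-nva --address-pool bep-nva
az network nic ip-config address-pool add --resource-group rg-nva --nic-name nva2205 \
  --ip-config-name ipconfig1 --lb-name ilb-nva --address-pool bep-nva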


Next, in vnet-spoke1, we created a route table called rt-spoke1. This route table contained a user-defined route (UDR) for 10.2.0.0/24 (vnet-spoke2) with the next-hop set as 10.0.1.6. We attached this route table to the subnet 10.1.0.0/24. Similarly, in vnet-spoke2, we implemented a user-defined route for 10.1.0.0/24 (vnet-spoke1). By configuring these UDRs, we ensured that the spoke-to-spoke traffic would pass through the ILB and one of the NVAs on vnet-hub. Note that in this design, the Virtual Network Gateway is not required for spoke-to-spoke traffic.
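
The rt-spoke1 setup could look roughly like this with the Azure CLI (a sketch; the resource group rg-nva and the subnet name snet-prod-1 are assumptions, the prefixes and next hop follow the text):

# Route table for vnet-spoke1 with a UDR to vnet-spoke2 via the ILB frontend IP
az network route-table create --resource-group rg-nva --name rt-spoke1

az network route-table route create --resource-group rg-nva --route-table-name rt-spoke1 \
  --name to-spoke2 --address-prefix 10.2.0.0/24 \
  --next-hop-type VirtualAppliance --next-hop-ip-address 10.0.1.6

# Attach the route table to the workload subnet 10.1.0.0/24
az network vnet subnet update --resource-group rg-nva --vnet-name vnet-spoke1 \
  --name snet-prod-1 --route-table rt-spoke1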


In this chapter, we will add a Virtual Network Gateway (VGW) to the topology and establish an IPsec VPN connection between the on-premises network edge router and the VGW. Additionally, we will deploy a new route table called "rt-gw-snet", in which we add routing entries for the spoke VNet prefixes with the next-hop IP address 10.0.1.6 (the ILB's frontend IP). We will also add a routing entry 10.3.0.0/16 > 10.0.1.6 to the existing route tables on vnet-spoke-1 and vnet-spoke-2 (not shown in Figure 6-1). This configuration ensures that both spoke-to-spoke and spoke-to-on-prem flows are directed through one of the Network Virtual Appliances (NVAs) via the ILB. The NVAs use the default route table, into which the VGW propagates all the routes learned from the VPN peers. However, we do not propagate routes from the default route table into the "rt-gw-snet" and "rt-prod-1" route tables. To enable the spoke VNets to use the VGW on the hub VNet, we allow it in the VNet peering configurations.
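
As a rough Azure CLI sketch of these additions (the resource group, route names, and subnet names are assumptions; the prefixes and the next hop 10.0.1.6 follow the chapter), the gateway-subnet route table and the new spoke route could be created like this:

# rt-gw-snet: spoke prefixes point to the ILB frontend IP; attached to the GatewaySubnet
az network route-table create --resource-group rg-nva --name rt-gw-snet
az network route-table route create --resource-group rg-nva --route-table-name rt-gw-snet \
  --name to-spoke1 --address-prefix 10.1.0.0/24 \
  --next-hop-type VirtualAppliance --next-hop-ip-address 10.0.1.6
az network route-table route create --resource-group rg-nva --route-table-name rt-gw-snet \
  --name to-spoke2 --address-prefix 10.2.0.0/24 \
  --next-hop-type VirtualAppliance --next-hop-ip-address 10.0.1.6
az network vnet subnet update --resource-group rg-nva --vnet-name vnet-hub \
  --name GatewaySubnet --route-table rt-gw-snet

# Spoke route tables: the on-prem prefix points to the ILB frontend IP
az network route-table route create --resource-group rg-nva --route-table-name rt-spoke1 \
  --name to-on-prem --address-prefix 10.3.0.0/16 \
  --next-hop-type VirtualAppliance --next-hop-ip-address 10.0.1.6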


  1. The administrator of the mgmt-pc opens an SSH session to vm-prod-1. The connection initiation begins with the TCP three-way handshake. The TCP SYN message is transmitted over the VPN connection to the Virtual Network Gateway (VGW) located on the vnet-hub. Upon receiving the message, the VGW first decrypts it and performs a routing lookup. The destination IP address, 10.1.0.4, matches the highlighted routing entry in the route table rt-gw-snet.
  2. The VGW determines the location (the IP address of the hosting server) of the next hop 10.0.1.6, encapsulates the message with tunnel headers, and forwards it to the Internal Load Balancer (ILB) using the destination IP address 10.0.1.6 in the tunnel header.
  3. The Internal Load Balancer receives the TCP SYN message. As the destination IP address in the tunnel header matches one of its frontend IPs, the ILB decapsulates the packet. It then checks which backend pool (BEP) is associated with the frontend IP (FIP) 10.0.1.6 to determine which VMs it can forward the TCP SYN message to. Using a hash algorithm (in our example, the 5-tuple), the ILB selects a VM from the backend pool members, in this case NVA2. The ILB performs a location lookup for NVA2's IP address 10.0.1.5, encapsulates the TCP SYN message with tunnel headers, and finally sends it to NVA2.
  4. The message reaches the hosting server of NVA2, which removes the encapsulation since the destination IP in the tunnel header belongs to it. Because the SYN flag is set in the TCP header, the packet is identified as the first packet of the flow, so there is no flow entry programmed into the Generic Flow Table (GFT) for this connection yet. The parser component generates a metadata file from the L3 and L4 headers of the message, which is then processed by the Virtual Filtering Platform (VFP) layers associated with NVA2. After the VFP processing, the TCP SYN message is passed to NVA2, and the GFT is updated with the flow information and associated actions (allow and encapsulation instructions). In addition, the VFP process creates a corresponding entry for the return packets in the GFT (reversed source and destination IPs and ports). Please refer to the first chapter for more detailed information on VFP processes.
  5. We do not have any pre-routing or post-routing policies configured on either NVA. As a result, NVA2 simply routes the traffic out of the eth0 interface based on its routing table. The ingress TCP SYN message has already been processed by the VFP layers, and the GFT has been updated accordingly. Consequently, the egress packet can be forwarded based on the GFT without the need for additional processing by the VFP layers.
  6. Subsequently, the encapsulated TCP SYN message is transmitted over VNet peering to vm-prod-1, located on vnet-spoke-1. Upon reaching the hosting server of vm-prod-1, the packet is processed in the same manner as we observed with NVA2: the encapsulation is removed, and the packet undergoes the same VFP processing steps as before.


Figure 6-1: ILB Example Topology.


Packet Walk: SSH Session Initiation – TCP SYN-ACK



Vm-prod-1 sends a TCP SYN-ACK message in response to the SYN message received from mgmt-pc. The TCP SYN-ACK message is transmitted from the vm-prod-1 virtual machine through its Ethernet0 interface to the virtual NIC. Since the incoming TCP SYN message triggered the creation of a GFT entry, the TCP SYN-ACK message can be forwarded without the need for Virtual Filtering Platform (VFP) processing. The TCP SYN-ACK message is encapsulated within the VXLAN tunnel header, with the outer destination IP address set to 10.0.1.6 (the Internal Load Balancer's frontend IP address). After a route table lookup, the encapsulated message is sent over the VNet peering connection to the Internal Load Balancer (ILB).

The Internal Load Balancer processes the TCP SYN-ACK message in the same manner as it did with the TCP SYN message and forwards it to NVA2.

When NVA2 received the incoming TCP SYN message, the VFP created a reversed flow entry alongside the actual flow entry in the Generic Flow Table (GFT), allowing the TCP SYN-ACK message to be forwarded without requiring Virtual Filtering Platform (VFP) processing. After the route table lookup, the TCP SYN-ACK packet is sent toward the VGW, which, in turn, encrypts the message and sends it to the on-prem VPN peer.


Figure 6-2: ICMP Reply from vm-prod-1 (vnet-spoke-1) to mgmt-pc.


Configuration and Verification


Figure 6-3 shows how the route to on-prem network 10.3.0.0/16 changes in the route table rt-spoke1 when we add a UDR and disable route propagation from the default route table. 

The first example shows the situation where we haven't added a UDR and route propagation from the default route table is allowed. The network 10.3.0.0/16 has been learned from the Virtual Network Gateway, and the next-hop IP address is the VGW's public IP.

The second route table example shows the effect of adding a User Defined Route (UDR) for the on-prem network 10.3.0.0/16 into the route table. The state of the UDR route is Active, while the propagated route is Invalidated.

The last route table output shows that after disabling route propagation on route table rt-spoke1, there is only a UDR route towards the on-prem network 10.3.0.0/16.
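
Disabling the propagation and checking the result could be done roughly as follows with the Azure CLI (the resource group and the NIC name of vm-prod-1 are assumptions):

# Stop propagating routes from the default route table (VGW-learned routes) into rt-spoke1
az network route-table update --resource-group rg-nva --name rt-spoke1 \
  --disable-bgp-route-propagation true

# Verify the effective routes seen by vm-prod-1's NIC
az network nic show-effective-route-table --resource-group rg-nva \
  --name vm-prod-1-nic --output table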


Figure 6-3: Route Changes in Route Table rt-spoke1 #1.


Figure 6-4 demonstrates how we can forward part of the on-prem traffic via the ILB with a UDR while still allowing route propagation from the default route table. In our example, only the traffic to mgmt-pc (10.3.0.4/32) is forwarded via the ILB and NVA, while all other on-prem traffic is sent straight to the VGW.
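
A sketch of such a host route with the Azure CLI (the resource group and route name are assumptions); because of longest-prefix matching, the /32 UDR wins over the propagated 10.3.0.0/16 route:

az network route-table route create --resource-group rg-nva --route-table-name rt-spoke1 \
  --name mgmt-pc-via-ilb --address-prefix 10.3.0.4/32 \
  --next-hop-type VirtualAppliance --next-hop-ip-address 10.0.1.6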


Figure 6-4: Route Changes in Route Table rt-spoke1 #2.

If you need to disable route propagation but still route some connections via the NVA and some straight to the VGW, you can add a UDR with the next-hop type Virtual Network Gateway.
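
For example, a UDR like the following (a sketch, names assumed) sends the rest of the on-prem prefix straight to the gateway; note that no next-hop IP address is given when the next-hop type is VirtualNetworkGateway:

az network route-table route create --resource-group rg-nva --route-table-name rt-spoke1 \
  --name on-prem-via-vgw --address-prefix 10.3.0.0/16 \
  --next-hop-type VirtualNetworkGateway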


Figure 6-5: Route Changes in Route Table rt-spoke1 #3.

Figures 6-6 and 6-7 show the effective routes used by NVA1 and NVA2. Both NVAs use the default route table, where the VGW has installed the route to the on-prem network 10.3.0.0/16.


Figure 6-6: Effective Routes on NVA1 (vNIC nva1388).



Figure 6-7: Effective Routes on NVA2 (vNIC nva2205).
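
The effective routes shown in Figures 6-6 and 6-7 can also be listed with the Azure CLI; a sketch, assuming the resource group rg-nva and the vNIC names from the figure captions:

az network nic show-effective-route-table --resource-group rg-nva --name nva1388 --output table
az network nic show-effective-route-table --resource-group rg-nva --name nva2205 --output table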

Figure 6-8 shows the UDRs we have added to the route table rt-vgw.
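
The same information can be listed with the Azure CLI (the resource group name is an assumption):

az network route-table route list --resource-group rg-nva --route-table-name rt-vgw --output table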


Figure 6-8: UDR in the Route Table rt-vgw.

To enable vnet-spoke1 and vnet-spoke2 to use the Virtual Network Gateway (VGW) on vnet-hub for connecting to the on-premises network, we allow its use in the VNet peering configurations between the hub and the spoke VNets (refer to Figure 6-9).
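
A sketch of the peering settings with the Azure CLI (the peering and resource group names are assumptions; the relevant properties are allowGatewayTransit on the hub side and useRemoteGateways on the spoke side):

# Hub side: allow the spokes to use the hub's Virtual Network Gateway
az network vnet peering update --resource-group rg-nva --vnet-name vnet-hub \
  --name hub-to-spoke1 --set allowGatewayTransit=true

# Spoke side: use the remote (hub) gateway for on-prem destinations
az network vnet peering update --resource-group rg-nva --vnet-name vnet-spoke1 \
  --name spoke1-to-hub --set useRemoteGateways=true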


Figure 6-9: VNet Peering Configuration.


Data Plane Testing Using Ping



The following four examples verify that the data paths between the on-prem network 10.3.0.0/16 and the spoke VNets go through the Network Virtual Appliances (NVAs). The Internal Load Balancer (ILB) directs the ICMP Request and Reply packets between mgmt-pc (10.3.0.4) and vm-prod-1 (10.1.0.4) to NVA2 (Example 6-3), while the data path between mgmt-pc and vm-prod-2 (10.2.0.4) goes through NVA1 (Example 6-4).

azureuser@mgmt-pc:~$ ping 10.1.0.4 -c2
PING 10.1.0.4 (10.1.0.4) 56(84) bytes of data.
64 bytes from 10.1.0.4: icmp_seq=1 ttl=63 time=6.03 ms
64 bytes from 10.1.0.4: icmp_seq=2 ttl=63 time=9.12 ms

--- 10.1.0.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 6.028/7.575/9.122/1.547 ms
azureuser@mgmt-pc:~$ 
Example 6-1: Ping from mgmt-pc to vm-prod-1 – After Routing Changes.

azureuser@mgmt-pc:~$ ping 10.2.0.4 -c2
PING 10.2.0.4 (10.2.0.4) 56(84) bytes of data.
64 bytes from 10.2.0.4: icmp_seq=1 ttl=63 time=6.72 ms
64 bytes from 10.2.0.4: icmp_seq=2 ttl=63 time=6.35 ms

--- 10.2.0.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 6.347/6.532/6.717/0.185 ms
azureuser@mgmt-pc:~$ 
Example 6-2: Ping from mgmt-pc to vm-prod-2 – After Routing Changes.

azureuser@nva2:~$ sudo tcpdump -i eth0 host 10.1.0.4 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:28:53.379296 IP 10.3.0.4 > 10.1.0.4: ICMP echo request, id 19, seq 1, length 64
14:28:53.379333 IP 10.3.0.4 > 10.1.0.4: ICMP echo request, id 19, seq 1, length 64
14:28:53.380698 IP 10.1.0.4 > 10.3.0.4: ICMP echo reply, id 19, seq 1, length 64
14:28:53.380707 IP 10.1.0.4 > 10.3.0.4: ICMP echo reply, id 19, seq 1, length 64
14:28:54.380462 IP 10.3.0.4 > 10.1.0.4: ICMP echo request, id 19, seq 2, length 64
14:28:54.380497 IP 10.3.0.4 > 10.1.0.4: ICMP echo request, id 19, seq 2, length 64
14:28:54.383143 IP 10.1.0.4 > 10.3.0.4: ICMP echo reply, id 19, seq 2, length 64
14:28:54.383153 IP 10.1.0.4 > 10.3.0.4: ICMP echo reply, id 19, seq 2, length 64
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel
azureuser@nva2:~$ 
Example 6-3: Tcpdump from NVA2.
azureuser@nva1:~$ sudo tcpdump -i eth0 host 10.2.0.4 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:47:37.812395 IP 10.3.0.4 > 10.2.0.4: ICMP echo request, id 34, seq 1, length 64
14:47:37.812430 IP 10.3.0.4 > 10.2.0.4: ICMP echo request, id 34, seq 1, length 64
14:47:37.814140 IP 10.2.0.4 > 10.3.0.4: ICMP echo reply, id 34, seq 1, length 64
14:47:37.814155 IP 10.2.0.4 > 10.3.0.4: ICMP echo reply, id 34, seq 1, length 64
14:47:38.814187 IP 10.3.0.4 > 10.2.0.4: ICMP echo request, id 34, seq 2, length 64
14:47:38.814219 IP 10.3.0.4 > 10.2.0.4: ICMP echo request, id 34, seq 2, length 64
14:47:38.815583 IP 10.2.0.4 > 10.3.0.4: ICMP echo reply, id 34, seq 2, length 64
14:47:38.815591 IP 10.2.0.4 > 10.3.0.4: ICMP echo reply, id 34, seq 2, length 64
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel
azureuser@nva1:~$ 
Example 6-4: Tcpdump from NVA1.

The following two examples verify that the data path between vnet-spoke-1 and vnet-spoke-2 goes through NVA1.

azureuser@vm-prod-1:~$ ping 10.2.0.4  -c2
PING 10.2.0.4 (10.2.0.4) 56(84) bytes of data.
64 bytes from 10.2.0.4: icmp_seq=1 ttl=63 time=3.16 ms
64 bytes from 10.2.0.4: icmp_seq=2 ttl=63 time=2.42 ms

--- 10.2.0.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 2.417/2.790/3.164/0.373 ms
azureuser@vm-prod-1:~$
Example 6-5: Ping from vm-prod-1 to vm-prod-2 on vnet-spoke-2.

azureuser@nva1:~$ sudo tcpdump -i eth0 host 10.1.0.4 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:02:10.216219 IP 10.1.0.4 > 10.2.0.4: ICMP echo request, id 6, seq 1, length 64
15:02:10.216260 IP 10.1.0.4 > 10.2.0.4: ICMP echo request, id 6, seq 1, length 64
15:02:10.217646 IP 10.2.0.4 > 10.1.0.4: ICMP echo reply, id 6, seq 1, length 64
15:02:10.217656 IP 10.2.0.4 > 10.1.0.4: ICMP echo reply, id 6, seq 1, length 64
15:02:11.217820 IP 10.1.0.4 > 10.2.0.4: ICMP echo request, id 6, seq 2, length 64
15:02:11.217854 IP 10.1.0.4 > 10.2.0.4: ICMP echo request, id 6, seq 2, length 64
15:02:11.218643 IP 10.2.0.4 > 10.1.0.4: ICMP echo reply, id 6, seq 2, length 64
15:02:11.218652 IP 10.2.0.4 > 10.1.0.4: ICMP echo reply, id 6, seq 2, length 64
^C
8 packets captured
8 packets received by filter
0 packets dropped by kernel
azureuser@nva1:~$
Example 6-6: Tcpdump from NVA1.

 

References



[1] Daniel Firestone et al., “VFP: A Virtual Switch Platform for Host SDN in the Public Cloud”, 2017

[2] Deploy highly available NVAs
https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/dmz/nva-ha


