Monday 3 April 2023

Routing in Azure Subnets

Introduction

Subnets, which correspond to Virtual Local Area Networks (VLANs) in traditional networking, are Layer-2 broadcast domains that let attached workloads communicate without crossing a Layer-3 boundary, the subnet gateway. Hosts sharing the same subnet resolve each other's IP-to-MAC address bindings using the Address Resolution Protocol (ARP), which relies on broadcast messages. Because broadcast traffic reaches every host in the subnet, a fault such as a broadcast storm can affect all of them, which is why subnets are often described as failure domains. We can extend subnets between physical devices over Layer-2 links using VLAN tagging, defined in the IEEE 802.1Q standard. In addition, tunnel encapsulation solutions that carry a tenant/context identifier enable us to extend subnets over a Layer-3 infrastructure. Virtual eXtensible LAN (VXLAN), which uses a 24-bit VXLAN Network Identifier (VNI), and Network Virtualization using Generic Routing Encapsulation (NVGRE), which uses a 24-bit Virtual Subnet ID (VSID), are examples of Network Virtualization over Layer 3 (NVO3) solutions [RFC 8014]. If you have to extend a subnet over an MPLS-enabled network, you can implement Virtual Private LAN Service (VPLS) or Virtual Private Wire Service (VPWS), among other solutions.
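To make the tenant identifier concrete, the Python sketch below packs and parses the 8-byte VXLAN header defined in RFC 7348: a flags byte with the I bit (0x08) set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. It is a minimal illustration only. The function names and the VNI value 5001 are hypothetical, and the outer IP/UDP headers (UDP destination port 4789) are not built here.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" flag: set when the VNI field is valid (RFC 7348)


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits + 24 reserved bits; word 2: 24-bit VNI + 8 reserved bits
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Return the VNI carried in an 8-byte VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set: the VNI is not valid")
    return vni_word >> 8


if __name__ == "__main__":
    header = build_vxlan_header(5001)  # 5001 is an arbitrary example VNI
    print(header.hex())                # 0800000000138900
    print(parse_vni(header))           # 5001
```

Because the VNI is 24 bits wide, a single VXLAN fabric can separate roughly 16 million segments, compared with the 4094 usable IDs of a 12-bit 802.1Q VLAN tag.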

In Azure, the concept of a subnet is different. You can think of it as a logical domain within a Virtual Network (VNet), where attached VMs share the same IP address range and the same routing policies. Azure VNets do not natively support broadcast or multicast traffic [2], [3]. However, you can use the cloudSwXtch VM image from swXtch.io [4] to build a multicast-enabled overlay network within a VNet.

Default Routing in Virtual Network

This section demonstrates how routing between subnets within the same Virtual Network (VNet) works by default. Figure 2-1 illustrates our example Azure VNet setup, in which we have deployed two subnets. Interface eth0 of vm-west and interface eth1 of vm-nva-fw are attached to subnet snet-west (10.0.0.0/24), while interface eth2 of vm-nva-fw and interface eth0 of vm-east are attached to subnet snet-east (10.0.1.0/24). All three VMs use the VNet default routing policy, which routes Intra-VNet data flows directly between the source and destination endpoints, regardless of which subnets they are connected to. In addition, the Network Security Groups (NSGs) associated with the vNICs share the same default security policies, which allow inbound and outbound Intra-VNet data flows, inbound flows from the Azure Load Balancer, and outbound Internet connections.
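The addressing of this topology can be sanity-checked with Python's standard `ipaddress` module. The sketch below assumes a VNet address space of 10.0.0.0/16 (the VnetLocal prefix shown in Example 2-2) and the interface addresses used throughout this post (vm-west 10.0.0.4, vm-nva-fw eth1 10.0.0.5 and eth2 10.0.1.5, vm-east 10.0.1.4). It also encodes Azure's rule of reserving five addresses in every subnet (the network address, .1 for the default gateway, .2 and .3 for Azure DNS, and the broadcast address), which is why the first VM in each subnet receives a .4 address. The helper names are hypothetical.

```python
import ipaddress

VNET = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "snet-west": ipaddress.ip_network("10.0.0.0/24"),
    "snet-east": ipaddress.ip_network("10.0.1.0/24"),
}
# (VM, interface) -> private IP, as used in this post's examples
INTERFACES = {
    ("vm-west", "eth0"): "10.0.0.4",
    ("vm-nva-fw", "eth1"): "10.0.0.5",
    ("vm-nva-fw", "eth2"): "10.0.1.5",
    ("vm-east", "eth0"): "10.0.1.4",
}


def azure_reserved(subnet: ipaddress.IPv4Network) -> set:
    """Network address, .1 (default gateway), .2 and .3 (Azure DNS), and broadcast."""
    base = subnet.network_address
    return {base, base + 1, base + 2, base + 3, subnet.broadcast_address}


def subnet_of(ip: str) -> str:
    """Return the name of the subnet that contains ip."""
    addr = ipaddress.ip_address(ip)
    for name, net in SUBNETS.items():
        if addr in net:
            return name
    raise LookupError(f"{ip} is not in any subnet")


# Every subnet must be carved out of the VNet address space
assert all(net.subnet_of(VNET) for net in SUBNETS.values())

if __name__ == "__main__":
    for (vm, nic), ip in INTERFACES.items():
        name = subnet_of(ip)
        usable = ipaddress.ip_address(ip) not in azure_reserved(SUBNETS[name])
        print(f"{vm} {nic}: {ip} in {name}, assignable: {usable}")
```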

Now let's look at what happens when vm-west (DIP: 10.0.0.4) pings vm-east (DIP: 10.0.1.4), recapping the operation of VFP. Note that Accelerated Networking (AccelNet) is not enabled on either VM.

  1. The VM vm-west sends an ICMP Request message to vm-east. The packet arrives at the Virtual Filtering Platform (VFP) for processing. Since this is the first packet of the flow, the Flow Identifier and associated Actions are not in the Unified Flow Table (UFT). The Parser component extracts the 5-tuple header information (source IP, source port, destination IP, destination port, and transport protocol) as metadata from the original packet. The metadata is then processed in each VFP layer to generate a flow-based entry in the UFT.
  2. The destination IP address matches the Network Security Group's (NSG) default outbound rule, which allows Intra-VNet flows. Then the metadata is passed on to the routing process. Since we haven't yet deployed subnet-specific route tables, the result of the next-hop route lookup is 3.3.3.3, the Provider Address (PA) of Host-C.
  3. Intra-VNet connections use private IP addresses (DIP, Direct IP), so the VFP process bypasses the NAT layer. The VNet layer, responsible for encapsulation/decapsulation, constructs the tunnel headers (IP/UDP/VXLAN). It builds the outer IP header with the source IP 1.1.1.1 (Host-A) and the destination IP 3.3.3.3 (Host-C), as resolved by the Routing layer. In addition, it writes the VXLAN Network Identifier (VNI) into the VXLAN header.
  4. After each layer has processed the metadata, the result is written to the Unified Flow Table (UFT) as a flow entry (Flow-Id) with a push (encapsulation) action.
  5. The Header Transposition (HT) engine modifies the original packet based on the UFT actions. It adds the tunnel headers while leaving all original header information intact. Finally, the modified packet is transmitted to the upstream switch. Subsequent packets of the flow match the UFT entry and are processed without walking through the VFP layers again.
  6. The Azure switching infra forwards the packet based on the destination IP address on the outer IP header (tunnel header).
  7. The VFP on Host-C processes the ingress ICMP Request message in the same manner as the VFP on Host-A, but in reverse order, starting with decapsulation in the VNet layer.

Figure 2-1: Example Topology Diagram.
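Steps 1 to 5 above describe a slow path for the first packet of a flow and a fast path for the rest. The Python sketch below models that split as a flow cache keyed by the 5-tuple. It is a conceptual illustration, not VFP's implementation. The class and layer functions are hypothetical, the NSG check is reduced to a prefix test, and ICMP's missing ports are represented as 0.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


Action = Tuple[str, str]                     # (action type, detail)
Layer = Callable[[FiveTuple], List[Action]]  # each layer contributes actions

# Hypothetical CA -> PA mapping on Host-A, using the addresses from steps 1-3
CA_TO_PA = {"10.0.1.4": "3.3.3.3"}
LOCAL_PA = "1.1.1.1"


def nsg_layer(ft: FiveTuple) -> List[Action]:
    # Simplified stand-in for the default AllowVnetOutBound rule
    if not ft.dst_ip.startswith("10.0."):
        raise PermissionError(f"{ft.dst_ip} denied by NSG")
    return [("allow", "AllowVnetOutBound")]


def routing_layer(ft: FiveTuple) -> List[Action]:
    # No UDRs yet: Intra-VNet destinations use the VnetLocal system route
    return [("route", "VnetLocal")]


def vnet_layer(ft: FiveTuple) -> List[Action]:
    # Push an outer IP/UDP/VXLAN header toward the destination host's PA
    return [("encap", f"{LOCAL_PA}->{CA_TO_PA[ft.dst_ip]}")]


class FlowProcessor:
    def __init__(self, layers: List[Layer]):
        self.layers = layers
        self.uft: Dict[FiveTuple, List[Action]] = {}
        self.slow_path_count = 0

    def process(self, ft: FiveTuple) -> List[Action]:
        if ft in self.uft:          # fast path: reuse the cached actions
            return self.uft[ft]
        self.slow_path_count += 1   # slow path: the first packet walks every layer
        actions: List[Action] = []
        for layer in self.layers:
            actions.extend(layer(ft))
        self.uft[ft] = actions      # store the combined actions as the flow entry
        return actions


if __name__ == "__main__":
    vfp = FlowProcessor([nsg_layer, routing_layer, vnet_layer])
    ping = FiveTuple("10.0.0.4", 0, "10.0.1.4", 0, "icmp")  # 0 as a port placeholder
    for _ in range(3):
        print(vfp.process(ping))
    print("slow-path lookups:", vfp.slow_path_count)  # 1
```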

The traceroute taken from vm-west to 10.0.1.5 in snet-east verifies that, from the virtual machine's perspective, there are no additional IP hops between the subnets.

azureuser@vm-west:~$ traceroute 10.0.1.5
traceroute to 10.0.1.5 (10.0.1.5), 30 hops max, 60 byte packets
 1  10.0.1.5 (10.0.1.5)  1.220 ms  1.201 ms  1.187 ms
azureuser@vm-west:~$

Example 2-1: Traceroute from vm-west to vm-east.

Examples 2-2 (Azure CLI) and 2-3 (PowerShell) show the effective routes on vNIC vm-west747. The only next hops are for Intra-VNet and Internet traffic.


az network nic show-effective-route-table `
 -g rg-nwkt-demo `
 -n vm-west747  `
 -o table 

Source    State    Address Prefix    Next Hop Type     Next Hop IP
--------  -------  ----------------  ----------------  -------------
Default   Active   10.0.0.0/16       VnetLocal
Default   Active   0.0.0.0/0         Internet
**Azure routes snipped for brevity**

Example 2-2: Effective Routes on vNIC vm-west747 – Azure CLI.


Get-AzEffectiveRouteTable -NetworkInterfaceName vm-west747 -ResourceGroupName rg-nwkt-demo | Format-Table -Property AddressPrefix, NextHopType, NextHopIpAddress -AutoSize

AddressPrefix    NextHopType      NextHopIpAddress
-------------    -----------      ----------------
{10.0.0.0/16}    VnetLocal        {}
{0.0.0.0/0}      Internet         {}
**Azure routes snipped for brevity**

Example 2-3: Effective Routes on vNIC vm-west747 - PowerShell.

Route Traffic through the Network Virtual Appliance (NVA)



This section explains how you can deploy subnet-specific route tables (RT) with User Defined Routes (UDR) to steer data flows through your Network Virtual Appliance (NVA) [5]. Figure 2-2 shows the first two steps to route data packets between vm-west and vm-east through vm-nva-fw. We start by creating a new route table named rt-west and adding a route to the subnet 10.0.1.0/24 with the next-hop IP 10.0.0.5 (interface eth1 of vm-nva-fw). Then we associate the subnet snet-west with the route table rt-west. From then on, all interfaces attached to snet-west use the new route. Likewise, we deploy a new route table named rt-east with a route to the subnet 10.0.0.0/24 with the next-hop IP 10.0.1.5 (interface eth2 of vm-nva-fw), and associate the subnet snet-east with it.

Figure 2-2: Routing Data Flows through the NVA.
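The combined effect of rt-west and rt-east can be modelled as a per-subnet lookup. A packet uses the route table associated with the subnet it is sent from, and falls back to the VnetLocal system route (direct delivery) when no UDR matches. The Python sketch below uses the prefixes and next hops from this section. The helper names are hypothetical, and the model only covers the routes shown here. It shows that both directions of the flow are steered through vm-nva-fw, which matters because a stateful firewall has to see both the request and the reply.

```python
import ipaddress
from typing import List

SUBNETS = {"snet-west": "10.0.0.0/24", "snet-east": "10.0.1.0/24"}
# UDRs from this section: route table -> [(destination prefix, next-hop IP)]
ROUTE_TABLES = {
    "rt-west": [("10.0.1.0/24", "10.0.0.5")],  # via vm-nva-fw eth1
    "rt-east": [("10.0.0.0/24", "10.0.1.5")],  # via vm-nva-fw eth2
}
ASSOCIATIONS = {"snet-west": "rt-west", "snet-east": "rt-east"}
# Interface vm-nva-fw uses to leave toward each subnet (eth1, eth2)
NVA_EGRESS = {"snet-west": "10.0.0.5", "snet-east": "10.0.1.5"}


def subnet_of(ip: str) -> str:
    addr = ipaddress.ip_address(ip)
    for name, prefix in SUBNETS.items():
        if addr in ipaddress.ip_network(prefix):
            return name
    raise LookupError(f"{ip} is not in any subnet")


def next_hop(src_ip: str, dst_ip: str) -> str:
    """Apply the route table of the sender's subnet; fall back to VnetLocal (direct)."""
    table = ASSOCIATIONS.get(subnet_of(src_ip))
    dst = ipaddress.ip_address(dst_ip)
    for prefix, hop in ROUTE_TABLES.get(table, []):
        if dst in ipaddress.ip_network(prefix):
            return hop
    return dst_ip


def path(src_ip: str, dst_ip: str) -> List[str]:
    """Return the L3 hops a packet visits, as traceroute would list them."""
    hops: List[str] = []
    hop = next_hop(src_ip, dst_ip)
    for _ in range(30):  # loop guard, matching traceroute's default of 30 hops
        if hop == dst_ip:
            return hops + [dst_ip]
        hops.append(hop)  # in this topology, any intermediate hop is vm-nva-fw
        hop = next_hop(NVA_EGRESS[subnet_of(dst_ip)], dst_ip)
    raise RuntimeError("routing loop")


if __name__ == "__main__":
    print(path("10.0.0.4", "10.0.1.4"))  # ['10.0.0.5', '10.0.1.4']
    print(path("10.0.1.4", "10.0.0.4"))  # ['10.0.1.5', '10.0.0.4']
```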


Create Route Table


Create a new route table using the Azure CLI command “az network route-table create”, specifying its name and the resource group it belongs to. You can enter the command on one line, or break it into multiple lines for readability with the backtick (`). The backtick is PowerShell's line-continuation character and must be the last character on the line, with not even a trailing space after it. In Bash, use a backslash instead. Leave the backtick off the last line and press Enter to deploy the new route table.

az network route-table create `
  --resource-group rg-nwkt-demo `
  --name rt-west

Example 2-4: Create Route Table. 

Once the route table is provisioned, the command returns the new resource in JSON format, confirming the successful deployment.
{
  "disableBgpRoutePropagation": false,
  "etag": "W/\"ee9ae1b2-53d3-4e95-afe0-9a3d0ab804d1\"",
  "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Network/routeTables/rt-west",
  "location": "swedencentral",
  "name": "rt-west",
  "provisioningState": "Succeeded",
  "resourceGroup": "rg-nwkt-demo",
  "resourceGuid": "9638aff9-acf1-4775-9d00-28d7b492a76b",
  "routes": [],
  "type": "Microsoft.Network/routeTables"
}

Example 2-5: Feedback about Provisioning State – Route Table.

Add Routing Entry to Route Table


Use the Azure CLI command “az network route-table route create” to add a route to the route table. You need to specify the name of the route, the resource group, and the route table in which you are creating it. In addition, define the destination address prefix, the next-hop type, and the next-hop IP address.


az network route-table route create `
  --name rt-west-vm-nva-fw `
  --resource-group rg-nwkt-demo `
  --route-table-name rt-west `
  --address-prefix 10.0.1.0/24 `
  --next-hop-type VirtualAppliance `
  --next-hop-ip-address 10.0.0.5

Example 2-6: Add Route to Route Table. 

The example below shows the confirmation you get after the successful deployment.

{
  "addressPrefix": "10.0.1.0/24",
  "etag": "W/\"5581a5cc-1ecd-4f80-bae9-577be73c7b4a\"",
  "hasBgpOverride": false,
  "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Network/routeTables/rt-west/routes/rt-west-vm-nva-fw",
  "name": "rt-west-vm-nva-fw",
  "nextHopIpAddress": "10.0.0.5",
  "nextHopType": "VirtualAppliance",
  "provisioningState": "Succeeded",
  "resourceGroup": "rg-nwkt-demo",
  "type": "Microsoft.Network/routeTables/routes"
}

Example 2-7: Feedback about Provisioning State – New Routing Entry.

Associate Route Table with Subnet


Use the Azure CLI command “az network vnet subnet update” to associate the route table with a subnet. Specify the VNet, the target subnet, and the resource group, along with the route table to associate.


az network vnet subnet update `
  --vnet-name vnet-nwkt `
  --name snet-west `
  --resource-group rg-nwkt-demo `
  --route-table rt-west

Example 2-8: Associate Subnet with Route Table.


The example below shows the confirmation you get after the successful deployment.
                                                                                

{
  "addressPrefix": "10.0.0.0/24",
  "delegations": [],
  "etag": "W/\"1962eb46-d13d-449d-acf8-9b4bed8041b1\"",
  "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Network/virtualNetworks/vnet-nwkt/subnets/snet-west",
  "ipConfigurations": [
    {
      "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Network/networkInterfaces/vm-west747/ipConfigurations/ipconfig1",
      "resourceGroup": "rg-nwkt-demo"
    },
    {
      "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Network/networkInterfaces/vm-nva-nic-west/ipConfigurations/ipconfig1",
      "resourceGroup": "rg-nwkt-demo"
    }
  ],
  "name": "snet-west",
  "privateEndpointNetworkPolicies": "Disabled",
  "privateLinkServiceNetworkPolicies": "Enabled",
  "provisioningState": "Succeeded",
  "resourceGroup": "rg-nwkt-demo",
  "routeTable": {
    "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Network/routeTables/rt-west",
    "resourceGroup": "rg-nwkt-demo"
  },
  "serviceEndpoints": [],
  "type": "Microsoft.Network/virtualNetworks/subnets"
}

Example 2-9: Feedback about Provisioning State – Subnet to Route Table Association.
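When scripting these steps, it can help to check the returned JSON programmatically rather than by eye. The Python sketch below assumes that the output of “az network vnet subnet update” is piped into it or captured as a string. The helper name and the script name summarize_subnet.py are hypothetical. The sketch verifies the provisioning state, extracts the associated route table, and derives the vNIC names from the ipConfigurations IDs. Those vNICs are the ones that pick up the new route.

```python
import json
import sys


def summarize_subnet(subnet_json: str) -> dict:
    """Summarize 'az network vnet subnet update' output (hypothetical helper)."""
    subnet = json.loads(subnet_json)
    state = subnet.get("provisioningState")
    if state != "Succeeded":
        raise RuntimeError(f"subnet provisioning state is {state}")
    route_table_id = (subnet.get("routeTable") or {}).get("id", "")
    nics = []
    for ip_config in subnet.get("ipConfigurations") or []:
        # .../networkInterfaces/<nic-name>/ipConfigurations/<config-name>
        parts = ip_config["id"].split("/")
        if "networkInterfaces" in parts:
            nics.append(parts[parts.index("networkInterfaces") + 1])
    return {
        "subnet": subnet["name"],
        "route_table": route_table_id.rsplit("/", 1)[-1] or None,
        "nics": nics,
    }


if __name__ == "__main__":
    # Example: az network vnet subnet update ... | python summarize_subnet.py
    print(summarize_subnet(sys.stdin.read()))
```

For the output in Example 2-9, the summary lists route table rt-west and the vNICs vm-west747 and vm-nva-nic-west.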

Once subnet snet-west is associated with route table rt-west, vNICs vm-west747 (vm-west) and vm-nva-nic-west (vm-nva-fw) start using the route. To confirm which routes are effective on a vNIC, use the Azure CLI command “az network nic show-effective-route-table”.

az network nic show-effective-route-table `
  -g rg-nwkt-demo `
  -n vm-west747 `
  -o table 

Source    State    Address Prefix    Next Hop Type     Next Hop IP
--------  -------  ----------------  ----------------  -------------
Default   Active   10.0.0.0/16       VnetLocal
Default   Active   0.0.0.0/0         Internet
User      Active   10.0.1.0/24       VirtualAppliance  10.0.0.5

Example 2-10: vNIC Effective Routes Verification.
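Example 2-10 contains two routes that both cover vm-east's address 10.0.1.4: the 10.0.0.0/16 VnetLocal system route and the 10.0.1.0/24 user-defined route. Azure selects a route by longest prefix match. Only when two routes have the identical prefix does it prefer a user-defined route over a BGP route, and a BGP route over a system route. The Python sketch below reproduces that selection on the entries of Example 2-10. The `Route` tuple layout is an assumption made for illustration, and the "BGP" source label is a simplification.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    source: str                      # "User", "BGP" (simplified label) or "Default"
    prefix: str
    next_hop_type: str
    next_hop_ip: Optional[str] = None


# Tie-break for identical prefixes: user-defined > BGP > system (Default)
SOURCE_PRIORITY = {"User": 0, "BGP": 1, "Default": 2}

# The routes listed in Example 2-10
EFFECTIVE_ROUTES = [
    Route("Default", "10.0.0.0/16", "VnetLocal"),
    Route("Default", "0.0.0.0/0", "Internet"),
    Route("User", "10.0.1.0/24", "VirtualAppliance", "10.0.0.5"),
]


def select_route(routes: List[Route], destination: str) -> Route:
    dst = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dst in ipaddress.ip_network(r.prefix)]
    if not candidates:
        raise LookupError(f"no route to {destination}")
    # Longest prefix wins; the source priority only breaks ties between equal prefixes
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen, SOURCE_PRIORITY[r.source]),
    )


if __name__ == "__main__":
    for ip in ("10.0.1.4", "10.0.0.5", "8.8.8.8"):
        route = select_route(EFFECTIVE_ROUTES, ip)
        print(ip, "->", route.next_hop_type, route.next_hop_ip or "")
```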

Enable IP Forwarding

At this stage, we have created two route tables, rt-west and rt-east, which define how traffic between subnets snet-west and snet-east is forwarded through vm-nva-fw. Next, we enable IP forwarding on the vNICs attached to vm-nva-fw. This allows them to receive packets that are not destined to one of their own IP addresses, and to send packets with a source IP address other than their own [1]. In addition, we need to enable IP forwarding in Linux so that the operating system acts as a router, forwarding traffic from one network interface to another. By default, IP forwarding is disabled on Linux to prevent the system from accidentally or maliciously becoming a router.



Figure 2-3: Enable IP Forwarding and Update NVA Route Table.


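To enable IP forwarding on a vNIC, use the Azure CLI command “az network nic update”. Example 2-11 shows the command for vm-nva-nic-west; repeat it for the vm-nva-fw vNIC attached to snet-east.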

az network nic update `
  --name vm-nva-nic-west `
  --resource-group rg-nwkt-demo `
  --ip-forwarding true

Example 2-11: Enable IP Forwarding on vNIC. 

The example below shows the confirmation you get after enabling IP forwarding.

{
  "dnsSettings": {
    "appliedDnsServers": [],
    "dnsServers": [],
    "internalDomainNameSuffix": "zqbofd2lfdau1a0inekbgorlfc.gvxx.internal.cloudapp.net"
  },
  "enableAcceleratedNetworking": false,
  "enableIPForwarding": true,
  "etag": "W/\"b1ca1d1d-af70-4af8-8cb3-c9844a7fb72b\"",
  "hostedWorkloads": [],
  "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Network/networkInterfaces/vm-nva-nic-west",
  "ipConfigurations": [
    {
      "etag": "W/\"b1ca1d1d-af70-4af8-8cb3-c9844a7fb72b\"",
      "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Network/networkInterfaces/vm-nva-nic-west/ipConfigurations/ipconfig1",
      "name": "ipconfig1",
      "primary": true,
      "privateIPAddress": "10.0.0.5",
      "privateIPAddressVersion": "IPv4",
      "privateIPAllocationMethod": "Dynamic",
      "provisioningState": "Succeeded",
      "resourceGroup": "rg-nwkt-demo",
      "subnet": {
        "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Network/virtualNetworks/vnet-nwkt/subnets/snet-west",
        "resourceGroup": "rg-nwkt-demo"
      },
      "type": "Microsoft.Network/networkInterfaces/ipConfigurations"
    }
  ],
  "location": "swedencentral",
  "macAddress": "60-45-BD-AB-C6-07",
  "name": "vm-nva-nic-west",
  "networkSecurityGroup": {
    "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Network/networkSecurityGroups/basicNsgvm-nva-nic-west",
    "resourceGroup": "rg-nwkt-demo"
  },
  "nicType": "Standard",
  "primary": false,
  "provisioningState": "Succeeded",
  "resourceGroup": "rg-nwkt-demo",
  "resourceGuid": "244d3323-1ec4-40cd-8963-a52ce1c913dd",
  "tapConfigurations": [],
  "type": "Microsoft.Network/networkInterfaces",
  "virtualMachine": {
    "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Compute/virtualMachines/vm-nva-fw",
    "resourceGroup": "rg-nwkt-demo"
  },
  "vnetEncryptionSupported": false
}

Example 2-12: Feedback about Provisioning State – vNIC Update. 

Enable IP Forwarding on Linux NVA


You can change the IP forwarding behaviour on Linux by changing the value of the kernel parameter net.ipv4.ip_forward, which is exposed as the file /proc/sys/net/ipv4/ip_forward. The example below shows that IP forwarding is disabled by default.


azureuser@vm-nva-fw:~$ sudo sysctl -a |grep ip_forward
net.ipv4.ip_forward = 0
net.ipv4.ip_forward_update_priority = 1
net.ipv4.ip_forward_use_pmtu = 0

Example 2-13: Verify the State of “IP-Forward” on Linux.

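To enable IP forwarding, use the Linux command “sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward"”. The echo command writes the value 1 to the file /proc/sys/net/ipv4/ip_forward. The sh -c wrapper is needed because the output redirection must also run with root privileges. A value set this way lasts only until the next reboot; to make it persistent, add net.ipv4.ip_forward=1 to /etc/sysctl.conf or to a file under /etc/sysctl.d/.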


azureuser@vm-nva-fw:~$ sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward"

azureuser@vm-nva-fw:~$ sudo sysctl -a |grep ip_forward
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_update_priority = 1
net.ipv4.ip_forward_use_pmtu = 0

Example 2-14: Enable “IP-Forward” using Linux Shell. 
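If you prefer to check or change the flag from a script, you can read and write the same /proc file directly. The Python sketch below is a minimal example with hypothetical function names. The path is a parameter so that the functions can be exercised against an ordinary file. Writing the real /proc entry requires root, just like the echo command in Example 2-14.

```python
from pathlib import Path

IP_FORWARD_PATH = Path("/proc/sys/net/ipv4/ip_forward")


def ip_forward_enabled(path: Path = IP_FORWARD_PATH) -> bool:
    """Return True when the kernel forwards IPv4 packets (the file contains 1)."""
    return path.read_text().strip() == "1"


def set_ip_forward(enabled: bool, path: Path = IP_FORWARD_PATH) -> None:
    """Same effect as: echo 1 > /proc/sys/net/ipv4/ip_forward (root needed for /proc)."""
    path.write_text("1\n" if enabled else "0\n")


if __name__ == "__main__":
    print("net.ipv4.ip_forward =", int(ip_forward_enabled()))
```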

Alternatively, you can configure the Linux NVA from the Azure CLI with the command “az vm extension set”, which runs a command inside the VM through the Custom Script extension. The example below shows how to enable IP forwarding this way.


az vm extension set `
  --resource-group rg-nwkt-demo `
  --vm-name vm-nva-fw `
  --name customScript `
  --publisher Microsoft.Azure.Extensions `
  --settings '{"commandToExecute":"sudo sysctl -w net.ipv4.ip_forward=1"}'

Example 2-16: Enable “IP-Forward” using Azure CLI. 

Example 2-17 shows the confirmation you get after enabling IP forwarding on Linux using the Azure CLI.

{
  "autoUpgradeMinorVersion": true,
  "enableAutomaticUpgrade": null,
  "forceUpdateTag": null,
  "id": "/subscriptions/**snipped**/resourceGroups/rg-nwkt-demo/providers/Microsoft.Compute/virtualMachines/vm-nva-fw/extensions/customScript",
  "instanceView": null,
  "location": "swedencentral",
  "name": "customScript",
  "protectedSettings": null,
  "protectedSettingsFromKeyVault": null,
  "provisioningState": "Succeeded",
  "publisher": "Microsoft.Azure.Extensions",
  "resourceGroup": "rg-nwkt-demo",
  "settings": {
    "commandToExecute": "sudo sysctl -w net.ipv4.ip_forward=1"
  },
  "suppressFailures": null,
  "tags": null,
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "typeHandlerVersion": "2.1",
  "typePropertiesType": "customScript"
}
Example 2-17: Feedback about Provisioning State – IP Forward.

Data Plane Testing


The ping in Example 2-18 shows that we have IP connectivity between vm-west and vm-east.

azureuser@vm-west:~$ ping 10.0.1.4 -c2
PING 10.0.1.4 (10.0.1.4) 56(84) bytes of data.
64 bytes from 10.0.1.4: icmp_seq=1 ttl=63 time=3.22 ms
64 bytes from 10.0.1.4: icmp_seq=2 ttl=63 time=3.16 ms

--- 10.0.1.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 3.157/3.190/3.224/0.033 ms

Example 2-18: IP Connectivity Test Using ICMP. 
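The ttl=63 field in Example 2-18 can be read as a hop counter. Assuming vm-east sends its replies with the Linux default initial TTL of 64, the arithmetic is 64 - 63 = 1 routed hop, which is vm-nva-fw. The hypothetical Python helpers below generalize that reasoning by rounding the observed TTL up to the nearest common initial value (64, 128, or 255). This is a heuristic, because the initial TTL is configurable on every operating system.

```python
import re

# Typical initial TTLs: Linux/Unix (64), Windows (128), many network devices (255)
COMMON_INITIAL_TTLS = (64, 128, 255)


def ttl_from_ping_line(line: str) -> int:
    """Extract the ttl= value from a Linux ping reply line."""
    match = re.search(r"ttl=(\d+)", line)
    if match is None:
        raise ValueError("no ttl= field in line")
    return int(match.group(1))


def routed_hops(observed_ttl: int) -> int:
    """Estimate routed hops, assuming the smallest common initial TTL >= observed."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial - observed_ttl
    raise ValueError(f"TTL {observed_ttl} exceeds 255")


if __name__ == "__main__":
    line = "64 bytes from 10.0.1.4: icmp_seq=1 ttl=63 time=3.22 ms"
    print(routed_hops(ttl_from_ping_line(line)))  # 1
```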

The tcpdump capture taken on interface eth1 of vm-nva-fw proves that both the ICMP requests and the replies pass through vm-nva-fw.

azureuser@vm-nva-fw:~$ sudo tcpdump -i eth1 icmp -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:15:29.954539 IP 10.0.0.4 > 10.0.1.4: ICMP echo request, id 41, seq 1, length 64
14:15:29.956493 IP 10.0.1.4 > 10.0.0.4: ICMP echo reply, id 41, seq 1, length 64
14:15:30.955740 IP 10.0.0.4 > 10.0.1.4: ICMP echo request, id 41, seq 2, length 64
14:15:30.957072 IP 10.0.1.4 > 10.0.0.4: ICMP echo reply, id 41, seq 2, length 64
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel

Example 2-19: TCP Dump from Linux NVA. 
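For captures longer than Example 2-19, a script can pair the tcpdump lines by ICMP id and sequence number and report any echo request that has no matching reply on the NVA interface. The Python sketch below assumes the default one-line tcpdump output format shown above, which may vary between tcpdump versions. It also assumes the capture was saved to a text file. The function and script names are hypothetical.

```python
import re
import sys
from typing import Iterable, List, Set, Tuple

# Matches the one-line format, e.g.
# 14:15:29.954539 IP 10.0.0.4 > 10.0.1.4: ICMP echo request, id 41, seq 1, length 64
ICMP_ECHO = re.compile(
    r"IP (?P<src>[\d.]+) > (?P<dst>[\d.]+): ICMP echo (?P<kind>request|reply), "
    r"id (?P<id>\d+), seq (?P<seq>\d+)"
)

Echo = Tuple[int, int, str, str]  # (id, seq, requester, responder)


def unanswered_requests(lines: Iterable[str]) -> List[Echo]:
    requests: Set[Echo] = set()
    replies: Set[Echo] = set()
    for line in lines:
        m = ICMP_ECHO.search(line)
        if m is None:
            continue  # skip banner and summary lines
        ident, seq = int(m["id"]), int(m["seq"])
        if m["kind"] == "request":
            requests.add((ident, seq, m["src"], m["dst"]))
        else:
            # A reply flows in the opposite direction of its request
            replies.add((ident, seq, m["dst"], m["src"]))
    return sorted(requests - replies)


if __name__ == "__main__":
    # Usage: python check_icmp.py capture.txt  (tcpdump output saved to a file)
    with open(sys.argv[1]) as capture:
        missing = unanswered_requests(capture)
    print(missing or "every echo request was answered")
```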

In addition, the traceroute from vm-west to vm-east in Example 2-22 shows vm-nva-fw (10.0.0.5) as the first hop, verifying that the routing works as expected.

azureuser@vm-west:~$ traceroute 10.0.1.4
traceroute to 10.0.1.4 (10.0.1.4), 30 hops max, 60 byte packets
 1  10.0.0.5 (10.0.0.5)  2.384 ms  2.366 ms  2.447 ms
 2  vm-east.internal.cloudapp.net (10.0.1.4)  3.556 ms *  3.531 ms

Example 2-22: Data Path Verification using Traceroute. 

Figure 2-4 shows the data path from vm-west to vm-east after we have deployed vm-nva-fw between the subnets.


Figure 2-4: Packet Walk After UDR/NVA Deployment.

References



[1] Create, change, or delete a network interface
https://learn.microsoft.com/en-us/azure/virtual-network/virtual-network-network-interface?tabs=network-interface-portal, March 8, 2023

[2] Do VNets support multicast or broadcast?
https://learn.microsoft.com/en-us/azure/virtual-network/virtual-networks-faq#do-vnets-support-multicast-or-broadcast, March 2023

[3] What protocols can I use within VNets?
https://learn.microsoft.com/en-us/azure/virtual-network/virtual-networks-faq#what-protocols-can-i-use-within-vnets, March 2023

[4] cloudSwXtch VM Image
https://azuremarketplace.microsoft.com/en-us/marketplace/apps/swxtchiollc1614108926893.cloudswxtch-vm-001?exp=ubp8&tab=Overview

[5] Route network traffic with a route table using the Azure CLI
https://learn.microsoft.com/en-us/azure/virtual-network/tutorial-create-route-table-cli, February 11, 2023

[RFC 8014] An Architecture for Data-Center Network Virtualization over Layer 3 (NVO3)
D. Black et al., RFC 8014, December 2016