Wednesday 22 March 2023

Chapter 1: Azure VM networking – Virtual Filtering Platform and Accelerated Networking

 Note! This post is under technical review.

Introduction


Virtual Filtering Platform (VFP) is Microsoft’s cloud-scale software switch operating as a virtual forwarding extension within a Hyper-V basic vSwitch. The forwarding logic of the VFP uses a layered policy model based on policy rules in Match-Action Tables (MATs). VFP works on the data plane, while complex control plane operations are handed over to centralized control systems. The VFP includes several layers, such as the VNet, NAT, ACL, and Metering layers, each with dedicated controllers that program policy rules into the MAT using southbound APIs. The first packet of an inbound/outbound data flow is processed by VFP. The process updates match-action table entries in each layer, which are then copied into the Unified Flow Table (UFT). Subsequent packets are switched based on the flow-based action in the UFT. However, if the Virtual Machine is not using Accelerated Networking (AccelNet), all packets are still forwarded over the software switch, which consumes host CPU cycles. Accelerated Networking reduces the host’s CPU burden and provides a higher packet rate with more predictable jitter by switching packets in the hardware NIC while still relying on VFP for traffic policy.
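
The layered MAT/UFT model can be illustrated with a short, purely conceptual Python sketch (this is not Microsoft’s implementation; the layer names, actions, and 5-tuple below are simplified placeholders). The first packet of a flow is evaluated layer by layer, the resulting actions are cached in a Unified Flow Table keyed by a hash of the 5-tuple, and subsequent packets are served from that cache:

# Illustrative sketch only: layered Match-Action Tables feeding a Unified Flow Table.
FLOW = ("10.0.0.4", "8.8.8.8", "TCP", 44321, 443)   # hypothetical 5-tuple

# Each layer maps a flow to an action (reduced to simple strings here).
LAYERS = {
    "METERING": lambda f: "count-bytes",
    "ACL":      lambda f: "allow" if f[4] == 443 else "deny",
    "SLB_NAT":  lambda f: "snat-to-public-ip",
    "VNET":     lambda f: None,          # bypassed for Internet-bound traffic
}

unified_flow_table = {}                   # UFID -> list of actions

def process_packet(five_tuple):
    ufid = hash(five_tuple)               # UFID: hash of the 5-tuple
    if ufid in unified_flow_table:        # fast path: a single table lookup
        return unified_flow_table[ufid]
    # Slow path: run the flow through every layer and cache the result in the UFT.
    actions = [a for a in (layer(five_tuple) for layer in LAYERS.values()) if a]
    unified_flow_table[ufid] = actions
    return actions

print(process_packet(FLOW))   # first packet: evaluated layer by layer
print(process_packet(FLOW))   # subsequent packets: served from the UFT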


Hyper-V Extensible Virtual Switch


Microsoft’s extensible vSwitch running on Hyper-V operates as a Network Virtualization Service Provider (NetVSP) for Virtual Machines. VMs, in turn, are Network Virtualization Service Consumers (NetVSC). When a VM starts, it requests the Hyper-V virtualization stack to connect it to the vSwitch. The virtualization stack creates a virtual Network Interface (vNIC) for the VM and associates it with the vSwitch. The vNIC is presented to the VM as a physical network adapter. The communication channel between the VM and the vSwitch uses a synthetic data path over the Virtual Machine Bus (VMBus), which provides a standardized interface for VMs to access physical resources on the host machine. It helps ensure that virtual machines have consistent performance and can access resources in a secure and isolated manner.


Virtual Filtering Platform - VFP


The Virtual Filtering Platform (VFP) is Microsoft’s cloud-scale virtual switch operating as a virtual forwarding extension within a Hyper-V basic vSwitch. VFP sits in the data path between the virtual ports facing the virtual machines and the default vPort associated with the physical NIC. VFP uses the VM’s vPort-specific layers for filtering traffic to and from the VM. A layer in the VFP is a Match-Action Table (MAT) containing policy rules programmed by independent, centralized controllers. A packet is processed through the VFP layers if it is an exception packet, i.e., there is no Unified Flow (UF) entry in the Unified Flow Table (UFT), or if it is the first packet of a flow (TCP SYN packet). When a Virtual Machine initiates a new connection, the first packet of the data flow is stored in the Receive Queue (RxQ). The Parser component of the VFP then takes the L2 (Ethernet), L3 (IP), and L4 (Protocol) header information as metadata, which is processed through the layer policies in each VFP layer. The VFP layers involved in packet processing depend on the flow destination and the Azure services associated with the source/destination VM.
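
The parsing and exception-packet decision can be sketched as follows (an illustrative Python example only; the packet fields, header-group layout, and function names are hypothetical):

# Illustrative sketch: lifting L2/L3/L4 header groups into metadata and
# detecting an exception packet (no UFT entry, or first packet of the flow).
packet = {
    "eth": {"src_mac": "00:0d:3a:aa:bb:cc", "dst_mac": "12:34:56:78:9a:bc"},
    "ip":  {"src": "10.0.0.4", "dst": "10.0.0.5"},
    "tcp": {"src_port": 50000, "dst_port": 443, "flags": {"SYN"}},
}

def parse_metadata(pkt):
    """Collect the L2/L3/L4 header groups the VFP layers match on."""
    return {
        "L2": (pkt["eth"]["src_mac"], pkt["eth"]["dst_mac"]),
        "L3": (pkt["ip"]["src"], pkt["ip"]["dst"]),
        "L4": ("TCP", pkt["tcp"]["src_port"], pkt["tcp"]["dst_port"]),
    }

unified_flow_table = {}                      # empty: no flows programmed yet

meta = parse_metadata(packet)
ufid = hash((meta["L3"], meta["L4"]))
is_exception = ufid not in unified_flow_table or "SYN" in packet["tcp"]["flags"]
print("process through VFP layers" if is_exception else "fast path via UFT")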

VNet-to-Internet traffic from a VM using a Public IP


The metering layer measures traffic for billing. It is the first layer for the VM’s outgoing traffic and the last layer for incoming traffic, i.e., it processes only the original ingress/egress packets and ignores tunnel headers and other header modifications (Azure does not charge you for the overhead bytes caused by tunnel encapsulation). Next, the ACL layer runs the metadata through the NSG policy statements. If the source/destination IP addresses (L3 header group) and the protocol and source/destination ports (L4 header group) match one of the allowing policy rules, the traffic is permitted (action #1: Allow). After ACL layer processing, the routing process intercepts the metadata. Because the destination IP address in the L3 header group matches only the default route (0.0.0.0/0, next-hop Internet), the metadata is handed over to the Server Load Balancing/Network Address Translation (SLB/NAT) layer. In this example, a public IP is associated with the VM’s vNIC, so the SLB/NAT layer translates the private source IP to the public IP (action #2: Source NAT). The VNet layer is bypassed when both source and destination IP addresses are from the public IP space, as they are here after translation.

When the metadata has been processed by each layer, the results are programmed into the Unified Flow Table (UFT). Each flow is identified by a unique Unified Flow Identifier (UFID), a hash value calculated from the flow’s 5-tuple (source/destination IP, protocol, source port, destination port). The UFID is also associated with the actions Allow and Source NAT. The Header Transposition (HT) engine then takes the original packet from the RxQ and modifies its L2/L3/L4 header groups as described in the UFT: it changes the private source IP to the public IP (Modify) and moves the packet to the Transmit Queue (TxQ). Subsequent packets of the flow are modified by the HT engine based on the existing UFT entry without running their metadata through the VFP layers (slow-path to fast-path switchover).
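
A minimal sketch of the UFID calculation and the Header Transposition step, assuming a hypothetical public IP and a simplified action format (illustrative only, not the actual VFP data structures):

# Illustrative sketch: UFID from the 5-tuple, plus the cached Allow + Source NAT
# actions applied by a Header Transposition step.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    dst_ip: str
    proto: str
    src_port: int
    dst_port: int

def ufid(ft: FiveTuple) -> int:
    return hash((ft.src_ip, ft.dst_ip, ft.proto, ft.src_port, ft.dst_port))

outbound = FiveTuple("10.0.0.4", "93.184.216.34", "TCP", 50000, 443)

# Actions produced by the ACL and SLB/NAT layers for this flow.
uft = {ufid(outbound): [("allow",), ("snat", "20.50.60.70")]}   # hypothetical public IP

def header_transposition(ft: FiveTuple, actions) -> FiveTuple:
    """Modify the L3 header group as instructed by the cached UFT actions."""
    for action in actions:
        if action[0] == "snat":
            ft = replace(ft, src_ip=action[1])   # private source IP -> public IP
    return ft

print(header_transposition(outbound, uft[ufid(outbound)]))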

Besides the outbound flow entry, VFP layer processing also generates an inbound flow entry for the same connection, but with a reversed 5-tuple (source/destination addresses and ports in reversed order) and reversed actions (Destination NAT instead of Source NAT). These outbound and inbound flows are then paired and treated as a connection, enabling the flow state tracking process where inactive connections can be deleted from the UFT. For example, the flow state machine tracks TCP RST flags. Let’s say the destination endpoint sets the TCP RST flag in the L4 header. The TCP state machine notices it and removes the inbound flow, together with its paired outbound flow, from the UFT. The TCP state machine also tracks the TCP FIN/FIN-ACK flags and the TIME_WAIT state (after a TCP FIN, the connection is kept alive for at most 2 x Maximum Segment Lifetime to catch delayed or retransmitted packets).
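
The flow pairing and RST-driven teardown could be sketched like this (a deliberately small flow state machine; the real tracking also covers FIN/FIN-ACK and TIME_WAIT as described above, and all addresses are hypothetical):

# Illustrative sketch: pairing an outbound Unified Flow with its reversed inbound
# flow, and removing both when a TCP RST is seen.
def reverse(five_tuple):
    src_ip, dst_ip, proto, src_port, dst_port = five_tuple
    return (dst_ip, src_ip, proto, dst_port, src_port)

outbound = ("20.50.60.70", "93.184.216.34", "TCP", 50000, 443)
inbound = reverse(outbound)

uft = {outbound: "snat", inbound: "dnat"}
paired = {outbound: inbound, inbound: outbound}   # connection = flow pair

def on_tcp_flags(five_tuple, flags):
    """Tiny flow state machine: a RST tears down both directions of the connection."""
    if "RST" in flags and five_tuple in uft:
        uft.pop(five_tuple, None)
        uft.pop(paired[five_tuple], None)

on_tcp_flags(inbound, {"RST"})
print(uft)   # both the inbound and the paired outbound flow are gone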


Intra-VNet traffic



The Metering and ACL layers on VFP process inbound/outbound flows for Intra-VNet connections in the same manner as for VNet-to-Internet traffic. When the routing process notices that the destination Direct IP (DIP) address (Customer Address space) is within the VNet CIDR range, the NAT layer is bypassed because Intra-VNet flows use private Direct IP addresses as both source and destination addresses. The Host Agent, responsible for VNet layer operations, then examines the destination IP address in the L3 header group. Because this is the first packet of the flow, there is no information about the destination DIP-to-physical-host mapping (location information) in the cache table. The VNet layer is responsible for providing tunnel headers for Intra-VNet traffic, so the Host Agent requests the location information from the centralized control plane. After receiving the reply, it creates a MAT entry whose action part defines the tunnel headers (Push action). After the metadata is processed, the result is programmed into the Unified Flow Table. Finally, the Header Transposition engine takes the original packet from the Receive Queue, adds the tunnel header, and moves the packet to the Transmit Queue.
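
A rough sketch of the Intra-VNet routing decision and the VNet layer’s Push action, assuming a hypothetical VNet CIDR, DIP-to-host mapping, and VXLAN-style tunnel fields (none of these values come from a real deployment):

# Illustrative sketch: skip NAT for Intra-VNet destinations and push a tunnel header.
import ipaddress

VNET_CIDR = ipaddress.ip_network("10.0.0.0/24")

# Cache filled by the Host Agent after querying the centralized control plane:
# Customer Address (DIP) -> Provider Address of the hosting physical server.
dip_to_host = {"10.0.0.5": "100.64.1.12"}

def forward(dst_ip: str, packet: dict) -> dict:
    if ipaddress.ip_address(dst_ip) in VNET_CIDR:        # Intra-VNet: NAT layer bypassed
        host_pa = dip_to_host[dst_ip]                    # location (DIP-to-host) lookup
        return {"outer_dst": host_pa, "vni": 5001, "inner": packet}   # Push tunnel header
    return packet                                        # e.g. Internet-bound: no encapsulation

print(forward("10.0.0.5", {"src": "10.0.0.4", "dst": "10.0.0.5"}))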

Figure 1-1: Azure Host-Based SDN Building Blocks.