Sunday 24 March 2024

Azure Networking: Cloud Scale Load Balancing

Introduction


During the load balancer deployment process, we define a virtual IP (a.k.a. front-end IP) for our published service. As the next step, we create a backend (BE) pool and attach virtual machines to it using either their associated vNIC or Direct IP (DIP). Then we bind the VIP to the backend pool with an inbound rule. In this phase, we also create health probes and associate them with the inbound rules for monitoring the VMs' service availability. If VMs in the backend pool also initiate outbound connections, we build an outbound policy, which states the source Network Address Translation (SNAT) rule (DIP, src port > VIP, src port).
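To make the moving parts concrete, the following Python sketch models the deployment above as plain data structures. It is purely illustrative: the class and field names are my own, not an Azure API, and the values mirror the example used throughout this chapter.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class HealthProbe:
    protocol: str        # e.g. "TCP"
    port: int            # e.g. 80

@dataclass
class InboundRule:
    vip: str             # front-end IP published to the Internet
    protocol: str
    port: int
    backend_pool: List[str]     # DIPs of the pool members
    probe: HealthProbe

@dataclass
class OutboundRule:
    vip: str             # SNAT: (DIP, src port) -> (VIP, allocated src port)
    backend_pool: List[str]

# The example deployment used throughout this chapter
inbound = InboundRule(
    vip="1.2.3.4", protocol="TCP", port=80,
    backend_pool=["10.0.0.4", "10.0.0.5"],      # vm-beetle, vm-bailey
    probe=HealthProbe(protocol="TCP", port=80),
)
outbound = OutboundRule(vip="1.2.3.4", backend_pool=["10.0.0.4", "10.0.0.5"])
```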

This chapter provides an overview of the components of the Azure load balancer service: the centralized SDN controller, the load balancer pool, and the host agents. We also walk through their control plane and data plane operation.


Management & Control Plane – External Connections

Figure 20-1 depicts our example topology. The topmost box, Load balancer deployment, shows our LB settings. We intend to forward HTTP traffic arriving from the Internet to VIP 1.2.3.4 to either DIP 10.0.0.4 (vm-beetle) or DIP 10.0.0.5 (vm-bailey). The health probe associated with the inbound rule uses TCP port 80 for the availability check.

The Azure LB service control plane is implemented as a centralized, highly available SDN controller. The system consists of several controller instances, one of which is elected as the active instance. When the active controller receives our LB configuration, it distributes it to the other replicas. The active controller creates VIP-to-DIP mapping entries with the destination protocol/port combination and programs them, together with the VIP, to the load balancers in the load balancer pool. Besides programming the load balancers, the SDN controller monitors their health.
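The sketch below models this control plane behavior: the active controller keeps a VIP mapping table, replicates it to the standby instances, and programs it to every load balancer in the pool. The class and method names (sync, program, remove_failed_dip) are placeholders of my own, not a real controller interface.

```python
class SdnController:
    """Illustrative model of the active controller's VIP mapping logic."""

    def __init__(self, replicas, load_balancers):
        self.replicas = replicas              # standby controller instances
        self.load_balancers = load_balancers  # load balancer pool members
        self.vip_map = {}                     # (vip, proto, port) -> [DIPs]

    def apply_config(self, vip, proto, port, dips):
        self.vip_map[(vip, proto, port)] = list(dips)
        self._sync_and_program()

    def remove_failed_dip(self, vip, proto, port, dip):
        # Called when a host agent reports a failed health probe.
        self.vip_map[(vip, proto, port)].remove(dip)
        self._sync_and_program()

    def _sync_and_program(self):
        for replica in self.replicas:         # replicate state first (placeholder call)
            replica.sync(self.vip_map)
        for lb in self.load_balancers:        # then program the data plane (placeholder call)
            lb.program(self.vip_map)

controller = SdnController(replicas=[], load_balancers=[])
controller.apply_config("1.2.3.4", "TCP", 80, ["10.0.0.4", "10.0.0.5"])
print(controller.vip_map)
```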

The load balancer pool consists of several instances, each of which advertises the configured VIPs to the upstream routers via BGP, setting itself as the next hop. When one of the upstream routers receives an ingress packet destined to a VIP, it uses Equal Cost Multi-Path (ECMP) to select the next-hop load balancer. Therefore, packets belonging to the same flow may not always end up at the same load balancer. However, this is not a problem: the load balancer units use the same hashing algorithm for selecting the DIP from the backend members, so they all choose the same DIP. The upstream routers and load balancers also use BGP for failure detection. When a load balancer goes out of service, its BGP peering goes down, and as a reaction, the upstream routers exclude the failed load balancer from the ECMP process.
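The property that matters here is determinism: as long as every load balancer runs the same hash over the same DIP list, the chosen backend is identical no matter which instance ECMP happens to deliver the packet to. The following sketch illustrates the idea with a generic 5-tuple hash; it is not the hashing algorithm Azure actually uses.

```python
import hashlib

def five_tuple_hash(src_ip, src_port, dst_ip, dst_port, proto):
    """Deterministic hash over the 5-tuple (illustrative, not Azure's algorithm)."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def select_dip(dips, src_ip, src_port, dst_ip, dst_port, proto):
    # Every load balancer instance runs the same function over the same
    # DIP list, so they all pick the same backend for a given flow.
    return dips[five_tuple_hash(src_ip, src_port, dst_ip, dst_port, proto) % len(dips)]

dips = ["10.0.0.4", "10.0.0.5"]
# Whether ECMP delivers the packet to LB-1 or LB-2, the result is the same:
print(select_dip(dips, "172.16.1.4", 50123, "1.2.3.4", 80, "TCP"))
```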

The Host Agent (HA) is the third piece of the load balancer service puzzle. The SDN controller sends the VIP-to-DIP destination NAT (DNAT) policies to the HA, which programs them into the Virtual Filtering Platform’s (VFP) NAT/SLB layer. The HA also monitors the backend member VMs' availability using our configured health probe. When the service in a VM stops responding, the host agent reports this to the SDN controller, which removes the failed DIP from the load balancers' VIP-to-DIP mapping table. The HA also adjusts a Network Security Group (NSG) rule in the VFP security layer to allow the monitoring traffic.
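A rough sketch of the host agent's two duties, DNAT programming and health probing, is shown below. The vfp object and its methods are hypothetical placeholders for the programmable VFP layers, and the controller is assumed to expose the remove_failed_dip call from the earlier sketch; only the TCP probe itself is concrete.

```python
import socket

def probe_tcp(dip, port, timeout=1.0):
    """Illustrative TCP health probe: the service is 'up' if the connect succeeds."""
    try:
        with socket.create_connection((dip, port), timeout=timeout):
            return True
    except OSError:
        return False

class HostAgent:
    """Sketch of the host agent's role (names are mine, not a real agent API)."""

    def __init__(self, vfp, controller):
        self.vfp = vfp                  # hypothetical handle to this host's VFP layers
        self.controller = controller    # active SDN controller (earlier sketch)

    def install_dnat(self, vip, proto, port, dip):
        # Placeholder calls: program the VIP-to-DIP DNAT rule and open the NSG for probes.
        self.vfp.nat_layer.add_rule(match=(vip, proto, port), rewrite_dst=dip)
        self.vfp.security_layer.allow_probe_traffic(dip, port)

    def run_probe_cycle(self, vip, proto, port, dip):
        if not probe_tcp(dip, port):
            # Report the failure so the controller can withdraw the DIP.
            self.controller.remove_failed_dip(vip, proto, port, dip)
```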



Figure 20-1: Cloud Scale LB Management & Control Plane Operation.

Data Plane - External Connections


Figure 20-2 depicts the data plane processes when an external host starts the TCP three-way handshake with VIP 1.2.3.4. I have excluded redundant components from the figure to keep it simple. The data packet with the TCP SYN flag set arrives at the edge router Ro-1, which has two equal-cost next hops for 1.2.3.4 installed in its RIB. It therefore runs its per-flow hashing algorithm, selects the next hop via 10.1.1.1 (LB-1), and forwards the packet toward LB-1.

Based on the TCP SYN flag, LB-1 notices that this is a new data flow. The destination IP address and transport layer information (TCP/80) match the inbound rule programmed into its VIP mapping table. The hash calculated from the packet's 5-tuple selects DIP 10.0.0.4. After choosing the DIP, the load balancer updates its flow table (not shown in the figure). It leaves the original packet intact and adds tunnel headers (IP/UDP/VXLAN), using its own IP address as the source and the DIP as the destination address. Note that the load balancers check ingress non-SYN TCP and UDP packets against the flow table first; only if no matching entry is found are the packets processed via the VIP-to-DIP mapping table.
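The order of these lookups can be summarized in a short sketch: existing flows hit the flow table, new flows fall back to the VIP mapping table, and in both cases the original packet is merely wrapped in an outer VXLAN header. This is an illustrative model of one load balancer instance, not real MUX code, and the same generic 5-tuple hash from the earlier sketch stands in for the real algorithm.

```python
import hashlib

def lb_process_ingress(pkt, vip_map, flow_table, lb_ip):
    """Illustrative ingress path on one load balancer instance (not actual MUX code)."""
    flow = (pkt["src_ip"], pkt["src_port"], pkt["dst_ip"], pkt["dst_port"], pkt["proto"])

    # Existing flows (non-SYN TCP, UDP) are matched against the flow table first.
    dip = flow_table.get(flow)
    if dip is None:
        # New flow: consult the VIP-to-DIP mapping programmed by the controller
        # and pick a member with the same deterministic 5-tuple hash as before.
        dips = vip_map[(pkt["dst_ip"], pkt["proto"], pkt["dst_port"])]
        digest = hashlib.sha256("|".join(map(str, flow)).encode()).digest()
        dip = dips[int.from_bytes(digest[:8], "big") % len(dips)]
        flow_table[flow] = dip

    # The original packet stays intact; only outer IP/UDP/VXLAN headers are added.
    return {"outer_src": lb_ip, "outer_dst": dip, "encap": "VXLAN", "inner": pkt}

vip_map = {("1.2.3.4", "TCP", 80): ["10.0.0.4", "10.0.0.5"]}
flow_table = {}
syn = {"src_ip": "172.16.1.4", "src_port": 50123,
       "dst_ip": "1.2.3.4", "dst_port": 80, "proto": "TCP", "flags": "SYN"}
print(lb_process_ingress(syn, vip_map, flow_table, lb_ip="10.1.1.1"))
```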

The encapsulated packet arrives at the host. The tenant is identified based on the Virtual Network Identifier (VNI) carried in the VXLAN header. The VNet layer decapsulates the packet, and based on the SYN flag in the TCP header, it is recognized as the first packet of a new flow. Therefore, the L3/L4 header information is sent through the VFP layers (see details in Chapter 14). The header transposition engine then encodes the result into the Unified Flow Table (UFT) with the related actions: decapsulate, DNAT 1.2.3.4 > 10.0.0.4, and allow (an NSG allows TCP/80 traffic). It also creates a paired flow entry, since the 5-tuple hash run against the reply packet yields the same flow ID.

After processing the TCP SYN packet, VM 10.0.0.4 replies with a TCP SYN-ACK packet, using its own IP address as the source and the original client IP address 172.16.1.4 as the destination. The UFT lookup for this flow ID matches the paired entry, so the packet is processed (allow and SNAT: DIP > VIP) and forwarded without running it through the VFP layers again. The unencapsulated packet is sent directly toward Ro-1, bypassing LB-1 (Direct Server Return, DSR). Therefore, the load balancer has to process ingress flows only. This is one of the load balancer service's optimization solutions.
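The following sketch shows the idea of the paired UFT entries: programming the inbound flow also installs the reverse entry, so the SYN-ACK needs only a UFT lookup, gets its source rewritten from DIP to VIP, and leaves the host unencapsulated. The table keys and action strings are simplified placeholders, not the real UFT format.

```python
def program_inbound_flow(uft, client_ip, client_port, vip, dip, port):
    """Illustrative UFT programming for the first packet of an inbound flow."""
    inbound = (client_ip, client_port, vip, port, "TCP")
    reply   = (dip, port, client_ip, client_port, "TCP")      # paired, reversed flow
    uft[inbound] = ["decap", f"dnat {vip} -> {dip}", "allow"]
    uft[reply]   = ["allow", f"snat {dip} -> {vip}"]           # DSR: sent straight to the edge

uft = {}
program_inbound_flow(uft, "172.16.1.4", 50123, vip="1.2.3.4", dip="10.0.0.4", port=80)

# The SYN-ACK from the VM matches the paired entry without another VFP-layer pass:
print(uft[("10.0.0.4", 80, "172.16.1.4", 50123, "TCP")])
```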

The last packet of the TCP three-way handshake (and the subsequent packets of the flow) from the external host may end up at LB-2, but because LB-2 uses the same hashing algorithm as LB-1, it selects the same DIP.



Figure 20-2: Cloud Scale LB Data Plane Operation.

Data Plane and Control Plane for Outbound Traffic


A virtual machine with a customer-assigned public IP address can establish outbound connections to the Internet; such VMs are also exposed and reachable from the Internet. VMs lacking a public IP address are automatically given an Azure-assigned IP address for egress-only Internet connections, or they can use the Azure NAT Gateway for outbound connectivity. In addition to these options, we can use the front-end IP address assigned to the load balancer for the outbound Internet access of private-IP-only VMs by creating an outbound rule with source NAT.

In Figure 20-3, we have an outbound rule where VIP 1.2.3.4 is associated with the backend pool member vm-beetle (DIP 10.0.0.4). This rule is programmed into the SDN controller’s VIP mapping table.

Next, vm-beetle initiates a TCP three-way handshake with an external host at IP address 172.16.1.4. The TCP SYN flag indicates the start of a new data flow. Before forwarding the packet, the host agent requests a virtual IP (VIP) and source ports from the controller for the outbound connection's source Network Address Translation (SNAT). To accommodate the possibility of multiple connections from the VM, the controller allocates eight source ports for vm-beetle (we can adjust the port count). This allocation strategy eliminates the need for the host agent to request a new source port each time the VM initiates a new connection.
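A minimal sketch of this block-based allocation is shown below, assuming a simple sequential allocator of my own making; Azure's real port management is more involved, and the block size of eight simply mirrors the example above.

```python
import itertools

class SnatAllocator:
    """Illustrative controller-side SNAT port allocation (not Azure's real logic)."""

    def __init__(self, vip, block_size=8, first_port=1024):
        self.vip = vip
        self.block_size = block_size
        self._next_port = itertools.count(first_port)
        self.allocations = {}                    # DIP -> list of (VIP, port) pairs

    def request_ports(self, dip):
        # Hand out a whole block so the host agent does not have to ask again
        # for every new connection the VM opens.
        block = [(self.vip, next(self._next_port)) for _ in range(self.block_size)]
        self.allocations.setdefault(dip, []).extend(block)
        return block

allocator = SnatAllocator(vip="1.2.3.4")
print(allocator.request_ports("10.0.0.4"))       # eight (VIP, port) pairs for vm-beetle
```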

Subsequently, the controller synchronizes the DIP-to-VIP and source port mapping information across the standby controllers. It then programs the mapping tables of the load balancers, because the return traffic arrives via the load balancers. After these steps, the controller responds to the host agent's request. The host agent, in turn, updates the source NAT (SNAT) policy within the Virtual Filtering Platform's (VFP) NAT layer.

After the parser component has passed the original packet's header group metadata through the VFP layers, the header transposition engine updates the Unified Flow Table. Finally, the packet is directed toward the Internet, bypassing the load balancer.



Figure 20-3: Outbound Traffic.

Fast Path

The Azure Load Balancer service must manage an extensive volume of network traffic. Therefore, Azure has developed several load balancer optimization solutions. For example, the source and destination NAT are offloaded from the load balancers to the hosts (to the VFP's NAT layer). That enables the Direct Server Return (DSR) solution, where the return traffic from the VM is routed toward the destination without tunnel encapsulation, bypassing the load balancers.

This section introduces another optimization solution, known as Fastpath. Once the TCP three-way handshake between load-balanced virtual machines is complete, the data flow is redirected straight between the VMs, bypassing the load balancers in both directions. The solution uses a redirect message in which the load balancer announces its VIP, DIP, and port mapping information to the source VIP. The Fastpath solution has many similarities with Dynamic Multipoint VPN (DMVPN).

Figure 20-4 depicts the three-way handshake between two load-balanced VMs. The VM named vm-beetle hosts a service accessible through VIP 1.1.1.1 (VIP-A). In turn, the service running on vm-bailey (DIP 10.2.2.4) is reached using VIP 2.2.2.2 (VIP-B). Vm-beetle starts a TCP three-way handshake with vm-bailey by sending a TCP SYN message to VIP 2.2.2.2. The TCP SYN packet is the first packet of this connection, so its L3/L4 header information is forwarded through the VFP layers to create a new flow entry with the associated actions.

An NSG allows the connection. The destination IP address is public, so the header group information is sent to the NAT layer, which rewrites the source IP to 1.1.1.1 as defined in our load balancer's outbound NAT policy. Because the destination IP address is in the public IP address space, the TCP SYN message is sent unencapsulated toward the destination (Direct Server Return); therefore, the VNet layer does not create any encapsulation action. After the header group information has passed all the layers, the header transposition engine rewrites the changed source IP address, and the flow is encoded into the Unified Flow Table (UFT) with its actions (allow, SNAT). The subsequent packets of this flow are forwarded based on the UFT. Then the TCP SYN message is sent to the destination VIP.

The load balancer serving VIP-B receives the packet. It checks the VIP mapping table, selects the destination DIP using a hash algorithm, and programs the three-tuple (VIP, DIP, port) into its flow table. Then it encapsulates the packet and forwards it to vm-bailey's host. There, the ingress TCP SYN packet is intercepted, and the parser component takes the header group information and runs it through the VFP layers. The VNet layer action removes the outer VXLAN header, the NAT layer action rewrites the destination VIP 2.2.2.2 to DIP 10.2.2.4 based on the load balancer’s inbound policy, and the NSG layer allows the packet. Then, after updating the UFT, the packet is forwarded to vm-bailey. Note that the UFT entries created in vm-beetle’s and vm-bailey’s host UFTs also have paired, reverse-direction flow entries.

The VM vm-bailey accepts the connection by sending a TCP SYN-ACK message back to vm-beetle. Because of the paired flow entry, this message can be forwarded based on the UFT. The source NAT from DIP 10.2.2.4 to VIP 2.2.2.2 is the only rewrite action for the packet, after which it is sent towards VIP 1.1.1.1. The load balancer serving VIP-A receives the TCP SYN-ACK message and selects the DIP from its VIP mapping table. Then it encapsulates the packet and sends it towards vm-beetle. When vm-beetle's host receives the message, it processes it based on the UFT. First, the VXLAN header is removed. Then the destination VIP 1.1.1.1 is changed to DIP 10.1.1.4. Finally, the packet is passed to vm-beetle because of the allow action. The TCP ACK sent by vm-beetle is processed in the same way.


Figure 20-4: Fast Path: TCP Three-Way Handshake.

After the VMs have completed the TCP three-way handshake, the load balancer serving VIP-B sends a redirect message to VIP-A (1.1.1.1), stating that the rest of the packets of the flow '1.1.1.1, 2.2.2.2, TCP/80' can be sent directly to DIP 10.2.2.4. Based on its VIP mapping table (1.1.1.1 mapped to 10.1.1.4), VIP-A forwards the redirect message toward DIP 10.1.1.4. In addition, VIP-A sends a redirect message to VIP-B describing its own VIP mapping information in the same way as VIP-B did. When the redirect process is complete and the UFTs are updated with the new flow entries, the packets between vm-beetle and vm-bailey are sent without the source NAT action over the VXLAN tunnel, using the DIPs as both source and destination IP addresses.
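The host-side effect of a redirect can be sketched as a single UFT update: the VIP-based entry for the flow is replaced with an action that tunnels subsequent packets directly to the peer's DIP. The keys and action strings below are simplified placeholders of my own, assuming a hypothetical flow layout, not the real UFT format.

```python
def apply_fastpath_redirect(uft, flow, local_dip, peer_dip):
    """Illustrative handling of a Fastpath redirect on the source host:
    the NAT action is dropped and the flow is VXLAN-tunneled DIP to DIP,
    bypassing both load balancers."""
    uft[flow] = ["allow", f"encap vxlan {local_dip} -> {peer_dip}"]

# Flow '1.1.1.1, 2.2.2.2, TCP/80' as seen on vm-beetle's host before the redirect:
flow = ("10.1.1.4", 55000, "2.2.2.2", 80, "TCP")
uft = {flow: ["allow", "snat 10.1.1.4 -> 1.1.1.1"]}

apply_fastpath_redirect(uft, flow, local_dip="10.1.1.4", peer_dip="10.2.2.4")
print(uft[flow])        # ['allow', 'encap vxlan 10.1.1.4 -> 10.2.2.4']
```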

Figure 20-5: Fast Path: Redirect.


