Receive Network Processing Unit (Rx NPU)
Figure 9-4 illustrates a simplified receive-side processing pipeline, starting from the moment a Packet Header Vector (PHV), constructed by the Rx IFG, is delivered to the Receive Network Processing Unit (Rx NPU).
When the PHV arrives at the Rx NPU, it is dispatched to one of the Run-to-Completion (RTC) cores in the Packet Processing Array (PPA). Each RTC core processes the packet within a single execution context, allowing parsing, classification, lookup, and queuing decisions to be resolved without intermediate handoffs between processing stages.
The first task of the RTC parser is to perform deep inspection of the packet headers. While the Rx IFG has already extracted basic Layer-2 and Layer-3 information, the RTC parser determines whether the packet is tunneled and whether the switch itself is the tunnel termination point. To demonstrate this behavior, consider a VXLAN-encapsulated packet. The outer Ethernet and IP headers are used to forward the packet through the underlay network. If the outer destination IP address matches one of the local switch IP addresses, the device identifies itself as the tunnel endpoint. The tunneling protocol is recognized by examining the UDP header, where destination port 4789 indicates VXLAN. After the tunneling mechanism is identified, the outer headers are logically removed, and processing continues using the inner Ethernet and IP headers. These inner headers then form the basis for forwarding decisions. In this example, tunneling illustrates a scenario in which the switch operates as a VXLAN Tunnel Endpoint (VTEP) in a multitenant scale-out backend network.
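The tunnel-termination check described above can be sketched in ordinary Python. This is an illustration of the logic, not the device's microcode: the offsets assume an untagged outer Ethernet frame and an option-free IPv4 header, and `LOCAL_VTEP_IPS` is a hypothetical stand-in for the switch's locally owned addresses.

```python
import struct

VXLAN_PORT = 4789
LOCAL_VTEP_IPS = {"10.0.0.1"}  # hypothetical local VTEP addresses


def maybe_terminate_vxlan(frame: bytes) -> bytes:
    """Return the inner Ethernet frame if this switch is the tunnel
    endpoint, otherwise return the original frame unchanged."""
    # Outer Ethernet: 14 bytes; assume IPv4 (EtherType 0x0800), no VLAN tag.
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:
        return frame
    ip = frame[14:34]                       # 20-byte IPv4 header, no options
    proto = ip[9]
    dst_ip = ".".join(str(b) for b in ip[16:20])
    if proto != 17 or dst_ip not in LOCAL_VTEP_IPS:
        return frame                        # not UDP, or not our endpoint
    udp = frame[34:42]
    dst_port = struct.unpack("!H", udp[2:4])[0]
    if dst_port != VXLAN_PORT:
        return frame                        # UDP, but not VXLAN
    # Outer Eth (14) + IPv4 (20) + UDP (8) + VXLAN (8) = 50 bytes to strip.
    return frame[50:]                       # inner Ethernet frame
```

The inner frame returned here is what the remainder of the Rx pipeline uses for its forwarding decisions.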
In parallel with deep parsing, traffic classification takes place. The packet has already been assigned an Internal Traffic Class (ITC) by the pre-classification process in the Rx IFG pipeline. The ITC is used solely for internal prioritization within the Rx NPU, such as memory access arbitration and scheduling of processing resources inside the pipeline. It influences how the packet progresses through the internal stages of the NPU but does not determine where the packet is buffered for transmission. The ITC is carried as metadata within the PHV, ensuring that every internal bus and memory controller treats the packet according to its pre-assigned urgency as it traverses the NPU.
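As a rough software analogy (the actual PHV layout is proprietary), the ITC can be modeled as metadata riding in the PHV that internal arbiters consult when granting resources such as memory-access slots:

```python
from dataclasses import dataclass


@dataclass
class PHV:
    """Simplified Packet Header Vector: parsed fields plus internal
    metadata that travels with the packet through the NPU."""
    dscp: int
    dst_ip: str
    itc: int = 0   # Internal Traffic Class, set by Rx IFG pre-classification


def arbitrate(contenders: list[PHV]) -> PHV:
    """Grant the next internal resource (e.g. a memory-access slot) to
    the contender with the highest Internal Traffic Class."""
    return max(contenders, key=lambda phv: phv.itc)
```

Note that the ITC only decides who goes first *inside* the NPU; the buffering decision below is made separately from the DSCP field.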
The Rx NPU pipeline, in turn, matches the DSCP field against the configured QoS classification policy. The result of this policy evaluation determines the Virtual Output Queue (VOQ) into which the packet will be enqueued. As described in the VOQ chapter, VOQs are organized by Traffic Class and destination, ensuring that congestion affecting one egress interface does not introduce head-of-line blocking for traffic destined to another. This separation decouples ingress buffering from egress congestion and preserves fairness under load.
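The DSCP-to-VOQ resolution can be sketched as follows. The policy map, the eight-class assumption, and the flat queue numbering are illustrative, not the device's actual configuration:

```python
NUM_TCS = 8   # assumed Traffic Classes per destination

# Hypothetical QoS classification policy: DSCP value -> Traffic Class.
DSCP_TO_TC = {46: 7, 34: 5, 26: 3, 0: 0}


def voq_index(dscp: int, egress_port: int) -> int:
    """VOQs are organized per (destination, Traffic Class), so congestion
    on one egress port's queues cannot head-of-line block traffic that is
    destined to another port."""
    tc = DSCP_TO_TC.get(dscp, 0)   # unmatched DSCPs fall into best-effort
    return egress_port * NUM_TCS + tc
```

Because each (port, TC) pair maps to its own queue, a full queue for port 2 never delays a packet headed for port 3.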
At the same time, the RTC core performs a forwarding lookup against the Forwarding Information Base (FIB). The lookup resolves the egress interface and any associated forwarding attributes. Once the egress interface is known, the previously selected VOQ is mapped to the corresponding Output Queue (OQ) on that interface. This mapping follows the Traffic Class–to–egress priority relationship described earlier, ensuring that packets maintain consistent priority semantics from ingress classification through egress scheduling.
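The FIB lookup, reduced to its semantics, is a longest-prefix match. Real hardware uses dedicated LPM structures rather than the linear scan below, and the prefixes and interface names here are invented for illustration:

```python
import ipaddress

# Hypothetical FIB: prefix -> egress interface.
FIB = {
    ipaddress.ip_network("10.0.0.0/8"): "eth1",
    ipaddress.ip_network("10.1.0.0/16"): "eth2",
    ipaddress.ip_network("0.0.0.0/0"): "eth0",
}


def fib_lookup(dst: str) -> str:
    """Resolve the egress interface via longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FIB if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)   # longest prefix wins
    return FIB[best]
```

Once `fib_lookup` yields the egress interface, the previously selected VOQ can be associated with that interface's Output Queue for the same Traffic Class.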
After the VOQ-to-OQ mapping is established, the Traffic Manager initiates a credit request toward the Tx NPU scheduler. This interaction follows the credit-based flow control model described in the VOQ chapter. Conceptually, the request indicates that a packet of a given size, associated with a specific egress port and priority level, is ready for transmission. The scheduler evaluates whether the egress port’s microscopic FIFO has sufficient available buffer space and whether any higher-priority packets are waiting to be transmitted. If the conditions allow, credits equal to the packet length are granted.
Only after credits are granted is the packet permitted to move from Unified Shared Memory (USM) toward the egress pipeline. This strict separation between enqueueing and transmission prevents buffer overcommitment and enforces priority ordering during congestion.
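The credit handshake in the two paragraphs above can be modeled as a toy scheduler. The pure strict-priority policy and byte-granular credits are simplifying assumptions:

```python
class EgressScheduler:
    """Toy model of the credit loop: the Traffic Manager posts requests,
    and the Tx-side scheduler grants credits only when the egress port's
    shallow FIFO has room and no higher-priority request is waiting."""

    def __init__(self, fifo_bytes: int):
        self.free = fifo_bytes
        self.pending = []                       # (priority, size) requests

    def request(self, priority: int, size: int) -> None:
        self.pending.append((priority, size))

    def grant(self):
        """Grant credits for the highest-priority request, or None."""
        if not self.pending:
            return None
        self.pending.sort(key=lambda req: -req[0])   # strict priority
        prio, size = self.pending[0]
        if size > self.free:
            return None                         # wait for FIFO space
        self.free -= size                       # credits equal packet length
        return self.pending.pop(0)
```

A packet for which `grant` returns `None` simply remains queued in its VOQ in Unified Shared Memory, consuming no egress buffer, which is exactly how overcommitment is avoided.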
Once the Tx NPU scheduler grants the necessary credits, the packet is dequeued from the Unified Shared Memory and enters the Transmit Network Processing Unit (Tx NPU). As on the receive side, the transmit pipeline uses a Run-to-Completion (RTC) model within its own Packet Processing Array (PPA). This ensures that the final packet transformations are performed with the same deterministic, single-context efficiency as the initial ingress processing.
Upon entering the Tx NPU, the packet is dispatched to a Tx RTC core. The core's primary responsibility is header reconstruction and encapsulation. While the Rx NPU made the forwarding decision, the Tx NPU executes the "physical" rewrite. For a packet exiting a VTEP, this is where the RTC engine pushes the appropriate VXLAN, UDP, IP, and Ethernet headers onto the inner payload. Because this is a programmable RTC environment, the device can support complex, multi-label stacks, such as SRv6 or deep MPLS label impositions, without the "recirculation" penalties found in fixed-pipeline ASICs.
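The egress-side VXLAN rewrite can be sketched as a header push. Checksums, UDP source-port entropy, and IPv6 are omitted for brevity, and the field values are illustrative:

```python
import struct


def vxlan_encap(inner_frame: bytes, vni: int,
                outer_src_mac: bytes, outer_dst_mac: bytes,
                outer_src_ip: bytes, outer_dst_ip: bytes) -> bytes:
    """Push outer Ethernet/IPv4/UDP/VXLAN headers onto an inner frame."""
    # VXLAN header: flags (I bit set), 3 reserved bytes, VNI in top 24 bits.
    vxlan = struct.pack("!BxxxI", 0x08, vni << 8)
    udp_len = 8 + 8 + len(inner_frame)          # UDP + VXLAN + payload
    udp = struct.pack("!HHHH", 49152, 4789, udp_len, 0)
    # IPv4: version/IHL, TOS, total length, id, frag, TTL, proto=UDP, csum.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_len,
                     0, 0, 64, 17, 0, outer_src_ip, outer_dst_ip)
    eth = outer_dst_mac + outer_src_mac + b"\x08\x00"
    return eth + ip + udp + vxlan + inner_frame
```

In hardware this push happens in the Tx RTC core as a PHV rewrite rather than a byte-string concatenation, but the resulting wire format is the same.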
In addition to encapsulation, the Tx NPU performs a final round of Egress Policy Enforcement. This includes applying egress ACLs, updating packet counters for billing or monitoring, and inserting In-band Network Telemetry (INT) metadata if configured. This allows the switch to timestamp the packet at the precise moment of departure, providing nanosecond-accurate latency data.
The final stage of the Tx NPU involves the Output Queue (OQ) Scheduler. Even though the packet has already been "credited" for transmission, this local scheduler manages the final arbitration between different traffic classes sharing the same physical port. It ensures that a burst of low-priority bulk data does not introduce jitter into a high-priority stream in the final microseconds of its journey.
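A minimal model of that last-hop arbitration, assuming pure strict priority (real schedulers typically combine strict and weighted modes per class):

```python
from collections import deque


class OQScheduler:
    """Last-hop arbitration among traffic classes sharing one physical
    port: always drain the highest non-empty class first."""

    def __init__(self, num_tcs: int = 8):
        self.queues = [deque() for _ in range(num_tcs)]

    def enqueue(self, tc: int, pkt) -> None:
        self.queues[tc].append(pkt)

    def dequeue(self):
        for queue in reversed(self.queues):     # highest TC first
            if queue:
                return queue.popleft()
        return None                             # port idle
```

Even when a burst of bulk traffic is queued first, a later-arriving high-priority packet is transmitted ahead of it.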
Finally, the fully formed packet is handed off to the MAC and PCS (Physical Coding Sublayer). Here, the digital data is serialized and mapped into PAM4 (Pulse Amplitude Modulation 4-level) symbols. These symbols are then modulated onto the physical medium, whether as electrical signals over a backplane or light pulses through an optical transceiver, completing the packet's journey through the Silicon One architecture.
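Conceptually, the PAM4 mapping packs two bits into each transmitted symbol. The Gray-coded level assignment below is a common convention; the exact mapping is defined by the relevant Ethernet PMD specifications:

```python
# Gray-coded 2-bit to amplitude mapping often used for PAM4; the exact
# level assignment varies by standard.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}


def serialize_pam4(data: bytes) -> list[int]:
    """Map each byte into four PAM4 symbols (2 bits per symbol), MSB first."""
    symbols = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            two_bits = (byte >> shift) & 0b11
            symbols.append(GRAY_PAM4[(two_bits >> 1, two_bits & 1)])
    return symbols
```

Carrying two bits per symbol is what lets PAM4 double the data rate of binary NRZ signaling at the same symbol rate, at the cost of a tighter signal-to-noise budget.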
In summary, the Rx NPU integrates tunnel awareness, forwarding lookup, QoS-based queuing, and credit-controlled admission into the egress pipeline within a single run-to-completion processing model. Internal Traffic Class governs how the packet is processed inside the NPU, while the QoS policy determines where the packet waits for transmission. This separation of responsibilities enables deterministic performance, scalable queuing, and strict priority enforcement across the switching fabric.