Monday, 9 October 2017

Rapid per VLAN Spanning Tree protocol (Rapid PVST+): Synchronization


This article describes the Rapid per VLAN Spanning Tree Protocol (Rapid PVST+) synchronization process, which is based on Proposal / Agreement messages. The purpose of synchronization is to form a new loop-free layer 2 topology in response to changes in a stable STP domain. The synchronization process is started by a switch that loses its connection to the current STP root switch and does not have an STP Alternate port.
The article also describes the topology change process that occurs when the state of an STP non-Edge port changes from Discarding/Learning to Forwarding.
The idea for this article came while I was studying the new Cisco VXLAN-based Campus Fabric model and wanted to compare its complexity with that of the traditional Campus network.

Figure 1-1: Spanning Tree topology

In Figure 1-1, switch SW1 is statically configured as the STP root switch for VLAN 10 by setting its STP priority to 4096 (+ VLAN id). Switches SW2 to SW6 use the default priority 32768 (+ VLAN id). Figure 1-1 shows the stable STP topology.
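As a quick check of the priority arithmetic, the following minimal sketch adds the VLAN id to the configured priority. The function name `bridge_priority` is made up for this illustration and is not part of any Cisco tooling.

```python
def bridge_priority(configured_priority: int, vlan_id: int) -> int:
    """Return the per-VLAN priority: configured priority + VLAN id."""
    # With the VLAN id carried in the priority field, the configured
    # part must be a multiple of 4096.
    if configured_priority % 4096 != 0 or not 0 <= configured_priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN id must be between 1 and 4094")
    return configured_priority + vlan_id


root_priority = bridge_priority(4096, 10)      # SW1 in VLAN 10
default_priority = bridge_priority(32768, 10)  # SW2 to SW6 in VLAN 10
print(root_priority, default_priority)
```

These are the values 4106 and 32778 that appear in the BPDU comparisons later in the article.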
The STP root switch SW1 sends STP BPDU messages every two seconds (the default Hello Time), with the Root Bridge ID (RBID) and Sender Bridge ID (SBID) fields filled with its own information. The Root Path Cost (RPC) is 0 and the mac address is 1. Switches SW2 and SW6 receive the BPDU messages on their STP root ports. They generate their own BPDU messages every two seconds, advertising switch SW1 as the STP root switch. The SBID + mac fields in these BPDU messages carry their own information, and the RPC field carries their own RPC to the STP root switch. The switches SW3 and SW5 operate in the same way. Switch SW4 receives BPDU messages from both switches SW5 and SW3.
Both BPDU messages have the same RBID (SW1 = 4106) and RPC 8, so as a next step switch SW4 compares the SBID values of the BPDU messages. Switch SW3 has a smaller SBID (SW3: 32778 + mac 3 vs. SW5: 32778 + mac 5), so the port between switches SW4-SW3 is selected as the STP root port. The STP role of the port towards switch SW5 is set to Alternate and its state is set to Discarding (Blocking).
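The comparison that switch SW4 performs can be sketched as an ordered tuple comparison, where the lower value wins field by field. This is a simplified illustration: the `Bpdu` type and the port labels `to_SW3` / `to_SW5` are assumptions made for the example, and the local port cost and port ID tie-breakers used by real switches are left out.

```python
from typing import Dict, NamedTuple


class Bpdu(NamedTuple):
    # Field order matters: tuples compare left to right, lower is better.
    root_priority: int
    root_mac: int
    root_path_cost: int
    sender_priority: int
    sender_mac: int


def select_root_port(received: Dict[str, Bpdu]) -> str:
    """Return the local port that received the best (lowest) BPDU."""
    return min(received, key=lambda port: received[port])


# BPDUs seen by SW4 in Figure 1-1 (mac addresses shortened to 1..6).
received_on_sw4 = {
    "to_SW3": Bpdu(4106, 1, 8, 32778, 3),
    "to_SW5": Bpdu(4106, 1, 8, 32778, 5),
}

root_port = select_root_port(received_on_sw4)
alternate_ports = [p for p in received_on_sw4 if p != root_port]
print(root_port, alternate_ports)
```

RBID and RPC are equal, so the sender mac address decides: the port towards SW3 becomes the root port and the port towards SW5 is left as Alternate.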
Although the STP topology is stable, STP continuously monitors network state and responds when changes occur.

Figure 1-2: Synchronization process - Proposal

Step 1a: The link between switches SW1 and SW2 goes down on switch SW1. As a result, switch SW2 loses its connection to the STP root switch SW1. Switch SW2 reacts to the event immediately if its own link goes down. If its own link remains up, it waits for a BPDU message for 6 seconds (3 x Hello Time), after which it concludes that the connection to the root switch is broken.
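The detection rule in this step can be written as a small check. This is only a sketch of the logic described above; the function name `root_lost` and its parameters are hypothetical.

```python
def root_lost(seconds_since_last_bpdu: float,
              hello_time: float = 2.0,
              link_up: bool = True) -> bool:
    """Return True when the switch should consider its root connection lost."""
    if not link_up:
        # Local link failure: react immediately.
        return True
    # Link still up: wait for three missed Hello intervals (6 s by default).
    return seconds_since_last_bpdu >= 3 * hello_time


print(root_lost(4.0))                 # two Hellos missed, keep waiting
print(root_lost(6.0))                 # three Hellos missed
print(root_lost(0.0, link_up=False))  # own link went down
```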

Step 1b: Switch SW2 assumes that it is the new STP root switch, so it starts the synchronization process with its downstream switches. It places all non-Edge Designated ports in the Discarding state. Subsequently, it generates BPDU messages in which it advertises itself as the new STP root switch, with the Proposal bit set. By using the Proposal bit, it expresses its willingness to act as the Designated switch for the segment.

Step 1c: Switch SW3 receives a BPDU message in which switch SW2 is marked as the STP root switch. Switch SW3 trusts this message even though it is an inferior BPDU, because it was received on the current STP root port. This is based on the assumption that an inferior BPDU message received on the STP root port signifies either a change in the STP priority of the current STP root switch or that the upstream switch has lost its connection to the STP root switch.

Step 1d: Switch SW3 starts the synchronization process by setting all non-Edge Designated ports into Discarding mode.

Step 1e: Switch SW3 replies to switch SW2 by sending a BPDU message with the Agreement bit set. In this way, it signals that it accepts switch SW2 as the Designated switch of the segment.

Step 1f: Switch SW2 changes the state of the Designated port to Forwarding upon receipt of the Agreement message. Synchronization between switches SW2 and SW3 is now complete.
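Steps 1b to 1f can be modelled with a small state sketch. It is not a protocol implementation: the `Port` and `Switch` classes, the port labels, and the extra Edge port named `host` are all assumptions made for illustration, and BPDU contents are reduced to a returned string.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Port:
    role: str           # "root", "designated" or "alternate"
    state: str          # "forwarding" or "discarding"
    edge: bool = False  # Edge ports are not touched by synchronization


@dataclass
class Switch:
    name: str
    ports: Dict[str, Port] = field(default_factory=dict)

    def sync(self) -> None:
        """Block every non-Edge Designated port (steps 1b and 1d)."""
        for port in self.ports.values():
            if port.role == "designated" and not port.edge:
                port.state = "discarding"

    def receive_proposal(self, port_name: str) -> str:
        """Accept a Proposal on the root port: sync first, then agree (1d-1e)."""
        self.sync()
        self.ports[port_name].state = "forwarding"
        return "agreement"

    def receive_agreement(self, port_name: str) -> None:
        """Move the proposing Designated port to Forwarding (step 1f)."""
        self.ports[port_name].state = "forwarding"


sw2 = Switch("SW2", {"to_SW3": Port("designated", "forwarding")})
sw3 = Switch("SW3", {
    "to_SW2": Port("root", "forwarding"),
    "to_SW4": Port("designated", "forwarding"),
    "host": Port("designated", "forwarding", edge=True),
})

sw2.sync()                               # 1b: SW2 blocks its Designated port
reply = sw3.receive_proposal("to_SW2")   # 1c-1e: SW3 syncs and agrees
if reply == "agreement":
    sw2.receive_agreement("to_SW3")      # 1f: SW2 starts forwarding

for sw in (sw2, sw3):
    print(sw.name, {name: p.state for name, p in sw.ports.items()})
```

After the exchange, the SW2-SW3 link forwards again while the SW3 port towards SW4 stays in Discarding until SW3 completes its own handshake downstream. The Edge port is never blocked.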

Step 2a: Switch SW3 continues the synchronization process by sending a Proposal message to its downstream switch SW4, in which it advertises switch SW2 as the new STP root switch with RPC 4.

Figure 1-3: Synchronization process - Agreement

Step 2b: Switch SW4 receives the Proposal message sent by switch SW3, in which switch SW2 is marked as the STP root switch with RBID 32778 (+ mac). However, switch SW4 has a cache entry on its Alternate port in which switch SW1 is marked as the STP root switch with an RBID value of 4106. Switch SW4 therefore does not accept the Proposal message. Instead, it sends its own Proposal message, in which switch SW1 is the STP root switch. Switch SW4 sets the state of all its non-Edge Designated ports to Discarding.

Step 2c: Switch SW4 changes the role of its Alternate port to STP root port as soon as it has started its own synchronization process with the downstream switch SW3 by sending a Proposal message and switching its non-Edge Designated ports to the Discarding state. The role change of the Alternate port can be made immediately because the STP Designated ports are in the Discarding state, so the switch cannot create an L2 loop.
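The decision switch SW4 makes in steps 2b and 2c can be sketched as a comparison between the received Proposal and the information cached on the Alternate port. The function name, the tuple layout, and the returned labels are hypothetical and serve only this example.

```python
from typing import Optional, Tuple

# (root priority, root mac, root path cost); lower compares as better.
Vector = Tuple[int, int, int]


def respond_to_proposal(received: Vector,
                        alternate_cache: Optional[Vector]) -> str:
    """Decide how to answer a Proposal received on the current root port."""
    if alternate_cache is not None and alternate_cache < received:
        # Better root information is still cached on the Alternate port:
        # promote that port to Root and send an own Proposal instead.
        return "counter-proposal"
    # No better information available: trust the upstream switch.
    return "agreement"


# SW4: Proposal from SW3 names SW2 as root, the cache still names SW1.
print(respond_to_proposal((32778, 2, 4), (4106, 1, 8)))
# SW3 in step 1c: no Alternate port, so the Proposal is accepted.
print(respond_to_proposal((32778, 2, 0), None))
```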

Step 2d: Switch SW3 receives the superior BPDU message sent by switch SW4.

Step 2e: Switch SW3 starts the synchronization process by changing the STP role of the current STP root port towards switch SW2 to Designated and by switching the port state from Forwarding to Discarding.

Step 2f: Switch SW3 sets the STP port role of the port towards switch SW4 to STP root.

Step 2g: Switch SW3 answers the Proposal message sent by switch SW4 with an Agreement message, indicating that it accepts switch SW4 as the Designated switch of the segment.

Step 2h: Switch SW4 sets the state of the port towards switch SW3 to Forwarding.

The synchronization process between switches SW4 and SW3 is now complete.

Figure 1-4: Synchronization process - Ready

Step 3a: Switch SW3 continues the synchronization process by sending a Proposal message to switch SW2, which still thinks it is the STP root switch.

Step 3b: Switch SW2 receives a Superior BPDU message and works according to the synchronization process. It switches the STP port role from the Designated port to the Root port. Subsequently, it accepts the Proposal message sent by the switch SW3 with an Agreement message.

Step 3c: Switch SW3 receives the Agreement message and sets the Designated port state to Forwarding. Synchronization is complete.

Topology Changes

Switches build a new loop-free L2 path to the STP root switch by using the STP synchronization process, but the process does not affect the actual data path between hosts (switching is done based on the mac-address-table on each switch). The switching path is changed by setting the Topology Change (TC) bit in BPDU messages when the state of a non-Edge port changes from Discarding/Learning to Forwarding. The process is described in the following example.
Return to step 2c, where switch SW4 changes the port role from Alternate to Root and the state from Discarding to Forwarding. This event triggers the STP Topology Change (TC) process, in which a switch removes all mac addresses from its mac-address-table, excluding the addresses learned on the port towards the switch from which the TC message was received.

Step 4a: Switch SW4 starts the tcWhile timer (2 x Hello Time). While the timer is running, the switch sets the TC bit in all BPDU messages it sends.

Step 4b: The switches SW5 and SW3 react to the TC message by clearing their mac addresses, excluding the addresses that were learned from switch SW4.

Step 4c: The switches SW5 and SW3 start the tcWhile timer and set the TC bit in all outgoing BPDUs while the tcWhile timer is running.

Steps 4d - e: Switch SW6 repeats the same operations as switches SW5 and SW3.
Because switches SW1 and SW2 do not have upstream switches, they do not react to TC messages. Because the mac addresses have been removed, there will be a lot of flood-and-learn type mac address relearning, which is a normal process in layer 2 Ethernet networks.
After the synchronization and TC processes, the network has stabilized. This does not mean that Spanning Tree is quiet or in standby mode. Switches keep sending STP BPDU messages every Hello Time, which allows them to react to possible network changes.
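The reaction to a TC message in steps 4a to 4c can be sketched as follows. The mac-address-table is reduced to a plain dictionary, and the function names, mac labels, and port labels are assumptions for this illustration only.

```python
from typing import Dict


def tc_while(hello_time: float = 2.0) -> float:
    """Duration in seconds for which the TC bit is set: 2 x Hello Time."""
    return 2 * hello_time


def flush_on_tc(mac_table: Dict[str, str], tc_port: str) -> Dict[str, str]:
    """Keep only the entries learned on the port where the TC arrived."""
    return {mac: port for mac, port in mac_table.items() if port == tc_port}


# Hypothetical table on SW5: mac address -> port it was learned on.
sw5_table = {
    "mac-A": "to_SW4",
    "mac-B": "to_SW6",
    "mac-C": "to_SW4",
}

sw5_table = flush_on_tc(sw5_table, tc_port="to_SW4")
print(sw5_table)
print(tc_while())
```

Only the entries behind the port that delivered the TC message survive. Everything else has to be relearned through flooding.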

Figure 1-5: Topology Change

Interconnected ring topologies

The reaction of STP to a network failure in a single ring or star topology is predictable and outages are short, but the situation becomes more complicated if the topology consists of multiple interconnected rings.
In Figure 1-6, the STP root switch is VSS1. Consider a situation where it goes down, either in a controlled manner, for example during a major software upgrade, or in an uncontrolled manner due to a fault.
In the first phase, switches sw12, sw15 and VSS-2 lose their connection to the root switch and start to advertise themselves as the STP root switch. This triggers a synchronization wave that proceeds to switches sw11 and sw14 in block 1, switches sw22 and sw25 in block 2, and switch sw31 in block 3. Since the switches in blocks 1 and 2 receive the message on their current STP root port, they believe that either the root switch or its priority has changed. That is why they answer the Proposal messages with an Agreement and continue the synchronization process. Switch sw31 in block 3 does not accept the Proposal message because it has a superior BPDU in its Alternate port cache in which switch VSS1 is the STP root. Therefore, it starts its own synchronization process.

Figure 1-6: Complex topology

Switch sw31 in block 3 changes the role and state of its current STP root port to Designated Discarding and sends a Proposal message out of it. It also changes the role and state of the Alternate port to Root Forwarding. This change triggers a topology change process. As a result, the block 2 switches clear their mac-address-tables. The same reaction is also possible in switches sw13 and sw23, depending on the order in which Proposals from their adjacent switches are received.
Several synchronization waves are now running simultaneously.
In the worst-case scenario, the network converges only after the old STP root information in the cache of the switches expires (MaxAge - MessageAge timer).
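The worst-case wait mentioned above is simple arithmetic. The sketch below assumes the default MaxAge of 20 seconds, and the MessageAge value of 3 is purely illustrative. The function name is hypothetical.

```python
def remaining_lifetime(message_age: float, max_age: float = 20.0) -> float:
    """Seconds until cached root information expires: MaxAge - MessageAge."""
    return max(max_age - message_age, 0.0)


# Illustrative value only: information that is already 3 seconds old.
print(remaining_lifetime(message_age=3.0))
```

With these numbers, stale root information could survive for up to 17 seconds, which is far longer than the sub-second Proposal / Agreement exchange.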

Figure 1-7: Complex topology

As shown in the previous example, it is difficult to determine the convergence time of a topology with multiple interconnected rings. In a star or single-ring topology where the STP root switch is part of the ring, STP behavior during a failure is much easier to predict and convergence is fast.
