Cisco IOS-XE bug impacting Catalyst 9K series switches - workaround
Scope
This document advises customers of a known issue in Cisco IOS-XE software that can adversely affect AOIP networks. Customers will typically encounter this issue on Cisco Catalyst 9K series switches, which run IOS-XE. Other switch series, such as the Catalyst 1000 or the SG/CBS 350, do not use the same software.
This document provides alternative configurations that achieve a stable AOIP multicast network without the disruptions caused by this software issue.
Impact
Starting with Cisco IOS-XE software version 17.2.xx, a change was made that prevents the switchport command "no ip igmp snooping tcn flood" from having any effect.
This command prevents the switch from "flooding" all multicast traffic for a predefined period of time when the switch encounters a "topology change notification" (TCN). TCNs are a function of spanning-tree protocol (STP) and are generated when the topology changes, for example when other switches are detected as being interconnected to the network. This is in part how STP detects loops and alternative paths, essentially building a map of the network interconnection as a whole.
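For reference, the command is entered per interface. A minimal sketch follows; the interface name GigabitEthernet1/0/1 is a placeholder chosen for illustration, so substitute your own uplink port.

```text
! Interface name is an assumption for illustration only.
configure terminal
 interface GigabitEthernet1/0/1
  no ip igmp snooping tcn flood
 end
```

On affected IOS-XE releases, as described above, this setting has no effect, which is why the workarounds in this document are needed.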
In the early days, before multicast traffic was widely used for audio or video, it was desirable to flood this traffic over all switch-to-switch links whenever a TCN occurred. The reason was to keep multicast receivers getting the traffic they subscribed to, even when the IGMP snooping querier might also move and the multicast forwarding table needs to be rebuilt from scratch.
Realistically, small to medium-sized AOIP networks do not have the switching capacity to forward all multicast traffic to all hosts without considerable packet loss. This is where the issue begins: when the flooding occurs, depending on the network size and bandwidth, audio streams may sound very poor because many RTP packets per second are dropped on the interfaces.
Some customers have asked about rolling back firmware to IOS-XE versions that don't have this bug. Telos Support does not recommend this approach and cannot assist customers with that process. In most cases the network is already in an active production environment.
Immediate workaround configurations
Connecting other Cisco switches to the Catalyst 9K
- On the Edge switch, turn off spanning-tree for the VLAN.
- On the Edge switch, set the uplink port into access mode with the typical configuration as specified in the related configuration document.
- On the Catalyst 9K series Core switch, leave spanning-tree on for the VLAN.
- On the Catalyst 9K series Core switch, set the uplink port into access mode with the typical configuration as specified in the related configuration document.
This configuration will prevent TCN Flooding from occurring if the edge switch is rebooted, or disconnected and reconnected to the Core switch.
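The steps above might look like the following on switches with an IOS-style CLI. This is only a sketch: VLAN 10 and the interface names are placeholders, and only the settings named in this section are shown. The remaining port settings come from the related configuration document, and command syntax may differ on other switch families such as the SG/CBS 350.

```text
! --- Edge switch (assumed: IOS-style CLI, VLAN 10, uplink Gi1/0/24) ---
configure terminal
 no spanning-tree vlan 10
 interface GigabitEthernet1/0/24
  switchport mode access
  switchport access vlan 10
 end

! --- Catalyst 9K Core (spanning-tree left enabled for VLAN 10; port Gi1/0/1 assumed) ---
configure terminal
 interface GigabitEthernet1/0/1
  switchport mode access
  switchport access vlan 10
 end
```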
If redundant links are desired between the Cisco Core and Cisco Edge, an EtherChannel/LACP bundle can still be implemented, as this does not require spanning-tree to function.
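A redundant pair of links could be bundled roughly as follows on each switch. The port-channel number, member ports, and VLAN 10 are hypothetical values; "mode active" is the setting that selects LACP.

```text
! Assumed values: Port-channel 1, member ports Gi1/0/23-24, access VLAN 10.
configure terminal
 interface range GigabitEthernet1/0/23 - 24
  switchport mode access
  switchport access vlan 10
  channel-group 1 mode active
 interface Port-channel1
  switchport mode access
  switchport access vlan 10
 end
```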
Acceptable interconnection examples
Connecting Axia Powerstations to the Catalyst 9K
- On the Powerstation, on the web GUI page for Ethernet Switch Options, leave the GIG port with the Trunk selection enabled. The Enable STP box must NOT be checked.
- On the Catalyst 9K series Core switch, set the uplink port into access mode with the typical configuration as specified in the related configuration document.
This configuration will prevent TCN Flooding from occurring if the Powerstation is rebooted, or disconnected and reconnected to the Core switch.
Only uplink one Powerstation GIG port back to the Catalyst 9K series Core switch. It is NOT possible to use both ports, since spanning-tree is disabled on the Powerstation's integrated switch. The Powerstation does not support EtherChannel/LACP either.
Acceptable interconnection example
Connecting Axia QOR engines to the Catalyst 9K
- On the QOR, there are no settings to adjust.
- On the Catalyst 9K series Core switch, set the uplink port into access mode with the typical configuration as specified in the related configuration document.
Only uplink one QOR 1000BT port back to the Catalyst 9K series Core switch. With the QOR, it was never possible to have any redundant link configuration.
Acceptable interconnection example
Connecting Axia xSwitch to the Catalyst 9K
- On the xSwitch, there are no settings to adjust.
- On the Catalyst 9K series Core switch, set the uplink port into access mode with the typical configuration as specified in the related configuration document.
Only uplink one xSwitch 1000BT port back to the Catalyst 9K series Core switch. With the xSwitch, it was never possible to have any redundant link configuration.
Acceptable interconnection example
Workaround compromises/considerations
The edge switches of the network will no longer have spanning-tree to protect against accidental network loops. Care must be taken to not create a network loop between edge switch local ports, or between neighboring edge switches.
Spanning-tree is still enabled on the 9K series core. If a loop is created, spanning-tree will block it, but this may also trigger a flooding event.
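To see whether the core has recently registered a topology change, the standard spanning-tree show command can be used. VLAN 10 is a placeholder for your access VLAN.

```text
! VLAN 10 is an assumed access VLAN.
show spanning-tree vlan 10 detail
```

The detail output includes a topology change counter and the time and port of the last change, which can help correlate audio dropouts with TCN events.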
If the port changes must be made on a "live" network, prevent additional flood triggers by working in this order: first administratively shut down the switchport on the Catalyst 9K switch (shut), then make the port configuration changes as noted. Next, change the configuration on the edge switch or Powerstation. Lastly, bring the port back up on the Catalyst 9K.
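That sequence could be sketched as follows. The port Gi1/0/1 and VLAN 10 are assumptions for illustration, and only the access-mode settings named in this document are shown.

```text
! Step 1: on the Catalyst 9K core, shut the port, then reconfigure it.
configure terminal
 interface GigabitEthernet1/0/1
  shutdown
  switchport mode access
  switchport access vlan 10
 end

! Step 2: reconfigure the edge switch or Powerstation as described earlier.

! Step 3: back on the Catalyst 9K core, bring the port up again.
configure terminal
 interface GigabitEthernet1/0/1
  no shutdown
 end
```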
If the Catalyst 9K series is in a stacking configuration, this does not create any additional issues with the above workarounds.
The workaround assumes only one access VLAN is needed between the Catalyst 9K core and the edge switch, so only that access VLAN is available on all ports of the edge switch. In most classic Livewire+ networks, or AOIP-only networks, only one VLAN is implemented anyway.
If multiple VLANs are needed between the Core and edge switch, this workaround will NOT work for you.
Long term resolution
As of April 2023, Cisco has published version 17.11.1 of IOS-XE, which should resolve the underlying issue. At this time we have not tested it ourselves, nor do we have field results from customer sites, to confirm the resolution. Please let us know if you have seen the bug resolved in 17.11.1.
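To confirm which IOS-XE release a switch is running before or after an upgrade, the version can be read from the CLI. A minimal example:

```text
! Lists the lines of "show version" that contain the word Version,
! including the running IOS XE release.
show version | include Version
```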