Spanning Tree Protocol (STP) Explained

Updated by Eric Hufford

Scope

This article briefly explains what Spanning Tree Protocol is and what it is used for.


Description

Customers may see Spanning Tree Protocol (referred to as "STP") enabled on their switches, or they may see it as an option in certain Telos equipment, but what is STP, and why do we use it?

This article is not a guide; rather, it is intended to demystify STP and provide an introduction.

The Problem: Network Loops

If you've ever plugged a network switch back into itself, you may have quickly learned that was a bad idea, as the switch overloads and the network grinds to a halt.

While most people don't plug a switch back into itself, they can actually create the same effect by building a ring topology, where Switch 1 plugs into Switch 2, Switch 2 plugs into Switch 3, but then Switch 3 plugs BACK into Switch 1 again, as seen below:

What is actually looped in a "network loop"?

Messages are always being sent back and forth through your network switch. Most of the time, the switch sends a message only to its intended recipient. For example, if your PC is plugged into port 1 of your switch, and your server is plugged into port 5, then messages between your PC and the server will only be sent from port 1 to port 5, and from port 5 back to port 1. We would call this a one-to-one conversation, or "unicast":

Because other devices are not involved in this "conversation," we don't need to bother sending those messages to all the other ports - that would generate a lot of unnecessary traffic.

But what happens when we have to send a message to ALL devices? In that case, the message gets sent out of every port except the one it arrived on. We call these "broadcast" messages, and while most connections are unicast (one-to-one), broadcast messages are also common, as protocols such as ARP and DHCP use them to discover devices and services on the network.
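The difference between unicast and broadcast forwarding can be sketched in a few lines of Python. This is only a toy model, not how any particular switch is implemented: the function name, the 8-port layout, and the MAC addresses are all made up for illustration. It also shows one detail not covered above, namely that a switch floods a message when it does not yet know where the destination lives.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"  # the Ethernet broadcast address

def egress_ports(mac_table, all_ports, in_port, dst_mac):
    """Return the ports a toy switch would send a frame out of."""
    if dst_mac != BROADCAST and dst_mac in mac_table:
        # Unicast to a known device: one port only.
        return [mac_table[dst_mac]]
    # Broadcast (or unknown destination): flood every port except the ingress.
    return [p for p in all_ports if p != in_port]

# Hypothetical 8-port switch: PC on port 1, server on port 5.
mac_table = {"aa:aa:aa:aa:aa:01": 1, "aa:aa:aa:aa:aa:05": 5}
ports = list(range(1, 9))

print(egress_ports(mac_table, ports, 1, "aa:aa:aa:aa:aa:05"))  # [5]
print(egress_ports(mac_table, ports, 1, BROADCAST))  # [2, 3, 4, 5, 6, 7, 8]
```

The PC-to-server message leaves on a single port, while the broadcast leaves on all seven other ports.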

The Broadcast Storm

Let's say that we have a PC sending out a broadcast message (for whatever reason). That message gets sent to the switch the PC is connected to. The switch will, in turn, send that message to all other devices that are connected to it:

This is expected behavior and not a problem. But let's change the situation - now we connect a 2nd switch to our network:

See any difference? There isn't really, and there won't be much difference with a 3rd switch as long as we're just stringing it along:

So far, we're operating with no problems.

Now, let's strike the death blow and connect Switch 3 back to Switch 1:

Whoa, what happened?

Well, when we connected Switch 3 to Switch 1, we began looping that message.

To be precise, a broadcast message that was originally sent from Switch 1 was sent to Switch 2 and Switch 3 simultaneously, and because it's a broadcast message, Switch 2 and Switch 3 sent the same message back to each other. Then, they both sent that message back to Switch 1, which then began the rinse, lather, repeat cycle.

To illustrate, the process probably progressed like this, but much faster:

By connecting Switch 3 back to Switch 1, we began duplicating the message infinitely in our network.

Nothing in an Ethernet frame limits how many times it can be forwarded, so the cycle continues until all the switches are so overloaded with duplicate messages that they run out of bandwidth and CPU power and effectively stop working.

We call this a broadcast storm.
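To see why the loop never ends, here is a small simulation, written as a rough sketch rather than a faithful model of real hardware. It assumes one cable between any two switches, that every hop takes one "tick", and that a switch forwards each copy out of every inter-switch link except the one it arrived on. The function name and the switch numbering are hypothetical.

```python
def frames_in_flight(links, origin, ticks):
    """Count copies of one broadcast travelling between switches at each tick.

    links: list of (switch_a, switch_b) cables between switches.
    Nothing in this model ever expires a copy, just like real Ethernet.
    """
    # Build the list of neighbours for every switch.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    # A copy in flight is (sender, receiver). The origin floods first.
    in_flight = [(origin, n) for n in neighbours.get(origin, [])]
    history = []
    for _ in range(ticks):
        history.append(len(in_flight))
        next_round = []
        for sender, receiver in in_flight:
            for n in neighbours[receiver]:
                if n != sender:  # never back out of the link it arrived on
                    next_round.append((receiver, n))
        in_flight = next_round
    return history

chain = [(1, 2), (2, 3)]          # switches strung along in a line
ring = [(1, 2), (2, 3), (3, 1)]   # Switch 3 plugged back into Switch 1

print(frames_in_flight(chain, origin=1, ticks=5))  # [1, 1, 0, 0, 0]
print(frames_in_flight(ring, origin=1, ticks=5))   # [2, 2, 2, 2, 2]
```

In the chain, the broadcast reaches the last switch and dies out. In the ring, two copies circle forever in opposite directions, and in this model every later broadcast adds two more copies that never leave.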


The Solution: STP

Enter STP: it stops loops.

Now, it can be used for more than that, but essentially that's what it was built for.

STP is enabled by default on most managed switches so that an accidental loop does not take down the network. The process by which it does that is a little more complicated, and there are even different types of Spanning Tree Protocols that can be used, but that is beyond the scope of this document.

How does it work?

STP uses a process called convergence, whereby switches discover which links should stay up (forwarding), and which ones should be shut down (blocking) until they are needed.

Let's go back to our example of the 3 switches all connected together. Before STP was enabled, messages were being duplicated and sent all over the place uncontrollably. However, with STP enabled, the switches collectively decide to block one of these links and defeat the loop:

During the convergence process, all the connected switches exchange information to decide who is the leader. This "leader" switch becomes designated as the root bridge, and this designation is very important, because it is a deciding factor in determining which links continue forwarding traffic, and which ones block it.
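The election itself follows a simple rule: every switch has a bridge ID made up of a configurable priority (32768 by default) followed by its MAC address, and the switch with the lowest bridge ID becomes the root bridge. The sketch below illustrates that comparison only. The switch names and MAC addresses are invented, and real switches reach the same result by exchanging STP messages rather than by looking at a shared list.

```python
def bridge_id(priority, mac):
    """Bridge ID as a comparable tuple: priority first, then MAC as a number."""
    return (priority, int(mac.replace(":", ""), 16))

def elect_root(bridges):
    """Return the name of the switch with the lowest bridge ID."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))

# Hypothetical switches, all left at the default priority of 32768.
bridges = {
    "Switch 1": (32768, "00:1a:2b:00:00:30"),
    "Switch 2": (32768, "00:1a:2b:00:00:10"),
    "Switch 3": (32768, "00:1a:2b:00:00:20"),
}
print(elect_root(bridges))  # Switch 2 (lowest MAC address breaks the tie)

# Lowering the priority on Switch 1 makes it win regardless of MAC address.
bridges["Switch 1"] = (4096, "00:1a:2b:00:00:30")
print(elect_root(bridges))  # Switch 1
```

With default settings the winner is decided by MAC address, which is effectively arbitrary, so administrators often lower the priority on the switch they want as the root bridge.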

Once a root bridge is decided, the task becomes determining the most efficient network path to that root bridge. This is done by adding up the "cost" of each port on the way to the root bridge. The port cost is a number derived from link speed: the faster the link, the lower the cost. For example, a 100 Mb/s link costs more than a 1000 Mb/s link, so given the choice between the two, a switch will use the 1000 Mb/s link because it is faster (it "costs less").

If there are multiple switches between a switch and the root bridge, then the cost of each link is summed into the "path cost." The illustration below shows how Switch 3 will use the direct connection to the root bridge instead of going through Switch 2:
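As a worked example, the sketch below adds up path costs using the default values from the classic IEEE 802.1D-1998 table (10 Mb/s = 100, 100 Mb/s = 19, 1 Gb/s = 4, 10 Gb/s = 2). The link speeds are assumptions made for illustration, and your switches may use a different cost table, since newer revisions of the standard define larger values.

```python
# Default port costs from IEEE 802.1D-1998, keyed by link speed in Mb/s.
STP_COST = {10: 100, 100: 19, 1000: 4, 10000: 2}

def path_cost(link_speeds_mbps):
    """Sum the port cost of every link on the way to the root bridge."""
    return sum(STP_COST[speed] for speed in link_speeds_mbps)

# Assumed topology: every link in the three-switch ring runs at 1 Gb/s.
direct = path_cost([1000])              # Switch 3 -> root bridge
via_switch_2 = path_cost([1000, 1000])  # Switch 3 -> Switch 2 -> root bridge
print(direct, via_switch_2)  # 4 8

# If the direct link were only 100 Mb/s, the two-hop path would be cheaper.
slow_direct = path_cost([100])
print(slow_direct, via_switch_2)  # 19 8
```

With equal link speeds, the direct connection wins because it has fewer links to add up. The second case shows that the lowest cost is not always the path with the fewest hops.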

What's the catch?

The convergence process described above takes time, and that's the catch.

Let's say you decide to use classic STP to create a redundant network. That's a great idea, except that when you plug a switch into your network, you can interrupt traffic for 30-50 seconds.

Why? Plugging in that switch created a new topology, and so the convergence process must occur to determine root bridge and path cost all over again. All network traffic gets blocked while things get sorted.

It should also be mentioned that unplugging a switch will have the same effect, as this would be considered a topology change.
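The 30-50 second range comes from classic STP's default timers. A port that is coming up spends one "forward delay" (15 seconds) listening and another learning before it forwards traffic, and a switch may first wait for the "max age" timer (20 seconds) to expire before it reacts to a lost link. The arithmetic below is a simple illustration using those standard defaults; the timers are configurable, so your switches may be set differently.

```python
# Classic STP (IEEE 802.1D) default timers, in seconds.
MAX_AGE = 20        # how long stale information is kept before it is discarded
FORWARD_DELAY = 15  # time spent in each of the listening and learning states

listening = FORWARD_DELAY
learning = FORWARD_DELAY

best_case = listening + learning             # new link: listen, then learn
worst_case = MAX_AGE + listening + learning  # lost link noticed via timeout

print(best_case, worst_case)  # 30 50
```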

Isn't there a better way?

Yes, the above catch is true, but only with classic STP (which is still used today). The good news is that other protocols have emerged to make STP more efficient. After all, 50 seconds is a long time in any network.

One such protocol is Rapid PVST+ (Rapid Per-VLAN Spanning Tree Plus), Cisco's per-VLAN implementation of Rapid Spanning Tree Protocol (RSTP). This protocol can converge on a per-VLAN basis in as little as 1 second! While this is a great improvement in convergence time, it does come at a cost of higher CPU and memory usage.


The Takeaway

Spanning Tree Protocol is a great tool to defeat loops and simultaneously create redundant links within your Axia network. However, before implementing redundancy, be sure you understand each protocol's benefits and drawbacks, as well as how to enable the type of STP (RSTP, PVST+, etc.) that you want to use on your network.

While some types have longer convergence times, others require more of your switch's resources. Depending on the age of your hardware, and how much traffic it handles, you may decide to use one over the other. It is always a good idea to count the cost before making an investment.


Let us know how we can help

If you have further questions on this topic or have ideas about improving this document, please contact us.

