IEEE: Internet routing problems

Rob Seastrom rs at seastrom.com
Sun Aug 24 13:43:47 CDT 2014


Andre Kesteloot <akesteloot at gmail.com> writes:

> [[http://spectrum.ieee.org/riskfactor/computing/it/the-routing-wall-of-shame/]]?

This is one of the better articles I've seen in terms of
not-getting-it-wrong.

The folks on this list can stand a little technical nuance though.

Routers on the Internet that handle large amounts of traffic are
generally not composed entirely of general-purpose CPUs - they are
split into what's called the "control plane" and the "data plane".

The "control plane" is where detailed information about network
topology lives, in the form of routing protocol tables (stuff like
BGP, OSPF, IS-IS).  It is also where the command line interpreter
lives, through which the network manager interacts with the router.
The control plane is a general-purpose CPU (usually Intel, MIPS, or
PowerPC).  Memory for
general-purpose CPUs is fairly inexpensive, though one is often
constrained by the number of memory sockets or traces on the CPU card
as to how much memory one can pile on.  The control plane is capable
of forwarding IP packets, but not quickly (it is often called into
action for things like replying to pings, generating the ICMP
time-exceeded messages that traceroute relies on, etc).

The "data plane" contains the distilled set of best routes to each
destination that the router knows about, perhaps including a default,
or catch-all route.  The data plane is fast but stupid - it is made of
application-specific integrated circuits, and a special kind of memory
called TCAM:
http://en.wikipedia.org/wiki/Content-addressable_memory#Ternary_CAMs
This stuff is expensive, and generally non-expandable except by
replacing the board it lives on.  On the plus side, the task of the
data plane is fairly straightforward - send the packet to the
proper exit interface based on its destination address, and do so fast.
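
To make the "fast but stupid" lookup concrete, here is a minimal Python
sketch of a longest-prefix match done the way a ternary table does it:
each entry is a value plus a mask, and the first entry that matches
wins.  The prefixes, interface names, and function names are made up
for illustration, and the loop is a software stand-in for a comparison
that real TCAM performs across all entries in parallel.

```python
import ipaddress

# Hypothetical forwarding entries: (prefix, exit interface).
EXAMPLE_ROUTES = [
    ("0.0.0.0/0", "ge-0/0/0"),        # default, or catch-all, route
    ("192.0.2.0/24", "ge-0/0/1"),
    ("192.0.2.128/25", "ge-0/0/2"),   # more specific than the /24 above
]


def build_table(routes):
    """Convert prefixes into (value, mask, prefix_len, interface) entries."""
    table = []
    for prefix, ifname in routes:
        net = ipaddress.ip_network(prefix)
        table.append(
            (int(net.network_address), int(net.netmask), net.prefixlen, ifname)
        )
    # First match wins, so the longest prefixes have to come first.
    table.sort(key=lambda entry: entry[2], reverse=True)
    return table


def lookup(table, dst):
    """Return the exit interface for dst, or None if nothing matches."""
    addr = int(ipaddress.ip_address(dst))
    for value, mask, _prefix_len, ifname in table:
        # Bits outside the mask are "don't care", as in a ternary match.
        if addr & mask == value:
            return ifname
    return None


if __name__ == "__main__":
    table = build_table(EXAMPLE_ROUTES)
    for dst in ("192.0.2.200", "192.0.2.5", "198.51.100.1"):
        print(dst, "->", lookup(table, dst))
```

Note that the table holds nothing about topology or why a route was
chosen; it only maps destination bits to an exit interface.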

Conceptually, the data plane's role is very much like the MASH signpost:
http://www.mash4077tv.com/features/prop_spotlight_signpost/

What happened in these last days, when the global routing table crossed
512k routes, is akin to nailing one too many signs on the signpost and
thus causing it to blow over in the wind.  The map is still in perfect
shape and is available if you jump through hoops, but the quick
reference is gone.  The fallback technique is to go see Radar O'Reilly
when you need directions (IP_Input on the control plane), and when he
gets around to it, Colonel Potter will get around to handling the
matter personally.
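
The signpost analogy can be sketched as a toy model in Python.  This is
a deliberate simplification and not a description of any particular
platform: the class name, the capacity figure, and the choice to treat
the whole hardware table as unusable after one route too many are all
assumptions made to mirror the analogy.  How a real router behaves on
overflow varies by vendor and model.

```python
import ipaddress


def longest_match(table, dst):
    """Return the interface of the most specific prefix covering dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, ifname in table.items():
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ifname)
    return best[1] if best else None


class ToyRouter:
    """Hypothetical router with a full map (RIB) and a small signpost (FIB)."""

    def __init__(self, hw_capacity):
        self.hw_capacity = hw_capacity  # how many signs the signpost holds
        self.rib = {}                   # control plane: the complete map
        self.fib = {}                   # data plane: the quick reference
        self.fib_usable = True

    def add_route(self, prefix, ifname):
        net = ipaddress.ip_network(prefix)
        self.rib[net] = ifname          # the map always has room
        if not self.fib_usable:
            return
        if net not in self.fib and len(self.fib) >= self.hw_capacity:
            # One sign too many: in this model the signpost blows over.
            self.fib.clear()
            self.fib_usable = False
        else:
            self.fib[net] = ifname

    def forward(self, dst):
        """Return (path, interface); path is 'fast' or 'slow'."""
        if self.fib_usable:
            return ("fast", longest_match(self.fib, dst))
        # Fallback: ask the control plane, which is correct but slow.
        return ("slow", longest_match(self.rib, dst))


if __name__ == "__main__":
    r = ToyRouter(hw_capacity=2)
    r.add_route("10.0.0.0/8", "if0")
    r.add_route("10.1.0.0/16", "if1")
    print(r.forward("10.1.2.3"))
    r.add_route("10.2.0.0/16", "if2")   # exceeds the hardware capacity
    print(r.forward("10.1.2.3"))
```

The answers stay correct after the overflow because the map is intact;
what changes is which part of the router has to produce them.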

This is why what people were seeing was often not 100% failure, but
more like 98%+ packet loss (annoyingly, enough to keep monitoring
software confused in many cases as to the true nature of things).
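
A back-of-the-envelope way to see why the loss is high but not total:
if everything is punted to a CPU that can forward only a small
fraction of the offered load, the rest is dropped.  The Python sketch
below shows the arithmetic; the packet rates are invented purely for
illustration and are not measurements from any router.

```python
def loss_fraction(offered_pps, slow_path_pps):
    """Fraction of packets dropped when only slow_path_pps can be forwarded."""
    if offered_pps <= 0:
        return 0.0
    forwarded = min(offered_pps, slow_path_pps)
    return 1.0 - forwarded / offered_pps


if __name__ == "__main__":
    # Hypothetical figures: 1,000,000 packets/s arriving, and a control
    # plane able to forward 20,000 packets/s.
    loss = loss_fraction(1_000_000, 20_000)
    print(f"approximate loss: {loss:.0%}")
```

With those assumed numbers the loss comes out at roughly 98%, and the
few packets that do get through are what keep a simple up/down check
from declaring the path dead.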

-r



More information about the Tacos mailing list