In most cases, switches and routers are configured for "best-effort" packet forwarding. This
means that the router forwards all packets it receives to the best of its ability. The router forwards
a packet as soon as it can perform the table lookup necessary to determine the appropriate egress
interface(s) for the packet. If the router is unable to send a packet immediately, the packet is
queued. If the queue is full, the packet is dropped. Packets are typically processed on a first-come,
first-served (or FIFO, First In First Out) basis. This adds up to best-effort forwarding.
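As a rough illustration of the queue-then-drop behavior described above, the following Python sketch models a single output queue as a FIFO with tail drop. The class name, the capacity of 3, and the integer "packets" are hypothetical and chosen only to keep the example small. Real router queuing is done in hardware and is considerably more involved.

```python
from collections import deque


class FifoTailDropQueue:
    """Toy model of a best-effort output queue: FIFO order, tail drop when full."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of packets held
        self.packets = deque()
        self.drops = 0

    def enqueue(self, packet):
        """Queue a packet, or count a drop if the queue is already full."""
        if len(self.packets) >= self.capacity:
            self.drops += 1
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        """Return the oldest queued packet (first in, first out), or None."""
        return self.packets.popleft() if self.packets else None


# Five packets arrive back to back at a queue that holds only three.
q = FifoTailDropQueue(capacity=3)
accepted = [q.enqueue(n) for n in range(5)]
first_out = q.dequeue()
print(accepted, q.drops, first_out)
```

In this toy run the first three packets are queued, the last two are dropped, and the first packet to leave is the first one that arrived.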
Everything is typically fine with best-effort forwarding until an interface is oversubscribed.
Once that happens, even if the oversubscription is momentary, the router must queue packets
to avoid dropping them. Therefore, the amount of queuing available on an interface determines the amount
of momentary oversubscription that the router is able to tolerate on that interface without dropping packets
and causing performance degradation. Note that, in most Research and Education (R&E) networks,
oversubscription of 1 Gbps or 10 Gbps interfaces is typically momentary. The bulk of the network traffic
is composed of science flows, which consume a large amount of bandwidth in a small number of flows,
and once those flows encounter packet loss their throughput collapses and they stop consuming significant bandwidth.
This is very different from web browsing, email, YouTube, and so on, where a very large number of
flows consume a relatively small amount of bandwidth each. R&E networks are sized for the
science flows, so the smaller flows do not typically saturate interfaces. However, a large
number of small flows can provide enough background traffic that the bursts associated with
high-speed transfers can cause those transfers to collapse as the large transfers momentarily
oversubscribe an interface and overflow its output queue. Since TCP performs poorly in the face
of even a small amount of packet loss, it is very important to configure routers and switches
with sufficient queuing to accommodate the momentary oversubscription of interfaces that
comes with the bursty traffic patterns inherent in wide-area, high-performance, TCP-based
data transfers.
The following diagram illustrates a common cluster configuration and the locations
where packet loss typically occurs due to inadequate queue resources:
The Cisco 6500/7600 switch-routers have default settings that result in underutilization of
the packet memory on their interfaces. Specifically, IOS defaults to a 40-packet output
queue. This means that it is very easy to make the router drop packets when it is
carrying wide-area TCP flows, such as flows that result from high-performance bulk
transfers of large science data sets. For example, when high-speed packet bursts
arrive at the router from a 10G interface and exit the router via a 1G interface,
it is possible for packets to enter the router from the 10G interface faster than
the 1G interface can send them on. This means that the 1G interface needs to buffer
packets or drop them. A similar situation occurs when traffic enters the router
from two different interfaces and the two traffic streams are forwarded out a common egress
interface – momentary oversubscription of the output interface can result, and if
the output queue is too shallow then packets are dropped. A 40-packet output queue
is typically inadequate under these circumstances. We have found that adding the
following configuration to the 10G interfaces on the router significantly reduces packet loss:
hold-queue 4096 out
1G interfaces can benefit from “hold-queue 1024 out” as well. The input queue can be configured
the same way, e.g. “hold-queue 1024 in” at the same place in the configuration.
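To get a feel for why queue depth matters in the 10G-to-1G case described above, here is a back-of-the-envelope Python sketch. It uses a simple fluid approximation: while a back-to-back burst arrives at the ingress rate, the egress port drains at its own rate, and anything that does not fit in the queue is dropped. The function name and the 1000-packet burst are hypothetical, and the model ignores competing traffic and the packet currently being transmitted, so treat the numbers as rough estimates only.

```python
def burst_drops(burst_packets, queue_depth, in_gbps=10.0, out_gbps=1.0):
    """Estimate tail drops for one back-to-back burst (fluid approximation).

    While the burst arrives at in_gbps, the egress port drains at out_gbps,
    so the backlog peaks at roughly burst_packets * (1 - out_gbps / in_gbps).
    Packets beyond queue_depth are counted as dropped.
    """
    if in_gbps <= out_gbps:
        return 0  # egress keeps up, so no backlog builds
    peak_backlog = burst_packets * (1.0 - out_gbps / in_gbps)
    return max(0, round(peak_backlog) - queue_depth)


# Hypothetical 1000-packet burst from a 10G ingress to a 1G egress,
# compared for the 40-packet default and a 1024-packet output queue.
for depth in (40, 1024):
    print(depth, burst_drops(burst_packets=1000, queue_depth=depth))
```

Under these assumptions the 40-packet queue loses most of the burst, while the 1024-packet queue absorbs all of it.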
Looking at the output of “show interface” can tell you the size of the interface queues.
Check before you make changes, since some interfaces default to a 2000-packet input queue, which is already deeper than 1024.
However, the output queue is often the so-called "point of pain" in these circumstances.
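If you need to check many interfaces, the queue limits can be pulled out of saved “show interface” output with a short script. The Python sketch below assumes the output contains lines in the usual IOS form "Input queue: size/max/drops/flushes" and "Output queue: size/max". The sample text is made up for illustration, not captured from a real device, and the function name is hypothetical. Output formats vary by platform and software version, so verify the patterns against your own routers.

```python
import re

# Illustrative sample only; not captured from a real device.
SAMPLE = """\
  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 1532
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
"""


def parse_queue_limits(show_interface_text):
    """Extract queue limits and drop counters from 'show interface' text."""
    result = {}
    m = re.search(r"Input queue:\s*(\d+)/(\d+)/(\d+)/(\d+)", show_interface_text)
    if m:
        result["input_max"] = int(m.group(2))
        result["input_drops"] = int(m.group(3))
    m = re.search(r"Output queue:\s*(\d+)/(\d+)", show_interface_text)
    if m:
        result["output_max"] = int(m.group(2))
    m = re.search(r"Total output drops:\s*(\d+)", show_interface_text)
    if m:
        result["output_drops"] = int(m.group(1))
    return result


limits = parse_queue_limits(SAMPLE)
print(limits)
```

A small output maximum together with a growing output drop counter is the pattern that suggests a deeper output hold queue is worth trying.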