TCP Ordering

The Transmission Control Protocol (TCP) is very sensitive to how packets behave end-to-end. Variations in arrival time (“jitter”), combined with other anomalous events that re-order sections of the data or lose them completely, cause problems that are challenging to recover from. These problems reach the user as “low observed throughput”: the data still makes it end-to-end, but more slowly than it would if no such perturbation had occurred. As a rule of thumb, one should try to eliminate all potential sources of packet re-ordering from the path. Experience has shown that even low levels of re-ordering, in which a missing packet arrives less than a millisecond late, can significantly degrade performance.
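A back-of-the-envelope calculation shows why sub-millisecond lateness matters. The sketch below assumes a saturated path and 1500-byte packets (both illustrative assumptions, not values from the tests described here) and counts how many later packets can arrive ahead of one packet that is held back. At 10 Gb/s a 1500-byte packet takes 1.2 µs to serialize, so a delay of 0.5 ms lets roughly 416 later packets overtake it. That is far more than the three duplicate acknowledgments that classically trigger a retransmission (RFC 5681).

```python
def packets_overtaking(delay_s: float, link_bps: float, packet_bytes: int = 1500) -> int:
    """How many back-to-back packets can arrive ahead of one packet held back
    by delay_s on a saturated link of link_bps (illustrative model only)."""
    serialization_s = packet_bytes * 8 / link_bps  # wire time of a single packet
    return int(delay_s / serialization_s)


print(packets_overtaking(0.0005, 10e9))  # 416 at 10 Gb/s
print(packets_overtaking(0.0005, 1e9))   # 41 at 1 Gb/s
```

The faster the path, the more packets fit into the same absolute delay, which is why a fixed amount of jitter hurts high-speed transfers more.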

The document attached to this page describes the behavior observed when the netem tool is used between two geographically separated hosts to apply various levels of packet re-ordering and delay to an iperf3 test. In summary:

  • Out of Order Packets (OOP) become harmful to a TCP stream as two factors increase: the probability that an OOP occurs, and the delay with which it eventually arrives (jitter)
    • This is recoverable at low levels, provided that TCP on the sending and receiving hosts has been tuned and the hosts are capable (fast processor, adequate memory)
    • It becomes unrecoverable as the probability of packets arriving out of order increases, or as the delay grows beyond an acceptable limit
  • As the time between an In Order Packet (IOP) and an OOP increases, it starts to trigger SACK (Selective Acknowledgment) behavior. This is a recovery measure: the receiver keeps acknowledging the last in-order data while reporting what else has arrived, in an attempt to get the single missing packet re-sent. This causes a backup in the window, and may cause a re-send of the entire load of data
  • There is a point of no return, determined by the delay of the OOP relative to the RTT. Observation has found this to be around 1% of the RTT for the path.
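The interaction between the probability of an OOP, its delay, and duplicate acknowledgments can be sketched with a toy model. Everything here is an illustrative assumption: the re-ordering model (each packet independently held back by a fixed extra delay) is a simplification rather than netem's exact algorithm, the 1.2 µs packet spacing corresponds to 1500-byte packets at 10 Gb/s, and the receiver is reduced to cumulative ACKs. The threshold of three duplicate ACKs is the classic fast-retransmit trigger from RFC 5681; modern TCP stacks add re-ordering heuristics that this sketch ignores.

```python
import random

DUPACK_THRESHOLD = 3  # classic fast-retransmit trigger (RFC 5681)


def arrival_order(n_packets, p_delay, extra_delay_us, pkt_interval_us=1.2, seed=1):
    """Toy re-ordering model (an assumption, not netem's exact algorithm):
    packets leave every pkt_interval_us, and each one is independently held
    back by extra_delay_us with probability p_delay. Returns sequence numbers
    in the order the receiver sees them."""
    rng = random.Random(seed)
    arrivals = []
    for seq in range(n_packets):
        t = seq * pkt_interval_us
        if rng.random() < p_delay:
            t += extra_delay_us
        arrivals.append((t, seq))
    arrivals.sort()  # the receiver sees packets in arrival-time order
    return [seq for _, seq in arrivals]


def longest_dupack_run(order):
    """Longest run of consecutive duplicate ACKs from a cumulative-ACK receiver:
    each segment that does not advance the next-expected sequence number
    repeats the previous ACK (with SACK, that ACK also reports the new data)."""
    received = set()
    next_expected = 0
    run = longest = 0
    for seq in order:
        received.add(seq)
        before = next_expected
        while next_expected in received:
            next_expected += 1
        if next_expected == before:
            run += 1  # ACK point did not move: duplicate ACK
            longest = max(longest, run)
        else:
            run = 0
    return longest


def spurious_fast_retransmit(order):
    """True if re-ordering alone yields enough duplicate ACKs to trigger a
    retransmission, even though no packet was lost."""
    return longest_dupack_run(order) >= DUPACK_THRESHOLD


def beyond_point_of_no_return(oop_delay_ms, rtt_ms, fraction=0.01):
    """The ~1% of RTT rule of thumb from the observations summarized above."""
    return oop_delay_ms > fraction * rtt_ms


# 1 in 100 packets held back by 500 us on a path sending one packet every 1.2 us
print(spurious_fast_retransmit(arrival_order(10_000, p_delay=0.01, extra_delay_us=500.0)))
# A delay shorter than the packet spacing never lets a later packet overtake
print(spurious_fast_retransmit(arrival_order(10_000, 0.5, 1.0)))  # False
```

In this model, raising either the probability or the delay increases how often, and by how many packets, a late packet is overtaken, which mirrors the two factors listed above.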

In general, one should minimize the opportunity for OOP and, where it cannot be avoided, the delay it introduces. The following real-world situations may cause it:

  • Traversal of a device that may process a single TCP stream in parallel (e.g. a firewall or packet shaper)
  • Traversal of a device that splits traffic across links it treats as equivalent (e.g. a bonded link on a host, or a LAG on a network device), when the hashing algorithm does not keep all packets of a flow on the same path
  • Queuing somewhere in the path when portions of the traffic in a single flow are forwarded with different priorities
  • Adaptive load balancing methods (e.g. inside a host or router)
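The difference between keeping a flow on one link and spreading its packets across links can be sketched as follows. The two links, their latencies, the 5-tuple, and the use of CRC32 as the hash are hypothetical stand-ins; real devices use their own hash functions and header fields. The sketch only shows that pinning a flow to one link preserves order, while distributing one flow's packets across links with unequal delay (as per-packet or adaptive balancing may do) does not.

```python
import zlib

LINK_LATENCY_US = [50.0, 80.0]  # hypothetical: two "equivalent" links with unequal delay


def flow_hash_link(five_tuple, n_links):
    """Per-flow choice: every packet of the same flow hashes to the same link."""
    key = "|".join(map(str, five_tuple)).encode()
    return zlib.crc32(key) % n_links


def delivery_order(n_packets, choose_link, pkt_interval_us=1.2):
    """Send one flow's packets back to back and return their sequence numbers
    in the order they reach the far end."""
    arrivals = []
    for seq in range(n_packets):
        link = choose_link(seq)
        arrivals.append((seq * pkt_interval_us + LINK_LATENCY_US[link], seq))
    arrivals.sort()
    return [seq for _, seq in arrivals]


# src, dst, protocol (6 = TCP), source port, destination port (illustrative values)
flow = ("10.0.0.1", "10.0.0.2", 6, 50000, 5201)
n_links = len(LINK_LATENCY_US)

per_flow = delivery_order(100, lambda seq: flow_hash_link(flow, n_links))
per_packet = delivery_order(100, lambda seq: seq % n_links)  # round-robin spraying

print(per_flow == sorted(per_flow))      # True: order preserved
print(per_packet == sorted(per_packet))  # False: packets on the faster link overtake
```

Per-flow hashing trades perfect link balance for in-order delivery, which is usually the right choice for large TCP transfers.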

Adopting a Science DMZ architecture helps to eliminate this problem. Other useful references for this include: