ECN (Explicit Congestion Notification)

There has been much debate about ECN (Explicit Congestion Notification) and its impact on high-performance data transfers that use TCP. ECN is specified in RFC 3168.

Section 6.1.2 of RFC 3168 describes the behavior of a TCP sender. The first paragraph of that section contains the following text:

If the sender receives an ECN-Echo (ECE) ACK packet (that is, an ACK packet with the ECN-Echo flag set in the TCP header), then the sender knows that congestion was encountered in the network on the path from the sender to the receiver.  The indication of congestion should be treated just as a congestion loss in non-ECN-Capable TCP. That is, the TCP source halves the congestion window "cwnd" and reduces the slow start threshold "ssthresh". The sending TCP SHOULD NOT increase the congestion window in response to the receipt of an ECN-Echo ACK packet.

Our interpretation of the text is that, from the end host's perspective, receiving an ECN-Echo is equivalent to detecting packet loss: in both cases the sender reduces its congestion window.
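The sender behavior quoted above can be sketched in a few lines of Python. This is an illustrative model only, not the code of any real TCP stack: the SenderState class, the on_ack and on_loss functions, and the choice to count the window in segments with a floor of two are all assumptions made for the example. Congestion-avoidance growth is omitted, as is the RFC 3168 rule that a sender reacts to congestion indications at most once per window of data.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    cwnd: int      # congestion window, in segments (hypothetical unit choice)
    ssthresh: int  # slow start threshold, in segments


def on_loss(state: SenderState) -> SenderState:
    """Model the window reduction after a detected congestion loss."""
    half = max(state.cwnd // 2, 2)  # floor of 2 segments is an assumption
    return SenderState(cwnd=half, ssthresh=half)


def on_ack(state: SenderState, ece: bool) -> SenderState:
    """Model the sender's reaction to one ACK.

    An ACK carrying ECN-Echo is treated exactly like a congestion loss,
    and the window is not increased in response to it.
    """
    if ece:
        return on_loss(state)
    if state.cwnd < state.ssthresh:
        # Slow start: grow by one segment per ACK.
        return SenderState(cwnd=state.cwnd + 1, ssthresh=state.ssthresh)
    # Congestion-avoidance growth is left out of this sketch.
    return state


if __name__ == "__main__":
    start = SenderState(cwnd=100, ssthresh=200)
    print(on_ack(start, ece=True))   # same result as on_loss(start)
    print(on_ack(start, ece=False))  # window grows by one segment
```

In this model a single ECN-Echo ACK has the same effect on the window as a lost packet, which is the equivalence described above.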

The next thing to consider is when a router would set the CE (Congestion Experienced) codepoint in an ECN-capable TCP packet. Section 5 of RFC 3168 says:

For a router, the CE codepoint of an ECN-Capable packet SHOULD only be set if the router would otherwise have dropped the packet as an indication of congestion to the end nodes. When the router's buffer is not yet full and the router is prepared to drop a packet to inform end nodes of incipient congestion, the router should first check to see if the ECT codepoint is set in that packet's IP header.  If so, then instead of dropping the packet, the router MAY instead set the CE codepoint in the IP header.

So far we have been unable to find a recommendation for when a router should start setting CE for ECN-capable flows. Prior engineering experience with RED (Random Early Detection) indicates that marking would begin either before the interface reaches full capacity or before the interface queue fills. The key issue for long-distance data transfers is that ECN effectively signals packet loss to TCP sooner, either at lower queue fill levels or at lower interface utilization levels. This means TCP effectively sees loss while there are still hardware resources (such as buffer space) available that might have prevented that loss.
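To make the early-signal point concrete, here is a minimal sketch of a RED-style decision with ECN marking. It is not taken from any router implementation. The function name red_action, its parameters (min_th, max_th, max_p), and the choice to drop every packet once the average queue reaches max_th are assumptions for illustration; real devices differ in these details.

```python
import random


def red_action(avg_queue, min_th, max_th, max_p, ect, rng=random.random):
    """Return 'forward', 'mark', or 'drop' for one arriving packet.

    avg_queue: averaged queue depth (same unit as the thresholds)
    min_th, max_th: RED thresholds; both lie below the physical queue limit
    max_p: marking/dropping probability reached just below max_th
    ect: True if the packet carries an ECN-Capable Transport codepoint
    """
    if avg_queue < min_th:
        return "forward"
    if avg_queue >= max_th:
        # Assumption: hard drop above max_th regardless of ECT.
        return "drop"
    # Probability rises linearly between the two thresholds.
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    if rng() < p:
        # Where a non-ECN packet would be dropped, an ECT packet is marked CE.
        return "mark" if ect else "drop"
    return "forward"


if __name__ == "__main__":
    # Queue is only partly full, yet ECN-capable packets can already be marked.
    actions = [red_action(20, 10, 30, 0.1, ect=True) for _ in range(1000)]
    print(actions.count("mark"), "of 1000 packets marked CE")
```

The point of the sketch is that marking begins at min_th, well before the queue is full, so an ECN-capable sender is told to slow down while buffer space remains.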

This leads to a question: do science networks see full queues during production operations? On networks like ESnet this is not typical. A queue may fill only during micro-congestion events, and these do not keep the queue full for any significant length of time (fractions of a second). The same is most likely true for any network with a traffic profile dominated by high-performance long-distance bulk data transfers, provided there is sufficient bandwidth available to avoid congestion due to link saturation.

Thus: for a well-provisioned science network, ECN reduces the performance for long-distance bulk data transfers while providing little benefit for latency-sensitive applications.
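As a rough illustration of why a single window reduction matters on long paths, the following sketch estimates how long a flow needs to regain its full window after halving it. It assumes classic Reno-style additive increase of one segment per round trip, which is a simplification: modern congestion control algorithms such as CUBIC recover faster. The function name, the 1460-byte segment size, and the 10 Gbps / 100 ms example path are assumptions chosen for the example.

```python
def reno_recovery_seconds(rate_bps, rtt_s, mss_bytes=1460):
    """Estimate the time to regrow a halved window to full size.

    Assumes the full window equals the bandwidth-delay product and that
    the window grows by one segment per round-trip time (Reno-style).
    """
    window_segments = rate_bps * rtt_s / (8 * mss_bytes)
    rtts_to_recover = window_segments / 2  # half the window was lost
    return rtts_to_recover * rtt_s


if __name__ == "__main__":
    seconds = reno_recovery_seconds(rate_bps=10e9, rtt_s=0.100)
    print(f"{seconds:.0f} s, about {seconds / 60:.0f} minutes")
```

Under these assumptions the 10 Gbps, 100 ms path needs on the order of 4,300 seconds (about 71 minutes) to return to full rate, while the same flow at 1 ms RTT needs well under a second. This is why a congestion signal that arrives earlier than necessary is costly for long-distance transfers in particular.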