Identifying Performance Problems

Identifying Problems In Regular Testing

Poor performance is easy to spot.  Packet loss causes TCP to behave erratically on short-RTT tests, and to perform extremely poorly as the RTT increases.  The following graph shows two extremes:

The first half of the graph shows high, yet erratic, performance.  This can be explained by:

  • Short RTT testing (<10ms) between sites
  • Congestion, due to placement of the server within the core of a network exchange point
  • Hosts with different hardware and software configurations

While not smooth, performance in the first half met user expectations for the environment.  The second half of the graph shows the impact of packet loss on TCP testing.  Loss was observed to be as high as 7% at times, due to a problem within one of the domains that separated the testers.  This packet loss is best visualized with OWAMP data:

As a final takeaway, please consider the following things when reviewing performance data:

  • When reviewing regular performance graphs, consider all factors of the network (connectivity, capacity, congestion, buffering, latency between ends), host (hardware age, operating system, applied tunings), and testing environment (duration of test, settings of test).
  • Expectations of seeing 90% of available capacity are often unrealistic for regular tests that are not destructive to the network.  To be a good citizen, run short tests (< 30 seconds) that use TCP autotuning and a single stream; these will tell the operator and end user much about what the network is capable of.
  • It is possible to create a 90% capacity test, but it will be destructive to the network: it requires overtuning the host, adding streams, setting a default window size, and running longer tests.
  • Use throughput data in conjunction with latency data to see the full picture of potential performance problems.
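
The relationship between loss, latency, and throughput described in this section can be roughed out numerically.  The sketch below uses the widely quoted Mathis approximation for steady-state TCP throughput, rate <= C * MSS / (RTT * sqrt(p)).  This model is not part of the measurements above; it is offered only as an illustration of why the same loss rate hurts more as RTT grows.  The function name, the 1460-byte MSS, and the constant C = 1.0 are assumptions chosen for the example, and real results depend on the congestion control algorithm and host tuning.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.0):
    """Rough upper bound on steady-state TCP throughput, in bits per second.

    Mathis approximation: rate <= c * MSS / (RTT * sqrt(p)).
    c=1.0 is an assumed constant for illustration; published values vary
    with the loss model and acknowledgment strategy.
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return c * (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))


if __name__ == "__main__":
    # Assumed values: 1460-byte MSS and the 7% peak loss cited in the text.
    mss = 1460
    loss = 0.07
    for rtt_ms in (1, 10, 50, 100):
        bps = mathis_throughput_bps(mss, rtt_ms / 1000.0, loss)
        print(f"RTT {rtt_ms:>3} ms at {loss:.0%} loss: ~{bps / 1e6:.2f} Mbit/s")
```

Because the bound scales with 1/RTT, a tenfold increase in latency at a fixed loss rate cuts the estimate tenfold.  This is one reason to read throughput results alongside latency and loss data rather than in isolation.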