Our work on TCP performance measurement and optimization has been accepted for publication at ACM CoNEXT 2015.


TCP is an important factor affecting the user-perceived performance of Internet applications. Diagnosing the causes of TCP performance issues in the wild is essential to better understand TCP's current shortcomings. We present a TCP flow performance analysis framework focused on identifying TCP stalls, and implement a related tool, which we make publicly available to the research community. We use our tool to analyze packet-level traces of three different services (cloud storage, software download, and web search) belonging to a popular Chinese service provider. We find that as many as 20% of the flows are stalled for half of their lifetime. Network-related causes, especially timeout retransmission, dominate the stalls. A breakdown of the causes of timeout retransmission stalls reveals that double retransmission and tail retransmission are among the top contributors. However, the importance of these causes depends on the specific application. We then propose S-RTO, a mechanism to mitigate timeout retransmission stalls. S-RTO has been deployed on production front-end servers and shown to be effective at improving TCP performance, especially for short flows.


Jianer Zhou*, Qinghua Wu*, Zhenyu Li*, Steve Uhlig, Peter Steenkiste, Jian Chen, Gaogang Xie, Demystifying and Mitigating TCP Stalls at the Server Side, ACM CoNEXT, 2015 (*The first three authors contributed equally to this work.)