  1. 04 Dec, 2008 1 commit
    • tcp: make urg+gso work for real this time · f8269a49
      Ilpo Järvinen authored
      I should have noticed this earlier... :-) The previous solution
      to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zag
      patterns, or even worse, a steep downward slope into packet
      counting because each skb pcount would be truncated to a pcount
      of 2 and then the following fragments of the later portion would
      restore the window again.
      
      Basically this reverts "tcp: Do not use TSO/GSO when there is
      urgent data" (33cf71ce). It also removes some unnecessary code
      from tcp_current_mss that didn't work as intended either (could
      be that something was changed down the road, or it might have
      been broken since the dawn of time) because it only works once
      urg is already written, while this bug shows up starting from
      ~64k before the urg point.
      
      The retransmissions already are split to mss sized chunks, so
      only new data sending paths need splitting in case they have
      a segment otherwise suitable for gso/tso. The actual check
      can be improved to be more narrow but since this is late -rc
      already, I'll postpone thinking about the more fine-grained things.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
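      The split of new data sends described above can be modeled in a few
      lines. This is a minimal sketch with illustrative names, not the
      kernel's code: while urgent data is outstanding, a new skb is held
      to a single mss-sized segment instead of a full gso/tso burst.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Illustrative model: urgent data is outstanding while the urgent
       * pointer sits ahead of the unacked edge. */
      static bool urg_outstanding(uint32_t snd_una, uint32_t snd_up)
      {
          return snd_una != snd_up;
      }

      /* Segments a new skb may carry: 1 while urgent data is pending
       * (forcing mss-sized sends), otherwise the full tso budget. */
      static unsigned int tso_segs_allowed(unsigned int tso_budget,
                                           uint32_t snd_una, uint32_t snd_up)
      {
          return urg_outstanding(snd_una, snd_up) ? 1 : tso_budget;
      }
      ```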
  2. 22 Nov, 2008 1 commit
    • tcp: Do not use TSO/GSO when there is urgent data · 33cf71ce
      Petr Tesarik authored
      This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014
      
      Since most (if not all) implementations of TSO and even the in-kernel
      software GSO do not update the urgent pointer when splitting a large
      segment, it is necessary to turn off TSO/GSO for all outgoing traffic
      with the URG pointer set.
      
      Looking at tcp_current_mss (and the preceding comment) I even think
      this was the original intention. However, this approach is insufficient,
      because TSO/GSO is turned off only for newly created frames, not for
      frames which were already pending at the arrival of a message with
      MSG_OOB set. These frames were created when TSO/GSO was enabled,
      so they may be large, and they will have the urgent pointer set
      in tcp_transmit_skb().
      
      With this patch, such large packets will be fragmented again before
      going to the transmit routine.
      
      As a side note, at least the following NICs are known to screw up
      the urgent pointer in the TCP header when doing TSO:
      
      	Intel 82566MM (PCI ID 8086:1049)
      	Intel 82566DC (PCI ID 8086:104b)
      	Intel 82541GI (PCI ID 8086:1076)
      	Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)
      Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 27 Oct, 2008 1 commit
  4. 23 Oct, 2008 1 commit
    • tcp: Restore ordering of TCP options for the sake of inter-operability · fd6149d3
      Ilpo Järvinen authored
      This is not our bug! Sadly some devices cannot cope with the change
      of TCP option ordering which was a result of the recent rewrite of
      the option code (not that there was some particular reason stemming
      from the rewrite for the reordering) though any ordering of TCP
      options is perfectly legal. Thus we restore the original ordering
      to allow interoperability with/through such broken devices and add
      a warning about this trap. Since the reordering just happened
      without any particular reason, this change shouldn't cost us
      anything.
      
      There are already a couple of known failure reports (within close
      proximity of the last release), so the problem might be more
      widespread than a single device, and other reports may be due to
      the same problem though the symptoms were less obvious.
      Analysis of one of the cases revealed (with very high probability)
      that sack capability cannot be negotiated as the first option
      (the SYN never got a response).
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Reported-by: Aldo Maggi <sentiniate@tiscali.it>
      Tested-by: Aldo Maggi <sentiniate@tiscali.it>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 21 Oct, 2008 1 commit
    • tcp: should use number of sack blocks instead of -1 · 75e3d8db
      Ilpo Järvinen authored
      While looking for the recent "sack issue" I also read all the
      eff_sacks usage that was touched by the relevant commits. I found
      out that there's another thing asking for a fix (unrelated
      to the "sack issue" though).
      
      This feature has probably very little significance in practice.
      An opposite-direction timeout with bidirectional tcp seems to me
      the most likely scenario, though there might be other cases as
      well related to non-data segments we send (e.g., a response to an
      opposite-direction segment). Also, some ACK losses or option space
      wasted for other purposes would be necessary to prevent the earlier
      SACK feedback from getting to the sender.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 07 Oct, 2008 1 commit
    • tcp: kill pointless urg_mode · 33f5f57e
      Ilpo Järvinen authored
      It all started from me noticing that this urgent check in
      tcp_clean_rtx_queue is unnecessarily inside the loop. Then
      I took a longer look at it and found out that the users of
      urg_mode can trivially do without it, well almost: there was
      one gotcha.
      
      Bonus: those funny people who use urg with >= 2^31 write_seq -
      snd_una could now rejoice too (that's the only purpose for the
      between() being there, otherwise a simple compare would have done
      the thing). Not that I assume that the rest of the tcp code
      happily lives with such mind-boggling numbers :-). Alas, it
      turned out to be impossible to set wmem to such numbers anyway;
      yes, I really tried a big sendfile after setting some wmem but
      nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf...
      So I hacked the variable to a long and found out that it seems
      to work... :-)
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
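      The replacement test hinted at above fits in one line. A sketch,
      with field names mirroring tcp_sock but otherwise illustrative:
      a plain u32 inequality is immune to sequence wraparound, which is
      why the between() dance was only ever needed for the >= 2^31 case.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Sketch: urgent mode holds exactly while the urgent pointer
       * (snd_up) differs from the unacked edge (snd_una), so no stored
       * flag and no between() check is needed. */
      static bool tcp_urg_mode_model(uint32_t snd_una, uint32_t snd_up)
      {
          return snd_una != snd_up;
      }
      ```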
  7. 01 Oct, 2008 1 commit
    • tcp: Port redirection support for TCP · a3116ac5
      KOVACS Krisztian authored
      Current TCP code relies on the local port of the listening socket
      being the same as the destination port of the incoming
      connection. Port redirection used by many transparent proxying
      techniques obviously breaks this, so we have to store the original
      destination port.
      
      This patch extends struct inet_request_sock and stores the incoming
      destination port value there. It also modifies the handshake code to
      use that value as the source port when sending reply packets.
      Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 23 Sep, 2008 1 commit
  9. 21 Sep, 2008 11 commits
  10. 27 Aug, 2008 1 commit
  11. 22 Jul, 2008 1 commit
  12. 19 Jul, 2008 2 commits
  13. 17 Jul, 2008 3 commits
  14. 03 Jul, 2008 1 commit
    • tcp: de-bloat a bit with factoring NET_INC_STATS_BH out · 40b215e5
      Pavel Emelyanov authored
      There are some places in TCP that select one MIB index to
      bump snmp statistics like this:
      
      	if (<something>)
      		NET_INC_STATS_BH(<some_id>);
      	else if (<something_else>)
      		NET_INC_STATS_BH(<some_other_id>);
      	...
      	else
      		NET_INC_STATS_BH(<default_id>);
      
      or in a more tricky but still similar way.
      
      On the other hand, this NET_INC_STATS_BH is a camouflaged
      increment of a percpu variable, which is not that small.
      
      Factoring those cases out de-bloats 235 bytes on non-preemptible
      i386 config and drives parts of the code into 80 columns.
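      The factored form is sketched below; the enum values, counter
      array, and branch conditions are placeholders invented for the
      illustration:

      ```c
      #include <assert.h>

      /* Sketch of the de-bloating: select the MIB index in the
       * branches, then expand the (not so small) percpu increment
       * exactly once at the end. */
      enum { MIB_DEFAULT, MIB_SOME, MIB_OTHER, MIB_MAX };
      static unsigned long mib_stats[MIB_MAX];

      static void bump_stat(int something, int something_else)
      {
          int mib_idx;

          if (something)
              mib_idx = MIB_SOME;
          else if (something_else)
              mib_idx = MIB_OTHER;
          else
              mib_idx = MIB_DEFAULT;

          mib_stats[mib_idx]++;   /* single expansion of NET_INC_STATS_BH */
      }
      ```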
      
      add/remove: 0/0 grow/shrink: 0/7 up/down: 0/-235 (-235)
      function                                     old     new   delta
      tcp_fastretrans_alert                       1437    1424     -13
      tcp_dsack_set                                137     124     -13
      tcp_xmit_retransmit_queue                    690     676     -14
      tcp_try_undo_recovery                        283     265     -18
      tcp_sacktag_write_queue                     1550    1515     -35
      tcp_update_reordering                        162     106     -56
      tcp_retransmit_timer                         990     904     -86
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 12 Jun, 2008 1 commit
  16. 11 Jun, 2008 1 commit
  17. 04 Jun, 2008 1 commit
  18. 21 May, 2008 1 commit
    • tcp: TCP connection times out if ICMP frag needed is delayed · 7d227cd2
      Sridhar Samudrala authored
      We are seeing an issue with TCP in handling an ICMP frag needed
      message that is received after net.ipv4.tcp_retries1 retransmits.
      The default value of retries1 is 3. So if the path mtu changes
      and the ICMP frag needed message is lost for the first 3
      retransmits, or if it gets delayed until 3 retransmits are done,
      TCP doesn't update the MSS correctly and continues to retransmit
      the original message until it times out after tcp_retries2
      retransmits.
      
      I am seeing this issue even with the latest 2.6.25.4 kernel.
      
      In tcp_retransmit_timer(), when the retransmit counter exceeds
      the tcp_retries1 value, the dst cache entry of the socket is reset.
      If we receive an ICMP frag needed message at this time, the
      dst entry gets updated with the new MTU, but the TCP socket's
      dst_cache entry remains NULL.
      
      So the next time we try to retransmit after the ICMP frag
      needed is received, tcp_retransmit_skb() gets called. Here the
      cur_mss value is calculated at the start of the routine with
      a NULL sk_dst_cache. Instead we should call tcp_current_mss after
      the rebuild_header that caches the dst entry with the updated mtu.
      Also, rebuild_header should be called before tcp_fragment
      so that the skb is fragmented if the mss goes down.
      Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
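      A toy model of why the call order matters. Every name here is
      invented for the sketch, standing in for sk_dst_cache, the
      rebuild_header step, and tcp_current_mss():

      ```c
      #include <assert.h>
      #include <stddef.h>

      struct toy_sock {
          const int *dst_mtu;   /* NULL models the flushed dst cache */
          int        cached_mss;
      };

      /* stands in for rebuild_header(): re-caches the (updated) route */
      static void toy_rebuild_header(struct toy_sock *sk, const int *route_mtu)
      {
          sk->dst_mtu = route_mtu;
      }

      /* stands in for tcp_current_mss(): needs a dst to see the new MTU */
      static int toy_current_mss(const struct toy_sock *sk)
      {
          if (sk->dst_mtu)
              return *sk->dst_mtu - 40;  /* minus IP + TCP headers */
          return sk->cached_mss;         /* stale value otherwise */
      }
      ```

      Computing the mss before rebuilding the header reads the stale
      cached value; doing it after picks up the MTU lowered by the ICMP
      frag needed message.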
  19. 16 Apr, 2008 1 commit
  20. 10 Apr, 2008 1 commit
    • [Syncookies]: Add support for TCP options via timestamps. · 4dfc2817
      Florian Westphal authored
      Allow the use of SACK and window scaling when syncookies are used
      and the client supports tcp timestamps. Options are encoded into
      the timestamp sent in the syn-ack and restored from the timestamp
      echo when the ack is received.
      
      Based on earlier work by Glenn Griffin.
      This patch avoids increasing the size of structs by encoding TCP
      options into the least significant bits of the timestamp and
      by not using any 'timestamp offset'.
      
      The downside is that the timestamp sent in the packet after the synack
      will increase by several seconds.
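      The encoding can be sketched as below. The exact bit layout (4
      bits of window scale plus a SACK bit in the low 5 bits) is an
      assumption for illustration, not the kernel's actual format:

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define TS_OPT_BITS   5u
      #define TS_WSCALE(ts) ((ts) & 0xfu)       /* low 4 bits: window scale */
      #define TS_SACK(ts)   (((ts) >> 4) & 1u)  /* next bit: SACK permitted */

      /* Round the timestamp clock down so the low bits are free, then
       * store the peer's options there; the final ACK echoes the value
       * back, where the same macros decode it. */
      static uint32_t ts_encode(uint32_t now, unsigned int wscale,
                                unsigned int sack_ok)
      {
          uint32_t opts = ((sack_ok & 1u) << 4) | (wscale & 0xfu);

          return (now & ~((1u << TS_OPT_BITS) - 1u)) | opts;
      }
      ```

      Only the low bits are perturbed, which is why the timestamp sent
      after the synack can jump by a few seconds but no struct grows.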
      
      changes since v1:
       don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c
       and have ipv6/syncookies.c use it.
       Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if ()
      Reviewed-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 08 Apr, 2008 1 commit
    • [TCP]: tcp_simple_retransmit can cause S+L · 882bebaa
      Ilpo Järvinen authored
      This fixes Bugzilla #10384
      
      tcp_simple_retransmit does L increment without any checking
      whatsoever for overflowing S+L when Reno is in use.
      
      The simplest scenario I can currently think of is rather
      complex in practice (there might be some more straightforward
      cases though). I.e., if the mss is reduced during mtu probing, it
      may end up marking everything lost, and if some duplicate ACKs
      arrived prior to that, sacked_out will be non-zero as well,
      leading to S+L > packets_out; tcp_clean_rtx_queue on the next
      cumulative ACK or tcp_fastretrans_alert on the next duplicate
      ACK will fix the S counter.
      
      A more straightforward (but questionable) solution would be to
      just call tcp_reset_reno_sack() in tcp_simple_retransmit, but
      it would negatively impact the probe's retransmission, i.e.,
      the retransmissions would not occur if some duplicate ACKs
      had arrived.
      
      So I had to add reno sacked_out resetting to the CA_Loss state
      when the first cumulative ACK arrives (this stale sacked_out
      might actually be the explanation for the reports of left_out
      overflows in kernels prior to 2.6.23 and the S+L overflow reports
      of 2.6.24). However, this alone won't be enough to fix kernels
      before 2.6.24 because it is building on top of the commit
      1b6d427b ([TCP]: Reduce sacked_out with reno when purging
      write_queue) to keep the sacked_out from overflowing.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 21 Mar, 2008 1 commit
    • [NET]: Add per-connection option to set max TSO frame size · 82cc1a7a
      Peter P Waskiewicz Jr authored
      Update: My mailer ate one of Jarek's feedback mails...  Fixed the
      parameter in netif_set_gso_max_size() to be u32, not u16.  Fixed the
      whitespace issue due to a patch import botch.  Changed the types from
      u32 to unsigned int to be more consistent with other variables in the
      area.  Also brought the patch up to the latest net-2.6.26 tree.
      
      Update: Made gso_max_size container 32 bits, not 16.  Moved the
      location of gso_max_size within netdev to be less hotpath.  Made more
      consistent names between the sock and netdev layers, and added a
      define for the max GSO size.
      
      Update: Respun for net-2.6.26 tree.
      
      Update: changed max_gso_frame_size and sk_gso_max_size from signed to
      unsigned - thanks Stephen!
      
      This patch adds the ability for device drivers to control the size of
      the TSO frames being sent to them, per TCP connection.  By setting the
      netdevice's gso_max_size value, the socket layer will set the GSO
      frame size based on that value.  This will propagate into the TCP
      layer, and send TSO's of that size to the hardware.
      
      This can be desirable to help tune the bursty nature of TSO on a
      per-adapter basis, where one may have 1 GbE and 10 GbE devices
      coexisting in a system, one running multiqueue and the other not, etc.
      
      This can also be desirable for devices that cannot support full 64 KB
      TSO's, but still want to benefit from some level of segmentation
      offloading.
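      The propagation step can be modeled minimally as below. The 64 KB
      default cap and the min() against the device limit follow the
      patch description; the function name is invented for the sketch:

      ```c
      #include <assert.h>

      #define GSO_MAX_SIZE 65536u  /* default cap when the driver sets no limit */

      /* Sketch: at connection setup the socket caches the device's
       * gso_max_size, and later TSO frames are built no larger than this. */
      static unsigned int sk_gso_max(unsigned int dev_gso_max_size)
      {
          unsigned int size = GSO_MAX_SIZE;

          if (dev_gso_max_size && dev_gso_max_size < size)
              size = dev_gso_max_size;
          return size;
      }
      ```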
      Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 20 Mar, 2008 1 commit
    • [TCP]: Fix shrinking windows with window scaling · 607bfbf2
      Patrick McHardy authored
      When selecting a new window, tcp_select_window() tries not to shrink
      the offered window by using the maximum of the remaining offered window
      size and the newly calculated window size. The newly calculated window
      size is always a multiple of the window scaling factor, the remaining
      window size however might not be since it depends on rcv_wup/rcv_nxt.
      This means we're effectively shrinking the window when scaling it down.
      
      
      The dump below shows the problem (scaling factor 2^7):
      
      - Window size of 557 (71296) is advertised, up to 3111907257:
      
      IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>
      
      - New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
        below the last end:
      
      IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>
      
      The number 40 results from downscaling the remaining window:
      
      3111907257 - 3111841425 = 65832
      65832 / 2^7 = 514
      65832 % 2^7 = 40
      
      If the sender uses up the entire window before it is shrunk, this can have
      chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
      will notice that the window has been shrunk since tcp_wnd_end() is before
      tp->snd_nxt, which makes it choose tcp_wnd_end() as the sequence number.
      This will fail the receiver's checks in tcp_sequence(), however, since it
      is before its tp->rcv_wup, making it respond with a dupack.
      
      If both sides are in this condition, this leads to a constant flood of
      ACKs until the connection times out.
      
      Make sure the window is never shrunk by aligning the remaining window to
      the window scaling factor.
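      In code, the fix amounts to rounding up before the shift. A
      sketch where ALIGN_UP mimics the kernel's ALIGN macro and the
      numbers are the ones from the dump above:

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Round x up to a multiple of the power-of-two a (kernel ALIGN). */
      #define ALIGN_UP(x, a) (((x) + (a) - 1u) & ~((uint32_t)(a) - 1u))

      /* Advertised window field: align the remaining window up to the
       * scale factor first, so (win << wscale) never ends before the
       * previously offered window. */
      static uint32_t scaled_window(uint32_t cur_win, unsigned int wscale)
      {
          return ALIGN_UP(cur_win, 1u << wscale) >> wscale;
      }
      ```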
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 12 Mar, 2008 1 commit
  25. 04 Mar, 2008 1 commit
  26. 01 Feb, 2008 1 commit
  27. 28 Jan, 2008 1 commit
    • [TCP]: Perform setting of common control fields in one place · e870a8ef
      Ilpo Järvinen authored
      In case of segments which are purely for control without any
      data (SYN/ACK/FIN/RST), many fields are set to common values
      in multiple places.
      
      i386 results:
      
      $ gcc --version
      gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13)
      
      $ codiff tcp_output.o.old tcp_output.o.new
      net/ipv4/tcp_output.c:
        tcp_xmit_probe_skb    |  -48
        tcp_send_ack          |  -56
        tcp_retransmit_skb    |  -79
        tcp_connect           |  -43
        tcp_send_active_reset |  -35
        tcp_make_synack       |  -42
        tcp_send_fin          |  -48
       7 functions changed, 351 bytes removed
      
      net/ipv4/tcp_output.c:
        tcp_init_nondata_skb |  +90
       1 function changed, 90 bytes added
      
      tcp_output.o.mid:
       8 functions changed, 90 bytes added, 351 bytes removed, diff: -261
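      A sketch of what the consolidated helper does. The struct stands
      in for the kernel's skb control block; the FIN/SYN flag values
      follow the TCPCB_FLAG_* convention, but the rest is illustrative:

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define FLAG_FIN 0x01u
      #define FLAG_SYN 0x02u
      #define FLAG_ACK 0x10u

      struct ctl_skb {
          uint32_t seq;
          uint32_t end_seq;
          uint32_t csum;
          uint8_t  flags;
      };

      /* One place to fill the fields every data-less control segment
       * (SYN/ACK/FIN/RST) shares. */
      static void init_nondata_skb(struct ctl_skb *skb, uint32_t seq,
                                   uint8_t flags)
      {
          skb->csum  = 0;
          skb->flags = flags;
          skb->seq   = seq;
          /* SYN and FIN each consume one sequence number; a pure ACK does not */
          if (flags & (FLAG_SYN | FLAG_FIN))
              seq++;
          skb->end_seq = seq;
      }
      ```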
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>