1. 16 Mar, 2009 10 commits
    • Herbert Xu's avatar
      GRO: Move netpoll checks to correct location · d1c76af9
      Herbert Xu authored
      As my netpoll fix for net doesn't really work for net-next, we
      need this update to move the checks into the right place.  As it
      stands we may pass freed skbs to netpoll_receive_skb.
      
      This patch also introduces a netpoll_rx_on function to avoid GRO
      completely if we're invoked through netpoll.  This might seem
      paranoid but as netpoll may have an external receive hook it's
      better to be safe than sorry.  I don't think we need this for
      2.6.29 though since there's nothing immediately broken by it.
      
      This patch also moves the GRO_* return values to netdevice.h since
      VLAN needs them too (I tried to avoid this originally but alas
      this seems to be the easiest way out).  This fixes a bug in VLAN
      where it continued to use the old return value 2 instead of the
      correct GRO_DROP.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d1c76af9
    • Ilpo Järvinen's avatar
      tcp: make sure xmit goal size never becomes zero · afece1c6
      Ilpo Järvinen authored
      It's not too likely to happen, would basically require crafted
      packets (must hit the max guard in tcp_bound_to_half_wnd()).
      It seems that nothing that bad would happen as there's tcp_mems
      and congestion window that prevent runaway at some point from
      hurting all too much (I'm not that sure what all those zero
      sized segments we would generate do though in write queue).
      Preventing it regardless is certainly the best way to go.
      Signed-off-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      afece1c6
    • Ilpo Järvinen's avatar
      tcp: cache result of earlier divides when mss-aligning things · 2a3a041c
      Ilpo Järvinen authored
      The results is very unlikely change every so often so we
      hardly need to divide again after doing that once for a
      connection. Yet, if divide still becomes necessary we
      detect that and do the right thing and again settle for
      non-divide state. Takes the u16 space which was previously
      taken by the plain xmit_size_goal.
      
      This should take care part of the tso vs non-tso difference
      we found earlier.
      Signed-off-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2a3a041c
    • Ilpo Järvinen's avatar
      tcp: simplify tcp_current_mss · 0c54b85f
      Ilpo Järvinen authored
      There's very little need for most of the callsites to get
      tp->xmit_goal_size updated. That will cost us divide as is,
      so slice the function in two. Also, the only users of the
      tp->xmit_goal_size are directly behind tcp_current_mss(),
      so there's no need to store that variable into tcp_sock
      at all! The drop of xmit_goal_size currently leaves 16-bit
      hole and some reorganization would again be necessary to
      change that (but I'm aiming to fill that hole with u16
      xmit_goal_size_segs to cache the results of the remaining
      divide to get that tso on regression).
      
      Bring xmit_goal_size parts into tcp.c
      Signed-off-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0c54b85f
    • Ilpo Järvinen's avatar
      tcp: don't check mtu probe completion in the loop · 72211e90
      Ilpo Järvinen authored
      It seems that no variables clash such that we couldn't do
      the check just once later on. Therefore move it.
      
      Also kill dead obvious comment, dead argument and add
      unlikely since this mtu probe does not happen too often.
      Signed-off-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      72211e90
    • Ilpo Järvinen's avatar
      tcp: consolidate paws check · c887e6d2
      Ilpo Järvinen authored
      Wow, it was quite tricky to merge that stream of negations
      but I think I finally got it right:
      
      check & replace_ts_recent:
      (s32)(rcv_tsval - ts_recent) >= 0                  => 0
      (s32)(ts_recent - rcv_tsval) <= 0                  => 0
      
      discard:
      (s32)(ts_recent - rcv_tsval)  > TCP_PAWS_WINDOW    => 1
      (s32)(ts_recent - rcv_tsval) <= TCP_PAWS_WINDOW    => 0
      
      I toggled the return values of tcp_paws_check around since
      the old encoding added yet-another negation making tracking
      of truth-values really complicated.
      Signed-off-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c887e6d2
    • Ilpo Järvinen's avatar
      tcp: kill dead end_seq variable in clean_rtx_queue · c43d558a
      Ilpo Järvinen authored
      I've already forgotten what for this was necessary, anyway
      it's no longer used (if it ever was).
      Signed-off-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c43d558a
    • Ilpo Järvinen's avatar
      tcp: remove pointless .dsack/.num_sacks code · 5861f8e5
      Ilpo Järvinen authored
      In the pure assignment case, the earlier zeroing is
      still in effect.
      
      David S. Miller raised concerns if the ifs are there to avoid
      dirtying cachelines. I came to these conclusions:
      
      > We'll be dirty it anyway (now that I check), the first "real" statement
      > in tcp_rcv_established is:
      >
      >       tp->rx_opt.saw_tstamp = 0;
      >
      > ...that'll land on the same dword. :-/
      >
      > I suppose the blocks are there just because they had more complexity
      > inside when they had to calculate the eff_sacks too (maybe it would
      > have been better to just remove them in that drop-patch so you would
      > have had less head-ache :-)).
      Signed-off-by: default avatarIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5861f8e5
    • Jarek Poplawski's avatar
      pkt_sched: Change misleading code in class delete. · 7cd0a638
      Jarek Poplawski authored
      While looking for a possible reason of bugzilla report on HTB oops:
      http://bugzilla.kernel.org/show_bug.cgi?id=12858
      I found the code in htb_delete calling htb_destroy_class on zero
      refcount is very misleading: it can suggest this is a common path, and
      destroy is called under sch_tree_lock. Actually, this can never happen
      like this because before deletion cops->get() is done, and after
      delete a class is still used by tclass_notify. The class destroy is
      always called from cops->put(), so without sch_tree_lock.
      
      This doesn't mean much now (since 2.6.27) because all vulnerable calls
      were moved from htb_destroy_class to htb_delete, but there was a bug
      in older kernels. The same change is done for other classful scheds,
      which, it seems, didn't have similar locking problems here.
      Reported-by: default avatarm0sia <m0sia@m0sia.ru>
      Signed-off-by: default avatarJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7cd0a638
    • Eric Dumazet's avatar
      net: reorder fields of struct socket · 8bdd663a
      Eric Dumazet authored
      On x86_64, its rather unfortunate that "wait_queue_head_t wait"
      field of "struct socket" spans two cache lines (assuming a 64
      bytes cache line in current cpus)
      
      offsetof(struct socket, wait)=0x30
      sizeof(wait_queue_head_t)=0x18
      
      This might explain why Kenny Chang noticed that his multicast workload
      was performing bad with 64 bit kernels, since more cache lines ping pongs
      were involved.
      
      This litle patch moves "wait" field next "fasync_list" so that both
      fields share a single cache line, to speedup sock_def_readable()
      Signed-off-by: default avatarEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8bdd663a
  2. 14 Mar, 2009 25 commits
  3. 13 Mar, 2009 5 commits
    • Gabriele Paoloni's avatar
      ppp: ppp_mp_explode() redesign · 9c705260
      Gabriele Paoloni authored
      I found the PPP subsystem to not work properly when connecting channels
      with different speeds to the same bundle.
      
      Problem Description:
      
      As the "ppp_mp_explode" function fragments the sk_buff buffer evenly
      among the PPP channels that are connected to a certain PPP unit to
      make up a bundle, if we are transmitting using an upper layer protocol
      that requires an Ack before sending the next packet (like TCP/IP for
      example), we will have a bandwidth bottleneck on the slowest channel
      of the bundle.
      
      Let's clarify by an example. Let's consider a scenario where we have
      two PPP links making up a bundle: a slow link (10KB/sec) and a fast
      link (1000KB/sec) working at the best (full bandwidth). On the top we
      have a TCP/IP stack sending a 1000 Bytes sk_buff buffer down to the
      PPP subsystem. The "ppp_mp_explode" function will divide the buffer in
      two fragments of 500B each (we are neglecting all the headers, crc,
      flags etc?.). Before the TCP/IP stack sends out the next buffer, it
      will have to wait for the ACK response from the remote peer, so it
      will have to wait for both fragments to have been sent over the two
      PPP links, received by the remote peer and reconstructed. The
      resulting behaviour is that, rather than having a bundle working
      @1010KB/sec (the sum of the channels bandwidths), we'll have a bundle
      working @20KB/sec (the double of the slowest channels bandwidth).
      
      
      Problem Solution:
      
      The problem has been solved by redesigning the "ppp_mp_explode"
      function in such a way to make it split the sk_buff buffer according
      to the speeds of the underlying PPP channels (the speeds of the serial
      interfaces respectively attached to the PPP channels). Referring to
      the above example, the redesigned "ppp_mp_explode" function will now
      divide the 1000 Bytes buffer into two fragments whose sizes are set
      according to the speeds of the channels where they are going to be
      sent on (e.g .  10 Byets on 10KB/sec channel and 990 Bytes on
      1000KB/sec channel).  The reworked function grants the same
      performances of the original one in optimal working conditions (i.e. a
      bundle made up of PPP links all working at the same speed), while
      greatly improving performances on the bundles made up of channels
      working at different speeds.
      Signed-off-by: default avatarGabriele Paoloni <gabriele.paoloni@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9c705260
    • Roel Kluin's avatar
      tcp: '< 0' test on unsigned · a2025b8b
      Roel Kluin authored
      promote 'cnt' to size_t, to match 'len'.
      Signed-off-by: default avatarRoel Kluin <roel.kluin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a2025b8b
    • Roel Kluin's avatar
      x25: '< 0' and '>= 0' test on unsigned · 8db09f26
      Roel Kluin authored
      skb->len is an unsigned int, so the test in x25_rx_call_request() always
      evaluates to true.
      
      len in x25_sendmsg() is unsigned as well. so -ERRORS returned by x25_output()
      are not noticed.
      Signed-off-by: default avatarRoel Kluin <roel.kluin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8db09f26
    • Denys Fedoryshchenko's avatar
      ipv4: arp announce, arp_proxy and windows ip conflict verification · 73ce7b01
      Denys Fedoryshchenko authored
      Windows (XP at least) hosts on boot, with configured static ip, performing 
      address conflict detection, which is defined in RFC3927.
      Here is quote of important information:
      
      "
      An ARP announcement is identical to the ARP Probe described above, 
      except    that now the sender and target IP addresses are both set 
      to the host's newly selected IPv4 address. 
      "
      
      But it same time this goes wrong with RFC5227.
      "
      The 'sender IP address' field MUST be set to all zeroes; this is to avoid
      polluting ARP caches in other hosts on the same link in the case
      where the address turns out to be already in use by another host.
      "
      
      When ARP proxy configured, it must not answer to both cases, because 
      it is address conflict verification in any case. For Windows it is just 
      causing to detect false "ip conflict". Already there is code for RFC5227, so 
      just trivially we just check also if source ip == target ip.
      Signed-off-by: default avatarDenys Fedoryshchenko <denys@visp.net.lb>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      73ce7b01
    • Tomasz Lemiech's avatar
      tulip: Fix for MTU problems with 802.1q tagged frames · 1f8ae0a2
      Tomasz Lemiech authored
      The original patch was submitted last year but wasn't discussed or applied
      because of missing maintainer's CCs. I only fixed some formatting errors,
      but as I saw tulip is very badly formatted and needs further work.
      
      Original description:
      This patch fixes MTU problem, which occurs when using 802.1q VLANs. We
      should allow receiving frames of up to 1518 bytes in length, instead of
      1514.
      
      Based on patch written by Ben McKeegan for 2.4.x kernels. It is archived
      at http://www.candelatech.com/~greear/vlan/howto.html#tulip
      I've adjusted a few things to make it apply on 2.6.x kernels.
      
      Tested on D-Link DFE-570TX quad-fastethernet card.
      Signed-off-by: default avatarTomasz Lemiech <szpajder@staszic.waw.pl>
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Signed-off-by: default avatarBen McKeegan <ben@netservers.co.uk>
      Acked-by: default avatarGrant Grundler <grundler@parisc-linux.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1f8ae0a2