1. 27 Aug, 2008 9 commits
  2. 25 Aug, 2008 12 commits
  3. 24 Aug, 2008 6 commits
    • Lennert Buytenhek's avatar
      c4560318
    • Lennert Buytenhek's avatar
      mv643xx_eth: enforce multiple-of-8-bytes receive buffer size restriction · abe78717
      Lennert Buytenhek authored
      The mv643xx_eth hardware ignores the lower three bits of the buffer
      size field in receive descriptors, causing the reception of full-sized
      packets to fail at some MTUs.  Fix this by rounding the size of
      allocated receive buffers up to a multiple of eight bytes.
      
      While we are at it, add a bit of extra space to each receive buffer so
      that we can handle multiple vlan tags on ingress.
      Signed-off-by: default avatarLennert Buytenhek <buytenh@marvell.com>
      abe78717
    • Lennert Buytenhek's avatar
      mv643xx_eth: fix NULL pointer dereference in rxq_process() · 9e1f3772
      Lennert Buytenhek authored
      When we are low on memory, the assumption that every descriptor in the
      receive ring will have an skbuff associated with it does not hold.
      
      rxq_process() was assuming that if the receive descriptor it is working
      on is not owned by the hardware, it can safely be processed and handed
      to the networking stack.  But a descriptor in the receive ring not being
      owned by the hardware can also happen when we are low on memory and did
      not manage to refill the receive ring fully.
      
      This patch changes rxq_process()'s bailout condition from "the first
      receive descriptor to be processed is owned by the hardware" to "the
      first receive descriptor to be processed is owned by the hardware OR
      the number of valid receive descriptors in the ring is zero".
      Signed-off-by: default avatarLennert Buytenhek <buytenh@marvell.com>
      9e1f3772
    • Lennert Buytenhek's avatar
      mv643xx_eth: fix inconsistent lock semantics · 8e0b1bf6
      Lennert Buytenhek authored
      Nicolas Pitre noted that mv643xx_eth_poll was incorrectly using
      non-IRQ-safe locks while checking whether to wake up the netdevice's
      transmit queue.  Convert the locking to *_irq() variants, since we
      are running from softirq context where interrupts are enabled.
      Signed-off-by: default avatarLennert Buytenhek <buytenh@marvell.com>
      8e0b1bf6
    • Lennert Buytenhek's avatar
      mv643xx_eth: fix double add_timer() on the receive oom timer · 92c70f27
      Lennert Buytenhek authored
      Commit 12e4ab79 ("mv643xx_eth: be
      more agressive about RX refill") changed the condition for the receive
      out-of-memory timer to be scheduled from "the receive ring is empty"
      to "the receive ring is not full".
      
      This can lead to a situation where the receive out-of-memory timer is
      pending because a previous rxq_refill() didn't manage to refill the
      receive ring entirely as a result of being out of memory, and
      rxq_refill() is then called again as a side effect of a packet receive
      interrupt, and that rxq_refill() call then again does not succeed to
      refill the entire receive ring with fresh empty skbuffs because we are
      still out of memory, and then tries to call add_timer() on the already
      scheduled out-of-memory timer.
      
      This patch fixes this issue by changing the add_timer() call in
      rxq_refill() to a mod_timer() call.  If the OOM timer was not already
      scheduled, this will behave as before, whereas if it was already
      scheduled, this patch will push back its firing time a bit, which is
      safe because we've (unsuccessfully) attempted to refill the receive
      ring just before we do this.
      Signed-off-by: default avatarLennert Buytenhek <buytenh@marvell.com>
      92c70f27
    • Lennert Buytenhek's avatar
      mv643xx_eth: fix NAPI 'rotting packet' issue · 819ddcaf
      Lennert Buytenhek authored
      When a receive interrupt occurs, mv643xx_eth would first process the
      receive descriptors and then ACK the receive interrupt, instead of the
      other way round.
      
      This would leave a small race window between processing the last
      receive descriptor and clearing the receive interrupt status in which
      a new packet could come in, which would then 'rot' in the receive
      ring until the next receive interrupt would come in.
      
      Fix this by ACKing (clearing) the receive interrupt condition before
      processing the receive descriptors.
      Signed-off-by: default avatarLennert Buytenhek <buytenh@marvell.com>
      819ddcaf
  4. 23 Aug, 2008 2 commits
    • Stephen Hemminger's avatar
      ipv6: protocol for address routes · f410a1fb
      Stephen Hemminger authored
      This fixes a problem spotted with zebra, but not sure if it is
      necessary a kernel problem.  With IPV6 when an address is added to an
      interface, Zebra creates a duplicate RIB entry, one as a connected
      route, and other as a kernel route.
      
      When an address is added to an interface the RTN_NEWADDR message
      causes Zebra to create a connected route. In IPV4 when an address is
      added to an interface a RTN_NEWROUTE message is set to user space with
      the protocol RTPROT_KERNEL. Zebra ignores these messages, because it
      already has the connected route.
      
      The problem is that route created in IPV6 has route protocol ==
      RTPROT_BOOT.  Was this a design decision or a bug? This fixes it. Same
      patch applies to both net-2.6 and stable.
      Signed-off-by: default avatarStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f410a1fb
    • Denis V. Lunev's avatar
      icmp: icmp_sk() should not use smp_processor_id() in preemptible code · fdc0bde9
      Denis V. Lunev authored
      Pass namespace into icmp_xmit_lock, obtain socket inside and return
      it as a result for caller.
      
      Thanks Alexey Dobryan for this report:
      
      Steps to reproduce:
      
      	CONFIG_PREEMPT=y
      	CONFIG_DEBUG_PREEMPT=y
      	tracepath <something>
      
      BUG: using smp_processor_id() in preemptible [00000000] code: tracepath/3205
      caller is icmp_sk+0x15/0x30
      Pid: 3205, comm: tracepath Not tainted 2.6.27-rc4 #1
      
      Call Trace:
       [<ffffffff8031af14>] debug_smp_processor_id+0xe4/0xf0
       [<ffffffff80409405>] icmp_sk+0x15/0x30
       [<ffffffff8040a17b>] icmp_send+0x4b/0x3f0
       [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160
       [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff8023a475>] ? local_bh_enable_ip+0x95/0x110
       [<ffffffff804285b9>] ? _spin_unlock_bh+0x39/0x40
       [<ffffffff8025a26c>] ? mark_held_locks+0x4c/0x90
       [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160
       [<ffffffff803e91b4>] ip_fragment+0x8d4/0x900
       [<ffffffff803e7030>] ? ip_finish_output2+0x0/0x290
       [<ffffffff803e91e0>] ? ip_finish_output+0x0/0x60
       [<ffffffff803e6650>] ? dst_output+0x0/0x10
       [<ffffffff803e922c>] ip_finish_output+0x4c/0x60
       [<ffffffff803e92e3>] ip_output+0xa3/0xf0
       [<ffffffff803e68d0>] ip_local_out+0x20/0x30
       [<ffffffff803e753f>] ip_push_pending_frames+0x27f/0x400
       [<ffffffff80406313>] udp_push_pending_frames+0x233/0x3d0
       [<ffffffff804067d1>] udp_sendmsg+0x321/0x6f0
       [<ffffffff8040d155>] inet_sendmsg+0x45/0x80
       [<ffffffff803b967f>] sock_sendmsg+0xdf/0x110
       [<ffffffff8024a100>] ? autoremove_wake_function+0x0/0x40
       [<ffffffff80257ce5>] ? validate_chain+0x415/0x1010
       [<ffffffff8027dc10>] ? __do_fault+0x140/0x450
       [<ffffffff802597d0>] ? __lock_acquire+0x260/0x590
       [<ffffffff803b9e55>] ? sockfd_lookup_light+0x45/0x80
       [<ffffffff803ba50a>] sys_sendto+0xea/0x120
       [<ffffffff80428e42>] ? _spin_unlock_irqrestore+0x42/0x80
       [<ffffffff803134bc>] ? __up_read+0x4c/0xb0
       [<ffffffff8024e0c6>] ? up_read+0x26/0x30
       [<ffffffff8020b8bb>] system_call_fastpath+0x16/0x1b
      
      icmp6_sk() is similar.
      Signed-off-by: default avatarDenis V. Lunev <den@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fdc0bde9
  5. 22 Aug, 2008 1 commit
  6. 21 Aug, 2008 3 commits
  7. 20 Aug, 2008 7 commits
    • Al Viro's avatar
      cramfs: fix named-pipe handling · 82d63fc9
      Al Viro authored
      After commit a97c9bf3 (fix cramfs
      making duplicate entries in inode cache) in kernel 2.6.14, named-pipe
      on cramfs does not work properly.
      
      It seems the commit make all named-pipe on cramfs share their inode
      (and named-pipe buffer).
      
      Make ..._test() refuse to merge inodes with ->i_ino == 1, take inode setup
      back to get_cramfs_inode() and make ->drop_inode() evict ones with ->i_ino
      == 1 immediately.
      Reported-by: default avatarAtsushi Nemoto <anemo@mba.ocn.ne.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@kernel.org>		[2.6.14 and later]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      82d63fc9
    • Ian Campbell's avatar
      fbdefio: add set_page_dirty handler to deferred IO FB · d847471d
      Ian Campbell authored
      Fixes kernel BUG at lib/radix-tree.c:473.
      
      Previously the handler was incidentally provided by tmpfs but this was
      removed with:
      
        commit 14fcc23f
        Author: Hugh Dickins <hugh@veritas.com>
        Date:   Mon Jul 28 15:46:19 2008 -0700
      
          tmpfs: fix kernel BUG in shmem_delete_inode
      
      relying on this behaviour was incorrect in any case and the BUG also
      appeared when the device node was on an ext3 filesystem.
      
      v2: override a_ops at open() time rather than mmap() time to minimise
      races per AKPM's concerns.
      Signed-off-by: default avatarIan Campbell <ijc@hellion.org.uk>
      Cc: Jaya Kumar <jayakumar.lkml@gmail.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Johannes Weiner <hannes@saeurebad.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Kel Modderman <kel@otaku42.de>
      Cc: Markus Armbruster <armbru@redhat.com>
      Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
      Cc: <stable@kernel.org> [14fcc23f is in 2.6.25.14 and 2.6.26.1]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d847471d
    • Anton Vorontsov's avatar
      rtc: rtc-ds1374: fix 'no irq' case handling · b42f9317
      Anton Vorontsov authored
      On a PowerPC board with ds1374 RTC I'm getting this error while RTC tries
      to probe:
      
      rtc-ds1374 0-0068: unable to request IRQ
      
      This happens because I2C probing code (drivers/of/of_i2c.c) is specifying
      IRQ0 for 'no irq' case, which is correct.
      
      The driver handles this incorrectly, though. This patch fixes it.
      Signed-off-by: default avatarAnton Vorontsov <avorontsov@ru.mvista.com>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Acked-by: default avatarPeter Korsgaard <jacmet@sunsite.dk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b42f9317
    • Nick Piggin's avatar
      mm: xip/ext2 fix block allocation race · 14bac5ac
      Nick Piggin authored
      XIP can call into get_xip_mem concurrently with the same file,offset with
      create=1.  This usually maps down to get_block, which expects the page
      lock to prevent such a situation.  This causes ext2 to explode for one
      reason or another.
      
      Serialise those calls for the moment.  For common usages today, I suspect
      get_xip_mem rarely is called to create new blocks.  In future as XIP
      technologies evolve we might need to look at which operations require
      scalability, and rework the locking to suit.
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Acked-by: default avatarCarsten Otte <cotte@freenet.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      14bac5ac
    • Nick Piggin's avatar
      mm: xip fix fault vs sparse page invalidate race · 538f8ea6
      Nick Piggin authored
      XIP has a race between sparse pages being inserted into page tables, and
      sparse pages being zapped when its time to put a non-sparse page in.
      
      What can happen is that a process can be left with a dangling sparse page
      in a MAP_SHARED mapping, while the rest of the world sees the non-sparse
      version.  Ie.  data corruption.
      
      Guard these operations with a seqlock, making fault-in-sparse-pages the
      slowpath, and try-to-unmap-sparse-pages the fastpath.
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Acked-by: default avatarCarsten Otte <cotte@freenet.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      538f8ea6
    • Nick Piggin's avatar
      mm: dirty page tracking race fix · 479db0bf
      Nick Piggin authored
      There is a race with dirty page accounting where a page may not properly
      be accounted for.
      
      clear_page_dirty_for_io() calls page_mkclean; then TestClearPageDirty.
      
      page_mkclean walks the rmaps for that page, and for each one it cleans and
      write protects the pte if it was dirty.  It uses page_check_address to
      find the pte.  That function has a shortcut to avoid the ptl if the pte is
      not present.  Unfortunately, the pte can be switched to not-present then
      back to present by other code while holding the page table lock -- this
      should not be a signal for page_mkclean to ignore that pte, because it may
      be dirty.
      
      For example, powerpc64's set_pte_at will clear a previously present pte
      before setting it to the desired value.  There may also be other code in
      core mm or in arch which do similar things.
      
      The consequence of the bug is loss of data integrity due to msync, and
      loss of dirty page accounting accuracy.  XIP's __xip_unmap could easily
      also be unreliable (depending on the exact XIP locking scheme), which can
      lead to data corruption.
      
      Fix this by having an option to always take ptl to check the pte in
      page_check_address.
      
      It's possible to retain this optimization for page_referenced and
      try_to_unmap.
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Cc: Carsten Otte <cotte@freenet.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      479db0bf
    • Ken Chen's avatar
      fix setpriority(PRIO_PGRP) thread iterator breakage · 2d70b68d
      Ken Chen authored
      When user calls sys_setpriority(PRIO_PGRP ...) on a NPTL style multi-LWP
      process, only the task leader of the process is affected, all other
      sibling LWP threads didn't receive the setting.  The problem was that the
      iterator used in sys_setpriority() only iteartes over one task for each
      process, ignoring all other sibling thread.
      
      Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk
      each thread of a process.  Convert 4 call sites in {set/get}priority and
      ioprio_{set/get}.
      Signed-off-by: default avatarKen Chen <kenchen@google.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2d70b68d