1. 10 Nov, 2009 12 commits
    • futex: Fix spurious wakeup for requeue_pi really · fc7df048
      Thomas Gleixner authored
      commit 11df6ddd upstream.
      
      The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
      NULL test) nor does it use the wake_list of futex_wake(), which were
      the reasons for commit 41890f24 (futex: Handle spurious wake up).
      
      See the debugging discussion on LKML, Message-ID: <4AD4080C.20703@us.ibm.com>
      
      The changes in this fix to the wait_requeue_pi path were considered to
      be a likely unnecessary, but harmless safety net. But it turns out that
      due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined
      as EAGAIN we built an endless loop in the code path which correctly
      returns EWOULDBLOCK.
      
      Spurious wakeups in the wait_requeue_pi code path are unlikely, so we
      take the easy solution and return EWOULDBLOCK^WEAGAIN to user space and
      let it deal with the spurious wakeup.
      
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      LKML-Reference: <4AE23C74.1090502@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
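The EWOULDBLOCK/EAGAIN aliasing described above can be illustrated with a small userspace sketch. It is not the kernel code: `should_retry_wait()` and `ewouldblock_is_eagain()` are hypothetical helpers, and the sketch assumes a platform such as Linux where both names expand to the same number.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: a wait loop that wants to retry internally
 * only when the operation reports -EAGAIN.
 */
static bool should_retry_wait(int err)
{
    return err == -EAGAIN;
}

/*
 * True when EWOULDBLOCK and EAGAIN are the same value, in which case
 * a test for one of them also matches the other.
 */
static bool ewouldblock_is_eagain(void)
{
    return EWOULDBLOCK == EAGAIN;
}
```

Where the two values are identical, a path that returns -EWOULDBLOCK to report "give up" is indistinguishable from one that returns -EAGAIN to request "try again", so a retry loop keyed on -EAGAIN never terminates.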
    • futex: Move drop_futex_key_refs out of spinlock'ed region · 81e6fd57
      Darren Hart authored
      commit 89061d3d upstream.
      
      When requeuing tasks from one futex to another, the reference held
      by the requeued task to the original futex location needs to be
      dropped eventually.
      
      Dropping the reference may ultimately lead to a call to
      "iput_final" and subsequently into filesystem-specific code,
      which may be non-atomic.
      
      It is therefore safer to defer this drop operation until after the
      futex_hash_bucket spinlock has been dropped.
      
      Originally-From: Helge Bahmann <hcb@chaoticmind.net>
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <4AD7A298.5040802@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
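The deferral pattern described above (count the references while the lock is held, drop them only after it is released) might look roughly like this in a userspace model. All names are hypothetical stand-ins: `bucket_lock` models the futex_hash_bucket spinlock with a C11 `atomic_flag`, and `drop_key_ref()` models an operation that may sleep.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the hash-bucket lock and a key reference. */
static atomic_flag bucket_lock = ATOMIC_FLAG_INIT;
static int lock_held;        /* 1 while bucket_lock is held */
static int drops_under_lock; /* counts drops made with the lock held */

struct key_ref { int refcount; };

static void bucket_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&bucket_lock))
        ; /* spin until the flag is free */
    lock_held = 1;
}

static void bucket_lock_release(void)
{
    lock_held = 0;
    atomic_flag_clear(&bucket_lock);
}

/* Models a drop that may sleep, so it must not run under the spinlock. */
static void drop_key_ref(struct key_ref *ref)
{
    if (lock_held)
        drops_under_lock++;
    ref->refcount--;
}

/* Requeue nr_requeue waiters; returns how many drops ran under the lock. */
static int requeue_and_drop(struct key_ref *old_key, int nr_requeue)
{
    int pending_drops = 0;

    bucket_lock_acquire();
    for (int i = 0; i < nr_requeue; i++)
        pending_drops++; /* only remember that a drop is owed */
    bucket_lock_release();

    /* The lock is released, so a sleeping drop is safe here. */
    while (pending_drops-- > 0)
        drop_key_ref(old_key);

    return drops_under_lock;
}
```

Calling `requeue_and_drop(&key, 3)` on a key with a reference count of 3 leaves the count at 0 and returns 0, meaning no drop happened while the lock was held.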
    • futex: Check for NULL keys in match_futex · 6d57fbdd
      Darren Hart authored
      commit 2bc87203 upstream.
      
      If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
      it will find the futex_q->requeue_pi_key to be NULL and OOPS.
      
      Check for NULL in match_futex() instead of doing explicit NULL pointer
      checks on all call sites.  While match_futex(NULL, NULL) returning
      false is a little odd, it's still correct as we expect valid key
      references.
      
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <4AD60687.10306@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
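A minimal sketch of the idea of folding the NULL check into the comparison helper, so callers need no explicit checks. `struct key` and `keys_match()` are simplified, hypothetical stand-ins; the kernel's real futex key has a different layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical key; not the kernel's union futex_key. */
struct key {
    unsigned long word;
    void *ptr;
    int offset;
};

/*
 * Two keys match only if both pointers are valid and all fields agree.
 * keys_match(NULL, NULL) is false on purpose: callers expect valid keys.
 */
static bool keys_match(const struct key *k1, const struct key *k2)
{
    return k1 && k2
        && k1->word == k2->word
        && k1->ptr == k2->ptr
        && k1->offset == k2->offset;
}
```

With the check in one place, a caller that passes a NULL key gets a plain "no match" instead of dereferencing a NULL pointer.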
    • futex: Handle spurious wake up · e68e25e6
      Thomas Gleixner authored
      commit d58e6576 upstream.
      
      The futex code does not handle spurious wake up in futex_wait and
      futex_wait_requeue_pi.
      
      The code assumes that any wake up which was not caused by futex_wake /
      requeue or by a timeout was caused by a signal wake up and returns one
      of the syscall restart error codes.
      
      In case of a spurious wake up the signal delivery code which deals
      with the restart error codes is not invoked and we return that error
      code to user space. That causes applications which actually check the
      return codes to fail. Blaise reported that on preempt-rt a Python test
      program ran into an exception trap. -rt exposed that due to a built-in
      spurious wake up accelerator :)
      
      Solve this by checking signal_pending(current) in the wake up path and
      handling the spurious wake up case without returning to user space.
      
      Reported-by: Blaise Gassend <blaise@willowgarage.com>
      Debugged-by: Darren Hart <dvhltc@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
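The decision described above can be modelled as a toy state machine. This is only a sketch: the enums and `wait_model()` are hypothetical, the wake-ups are fed in as an array rather than coming from a scheduler, and returning a timeout when the array runs out is an assumption of the model.

```c
#include <stdbool.h>

/* Hypothetical wake reasons for a toy model of the wait path. */
enum wake_reason {
    WOKEN_BY_FUTEX,   /* futex_wake or requeue */
    WOKEN_BY_TIMEOUT,
    WOKEN_BY_SIGNAL,  /* a signal is pending */
    WOKEN_SPURIOUS    /* none of the above */
};

enum wait_result { WAIT_OK, WAIT_TIMEDOUT, WAIT_RESTART };

/*
 * Walk a sequence of wake-ups. Only a pending signal yields the restart
 * result; a spurious wake-up puts the waiter back to sleep instead of
 * leaking a restart code to the caller. *sleeps counts the sleeps taken.
 */
static enum wait_result wait_model(const enum wake_reason *events, int n,
                                   int *sleeps)
{
    *sleeps = 0;
    for (int i = 0; i < n; i++) {
        (*sleeps)++;
        switch (events[i]) {
        case WOKEN_BY_FUTEX:
            return WAIT_OK;
        case WOKEN_BY_TIMEOUT:
            return WAIT_TIMEDOUT;
        case WOKEN_BY_SIGNAL:
            return WAIT_RESTART;
        case WOKEN_SPURIOUS:
            continue; /* no signal pending: sleep again */
        }
    }
    return WAIT_TIMEDOUT; /* model assumption: no more events */
}
```

Feeding the model two spurious wake-ups followed by a real wake returns `WAIT_OK` after three sleeps, so the caller never observes the spurious ones.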
    • KVM: ignore reads from AMDs C1E enabled MSR · 866b5a4d
      Andre Przywara authored
      commit 1fdbd48c upstream.
      
      If the Linux kernel detects a C1E-capable AMD processor (K8 RevF and
      higher), it will access a certain MSR on every attempt to go to halt.
      Explicitly handle this read and return 0 to let KVM run a Linux guest
      with the native AMD host CPU propagated to the guest.
      
      Signed-off-by: Andre Przywara <andre.przywara@amd.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • KVM: use proper hrtimer function to retrieve expiration time · c66415b2
      Marcelo Tosatti authored
      commit ace15464 upstream.
      
      hrtimer->base can be temporarily NULL due to racing hrtimer_start.
      See switch_hrtimer_base/lock_hrtimer_base.
      
      Use hrtimer_get_remaining which is robust against it.
      
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • nfs: Fix nfs_parse_mount_options() kfree() leak · 9c367e53
      Yinghai Lu authored
      commit 4223a4a1 upstream.
      
      Fix a (small) memory leak in one of the error paths of the NFS mount
      options parsing code.
      
      Regression introduced in 2.6.30 by commit a67d18f8 (NFS: load the
      rpc/rdma transport module automatically).
      
      Reported-by: Yinghai Lu <yinghai@kernel.org>
      Reported-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • sata_nv: make sure link is brough up online when skipping hardreset · 76132171
      Tejun Heo authored
      commit 6489e326 upstream.
      
      prereset doesn't bring the link online if hardreset is about to happen,
      and nv_hardreset() may skip the reset if conditions are not right. As a
      result, softreset may be entered with a non-working link if the system
      firmware didn't bring it up before entering OS code, which can happen
      during resume. This patch makes nv_hardreset() bring up the link if it
      is skipping the reset.
      
      This bug was reported by frodone@gmail.com in the following bug entry.
      
        http://bugzilla.kernel.org/show_bug.cgi?id=14329
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: frodone@gmail.com
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • libata: fix PMP initialization · 32f4683d
      Tejun Heo authored
      commit 4f7c2874 upstream.
      
      Commit 842faa6c fixed error handling
      during attach by not committing detected device class to dev->class
      while attaching a new device.  However, this change missed the PMP
      class check in the configuration loop causing a new PMP device to go
      through ata_dev_configure() as if it were an ATA or ATAPI device.
      
      As a PMP device doesn't have regular IDENTIFY data, this makes
      ata_dev_configure() try to configure a PMP device using invalid
      data.  For the most part, it wasn't too harmful and went unnoticed, but
      this ends up clearing dev->flags, which may have ATA_DFLAG_AN set by
      sata_pmp_attach().  This means that SATA_PMP_FEAT_NOTIFY ends up being
      disabled on PMPs, which breaks hotplug support on PMPs that honor the
      flag.
      
      This problem was discovered and reported by Ethan Hsiao.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Ethan Hsiao <ethanhsiao@jmicron.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • libata: fix internal command failure handling · 99823646
      Tejun Heo authored
      commit f4b31db9 upstream.
      
      When an internal command fails, it should be failed directly without
      invoking EH.  In the original implementation, this was accomplished by
      letting internal commands bypass failure handling in ata_qc_complete().
      However, later changes added post-successful-completion handling to
      that code path, and the success path is no longer adequate as the
      internal command failure path.  One of the visible problems is that an
      internal command failure due to timeout or other freeze conditions would
      spuriously trigger WARN_ON_ONCE() in the success path.
      
      This patch updates the failure path such that internal command failure
      handling is contained there.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • pci: increase alignment to make more space for hidden code · 2f37b165
      Yinghai Lu authored
      commit 15b812f1 upstream.
      
      As reported in
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=13940
      
      on some systems with ACPI enabled, ACPI clears some BARs for some
      devices without reason, and the kernel then needs to allocate resources
      for those devices.  It then apparently hits some undocumented resource
      conflict, resulting in non-working devices.
      
      Try increasing the alignment to get a safer range for unassigned devices.
      
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: pipe.c null pointer dereference · c40ca2aa
      Earl Chew authored
      commit ad396024 upstream.
      
      This patch fixes a null pointer exception in pipe_rdwr_open() which
      generates the stack trace:
      
      > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
      >  [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70
      >  [<ffffffff8028125c>] __dentry_open+0x13c/0x230
      >  [<ffffffff8028143d>] do_filp_open+0x2d/0x40
      >  [<ffffffff802814aa>] do_sys_open+0x5a/0x100
      >  [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67
      
      The failure mode is triggered by an attempt to open an anonymous
      pipe via /proc/pid/fd/* as exemplified by this script:
      
      =============================================================
      while : ; do
         { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
         PID=$!
         OUT=$(ps -efl | grep 'sleep 1' | grep -v grep |
              { read PID REST ; echo $PID; } )
         OUT="${OUT%% *}"
         DELAY=$((RANDOM * 1000 / 32768))
         usleep $((DELAY * 1000 + RANDOM % 1000 ))
         echo n > /proc/$OUT/fd/1                 # Trigger defect
      done
      =============================================================
      
      Note that the failure window is quite small and I could only
      reliably reproduce the defect by inserting a small delay
      in pipe_rdwr_open(). For example:
      
       static int
       pipe_rdwr_open(struct inode *inode, struct file *filp)
       {
             msleep(100);
             mutex_lock(&inode->i_mutex);
      
      Although the defect was observed in pipe_rdwr_open(), I think it
      makes sense to replicate the change through all the pipe_*_open()
      functions.
      
      The core of the change is to verify that inode->i_pipe has not
      been released before attempting to manipulate it. If inode->i_pipe
      is no longer present, return ENOENT to indicate so.
      
      The comment about potentially using atomic_t for i_pipe->readers
      and i_pipe->writers has also been removed because it is no longer
      relevant in this context. The inode->i_mutex lock must be used so
      that inode->i_pipe can be dealt with correctly.
      
      Signed-off-by: Earl Chew <earl_chew@agilent.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
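The core of the pipe change (check under the inode lock that the pipe still exists, and return ENOENT if it does not) can be sketched in a single-threaded userspace model. Everything here is a hypothetical stand-in: `struct inode_model`, `inode_lock()` and `inode_unlock()` only imitate inode->i_mutex with a depth counter, and `pipe_rdwr_open_model()` is not the kernel function.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct pipe_info {
    int readers;
    int writers;
};

struct inode_model {
    int lock_depth;         /* imitates inode->i_mutex */
    struct pipe_info *pipe; /* imitates inode->i_pipe; NULL once released */
};

static void inode_lock(struct inode_model *inode)   { inode->lock_depth++; }
static void inode_unlock(struct inode_model *inode) { inode->lock_depth--; }

/*
 * Open for read and write: touch the pipe only if it still exists,
 * and make that check while holding the inode lock.
 */
static int pipe_rdwr_open_model(struct inode_model *inode)
{
    int ret = -ENOENT;

    inode_lock(inode);
    if (inode->pipe) {
        ret = 0;
        inode->pipe->readers++;
        inode->pipe->writers++;
    }
    inode_unlock(inode);

    return ret;
}
```

Opening an inode whose pipe has already been released returns `-ENOENT` instead of dereferencing a NULL pointer, and the counters are only modified while the lock is held.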
  2. 22 Oct, 2009 28 commits