- 06 Nov, 2009 2 commits
-
-
David Howells authored
The destination keyring specified to request_key() and co. is made available to the process that instantiates the key (the slave process started by /sbin/request-key typically). This is passed in the request_key_auth struct as the dest_keyring member. keyctl_instantiate_key and keyctl_negate_key() call get_instantiation_keyring() to get the keyring to attach the newly constructed key to at the end of instantiation. This may be given a specific keyring into which a link will be made later, or it may be asked to find the keyring passed to request_key(). In the former case, it returns a keyring with the refcount incremented by lookup_user_key(); in the latter case, it returns the keyring from the request_key_auth struct - and does _not_ increment the refcount. The latter case will eventually result in an oops when the keyring prematurely runs out of references and gets destroyed. The effect may take some time to show up as the key is destroyed lazily. To fix this, the keyring returned by get_instantiation_keyring() must always have its refcount incremented, no matter where it comes from. This can be tested by setting /etc/request-key.conf to: #OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ... #====== ======= =============== =============== =============================== create * test:* * |/bin/false %u %g %d %{user:_display} negate * * * /bin/keyctl negate %k 10 @u and then doing: keyctl add user _display aaaaaaaa @u while keyctl request2 user test:x test:x @u && keyctl list @u; do keyctl request2 user test:x test:x @u; sleep 31; keyctl list @u; done which will oops eventually. Changing the negate line to have @u rather than %S at the end is important as that forces the latter case by passing a special keyring ID rather than an actual keyring ID. Reported-by: Alexander Zangerl <az@bond.edu.au> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Alexander Zangerl <az@bond.edu.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Earl Chew authored
This patch fixes a null pointer exception in pipe_rdwr_open() which generates the stack trace: > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP: > [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70 > [<ffffffff8028125c>] __dentry_open+0x13c/0x230 > [<ffffffff8028143d>] do_filp_open+0x2d/0x40 > [<ffffffff802814aa>] do_sys_open+0x5a/0x100 > [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67 The failure mode is triggered by an attempt to open an anonymous pipe via /proc/pid/fd/* as exemplified by this script: ============================================================= while : ; do { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } & PID=$! OUT=$(ps -efl | grep 'sleep 1' | grep -v grep | { read PID REST ; echo $PID; } ) OUT="${OUT%% *}" DELAY=$((RANDOM * 1000 / 32768)) usleep $((DELAY * 1000 + RANDOM % 1000 )) echo n > /proc/$OUT/fd/1 # Trigger defect done ============================================================= Note that the failure window is quite small and I could only reliably reproduce the defect by inserting a small delay in pipe_rdwr_open(). For example: static int pipe_rdwr_open(struct inode *inode, struct file *filp) { msleep(100); mutex_lock(&inode->i_mutex); Although the defect was observed in pipe_rdwr_open(), I think it makes sense to replicate the change through all the pipe_*_open() functions. The core of the change is to verify that inode->i_pipe has not been released before attempting to manipulate it. If inode->i_pipe is no longer present, return ENOENT to indicate so. The comment about potentially using atomic_t for i_pipe->readers and i_pipe->writers has also been removed because it is no longer relevant in this context. The inode->i_mutex lock must be used so that inode->i_pipe can be dealt with correctly. Signed-off-by: Earl Chew <earl_chew@agilent.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 29 Oct, 2009 17 commits
-
-
Dinakar Guniguntala authored
This crash: [ 1774.088275] divide error: 0000 [#1] SMP [ 1774.100355] CPU 13 [ 1774.102498] Modules linked in: [ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN [ 1774.114807] RIP: 0010:[<ffffffff81041c38>] [<ffffffff81041c38>] sched_balance_self+0x19b/0x2d4 Triggers because update_group_power() modifies the sd tree and does temporary calculations there - not considering that other CPUs could observe intermediate values, such as the zero initial value. Calculate it in a temporary variable instead. (we need no memory barrier as these are all statistical values anyway) Got the same oops with the backport to -rt Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
wake_affine() would always fail under low-load situations where both prev and this were idle, because adding a single task will always be a significant imbalance, even if there's nothing around that could balance it. Deal with this by allowing imbalance when there's nothing you can do about it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Dinakar Guniguntala authored
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
A more readable version, with a few differences: - don't check against the root domain, but instead check SD_LOAD_BALANCE - don't re-iterate the cpus already iterated on the previous SD - use rcu_read_lock() around the sd iteration Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
Hopefully a more readable version of the same. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
APERF/MPERF support for cpu_power. APERF/MPERF is arch defined to be a relative scale of work capacity per logical cpu, this is assumed to include SMT and Turbo mode. APERF/MPERF are specified to both reset to 0 when either counter wraps, which is highly inconvenient, since that'll give a blimp when that happens. The manual specifies writing 0 to the counters after each read, but that's 1) too expensive, and 2) destroys the possibility of sharing these counters with other users, so we live with the blimp - the other existing user does too. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
[ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Dinakar Guniguntala authored
Move some of the aperf/mperf code out from the cpufreq driver thingy so that other people can enjoy it too. Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
Move the APERFMPERF capacility into a X86_FEATURE flag so that it can be used outside of the acpi cpufreq driver. [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
Its a source of fail, also, now that cpu_power is dynamical, its a waste of time. before: <idle>-0 [000] 132.877936: find_busiest_group: avg_load: 0 group_load: 8241 power: 1 after: bash-1689 [001] 137.862151: find_busiest_group: avg_load: 10636288 group_load: 10387 power: 1 [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> [andreas.herrmann3@amd.com: remove include] Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
When the capacity drops low, we want to migrate load away. Allow the load-balancer to remove all tasks when we hit rock bottom. [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> [ego@in.ibm.com: fix to update_sd_power_savings_stats] Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
Keep an average on the amount of time spend on RT tasks and use that fraction to scale down the cpu_power for regular tasks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
Recompute the cpu_power for each cpu during load-balance Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
The idea is that multi-threading a core yields more work capacity than a single thread, provide a way to express a static gain for threads. [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
In order to prepare for a more dynamic cpu_power, update the group sum while walking the sched domains during load-balance. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
Do the placement thing using SD flags XXX: consider degenerate bits [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
cpu_power is supposed to be a representation of the process capacity of the cpu, not a value to randomly tweak in order to affect placement. Remove the placement hacks. [ dino: backport to 31-rt ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 28 Oct, 2009 11 commits
-
-
Darren Hart authored
The queue_me/unqueue_me commentary is oddly placed and out of date. Clean it up and correct the inaccurate bits. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090922053015.8717.71713.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Darren Hart authored
When requeuing tasks from one futex to another, the reference held by the requeued task to the original futex location needs to be dropped eventually. Dropping the reference may ultimately lead to a call to "iput_final" and subsequently call into filesystem- specific code - which may be non-atomic. It is therefore safer to defer this drop operation until after the futex_hash_bucket spinlock has been dropped. Originally-From: Helge Bahmann <hcb@chaoticmind.net> Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: <stable@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@linux.vnet.ibm.com> Cc: Sven-Thorsten Dietrich <sdietrich@novell.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <4AD7A298.5040802@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Darren Hart authored
The memory barrier semantics of futex_wait_queue_me() are non-obvious. Add some commentary to try and clarify it. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090924185447.694.38948.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Darren Hart authored
The state machine described in the comments wasn't updated with a follow-on fix. Address that and cleanup the corresponding commentary in the function. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <4A737C2A.9090001@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Thomas Gleixner authored
Rich reported a lock imbalance in the futex code: http://bugzilla.kernel.org/show_bug.cgi?id=14288 It's caused by the displacement of the retry_private label in futex_wake_op(). The code unlocks the hash bucket locks in the error handling path and retries without locking them again which makes the next unlock fail. Move retry_private so we lock the hash bucket locks when we retry. Reported-by: Rich Ercolany <rercola@acm.jhu.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: stable-2.6.31 <stable@kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Darren Hart authored
Correct various typos and formatting inconsistencies in the commentary of futex_wait_requeue_pi(). Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090922052958.8717.21932.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Darren Hart authored
Make the existing function kernel-doc consistent throughout futex.c, following Documentation/kernel-doc-nano-howto.txt as closely as possible. When unsure, at least be consistent within futex.c. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090922053022.8717.13339.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Darren Hart authored
Use kernel-doc format to describe struct futex_q. Correct the wakeup definition to eliminate the statement about waking the waiter between the plist_del() and the q->lock_ptr = 0. Note in the comment that PI futexes have a different definition of the woken state. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090922053029.8717.62798.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Darren Hart authored
If userspace tries to perform a requeue_pi on a non-requeue_pi waiter, it will find the futex_q->requeue_pi_key to be NULL and OOPS. Check for NULL in match_futex() instead of doing explicit NULL pointer checks on all call sites. While match_futex(NULL, NULL) returning false is a little odd, it's still correct as we expect valid key references. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Dinakar Guniguntala <dino@in.ibm.com> CC: John Stultz <johnstul@us.ibm.com> Cc: stable@kernel.org LKML-Reference: <4AD60687.10306@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr == NULL test) nor does it use the wake_list of futex_wake() which where the reason for commit 41890f24 (futex: Handle spurious wake up) See debugging discussing on LKML Message-ID: <4AD4080C.20703@us.ibm.com> The changes in this fix to the wait_requeue_pi path were considered to be a likely unecessary, but harmless safety net. But it turns out that due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined as EAGAIN we built an endless loop in the code path which returns correctly EWOULDBLOCK. Spurious wakeups in wait_requeue_pi code path are unlikely so we do the easy solution and return EWOULDBLOCK^WEAGAIN to user space and let it deal with the spurious wakeup. Cc: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Stultz <johnstul@linux.vnet.ibm.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> LKML-Reference: <4AE23C74.1090502@us.ibm.com> Cc: stable@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
There is currently no check to ensure that userspace uses the same futex requeue target (uaddr2) in futex_requeue() that the waiter used in futex_wait_requeue_pi(). A mismatch here could very unexpected results as the waiter assumes it either wakes on uaddr1 or uaddr2. We could detect this on wakeup in the waiter, but the cleanup is more intense after the improper requeue has occured. This patch stores the waiter's expected requeue target in a new requeue_pi_key pointer in the futex_q which futex_requeue() checks prior to attempting to do a proxy lock acquistion or a requeue when requeue_pi=1. If they don't match, return -EINVAL from futex_requeue, aborting the requeue of any remaining waiters. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090814003650.14634.63916.stgit@Aeon> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Conflicts: kernel/futex.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 20 Oct, 2009 2 commits
-
-
Thomas Gleixner authored
-
John Kacur authored
commit 3c96a2 (ipc: Make the ipc code -rt aware) introduced a imbalance of preempt_disable_rt vs. preempt_enable_nort. That results in preempt count leak. Make it symetric. Reported-by: Joerg Abraham <Joerg.Abraham@alcatel-lucent.de> Signed-off-by: John Kacur <jkacur@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 13 Oct, 2009 6 commits
-
-
Thomas Gleixner authored
commit e31b7991 (x86: highmem: Remove leftover function prototypes) removed kmap_atomic_prot_pfn() which is not a leftover, but should have been left where it was. Restore it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Peter Zijlstra authored
The numa aliens tear down is not covered by the per cpu locked changes which we did to slab. Fix that. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> mm/slab.c | 36 ++++++++++++++++++++++++++++-------- 1 file changed, 28 insertions(+), 8 deletions(-)
-
Thomas Gleixner authored
The futex code does not handle spurious wake up in futex_wait and futex_wait_requeue_pi. The code assumes that any wake up which was not caused by futex_wake / requeue or by a timeout was caused by a signal wake up and returns one of the syscall restart error codes. In case of a spurious wake up the signal delivery code which deals with the restart error codes is not invoked and we return that error code to user space. That causes applications which actually check the return codes to fail. Blaise reported that on preempt-rt a python test program run into a exception trap. -rt exposed that due to a built in spurious wake up accelerator :) Solve this by checking signal_pending(current) in the wake up path and handle the spurious wake up case w/o returning to user space. Reported-by: Blaise Gassend <blaise@willowgarage.com> Debugged-by: Darren Hart <dvhltc@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@kernel.org LKML-Reference: <new-submission> Conflicts: kernel/futex.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
Remy Bohmer pointed out that we create the hrtimer softirq thread even when CONFIG_HIGH_RES_TIMERS is off. That results in a softirq-NULL name for the thread. The thread is needed on -rt Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
This reverts commit d69c5d37. The softirq is necessary even in the CONFIG_HIGH_RES_TIMERS=n case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
This reverts commit 928686b7. The patch was an optimization of the old futex wake code where we woke the waiter and then set q->lock_ptr to NULL. When the waiter preempted the waker then we run into lock contention on q->lock_ptr aka. hb->lock. commit f1a11e (futex: remove the wait queue) changes the wakeup logic by setting q->lock_ptr to NULL _before_ waking the task. It keeps a reference on the task struct of the to be woken task to avoid an exit race. The combination of both patches resulted in different race on -RT: A is blocked on futex B calls futex_wake B sets q(A)->lock_ptr to NULL and puts A on the wake list B is preempted ... A wakes up (e.g. timer, signal) A detects q->lock_ptr = NULL and returns A waits on a different futex B is scheduled back in B wakes A A sees a spurious wake up Reported-by: Blaise Gassend <blaise@willowgarage.com> Debugged-by: Darren Hart <dvhltc@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> enter the commit message for your changes. Lines starting
-
- 12 Oct, 2009 1 commit
-
-
Darren Hart authored
PI futexes do not use the same plist_node_empty() test for wakeup. It was possible for the waiter (in futex_wait_requeue_pi()) to set TASK_INTERRUPTIBLE after the waker assigned the rtmutex to the waiter. The waiter would then note the plist was not empty and call schedule(). The task would not be found by any subsequeuent futex wakeups, resulting in a userspace hang. By moving the setting of TASK_INTERRUPTIBLE to before the call to queue_me(), the race with the waker is eliminated. Since we no longer call get_user() from within queue_me(), there is no need to delay the setting of TASK_INTERRUPTIBLE until after the call to queue_me(). The FUTEX_LOCK_PI operation is not affected as futex_lock_pi() relies entirely on the rtmutex code to handle schedule() and wakeup. The requeue PI code is affected because the waiter starts as a non-PI waiter and is woken on a PI futex. Remove the crusty old comment about holding spinlocks() across get_user() as we no longer do that. Correct the locking statement with a description of why the test is performed. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090922053038.8717.97838.stgit@Aeon> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 09 Oct, 2009 1 commit
-
-
Thomas Gleixner authored
-RT replaces kmap_atomic* functions with macros, but we kept the function prototypes around. Remove them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-