Commits · cd18252d446e641e7b4adc59146c474df0ed7815 · linux / linux-davinci

28 Oct, 2009 11 commits

futex: Correct queue_me and unqueue_me commentary · cd18252d

Darren Hart authored Sep 21, 2009

The queue_me/unqueue_me commentary is oddly placed and out of date.
Clean it up and correct the inaccurate bits.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922053015.8717.71713.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

futex: Move drop_futex_key_refs out of spinlock'ed region · b9e40b50

Darren Hart authored Oct 15, 2009

When requeuing tasks from one futex to another, the reference held
by the requeued task to the original futex location needs to be
dropped eventually.

Dropping the reference may ultimately lead to a call to
"iput_final" and subsequently call into filesystem-specific code,
which may be non-atomic.

It is therefore safer to defer this drop operation until after the
futex_hash_bucket spinlock has been dropped.
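
As a rough user-space sketch of that ordering (all names are hypothetical models, not kernel APIs): the requeue loop takes the new references while the bucket lock is held, but only counts the old ones and drops them after unlocking. A flag stands in for the spinlock so the sketch can assert that the drop never runs under it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a refcounted futex key and a flag that models
 * whether the futex_hash_bucket spinlock is currently held. */
struct key_model {
	int refs;
	bool released;
};

static bool hb_locked;

static void hb_lock(void)   { hb_locked = true; }
static void hb_unlock(void) { hb_locked = false; }

/* Dropping the last reference may end up in iput_final() and
 * filesystem-specific code, so this must never run under the lock. */
static void drop_key_ref(struct key_model *k)
{
	assert(!hb_locked);
	if (--k->refs == 0)
		k->released = true;
}

/* Requeue nr waiters from key1 to key2: take the new references under
 * the lock, but only count the old references that must be dropped. */
static void requeue_waiters(struct key_model *key1, struct key_model *key2,
			    int nr)
{
	int drop_count = 0;

	hb_lock();
	for (int i = 0; i < nr; i++) {
		key2->refs++;	/* the waiter now references key2 */
		drop_count++;	/* its key1 reference is dropped later */
	}
	hb_unlock();

	/* Safe now: the spinlock is no longer held. */
	while (drop_count-- > 0)
		drop_key_ref(key1);
}
```

Counting the drops instead of dropping inside the loop keeps everything that might sleep out of the locked region.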

Originally-From: Helge Bahmann <hcb@chaoticmind.net>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: <stable@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <4AD7A298.5040802@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

futex: Add memory barrier commentary to futex_wait_queue_me() · d6617954

Darren Hart authored Sep 24, 2009

The memory barrier semantics of futex_wait_queue_me() are
non-obvious. Add some commentary to try and clarify it.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090924185447.694.38948.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

futex: Correct futex_wait_requeue_pi() commentary · eb78fc39

Darren Hart authored Jul 31, 2009

The state machine described in the comments wasn't updated with
a follow-on fix. Address that and clean up the corresponding
commentary in the function.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <4A737C2A.9090001@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

futex: Fix locking imbalance · 11bc48db

Thomas Gleixner authored Oct 04, 2009

Rich reported a lock imbalance in the futex code:

   http://bugzilla.kernel.org/show_bug.cgi?id=14288

It's caused by the misplacement of the retry_private label in
futex_wake_op(). The code unlocks the hash bucket locks in the
error handling path and retries without locking them again, which
makes the next unlock fail.

Move retry_private so we lock the hash bucket locks when we retry.
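
A small model of the two label placements, assuming a counter in place of the real hash-bucket locks and a hypothetical atomic_op() that faults a given number of times. Only the position of retry_private differs between the two functions.

```c
/* Hypothetical model: lock_depth counts how often the pair of
 * hash-bucket locks is held; it must be 0 when the operation ends. */
static int lock_depth;

static void double_lock_hb(void)   { lock_depth++; }
static void double_unlock_hb(void) { lock_depth--; }

/* Hypothetical stand-in for the user-space atomic op: it faults
 * (returns -1) the first *faults times, forcing a retry. */
static int atomic_op(int *faults)
{
	return (*faults)-- > 0 ? -1 : 0;
}

/* Buggy shape: the label sits after the locking, so a retry runs
 * without the locks that the error path just released. */
static int wake_op_buggy(int faults)
{
	lock_depth = 0;
	double_lock_hb();
retry_private:
	if (atomic_op(&faults) < 0) {
		double_unlock_hb();
		/* ... fault the page in ... */
		goto retry_private;
	}
	/* ... wake waiters ... */
	double_unlock_hb();
	return lock_depth;	/* -1 after any retry: one unlock too many */
}

/* Fixed shape: the label sits before the locking, so every retry
 * re-acquires the locks. */
static int wake_op_fixed(int faults)
{
	lock_depth = 0;
retry_private:
	double_lock_hb();
	if (atomic_op(&faults) < 0) {
		double_unlock_hb();
		goto retry_private;
	}
	double_unlock_hb();
	return lock_depth;	/* 0: balanced */
}
```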
Reported-by: Rich Ercolany <rercola@acm.jhu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: stable-2.6.31 <stable@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

futex: Correct futex_wait_requeue_pi() commentary · 9231abe1

Darren Hart authored Sep 21, 2009

Correct various typos and formatting inconsistencies in the
commentary of futex_wait_requeue_pi().
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922052958.8717.21932.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

futex: Make function kernel-doc commentary consistent · 6ca0f2a0

Darren Hart authored Sep 21, 2009

Make the existing function kernel-doc consistent throughout
futex.c, following Documentation/kernel-doc-nano-howto.txt as
closely as possible.

When unsure, at least be consistent within futex.c.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922053022.8717.13339.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

futex: Correct futex_q woken state commentary · 0699fd94

Darren Hart authored Sep 21, 2009

Use kernel-doc format to describe struct futex_q.

Correct the wakeup definition to eliminate the statement about
waking the waiter between the plist_del() and the q->lock_ptr = 0.

Note in the comment that PI futexes have a different definition of
the woken state.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922053029.8717.62798.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

futex: Check for NULL keys in match_futex · 29b33bb7

Darren Hart authored Oct 14, 2009

If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
it will find the futex_q->requeue_pi_key to be NULL and OOPS.

Check for NULL in match_futex() instead of doing explicit NULL pointer
checks on all call sites.  While match_futex(NULL, NULL) returning
false is a little odd, it's still correct as we expect valid key
references.
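
A minimal sketch of the NULL-tolerant comparison, assuming a simplified key type (futex_key_model is hypothetical; the real union futex_key is more involved):

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for union futex_key: the sketch
 * compares a word, a pointer and an offset. */
struct futex_key_model {
	unsigned long word;
	void *ptr;
	int offset;
};

/* NULL-safe comparison: a missing key (for example the requeue_pi_key
 * of a waiter that never called futex_wait_requeue_pi()) matches
 * nothing, not even another NULL key. */
static bool match_futex(const struct futex_key_model *key1,
			const struct futex_key_model *key2)
{
	return key1 && key2 &&
	       key1->word == key2->word &&
	       key1->ptr == key2->ptr &&
	       key1->offset == key2->offset;
}
```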
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Dinakar Guniguntala <dino@in.ibm.com>
CC: John Stultz <johnstul@us.ibm.com>
Cc: stable@kernel.org
LKML-Reference: <4AD60687.10306@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

futex: Fix spurious wakeup for requeue_pi really · 43746940

Thomas Gleixner authored Oct 28, 2009

The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
NULL test), nor does it use the wake_list of futex_wake(), which were
the reason for commit 41890f24 (futex: Handle spurious wake up).

See the debugging discussion on LKML, Message-ID: <4AD4080C.20703@us.ibm.com>

The changes that commit made to the wait_requeue_pi path were considered
a likely unnecessary but harmless safety net. But it turns out that,
because EWOULDBLOCK is for unknown $@#!*( reasons defined as EAGAIN,
we built an endless loop in the code path which correctly returns
EWOULDBLOCK.

Spurious wakeups in the wait_requeue_pi code path are unlikely, so we
take the easy solution: return EWOULDBLOCK^WEAGAIN to user space and
let it deal with the spurious wakeup.

Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
LKML-Reference: <4AE23C74.1090502@us.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
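
On the user-space side, dealing with the spurious wakeup amounts to re-checking the condition and waiting again. A sketch of that loop, using plain FUTEX_WAIT_PRIVATE for illustration (FUTEX_WAIT_REQUEUE_PI needs a PI target futex and more setup); wait_while_equal() is a hypothetical helper, not glibc code:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Block until *word no longer equals expected.
 * EAGAIN and EWOULDBLOCK are the same value on Linux, so errno alone does
 * not say why the call returned; re-checking the condition covers every
 * case, including a spurious wakeup. */
static int wait_while_equal(atomic_int *word, int expected)
{
	while (atomic_load(word) == expected) {
		long rc = syscall(SYS_futex, (int *)word, FUTEX_WAIT_PRIVATE,
				  expected, NULL, NULL, 0);

		if (rc == -1 && errno != EAGAIN && errno != EINTR)
			return -1;	/* a real error such as EFAULT */
		/* rc == 0, EAGAIN (== EWOULDBLOCK) or EINTR: re-check. */
	}
	return 0;
}
```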

futex: Detect mismatched requeue targets · e814515d

Thomas Gleixner authored Oct 28, 2009

There is currently no check to ensure that userspace uses the same
futex requeue target (uaddr2) in futex_requeue() that the waiter used
in futex_wait_requeue_pi(). A mismatch here could cause very unexpected
results, as the waiter assumes it wakes on either uaddr1 or uaddr2. We
could detect this on wakeup in the waiter, but the cleanup is more
intense after the improper requeue has occurred.

This patch stores the waiter's expected requeue target in a new
requeue_pi_key pointer in the futex_q which futex_requeue() checks
prior to attempting to do a proxy lock acquisition or a requeue when
requeue_pi=1. If they don't match, return -EINVAL from futex_requeue,
aborting the requeue of any remaining waiters.
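
The check can be sketched as follows, assuming simplified hypothetical types in place of futex_q and union futex_key; requeue_pi_model() only shows the key comparison and the abort-on-mismatch behaviour, not the proxy locking:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified key and waiter: each requeue_pi waiter
 * records the target it expects to be requeued to. */
struct key_model { unsigned long word; };

struct waiter_model {
	const struct key_model *requeue_pi_key;	/* NULL: not a requeue_pi waiter */
	bool requeued;
};

static bool keys_match(const struct key_model *a, const struct key_model *b)
{
	return a && b && a->word == b->word;
}

/* Requeue waiters to key2, refusing any waiter that expected a different
 * target. A mismatch returns -EINVAL and aborts the requeue of the
 * remaining waiters; in this sketch earlier waiters stay requeued. */
static int requeue_pi_model(struct waiter_model *w, size_t nr,
			    const struct key_model *key2)
{
	int count = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!keys_match(w[i].requeue_pi_key, key2))
			return -EINVAL;
		w[i].requeued = true;
		count++;
	}
	return count;
}
```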
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090814003650.14634.63916.stgit@Aeon>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Conflicts:

	kernel/futex.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

20 Oct, 2009 2 commits

Merge branch 'rt/ipc' into rt/head · 6cb488b0
Thomas Gleixner authored Oct 20, 2009

ipc: fix rt/non_rt imbalance · 1920d618

John Kacur authored Oct 15, 2009

commit 3c96a2 (ipc: Make the ipc code -rt aware) introduced an
imbalance of preempt_disable_rt vs. preempt_enable_nort. That results
in a preempt count leak.

Make it symmetric.
Reported-by: Joerg Abraham <Joerg.Abraham@alcatel-lucent.de>
Signed-off-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
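
To illustrate the leak, here is a runtime model of the -rt helpers, assuming the usual convention that the _rt variants act only on PREEMPT_RT and the _nort variants only without it (a boolean replaces the config option, and the functions are stand-ins). The balanced version uses the _rt pair; which pair the actual fix settled on is not stated above.

```c
#include <stdbool.h>

/* Hypothetical runtime model of the -rt helpers: the _rt variants act
 * only when PREEMPT_RT is enabled, the _nort variants only when not. */
static bool preempt_rt;
static int preempt_count;

static void preempt_disable_rt(void)   { if (preempt_rt)  preempt_count++; }
static void preempt_enable_rt(void)    { if (preempt_rt)  preempt_count--; }
static void preempt_disable_nort(void) { if (!preempt_rt) preempt_count++; }
static void preempt_enable_nort(void)  { if (!preempt_rt) preempt_count--; }

/* Mismatched pair, as introduced by the ipc change. */
static int unbalanced_section(bool rt)
{
	preempt_rt = rt;
	preempt_count = 0;
	preempt_disable_rt();
	/* ... ipc work ... */
	preempt_enable_nort();
	return preempt_count;	/* RT: leaks +1; non-RT: drops to -1 */
}

/* Symmetric pair: the same variant on both sides. */
static int balanced_section(bool rt)
{
	preempt_rt = rt;
	preempt_count = 0;
	preempt_disable_rt();
	/* ... ipc work ... */
	preempt_enable_rt();
	return preempt_count;	/* 0 in both configurations */
}
```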

13 Oct, 2009 6 commits

x86: highmem: Restore the not so leftover function prototype · bfb35e3c

Thomas Gleixner authored Oct 13, 2009

commit e31b7991 (x86: highmem: Remove leftover function prototypes)
removed kmap_atomic_prot_pfn(), which is not a leftover and should
have been left where it was.

Restore it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

slab: Cover the numa aliens with the per cpu locked changes · d99f9884

Peter Zijlstra authored Oct 13, 2009

The NUMA alien cache teardown is not covered by the per-CPU locked
changes we made to slab. Fix that.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

 mm/slab.c |   36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

futex: Handle spurious wake up · 41890f24

Thomas Gleixner authored Oct 13, 2009

The futex code does not handle spurious wake up in futex_wait and
futex_wait_requeue_pi.

The code assumes that any wake up which was not caused by futex_wake /
requeue or by a timeout was caused by a signal wake up and returns one
of the syscall restart error codes.

In case of a spurious wake up, the signal delivery code which deals
with the restart error codes is not invoked, and we return that error
code to user space. That causes applications which actually check the
return codes to fail. Blaise reported that on preempt-rt a python test
program ran into an exception trap. -rt exposed that due to a built-in
spurious wake up accelerator :)

Solve this by checking signal_pending(current) in the wake up path and
handle the spurious wake up case w/o returning to user space.
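
The resulting decision after the waiter returns from schedule() can be modelled as below. This is a simplified, hypothetical sketch: restart-block handling is omitted, and ERESTARTSYS, which is kernel-internal, is defined locally with its kernel value.

```c
#include <errno.h>
#include <stdbool.h>

/* ERESTARTSYS is kernel-internal and not exported to user space; 512 is
 * its value in the kernel's include/linux/errno.h. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

enum wake_action { WAKE_RETURN, WAKE_WAIT_AGAIN };

struct wake_outcome {
	enum wake_action action;
	int ret;
};

/* Hypothetical model of the decision taken after schedule() returns.
 * Before the fix the last case did not exist: anything that was neither
 * a futex wakeup nor a timeout returned -ERESTARTSYS, even when no
 * signal was pending. */
static struct wake_outcome classify_wakeup(bool woken_by_futex,
					   bool timed_out,
					   bool signal_pending)
{
	if (woken_by_futex)
		return (struct wake_outcome){ WAKE_RETURN, 0 };
	if (timed_out)
		return (struct wake_outcome){ WAKE_RETURN, -ETIMEDOUT };
	if (signal_pending)
		return (struct wake_outcome){ WAKE_RETURN, -ERESTARTSYS };
	/* Spurious wakeup: no reason to leave the kernel, wait again. */
	return (struct wake_outcome){ WAKE_WAIT_AGAIN, 0 };
}
```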
Reported-by: Blaise Gassend <blaise@willowgarage.com>
Debugged-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
LKML-Reference: <new-submission>

Conflicts:

	kernel/futex.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

softirq: Name hrtimer softirq also when CONFIG_HIGH_RES_TIMERS=n · d2638539

Thomas Gleixner authored Oct 13, 2009

Remy Bohmer pointed out that we create the hrtimer softirq thread even
when CONFIG_HIGH_RES_TIMERS is off. That results in a softirq-NULL
name for the thread. The thread is needed on -rt, so name it in that case as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

softirq: Revert "softirq: Do not create hrtimer softirq thread..." · 6b3d1c1f

Thomas Gleixner authored Oct 13, 2009

This reverts commit d69c5d37. The softirq is necessary even in the
CONFIG_HIGH_RES_TIMERS=n case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

futex: Revert "futex: Wake up waiter outside the hb->lock section" · 370eaf38

Thomas Gleixner authored Oct 13, 2009

This reverts commit 928686b7.

The patch was an optimization of the old futex wake code, where we woke
the waiter and then set q->lock_ptr to NULL. When the waiter preempted
the waker, we ran into lock contention on q->lock_ptr,
aka hb->lock.

commit f1a11e (futex: remove the wait queue) changes the wakeup logic
by setting q->lock_ptr to NULL _before_ waking the task. It keeps a
reference on the task struct of the to-be-woken task to avoid an exit
race.

The combination of both patches resulted in a different race on -RT:

    A is blocked on futex
    B calls futex_wake
    B sets q(A)->lock_ptr to NULL and puts A on the wake list
    B is preempted
    ...
    A wakes up (e.g. timer, signal)
    A detects q->lock_ptr = NULL and returns
    A waits on a different futex

    B is scheduled back in
    B wakes A
    A sees a spurious wake up
Reported-by: Blaise Gassend <blaise@willowgarage.com>
Debugged-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

12 Oct, 2009 1 commit

futex: Fix wakeup race by setting TASK_INTERRUPTIBLE before queue_me() · a03d1035

Darren Hart authored Sep 21, 2009

PI futexes do not use the same plist_node_empty() test for wakeup.
It was possible for the waiter (in futex_wait_requeue_pi()) to set
TASK_INTERRUPTIBLE after the waker assigned the rtmutex to the
waiter. The waiter would then note the plist was not empty and call
schedule(). The task would not be found by any subsequent futex
wakeups, resulting in a userspace hang.

By moving the setting of TASK_INTERRUPTIBLE to before the call to
queue_me(), the race with the waker is eliminated. Since we no
longer call get_user() from within queue_me(), there is no need to
delay the setting of TASK_INTERRUPTIBLE until after the call to
queue_me().
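
A deterministic single-threaded model of the two orderings, with the waker's wakeup forced into the window between queueing and sleeping. The task model and helpers are hypothetical; in particular the plist_node_empty() check is not modelled, only the fact that a wakeup resets the state to TASK_RUNNING.

```c
#include <stdbool.h>

/* Hypothetical model: a wakeup only sets the state back to RUNNING, and
 * schedule() only sleeps if the state is not RUNNING when it is called. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task_model {
	enum task_state state;
	bool queued;
};

static void queue_me(struct task_model *t) { t->queued = true; }
static void wake_up(struct task_model *t)  { t->state = TASK_RUNNING; }

/* True if the task would sleep with nobody left to wake it. */
static bool schedule_hangs(const struct task_model *t)
{
	return t->state != TASK_RUNNING;
}

/* Old order: queue first, set the state afterwards. The waker runs in
 * between, so setting TASK_INTERRUPTIBLE overwrites its wakeup. */
static bool old_order(void)
{
	struct task_model t = { TASK_RUNNING, false };

	queue_me(&t);
	wake_up(&t);			/* waker finds us queued */
	t.state = TASK_INTERRUPTIBLE;	/* too late: the wakeup is lost */
	return schedule_hangs(&t);
}

/* New order: set the state before becoming visible to wakers, so any
 * wakeup that finds us queued also resets the state. */
static bool new_order(void)
{
	struct task_model t = { TASK_RUNNING, false };

	t.state = TASK_INTERRUPTIBLE;
	queue_me(&t);
	wake_up(&t);			/* same interleaving as above */
	return schedule_hangs(&t);
}
```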

The FUTEX_LOCK_PI operation is not affected as futex_lock_pi()
relies entirely on the rtmutex code to handle schedule() and
wakeup.  The requeue PI code is affected because the waiter starts
as a non-PI waiter and is woken on a PI futex.

Remove the crusty old comment about holding spinlocks across
get_user(), as we no longer do that. Correct the locking statement
with a description of why the test is performed.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20090922053038.8717.97838.stgit@Aeon>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

09 Oct, 2009 1 commit

x86: highmem: Remove leftover function prototypes · e31b7991

Thomas Gleixner authored Oct 09, 2009

-RT replaces kmap_atomic* functions with macros, but we kept the
function prototypes around. Remove them.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

08 Oct, 2009 3 commits

futex: Move exit_pi_state() call to release_mm() · f39bec65

Thomas Gleixner authored Oct 05, 2009

exit_pi_state() is called from do_exit() but not from do_execve().
Move it to release_mm() so it gets called from do_execve() as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: stable@kernel.org
Cc: Anirban Sinha <ani@anirban.org>
Cc: Peter Zijlstra <peterz@infradead.org>

futex: Nullify robust lists after cleanup · a7b1a075

Peter Zijlstra authored Oct 05, 2009

The robust list pointers of user space held futexes are kept intact
over an exec() call. When the exec'ed task exits, exit_robust_list() is
called with the stale pointer. The risk of corruption is minimal, but
it is still incorrect to keep the pointers valid. Strictly speaking,
glibc should uninstall the robust list before calling exec(), but we
have to deal with it anyway.

Nullify the pointers after [compat_]exit_robust_list() has been
called.
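
Sketched with hypothetical stand-ins for the task and its robust list head (the compat list would be handled the same way): once the list has been walked, the pointer is cleared, so the second pass at the later exit is a no-op.

```c
#include <stddef.h>

/* Hypothetical model: robust_list stands for the user-space pointer
 * registered via set_robust_list(); walks counts list walks. */
struct robust_list_head_model { int dummy; };

struct task_model {
	struct robust_list_head_model *robust_list;
	int walks;
};

static void exit_robust_list(struct task_model *tsk)
{
	tsk->walks++;	/* would walk the user-space list here */
}

/* Hypothetical release path, reached on exec and again on exit: walk
 * the list once, then forget the pointer so it cannot go stale. */
static void release_robust_list(struct task_model *tsk)
{
	if (tsk->robust_list) {
		exit_robust_list(tsk);
		tsk->robust_list = NULL;
	}
}
```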
Reported-by: Anirban Sinha <ani@anirban.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: stable@kernel.org

NOHZ: update idle state also when NOHZ is inactive · cfa0c194

Eero Nurkkala authored Oct 07, 2009

Commit f2e21c96 had unfortunate side effects with cpufreq governors
on some systems.

If the system did not switch into NOHZ mode, ts->inidle is not set when
tick_nohz_stop_sched_tick() is called from the idle routine. Therefore
all subsequent calls from irq_exit() to tick_nohz_stop_sched_tick()
fail to call tick_nohz_start_idle(). This results in bogus idle
accounting information being passed to cpufreq governors.

Set the inidle flag regardless of the NOHZ active state to keep
the idle time accounting correct in any case.

[ tglx: Added comment and tweaked the changelog ]
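
A heavily simplified model of the fixed ordering, assuming a hypothetical tick_sched_model in place of struct tick_sched and a counter in place of tick_nohz_start_idle(). Before the fix, inidle was only set past the NOHZ-active check, so on a system that never entered NOHZ mode the second call in the usage below would have returned early.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of struct tick_sched and
 * tick_nohz_stop_sched_tick(); idle_updates counts the calls that would
 * go to tick_nohz_start_idle() and feed cpufreq idle accounting. */
struct tick_sched_model {
	bool inidle;
	bool nohz_active;
	int idle_updates;
};

static void stop_sched_tick(struct tick_sched_model *ts, bool from_idle)
{
	/* Called from irq_exit() while not idle: nothing to account. */
	if (!from_idle && !ts->inidle)
		return;

	/* The fix: mark the CPU idle regardless of the NOHZ state. */
	ts->inidle = true;
	ts->idle_updates++;

	if (!ts->nohz_active)
		return;		/* keep the periodic tick running */
	/* ... stop the tick ... */
}
```

Usage: with NOHZ inactive, a call from the idle loop followed by one from irq_exit() now updates the idle accounting twice.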
Reported-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Eero Nurkkala <ext-eero.nurkkala@nokia.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Steven Noonan <steven@uplinklabs.net>
Cc: stable@kernel.org
LKML-Reference: <1254907901.30157.93.camel@eenurkka-desktop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

07 Oct, 2009 1 commit

futex: fix requeue_pi key imbalance · 6b2396bb

Darren Hart authored Oct 07, 2009

If futex_wait_requeue_pi() wakes prior to requeue, we drop the
reference to the source futex_key twice, once in
handle_early_requeue_pi_wakeup() and once on our way out.

Remove the drop from the handle_early_requeue_pi_wakeup() and keep
the get/drops together in futex_wait_requeue_pi().
Reported-by: Helge Bahmann <hcb@chaoticmind.net>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Helge Bahmann <hcb@chaoticmind.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: stable-2.6.31 <stable@kernel.org>
LKML-Reference: <4ACCE21E.5030805@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

06 Oct, 2009 1 commit

softirq: Do not create hrtimer softirq thread when CONFIG_HIGH_RES_TIMERS=n · d69c5d37

Thomas Gleixner authored Oct 06, 2009

Remy Bohmer pointed out that we create the hrtimer softirq thread even
when CONFIG_HIGH_RES_TIMERS is off. That results in a softirq-NULL
name for the thread.

Skip the thread creation/wakeup/teardown when CONFIG_HIGH_RES_TIMERS=n.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

04 Oct, 2009 1 commit

net: Fix netfilter percpu assumptions for real · 00ef66eb

Thomas Gleixner authored Oct 04, 2009

commit 21ece08c (net: fix the xtables smp_processor_id assumptions for
-rt) fixed only half of the problem. The filter functions might run in
thread context and can be preempted and migrated on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

18 Sep, 2009 1 commit

x86: Disable SPARSE_IRQ, DMAR, INTR_REMAP when PREEMPT_RT=y · 82c07cbb

Thomas Gleixner authored Sep 18, 2009

Memory allocations in irq/preempt disabled regions are the main cause
of grief with these features. Solving that needs some real work.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

17 Sep, 2009 1 commit

latencytop: Convert latency_lock to atomic_spinlock · 0dfea57f

Thomas Gleixner authored Sep 17, 2009

latency_lock is taken in the guts of the scheduler code and needs to
be a real spinlock on RT. Convert it to atomic_spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

15 Sep, 2009 7 commits

tracing: Add histograms of potential and effective wakeup latencies · cc2fa446

Carsten Emde authored Sep 15, 2009

Resuscitated and enhanced the kernel latency histograms provided
originally by Yi Yang and adapted and converted by Steven Rostedt.

Latency histograms in the current version
- can be enabled online and independently
- have virtually no performance penalty when configured but not enabled
- have very little performance penalty when enabled
- use already available wakeup and switch tracepoints
- give corresponding results with the related tracer
- allow to record wakeup latency histograms of a single process
- record the process where the highest wakeup latency occurred 
- are documented in Documentation/trace/histograms.txt
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4AAEDDD5.4040505@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

mm: Fix the non-RT version of swap_get_cpu_var · edc1a1c3

Wu Zhangjin authored Sep 04, 2009

Commit f8382688 converted swap to percpu locked. The non-RT version of
swap_get_cpu_var should behave the same as the old implementation, but
in reality it does not:

...
+#define swap_get_cpu_var(var, cpu)                     \
+       ({                                              \
+               (void)cpu;                              \
+               &get_cpu_var(var);                      \
+        })
...
 void __lru_cache_add(struct page *page, enum lru_list lru)
 {
-       struct pagevec *pvec = &get_cpu_var(lru_add_pvecs)[lru];
+       struct pagevec *pvec;
+       int cpu;

+       pvec = swap_get_cpu_var(lru_add_pvecs, cpu)[lru];
        page_cache_get(page);
        if (!pagevec_add(pvec, page))
                ____pagevec_lru_add(pvec, lru);
-       put_cpu_var(lru_add_pvecs);
+       swap_put_cpu_var(lru_add_pvecs, cpu);
 }

Here is the point. The old version:

      pvec = &get_cpu_var(lru_add_pvecs)[lru];
           = &(get_cpu_var(lru_add_pvecs)[lru]);

The new version from commit f8382688:

      pvec = ({ (void)cpu; &get_cpu_var(lru_add_pvecs); })[lru];
           = (&get_cpu_var(lru_add_pvecs))[lru];

So these two are really different, and the difference made the non-RT
boot fail:

...

ide-gd driver 1.18
hda: max request size: 512KiB
hda: 312581808 sectors (160041 MB) w/8192KiB Cache, CHS=19457/255/63
hda: cache flushes supported
 hda:Unhandled kernel unaligned access[#1]:
Cpu 0
$ 0   : 0000000000000000 000000001400c4e1 98000000013699d0 0000000000000000
$ 4   : 0000000000000000 98000000be04f980 0000000000000010 000000007fd78b57
$ 8   : 0000000000000001 0000000000200200 0000000000100100 98000000be00f210
$12   : 000000001400c4e1 000000001000001e ffffffffffffffff 98000000bd0180a8
$16   : 0000000000000000 98000000be04f998 fffffffb81a72600 ffffffff802d3270
$20   : 0000000000000000 0000000000000000 ffffffffffffffef ffffffff80667a90
$24   : 0000000000000228 ffffffff803ded88
$28   : 98000000be04c000 98000000be04f950 00000000003fffff ffffffff80200404
Hi    : 0000000000000000
Lo    : 0000000000000320
epc   : ffffffff80217194 do_ade+0x298/0x3bc
    Not tainted
ra    : ffffffff80200404 ret_from_exception+0x0/0x10
Status: 1400c4e3    KX SX UX KERNEL EXL IE
Cause : 00000014
BadVA : fffffffb81a72607
PrId  : 00006303 (ICT Loongson-2)
Modules linked in:
Process swapper (pid: 1, threadinfo=98000000be04c000, task=98000000be04b7d8, tls=0000000000000000)
Stack : ffffffff80666f60 ffffffff8025165c 98000000013699d0 0000000000000001
        98000000bd00ae30 ffffffff80200404 0000000000000000 000000001400c4e1
        000000007fd78b57 98000000013699d0 ffffffff80638070 0000000000000002
        fffffffb81a72600 000000007fd78b57 0000000000000001 0000000000200200
        0000000000100100 98000000be00f210 98000000be00f220 000000000000001d
        ffffffffffffffff 98000000bd0180a8 98000000013699d0 0000000000000001
        98000000bd00ae30 ffffffff802d3270 0000000000000000 0000000000000000
        ffffffffffffffef ffffffff80667a90 0000000000000228 ffffffff803ded88
        98000000bd00ae30 98000000bd00ae30 98000000be04c000 98000000be04fab0
        00000000003fffff ffffffff80275350 000000001400c4e3 0000000000000000
        ...
Call Trace:
[<ffffffff80217194>] do_ade+0x298/0x3bc
[<ffffffff80200404>] ret_from_exception+0x0/0x10
[<ffffffff8027fafc>] __lru_cache_add+0x94/0xd8
[<ffffffff80275350>] add_to_page_cache_lru+0x84/0xa8
[<ffffffff80275520>] read_cache_page_async+0xa8/0x1dc
[<ffffffff80275664>] read_cache_page+0x10/0x74
[<ffffffff802fed34>] read_dev_sector+0x34/0xe0
[<ffffffff802ff96c>] adfspart_check_ICS+0x44/0x1b0
[<ffffffff802ff6e4>] rescan_partitions+0x178/0x3a8
[<ffffffff802d3840>] __blkdev_get+0x238/0x318
[<ffffffff802feeb0>] register_disk+0xd0/0x15c
[<ffffffff8037c1e8>] add_disk+0xcc/0x128
[<ffffffff803fcbc4>] ide_gd_probe+0x170/0x1d0
[<ffffffff803e6e08>] driver_probe_device+0xbc/0x180
[<ffffffff803e6f38>] __driver_attach+0x6c/0xa4
[<ffffffff803e6508>] bus_for_each_dev+0x58/0xa4
[<ffffffff803e5bbc>] bus_add_driver+0xc8/0x284
[<ffffffff803e72d8>] driver_register+0xc4/0x17c
[<ffffffff8020fa5c>] do_one_initcall+0x64/0x18c
[<ffffffff806701d8>] kernel_init+0xe0/0x14c
[<ffffffff80212e5c>] kernel_thread_helper+0x10/0x18

Code: 001188f8  00b1882d  de220000 <b2420007> b6420000  24120000  1640000c  00a0202d  8ca20120
Disabling lock debugging due to kernel taint
note: swapper[1] exited with preempt_count 1
Kernel panic - not syncing: Attempted to kill init!

This patch restores swap_get_cpu_var to its definition before commit
f8382688 and adds "(void)cpu;" to swap_put_cpu_var() to avoid a warning
about an unused variable.
Signed-off-by: Wu Zhangjin <wuzhangjin@gmail.com>
LKML-Reference: <1252034522-32653-1-git-send-email-wuzhangjin@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
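
The precedence difference can be reproduced outside the kernel. In this sketch, storage[2][4] is a hypothetical stand-in for two CPUs' copies of lru_add_pvecs, and the two macros mimic the broken and the restored shapes; statement expressions ({ ... }) are a GNU C extension, so GCC or Clang is assumed.

```c
/* Requires GCC or Clang: statement expressions are a GNU C extension. */
struct pagevec { int nr; };

/* Hypothetical stand-in: storage[cpu] plays lru_add_pvecs on that CPU,
 * with 4 LRU lists each (the sizes are arbitrary for the sketch). */
static struct pagevec storage[2][4];

/* Broken shape: the macro yields a pointer to the whole array, so a
 * trailing [lru] steps in units of the whole array. */
#define swap_get_cpu_var_broken(var, cpu) ({ (void)(cpu); &(var); })

/* get_cpu_var()-like shape: dereference inside, so the result is the
 * array lvalue itself and &macro(...)[lru] means &(array[lru]). */
#define swap_get_cpu_var_fixed(var, cpu) (*({ (void)(cpu); &(var); }))

static struct pagevec *broken_pvec(int lru)
{
	int cpu = 0;

	/* (&storage[0])[lru] is storage[lru]: past this CPU's array. */
	return swap_get_cpu_var_broken(storage[0], cpu)[lru];
}

static struct pagevec *fixed_pvec(int lru)
{
	int cpu = 0;

	/* &(storage[0][lru]): the intended element. */
	return &swap_get_cpu_var_fixed(storage[0], cpu)[lru];
}
```

Both forms agree only for lru == 0; any other index makes the broken form step over whole arrays.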

x86: Suppress empty cpumask ipi warning · 9425c8dc

Thomas Gleixner authored Sep 15, 2009

We already know which code paths trigger this, so we can safely
disable the warning again and just keep the early return when mask == 0.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

kvm: Move get_cpu inside of spinlocked region in make_all_cpus_request() · b079dc12

Thomas Gleixner authored Sep 15, 2009

kvm->requests_lock is a sleeping lock in RT, but it's locked inside
the preempt disabled region of get_cpu(). Move the get_cpu() region
inside the spinlocked region to avoid the might_sleep warning.

BUG: sleeping function called from invalid context at kernel/rtmutex.c:684
in_atomic(): 1, irqs_disabled(): 0, pid: 10670, name: qemu-kvm
Pid: 10670, comm: qemu-kvm Not tainted 2.6.31-rc9-rt9.1-32bit #47
Call Trace:
[<c022a88a>] __might_sleep+0xcb/0xd0
[<c0498bd9>] rt_spin_lock+0x29/0x5e
[<f9161b54>] make_all_cpus_request+0x36/0xb2 [kvm]
[<f9161bf6>] kvm_flush_remote_tlbs+0x12/0x1f [kvm]
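
The reordering can be modelled with a counter for the preemption state and a stand-in for the RT sleeping lock that records a might_sleep violation when taken with preemption disabled. All names are hypothetical; the real get_cpu() also returns the CPU number.

```c
/* Hypothetical model: preempt_count mirrors get_cpu()/put_cpu(), and the
 * RT sleeping spinlock records a violation if taken while preemption is
 * disabled (the "sleeping function called from invalid context" BUG). */
static int preempt_count;
static int might_sleep_violations;

static void get_cpu(void) { preempt_count++; }
static void put_cpu(void) { preempt_count--; }

static void rt_spin_lock(void)
{
	if (preempt_count > 0)
		might_sleep_violations++;
}
static void rt_spin_unlock(void) { }

/* Old order: the sleeping lock is taken inside the get_cpu() region. */
static int old_order(void)
{
	might_sleep_violations = 0;
	get_cpu();
	rt_spin_lock();
	/* ... mark requests, collect the cpumask ... */
	rt_spin_unlock();
	put_cpu();
	return might_sleep_violations;
}

/* New order: take the lock first, then disable preemption inside it. */
static int new_order(void)
{
	might_sleep_violations = 0;
	rt_spin_lock();
	get_cpu();
	/* ... mark requests, collect the cpumask ... */
	put_cpu();
	rt_spin_unlock();
	return might_sleep_violations;
}
```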
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Carsten Emde <carsten.emde@osadl.org>

rt: Fix rwlocks/rwsem rt_[down_]read_trylock() · 00261a94

Thomas Gleixner authored Sep 14, 2009

rt_read_trylock() and rt_down_read_trylock() take the lock/semaphore
unconditionally when current owns it, even if it is write locked. If
current owns the lock, check read_depth: if it is 0, we know the lock
is write locked and return 0.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
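
A sketch of the corrected trylock logic, assuming a hypothetical rwlock_model that mirrors the -rt design of a single rt_mutex owner plus a recursive read_depth counter:

```c
#include <stddef.h>

/* Hypothetical model of an -rt rwlock: a single owner (rt_mutex
 * semantics) plus read_depth, the owner's recursive read count. A held
 * lock with read_depth == 0 is write locked. */
struct task { int id; };

struct rwlock_model {
	const struct task *owner;	/* NULL when unlocked */
	int read_depth;
};

static int rt_read_trylock_model(struct rwlock_model *rw,
				 const struct task *current)
{
	if (rw->owner == current) {
		/* Recursive read lock by the owner: allowed only if it is
		 * held for read. read_depth == 0 means current holds it
		 * for write, so the trylock must fail. */
		if (rw->read_depth == 0)
			return 0;
		rw->read_depth++;
		return 1;
	}
	if (rw->owner != NULL)
		return 0;		/* held by another task */
	rw->owner = current;
	rw->read_depth = 1;
	return 1;
}
```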

perf_counter: Fix buffer overflow in perf_copy_attr() · 0188eb59

Xiao Guangrong authored Sep 15, 2009

If a large size is passed to the perf_counter_open() syscall,
the kernel copies that much data into a smaller buffer, which
causes a kernel crash.

This bug makes the kernel unsafe: a non-root local user can
trigger it.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org>
LKML-Reference: <4AAF37D4.5010706@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
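
One way to close such a hole is to accept an oversized user struct only if its unknown tail is zero, and then to copy no more than the kernel's own struct size. A user-space sketch, assuming a hypothetical attr_model in place of struct perf_counter_attr and memcpy() in place of copy_from_user(); the error codes are chosen for the sketch:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute struct standing in for the kernel's
 * struct perf_counter_attr; only its size matters here. */
struct attr_model {
	unsigned int type;
	unsigned int size;
	unsigned long long config;
};

/* Copy a caller-supplied attribute of `size` bytes into *attr. A larger
 * struct than we know is accepted only if the extra tail is all zero,
 * and then only sizeof(*attr) bytes are copied: copying `size` bytes
 * would be exactly the overflow described above. */
static int copy_attr_model(struct attr_model *attr, const void *uattr,
			   size_t size)
{
	const unsigned char *src = uattr;

	if (size < sizeof(*attr))
		return -E2BIG;		/* simplification: no older ABI sizes */

	if (size > sizeof(*attr)) {
		for (size_t i = sizeof(*attr); i < size; i++)
			if (src[i] != 0)
				return -E2BIG;
		size = sizeof(*attr);	/* clamp before copying */
	}
	memcpy(attr, src, size);
	return 0;
}
```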

time: Prevent 32 bit overflow with set_normalized_timespec() · f7ada8bf

Thomas Gleixner authored Sep 14, 2009

set_normalized_timespec() nsec argument is of type long. The recent
timekeeping changes of ktime_get_ts() feed

	ts->tv_nsec + tomono.tv_nsec + nsecs

to set_normalized_timespec(). On 32 bit machines that sum can be
larger than (1 << 31) - 1 and therefore result in a negative value,
which screws up the result completely.

Make the nsec argument of set_normalized_timespec() s64 to fix the
problem at hand. This also prevents similar problems for future users
of set_normalized_timespec().
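
For illustration, three nanosecond terms that are each below NSEC_PER_SEC already sum to about 2.5e9, beyond the 2^31 - 1 a 32-bit long can hold. A sketch of the normalization with a 64-bit nsec argument, using a hypothetical ts_model struct:

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical timespec stand-in with an explicit 64-bit seconds field. */
struct ts_model {
	int64_t tv_sec;
	long tv_nsec;
};

/* Normalize (sec, nsec) so that 0 <= tv_nsec < NSEC_PER_SEC. Taking nsec
 * as int64_t (the kernel's s64) means a sum of several nanosecond terms
 * cannot wrap before the loops get to see it. */
static void set_normalized_ts(struct ts_model *ts, int64_t sec, int64_t nsec)
{
	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		++sec;
	}
	while (nsec < 0) {
		nsec += NSEC_PER_SEC;
		--sec;
	}
	ts->tv_sec = sec;
	ts->tv_nsec = (long)nsec;
}
```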
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Carsten Emde <carsten.emde@osadl.org>
LKML-Reference: <new-submission>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>

14 Sep, 2009 4 commits

Merge branch 'rt/trace' into rt/head · 744e5326
Thomas Gleixner authored Sep 14, 2009

Merge branch 'tip/tracing/core' of... · a5d1c78f

Thomas Gleixner authored Sep 14, 2009

Merge branch 'tip/tracing/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into rt/trace

tracing: make testing syscall events a separate configuration · 1f5a6b45

Steven Rostedt authored Sep 14, 2009

Parag noticed that the number of event tests has increased tremendously:

grep "Testing event" dmesg.31rc9 |wc -l
100

grep "Testing event" dmesg.31git |wc -l
1172

This is due to the testing of every syscall event when the ftrace
self-test is enabled. This adds time to kernel boot up and can slow
down development by lengthening the time between reboots.

This option moves the testing of the syscall events into a separate
config option, so that most of the ftrace internals can still be
tested at boot up without waiting for all the syscall events to be
tested.

The syscall event testing only tests the enabling and disabling of
the trace point, since the syscalls are not executed. What really needs
to be done is to somehow have a userspace tool test the syscall tracepoints
as well.
Reported-by: Parag Warudkar <parag.lkml@gmail.com>
LKML-Reference: <f7848160909130815l3e768a30n3b28808bbe5c254b@mail.gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Merge branch 'rt/trace' into rt/head · 499027af

Thomas Gleixner authored Sep 14, 2009

Conflicts:
	kernel/trace/ring_buffer.c
	kernel/trace/trace.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
