- 29 Jul, 2009 40 commits
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Steven Rostedt authored
I was compiling a kernel in a shell that I had set to a priority of 20, and it locked up in jbd's bit_spin_lock code. This patch adds another spinlock to the buffer head and uses that instead of the bit spins.
From: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
--
fs/buffer.c | 3 ++-
include/linux/buffer_head.h | 1 +
include/linux/jbd.h | 12 ++++++------
3 files changed, 9 insertions(+), 7 deletions(-)
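
A minimal sketch of the approach, assuming the new field is called b_state_lock and that the jbd state helpers wrap it (both names are assumptions; the diffstat only shows one line added to buffer_head.h):

    /* include/linux/buffer_head.h */
    struct buffer_head {
            /* ... existing fields ... */
            spinlock_t b_state_lock;   /* replaces bit_spin_lock on b_state */
    };

    /*
     * include/linux/jbd.h: the state-lock helpers now take a real
     * spinlock, which is priority-inheriting (and preemptible) on -rt,
     * so a niced or preempted lock holder no longer livelocks waiters.
     */
    static inline void jbd_lock_bh_state(struct buffer_head *bh)
    {
            spin_lock(&bh->b_state_lock);
    }

    static inline void jbd_unlock_bh_state(struct buffer_head *bh)
    {
            spin_unlock(&bh->b_state_lock);
    }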
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Luis Claudio R. Goncalves authored
Fixes spurious system load spikes observed in /proc/loadavgrt, as described in:
Bug 253103: /proc/loadavgrt issues weird results
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=253103
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ankita Garg authored
So, I have merged my previous patch (to display rt_nr_running info in sched_debug.c) with this one.
Signed-off-by: Ankita Garg <ankita@in.ibm.com>
[mingo@elte.hu: fix it to work on !SCHEDSTATS too]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
--
kernel/sched_debug.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
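
A hedged sketch of the kind of debug output this adds (SEQ_printf and print_rt_rq follow the sched_debug.c conventions of that era, but the exact hunk is an assumption):

    /* kernel/sched_debug.c */
    static void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq)
    {
            SEQ_printf(m, "\nrt_rq[%d]:\n", cpu);
            SEQ_printf(m, "  .%-30s: %lu\n", "rt_nr_running",
                       rt_rq->rt_nr_running);
    }

The !SCHEDSTATS fixlet presumably means the printout must not rely on fields that only exist under CONFIG_SCHEDSTATS.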
-
Luis Claudio R. Goncalves authored
Hello,

The values in /proc/loadavgrt are sometimes the real load and sometimes garbage. As you can see in the tests below, it occurs from 2.6.21.5-rt20 to 2.6.23-rc2-rt2. The code for calc_load(), in kernel/timer.c, has not changed much in -rt patches.

[lclaudio@lab sandbox]$ ls /proc/loadavg*
/proc/loadavg  /proc/loadavgrt
[lclaudio@lab sandbox]$ uname -a
Linux lab.casa 2.6.21-34.el5rt #1 SMP PREEMPT RT Thu Jul 12 15:26:48 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[lclaudio@lab sandbox]$ cat /proc/loadavg*
4.57 4.90 4.16 3/146 23499
0.44 0.98 1.78 0/146 23499
...
[lclaudio@lab sandbox]$ cat /proc/loadavg*
4.65 4.80 4.75 5/144 20720
23896.04 -898421.23 383170.94 2/144 20720

[root@neverland ~]# uname -a
Linux neverland.casa 2.6.21.5-rt20 #2 SMP PREEMPT RT Fri Jul 13 18:31:38 BRT 2007 i686 athlon i386 GNU/Linux
[root@neverland ~]# cat /proc/loadavg*
0.16 0.16 0.15 1/184 11240
344.65 0.38 311.71 0/184 11240

[williams@torg ~]$ uname -a
Linux torg 2.6.23-rc2-rt2 #14 SMP PREEMPT RT Tue Aug 7 20:07:31 CDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[williams@torg ~]$ cat /proc/loadavg*
0.88 0.76 0.57 1/257 7267
122947.70 103790.53 -564712.87 0/257 7267

Fixes spurious system load spikes observed in /proc/loadavgrt, as described in:
Bug 253103: /proc/loadavgrt issues weird results
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=253103
Signed-off-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Thomas Gleixner authored
KVM expects the notifier call with irqs enabled. This is necessary due to a possible IPI call. Make the preempt-rt version behave the same way as mainline.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Gregory Haskins authored
We will use this later in the series to eliminate the need for a function call.
[ Steven Rostedt: added task_is_current() function ]
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
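
A minimal sketch of what the added helper could look like (the body is an assumption; the commit text only names the function):

    /* include/linux/sched.h (assumed location, SMP case) */
    static inline int task_is_current(struct task_struct *task)
    {
            return task->oncpu;  /* non-zero while the task runs on a CPU */
    }

An inline test like this is what lets callers avoid a function call into sched.c (e.g., task_curr()) on hot paths.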
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Ingo Molnar authored
Here we are in the CPU_DEAD notifier, and we must not sleep nor enable interrupts.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
Idle task boosting is a no-no in general. There is one exception: when NOHZ is active, the idle task calls get_next_timer_interrupt() and holds the timer wheel base->lock on the CPU, while another CPU wants to access the timer (probably to cancel it). We can safely ignore the boosting request, as the idle CPU runs this code with interrupts disabled and will complete the lock-protected section without being interrupted. So there is no real need to boost.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
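
A hedged sketch of the check this describes, in the rtmutex priority-boost path (placement and helper usage are assumptions):

    /* kernel/rtmutex.c, on the boosting side (sketch) */
    if (unlikely(owner == idle_task(task_cpu(owner)))) {
            /*
             * NOHZ: the idle owner holds the timer wheel base->lock
             * with interrupts disabled and cannot be preempted, so it
             * will leave the critical section promptly. Skip the boost.
             */
            return;
    }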
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Steven Rostedt authored
Argh, cut and paste wasn't enough... Use this patch instead. It needs an irq disable. But, believe it or not, on SMP this is actually better. If the irq is shared (as it is in Mark's case), we don't stop the irq of other devices from being handled on another CPU (unfortunately for Mark, he pinned all interrupts to one CPU).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--
drivers/net/3c59x.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
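
A hedged sketch of the pattern described: take the driver's private lock with interrupts disabled locally instead of disabling the (possibly shared) irq line for everyone (vortex_timer() as the affected path is an assumption):

    static void vortex_timer(unsigned long data)
    {
            struct net_device *dev = (struct net_device *)data;
            struct vortex_private *vp = netdev_priv(dev);
            unsigned long flags;

            /*
             * Excludes our own interrupt handler with irqs off on this
             * CPU only; a shared irq line can still be serviced for
             * other devices on other CPUs.
             */
            spin_lock_irqsave(&vp->lock, flags);
            /* ... media check / register window fiddling ... */
            spin_unlock_irqrestore(&vp->lock, flags);
    }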
-
Thomas Gleixner authored
Waking the thread even when no timers are scheduled is useless.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
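
In context this is presumably the per-CPU posix-cpu-timer thread from the patches below; a hedged sketch of the guard (the cpu_timers lists are the task_struct fields of that era, the per-CPU thread variable is an assumption):

    /* only kick the thread when the task actually has CPU timers armed */
    if (!list_empty(&tsk->cpu_timers[0]) ||
        !list_empty(&tsk->cpu_timers[1]) ||
        !list_empty(&tsk->cpu_timers[2]))
            wake_up_process(__get_cpu_var(posix_timer_task));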
-
Arnaldo Carvalho de Melo authored
Shorten the softirq kernel thread names because they always overflow the limited comm length, appearing as "posix_cpu_timer" CPU# times. Done on 2.6.24.7, but probably applicable to later kernels.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
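
The overflow is simple arithmetic: comm is TASK_COMM_LEN (16) bytes including the NUL, and "posix_cpu_timers/" already spends 17 characters before the CPU number, so every thread truncates to the same 15-character "posix_cpu_timer". A hedged sketch of the rename (the replacement name is an assumption):

    /* before: truncated, CPU number lost */
    p = kthread_run(posix_cpu_timers_thread, hcpu, "posix_cpu_timers/%d", cpu);

    /* after: fits within TASK_COMM_LEN with the CPU number visible */
    p = kthread_run(posix_cpu_timers_thread, hcpu, "posixcputmr/%d", cpu);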
-
John Stultz authored
posix-cpu-timer code takes non-rt-safe locks in hard irq context. Move it to a thread.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
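
A hedged sketch of the mechanism: the hard-irq hook only records which task needs service and wakes a per-CPU kthread, which then does the timer processing in schedulable context (all names below are assumptions, and the real queueing is more involved than this single-slot version):

    static DEFINE_PER_CPU(struct task_struct *, posix_timer_task);
    static DEFINE_PER_CPU(struct task_struct *, posix_timer_pending);

    /* called from the tick, hard irq context */
    void run_posix_cpu_timers(struct task_struct *tsk)
    {
            __get_cpu_var(posix_timer_pending) = tsk;
            wake_up_process(__get_cpu_var(posix_timer_task));
    }

    static int posix_cpu_timers_thread(void *data)
    {
            while (!kthread_should_stop()) {
                    struct task_struct *tsk;

                    set_current_state(TASK_INTERRUPTIBLE);
                    schedule();
                    __set_current_state(TASK_RUNNING);

                    tsk = __get_cpu_var(posix_timer_pending);
                    if (tsk)
                            __run_posix_cpu_timers(tsk); /* the old irq-time work */
            }
            return 0;
    }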
-
Thomas Gleixner authored
Add RT stats to /proc/stat.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
--
fs/proc/stat.c | 23 +++++++++++++++++------
include/linux/kernel_stat.h | 2 ++
kernel/sched.c | 6 +++++-
3 files changed, 24 insertions(+), 7 deletions(-)
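
A hedged sketch of what the two new kernel_stat.h lines might be (field names are assumptions; the diffstat only shows the header growing by two lines):

    /* include/linux/kernel_stat.h */
    struct cpu_usage_stat {
            cputime64_t user;
            cputime64_t nice;
            cputime64_t system;
            /* ... existing fields ... */
            cputime64_t user_rt;    /* user time consumed by RT tasks */
            cputime64_t system_rt;  /* system time consumed by RT tasks */
    };

kernel/sched.c would then account tick time to the *_rt buckets when rt_task(p) is true, and fs/proc/stat.c would print the extra columns.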
-
Ingo Molnar authored
Creates long latencies for no value.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
--
include/linux/interrupt.h | 33 ++++----
kernel/softirq.c | 184 ++++++++++++++++++++++++++++++++--------------
2 files changed, 149 insertions(+), 68 deletions(-)
-
Ingo Molnar authored
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Ingo Molnar authored
Allows that code to be preemptible.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Thomas Gleixner authored
Add the missing function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Steven Rostedt authored
The current code of rt_downgrade_write() simply does a BUG(). There are places in the kernel that use this code, and they will crash a running preempt-rt kernel. rt_downgrade_write() converts a rwsem held for write into a rwsem held for read without ever releasing the semaphore. In -rt, the rwsems are simply a mutex. There is nothing different between a rwsem held for write and one held for read. The difference is that one held for read can nest. This patch changes the unconditional BUG() to only BUG if the caller is not the owner of the semaphore. This patch comes from my rt-git repo, and has been tested there.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
LKML-Reference: <alpine.DEB.2.00.0904151142420.31828@gandalf.stny.rr.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
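
A minimal sketch of the result (rt_mutex_owner() as the ownership check is an assumption about the -rt rwsem internals):

    void rt_downgrade_write(struct rw_semaphore *rwsem)
    {
            /*
             * On -rt a rwsem is a plain rt_mutex, so downgrading
             * write -> read is a no-op -- but only the owner may ask.
             */
            BUG_ON(rt_mutex_owner(&rwsem->lock) != current);
    }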
-
Thomas Gleixner authored
Recursive rwlocks are only allowed for recursive reads. Recursive rwsems are not allowed at all. Follow-up to Jan Blunck's fix.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Jan Blunck authored
This patch removes the stupid "read locks within the self-held write lock succeed" behaviour. This breaks mm_take_all_locks(), since it is quite common to ensure that a lock is taken with BUG_ON(down_read_trylock(&mm->mmap_sem)).
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
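
The assertion pattern in question, sketched (mm_take_all_locks() does assert writer ownership this way; the surrounding code is elided):

    int mm_take_all_locks(struct mm_struct *mm)
    {
            /*
             * Caller must hold mmap_sem for writing, so this trylock
             * must fail. If self-held write locks allowed recursive
             * reads, the trylock would succeed and the BUG_ON would
             * fire spuriously -- hence removing that behaviour.
             */
            BUG_ON(down_read_trylock(&mm->mmap_sem));
            /* ... take all per-VMA locks ... */
            return 0;
    }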
-
Thomas Gleixner authored
Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex based locking functions for preempt-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
john stultz authored
So if I enable CONFIG_DEBUG_RT_MUTEXES with 2.6.24.7-rt14, I tend to quickly see a number of BUG warnings when running Java tests:

BUG: jxeinajar/3383: lock count underflow!
Pid: 3383, comm: jxeinajar Not tainted 2.6.24-ibmrt2.5john #3
Call Trace:
[<ffffffff8107208d>] rt_mutex_deadlock_account_unlock+0x5d/0x70
[<ffffffff817d6aa5>] rt_read_slowunlock+0x35/0x550
[<ffffffff8107173d>] rt_mutex_up_read+0x3d/0xc0
[<ffffffff81072a99>] rt_up_read+0x29/0x30
[<ffffffff8106e34e>] do_futex+0x32e/0xd40
[<ffffffff8107173d>] ? rt_mutex_up_read+0x3d/0xc0
[<ffffffff81072a99>] ? rt_up_read+0x29/0x30
[<ffffffff8106f370>] compat_sys_futex+0xa0/0x110
[<ffffffff81010a36>] ? syscall_trace_enter+0x86/0xb0
[<ffffffff8102ff04>] cstar_do_call+0x1b/0x65
INFO: lockdep is turned off.
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff817d8e42>] .... __spin_lock_irqsave+0x22/0x60
.....[<ffffffff817d6a93>] .. ( <= rt_read_slowunlock+0x23/0x550)

After some debugging and with Steven's help, we realized that with rwlocks, rt_mutex_deadlock_account_lock() can be called multiple times in parallel (whereas in most cases the mutex must be held by the caller to call the function). This can cause the integer lock_count value to be non-atomically incremented. The following patch converts lock_count to an atomic_t and resolves the warnings.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Clark Williams <williams@redhat.com>
Cc: dvhltc <dvhltc@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
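
A hedged sketch of the conversion (the field's home in task_struct under CONFIG_DEBUG_RT_MUTEXES is an assumption):

    /* was: int lock_count; -- increments raced for rwlocks */
    atomic_t lock_count;

    void rt_mutex_deadlock_account_lock(struct rt_mutex *lock,
                                        struct task_struct *task)
    {
            atomic_inc(&task->lock_count);  /* safe under concurrency */
    }

    void rt_mutex_deadlock_account_unlock(struct task_struct *task)
    {
            /* atomic_dec_return() detects underflow without races */
            if (atomic_dec_return(&task->lock_count) < 0) {
                    atomic_inc(&task->lock_count);  /* clamp at zero */
                    printk(KERN_WARNING "BUG: %s/%d: lock count underflow!\n",
                           task->comm, task->pid);
            }
    }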
-
Thomas Gleixner authored
The sleeping locks implementation based on rtmutexes can miss wakeups for two reasons:

1) The unconditional use of TASK_UNINTERRUPTIBLE for the blocking state. This results in missed wakeups from wake_up_interruptible*():

   state = TASK_INTERRUPTIBLE;
   blocks_on_lock();
   state = TASK_UNINTERRUPTIBLE;
   schedule();
   ....
   acquires_lock();
   restore_state();

Until the waiter has restored its state, wake_up_interruptible*() will fail.

2) The rtmutex wakeup intermediate state TASK_RUNNING_MUTEX. This results in missed wakeups from wake_up*():

   waiter is woken by mutex wakeup
   waiter->state = TASK_RUNNING_MUTEX;
   ....
   acquires_lock();
   restore_state();

Until the waiter has restored its state, wake_up*() will fail.

Solution: instead of setting the state to TASK_RUNNING_MUTEX in the mutex wakeup case, we logically OR TASK_RUNNING_MUTEX to the current waiter state. This keeps the original bits (TASK_INTERRUPTIBLE / TASK_UNINTERRUPTIBLE) intact and lets wakeups succeed. When a task blocks on a lock in state TASK_INTERRUPTIBLE and is woken up by a real wakeup, then we store state = TASK_RUNNING for the restore and can safely use TASK_UNINTERRUPTIBLE from that point to avoid further wakeups which would just let us loop in the lock code. This also removes the extra TASK_RUNNING_MUTEX flags from the wake_up_process*() functions as they are no longer necessary.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
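
The state arithmetic, as a hedged sketch (the exact function this lives in within the rtmutex slowpath is an assumption; only the OR trick is taken from the message):

    /* mutex-internal wakeup of a lock waiter */
    /*
     * Before: w->state = TASK_RUNNING_MUTEX;
     *   clobbers TASK_(UN)INTERRUPTIBLE, so concurrent real wakeups
     *   matching on those bits miss the task until restore_state().
     * After: OR the flag in, keeping the sleep bits visible.
     */
    w->state |= TASK_RUNNING_MUTEX;
    wake_up_process_mutex(w);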
-