Commits · 33e8c6e3ff96d6ad6c831cc714136b6fe8125d49 · linux / linux-davinci

13 Nov, 2009 1 commit

Use hweight8 instead of counting for each bit · 33e8c6e3

Akinobu Mita authored Nov 14, 2009

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Anders Larsen <al@alarsen.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

33e8c6e3
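A minimal userspace sketch of the idea behind this change, assuming the old code tested each of the eight bits of a byte in a loop. The names `count_bits_loop()` and `hweight8_sketch()` are hypothetical; the kernel's own `hweight8()` implementation is not reproduced here.

```c
#include <stdint.h>

/* Old style: test each of the eight bits in turn. */
static unsigned int count_bits_loop(uint8_t byte)
{
	unsigned int count = 0;
	int bit;

	for (bit = 0; bit < 8; bit++)
		if (byte & (1u << bit))
			count++;
	return count;
}

/* Stand-in for hweight8(): parallel population count of one byte. */
static unsigned int hweight8_sketch(uint8_t byte)
{
	unsigned int w = byte;

	w = w - ((w >> 1) & 0x55);          /* sums of adjacent bit pairs */
	w = (w & 0x33) + ((w >> 2) & 0x33); /* sums of each nibble */
	return (w + (w >> 4)) & 0x0F;       /* total for the byte */
}
```

Both functions return the same count for every byte value; the second one does so without a per-bit branch.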

10 Oct, 2009 1 commit

commit ("qnx4: remove write support")... · de293f4c

Anders Larsen authored Oct 10, 2009

commit 945ffe54 ("qnx4: remove write support") removed the (defunct)
write support but missed a chunk of related, dead code.
Signed-off-by: Anders Larsen <al@alarsen.net>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

de293f4c

12 Nov, 2009 1 commit

resource_size() doesn't change the resource it operates on, so the res · 2fba600c

Jean Delvare authored Nov 12, 2009

parameter can be marked const.  Same for resource_type().
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2fba600c
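A small sketch of what a const-qualified read-only helper looks like. `struct resource_sketch`, the `_sketch` function names and the type mask are simplified stand-ins chosen for illustration, not the kernel definitions.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's struct resource. */
struct resource_sketch {
	uint64_t start;       /* first address in the range */
	uint64_t end;         /* last address in the range (inclusive) */
	unsigned long flags;  /* type bits live in here */
};

/* Hypothetical mask selecting the type bits, for illustration only. */
#define RES_TYPE_BITS_SKETCH 0x00000f00UL

/* Only reads *res, so the pointer can be const. */
static inline uint64_t resource_size_sketch(const struct resource_sketch *res)
{
	return res->end - res->start + 1;
}

/* Same reasoning applies to the type accessor. */
static inline unsigned long resource_type_sketch(const struct resource_sketch *res)
{
	return res->flags & RES_TYPE_BITS_SKETCH;
}
```

With the const qualifier, callers that only hold a `const` pointer to a resource can use both helpers without a cast.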

10 Nov, 2009 2 commits

ERROR: space prohibited before that close parenthesis ')' · f1bf8841

Andrew Morton authored Nov 10, 2009

#252: FILE: fs/direct-io.c:1212:
+			if (end > isize )

total: 1 errors, 0 warnings, 338 lines checked

./patches/direct-io-cleanup-blockdev_direct_io-locking.patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

f1bf8841

Currently the locking in blockdev_direct_IO is a mess, we have three · 385fc1ba

Christoph Hellwig authored Nov 10, 2009

different locking types and very confusing checks for some of them.  The
most complicated one is DIO_OWN_LOCKING for reads, which happens to not
actually be used.

This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read
case is unused anyway, and the write side is almost identical to
DIO_NO_LOCKING.  The difference is that DIO_NO_LOCKING always sets the
create argument for the get_blocks callback to zero, but we can easily
move that to the actual get_blocks callbacks.  There are four users of the
DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is
fine with the new version, ocfs2 only errors out if create were ever set,
and we can remove this dead code now, the block device code only ever uses
create for an error message if we are fully beyond the device which can
never happen, and last but not least XFS will need the new behaviour for
writes.

Now we can replace the lock_type variable with a flags one, where no flag
means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first
flag.  Separate out the check for not allowing to fill holes into a
separate flag, although for now both flags always get set at the same
time.

Also revamp the documentation of the locking scheme to actually make
sense.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alex Elder <aelder@sgi.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

385fc1ba

30 Oct, 2009 2 commits

Cc: Jeff Moyer <jmoyer@redhat.com> · fec2996f

Andrew Morton authored Oct 30, 2009

Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fec2996f

Intel reported a performance regression caused by the following commit: · f3260275

Jeff Moyer authored Oct 30, 2009

commit 848c4dd5
Author: Zach Brown <zach.brown@oracle.com>
Date:   Mon Aug 20 17:12:01 2007 -0700

    dio: zero struct dio with kzalloc instead of manually

    This patch uses kzalloc to zero all of struct dio rather than
    manually trying to track which fields we rely on being zero.  It
    passed aio+dio stress testing and some bug regression testing on
    ext3.

    This patch was introduced by Linus in the conversation that led up
    to Badari's minimal fix to manually zero .map_bh.b_state in commit:

      6a648fa7

    It makes the code a bit smaller.  Maybe a couple fewer cachelines to
    load, if we're lucky:

       text    data     bss     dec     hex filename
    3285925  568506 1304616 5159047  4eb887 vmlinux
    3285797  568506 1304616 5158919  4eb807 vmlinux.patched

    I was unable to measure a stable difference in the number of cpu
    cycles spent in blockdev_direct_IO() when pushing aio+dio 256K reads
    at ~340MB/s.

    So the resulting intent of the patch isn't a performance gain but to
    avoid exposing ourselves to the risk of finding another field like
    .map_bh.b_state where we rely on zeroing but don't enforce it in the
    code.

Zach surmised that zeroing out the page array was what caused most of
the problem, and suggested the approach taken in the attached patch for
resolving the issue.  Intel re-tested with this patch and saw a 0.6%
performance gain (the original regression was 0.5%).
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

f3260275
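The commit message does not spell out the approach that was taken. The sketch below assumes it amounts to clearing only the fields that precede a large page array instead of zeroing the whole structure. `struct dio_sketch`, its fields and `dio_alloc_sketch()` are hypothetical userspace stand-ins.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DIO_PAGES_SKETCH 64

struct dio_sketch {
	int flags;
	unsigned long block_in_file;
	int page_errors;
	/* Large array kept last so that it can be skipped when zeroing. */
	void *pages[DIO_PAGES_SKETCH];
};

/* Allocate a dio_sketch and zero everything except the page array. */
static struct dio_sketch *dio_alloc_sketch(void)
{
	struct dio_sketch *dio = malloc(sizeof(*dio));

	if (!dio)
		return NULL;
	/* Clear only the bytes in front of pages[]. */
	memset(dio, 0, offsetof(struct dio_sketch, pages));
	return dio;
}
```

The trade-off is that `pages[]` holds indeterminate values after allocation, so the code must never read a slot before writing it.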

13 Nov, 2009 2 commits

Don't know the reason, but it appears ki_wait field of iocb never gets used. · cecb8284

Shaohua Li authored Nov 13, 2009

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

cecb8284

Signed-off-by: Joe Perches <joe@perches.com> · 82c7824c

Joe Perches authored Nov 13, 2009

Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

82c7824c

14 Oct, 2009 1 commit

Not makes it a bool before the comparison. · 5cddfd59

Roel Kluin authored Oct 14, 2009

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

5cddfd59
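A short illustration of the precedence problem described above. The commit message does not show the affected expression, so the function names are hypothetical and the sketch assumes the intent was a plain inequality test.

```c
/*
 * '!' binds tighter than '==', so "!a == b" is parsed as "(!a) == b":
 * a is first reduced to 0 or 1 and only then compared with b.
 */
static int not_equal_buggy(int a, int b)
{
	return (!a) == b;
}

/* What was presumably meant: compare the two values directly. */
static int not_equal_fixed(int a, int b)
{
	return a != b;
}
```

For `a = 2, b = 3` the buggy form yields 0 even though the values differ, because `!2` is 0 and `0 == 3` is false.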

09 Nov, 2009 1 commit

dma_mask is, when interpreted as address, the last valid byte, and hence · 5a0c307d

Jan Beulich authored Nov 09, 2009

comparison must also be done using the last valid byte of the buffer in
question.

Also fix the open-coded instances in lib/swiotlb.c.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Becky Bruce <beckyb@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

5a0c307d
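A sketch of the off-by-one described above, written as two standalone helpers. The function names and the flat `mask`/`addr`/`size` parameters are illustrative assumptions; the kernel helper takes a device pointer instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Off by one: compares the first byte past the buffer with the mask. */
static bool dma_capable_old(uint64_t mask, uint64_t addr, uint64_t size)
{
	return addr + size <= mask;
}

/* Compares the last valid byte of the buffer with the mask. */
static bool dma_capable_new(uint64_t mask, uint64_t addr, uint64_t size)
{
	return addr + size - 1 <= mask;
}
```

With a 32-bit mask of 0xFFFFFFFF, a 4 KiB buffer at 0xFFFFF000 ends exactly at the last addressable byte. The old check rejects it, the new one accepts it.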

30 Sep, 2009 3 commits

Add support for 6 ranks per channel to the i5100 chipset. I have tested · bb455322

Nils Carlson authored Oct 01, 2009

the patch as far as possible with correctable errors and things appear
good.  The DIMM mapping is correct for our board, but boards may differ.
Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Acked-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

bb455322

Add scrubbing to the i5100 chipset. The i5100 chipset only supports one · 43694831

Nils Carlson authored Oct 01, 2009

scrubbing rate, which is not constant but dependent on memory load. The
rate returned by this driver is an estimate based on some experimentation,
but is substantially closer to the truth than the speed supplied in the
documentation.

Also, scrubbing is done once, and then a done-bit is set. This means that
to accomplish continuous scrubbing a re-enabling mechanism must be used.
I have created the simplest possible such mechanism in the form of a
work-queue which will check every five minutes. This interval is quite
arbitrary but should be sufficient for all sizes of system memory.
Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

43694831

The i5100 driver uses the word controller instead of channel in a lot of · 8e5f87de

Nils Carlson authored Oct 01, 2009

places, this is simply a cleanup of the patch.
Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

8e5f87de

13 Nov, 2009 2 commits

The size of EFI GPT header is not static, but whole sector is · c9f1dfbe

Karel Zak authored Nov 13, 2009

allocated for the header. The HeaderSize field must be greater
than 92 (= sizeof(struct gpt_header)) and must be less than or
equal to the logical block size.

It means we have to read whole sector with the header, because the
header crc32 checksum is calculated according to HeaderSize.

For more details see UEFI standard (version 2.3, May 2009):
  - 5.3.1 GUID Format overview, page 93
  - Table 13. GUID Partition Table Header, page 96
Signed-off-by: Karel Zak <kzak@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

c9f1dfbe

Currently, kernel uses strictly 512-byte sectors for EFI GPT parsing. · 9ec8011b

Karel Zak authored Nov 13, 2009

That's wrong.

UEFI standard (version 2.3, May 2009, 5.3.1 GUID Format overview, page
95) defines that LBA is always based on the logical block size. It
means bdev_logical_block_size() (aka BLKSSZGET) for Linux.

This patch removes static sector size from EFI GPT parser.

The problem is reproducible with the latest GNU Parted:

 # modprobe scsi_debug dev_size_mb=50 sector_size=4096

  # ./parted /dev/sdb print
  Model: Linux scsi_debug (scsi)
  Disk /dev/sdb: 52.4MB
  Sector size (logical/physical): 4096B/4096B
  Partition Table: gpt

  Number  Start   End     Size    File system  Name     Flags
   1      24.6kB  3002kB  2978kB               primary
   2      3002kB  6001kB  2998kB               primary
   3      6001kB  9003kB  3002kB               primary

  # blockdev --rereadpt /dev/sdb
  # dmesg | tail -1
   sdb: unknown partition table      <---- !!!

with this patch:

  # blockdev --rereadpt /dev/sdb
  # dmesg | tail -1
   sdb: sdb1 sdb2 sdb3
Signed-off-by: Karel Zak <kzak@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

9ec8011b
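A minimal sketch of why a hard-coded 512-byte sector breaks on the 4096-byte device shown above. The helper names are hypothetical; in the kernel the block size would come from bdev_logical_block_size().

```c
#include <stdint.h>

/* The primary GPT header lives at LBA 1. */
#define GPT_PRIMARY_HEADER_LBA_SKETCH 1ULL

/* Old behaviour: every LBA is assumed to be 512 bytes long. */
static uint64_t lba_to_bytes_fixed512(uint64_t lba)
{
	return lba * 512;
}

/* New behaviour: scale by the device's logical block size. */
static uint64_t lba_to_bytes(uint64_t lba, unsigned int logical_block_size)
{
	return lba * logical_block_size;
}
```

On a device with 4096-byte logical blocks the header starts at byte 4096. The fixed-size variant looks at byte 512, finds no GPT signature, and the partition table is reported as unknown.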

12 Nov, 2009 1 commit

We can pass "module parameters" on the kernel command line even when · b40f654c

Rakib Mullick authored Nov 12, 2009

!MODULE.  So, #ifdef MODULE becomes obsolete.  Also move the declaration
moxa_board_conf at the start of the function, since we were hit by the
following warning.

drivers/char/moxa.c: In function `moxa_init':
drivers/char/moxa.c:1040: warning: ISO C90 forbids mixed declarations and code

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

b40f654c

16 Oct, 2009 1 commit

vtermnos[] is unsigned, so this test was wrong. · d3b2937d

Roel Kluin authored Oct 17, 2009

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

d3b2937d
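The commit message does not quote the faulty test. The sketch below assumes it compared an unsigned entry against a negative value, which can never be true; the sentinel and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "unused slot" marker: -1 stored in an unsigned type. */
#define NO_VTERM_SKETCH ((uint32_t)-1)

/* Always false: an unsigned value is never less than zero. */
static bool slot_is_free_buggy(uint32_t vtermno)
{
	return vtermno < 0;
}

/* Compare against the sentinel in its unsigned form instead. */
static bool slot_is_free_fixed(uint32_t vtermno)
{
	return vtermno == NO_VTERM_SKETCH;
}
```

Compilers can flag the first form when type-limit warnings are enabled, which is typically how such tests get noticed.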

09 Oct, 2009 1 commit

Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt · a669c3ce

Joe Perches authored Oct 09, 2009

Convert printks to pr_<level>
Convert some embedded function names to %s...__func__
Remove a period after exclamation points.
Remove #define pr_dbg which could be used by future kernel.h includes
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

a669c3ce
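A userspace sketch of how a pr_fmt() prefix ends up in every message. The module name "mydrv" and the `pr_info_sketch()` macro are assumptions for illustration; in the kernel, KBUILD_MODNAME is supplied by the build system and pr_info() prints to the kernel log rather than into a buffer.

```c
#include <stdio.h>

/* Normally provided by the kernel build system; hard-coded here. */
#define KBUILD_MODNAME "mydrv"

/* Every format string passed through pr_fmt() gains a "mydrv: " prefix. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Stand-in for pr_info(): formats into a caller-supplied buffer. */
#define pr_info_sketch(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)
```

Because the prefix is joined by string-literal concatenation at compile time, the format argument must be a literal. The `##__VA_ARGS__` form is a GNU extension, as used in the kernel itself.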

30 Sep, 2009 1 commit

Stanse found unnecessary test in mxser_startup. · d459cadb

Jiri Slaby authored Oct 01, 2009

tty is dereferenced earlier, the test is superfluous. Remove it.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

d459cadb

24 Aug, 2009 1 commit

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> · c0a6a6e4

Bartlomiej Zolnierkiewicz authored Aug 25, 2009

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

c0a6a6e4

09 Nov, 2009 1 commit

Currently all architectures but microblaze unconditionally define · 0f063ccc

Christoph Hellwig authored Nov 09, 2009

USE_ELF_CORE_DUMP.  The microblaze omission seems like an error to me, so
let's kill this ifdef and make sure we are the same everywhere.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: <linux-arch@vger.kernel.org>
Cc: Michal Simek <michal.simek@petalogix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

0f063ccc

29 Sep, 2009 1 commit

KCS_IDLE and KCS_IDLE_STATE have the same value, but in this function the · 9eef83a8

Julia Lawall authored Sep 30, 2009

constants ending in _STATE are compared to the state variable.
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

9eef83a8

09 Nov, 2009 7 commits

If multiple simple decrements on the same semaphore are pending, then the · f5961270

Manfred Spraul authored Nov 09, 2009

current code scans all decrement operations, even if the semaphore value
is already 0.

The patch optimizes that: if the semaphore value is 0, then there is no
need to scan the q->alter entries.

Note that this is a common case: It happens if 100 decrements by one are
pending and now an increment by one increases the semaphore value from 0
to 1.  Without this patch, all 100 entries are scanned.  With the patch,
only one entry is scanned, then woken up.  Then the new rule triggers and
the scanning is aborted, without looking at the remaining 99 tasks.

With this patch, single sop increment/decrement by 1 are now O(1).
(same as with Nick's patch)
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

f5961270
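A simplified model of the early-exit rule described above, assuming every pending operation is a decrement by one. `update_queue_sketch()` and its result struct are hypothetical; the real code walks a list of queued operations rather than a counter.

```c
#include <stddef.h>

struct scan_result_sketch {
	size_t scanned;  /* pending entries examined */
	size_t woken;    /* entries whose decrement succeeded */
};

/*
 * Walk the pending decrements in order and stop as soon as the
 * semaphore value reaches 0, since no further decrement can succeed.
 */
static struct scan_result_sketch
update_queue_sketch(int *semval, size_t pending)
{
	struct scan_result_sketch res = { 0, 0 };
	size_t i;

	for (i = 0; i < pending; i++) {
		res.scanned++;
		if (*semval >= 1) {
			(*semval)--;
			res.woken++;
		}
		if (*semval == 0)
			break;  /* the new rule: abort the scan */
	}
	return res;
}
```

With 100 pending decrements and a value that has just gone from 0 to 1, the loop examines one entry, wakes it, and stops without touching the other 99.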

sysv sem has the concept of semaphore arrays that consist of multiple · 7a32c923

Manfred Spraul authored Nov 09, 2009

semaphores.  Atomic operations that affect multiple semaphores are
supported.

The patch optimizes single semaphore operation calls that affect only one
semaphore: It's not necessary to scan all pending operations, it is
sufficient to scan the per-semaphore list.

The idea is from Nick Piggin's version of an ipc sem improvement, the
implementation is different: The code tries to keep as much common code as
possible.

As the result, the patch is simpler, but optimizes fewer cases.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

7a32c923

Based on Nick's findings: · 728d8dcd

Manfred Spraul authored Nov 09, 2009

sysv sem has the concept of semaphore arrays that consist of multiple
semaphores.  Atomic operations that affect multiple semaphores are
supported.

The patch is the first step for optimizing simple, single semaphore
operations: In addition to the global list of all pending operations, a
2nd, per-semaphore list with the simple operations is added.

Note: this patch does not make sense by itself, the new list is used
nowhere.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

728d8dcd

Reduce the amount of scanning of the list of pending semaphore operations: · ea193f12

Manfred Spraul authored Nov 09, 2009

If try_atomic_semop failed, then no changes were applied.  Thus no need to
restart.

Additionally, this patch corrects an incorrect comment: It's possible to
wait for arbitrary semaphore values (do a dec by <x>, wait-for-zero, inc
by <x> in one atomic operation)

Both changes are from Nick Piggin, the patch is the result of a different
split of the individual changes.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ea193f12

The strange sysv semaphore wakeup scheme has a kind of busy-wait lock · bbf9b0a0

Nick Piggin authored Nov 09, 2009

involved, which could deadlock if preemption is enabled during the "lock".

It is an implementation detail (due to a spinlock being held) that this is
actually the case. However if "spinlocks" are made preemptible, or if the
sem lock is changed to a sleeping lock for example, then the wakeup would
become buggy. So this might be a bugfix for -rt kernels.

Imagine waker being preempted by wakee and never clearing IN_WAKEUP -- if
wakee has higher RT priority then there is a priority inversion deadlock.
Even if there is not a priority inversion to cause a deadlock, then there
is still time wasted spinning.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

bbf9b0a0

Replace the handcoded list operations in update_queue() with the standard · 846d12d7

Nick Piggin authored Nov 09, 2009

list_for_each_entry macros.

list_for_each_entry_safe() must be used, because list entries can
disappear immediately upon the wakeup event.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

846d12d7
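A userspace sketch of why the "safe" iteration form is needed. It uses a plain singly linked list with hypothetical names instead of the kernel's list_head and list_for_each_entry_safe(); free() stands in for the wakeup that makes an entry disappear.

```c
#include <stdlib.h>

struct waiter_sketch {
	int id;
	struct waiter_sketch *next;
};

/* Prepend a new waiter; returns the new head (or the old one on failure). */
static struct waiter_sketch *
push_waiter_sketch(struct waiter_sketch *head, int id)
{
	struct waiter_sketch *w = malloc(sizeof(*w));

	if (!w)
		return head;
	w->id = id;
	w->next = head;
	return w;
}

/* Wake (here: free) every waiter and return how many were handled. */
static int wake_all_sketch(struct waiter_sketch **head)
{
	struct waiter_sketch *cur, *next;
	int woken = 0;

	for (cur = *head; cur; cur = next) {
		next = cur->next;  /* saved before cur can disappear */
		free(cur);         /* entry is gone after the "wakeup" */
		woken++;
	}
	*head = NULL;
	return woken;
}
```

Reading `cur->next` after the entry has gone away would be a use-after-free, which is the problem the `_safe` variant avoids by caching the next entry first.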

Around a month ago, there was some discussion about an improvement of the · 9c0f1100

Nick Piggin authored Nov 09, 2009

sysv sem algorithm: Most (at least: some important) users only use simple
semaphore operations, therefore it's worthwhile to optimize this use case.


This patch:

Move last looked up sem_undo struct to the head of the task's undo list. 
Attempt to move common entries to the front of the list so search time is
reduced.  This reduces lookup_undo on oprofile of problematic SAP workload
by 30% (see patch 4 for a description of SAP workload).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

9c0f1100
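A sketch of the move-to-front heuristic described above, using a plain singly linked list. `struct undo_sketch` and `lookup_undo_sketch()` are hypothetical stand-ins for the kernel's sem_undo handling.

```c
#include <stddef.h>

struct undo_sketch {
	int semid;
	struct undo_sketch *next;
};

/*
 * Find the entry for semid. On a hit that is not already first,
 * unlink it and put it at the head so the next lookup is cheaper.
 */
static struct undo_sketch *
lookup_undo_sketch(struct undo_sketch **head, int semid)
{
	struct undo_sketch **link, *un;

	for (link = head; (un = *link) != NULL; link = &un->next) {
		if (un->semid != semid)
			continue;
		if (un != *head) {
			*link = un->next;  /* unlink from current position */
			un->next = *head;  /* relink at the front */
			*head = un;
		}
		return un;
	}
	return NULL;
}
```

Entries that are looked up repeatedly drift to the front, so a workload that keeps hitting the same few semaphore sets rarely walks the whole list.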

13 Oct, 2009 1 commit

We have apparently had a memory leak since · ad3982c4

Serge E. Hallyn authored Oct 13, 2009

7ca7e564 "ipc: store ipcs into IDRs" in
2007.  The idr of which 3 exist for each ipc namespace is never freed.

This patch simply frees them when the ipcns is freed.  I don't believe any
idr_remove() are done from rcu (and could therefore be delayed until after
this idr_destroy()), so the patch should be safe.  Some quick testing
showed no harm, and the memory leak fixed.

Caught by kmemleak.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ad3982c4

09 Nov, 2009 1 commit

Thanks to Roland who pointed out de_thread() issues. · 86370047

Oleg Nesterov authored Nov 09, 2009

Currently we add sub-threads to ->real_parent->children list.  This buys
nothing but slows down do_wait().

With this patch ->children contains only main threads (group leaders). 
The only complication is that forget_original_parent() should iterate over
sub-threads by hand, and de_thread() needs another list_replace() when it
changes ->group_leader.

Henceforth do_wait_thread() can never see task_detached() && !EXIT_DEAD
tasks, we can remove this check (and we can unify do_wait_thread() and
ptrace_do_wait()).

This change can confuse the optimistic search in mm_update_next_owner(),
but this is fixable and minor.

Perhaps badness() and oom_kill_process() should be updated, but they
should be fixed in any case.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ratan Nalumasu <rnalumasu@gmail.com>
Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

86370047

30 Oct, 2009 1 commit

Move the call to do_signal_stop() down, after tracehook call. This makes · d5730d11

Oleg Nesterov authored Oct 31, 2009

->group_stop_count condition visible to tracers before do_signal_stop()
will participate in this group-stop.

Currently the patch has no effect, tracehook_get_signal() always returns 0.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

d5730d11

16 Oct, 2009 3 commits

Kill force_sig_specific(), this trivial wrapper has no callers. · 89b66b22

Oleg Nesterov authored Oct 17, 2009

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

89b66b22

Trivial, s/0/SI_USER/ in collect_signal() for grep. · b2a15199

Oleg Nesterov authored Oct 17, 2009

This is a bit confusing, we don't know the source of this signal.
But we don't care, and "info->si_code = 0" is imho worse.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

b2a15199

Change send_signal() to use si_fromuser(). From now SEND_SIG_NOINFO · a961890b

Oleg Nesterov authored Oct 17, 2009

triggers the "from_ancestor_ns" check.

This fixes reparent_thread()->group_send_sig_info(pdeath_signal)
behaviour, before this patch send_signal() does not detect the
cross-namespace case when the child of the dying parent belongs to the
sub-namespace.

This patch can affect the behaviour of send_sig(), kill_pgrp() and
kill_pid() when the caller sends the signal to the sub-namespace with
"priv == 0" but surprisingly all callers seem to use them correctly,
including disassociate_ctty(on_exit).

Except: drivers/staging/comedi/drivers/addi-data/*.c incorrectly use
send_sig(priv => 0).  But this is minor and should be fixed anyway.
Reported-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Reviewed-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

a961890b

11 Nov, 2009 1 commit

No changes in compiled code. The patch adds the new helper, si_fromuser() · 0deae8c4

Oleg Nesterov authored Nov 11, 2009

and changes check_kill_permission() to use this helper.

The real effect of this patch is that from now we "officially" consider
SEND_SIG_NOINFO signal as "from user-space" signals. This is already true
if we look at the code which uses SEND_SIG_NOINFO, except __send_signal()
has another opinion - see the next patch.

The naming of these special SEND_SIG_XXX siginfo's is really bad
imho.  From __send_signal()'s pov they mean

	SEND_SIG_NOINFO		from user
	SEND_SIG_PRIV		from kernel
	SEND_SIG_FORCED		no info
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Reviewed-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

0deae8c4
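A userspace sketch of what such a helper can look like. It assumes the three special values are small constants cast to pointers and that an `si_code` of zero or less marks a user-originated signal; all `_sketch` names are stand-ins, not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

struct siginfo_sketch {
	int si_code;  /* assumed: <= 0 means "sent from user space" */
};

/* Special marker values passed in place of a real siginfo pointer. */
#define SEND_SIG_NOINFO_SKETCH ((struct siginfo_sketch *)0)
#define SEND_SIG_PRIV_SKETCH   ((struct siginfo_sketch *)1)
#define SEND_SIG_FORCED_SKETCH ((struct siginfo_sketch *)2)

/* True for the three marker values, which must never be dereferenced. */
static bool is_si_special_sketch(const struct siginfo_sketch *info)
{
	return (uintptr_t)info <= (uintptr_t)SEND_SIG_FORCED_SKETCH;
}

/* NOINFO counts as "from user"; real siginfo is judged by si_code. */
static bool si_fromuser_sketch(const struct siginfo_sketch *info)
{
	return info == SEND_SIG_NOINFO_SKETCH ||
	       (!is_si_special_sketch(info) && info->si_code <= 0);
}
```

The order of the checks matters: the marker values are filtered out before `info->si_code` is read.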

12 Nov, 2009 3 commits

Suggested by Roland. · e9fb74f5

Oleg Nesterov authored Nov 13, 2009

Unlike powerpc, x86 always calls tracehook_report_syscall_exit(step) with
step = 0, and sends the trap by hand.

This results in unnecessary SIGTRAP when PTRACE_SINGLESTEP follows the
syscall-exit stop.

Change syscall_trace_leave() to pass the correct "step" argument to
tracehook and remove the send_sigtrap() logic.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

e9fb74f5

Suggested by Roland. · 8c1fda69

Oleg Nesterov authored Nov 13, 2009

Implement user_single_step_siginfo() for x86.  Extract this code from
send_sigtrap().

Since x86 calls tracehook_report_syscall_exit(step => 0) the new helper is
not used yet.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

8c1fda69

Suggested by Roland. · 57ef3511

Oleg Nesterov authored Nov 13, 2009

Change tracehook_report_syscall_exit() to look at step flag and send the
trap signal if needed.

This change affects ia64, microblaze, parisc, powerpc, sh.  They pass
nonzero "step" argument to tracehook but since it was ignored the tracee
reports via ptrace_notify(), this is not right and not consistent.

	- PTRACE_SETSIGINFO doesn't work

	- if the tracer resumes the tracee with signr != 0 the new signal
	  is generated rather than delivering it

	- If PT_TRACESYSGOOD is set the tracee reports the wrong exit_code

I don't have a powerpc machine, but I think this test-case should see the
difference:

	#include <unistd.h>
	#include <signal.h>
	#include <sys/ptrace.h>
	#include <sys/wait.h>
	#include <assert.h>
	#include <stdio.h>

	int main(void)
	{
		int pid, status;

		if (!(pid = fork())) {
			assert(ptrace(PTRACE_TRACEME) == 0);
			kill(getpid(), SIGSTOP);

			getppid();

			return 0;
		}

		assert(pid == wait(&status));
		assert(ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD) == 0);

		assert(ptrace(PTRACE_SYSCALL, pid, 0,0) == 0);
		assert(pid == wait(&status));

		assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
		assert(pid == wait(&status));

		if (status == 0x57F)
			return 0;

		printf("kernel bug: status=%X shouldn't have 0x80\n", status);
		return 1;
	}
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

57ef3511
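A short sketch of how the wait status checked by the test-case decodes on Linux. The two helper names are hypothetical; WIFSTOPPED() and WSTOPSIG() are the standard macros from <sys/wait.h>.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* True if the tracee stopped with a plain SIGTRAP (status 0x57F). */
static bool stopped_with_plain_sigtrap(int status)
{
	return WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP;
}

/* True if the stop signal carries the 0x80 bit that marks a syscall stop. */
static bool stopped_with_sysgood_trap(int status)
{
	return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}
```

In 0x57F the low byte 0x7F means "stopped" and the next byte, 5, is SIGTRAP. A status of 0x857F would instead carry SIGTRAP with the 0x80 bit set, which is the case the test-case reports as a kernel bug after PTRACE_SINGLESTEP.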