Commits · f4f62301c6f42127b7462274abfcbc278f84d59a · linux / linux-davinci

27 Aug, 2008 9 commits

fs_enet: Fix SCC Ethernet on CPM2, and crash in fs_enet_rx_napi() · f4f62301

Heiko Schocher authored Aug 25, 2008

Signed-off-by: Heiko Schocher <hs@denx.de>
Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

f4f62301

igb: fix setting the number of tx queues · 34a20e89

Alexander Duyck authored Aug 26, 2008

The real_num_tx_queues was not being set when in MSI-X only mode. This patch
corrects that path so all interrupt types are correctly configured.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

34a20e89

igb: ethtool -d reads EICR which is incorrect as it is read on clear · fe59de38

Alexander Duyck authored Aug 26, 2008

Ethtool -d is reading the EICR and ICR registers which is currently
clearing these registers and masking off interrupts.  To prevent this we
read the EICS and ICS equivilents as they can be read without clearing or
masking.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

fe59de38

igb: force all queues to interrupt once every 2 seconds · 7a6ea550

Alexander Duyck authored Aug 26, 2008

Set the EICS bit for each of the RX queues at least once every 2 seconds to
prevent the rx queues from stalling.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

7a6ea550

r8169: balance pci_map / pci_unmap pair · a866bbf6

Francois Romieu authored Aug 26, 2008

The leak hurts with swiotlb and jumbo frames.

Fix http://bugzilla.kernel.org/show_bug.cgi?id=9468.

Heavily hinted by Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Alistair John Strachan <alistair@devzero.co.uk>
Tested-by: Timothy J Fontaine <tjfontaine@atxconsulting.com>
Cc: Edward Hsu <edward_hsu@realtek.com.tw>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

a866bbf6

myri10ge: update version string to 1.4.3-1.358 · 0623807a

Brice Goglin authored Aug 26, 2008

Update myri10ge version string to 1.4.3-1.358.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

0623807a

ixgbe: fix vlan filtering · 3d01625a

Alexander Duyck authored Aug 26, 2008

VLAN filtering is broken, due to reading the incorrect register for
the VLAN filtering settings.  Fixed by reading/writing the correct
register.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

3d01625a

Blackfin EMAC Driver: the BF526 also supports the MAC, · 736783b8

Mike Frysinger authored Aug 27, 2008

so update things accordingly
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

736783b8

Merge branch 'for-2.6.27' of git://git.marvell.com/mv643xx_eth into upstream-fixes · 373a5e02
Jeff Garzik authored Aug 27, 2008

373a5e02

25 Aug, 2008 12 commits

bnx2x: Version update · c2d42545

Eilon Greenstein authored Aug 25, 2008

Version update
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2d42545

bnx2x: Multi Queue · 231fd58a

Yitchak Gertner authored Aug 25, 2008

The multi queue support is still disabled by default for the bnx2x
(needs some more testing and validation), but there are 2 obvious bug in
it which are fixed in this patch
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

231fd58a

bnx2x: NAPI and interrupts enable/disable · 65abd74d

Yitchak Gertner authored Aug 25, 2008

Fixing the order of enabling and disabling NAPI and the interrupts
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65abd74d

bnx2x: NIC load failure cleanup · d1014634

Yitchak Gertner authored Aug 25, 2008

Load failures were not handled correctly
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1014634

bnx2x: Initialization structure · 3cdf1db7

Yitchak Gertner authored Aug 25, 2008

The TPA initialization is part of the FW internal memory initialization
and so it is moved to the appropriate function
Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3cdf1db7

bnx2x: HW lock timeout · 46230476

Eilon Greenstein authored Aug 25, 2008

Increasing the lock timeout to 5 seconds instead of 1 second to minimize
the chance of failures due to timeout
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

46230476

bnx2x: Minimize lock time · 76b190c5

Eilon Greenstein authored Aug 25, 2008

After iSCSI boot, the HW lock should only protect the flag so only the
first function will reset the chip and not then entire chip reset
process
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76b190c5

bnx2x: Fan failure mechanism on additional design · 7add905f

Eilon Greenstein authored Aug 25, 2008

The A1021G board is also using the fan failure mechanism in the same way
the A1022G board does
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7add905f

bnx2x: Rx work check · 2772f903

Eilon Greenstein authored Aug 25, 2008

The has Rx work check was wrong: when the FW was at the end of the page,
the driver was already at the beginning of the next page. Since the
check only validated that both driver and FW are pointing to the same
place, it concluded that there is still work to be done. This caused
some serious issues including long latency results on ping-pong test and
lockups while unloading the driver in that condition.
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2772f903

ipv6: sysctl fixes · ce3113ec

Al Viro authored Aug 25, 2008

Braino: net.ipv6 in ipv6 skeleton has no business in rotable
class
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce3113ec

ipv4: sysctl fixes · 2f4520d3

Al Viro authored Aug 25, 2008

net.ipv4.neigh should be a part of skeleton to avoid ordering problems
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f4520d3

sctp: add verification checks to SCTP_AUTH_KEY option · 30c2235c

Vlad Yasevich authored Aug 25, 2008

The structure used for SCTP_AUTH_KEY option contains a
length that needs to be verfied to prevent buffer overflow
conditions.  Spoted by Eugene Teo <eteo@redhat.com>.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30c2235c

24 Aug, 2008 6 commits

mv643xx_eth: bump version to 1.3 · c4560318
Lennert Buytenhek authored Aug 24, 2008
```
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
```
c4560318

mv643xx_eth: enforce multiple-of-8-bytes receive buffer size restriction · abe78717

Lennert Buytenhek authored Aug 24, 2008

The mv643xx_eth hardware ignores the lower three bits of the buffer
size field in receive descriptors, causing the reception of full-sized
packets to fail at some MTUs. Fix this by rounding the size of
allocated receive buffers up to a multiple of eight bytes.

While we are at it, add a bit of extra space to each receive buffer so
that we can handle multiple vlan tags on ingress.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>

abe78717

mv643xx_eth: fix NULL pointer dereference in rxq_process() · 9e1f3772

Lennert Buytenhek authored Aug 24, 2008

When we are low on memory, the assumption that every descriptor in the
receive ring will have an skbuff associated with it does not hold.

rxq_process() was assuming that if the receive descriptor it is working
on is not owned by the hardware, it can safely be processed and handed
to the networking stack. But a descriptor in the receive ring not being
owned by the hardware can also happen when we are low on memory and did
not manage to refill the receive ring fully.

This patch changes rxq_process()'s bailout condition from "the first
receive descriptor to be processed is owned by the hardware" to "the
first receive descriptor to be processed is owned by the hardware OR
the number of valid receive descriptors in the ring is zero".
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>

9e1f3772

mv643xx_eth: fix inconsistent lock semantics · 8e0b1bf6

Lennert Buytenhek authored Aug 24, 2008

Nicolas Pitre noted that mv643xx_eth_poll was incorrectly using
non-IRQ-safe locks while checking whether to wake up the netdevice's
transmit queue. Convert the locking to *_irq() variants, since we
are running from softirq context where interrupts are enabled.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>

8e0b1bf6

mv643xx_eth: fix double add_timer() on the receive oom timer · 92c70f27

Lennert Buytenhek authored Aug 24, 2008

Commit 12e4ab79 ("mv643xx_eth: be
more agressive about RX refill") changed the condition for the receive
out-of-memory timer to be scheduled from "the receive ring is empty"
to "the receive ring is not full".

This can lead to a situation where the receive out-of-memory timer is
pending because a previous rxq_refill() didn't manage to refill the
receive ring entirely as a result of being out of memory, and
rxq_refill() is then called again as a side effect of a packet receive
interrupt, and that rxq_refill() call then again does not succeed to
refill the entire receive ring with fresh empty skbuffs because we are
still out of memory, and then tries to call add_timer() on the already
scheduled out-of-memory timer.

This patch fixes this issue by changing the add_timer() call in
rxq_refill() to a mod_timer() call.  If the OOM timer was not already
scheduled, this will behave as before, whereas if it was already
scheduled, this patch will push back its firing time a bit, which is
safe because we've (unsuccessfully) attempted to refill the receive
ring just before we do this.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>

92c70f27

mv643xx_eth: fix NAPI 'rotting packet' issue · 819ddcaf

Lennert Buytenhek authored Aug 24, 2008

When a receive interrupt occurs, mv643xx_eth would first process the
receive descriptors and then ACK the receive interrupt, instead of the
other way round.

This would leave a small race window between processing the last
receive descriptor and clearing the receive interrupt status in which
a new packet could come in, which would then 'rot' in the receive
ring until the next receive interrupt would come in.

Fix this by ACKing (clearing) the receive interrupt condition before
processing the receive descriptors.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>

819ddcaf

23 Aug, 2008 2 commits

ipv6: protocol for address routes · f410a1fb

Stephen Hemminger authored Aug 23, 2008

This fixes a problem spotted with zebra, but not sure if it is
necessary a kernel problem.  With IPV6 when an address is added to an
interface, Zebra creates a duplicate RIB entry, one as a connected
route, and other as a kernel route.

When an address is added to an interface the RTN_NEWADDR message
causes Zebra to create a connected route. In IPV4 when an address is
added to an interface a RTN_NEWROUTE message is set to user space with
the protocol RTPROT_KERNEL. Zebra ignores these messages, because it
already has the connected route.

The problem is that route created in IPV6 has route protocol ==
RTPROT_BOOT.  Was this a design decision or a bug? This fixes it. Same
patch applies to both net-2.6 and stable.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f410a1fb

icmp: icmp_sk() should not use smp_processor_id() in preemptible code · fdc0bde9

Denis V. Lunev authored Aug 23, 2008

Pass namespace into icmp_xmit_lock, obtain socket inside and return
it as a result for caller.

Thanks Alexey Dobryan for this report:

Steps to reproduce:

	CONFIG_PREEMPT=y
	CONFIG_DEBUG_PREEMPT=y
	tracepath <something>

BUG: using smp_processor_id() in preemptible [00000000] code: tracepath/3205
caller is icmp_sk+0x15/0x30
Pid: 3205, comm: tracepath Not tainted 2.6.27-rc4 #1

Call Trace:
 [<ffffffff8031af14>] debug_smp_processor_id+0xe4/0xf0
 [<ffffffff80409405>] icmp_sk+0x15/0x30
 [<ffffffff8040a17b>] icmp_send+0x4b/0x3f0
 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160
 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8023a475>] ? local_bh_enable_ip+0x95/0x110
 [<ffffffff804285b9>] ? _spin_unlock_bh+0x39/0x40
 [<ffffffff8025a26c>] ? mark_held_locks+0x4c/0x90
 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160
 [<ffffffff803e91b4>] ip_fragment+0x8d4/0x900
 [<ffffffff803e7030>] ? ip_finish_output2+0x0/0x290
 [<ffffffff803e91e0>] ? ip_finish_output+0x0/0x60
 [<ffffffff803e6650>] ? dst_output+0x0/0x10
 [<ffffffff803e922c>] ip_finish_output+0x4c/0x60
 [<ffffffff803e92e3>] ip_output+0xa3/0xf0
 [<ffffffff803e68d0>] ip_local_out+0x20/0x30
 [<ffffffff803e753f>] ip_push_pending_frames+0x27f/0x400
 [<ffffffff80406313>] udp_push_pending_frames+0x233/0x3d0
 [<ffffffff804067d1>] udp_sendmsg+0x321/0x6f0
 [<ffffffff8040d155>] inet_sendmsg+0x45/0x80
 [<ffffffff803b967f>] sock_sendmsg+0xdf/0x110
 [<ffffffff8024a100>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff80257ce5>] ? validate_chain+0x415/0x1010
 [<ffffffff8027dc10>] ? __do_fault+0x140/0x450
 [<ffffffff802597d0>] ? __lock_acquire+0x260/0x590
 [<ffffffff803b9e55>] ? sockfd_lookup_light+0x45/0x80
 [<ffffffff803ba50a>] sys_sendto+0xea/0x120
 [<ffffffff80428e42>] ? _spin_unlock_irqrestore+0x42/0x80
 [<ffffffff803134bc>] ? __up_read+0x4c/0xb0
 [<ffffffff8024e0c6>] ? up_read+0x26/0x30
 [<ffffffff8020b8bb>] system_call_fastpath+0x16/0x1b

icmp6_sk() is similar.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fdc0bde9

22 Aug, 2008 1 commit

pkt_sched: Fix qdisc list locking · f6e0b239

Jarek Poplawski authored Aug 22, 2008

Since some qdiscs call qdisc_tree_decrease_qlen() (so qdisc_lookup())
without rtnl_lock(), adding and deleting from a qdisc list needs
additional locking. This patch adds global spinlock qdisc_list_lock
and wrapper functions for modifying the list. It is considered as a
temporary solution until hfsc_dequeue(), netem_dequeue() and
tbf_dequeue() (or qdisc_tree_decrease_qlen()) are redone.

With feedback from Herbert Xu and David S. Miller.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6e0b239

21 Aug, 2008 3 commits

pkt_sched: Fix qdisc_watchdog() vs. dev_deactivate() race · 2540e051

Jarek Poplawski authored Aug 21, 2008

dev_deactivate() can skip rescheduling of a qdisc by qdisc_watchdog()
or other timer calling netif_schedule() after dev_queue_deactivate().
We prevent this checking aliveness before scheduling the timer. Since
during deactivation the root qdisc is available only as qdisc_sleeping
additional accessor qdisc_root_sleeping() is created.

With feedback from Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2540e051

sctp: fix potential panics in the SCTP-AUTH API. · 5e739d17

Vlad Yasevich authored Aug 21, 2008

All of the SCTP-AUTH socket options could cause a panic
if the extension is disabled and the API is envoked.

Additionally, there were some additional assumptions that
certain pointers would always be valid which may not
always be the case.

This patch hardens the API and address all of the crash
scenarios.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e739d17

Linux v2.6.27-rc4 · 6a55617e
Linus Torvalds authored Aug 20, 2008

6a55617e

20 Aug, 2008 7 commits

cramfs: fix named-pipe handling · 82d63fc9

Al Viro authored Aug 20, 2008

After commit a97c9bf3 (fix cramfs
making duplicate entries in inode cache) in kernel 2.6.14, named-pipe
on cramfs does not work properly.

It seems the commit make all named-pipe on cramfs share their inode
(and named-pipe buffer).

Make ..._test() refuse to merge inodes with ->i_ino == 1, take inode setup
back to get_cramfs_inode() and make ->drop_inode() evict ones with ->i_ino
== 1 immediately.
Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>		[2.6.14 and later]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

82d63fc9

fbdefio: add set_page_dirty handler to deferred IO FB · d847471d

Ian Campbell authored Aug 20, 2008

Fixes kernel BUG at lib/radix-tree.c:473.

Previously the handler was incidentally provided by tmpfs but this was
removed with:

  commit 14fcc23f
  Author: Hugh Dickins <hugh@veritas.com>
  Date:   Mon Jul 28 15:46:19 2008 -0700

    tmpfs: fix kernel BUG in shmem_delete_inode

relying on this behaviour was incorrect in any case and the BUG also
appeared when the device node was on an ext3 filesystem.

v2: override a_ops at open() time rather than mmap() time to minimise
races per AKPM's concerns.
Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
Cc: Jaya Kumar <jayakumar.lkml@gmail.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Johannes Weiner <hannes@saeurebad.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Kel Modderman <kel@otaku42.de>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: <stable@kernel.org> [14fcc23f is in 2.6.25.14 and 2.6.26.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d847471d

rtc: rtc-ds1374: fix 'no irq' case handling · b42f9317

Anton Vorontsov authored Aug 20, 2008

On a PowerPC board with ds1374 RTC I'm getting this error while RTC tries
to probe:

rtc-ds1374 0-0068: unable to request IRQ

This happens because I2C probing code (drivers/of/of_i2c.c) is specifying
IRQ0 for 'no irq' case, which is correct.

The driver handles this incorrectly, though. This patch fixes it.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b42f9317

mm: xip/ext2 fix block allocation race · 14bac5ac

Nick Piggin authored Aug 20, 2008

XIP can call into get_xip_mem concurrently with the same file,offset with
create=1.  This usually maps down to get_block, which expects the page
lock to prevent such a situation.  This causes ext2 to explode for one
reason or another.

Serialise those calls for the moment.  For common usages today, I suspect
get_xip_mem rarely is called to create new blocks.  In future as XIP
technologies evolve we might need to look at which operations require
scalability, and rework the locking to suit.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Acked-by: Carsten Otte <cotte@freenet.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14bac5ac

mm: xip fix fault vs sparse page invalidate race · 538f8ea6

Nick Piggin authored Aug 20, 2008

XIP has a race between sparse pages being inserted into page tables, and
sparse pages being zapped when its time to put a non-sparse page in.

What can happen is that a process can be left with a dangling sparse page
in a MAP_SHARED mapping, while the rest of the world sees the non-sparse
version.  Ie.  data corruption.

Guard these operations with a seqlock, making fault-in-sparse-pages the
slowpath, and try-to-unmap-sparse-pages the fastpath.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Acked-by: Carsten Otte <cotte@freenet.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

538f8ea6

mm: dirty page tracking race fix · 479db0bf

Nick Piggin authored Aug 20, 2008

There is a race with dirty page accounting where a page may not properly
be accounted for.

clear_page_dirty_for_io() calls page_mkclean; then TestClearPageDirty.

page_mkclean walks the rmaps for that page, and for each one it cleans and
write protects the pte if it was dirty.  It uses page_check_address to
find the pte.  That function has a shortcut to avoid the ptl if the pte is
not present.  Unfortunately, the pte can be switched to not-present then
back to present by other code while holding the page table lock -- this
should not be a signal for page_mkclean to ignore that pte, because it may
be dirty.

For example, powerpc64's set_pte_at will clear a previously present pte
before setting it to the desired value.  There may also be other code in
core mm or in arch which do similar things.

The consequence of the bug is loss of data integrity due to msync, and
loss of dirty page accounting accuracy.  XIP's __xip_unmap could easily
also be unreliable (depending on the exact XIP locking scheme), which can
lead to data corruption.

Fix this by having an option to always take ptl to check the pte in
page_check_address.

It's possible to retain this optimization for page_referenced and
try_to_unmap.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Carsten Otte <cotte@freenet.de>
Cc: Hugh Dickins <hugh@veritas.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

479db0bf

fix setpriority(PRIO_PGRP) thread iterator breakage · 2d70b68d

Ken Chen authored Aug 20, 2008

When user calls sys_setpriority(PRIO_PGRP ...) on a NPTL style multi-LWP
process, only the task leader of the process is affected, all other
sibling LWP threads didn't receive the setting.  The problem was that the
iterator used in sys_setpriority() only iteartes over one task for each
process, ignoring all other sibling thread.

Introduce a new macro do_each_pid_thread / while_each_pid_thread to walk
each thread of a process.  Convert 4 call sites in {set/get}priority and
ioprio_{set/get}.
Signed-off-by: Ken Chen <kenchen@google.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2d70b68d