1. 14 Sep, 2009 1 commit
    • > @@ -547,20 +541,20 @@ static ssize_t write_kmem(struct file * · 7f61d18b
      Wu Fengguang authored
      >  		if (!kbuf)
      >  			return wrote ? wrote : -ENOMEM;
      >  		while (count > 0) {
      > -			int len = size_inside_page(p, count);
      > +			unsigned long sz = size_inside_page(p, count);
      >
      > -			written = copy_from_user(kbuf, buf, len);
      > -			if (written) {
      > +			sz = copy_from_user(kbuf, buf, sz);
      
      Sorry, it introduced a bug: "sz" will be zero in the normal case,
      
      > +			if (sz) {
      >  				if (wrote + virtr)
      >  					break;
      >  				free_page((unsigned long)kbuf);
      >  				return -EFAULT;
      >  			}
      > -			len = vwrite(kbuf, (char *)p, len);
      > +			sz = vwrite(kbuf, (char *)p, sz);
      
      and gets passed to vwrite() here.
      
      This patch fixes it; the new variable "n" will be used in another
      bug-fixing patch following this one.
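      As a hedged sketch (variable names from the quoted diff, the
      surrounding write_kmem() loop abbreviated), the fixed code keeps the
      chunk size in "sz" and catches copy_from_user()'s return value (the
      number of bytes not copied; zero on success) in the new variable "n":
      
          while (count > 0) {
              unsigned long sz = size_inside_page(p, count);
              unsigned long n;
      
              n = copy_from_user(kbuf, buf, sz);
              if (n) {
                  if (wrote + virtr)
                      break;
                  free_page((unsigned long)kbuf);
                  return -EFAULT;
              }
              sz = vwrite(kbuf, (char *)p, sz);
              count -= sz;
              buf += sz;
              virtr += sz;
              p += sz;
          }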
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2. 11 Sep, 2009 1 commit
    • We noticed very erratic behavior [throughput] with the AIM7 shared · 87e1a47d
      Lee Schermerhorn authored
      workload running on recent distro [SLES11] and mainline kernels on an
      8-socket, 32-core, 256GB x86_64 platform.  On the SLES11 kernel
      [2.6.27.19+] with Barcelona processors, as we increased the load [10s of
      thousands of tasks], the throughput would vary between two "plateaus"--one
      at ~65K jobs per minute and one at ~130K jpm.  The simple patch below
      causes the results to smooth out at the ~130K plateau.
      
      But wait, there's more:
      
      We do not see this behavior on smaller platforms--e.g., 4-socket/8-core.
      This could be the result of the larger number of cpus on the larger
      platform--a scalability issue--or it could be the result of the larger
      number of interconnect "hops" between some nodes in this platform and how
      the tasks for a given load end up distributed over the nodes' cpus and
      memories--a stochastic NUMA effect.
      
      The variability in the results is less pronounced [on the same platform]
      with Shanghai processors and with mainline kernels.  With 31-rc6 on
      Shanghai processors and 288 file systems on 288 fibre attached storage
      volumes, the curves [jpm vs load] are both quite flat with the patched
      kernel consistently producing ~3.9% better throughput [~80K jpm vs ~77K
      jpm] than the unpatched kernel.
      
      Profiling indicated that the "slow" runs were incurring high[er]
      contention on an anon_vma lock in vma_adjust(), apparently called from the
      sbrk() system call.
      
      The patch:
      
      A comment in mm/mmap.c:vma_adjust() suggests that we don't really need the
      anon_vma lock when we're only adjusting the end of a vma, as is the case
      for brk().  The comment questions whether it's worthwhile to optimize for
      this case.  Apparently, on the newer, larger x86_64 platforms, with
      interesting NUMA topologies, it is worthwhile--especially considering
      that the patch [if correct!] is quite simple.
      
      We can detect this condition--no overlap with next vma--by noting a NULL
      "importer".  The anon_vma pointer will also be NULL in this case, so
      simply avoid loading vma->anon_vma to avoid the lock.  However, we
      apparently DO need to take the anon_vma lock when we're inserting a vma
      ['insert' non-NULL] even when we have no overlap [NULL "importer"], so we
      need to check for 'insert', as well.
      
      I have tested with and without the 'file || ' test in the patch.  This
      does not seem to matter for either stability or performance.  I left this
      check/filter in, so we only optimize away the anon_vma lock acquisition
      when adjusting the end of a non-importing, non-inserting, anon vma.
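      As a hedged sketch (not the verbatim hunk), the guard described above
      amounts to something like this in mm/mmap.c:vma_adjust():
      
          /*
           * Sketch: skip the anon_vma lock only when adjusting the end
           * of a non-importing, non-inserting, anon vma.
           */
          if (vma->anon_vma && (file || insert || importer))
              anon_vma = vma->anon_vma;
          if (anon_vma)
              spin_lock(&anon_vma->lock);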
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3. 10 Sep, 2009 2 commits
    • This is necessary to make the mmap ring buffer work properly on platforms · 9ba02e11
      David Miller authored
      where D-cache aliasing is an issue.
      
      vmalloc_user() ensures that the kernel side mapping is SHMLBA aligned, and
      on platforms where D-cache aliasing matters the presence of VM_SHARED will
      similarly SHMLBA-align the user side mapping.
      
      Thus the kernel and the user will be writing to the same D-cache aliases
      and we'll avoid inconsistencies and corruption.
      
      The only trick with this change is that vfree() cannot be invoked from
      interrupt context, and thus it's not allowed from RCU callbacks.
      
      We deal with this by using schedule_work().
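      A hedged sketch of that deferral (struct and field names illustrative,
      not the verbatim patch):
      
          static void data_free_work(struct work_struct *work)
          {
              struct perf_mmap_data *data =
                  container_of(work, struct perf_mmap_data, work);
      
              vfree(data->user_page);    /* process context: vfree() is safe */
              kfree(data);
          }
      
          static void data_free_rcu(struct rcu_head *head)
          {
              struct perf_mmap_data *data =
                  container_of(head, struct perf_mmap_data, rcu_head);
      
              /* softirq context here, so punt the vfree() to a workqueue */
              INIT_WORK(&data->work, data_free_work);
              schedule_work(&data->work);
          }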
      
      Since the ring buffer is now completely linear even on the kernel side,
      several simplifications are probably now possible in the code where we add
      entries to the ring.
      
      With help from Peter Zijlstra.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • When a vmalloc'd area is mmap'd into userspace, some kind of co-ordination · eb7cc917
      David Miller authored
      is necessary for this to work on platforms with cpu D-caches which can
      have aliases.
      
      Otherwise kernel side writes won't be seen properly in userspace and vice
      versa.
      
      If the kernel side mapping and the user side one have the same alignment,
      modulo SHMLBA, this can work as long as VM_SHARED is set on the VMA, and
      for all current users this is true.  VM_SHARED will force SHMLBA alignment
      of the user side mmap on platforms where D-cache aliasing matters.
      
      The bulk of this patch is just making it so that a specific alignment can
      be passed down into __get_vm_area_node().  All existing callers pass in
      '1' which preserves existing behavior.  vmalloc_user() gives SHMLBA for
      the alignment.
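      A hedged sketch of the resulting interface (signatures approximate for
      kernels of that era; the real vmalloc_user() additionally marks the
      area VM_USERMAP):
      
          static struct vm_struct *__get_vm_area_node(unsigned long size,
                  unsigned long align, unsigned long flags,
                  unsigned long start, unsigned long end,
                  int node, gfp_t gfp_mask, void *caller);
      
          void *vmalloc_user(unsigned long size)
          {
              /* SHMLBA keeps kernel and user mappings on the same
               * D-cache aliases; all other callers pass align == 1 */
              return __vmalloc_node(size, SHMLBA,
                          GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
                          PAGE_KERNEL, -1, __builtin_return_address(0));
          }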
      
      As a side effect this should get the video media drivers and other
      vmalloc_user() users into more working shape on such systems.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4. 09 Sep, 2009 6 commits
    • CONFIG_SHMEM off gives you (ramfs masquerading as) tmpfs, even when · 3979bd5c
      Hugh Dickins authored
      CONFIG_TMPFS is off: that's a little anomalous, and I'd intended to make
      more sense of it by removing CONFIG_TMPFS altogether, always enabling its
      code when CONFIG_SHMEM is on; but so many defconfigs have CONFIG_SHMEM on
      with CONFIG_TMPFS off that we'd better leave that as is.
      
      But there is no point in asking for CONFIG_TMPFS if CONFIG_SHMEM is off:
      make TMPFS depend on SHMEM, which also prevents TMPFS_POSIX_ACL's
      shmem_acl.o from being pointlessly built into the kernel when SHMEM is off.
      
      And a selfish change, to prevent the world from being rebuilt when I
      switch between CONFIG_SHMEM on and off: the only CONFIG_SHMEM reference
      in the header files is mm.h's shmem_lock(), so give that a stub in
      shmem.c instead.
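      A sketch of that last change (prototype as in kernels of that era):
      declare shmem_lock() unconditionally in mm.h, and let the !CONFIG_SHMEM
      build of mm/shmem.c supply a trivial stub:
      
          /* include/linux/mm.h: no CONFIG_SHMEM conditional any more */
          extern int shmem_lock(struct file *file, int lock,
                                struct user_struct *user);
      
          /* mm/shmem.c, !CONFIG_SHMEM (tiny) variant: do-nothing stub */
          int shmem_lock(struct file *file, int lock, struct user_struct *user)
          {
              return 0;
          }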
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • If (flags & MAP_LOCKED) is true, it means vm_flags already contains · d50281b9
      Huang Shijie authored
      the bit VM_LOCKED, which is set by calc_vm_flag_bits().
      
      So there is no need to set it again; just remove the redundant code.
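      For illustration, a hedged sketch of the resulting do_mmap_pgoff()
      fragment of that era:
      
          /* calc_vm_flag_bits() already translates MAP_LOCKED to VM_LOCKED */
          vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
                  mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
      
          if (flags & MAP_LOCKED)
              if (!can_do_mlock())
                  return -EPERM;    /* VM_LOCKED is already in vm_flags */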
      Signed-off-by: Huang Shijie <shijie8@gmail.com>
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • __get_user_pages() has been taking its own GUP flags, then processing · 02941858
      Hugh Dickins authored
      them into FOLL flags for follow_page().  Though oddly named, the FOLL
      flags are more widely used, so pass them to __get_user_pages() now.
      Sorry, VM flags, VM_FAULT flags and FAULT_FLAGs are still distinct.
      
      (The patch to __get_user_pages() looks peculiar, with both gup_flags
      and foll_flags: the gup_flags remain constant; but as before there's
      an exceptional case, out of scope of the patch, in which foll_flags
      per page have FOLL_WRITE masked off.)
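      A hedged sketch of that flag flow (loop body of __get_user_pages(),
      simplified; the masking condition is a placeholder for the exceptional
      case mentioned above):
      
          unsigned int foll_flags = gup_flags;    /* constant per call */
      
          if (write_should_be_masked)             /* placeholder condition */
              foll_flags &= ~FOLL_WRITE;          /* the per-page exception */
          page = follow_page(vma, start, foll_flags);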
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • KAMEZAWA Hiroyuki has observed customers of earlier kernels taking · 2a952ef0
      Hugh Dickins authored
      advantage of the ZERO_PAGE, which we stopped do_anonymous_page() from
      using in 2.6.24.  And there were a couple of regression reports on LKML.
      
      Following suggestions from Linus, reinstate do_anonymous_page() use of
      the ZERO_PAGE; but this time avoid dirtying its struct page cacheline
      with (map)count updates - let vm_normal_page() regard it as abnormal.
      
      Use it only on arches which __HAVE_ARCH_PTE_SPECIAL (x86, s390, sh32,
      most powerpc): that's not essential, but minimizes additional branches
      (keeping them in the unlikely pte_special case); and incidentally
      excludes mips (some models of which needed eight colours of ZERO_PAGE
      to avoid costly exceptions).
      
      Don't be fanatical about avoiding ZERO_PAGE updates: get_user_pages()
      callers won't want to make exceptions for it, so increment its count
      there.  Changes to mlock and migration?  Happily, they seem not to be
      needed.
      
      In most places it's quicker to check pfn than struct page address:
      prepare a __read_mostly zero_pfn for that.  Does get_dump_page()
      still need its ZERO_PAGE check?  Probably not, but keep it anyway.
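      A hedged sketch pulling those pieces together (simplified, not the
      verbatim patch):
      
          static unsigned long zero_pfn __read_mostly;    /* set at boot */
      
          /* do_anonymous_page(), read fault: map the ZERO_PAGE via a
           * special pte, so its struct page counts are never touched */
          entry = pte_mkspecial(pfn_pte(zero_pfn, vma->vm_page_prot));
      
          /* vm_normal_page(): checking the pfn is quicker than checking
           * the struct page address; the zero pfn is "abnormal" */
          if (pte_pfn(pte) == zero_pfn)
              return NULL;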
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • do_anonymous_page() has been wrong to dirty the pte regardless. · 97f76f91
      Hugh Dickins authored
      If it's not going to mark the pte writable, then it won't help
      to mark it dirty here, and doing so clogs up memory with pages
      which will need swap instead of being thrown away.  It is especially
      wrong if no overcommit is chosen and this vma is not yet VM_ACCOUNTed:
      we could exceed the limit and OOM despite no overcommit.
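      The fix, as a hedged sketch of do_anonymous_page() of that era:
      
          entry = mk_pte(page, vma->vm_page_prot);
          /* only dirty the pte when it is also being made writable */
          if (vma->vm_flags & VM_WRITE)
              entry = pte_mkwrite(pte_mkdirty(entry));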
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: <stable@kernel.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • follow_hugetlb_page() shouldn't be guessing about the coredump case · 3013b510
      Hugh Dickins authored
      either: pass the foll_flags down to it, instead of just the write bit.
      
      Remove that obscure huge_zeropage_ok() test.  The decision is easy,
      though unlike the non-huge case: here vm_ops->fault is always set.
      But we know that a fault would serve up zeroes, unless there's
      already a hugetlbfs pagecache page to back the range.
      
      (Alternatively, since hugetlb pages aren't swapped out under pressure,
      you could save more dump space by arguing that a page not yet faulted
      into this process cannot be relevant to the dump; but that would be
      more surprising.)
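      A hedged sketch of the resulting decision in follow_hugetlb_page()
      (simplified; helper names as in kernels of that era):
      
          absent = !pte || huge_pte_none(huge_ptep_get(pte));
      
          /* for a coredump an absent pte would only fault in zeroes, so
           * skip it unless hugetlbfs pagecache already backs the range */
          if (absent && (flags & FOLL_DUMP) &&
              !hugetlbfs_pagecache_present(h, vma, vaddr)) {
              remainder = 0;
              break;
          }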
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>