Commits · 0c35bbadc59f5ed105c34471143eceb4c0dd9c95 · linux / linux-davinci-2.6.23

22 Jun, 2005 14 commits

[PATCH] VM: add __GFP_NORECLAIM · 0c35bbad

Martin Hicks authored Jun 21, 2005

When using the early zone reclaim, it was noticed that allocating new pages
that should be spread across the whole system caused eviction of local pages.

This adds a new GFP flag to prevent early reclaim from happening during
certain allocation attempts. The example that is implemented here is for page
cache pages. We want page cache pages to be spread across the whole system,
and we don't want page cache pages to evict other pages to get local memory.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

0c35bbad

[PATCH] VM: early zone reclaim · 753ee728

Martin Hicks authored Jun 21, 2005

This is the core of the (much simplified) early reclaim.  The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.

One of the major uses of this is NUMA machines.  With the default allocator
behavior the allocator would look for memory in another zone, which might be
off-node, before trying to reclaim from the current zone.

This adds a zone tuneable to enable early zone reclaim.  It is selected on a
per-zone basis and is turned on/off via syscall.

Adding some extra throttling on the reclaim was also required (patch
4/4).  Without the machine would grind to a crawl when doing a "make -j"
kernel build.  Even with this patch the System Time is higher on
average, but it seems tolerable.  Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:

			wall  user   sys   %cpu  ctx sw.  sleeps
			----  ----   ---   ----   ------  ------
No patch		1009  1384   847   258   298170   504402
w/patch, no reclaim     880   1376   667   288   254064   396745
w/patch & reclaim       1079  1385   926   252   291625   548873

These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot.  Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to seee that with reclaim
the benchmark still finishes in a reasonable amount of time.

I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.

Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).

The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.cSigned-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

753ee728

[PATCH] VM: add may_swap flag to scan_control · bfbb38fb

Martin Hicks authored Jun 21, 2005

Here's the next round of these patches.  These are totally different in
an attempt to meet the "simpler" request after the last patches.  For
reference the earlier threads are:

http://marc.theaimsgroup.com/?l=linux-kernel&m=110839604924587&w=2
http://marc.theaimsgroup.com/?l=linux-mm&m=111461480721249&w=2

This set of patches replaces my other vm- patches that are currently in
-mm.  So they're against 2.6.12-rc5-mm1 about half way through the -mm
patchset.

As I said already this patch is a lot simpler.  The reclaim is turned on
or off on a per-zone basis using a syscall.  I haven't tested the x86
syscall, so it might be wrong.  It uses the existing reclaim/pageout
code with the small addition of a may_swap flag to scan_control
(patch 1/4).

I also added __GFP_NORECLAIM (patch 3/4) so that certain allocation
types can be flagged to never cause reclaim.  This was a deficiency
that was in all of my earlier patch sets.  Previously, doing a big
buffered read would fill one zone with page cache and then start to
reclaim from that same zone, leaving the other zones untouched.

Adding some extra throttling on the reclaim was also required (patch
4/4).  Without the machine would grind to a crawl when doing a "make -j"
kernel build.  Even with this patch the System Time is higher on
average, but it seems tolerable.  Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:

			wall  user   sys   %cpu  ctx sw.  sleeps
			----  ----   ---   ----   ------  ------
No patch		1009  1384   847   258   298170   504402
w/patch, no reclaim     880   1376   667   288   254064   396745
w/patch & reclaim       1079  1385   926   252   291625   548873

These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot.  Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to seee that with reclaim
the benchmark still finishes in a reasonable amount of time.

I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.

Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).

The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c

This patch:

This adds an extra switch to the scan_control struct.  It simply lets the
reclaim code know if its allowed to swap pages out.

This was required for a simple per-zone reclaimer.  Without this addition
pages would be swapped out as soon as a zone ran out of memory and the early
reclaim kicked in.
Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

bfbb38fb

[PATCH] mm: add /proc/zoneinfo · 295ab934

Nikita Danilov authored Jun 21, 2005

Add /proc/zoneinfo file to display information about memory zones.  Useful
to analyze VM behaviour.
Signed-off-by: Nikita Danilov <nikita@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

295ab934

[PATCH] madvise: merge the maps · 05b74384

Prasanna Meda authored Jun 21, 2005

This attempts to merge back the split maps.  This code is mostly copied
from Chrisw's mlock merging from post 2.6.11 trees.  The only difference is
in munmapped_error handling.  Also passed prev to willneed/dontneed,
eventhogh they do not handle it now, since I felt it will be cleaner,
instead of handling prev in madvise_vma in some cases and in subfunction in
some cases.
Signed-off-by: Prasanna Meda <pmeda@akamai.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

05b74384

[PATCH] madvise: do not split the maps · e798c6e8

Prasanna Meda authored Jun 21, 2005

This attempts to avoid splittings when it is not needed, that is when
vm_flags are same as new flags.  The idea is from the <2.6.11 mlock_fixup
and others.  This will provide base for the next madvise merging patch.
Signed-off-by: Prasanna Meda <pmeda@akamai.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e798c6e8

[PATCH] vmscan: notice slab shrinking · b15e0905

akpm@osdl.org authored Jun 21, 2005

Fix a problem identified by Andrea Arcangeli <andrea@suse.de>

kswapd will set a zone into all_unreclaimable state if it sees that we're not
successfully reclaiming LRU pages.  But that fails to notice that we're
successfully reclaiming slab obects, so we can set all_unreclaimable too soon.

So change shrink_slab() to return a success indication if it actually
reclaimed some objects, and don't assume that the zone is all_unreclaimable if
that is true.  This means that we won't enter all_unreclaimable state if we
are successfully freeing slab objects but we're not yet actually freeing slab
pages, due to internal fragmentation.

(hm, this has a shortcoming.  We could be successfully freeing ZONE_NORMAL
slab objects while being really oom on ZONE_DMA.  If that happens then kswapd
might burn a lot of CPU.  But given that there might be some slab objects in
ZONE_DMA, perhaps that is appropriate.)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

b15e0905

[PATCH] smp_processor_id() cleanup · 39c715b7

Ingo Molnar authored Jun 21, 2005

This patch implements a number of smp_processor_id() cleanup ideas that
Arjan van de Ven and I came up with.

The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
spaghetti was hard to follow both on the implementational and on the
usage side.

Some of the complexity arose from picking wrong names, some of the
complexity comes from the fact that not all architectures defined
__smp_processor_id.

In the new code, there are two externally visible symbols:

 - smp_processor_id(): debug variant.

 - raw_smp_processor_id(): nondebug variant. Replaces all existing
   uses of _smp_processor_id() and __smp_processor_id(). Defined
   by every SMP architecture in include/asm-*/smp.h.

There is one new internal symbol, dependent on DEBUG_PREEMPT:

 - debug_smp_processor_id(): internal debug variant, mapped to
                             smp_processor_id().

Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new
lib/smp_processor_id.c file.  All related comments got updated and/or
clarified.

I have build/boot tested the following 8 .config combinations on x86:

 {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}

I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT.  (Other
architectures are untested, but should work just fine.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

39c715b7

[PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes · 84929801

Suresh Siddha authored Jun 21, 2005

Appended patch will setup compatibility mode TASK_SIZE properly.  This will
fix atleast three known bugs that can be encountered while running
compatibility mode apps.

a) A malicious 32bit app can have an elf section at 0xffffe000.  During
   exec of this app, we will have a memory leak as insert_vm_struct() is
   not checking for return value in syscall32_setup_pages() and thus not
   freeing the vma allocated for the vsyscall page.  And instead of exec
   failing (as it has addresses > TASK_SIZE), we were allowing it to
   succeed previously.

b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area
   may return addresses beyond 32bits, ultimately causing corruption
   because of wrap-around and resulting in SEGFAULT, instead of returning
   ENOMEM.

c) 32bit app doing this below mmap will now fail.

  mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE,
	MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0);
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

84929801

[PATCH] coverity: idr_get_new_above_int() overrun fix · 589777ea

Zaur Kambarov authored Jun 21, 2005

This patch fixes overrun of array pa:
92   		struct idr_layer *pa[MAX_LEVEL];

in

98   		l = idp->layers;
99   		pa[l--] = NULL;

by passing idp->layers, set in
202  		idp->layers = layers;
to function  sub_alloc in
203  		v = sub_alloc(idp, ptr, &id);
Signed-off-by: Zaur Kambarov <zkambarov@coverity.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

589777ea

[PATCH] coverity: ipmi: avoid overrun of ipmi_interfaces[] · 3a845099

Zaur Kambarov authored Jun 21, 2005

Fix overrun of static array "ipmi_interfaces" of size 4 at position 4 with
index variable "if_num".

Definitions involved:
297  	#define MAX_IPMI_INTERFACES 4
298  	static ipmi_smi_t ipmi_interfaces[MAX_IPMI_INTERFACES];
Signed-off-by: Zaur Kambarov <zkambarov@coverity.com>
Cc: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3a845099

[PATCH] megaraid build fix · 7f20b6a4

bobl authored Jun 21, 2005

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

7f20b6a4

[PATCH] arm: irqs_disabled() type fix · 9a558cb4

Andrew Morton authored Jun 21, 2005

kernel/sched.c: In function `__might_sleep':
kernel/sched.c:5461: warning: int format, long unsigned int arg (arg 3)

We expect irqs_disabled() to return an int (poor man's bool).
Acked-by: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

9a558cb4

Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 · 9723d95d
Linus Torvalds authored Jun 21, 2005

9723d95d

21 Jun, 2005 19 commits

[SPARC64]: Add prefetch support. · 7049e680

David S. Miller authored Jun 21, 2005

The implementation is optimal for UltraSPARC-III and later.
It will work, however suboptimally, on UltraSPARC-II and
be treated as a NOP on UltraSPARC-I.

It is not worth code patching this thing as the highest cost
is the code space, and code patching cannot eliminate that.
Signed-off-by: David S. Miller <davem@davemloft.net>

7049e680

Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 4a4f8fdb
Linus Torvalds authored Jun 21, 2005

4a4f8fdb

[PATCH] devfs: remove devfs from Kconfig preventing it from being built · 2c6e5a83

Greg KH authored Jun 21, 2005

Here's a much smaller patch to simply disable devfs from the build.  If
this goes well, and there are no complaints for a few weeks, I'll resend
my big "devfs-die-die-die" series of patches that rip the whole thing
out of the kernel tree.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

2c6e5a83

[SPARC64]: Fix cmsg length checks in Solaris emulation layer. · 8005aba6
David S. Miller authored Jun 21, 2005
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
8005aba6
Merge 'for-linus' branch of rsync://rsync.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 · 9527cc77
Linus Torvalds authored Jun 21, 2005

9527cc77
[IPV4]: Fix fib_trie.c's args to fib_dump_info(). · 90f66914
David S. Miller authored Jun 21, 2005
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
90f66914

[NETFILTER]: Fix ip6t_LOG sit tunnel logging · 047601bf

Patrick McHardy authored Jun 21, 2005

Sit tunnel logging is currently broken:

MAC=01:23:45:67:89:ab->01:23:45:47:89:ac TUNNEL=123.123.  0.123-> 12.123.  6.123

Apart from the broken IP address, MAC addresses are printed differently
for sit tunnels than for everything else.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

047601bf

[NETFILTER]: Drop conntrack reference in ip_call_ra_chain()/ip_mr_input() · 2715bcf9

Patrick McHardy authored Jun 21, 2005

Drop reference before handing the packets to raw_rcv()
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2715bcf9

[NETFILTER]: Check TCP checksum in ipt_REJECT · 6150bacf

Patrick McHardy authored Jun 21, 2005

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

6150bacf

[NETFILTER]: Avoid unncessary checksum validation in UDP connection tracking · e3be8ba7

Keir Fraser authored Jun 21, 2005

Signed-off-by: Keir Fraser <Keir.Fraser@xl.cam.ac.uk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3be8ba7

[NETFILTER]: Missing owner-field initialization in ip6table_raw · 97216c79

Patrick McHardy authored Jun 21, 2005

I missed this one when fixing up iptable_raw.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

97216c79

[NETFILTER]: expectation timeouts are compulsory · 1d3cdb41

Phil Oester authored Jun 21, 2005

Since expectation timeouts were made compulsory [1], there is no need to
check for them in ip_conntrack_expect_insert.

[1] https://lists.netfilter.org/pipermail/netfilter-devel/2005-January/018143.htmlSigned-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d3cdb41

[NETFILTER]: Restore netfilter assumptions in IPv6 multicast · e9823185

Patrick McHardy authored Jun 21, 2005

Netfilter assumes that skb->data == skb->nh.ipv6h
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

e9823185

[NETFILTER]: Kill nf_debug · 18b8afc7

Patrick McHardy authored Jun 21, 2005

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

18b8afc7

[NETFILTER]: Kill lockhelp.h · e45b1be8

Patrick McHardy authored Jun 21, 2005

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

e45b1be8

[IPV6]: multicast join and misc · c9e3e8b6

David L Stevens authored Jun 21, 2005

Here is a simplified version of the patch to fix a bug in IPv6
multicasting. It:

1) adds existence check & EADDRINUSE error for regular joins
2) adds an exception for EADDRINUSE in the source-specific multicast
        join (where a prior join is ok)
3) adds a missing/needed read_lock on sock_mc_list; would've raced
        with destroying the socket on interface down without
4) adds a "leave group" in the (INCLUDE, empty) source filter case.
        This frees unneeded socket buffer memory, but also prevents
        an inappropriate interaction among the 8 socket options that
        mess with this. Some would fail as if in the group when you
        aren't really.

Item #4 had a locking bug in the last version of this patch; rather than
removing the idev->lock read lock only, I've simplified it to remove
all lock state in the path and treat it as a direct "leave group" call for
the (INCLUDE,empty) case it covers. Tested on an MP machine. :-)

Much thanks to HoerdtMickael <hoerdt@clarinet.u-strasbg.fr> who
reported the original bug.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9e3e8b6

[IPV6]: V6 route events reported with wrong netlink PID and seq number · 0d51aa80

Jamal Hadi Salim authored Jun 21, 2005

Essentially netlink at the moment always reports a pid and sequence of 0
always for v6 route activities. 
To understand the repurcassions of this look at:
http://lists.quagga.net/pipermail/quagga-dev/2005-June/003507.html

While fixing this, i took the liberty to resolve the outstanding issue
of IPV6 routes inserted via ioctls to have the correct pids as well.

This patch tries to behave as close as possible to the v4 routes i.e
maintains whatever PID the socket issuing the command owns as opposed to
the process. That made the patch a little bulky.

I have tested against both netlink derived utility to add/del routes as
well as ioctl derived one. The Quagga folks have tested against quagga.
This fixes the problem and so far hasnt been detected to introduce any
new issues.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d51aa80

[IPV4]: Add LC-Trie FIB lookup algorithm. · 19baf839

Robert Olsson authored Jun 21, 2005

Signed-off-by: Robert Olsson <Robert.Olsson@data.slu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

19baf839

[NETLINK]: netlink_callback structure needs 5 args not 4 · 18b504e2
Alexey Kuznetsov authored Jun 21, 2005
```
net/ipv4/tcp_diag.c uses up to ->args[4]
Signed-off-by: David S. Miller <davem@davemloft.net>
```
18b504e2

20 Jun, 2005 7 commits

Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 · 1d345dac
Linus Torvalds authored Jun 20, 2005

1d345dac
Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · fb395884
Linus Torvalds authored Jun 20, 2005

fb395884
[PATCH] PCI: fix show_modalias() function due to attribute change · 87c8a443
Greg Kroah-Hartman authored Jun 19, 2005
```
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
```
87c8a443
[PATCH] USB: fix show_modalias() function due to attribute change · 4893e9d1
Greg Kroah-Hartman authored Jun 19, 2005
```
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
```
4893e9d1

[PATCH] SYSFS: fix PAGE_SIZE check · 9d9d27fb

Jon Smirl authored Jun 14, 2005

Without this change I can't set an attribute exactly PAGE_SIZE in
length. There is no need for zero termination because the interface
uses lengths.

From: Jon Smirl <jonsmirl@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

9d9d27fb

[PATCH] Driver core: Don't "lose" devices on suspend on failure · 42b16c05

Benjamin Herrenschmidt authored May 31, 2005

I think we need this patch or we might "lose" devices to the dpm_irq_off
list if a failure occurs during the suspend process.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

42b16c05

[PATCH] sysfs-iattr: set inode attributes · 8215534c

Maneesh Soni authored May 31, 2005

o Following patch sets the attributes for newly allocated inodes for sysfs
objects. If the object has non-default attributes, inode attributes are
set as saved in sysfs_dirent->s_iattr, pointer to struct iattr.
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

8215534c