- 13 Nov, 2009 1 commit
-
-
Lee Schermerhorn authored
the task modifying the number of persistent huge pages to control the allocation, freeing and adjusting of surplus huge pages when the pool page count is modified via the new sysctl or sysfs attribute "nr_hugepages_mempolicy". The nodes_allowed mask is derived as follows: * For "default" [NULL] task mempolicy, a NULL nodemask_t pointer is produced. This will cause the hugetlb subsystem to use node_online_map as the "nodes_allowed". This preserves the behavior before this patch. * For "preferred" mempolicy, including explicit local allocation, a nodemask with the single preferred node will be produced. "local" policy will NOT track any internode migrations of the task adjusting nr_hugepages. * For "bind" and "interleave" policy, the mempolicy's nodemask will be used. * Other than to inform the construction of the nodes_allowed node mask, the actual mempolicy mode is ignored. That is, all modes behave like interleave over the resulting nodes_allowed mask with no "fallback". See the updated documentation [next patch] for more information about the implications of this patch. Examples: Starting with: Node 0 HugePages_Total: 0 Node 1 HugePages_Total: 0 Node 2 HugePages_Total: 0 Node 3 HugePages_Total: 0 Default behavior [with or without this patch] balances persistent hugepage allocation across nodes [with sufficient contiguous memory]: sysctl vm.nr_hugepages[_mempolicy]=32 yields: Node 0 HugePages_Total: 8 Node 1 HugePages_Total: 8 Node 2 HugePages_Total: 8 Node 3 HugePages_Total: 8 Of course, we only have nr_hugepages_mempolicy with the patch, but with default mempolicy, nr_hugepages_mempolicy behaves the same as nr_hugepages. Applying mempolicy--e.g., with numactl [using '-m' a.k.a. '--membind' because it allows multiple nodes to be specified and it's easy to type]--we can allocate huge pages on individual nodes or sets of nodes. So, starting from the condition above, with 8 huge pages per node, add 8 more to node 2 using: numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40 This yields: Node 0 HugePages_Total: 8 Node 1 HugePages_Total: 8 Node 2 HugePages_Total: 16 Node 3 HugePages_Total: 8 The incremental 8 huge pages were restricted to node 2 by the specified mempolicy. Similarly, we can use mempolicy to free persistent huge pages from specified nodes: numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32 yields: Node 0 HugePages_Total: 4 Node 1 HugePages_Total: 4 Node 2 HugePages_Total: 16 Node 3 HugePages_Total: 8 The 8 huge pages freed were balanced over nodes 0 and 1. [rientjes@google.com: accomodate reworked NODEMASK_ALLOC] Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Oct, 2009 5 commits
-
-
Lee Schermerhorn authored
This will be used to populate the huge pages "nodes_allowed" nodemask for a single node when basing nodes_allowed on a preferred/local mempolicy or when a persistent huge page pool page count is modified via a per node sysfs attribute. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
David Rientjes authored
> @@ -1144,14 +1156,15 @@ static void __init report_hugepages(void > } > > #ifdef CONFIG_HIGHMEM > -static void try_to_free_low(struct hstate *h, unsigned long count) > +static void try_to_free_low(struct hstate *h, unsigned long count, > + nodemask_t *nodes_allowed) > { > int i; > > if (h->order >= MAX_ORDER) > return; > > - for (i = 0; i < MAX_NUMNODES; ++i) { > + for_each_node_mask(node, nodes_allowed_) { > struct page *page, *next; > struct list_head *freel = &h->hugepage_freelists[i]; > list_for_each_entry_safe(page, next, freel, lru) { That's not looking good for i386, Andrew please fold the following into this patch when it's merged into -mm: [rientjes@google.com: fix HIGHMEM compile error] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lee Schermerhorn authored
controlling task's numa mempolicy, add a "nodes_allowed" nodemask pointer to the allocate, free and surplus adjustment functions. For now, pass NULL to indicate default behavior--i.e., use node_online_map. A subsqeuent patch will derive a non-default mask from the controlling task's numa mempolicy. Note that this method of updating the global hstate nr_hugepages under the constraint of a nodemask simplifies keeping the global state consistent--especially the number of persistent and surplus pages relative to reservations and overcommit limits. There are undoubtedly other ways to do this, but this works for both interfaces: mempolicy and per node attributes. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Lee Schermerhorn authored
obtain the "start_nid". Then, whereas prior to this patch we unconditionally called hstate_next_node_to_{alloc|free}(), whether or not we successfully allocated/freed a huge page on the node, now we only call these functions on failure to alloc/free to advance to next allowed node. Factor out the next_node_allowed() function to handle wrap at end of node_online_map. In this version, the allowed nodes include all of the online nodes. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
David Rientjes authored
allocation and freeing of persistent huge pages on a NUMA platform. Please consider for merging into mmotm. This series uses two mechanisms to constrain the nodes from which persistent huge pages are allocated: 1) the task NUMA mempolicy of the task modifying a new sysctl "nr_hugepages_mempolicy", based on a suggestion by Mel Gorman; and 2) a subset of the hugepages hstate sysfs attributes have been added [in V4] to each node system device under: /sys/devices/node/node[0-9]*/hugepages The per node attibutes allow direct assignment of a huge page count on a specific node, regardless of the task's mempolicy or cpuset constraints. This patch: NODEMASK_ALLOC(x, m) assumes x is a type of struct, which is unnecessary. It's perfectly reasonable to use this macro to allocate a nodemask_t, which is anonymous, either dynamically or on the stack depending on NODES_SHIFT. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Nov, 2009 1 commit
-
-
KOSAKI Motohiro authored
in right after isolate_page(). This patch does it. Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 25 Sep, 2009 4 commits
-
-
Wu Fengguang authored
Cc: Andi Kleen <ak@linux.intel.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wu Fengguang authored
> if (!kbuf) > return wrote ? wrote : -ENOMEM; > while (count > 0) { > - int len = size_inside_page(p, count); > + unsigned long sz = size_inside_page(p, count); > > - written = copy_from_user(kbuf, buf, len); > - if (written) { > + sz = copy_from_user(kbuf, buf, sz); Sorry, it introduced a bug: the "sz" will be zero in normal, > + if (sz) { > if (wrote + virtr) > break; > free_page((unsigned long)kbuf); > return -EFAULT; > } > - len = vwrite(kbuf, (char *)p, len); > + sz = vwrite(kbuf, (char *)p, sz); and get passed to vwrite here. This patch fixes it, the new var "n" will be used in another bug fixing patch following this one. Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wu Fengguang authored
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wu Fengguang authored
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 14 Sep, 2009 1 commit
-
-
Andrew Morton authored
Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Sep, 2009 1 commit
-
-
Andrew Morton authored
Cc: Andi Kleen <ak@linux.intel.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 12 Sep, 2009 3 commits
-
-
Wu Fengguang authored
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wu Fengguang authored
Also apply it to /dev/kmem, whose alignment logic was buggy. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Wu Fengguang authored
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 11 Nov, 2009 4 commits
-
-
Vincent Li authored
movement, either. Already, in commit 5343daceec, KOSAKI did it about shrink_inactive_list. This patch removes unnecessary overhead of page accounting and locking in shrink_active_list as follow-up work of commit 5343daceec. Signed-off-by: Vincent Li <macli@brc.ubc.ca> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Andrew Morton authored
#116: FILE: mm/mmap.c:1835: +static int __split_vma(struct mm_struct * mm, struct vm_area_struct * vma, ERROR: "foo * bar" should be "foo *bar" #138: FILE: mm/mmap.c:1888: +int split_vma(struct mm_struct * mm, struct vm_area_struct * vma, total: 2 errors, 0 warnings, 67 lines checked ./patches/mmap-dont-return-enomem-when-mapcount-is-temporarily-exceeded-in-munmap.patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
KOSAKI Motohiro authored
library called abort(). ======================================================== (gdb) bt #0 0xa000000000010620 in __kernel_syscall_via_break () #1 0x20000000003208e0 in raise () from /lib/libc.so.6.1 #2 0x2000000000324090 in abort () from /lib/libc.so.6.1 #3 0x200000000027c3e0 in __deallocate_stack () from /lib/libpthread.so.0 #4 0x200000000027f7c0 in start_thread () from /lib/libpthread.so.0 #5 0x200000000047ef60 in __clone2 () from /lib/libc.so.6.1 ======================================================== The fact is, glibc call munmap() when thread exitng time for freeing stack, and it assume munlock() never fail. However, munmap() often make vma splitting and it with many mapcount make -ENOMEM. Oh well, that's crazy, because stack unmapping never increase mapcount. The maxcount exceeding is only temporary. internal temporary exceeding shouldn't make ENOMEM. This patch does it. test_max_mapcount.c ================================================================== #include<stdio.h> #include<stdlib.h> #include<string.h> #include<pthread.h> #include<errno.h> #include<unistd.h> #define THREAD_NUM 30000 #define MAL_SIZE (8*1024*1024) void *wait_thread(void *args) { void *addr; addr = malloc(MAL_SIZE); sleep(10); return NULL; } void *wait_thread2(void *args) { sleep(60); return NULL; } int main(int argc, char *argv[]) { int i; pthread_t thread[THREAD_NUM], th; int ret, count = 0; pthread_attr_t attr; ret = pthread_attr_init(&attr); if(ret) { perror("pthread_attr_init"); } ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); if(ret) { perror("pthread_attr_setdetachstate"); } for (i = 0; i < THREAD_NUM; i++) { ret = pthread_create(&th, &attr, wait_thread, NULL); if(ret) { fprintf(stderr, "[%d] ", count); perror("pthread_create"); } else { printf("[%d] create OK.\n", count); } count++; ret = pthread_create(&thread[i], &attr, wait_thread2, NULL); if(ret) { fprintf(stderr, "[%d] ", count); perror("pthread_create"); } else { printf("[%d] create OK.\n", count); } count++; } sleep(3600); return 0; } ================================================================== Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Alex Chiang authored
take quite a long time, which is unreasonable considering the user only wants a description of the flags: # time ./page-types -d 0x10 0x0000000000000010 ____D_____________________________ dirty real 0m34.285s user 0m1.966s sys 0m32.313s This is because we still walk the entire address range. Exiting early seems like a reasonble solution: # time ./page-types -d 0x10 0x0000000000000010 ____D_____________________________ dirty real 0m0.007s user 0m0.001s sys 0m0.005s Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Haicheng Li <haicheng.li@intel.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 17 Nov, 2009 1 commit
-
-
Alex Chiang authored
Signed-off-by: Alex Chiang <achiang@hp.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 11 Nov, 2009 1 commit
-
-
Wu Fengguang authored
line. Why is this useful? For instance, if you're using memory hotplug and see this in /var/log/messages: kernel: removing from LRU failed 3836dd0/1/1e00000000000010 It would be nice to decode those page flags without staring at the source. Example usage and output: # Documentation/vm/page-types -d 0x10 0x0000000000000010 ____D_____________________________ dirty # Documentation/vm/page-types -d anon 0x0000000000001000 ____________a_____________________ anonymous # Documentation/vm/page-types -d anon,0x10 0x0000000000001010 ____D_______a_____________________ dirty,anonymous [achiang@hp.com: documentation] Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Haicheng Li <haicheng.li@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 12 Oct, 2009 2 commits
-
-
Roel Kluin authored
will not work. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Tommi Rantala authored
Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 11 Nov, 2009 1 commit
-
-
Mel Gorman authored
unexpected but recoverable situation. A counter records how often this event happens but it is easy to miss that this event has occured at all. This patch warns once when PG_mlocked is set to prompt debuggers to check the counter to see how often it is happening. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 21 Sep, 2009 1 commit
-
-
Hisashi Hifumi authored
is unpluged to improve throughput on especially RAID environment. The normal case is, if page N become uptodate at time T(N), then T(N) <= T(N+1) holds. With RAID (and NFS to some degree), there is no strict ordering, the data arrival time depends on runtime status of individual disks, which breaks that formula. So in do_generic_file_read(), just after submitting the async readahead IO request, the current page may well be uptodate, so the page won't be locked, and the block device won't be implicitly unplugged: if (PageReadahead(page)) page_cache_async_readahead() if (!PageUptodate(page)) goto page_not_up_to_date; //... page_not_up_to_date: lock_page_killable(page); Therefore explicit unplugging can help. Following is the test result with dd. #dd if=testdir/testfile of=/dev/null bs=16384 -2.6.30-rc6 1048576+0 records in 1048576+0 records out 17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s -2.6.30-rc6-patched 1048576+0 records in 1048576+0 records out 17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s (7Disks RAID-0 Array) -2.6.30-rc6 1054976+0 records in 1054976+0 records out 17284726784 bytes (17 GB) copied, 212.233 seconds, 81.4 MB/s -2.6.30-rc6-patched 1054976+0 records out 17284726784 bytes (17 GB) copied, 198.878 seconds, 86.9 MB/s (7Disks RAID-5 Array) The patch was found to improve performance with the SCST scsi target driver. See http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel [akpm@linux-foundation.org: unbust comment layout] [akpm@linux-foundation.org: "fix" CONFIG_BLOCK=n] Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: Ronald <intercommit@gmail.com> Cc: Bart Van Assche <bart.vanassche@gmail.com> Cc: Vladislav Bolkhovitin <vst@vlnb.net> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Oct, 2009 1 commit
-
-
David Rientjes authored
and gfp mask, current's cpuset and memory controller, call trace, and VM state information is currently only shown when the oom killer has selected a task to kill. This information is omitted, however, when the oom killer panics either because of panic_on_oom sysctl settings or when no killable task was found. It is still relevant to know crucial pieces of information such as the allocation order and VM state when diagnosing such issues, especially at boot. This patch displays the oom killer header whenever it panics so that bug reports can include pertinent information to debug the issue, if possible. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Nov, 2009 1 commit
-
-
james toy authored
- add -mmN to EXTRAVERSION - Add a marker to make the v4l build environment happier Signed-off-by: Michael Krufky <mkrufky@m1k.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 11 Nov, 2009 1 commit
-
-
Andrew Morton authored
behave well if passed a NULL pointer. So avoid calling it until we have verified that its arg is not NULL. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 30 Sep, 2009 1 commit
-
-
Jaswinder Singh Rajput authored
arch/xtensa/kernel/vectors.S: asm/processor.h is included more than once. arch/xtensa/kernel/vectors.S: asm/ptrace.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Aug, 2009 1 commit
-
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 24 Jul, 2009 1 commit
-
-
Amerigo Wang authored
xtensa_pipe() for xtensa. Signed-off-by: WANG Cong <amwang@redhat.com> Reviewed-by: Johannes Weiner <jw@emlix.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 17 Nov, 2009 1 commit
-
-
Jeff Layton authored
Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 30 Sep, 2009 2 commits
-
-
Sage Weil authored
a bit more descriptive. Fix a few kernel style problems. No functional changes. Cc: Ian Kent <raven@themaw.net> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Signed-off-by: Yehuda Sadeh <yehuda@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Sage Weil authored
the cache is re-populated while waiting for i_mutex, it may find that a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment). Previously, real_lookup() would drop i_mutex and do_revalidate() again. If revalidate failed _again_, however, it would give up with -ENOENT. The problem here that network file systems may be invalidating dentries via server callbacks, e.g. due to concurrent access from another client, and -ENOENT is frequently the wrong answer. This problem has been seen with both Lustre and Ceph. It seems possible to hit this case with NFS as well if the cache lifetime is very short. Instead, we should do_revalidate() while i_mutex is still held. If revalidation fails, we can move on to a ->lookup() and ensure a correct result without worrying about any subsequent races. Note that do_revalidate() is called with i_mutex held elsewhere. For example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(), and possibly others all take the directory i_mutex, and then -> lookup_hash -> __lookup_hash -> cached_lookup -> do_revalidate so this does not introduce any new locking rules for d_revalidate implementations. Yes, the goto is ugly. A cleanup patch follows. Cc: Ian Kent <raven@themaw.net> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Signed-off-by: Yehuda Sadeh <yehuda@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 25 Sep, 2009 1 commit
-
-
Nick Piggin authored
Fixes a problem reported by Jorge Boncompte who is seeing corruption trying to snapshot a minix filesystem image. Some filesystems modify their metadata via a path other than the bdev buffer cache (eg. they may use a private linear mapping for their metadata, or implement directories in pagecache, etc). Also, file data modifications usually go to the bdev via their own mappings. These updates are not coherent with buffercache IO (eg. via /dev/bdev) and never have been. However there could be a reasonable expectation that after a mount -oremount,ro operation then the buffercache should subsequently be coherent with previous filesystem modifications. So invalidate the bdev mappings on a remount,ro operation to provide a coherency point. The problem was exposed when we switched the old rd to brd because old rd didn't really function like a normal block device and updates to rd via mappings other than the buffercache would still end up going into its buffercache. But the same problem has always affected other "normal" block devices, including loop. [akpm@linux-foundation.org: repair comment layout] Reported-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net> Tested-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 11 Nov, 2009 4 commits
-
-
Nick Piggin authored
DCACHE_UNHASHED in order to have a working /proc/$pid/fd/XXX. Nothing in proc prevents the fd link from being used if its dentry is not in the hash. Also, it does not get put into the dcache hash if DCACHE_UNHASHED is clear; that depends on the filesystem calling d_add or d_rehash. So delete the misleading comments and needless code. Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Roland Dreier authored
> [ INFO: possible recursive locking detected ] > 2.6.31-2-generic #14~rbd3 > --------------------------------------------- > firefox-3.5/4162 is trying to acquire lock: > (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0 > > but task is already holding lock: > (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0 > > other info that might help us debug this: > 3 locks held by firefox-3.5/4162: > #0: (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0 > #1: (&sb->s_type->i_mutex_key#11/1){+.+.+.}, at: [<ffffffff81139d5a>] lock_rename+0x6a/0xf0 > #2: (&sb->s_type->i_mutex_key#11/2){+.+.+.}, at: [<ffffffff81139d6f>] lock_rename+0x7f/0xf0 > > stack backtrace: > Pid: 4162, comm: firefox-3.5 Tainted: G C 2.6.31-2-generic #14~rbd3 > Call Trace: > [<ffffffff8108ae74>] print_deadlock_bug+0xf4/0x100 > [<ffffffff8108ce26>] validate_chain+0x4c6/0x750 > [<ffffffff8108d2e7>] __lock_acquire+0x237/0x430 > [<ffffffff8108d585>] lock_acquire+0xa5/0x150 > [<ffffffff81139d31>] ? lock_rename+0x41/0xf0 > [<ffffffff815526ad>] __mutex_lock_common+0x4d/0x3d0 > [<ffffffff81139d31>] ? lock_rename+0x41/0xf0 > [<ffffffff81139d31>] ? lock_rename+0x41/0xf0 > [<ffffffff8120eaf9>] ? ecryptfs_rename+0x99/0x170 > [<ffffffff81552b36>] mutex_lock_nested+0x46/0x60 > [<ffffffff81139d31>] lock_rename+0x41/0xf0 > [<ffffffff8120eb2a>] ecryptfs_rename+0xca/0x170 > [<ffffffff81139a9e>] vfs_rename_dir+0x13e/0x160 > [<ffffffff8113ac7e>] vfs_rename+0xee/0x290 > [<ffffffff8113c212>] ? __lookup_hash+0x102/0x160 > [<ffffffff8113d512>] sys_renameat+0x252/0x280 > [<ffffffff81133eb4>] ? cp_new_stat+0xe4/0x100 > [<ffffffff8101316a>] ? sysret_check+0x2e/0x69 > [<ffffffff8108c34d>] ? trace_hardirqs_on_caller+0x14d/0x190 > [<ffffffff8113d55b>] sys_rename+0x1b/0x20 > [<ffffffff81013132>] system_call_fastpath+0x16/0x1b The trace above is totally reproducible by doing a cross-directory rename on an ecryptfs directory. The issue seems to be that sys_renameat() does lock_rename() then calls into the filesystem; if the filesystem is ecryptfs, then ecryptfs_rename() again does lock_rename() on the lower filesystem, and lockdep can't tell that the two s_vfs_rename_mutexes are different. It seems an annotation like the following is sufficient to fix this (it does get rid of the lockdep trace in my simple tests); however I would like to make sure I'm not misunderstanding the locking, hence the CC list... Signed-off-by: Roland Dreier <rdreier@cisco.com> Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Tony Battersby authored
about needing a prior refcnt (judging by the way it is actually used). Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Al Viro authored
1) fs/compat_ioctl.c has COMPATIBLE_IOCTL(RAW_SETBIND) followed by HANDLE_IOCTL(RAW_SETBIND, raw_ioctl). The latter is ignored. 2) on amd64 (and itanic) the damn thing is broken - we have int + u64 + u64 and layouts on i386 and amd64 are _not_ the same. raw_ioctl() would work there, but it's never called due to (1). As it is, i386 /sbin/raw definitely doesn't work on amd64 boxen. 3) switching to raw_ioctl() as is would *not* work on e.g. sparc64 and ppc64, which would be rather sad, seeing that normal userland there is 32bit. The thing is, slapping __packed on the struct in question does not DTRT - it eliminates *all* padding. The real solution is to use compat_u64. 4) of course, all that stuff has no business being outside of raw.c in the first place - there should be ->compat_ioctl() for /dev/rawctl instead of messing with compat_ioctl.c. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-