1. 13 Oct, 2009 5 commits
    • David Rientjes's avatar
      On Thu, 8 Oct 2009, Lee Schermerhorn wrote: · e651309a
      David Rientjes authored
      > @@ -1144,14 +1156,15 @@ static void __init report_hugepages(void
      >  }
      >
      >  #ifdef CONFIG_HIGHMEM
      > -static void try_to_free_low(struct hstate *h, unsigned long count)
      > +static void try_to_free_low(struct hstate *h, unsigned long count,
      > +						nodemask_t *nodes_allowed)
      >  {
      >  	int i;
      >
      >  	if (h->order >= MAX_ORDER)
      >  		return;
      >
      > -	for (i = 0; i < MAX_NUMNODES; ++i) {
      > +	for_each_node_mask(node, nodes_allowed_) {
      >  		struct page *page, *next;
      >  		struct list_head *freel = &h->hugepage_freelists[i];
      >  		list_for_each_entry_safe(page, next, freel, lru) {
      
      That's not looking good for i386, Andrew please fold the following into
      this patch when it's merged into -mm:
      
      [rientjes@google.com: fix HIGHMEM compile error]
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e651309a
    • Lee Schermerhorn's avatar
      In preparation for constraining huge page allocation and freeing by the · d1945f93
      Lee Schermerhorn authored
      controlling task's numa mempolicy, add a "nodes_allowed" nodemask pointer
      to the allocate, free and surplus adjustment functions.  For now, pass
      NULL to indicate default behavior--i.e., use node_online_map.  A
      subsequent patch will derive a non-default mask from the controlling
      task's numa mempolicy.
      
      Note that this method of updating the global hstate nr_hugepages under the
      constraint of a nodemask simplifies keeping the global state
      consistent--especially the number of persistent and surplus pages relative
      to reservations and overcommit limits.  There are undoubtedly other ways
      to do this, but this works for both interfaces: mempolicy and per node
      attributes.
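The "NULL means default" convention described above can be sketched in user space as follows. This is only an illustration: the bitmask representation, `node_online_map_model` and `effective_nodes()` are hypothetical stand-ins, not the kernel's nodemask API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical online-node bitmask for this sketch: nodes 0-3 are online. */
static const unsigned int node_online_map_model = 0xFu;

/*
 * Resolve the mask a caller passed in. A NULL pointer selects the default
 * behavior, i.e. all online nodes; otherwise the caller's mask is used.
 */
unsigned int effective_nodes(const unsigned int *nodes_allowed)
{
	if (nodes_allowed == NULL)
		return node_online_map_model;
	return *nodes_allowed;
}
```

Passing a pointer rather than a mask by value lets existing callers opt into the default without building a mask of their own.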
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d1945f93
    • Lee Schermerhorn's avatar
      Modify the hstate_next_node* functions to allow them to be called to · 993a43a5
      Lee Schermerhorn authored
      obtain the "start_nid".  Then, whereas prior to this patch we
      unconditionally called hstate_next_node_to_{alloc|free}(), whether or not
      we successfully allocated/freed a huge page on the node, now we only call
      these functions on failure to alloc/free to advance to next allowed node.
      
      Factor out the next_node_allowed() function to handle wrap at end of
      node_online_map.  In this version, the allowed nodes include all of the
      online nodes.
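A minimal user-space sketch of the behavior described above, assuming a small bitmask in place of a nodemask. `hstate_next_node()`, `try_alloc_on()` and `alloc_one()` are hypothetical names for this model; only `next_node_allowed()` is named in the text, and its signature here is an assumption.

```c
#include <assert.h>

#define MAX_NODES 8	/* assumed small limit; node ids are 0..MAX_NODES-1 */

/*
 * Return the next allowed node after nid, wrapping around at the end of
 * the mask. `allowed` must have at least one of its low MAX_NODES bits set.
 */
int next_node_allowed(int nid, unsigned int allowed)
{
	assert(allowed != 0);
	for (int step = 1; step <= MAX_NODES; step++) {
		int cand = (nid + step) % MAX_NODES;
		if (allowed & (1u << cand))
			return cand;
	}
	return nid;	/* not reached while the precondition holds */
}

/* Return the node to try now and move the saved cursor past it. */
int hstate_next_node(int *cursor, unsigned int allowed)
{
	int nid = *cursor;

	if (!(allowed & (1u << nid)))
		nid = next_node_allowed(nid, allowed);
	*cursor = next_node_allowed(nid, allowed);
	return nid;
}

/* Hypothetical allocator: succeeds while the node still has free pages. */
static int try_alloc_on(int nid, int *free_pages)
{
	if (free_pages[nid] > 0) {
		free_pages[nid]--;
		return 1;
	}
	return 0;
}

/*
 * Allocate one page. The cursor is consulted once to obtain the starting
 * node, and again only after a failed attempt. Returns the node used, or
 * -1 once every allowed node has been tried.
 */
int alloc_one(int *cursor, unsigned int allowed, int *free_pages)
{
	int start = hstate_next_node(cursor, allowed);
	int nid = start;

	do {
		if (try_alloc_on(nid, free_pages))
			return nid;
		nid = hstate_next_node(cursor, allowed);
	} while (nid != start);
	return -1;
}
```

Because the cursor moves only when the start node is fetched or an attempt fails, successive calls still interleave across nodes, while a successful attempt no longer skips a node.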
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      993a43a5
    • David Rientjes's avatar
      This is a series of patches to provide control over the location of the · 9d9f5506
      David Rientjes authored
      allocation and freeing of persistent huge pages on a NUMA platform. 
      Please consider for merging into mmotm.
      
      This series uses two mechanisms to constrain the nodes from which
      persistent huge pages are allocated: 1) the task NUMA mempolicy of the
      task modifying a new sysctl "nr_hugepages_mempolicy", based on a
      suggestion by Mel Gorman; and 2) a subset of the hugepages hstate sysfs
      attributes have been added [in V4] to each node system device under:
      
      	/sys/devices/node/node[0-9]*/hugepages
      
      The per node attributes allow direct assignment of a huge page count on a
      specific node, regardless of the task's mempolicy or cpuset constraints.  
      
      
      This patch:
      
      NODEMASK_ALLOC(x, m) assumes x is a type of struct, which is unnecessary. 
      It's perfectly reasonable to use this macro to allocate a nodemask_t,
      which is anonymous, either dynamically or on the stack depending on
      NODES_SHIFT.
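The idea can be sketched in user space roughly as below. This is not the kernel's definition: the threshold, the `calloc()`/`free()` calls and the `nodemask_t` layout are assumptions made so the example compiles on its own. The point is that the macro takes a complete type name, so a typedef'd anonymous struct works as well as `struct foo`.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed values for illustration; the kernel takes these from its config. */
#define NODES_SHIFT 10
#define MAX_NUMNODES (1 << NODES_SHIFT)

/* Stand-in for nodemask_t: an anonymous struct behind a typedef. */
typedef struct {
	unsigned long bits[MAX_NUMNODES / (8 * sizeof(unsigned long))];
} nodemask_t;

#if NODES_SHIFT > 8
/* Large masks: allocate dynamically. x is a complete type name. */
#define NODEMASK_ALLOC(x, m)	x *m = calloc(1, sizeof(*m))
#define NODEMASK_FREE(m)	free(m)
#else
/* Small masks: keep the object on the stack and point at it. */
#define NODEMASK_ALLOC(x, m)	x _##m = {0}, *m = &_##m
#define NODEMASK_FREE(m)	do { } while (0)
#endif

/* Returns 1 if a zeroed mask could be obtained and written, 0 otherwise. */
int nodemask_alloc_demo(void)
{
	NODEMASK_ALLOC(nodemask_t, mask);
	int ok;

	if (!mask)
		return 0;
	mask->bits[0] |= 1UL;			/* mark node 0 */
	ok = (mask->bits[0] & 1UL) != 0;
	NODEMASK_FREE(mask);
	return ok;
}
```

With a macro that prepended `struct` to its first argument, `NODEMASK_ALLOC(nodemask_t, mask)` would expand to `struct nodemask_t *mask`, which does not name the typedef.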
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9d9f5506
    • KOSAKI Motohiro's avatar
      Christoph pointed out inc_zone_page_state(NR_ISOLATED) should be placed · 4f50eaca
      KOSAKI Motohiro authored
      right after isolate_page().
      
      This patch does it.
      Acked-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4f50eaca
  2. 25 Sep, 2009 4 commits
  3. 15 Sep, 2009 1 commit
  4. 12 Sep, 2009 4 commits
  5. 04 Sep, 2009 1 commit
  6. 12 Oct, 2009 5 commits
    • Andrew Morton's avatar
      ERROR: "foo * bar" should be "foo *bar" · 9e00c8bd
      Andrew Morton authored
      #116: FILE: mm/mmap.c:1835:
      +static int __split_vma(struct mm_struct * mm, struct vm_area_struct * vma,
      
      ERROR: "foo * bar" should be "foo *bar"
      #138: FILE: mm/mmap.c:1888:
      +int split_vma(struct mm_struct * mm, struct vm_area_struct * vma,
      
      total: 2 errors, 0 warnings, 67 lines checked
      
      ./patches/mmap-dont-return-enomem-when-mapcount-is-temporarily-exceeded-in-munmap.patch has style problems, please review.  If any of these errors
      are false positives report them to the maintainer, see
      CHECKPATCH in MAINTAINERS.
      
      Please run checkpatch prior to sending patches
      
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9e00c8bd
    • KOSAKI Motohiro's avatar
      On ia64, the following test program exits abnormally, because the glibc thread · 6c431a13
      KOSAKI Motohiro authored
      library calls abort().
      
       ========================================================
       (gdb) bt
       #0  0xa000000000010620 in __kernel_syscall_via_break ()
       #1  0x20000000003208e0 in raise () from /lib/libc.so.6.1
       #2  0x2000000000324090 in abort () from /lib/libc.so.6.1
       #3  0x200000000027c3e0 in __deallocate_stack () from /lib/libpthread.so.0
       #4  0x200000000027f7c0 in start_thread () from /lib/libpthread.so.0
       #5  0x200000000047ef60 in __clone2 () from /lib/libc.so.6.1
       ========================================================
      
      The cause is that glibc calls munmap() at thread exit time to free the
      stack, and it assumes munmap() never fails.  However, munmap() often has
      to split a vma, and when the map count is already high the split fails
      with -ENOMEM.
      
      That is wrong, because unmapping a stack never increases the map count. 
      The limit is exceeded only temporarily, and an internal, temporary excess
      shouldn't produce ENOMEM.
      
      This patch does it.
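One way to picture the rule is the model below, which treats a single vma as the range [vs, ve) and an munmap() request as [start, end) inside it. The diff itself is not shown here, so the "new" check is an assumption about the shape of the fix: fail only when the unmap would leave more mappings than before, not whenever a split momentarily pushes the count over the limit. Both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/*
 * Old rule as described in the text: any split is refused once the map
 * count has reached the limit, even if the count drops again afterwards.
 */
int munmap_check_old(long vs, long ve, long start, long end,
		     int map_count, int max_map_count)
{
	assert(vs <= start && start < end && end <= ve);
	int splits = (start > vs) + (end < ve);

	if (splits > 0 && map_count >= max_map_count)
		return -ENOMEM;
	return 0;
}

/*
 * Assumed new rule: only a hole punched in the middle of a vma leaves one
 * more mapping behind, so only that case is checked against the limit.
 */
int munmap_check_new(long vs, long ve, long start, long end,
		     int map_count, int max_map_count)
{
	assert(vs <= start && start < end && end <= ve);

	if (start > vs && end < ve && map_count >= max_map_count)
		return -ENOMEM;
	return 0;
}
```

Trimming the head or tail of a vma needs one split and then removes one piece, so the final count is unchanged, and unmapping a whole vma lowers it by one.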
      
       test_max_mapcount.c
       ==================================================================
        #include<stdio.h>
        #include<stdlib.h>
        #include<string.h>
        #include<pthread.h>
        #include<errno.h>
        #include<unistd.h>
      
        #define THREAD_NUM 30000
        #define MAL_SIZE (8*1024*1024)
      
       void *wait_thread(void *args)
       {
       	void *addr;
      
       	addr = malloc(MAL_SIZE);
       	sleep(10);
      
       	return NULL;
       }
      
       void *wait_thread2(void *args)
       {
       	sleep(60);
      
       	return NULL;
       }
      
       int main(int argc, char *argv[])
       {
       	int i;
       	pthread_t thread[THREAD_NUM], th;
       	int ret, count = 0;
       	pthread_attr_t attr;
      
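       	ret = pthread_attr_init(&attr);
       	if (ret) {
       		/* pthread functions return the error code; they do not set errno */
       		fprintf(stderr, "pthread_attr_init: %s\n", strerror(ret));
       	}
      
       	ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
       	if (ret) {
       		fprintf(stderr, "pthread_attr_setdetachstate: %s\n", strerror(ret));
       	}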
      
       	for (i = 0; i < THREAD_NUM; i++) {
       		ret = pthread_create(&th, &attr, wait_thread, NULL);
       		if (ret) {
       			/* report the returned error code, not errno */
       			fprintf(stderr, "[%d] pthread_create: %s\n", count, strerror(ret));
       		} else {
       			printf("[%d] create OK.\n", count);
       		}
       		count++;
      
       		ret = pthread_create(&thread[i], &attr, wait_thread2, NULL);
       		if (ret) {
       			fprintf(stderr, "[%d] pthread_create: %s\n", count, strerror(ret));
       		} else {
       			printf("[%d] create OK.\n", count);
       		}
       		count++;
       	}
      
       	sleep(3600);
       	return 0;
       }
       ==================================================================
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6c431a13
    • Roel Kluin's avatar
      If not signed, testing of the read() return value in this function · e7ff8c38
      Roel Kluin authored
      will not work.
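The message does not name the variable or function involved, so the following is only a generic illustration of the pitfall: a negative read() result stored in an unsigned variable can never compare below zero. `fake_read_error()` is a hypothetical stand-in for a failing read().

```c
#include <assert.h>
#include <sys/types.h>

/* Simulated failing read(): the real call returns -1 on error. */
static ssize_t fake_read_error(void)
{
	return -1;
}

/* Buggy pattern: the result is stored unsigned, so the check never fires. */
int detects_error_unsigned(void)
{
	unsigned long ret = (unsigned long)fake_read_error();

	return ret < 0;		/* always false for an unsigned type */
}

/* Fixed pattern: a signed type keeps the -1 visible to the check. */
int detects_error_signed(void)
{
	ssize_t ret = fake_read_error();

	return ret < 0;
}
```

Some compilers warn about the unsigned comparison when type-limit warnings are enabled, which is one way such bugs get spotted.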
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e7ff8c38
    • Tommi Rantala's avatar
      Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> · 28a5fc7f
      Tommi Rantala authored
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      28a5fc7f
    • Tommi Rantala's avatar
      Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> · dbd6585d
      Tommi Rantala authored
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      dbd6585d
  7. 24 Aug, 2009 1 commit
  8. 22 Sep, 2009 1 commit
    • Hisashi Hifumi's avatar
      I added blk_run_backing_dev on page_cache_async_readahead so readahead I/O · 1a9aa809
      Hisashi Hifumi authored
      is unplugged to improve throughput, especially in RAID environments.
      
      The normal case is, if page N becomes uptodate at time T(N), then T(N) <=
      T(N+1) holds.  With RAID (and NFS to some degree) there is no strict
      ordering: the data arrival time depends on the runtime status of the
      individual disks, which breaks that formula.  So in do_generic_file_read(),
      just after the async readahead IO request is submitted, the current page
      may well be uptodate already, so the page won't be locked and the block
      device won't be implicitly unplugged:
      
                      if (PageReadahead(page))
                              page_cache_async_readahead()
                      if (!PageUptodate(page))
                              goto page_not_up_to_date;
                      //...
      page_not_up_to_date:
                      lock_page_killable(page);
      
      Therefore explicit unplugging can help.
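The control flow above can be mimicked with a toy model. Nothing here is kernel code: `page_model`, `dev_model` and `read_page_model()` are hypothetical, and the plug is reduced to a single flag. It only shows why an already uptodate page leaves the queue plugged unless the readahead path unplugs it explicitly.

```c
#include <assert.h>

struct page_model {
	int readahead;	/* page carries the readahead marker */
	int uptodate;	/* page data has already arrived */
};

struct dev_model {
	int plugged;	/* 1 while queued I/O is being held back */
};

/* Returns 1 if the device is still plugged when the read path finishes. */
int read_page_model(const struct page_model *page, struct dev_model *dev,
		    int explicit_unplug)
{
	if (page->readahead) {
		dev->plugged = 1;		/* async readahead I/O queued */
		if (explicit_unplug)
			dev->plugged = 0;	/* models the added unplug call */
	}
	if (!page->uptodate)
		dev->plugged = 0;	/* locking the page unplugs implicitly */
	return dev->plugged;
}
```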
      
      Following is the test result with dd.
      
      #dd if=testdir/testfile of=/dev/null bs=16384
      
      -2.6.30-rc6
      1048576+0 records in
      1048576+0 records out
      17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s
      
      -2.6.30-rc6-patched
      1048576+0 records in
      1048576+0 records out
      17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s
      
      (7Disks RAID-0 Array)
      
      -2.6.30-rc6
      1054976+0 records in
      1054976+0 records out
      17284726784 bytes (17 GB) copied, 212.233 seconds, 81.4 MB/s
      
      -2.6.30-rc6-patched
      1054976+0 records in
      1054976+0 records out
      17284726784 bytes (17 GB) copied, 198.878 seconds, 86.9 MB/s
      
      (7Disks RAID-5 Array)
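The relative gain can be recomputed from the byte counts and elapsed times quoted above. The helpers below are a small sketch with hypothetical names; dd reports decimal megabytes, so the divisor is 1e6.

```c
#include <assert.h>

/* Throughput in decimal MB/s, the unit dd prints. */
double mb_per_sec(double bytes, double seconds)
{
	return bytes / seconds / 1e6;
}

/* Percentage gain when the same amount of data takes less time. */
double gain_percent(double base_seconds, double patched_seconds)
{
	return (base_seconds / patched_seconds - 1.0) * 100.0;
}
```

With the times above this gives roughly 8.6% for the RAID-0 run (224.182 s against 206.465 s) and roughly 6.7% for the RAID-5 run (212.233 s against 198.878 s).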
      
      The patch was found to improve performance with the SCST scsi target
      driver.  See
      http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel
      
      [akpm@linux-foundation.org: unbust comment layout]
      [akpm@linux-foundation.org: "fix" CONFIG_BLOCK=n]
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Acked-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Tested-by: Ronald <intercommit@gmail.com>
      Cc: Bart Van Assche <bart.vanassche@gmail.com>
      Cc: Vladislav Bolkhovitin <vst@vlnb.net>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1a9aa809
  9. 09 Oct, 2009 1 commit
  10. 28 Oct, 2009 2 commits
  11. 14 Oct, 2009 1 commit
  12. 06 Oct, 2009 1 commit
  13. 30 Sep, 2009 1 commit
  14. 24 Sep, 2009 1 commit
    • Feng Tang's avatar
      Recent hrtimer code will set the start info to a hrtimer only when that · 68ad718d
      Feng Tang authored
      flag is set, so the start info of every hrtimer stays uninitialised
      until someone runs "echo 1 > /proc/timer_stats", and /proc/timer_list
      shows entries like:
      
      active timers:
       #0: <c27d46b0>, tick_sched_timer, S:01, <(null)>, /-1
       # expires at 91062000000-91062000000 nsecs [in 156071 to 156071 nsecs]
       #1: <efb81b6c>, hrtimer_wakeup, S:01, <(null)>, /-1
       # expires at 91062300331-91062350331 nsecs [in 456402 to 506402 nsecs]
       #2: <efac9b6c>, hrtimer_wakeup, S:01, <(null)>, /-1
       # expires at 91068699811-91068749811 nsecs [in 6855882 to 6905882 nsecs]
       #3: <efacdb6c>, hrtimer_wakeup, S:01, <(null)>, /-1
       # expires at 91068755511-91068805511 nsecs [in 6911582 to 6961582 nsecs]
       #4: <efa95b6c>, hrtimer_wakeup, S:01, <(null)>, /-1
       # expires at 91068806066-91068856066 nsecs [in 6962137 to 7012137 nsecs]
       .....
      
      This patch fixes it.
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      68ad718d
  15. 16 Oct, 2009 1 commit
  16. 24 Sep, 2009 1 commit
    • Alexander Strakh's avatar
      Driver scsi_lib.c might sleep in atomic context, because it calls · c1b57296
      Alexander Strakh authored
      scsi_device_put under spin_lock_irqsave.
      
      drivers/scsi/scsi_lib.c:356:
      	spin_lock_irqsave(shost->host_lock, flags);
      	scsi_device_put(sdev);
      Path to might_sleep macro from scsi_device_put:
      1. scsi_device_put calls put_device at ./drivers/scsi/scsi.c:1111
      2. put_device calls kobject_put at ./drivers/base/core.c:1038
      3. kobject_put calls kref_put at ./lib/kobject.c
      4. kref_put may call the callback function kobject_release at ./lib/kref.c
      if the refcount becomes zero, which might sleep because it sends a user
      event. Details:
      	4.1 kobject_cleanup calls kobject_uevent at ./lib/kobject.c:555
      	4.2 kobject_uevent calls kobject_uevent_env at ./lib/kobject_uevent.c:282
      	4.3 kobject_uevent_env calls call_usermodehelper_exec at ./include/linux/kmod.h:83
      	4.4 call_usermodehelper_exec calls wait_for_completion at ./kernel/kmod.c:481
      	4.5 wait_for_completion calls wait_for_common at ./kernel/sched.c:5710
      	4.6 wait_for_common calls might_sleep at ./kernel/sched.c:5692
      
      Found by Linux Driver Verification project.
      
      Delete wrong sleeping function calls.
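The hazard can be modelled in user space with a counter standing in for "a spinlock is held". The diff is not shown here, so `run_fixed()` is only one plausible shape of a fix, namely dropping the reference before taking the lock; every name in the sketch is hypothetical.

```c
#include <assert.h>

static int atomic_depth;	/* > 0 while a modelled spinlock is held */
static int violations;		/* sleeping calls made in atomic context */

static void model_spin_lock(void)   { atomic_depth++; }
static void model_spin_unlock(void) { atomic_depth--; }

/* Stand-in for a put that may sleep when the last reference is dropped. */
static void model_device_put(void)
{
	if (atomic_depth > 0)
		violations++;
}

/* Pattern from the report: the put happens with the lock held. */
int run_buggy(void)
{
	violations = 0;
	model_spin_lock();
	model_device_put();
	model_spin_unlock();
	return violations;
}

/* Assumed fix: do the put first, then take the lock for the rest. */
int run_fixed(void)
{
	violations = 0;
	model_device_put();
	model_spin_lock();
	/* work that really needs the lock would go here */
	model_spin_unlock();
	return violations;
}
```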
      Signed-off-by: Alexander Strakh <strakh@ispras.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c1b57296
  17. 25 Sep, 2009 1 commit
  18. 06 Oct, 2009 1 commit
  19. 01 Oct, 2009 1 commit
  20. 29 Sep, 2009 1 commit
  21. 13 Oct, 2009 3 commits
  22. 31 Oct, 2009 1 commit
  23. 16 Oct, 2009 1 commit