1. 14 Sep, 2009 1 commit
    • > @@ -547,20 +541,20 @@ static ssize_t write_kmem(struct file * · 7f61d18b
      Wu Fengguang authored
      >  		if (!kbuf)
      >  			return wrote ? wrote : -ENOMEM;
      >  		while (count > 0) {
      > -			int len = size_inside_page(p, count);
      > +			unsigned long sz = size_inside_page(p, count);
      >
      > -			written = copy_from_user(kbuf, buf, len);
      > -			if (written) {
      > +			sz = copy_from_user(kbuf, buf, sz);
      
      Sorry, it introduced a bug: "sz" will be zero in the normal case,
      
      > +			if (sz) {
      >  				if (wrote + virtr)
      >  					break;
      >  				free_page((unsigned long)kbuf);
      >  				return -EFAULT;
      >  			}
      > -			len = vwrite(kbuf, (char *)p, len);
      > +			sz = vwrite(kbuf, (char *)p, sz);
      
      and gets passed to vwrite() here.
      
      This patch fixes it; the new variable "n" will be used in another
      bug-fixing patch following this one.
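      As a hedged sketch (variable names from the quoted diff, the
      surrounding write_kmem() loop abbreviated), the fixed code keeps the
      chunk size in "sz" and catches copy_from_user()'s return value (the
      number of bytes not copied; zero on success) in the new variable "n":
      
          while (count > 0) {
              unsigned long sz = size_inside_page(p, count);
              unsigned long n;
      
              n = copy_from_user(kbuf, buf, sz);
              if (n) {
                  if (wrote + virtr)
                      break;
                  free_page((unsigned long)kbuf);
                  return -EFAULT;
              }
              sz = vwrite(kbuf, (char *)p, sz);
              count -= sz;
              buf += sz;
              virtr += sz;
              p += sz;
          }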
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2. 11 Sep, 2009 1 commit
    • We noticed very erratic behavior [throughput] with the AIM7 shared · 87e1a47d
      Lee Schermerhorn authored
      workload running on recent distro [SLES11] and mainline kernels on an
      8-socket, 32-core, 256GB x86_64 platform.  On the SLES11 kernel
      [2.6.27.19+] with Barcelona processors, as we increased the load [10s of
      thousands of tasks], the throughput would vary between two "plateaus"--one
      at ~65K jobs per minute and one at ~130K jpm.  The simple patch below
      causes the results to smooth out at the ~130K plateau.
      
      But wait, there's more:
      
      We do not see this behavior on smaller platforms--e.g., 4-socket/8-core.
      This could be the result of the larger number of cpus on the larger
      platform--a scalability issue--or it could be the result of the larger
      number of interconnect "hops" between some nodes in this platform and how
      the tasks for a given load end up distributed over the nodes' cpus and
      memories--a stochastic NUMA effect.
      
      The variability in the results is less pronounced [on the same platform]
      with Shanghai processors and with mainline kernels.  With 31-rc6 on
      Shanghai processors and 288 file systems on 288 fibre attached storage
      volumes, the curves [jpm vs load] are both quite flat with the patched
      kernel consistently producing ~3.9% better throughput [~80K jpm vs ~77K
      jpm] than the unpatched kernel.
      
      Profiling indicated that the "slow" runs were incurring high[er]
      contention on an anon_vma lock in vma_adjust(), apparently called from the
      sbrk() system call.
      
      The patch:
      
      A comment in mm/mmap.c:vma_adjust() suggests that we don't really need the
      anon_vma lock when we're only adjusting the end of a vma, as is the case
      for brk().  The comment questions whether it's worthwhile to optimize for
      this case.  Apparently, on the newer, larger x86_64 platforms, with
      interesting NUMA topologies, it is worthwhile--especially considering
      that the patch [if correct!] is quite simple.
      
      We can detect this condition--no overlap with next vma--by noting a NULL
      "importer".  The anon_vma pointer will also be NULL in this case, so
      simply avoid loading vma->anon_vma to avoid the lock.  However, we
      apparently DO need to take the anon_vma lock when we're inserting a vma
      ['insert' non-NULL] even when we have no overlap [NULL "importer"], so we
      need to check for 'insert', as well.
      
      I have tested with and without the 'file || ' test in the patch.  This
      does not seem to matter for either stability or performance.  I left this
      check/filter in, so we only optimize away the anon_vma lock acquisition
      when adjusting the end of a non-importing, non-inserting, anon vma.
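      As a hedged sketch (not the verbatim hunk), the guard described above
      amounts to something like this in mm/mmap.c:vma_adjust():
      
          /*
           * Sketch: skip the anon_vma lock only when adjusting the end
           * of a non-importing, non-inserting, anon vma.
           */
          if (vma->anon_vma && (file || insert || importer))
              anon_vma = vma->anon_vma;
          if (anon_vma)
              spin_lock(&anon_vma->lock);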
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3. 10 Sep, 2009 2 commits
    • This is necessary to make the mmap ring buffer work properly on platforms · 9ba02e11
      David Miller authored
      where D-cache aliasing is an issue.
      
      vmalloc_user() ensures that the kernel side mapping is SHMLBA aligned, and
      on platforms where D-cache aliasing matters the presence of VM_SHARED will
      similarly SHMLBA-align the user side mapping.
      
      Thus the kernel and the user will be writing to the same D-cache aliases
      and we'll avoid inconsistencies and corruption.
      
      The only trick with this change is that vfree() cannot be invoked from
      interrupt context, and thus it's not allowed from RCU callbacks.
      
      We deal with this by using schedule_work().
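      A hedged sketch of that deferral (struct and field names illustrative,
      not the verbatim patch):
      
          static void data_free_work(struct work_struct *work)
          {
              struct perf_mmap_data *data =
                  container_of(work, struct perf_mmap_data, work);
      
              vfree(data->user_page);    /* process context: vfree() is safe */
              kfree(data);
          }
      
          static void data_free_rcu(struct rcu_head *head)
          {
              struct perf_mmap_data *data =
                  container_of(head, struct perf_mmap_data, rcu_head);
      
              /* softirq context here, so punt the vfree() to a workqueue */
              INIT_WORK(&data->work, data_free_work);
              schedule_work(&data->work);
          }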
      
      Since the ring buffer is now completely linear even on the kernel side,
      several simplifications are probably now possible in the code where we add
      entries to the ring.
      
      With help from Peter Zijlstra.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • When a vmalloc'd area is mmap'd into userspace, some kind of co-ordination · eb7cc917
      David Miller authored
      is necessary for this to work on platforms with cpu D-caches which can
      have aliases.
      
      Otherwise kernel side writes won't be seen properly in userspace and vice
      versa.
      
      If the kernel side mapping and the user side one have the same alignment,
      modulo SHMLBA, this can work as long as VM_SHARED is set on the VMA, and
      for all current users this is true.  VM_SHARED will force SHMLBA alignment
      of the user side mmap on platforms where D-cache aliasing matters.
      
      The bulk of this patch is just making it so that a specific alignment can
      be passed down into __get_vm_area_node().  All existing callers pass in
      '1' which preserves existing behavior.  vmalloc_user() gives SHMLBA for
      the alignment.
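      A hedged sketch of the resulting interface (signatures approximate for
      kernels of that era; the real vmalloc_user() additionally marks the
      area VM_USERMAP):
      
          static struct vm_struct *__get_vm_area_node(unsigned long size,
                  unsigned long align, unsigned long flags,
                  unsigned long start, unsigned long end,
                  int node, gfp_t gfp_mask, void *caller);
      
          void *vmalloc_user(unsigned long size)
          {
              /* SHMLBA keeps kernel and user mappings on the same
               * D-cache aliases; all other callers pass align == 1 */
              return __vmalloc_node(size, SHMLBA,
                          GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
                          PAGE_KERNEL, -1, __builtin_return_address(0));
          }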
      
      As a side effect this should get the video media drivers and other
      vmalloc_user() users into more working shape on such systems.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4. 09 Sep, 2009 6 commits
    • CONFIG_SHMEM off gives you (ramfs masquerading as) tmpfs, even when · 3979bd5c
      Hugh Dickins authored
      CONFIG_TMPFS is off: that's a little anomalous, and I'd intended to make
      more sense of it by removing CONFIG_TMPFS altogether, always enabling its
      code when CONFIG_SHMEM is on; but so many defconfigs have CONFIG_SHMEM on
      with CONFIG_TMPFS off that we'd better leave that as is.
      
      But there is no point in asking for CONFIG_TMPFS if CONFIG_SHMEM is off:
      make TMPFS depend on SHMEM, which also prevents TMPFS_POSIX_ACL's
      shmem_acl.o from being pointlessly built into the kernel when SHMEM is off.
      
      And a selfish change, to prevent the world from being rebuilt when I
      switch between CONFIG_SHMEM on and off: the only CONFIG_SHMEM reference
      in the header files is mm.h's shmem_lock(), so give that a stub in
      shmem.c instead.
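      A sketch of that last change (prototype as in kernels of that era):
      declare shmem_lock() unconditionally in mm.h, and let the !CONFIG_SHMEM
      build of mm/shmem.c supply a trivial stub:
      
          /* include/linux/mm.h: no CONFIG_SHMEM conditional any more */
          extern int shmem_lock(struct file *file, int lock,
                                struct user_struct *user);
      
          /* mm/shmem.c, !CONFIG_SHMEM (tiny) variant: do-nothing stub */
          int shmem_lock(struct file *file, int lock, struct user_struct *user)
          {
              return 0;
          }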
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • If (flags & MAP_LOCKED) is true, it means vm_flags already contains · d50281b9
      Huang Shijie authored
      the bit VM_LOCKED, which is set by calc_vm_flag_bits().
      
      So there is no need to set it again; just remove the redundant code.
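      For illustration, a hedged sketch of the resulting do_mmap_pgoff()
      fragment of that era:
      
          /* calc_vm_flag_bits() already translates MAP_LOCKED to VM_LOCKED */
          vm_flags = calc_vm_prot_bits(prot) | calc_vm_flag_bits(flags) |
                  mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
      
          if (flags & MAP_LOCKED)
              if (!can_do_mlock())
                  return -EPERM;    /* VM_LOCKED is already in vm_flags */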
      Signed-off-by: Huang Shijie <shijie8@gmail.com>
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • __get_user_pages() has been taking its own GUP flags, then processing · 02941858
      Hugh Dickins authored
      them into FOLL flags for follow_page().  Though oddly named, the FOLL
      flags are more widely used, so pass them to __get_user_pages() now.
      Sorry, VM flags, VM_FAULT flags and FAULT_FLAGs are still distinct.
      
      (The patch to __get_user_pages() looks peculiar, with both gup_flags
      and foll_flags: the gup_flags remain constant; but as before there's
      an exceptional case, out of scope of the patch, in which foll_flags
      per page have FOLL_WRITE masked off.)
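      A hedged sketch of that flag flow (loop body of __get_user_pages(),
      simplified; the masking condition is a placeholder for the exceptional
      case mentioned above):
      
          unsigned int foll_flags = gup_flags;    /* constant per call */
      
          if (write_should_be_masked)             /* placeholder condition */
              foll_flags &= ~FOLL_WRITE;          /* the per-page exception */
          page = follow_page(vma, start, foll_flags);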
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • KAMEZAWA Hiroyuki has observed customers of earlier kernels taking · 2a952ef0
      Hugh Dickins authored
      advantage of the ZERO_PAGE, which we stopped do_anonymous_page() from
      using in 2.6.24.  And there were a couple of regression reports on LKML.
      
      Following suggestions from Linus, reinstate do_anonymous_page() use of
      the ZERO_PAGE; but this time avoid dirtying its struct page cacheline
      with (map)count updates - let vm_normal_page() regard it as abnormal.
      
      Use it only on arches which __HAVE_ARCH_PTE_SPECIAL (x86, s390, sh32,
      most powerpc): that's not essential, but minimizes additional branches
      (keeping them in the unlikely pte_special case); and incidentally
      excludes mips (some models of which needed eight colours of ZERO_PAGE
      to avoid costly exceptions).
      
      Don't be fanatical about avoiding ZERO_PAGE updates: get_user_pages()
      callers won't want to make exceptions for it, so increment its count
      there.  Changes to mlock and migration?  Happily, they seem not to be
      needed.
      
      In most places it's quicker to check pfn than struct page address:
      prepare a __read_mostly zero_pfn for that.  Does get_dump_page()
      still need its ZERO_PAGE check?  Probably not, but keep it anyway.
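      A hedged sketch pulling those pieces together (simplified, not the
      verbatim patch):
      
          static unsigned long zero_pfn __read_mostly;    /* set at boot */
      
          /* do_anonymous_page(), read fault: map the ZERO_PAGE via a
           * special pte, so its struct page counts are never touched */
          entry = pte_mkspecial(pfn_pte(zero_pfn, vma->vm_page_prot));
      
          /* vm_normal_page(): checking the pfn is quicker than checking
           * the struct page address; the zero pfn is "abnormal" */
          if (pte_pfn(pte) == zero_pfn)
              return NULL;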
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • do_anonymous_page() has been wrong to dirty the pte regardless. · 97f76f91
      Hugh Dickins authored
      If it's not going to mark the pte writable, then it won't help
      to mark it dirty here, and doing so clogs up memory with pages
      which will need swap instead of being thrown away.  It is especially
      wrong if no overcommit is chosen and this vma is not yet VM_ACCOUNTed:
      we could exceed the limit and OOM despite no overcommit.
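      The fix, as a hedged sketch of do_anonymous_page() of that era:
      
          entry = mk_pte(page, vma->vm_page_prot);
          /* only dirty the pte when it is also being made writable */
          if (vma->vm_flags & VM_WRITE)
              entry = pte_mkwrite(pte_mkdirty(entry));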
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: <stable@kernel.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • follow_hugetlb_page() shouldn't be guessing about the coredump case · 3013b510
      Hugh Dickins authored
      either: pass the foll_flags down to it, instead of just the write bit.
      
      Remove that obscure huge_zeropage_ok() test.  The decision is easy,
      though unlike the non-huge case: here vm_ops->fault is always set.
      But we know that a fault would serve up zeroes, unless there's
      already a hugetlbfs pagecache page to back the range.
      
      (Alternatively, since hugetlb pages aren't swapped out under pressure,
      you could save more dump space by arguing that a page not yet faulted
      into this process cannot be relevant to the dump; but that would be
      more surprising.)
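      A hedged sketch of the resulting decision in follow_hugetlb_page()
      (simplified; helper names as in kernels of that era):
      
          absent = !pte || huge_pte_none(huge_ptep_get(pte));
      
          /* for a coredump an absent pte would only fault in zeroes, so
           * skip it unless hugetlbfs pagecache already backs the range */
          if (absent && (flags & FOLL_DUMP) &&
              !hugetlbfs_pagecache_present(h, vma, vaddr)) {
              remainder = 0;
              break;
          }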
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>