1. 12 Oct, 2009 4 commits
    • KOSAKI Motohiro authored · 6c431a13
      On ia64, the following test program exits abnormally, because the glibc
      thread library calls abort().
      
       ========================================================
       (gdb) bt
       #0  0xa000000000010620 in __kernel_syscall_via_break ()
       #1  0x20000000003208e0 in raise () from /lib/libc.so.6.1
       #2  0x2000000000324090 in abort () from /lib/libc.so.6.1
       #3  0x200000000027c3e0 in __deallocate_stack () from /lib/libpthread.so.0
       #4  0x200000000027f7c0 in start_thread () from /lib/libpthread.so.0
       #5  0x200000000047ef60 in __clone2 () from /lib/libc.so.6.1
       ========================================================
      
      The cause is that glibc calls munmap() at thread exit to free the
      stack, and it assumes munmap() never fails.  However, munmap() often
      splits a vma, and when the process already has many mappings the split
      fails with -ENOMEM.
      
      That is wrong, because unmapping a stack never increases the final map
      count.  The limit is exceeded only temporarily, and an internal,
      temporary excess should not cause ENOMEM.
      
      This patch makes sure it does not.
      
       test_max_mapcount.c
       ==================================================================
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <pthread.h>
        #include <errno.h>
        #include <unistd.h>
      
        #define THREAD_NUM 30000
        #define MAL_SIZE (8*1024*1024)
      
       void *wait_thread(void *args)
       {
       	void *addr;
      
       	addr = malloc(MAL_SIZE);
       	sleep(10);
      
       	return NULL;
       }
      
       void *wait_thread2(void *args)
       {
       	sleep(60);
      
       	return NULL;
       }
      
       int main(int argc, char *argv[])
       {
       	int i;
       	pthread_t thread[THREAD_NUM], th;
       	int ret, count = 0;
       	pthread_attr_t attr;
      
       	/* pthread functions return an error number and do not set errno. */
       	ret = pthread_attr_init(&attr);
       	if (ret) {
       		fprintf(stderr, "pthread_attr_init: %s\n", strerror(ret));
       	}
      
       	ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
       	if (ret) {
       		fprintf(stderr, "pthread_attr_setdetachstate: %s\n", strerror(ret));
       	}
      
       	for (i = 0; i < THREAD_NUM; i++) {
       		ret = pthread_create(&th, &attr, wait_thread, NULL);
       		if (ret) {
       			fprintf(stderr, "[%d] pthread_create: %s\n", count, strerror(ret));
       		} else {
       			printf("[%d] create OK.\n", count);
       		}
       		count++;
      
       		ret = pthread_create(&thread[i], &attr, wait_thread2, NULL);
       		if (ret) {
       			fprintf(stderr, "[%d] pthread_create: %s\n", count, strerror(ret));
       		} else {
       			printf("[%d] create OK.\n", count);
       		}
       		count++;
       	}
      
       	sleep(3600);
       	return 0;
       }
       ==================================================================
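The temporary excess described in this message can be illustrated with a toy model. The sketch below is not kernel code. It assumes, as a simplification, that unmapping a range first splits the enclosing vma at every boundary that lies strictly inside it (each split adds one vma) and only then removes the piece that covers the range. The names `struct unmap_counts` and `model_munmap` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: vma counts during and after an unmap. */
struct unmap_counts {
	size_t peak;	/* highest count reached while splitting */
	size_t final;	/* count once the unmapped piece is gone */
};

/*
 * Toy model (an assumption, not the kernel algorithm): one vma covers
 * [vma_start, vma_end) and the process has map_count vmas in total.
 * Unmapping [start, end) inside that vma splits at each boundary that
 * lies strictly inside the vma, then drops the piece covering the range.
 */
struct unmap_counts model_munmap(size_t map_count,
				 size_t vma_start, size_t vma_end,
				 size_t start, size_t end)
{
	struct unmap_counts c;
	size_t splits = 0;

	assert(map_count >= 1);
	assert(vma_start <= start && start < end && end <= vma_end);

	if (start > vma_start)
		splits++;	/* split off the part below the range */
	if (end < vma_end)
		splits++;	/* split off the part above the range */

	c.peak = map_count + splits;
	c.final = c.peak - 1;	/* the unmapped piece is removed */
	return c;
}
```

In this model, unmapping the tail of a vma when map_count is already at the limit gives a peak of limit + 1 but a final count equal to the limit, which is the kind of temporary excess the message argues should not produce ENOMEM.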
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Roel Kluin authored · e7ff8c38
      If the variable holding it is not signed, the test of the read() return
      value in this function will not work.
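A minimal sketch of why signedness matters here. read() returns -1 on error; if that value is stored in an unsigned variable, a "< 0" test can never be true. The function names below are hypothetical and are not taken from the patched file.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/*
 * Each helper receives the value read() returned and reports whether
 * its own "< 0" check notices a failure.
 */
int error_seen_unsigned(ssize_t read_result)
{
	size_t bytes = (size_t)read_result;	/* -1 wraps to SIZE_MAX */

	/* An unsigned value is never below zero, so this is always 0. */
	return bytes < 0;
}

int error_seen_signed(ssize_t read_result)
{
	ssize_t bytes = read_result;		/* -1 stays -1 */

	return bytes < 0;
}

/* Small self-check of the behaviour described above. */
void check_read_result_examples(void)
{
	assert(error_seen_unsigned(-1) == 0);	/* failure goes unnoticed */
	assert(error_seen_signed(-1) == 1);	/* failure is detected */
	assert(error_seen_signed(42) == 0);	/* success is not an error */
}
```

Declaring the variable as ssize_t (or another signed type wide enough for the result) keeps the error check meaningful.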
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Tommi Rantala authored · 28a5fc7f
      Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Tommi Rantala authored · dbd6585d
      Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. 24 Aug, 2009 1 commit
  3. 22 Sep, 2009 1 commit
    • Hisashi Hifumi authored · 1a9aa809
      I added blk_run_backing_dev() to page_cache_async_readahead() so that
      readahead I/O is unplugged, which improves throughput, especially in
      RAID environments.
      
      The normal case is, if page N becomes uptodate at time T(N), then T(N) <=
      T(N+1) holds.  With RAID (and NFS to some degree), there is no strict
      ordering: the data arrival time depends on the runtime status of the
      individual disks, which breaks that formula.  So in do_generic_file_read(), just
      after submitting the async readahead IO request, the current page may well
      be uptodate, so the page won't be locked, and the block device won't be
      implicitly unplugged:
      
                      if (PageReadahead(page))
                              page_cache_async_readahead()
                      if (!PageUptodate(page))
                              goto page_not_up_to_date;
                      //...
      page_not_up_to_date:
                      lock_page_killable(page);
      
      Therefore explicit unplugging can help.
      
      The following are the test results with dd.
      
      # dd if=testdir/testfile of=/dev/null bs=16384
      
      -2.6.30-rc6
      1048576+0 records in
      1048576+0 records out
      17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s
      
      -2.6.30-rc6-patched
      1048576+0 records in
      1048576+0 records out
      17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s
      
      (7-disk RAID-0 array)
      
      -2.6.30-rc6
      1054976+0 records in
      1054976+0 records out
      17284726784 bytes (17 GB) copied, 212.233 seconds, 81.4 MB/s
      
      -2.6.30-rc6-patched
      1054976+0 records in
      1054976+0 records out
      17284726784 bytes (17 GB) copied, 198.878 seconds, 86.9 MB/s
      
      (7-disk RAID-5 array)
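The MB/s figures in the dd output can be re-derived from the byte counts and elapsed times. The sketch below is only a worked check of that arithmetic, assuming 1 MB = 10^6 bytes (which is what reproduces the figures dd printed). The function names are hypothetical.

```c
#include <assert.h>

/* Throughput in MB/s, taking 1 MB as 10^6 bytes. */
double throughput_mb_per_s(double bytes, double seconds)
{
	assert(seconds > 0.0);
	return bytes / seconds / 1e6;
}

/*
 * Throughput gain in percent when the same amount of data is copied
 * in after_s seconds instead of before_s seconds.
 */
double throughput_gain_percent(double before_s, double after_s)
{
	assert(before_s > 0.0 && after_s > 0.0);
	return (before_s / after_s - 1.0) * 100.0;
}
```

For the RAID-0 run, throughput_mb_per_s(17179869184.0, 224.182) and throughput_mb_per_s(17179869184.0, 206.465) round to the 76.6 and 83.2 MB/s shown above, a gain of roughly 8.6%. For the RAID-5 run the gain from 212.233 to 198.878 seconds is roughly 6.7%.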
      
      The patch was found to improve performance with the SCST scsi target
      driver.  See
      http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel
      
      [akpm@linux-foundation.org: unbust comment layout]
      [akpm@linux-foundation.org: "fix" CONFIG_BLOCK=n]
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Acked-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Tested-by: Ronald <intercommit@gmail.com>
      Cc: Bart Van Assche <bart.vanassche@gmail.com>
      Cc: Vladislav Bolkhovitin <vst@vlnb.net>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  4. 09 Oct, 2009 1 commit
  5. 28 Oct, 2009 2 commits
  6. 14 Oct, 2009 1 commit
  7. 06 Oct, 2009 1 commit
  8. 30 Sep, 2009 1 commit
  9. 24 Sep, 2009 1 commit
    • Feng Tang authored · 68ad718d
      Recent hrtimer code sets the start info of an hrtimer only when that
      flag is set, so the start info of all hrtimers stays uninitialised
      until an "echo 1 > /proc/timer_stats" is done, and /proc/timer_list
      shows something like:
      
      active timers:
       #0: <c27d46b0>, tick_sched_timer, S:01, <(null)>, /-1
       # expires at 91062000000-91062000000 nsecs [in 156071 to 156071 nsecs]
       #1: <efb81b6c>, hrtimer_wakeup, S:01, <(null)>, /-1
       # expires at 91062300331-91062350331 nsecs [in 456402 to 506402 nsecs]
       #2: <efac9b6c>, hrtimer_wakeup, S:01, <(null)>, /-1
       # expires at 91068699811-91068749811 nsecs [in 6855882 to 6905882 nsecs]
       #3: <efacdb6c>, hrtimer_wakeup, S:01, <(null)>, /-1
       # expires at 91068755511-91068805511 nsecs [in 6911582 to 6961582 nsecs]
       #4: <efa95b6c>, hrtimer_wakeup, S:01, <(null)>, /-1
       # expires at 91068806066-91068856066 nsecs [in 6962137 to 7012137 nsecs]
       .....
      
      This patch fixes it.
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  10. 16 Oct, 2009 1 commit
  11. 24 Sep, 2009 1 commit
    • Alexander Strakh authored · c1b57296
      Driver scsi_lib.c might sleep in atomic context, because it calls
      scsi_device_put under spin_lock_irqsave.
      
      drivers/scsi/scsi_lib.c:356:
      	spin_lock_irqsave(shost->host_lock, flags);
      	scsi_device_put(sdev);
      Path to the might_sleep macro from scsi_device_put:
      1. scsi_device_put calls put_device at ./drivers/scsi/scsi.c:1111
      2. put_device calls kobject_put at ./drivers/base/core.c:1038
      3. kobject_put calls kref_put at ./lib/kobject.c
      4. kref_put may call the callback function kobject_release at ./lib/kref.c
      if the refcount becomes zero, which might sleep because it raises a user
      event. Details:
      	4.1 kobject_cleanup calls kobject_uevent at ./lib/kobject.c:555
      	4.2 kobject_uevent calls kobject_uevent_env at ./lib/kobject_uevent.c:282
      	4.3 kobject_uevent_env calls call_usermodehelper_exec at ./include/linux/kmod.h:83
      	4.4 call_usermodehelper_exec calls wait_for_completion at ./kernel/kmod.c:481
      	4.5 wait_for_completion calls wait_for_common at ./kernel/sched.c:5710
      	4.6 wait_for_common calls might_sleep at ./kernel/sched.c:5692
      
      Found by Linux Driver Verification project.
      
      Delete wrong sleeping function calls.
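The class of bug reported here can be shown with a userspace toy model. This is a sketch under stated assumptions, not the kernel API and not necessarily the change this patch makes: `struct toy_ctx`, `toy_lock`, `toy_unlock` and `toy_put` are hypothetical stand-ins for a spinlock-protected context and a function that may sleep.

```c
#include <assert.h>

/* Hypothetical stand-in for the execution context. */
struct toy_ctx {
	int atomic_depth;	/* > 0 while a "spinlock" is held */
	int violations;		/* may-sleep calls made while atomic */
};

void toy_lock(struct toy_ctx *c)
{
	c->atomic_depth++;
}

void toy_unlock(struct toy_ctx *c)
{
	assert(c->atomic_depth > 0);
	c->atomic_depth--;
}

/* Stands in for a call that may sleep, such as a final reference drop. */
void toy_put(struct toy_ctx *c)
{
	if (c->atomic_depth > 0)
		c->violations++;	/* what might_sleep() would flag */
}

/* The reported pattern: the put happens with the lock held. */
int put_under_lock(void)
{
	struct toy_ctx c = { 0, 0 };

	toy_lock(&c);
	toy_put(&c);
	toy_unlock(&c);
	return c.violations;
}

/* One common alternative: drop the lock around the call that may sleep. */
int put_outside_lock(void)
{
	struct toy_ctx c = { 0, 0 };

	toy_lock(&c);
	/* ... work that needs the lock ... */
	toy_unlock(&c);
	toy_put(&c);
	toy_lock(&c);
	/* ... remaining work under the lock ... */
	toy_unlock(&c);
	return c.violations;
}
```

In this model put_under_lock() records one violation and put_outside_lock() records none. Whether dropping the lock is safe in real code depends on what the lock protects at that point.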
      Signed-off-by: Alexander Strakh <strakh@ispras.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  12. 25 Sep, 2009 1 commit
  13. 06 Oct, 2009 1 commit
  14. 01 Oct, 2009 1 commit
  15. 29 Sep, 2009 1 commit
  16. 13 Oct, 2009 3 commits
  17. 31 Oct, 2009 1 commit
  18. 16 Oct, 2009 1 commit
  19. 29 Oct, 2009 1 commit
  20. 30 Sep, 2009 1 commit
  21. 13 Aug, 2009 1 commit
  22. 24 Jul, 2009 1 commit
  23. 30 Sep, 2009 2 commits
    • Sage Weil authored · b5f53b38
      Get rid of the goto by flipping the if (!result) over.  Make the comments
      a bit more descriptive.  Fix a few kernel style problems.  No functional
      changes.
      
      Cc: Ian Kent <raven@themaw.net>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Signed-off-by: Yehuda Sadeh <yehuda@newdream.net>
      Signed-off-by: Sage Weil <sage@newdream.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Sage Weil authored · 6220eef6
      real_lookup() is called by do_lookup() if dentry revalidation fails.  If
      the cache is re-populated while waiting for i_mutex, it may find that a
      d_lookup() subsequently succeeds (see the "Uhhuh!  Nasty case" comment).
      
      Previously, real_lookup() would drop i_mutex and do_revalidate() again. 
      If revalidate failed _again_, however, it would give up with -ENOENT.  The
      problem here is that network file systems may be invalidating dentries via
      server callbacks, e.g. due to concurrent access from another client, and
      -ENOENT is frequently the wrong answer.
      
      This problem has been seen with both Lustre and Ceph.  It seems possible
      to hit this case with NFS as well if the cache lifetime is very short.
      
      Instead, we should do_revalidate() while i_mutex is still held.  If
      revalidation fails, we can move on to a ->lookup() and ensure a correct
      result without worrying about any subsequent races.
      
      Note that do_revalidate() is called with i_mutex held elsewhere.  For
      example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(), and
      possibly others all take the directory i_mutex, and then
      
      -> lookup_hash
              -> __lookup_hash
                      -> cached_lookup
                              -> do_revalidate
      
      so this does not introduce any new locking rules for d_revalidate
      implementations.
      
      Yes, the goto is ugly.  A cleanup patch follows.
      
      Cc: Ian Kent <raven@themaw.net>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Signed-off-by: Yehuda Sadeh <yehuda@newdream.net>
      Signed-off-by: Sage Weil <sage@newdream.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  24. 24 Sep, 2009 2 commits
    • Nick Piggin authored · d7dde392
      Invalidate sb->s_bdev on remount,ro.
      
      Fixes a problem reported by Jorge Boncompte who is seeing corruption
      trying to snapshot a minix filesystem image.  Some filesystems modify
      their metadata via a path other than the bdev buffer cache (eg.  they may
      use a private linear mapping for their metadata, or implement directories
      in pagecache, etc).  Also, file data modifications usually go to the bdev
      via their own mappings.
      
      These updates are not coherent with buffercache IO (eg.  via /dev/bdev)
      and never have been.  However, it is reasonable to expect that after a
      mount -oremount,ro operation the buffercache will be coherent with
      previous filesystem modifications.
      
      So invalidate the bdev mappings on a remount,ro operation to provide a
      coherency point.
      
      The problem was exposed when we switched the old rd to brd because old rd
      didn't really function like a normal block device and updates to rd via
      mappings other than the buffercache would still end up going into its
      buffercache.  But the same problem has always affected other "normal"
      block devices, including loop.
      
      [akpm@linux-foundation.org: repair comment layout]
      Reported-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
      Tested-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Nick Piggin authored · 4bca9bd4
      Filesystems outside the regular namespace do not have to clear
      DCACHE_UNHASHED in order to have a working /proc/$pid/fd/XXX.  Nothing in
      proc prevents the fd link from being used if its dentry is not in the
      hash.
      
      Also, it does not get put into the dcache hash if DCACHE_UNHASHED is
      clear; that depends on the filesystem calling d_add or d_rehash.
      
      So delete the misleading comments and needless code.
      Acked-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  25. 25 Sep, 2009 1 commit
    • Roland Dreier authored · a470a30a
       >  =============================================
       >  [ INFO: possible recursive locking detected ]
       >  2.6.31-2-generic #14~rbd3
       >  ---------------------------------------------
       >  firefox-3.5/4162 is trying to acquire lock:
       >   (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
       >
       >  but task is already holding lock:
       >   (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
       >
       >  other info that might help us debug this:
       >  3 locks held by firefox-3.5/4162:
       >   #0:  (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
       >   #1:  (&sb->s_type->i_mutex_key#11/1){+.+.+.}, at: [<ffffffff81139d5a>] lock_rename+0x6a/0xf0
       >   #2:  (&sb->s_type->i_mutex_key#11/2){+.+.+.}, at: [<ffffffff81139d6f>] lock_rename+0x7f/0xf0
       >
       >  stack backtrace:
       >  Pid: 4162, comm: firefox-3.5 Tainted: G         C 2.6.31-2-generic #14~rbd3
       >  Call Trace:
       >   [<ffffffff8108ae74>] print_deadlock_bug+0xf4/0x100
       >   [<ffffffff8108ce26>] validate_chain+0x4c6/0x750
       >   [<ffffffff8108d2e7>] __lock_acquire+0x237/0x430
       >   [<ffffffff8108d585>] lock_acquire+0xa5/0x150
       >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
       >   [<ffffffff815526ad>] __mutex_lock_common+0x4d/0x3d0
       >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
       >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
       >   [<ffffffff8120eaf9>] ? ecryptfs_rename+0x99/0x170
       >   [<ffffffff81552b36>] mutex_lock_nested+0x46/0x60
       >   [<ffffffff81139d31>] lock_rename+0x41/0xf0
       >   [<ffffffff8120eb2a>] ecryptfs_rename+0xca/0x170
       >   [<ffffffff81139a9e>] vfs_rename_dir+0x13e/0x160
       >   [<ffffffff8113ac7e>] vfs_rename+0xee/0x290
       >   [<ffffffff8113c212>] ? __lookup_hash+0x102/0x160
       >   [<ffffffff8113d512>] sys_renameat+0x252/0x280
       >   [<ffffffff81133eb4>] ? cp_new_stat+0xe4/0x100
       >   [<ffffffff8101316a>] ? sysret_check+0x2e/0x69
       >   [<ffffffff8108c34d>] ? trace_hardirqs_on_caller+0x14d/0x190
       >   [<ffffffff8113d55b>] sys_rename+0x1b/0x20
       >   [<ffffffff81013132>] system_call_fastpath+0x16/0x1b
      
      The trace above is totally reproducible by doing a cross-directory
      rename on an ecryptfs directory.
      
      The issue seems to be that sys_renameat() does lock_rename() then calls
      into the filesystem; if the filesystem is ecryptfs, then
      ecryptfs_rename() again does lock_rename() on the lower filesystem, and
      lockdep can't tell that the two s_vfs_rename_mutexes are different.  It
      seems an annotation like the following is sufficient to fix this (it
      does get rid of the lockdep trace in my simple tests); however I would
      like to make sure I'm not misunderstanding the locking, hence the CC
      list...
      Signed-off-by: Roland Dreier <rdreier@cisco.com>
      Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
      Cc: Dustin Kirkland <kirkland@canonical.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  26. 20 Apr, 2009 1 commit
  27. 24 Aug, 2009 1 commit
    • Al Viro authored · cb8c8dac
      RAW_SETBIND and RAW_GETBIND 32bit versions are fscked in interesting ways.
      
      1) fs/compat_ioctl.c has COMPATIBLE_IOCTL(RAW_SETBIND) followed by
      HANDLE_IOCTL(RAW_SETBIND, raw_ioctl).  The latter is ignored.
      
      2) on amd64 (and itanic) the damn thing is broken - we have int + u64 + u64
      and layouts on i386 and amd64 are _not_ the same.  raw_ioctl() would
      work there, but it's never called due to (1).  As it is, i386 /sbin/raw
      definitely doesn't work on amd64 boxen.
      
      3) switching to raw_ioctl() as is would *not* work on e.g. sparc64 and ppc64,
      which would be rather sad, seeing that normal userland there is 32bit.
      The thing is, slapping __packed on the struct in question does not DTRT -
      it eliminates *all* padding.  The real solution is to use compat_u64.
      
      4) of course, all that stuff has no business being outside of raw.c in the
      first place - there should be ->compat_ioctl() for /dev/rawctl instead of
      messing with compat_ioctl.c.
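The layout difference in points 2 and 3 can be seen with a small userspace sketch. It assumes a 64-bit host where uint64_t is naturally 8-byte aligned and a GCC or Clang compiler (the attributes are compiler extensions). The struct and field names are hypothetical; the message only states that the structure is int + u64 + u64. `u64_align4` is a stand-in that mimics a 64-bit type with i386-style 4-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in: a 64-bit integer that only requires 4-byte alignment. */
typedef uint64_t __attribute__((aligned(4))) u64_align4;

/* Natural layout on a 64-bit ABI: 4 bytes of padding follow the int. */
struct req_native {
	int      minor;
	uint64_t block_major;
	uint64_t block_minor;
};

/* Layout seen by an i386 caller: no padding after the int. */
struct req_i386 {
	int        minor;
	u64_align4 block_major;
	u64_align4 block_minor;
};

/* Packing removes all padding, whatever the 32-bit ABI expects. */
struct req_packed {
	int      minor;
	uint64_t block_major;
	uint64_t block_minor;
} __attribute__((packed));

/* Checks that hold on x86-64 with GCC or Clang (an assumption). */
void check_layouts(void)
{
	assert(offsetof(struct req_native, block_major) == 8);
	assert(sizeof(struct req_native) == 24);

	assert(offsetof(struct req_i386, block_major) == 4);
	assert(sizeof(struct req_i386) == 20);

	assert(offsetof(struct req_packed, block_major) == 4);
	assert(sizeof(struct req_packed) == 20);
}
```

On x86 the packed and 4-byte-aligned layouts happen to coincide. On an architecture whose 32-bit ABI aligns 64-bit values to 8 bytes, the 32-bit caller's layout matches req_native instead, so a packed struct would be wrong there, which appears to be the concern raised in point 3.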
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  28. 25 Sep, 2009 1 commit
    • Miklos Szeredi authored · 07431498
      vfs_rename_dir() doesn't properly account for filesystems with
      FS_RENAME_DOES_D_MOVE.  If new_dentry has a target inode attached, it
      unhashes the new_dentry prior to the rename() iop and rehashes it after,
      but doesn't account for the possibility that rename() may have swapped
      {old,new}_dentry.  For FS_RENAME_DOES_D_MOVE filesystems, it rehashes
      new_dentry (now the old renamed-from name, which d_move() expected to go
      away), such that a subsequent lookup will find it.
      
      This was caught by the recently posted POSIX fstest suite, rename/10.t
      test 62 (and others) on ceph.
      
      The bug was introduced by: commit 349457cc
      "[PATCH] Allow file systems to manually d_move() inside of ->rename()"
      
      Fix by not rehashing the new dentry.  Rehashing used to be needed by
      d_move() but isn't anymore.
      Reported-by: Sage Weil <sage@newdream.net>
      Cc: Zach Brown <zach.brown@oracle.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  29. 06 Oct, 2009 1 commit
  30. 13 Oct, 2009 1 commit
  31. 09 Oct, 2009 1 commit
  32. 10 Oct, 2009 1 commit