1. 20 Aug, 2009 1 commit
    • Nick Piggin's avatar
      Signed-off-by: Nick Piggin <npiggin@suse.de> · 34829fca
      Nick Piggin authored
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      34829fca
  2. 24 Aug, 2009 1 commit
    • Nick Piggin's avatar
      On Fri, Aug 21, 2009 at 04:06:59PM +0200, Jan Kara wrote: · 2cba550b
      Nick Piggin authored
      > >   Hi,
      > >
      > > > I also have commented a possible bug in existing ext2 code, marked with XXX.
      > >   Looks good, except:
      > >
      > > > +int ext2_setsize(struct inode *inode, loff_t newsize)
      > >   This could be static.
      > >
      > > > @@ -1459,8 +1540,15 @@ int ext2_setattr(struct dentry *dentry,
      > > >  		if (error)
      > > >  			return error;
      > > >  	}
      > > > -	error = inode_setattr(inode, iattr);
      > > > +	if (iattr->ia_valid & ATTR_SIZE) {
      > > > +		error = ext2_setsize(inode, iattr->ia_size);
      > > > +		if (error)
      > > > +			return error;
      > > > +	}
      > > > +	generic_setattr(inode, iattr);
      > >   Here, we should store the error code I suppose...
      >   Ah, I was confused. generic_setattr() returns void. But then remove
      > the check !error from:
      >   if (!error && (iattr->ia_valid & ATTR_MODE))
      > which just follows the generic_setattr(). That's what made me think
      > generic_setattr() returns something :)
      
      Yep, good suggestion.
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2cba550b
  3. 20 Aug, 2009 3 commits
  4. 24 Aug, 2009 2 commits
    • Nick Piggin's avatar
      Introduce a new truncate calling sequence into fs/mm subsystems. Rather · 7d12a880
      Nick Piggin authored
      than setattr > vmtruncate > truncate, have filesystems call their truncate
      sequence from ->setattr if filesystem specific operations are required. 
      vmtruncate is deprecated, and truncate_pagecache and inode_newsize_ok
      helpers introduced previously should be used.
      
      simple_setattr is introduced for simple in-ram filesystems to implement
      the new truncate sequence.  Eventually all filesystems should be converted
      to implement a setattr, and the default code in notify_change should go
      away.
      
      simple_setsize is also introduced to perform just the ATTR_SIZE portion of
      simple_setattr (ie.  changing i_size and trimming pagecache).
      
      A new attribute is introduced into inode_operations structure;
      .new_truncate is a temporary hack to distinguish filesystems that
      implement the new truncate system.
      
      To implement the new truncate sequence:
      - set .new_truncate = 1
      - filesystem specific manipulations (eg freeing blocks) must be done in
        the setattr method rather than ->truncate.
      - vmtruncate can not be used by core code to trim blocks past i_size in
        the event of write failure after allocation, so this must be performed
        in the fs code.
      - make use of the better opportunity to catch errors with the above 2 changes.
      - inode_setattr should not be used. generic_setattr is a new function
        to be used to copy simple attributes into the generic inode.
      
      Big problem with the previous calling sequence: the filesystem is not
      called until i_size has already changed.  This means it is not allowed to
      fail the call, and also it does not know what the previous i_size was. 
      Also, generic code calling vmtruncate to truncate allocated blocks in case
      of error had no good way to return a meaningful error (or, for example,
      atomically handle block deallocation).
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      7d12a880
    • Nick Piggin's avatar
      Update some fs code to make use of new helper functions introduced in the · f9ba4d0d
      Nick Piggin authored
      previous patch.  Should be no significant change in behaviour (except CIFS
      now calls send_sig under i_lock, via inode_newsize_ok).
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Acked-by: default avatarMiklos Szeredi <miklos@szeredi.hu>
      Cc: <linux-nfs@vger.kernel.org>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: <linux-cifs-client@lists.samba.org>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f9ba4d0d
  5. 20 Aug, 2009 1 commit
  6. 24 Aug, 2009 3 commits
    • Andrew Morton's avatar
      repair comment layout · 49603288
      Andrew Morton authored
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      49603288
    • Nick Piggin's avatar
      Invalidate sb->s_bdev on remount,ro. · c3cebacd
      Nick Piggin authored
      Fixes a problem reported by Jorge Boncompte who is seeing corruption
      trying to snapshot a minix filesystem image.  Some filesystems modify
      their metadata via a path other than the bdev buffer cache (eg.  they may
      use a private linear mapping for their metadata, or implement directories
      in pagecache, etc).  Also, file data modifications usually go to the bdev
      via their own mappings.
      
      These updates are not coherent with buffercache IO (eg.  via /dev/bdev)
      and never have been.  However there could be a reasonable expectation that
      after a mount -oremount,ro operation then the buffercache should
      subsequently be coherent with previous filesystem modifications.
      
      So invalidate the bdev mappings on a remount,ro operation to provide a
      coherency point.
      
      The problem was exposed when we switched the old rd to brd because old rd
      didn't really function like a normal block device and updates to rd via
      mappings other than the buffercache would still end up going into its
      buffercache.  But the same problem has always affected other "normal"
      block devices, including loop.
      Reported-by: default avatar"Jorge Boncompte [DTI2]" <jorge@dti2.net>
      Tested-by: default avatar"Jorge Boncompte [DTI2]" <jorge@dti2.net>
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      c3cebacd
    • Nick Piggin's avatar
      Filesystems outside the regular namespace do not have to clear · 52496d2b
      Nick Piggin authored
      DCACHE_UNHASHED in order to have a working /proc/$pid/fd/XXX.  Nothing in
      proc prevents the fd link from being used if its dentry is not in the
      hash.
      
      Also, it does not get put into the dcache hash if DCACHE_UNHASHED is
      clear; that depends on the filesystem calling d_add or d_rehash.
      
      So delete the misleading comments and needless code.
      Acked-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      52496d2b
  7. 19 Aug, 2009 1 commit
  8. 12 Aug, 2009 1 commit
    • Jeff Layton's avatar
      sb->s_maxbytes is supposed to indicate the maximum size of a file that can · 903eccb0
      Jeff Layton authored
      exist on the filesystem.  It's declared as an unsigned long long.
      
      Even if a filesystem has no inherent limit that prevents it from using
      every bit in that unsigned long long, it's still problematic to set it to
      anything larger than MAX_LFS_FILESIZE.  There are places in the kernel
      that cast s_maxbytes to a signed value.  If it's set too large then this
      cast makes it a negative number and generally breaks the comparison.
      
      Change s_maxbytes to be loff_t instead.  That should help eliminate the
      temptation to set it too large by making it a signed value.
      
      Also, add a warning for couple of releases to help catch filesystems that
      set s_maxbytes too large.  Eventually we can either convert this to a
      BUG() or just remove it and in the hope that no one will get it wrong now
      that it's a signed value.
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Robert Love <rlove@google.com>
      Cc: Mandeep Singh Baines <msb@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      903eccb0
  9. 17 Aug, 2009 1 commit
  10. 24 Aug, 2009 1 commit
    • Roland Dreier's avatar
      > ============================================= · d037023f
      Roland Dreier authored
       >  [ INFO: possible recursive locking detected ]
       >  2.6.31-2-generic #14~rbd3
       >  ---------------------------------------------
       >  firefox-3.5/4162 is trying to acquire lock:
       >   (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
       >
       >  but task is already holding lock:
       >   (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
       >
       >  other info that might help us debug this:
       >  3 locks held by firefox-3.5/4162:
       >   #0:  (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
       >   #1:  (&sb->s_type->i_mutex_key#11/1){+.+.+.}, at: [<ffffffff81139d5a>] lock_rename+0x6a/0xf0
       >   #2:  (&sb->s_type->i_mutex_key#11/2){+.+.+.}, at: [<ffffffff81139d6f>] lock_rename+0x7f/0xf0
       >
       >  stack backtrace:
       >  Pid: 4162, comm: firefox-3.5 Tainted: G         C 2.6.31-2-generic #14~rbd3
       >  Call Trace:
       >   [<ffffffff8108ae74>] print_deadlock_bug+0xf4/0x100
       >   [<ffffffff8108ce26>] validate_chain+0x4c6/0x750
       >   [<ffffffff8108d2e7>] __lock_acquire+0x237/0x430
       >   [<ffffffff8108d585>] lock_acquire+0xa5/0x150
       >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
       >   [<ffffffff815526ad>] __mutex_lock_common+0x4d/0x3d0
       >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
       >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
       >   [<ffffffff8120eaf9>] ? ecryptfs_rename+0x99/0x170
       >   [<ffffffff81552b36>] mutex_lock_nested+0x46/0x60
       >   [<ffffffff81139d31>] lock_rename+0x41/0xf0
       >   [<ffffffff8120eb2a>] ecryptfs_rename+0xca/0x170
       >   [<ffffffff81139a9e>] vfs_rename_dir+0x13e/0x160
       >   [<ffffffff8113ac7e>] vfs_rename+0xee/0x290
       >   [<ffffffff8113c212>] ? __lookup_hash+0x102/0x160
       >   [<ffffffff8113d512>] sys_renameat+0x252/0x280
       >   [<ffffffff81133eb4>] ? cp_new_stat+0xe4/0x100
       >   [<ffffffff8101316a>] ? sysret_check+0x2e/0x69
       >   [<ffffffff8108c34d>] ? trace_hardirqs_on_caller+0x14d/0x190
       >   [<ffffffff8113d55b>] sys_rename+0x1b/0x20
       >   [<ffffffff81013132>] system_call_fastpath+0x16/0x1b
      
      The trace above is totally reproducible by doing a cross-directory
      rename on an ecryptfs directory.
      
      The issue seems to be that sys_renameat() does lock_rename() then calls
      into the filesystem; if the filesystem is ecryptfs, then
      ecryptfs_rename() again does lock_rename() on the lower filesystem, and
      lockdep can't tell that the two s_vfs_rename_mutexes are different.  It
      seems an annotation like the following is sufficient to fix this (it
      does get rid of the lockdep trace in my simple tests); however I would
      like to make sure I'm not misunderstanding the locking, hence the CC
      list...
      Signed-off-by: default avatarRoland Dreier <rdreier@cisco.com>
      Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
      Cc: Dustin Kirkland <kirkland@canonical.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      d037023f
  11. 03 Sep, 2009 1 commit
  12. 24 Aug, 2009 2 commits
  13. 22 Aug, 2009 1 commit
    • Vegard Nossum's avatar
      On 2009/6/17 Ingo Molnar <mingo@elte.hu> reported: · 7d6a40be
      Vegard Nossum authored
      >
      > btw., here's an old friend of a warning:
      >
      > async_continuing @ 1 after 0 usec
      > WARNING: kmemcheck: Caught 8-bit read from freed memory (f5f33004)
      > 0040f3f57400686f74706c756700000000000000000000000000000000000000
      >  i i i i f f f f f f f f f f f f f f f f f f f f f f f f f f f f
      >          ^
      >
      > Pid: 1, comm: swapper Not tainted (2.6.30-tip-04303-g5ada65e-dirty #767) P4DC6
      > EIP: 0060:[<c1248df4>] EFLAGS: 00010246 CPU: 0
      > EIP is at exact_copy_from_user+0x64/0x130
      > EAX: 00000000 EBX: 00000001 ECX: 000000f5 EDX: 000000f5
      > ESI: f5fdeffb EDI: f5f33004 EBP: f6c48ee8 ESP: c29598cc
      >  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      > CR0: 8005003b CR2: f6c20044 CR3: 0294d000 CR4: 000006d0
      > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
      > DR6: ffff4ff0 DR7: 00000400
      >  [<c124916a>] copy_mount_options+0xba/0x1c0
      >  [<c124dc0a>] sys_mount+0x1a/0x170
      >  [<c263c937>] do_mount_root+0x27/0xe0
      >  [<c263ca33>] mount_block_root+0x43/0x140
      >  [<c263cc02>] mount_root+0xd2/0x160
      >  [<c263ce49>] prepare_namespace+0x1b9/0x380
      >  [<c263c4c8>] kernel_init+0xb8/0x110
      >  [<c103ab13>] kernel_thread_helper+0x7/0x14
      >  [<ffffffff>] 0xffffffff
      > EXT3-fs: INFO: recovery required on readonly filesystem.
      > EXT3-fs: write access will be enabled during recovery.
      
      sys_mount() reads/copies a whole page for its "type" parameter.  When
      do_mount_root() passes a kernel address that points to an object which is
      smaller than a whole page, copy_mount_options() will happily go past this
      memory object, possibly dereferencing "wild" pointers that could be in any
      state (hence the kmemcheck warning, which shows that parts of the next
      page are not even allocated).
      
      (The likelihood of something going wrong here is pretty low -- first of
      all this only applies to kernel calls to sys_mount(), which are mostly
      found in the boot code.  Secondly, I guess if the page was not mapped,
      exact_copy_from_user() _would_ in fact handle it correctly because of its
      access_ok(), etc.  checks.)
      
      But it is much nicer to avoid the dubious reads altogether, by stopping as
      soon as we find a NUL byte.  Is there a good reason why we can't do
      something like this, using the already existing strndup_from_user()?
      
      [akpm@linux-foundation.org: make copy_mount_string() static]
      Reported-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      7d6a40be
  14. 24 Aug, 2009 2 commits
  15. 10 Jun, 2009 1 commit
  16. 20 Apr, 2009 1 commit
  17. 24 Aug, 2009 2 commits
    • Al Viro's avatar
      RAW_SETBIND and RAW_GETBIND 32bit versions are fscked in interesting ways. · 0c16377a
      Al Viro authored
      1) fs/compat_ioctl.c has COMPATIBLE_IOCTL(RAW_SETBIND) followed by
      HANDLE_IOCTL(RAW_SETBIND, raw_ioctl).  The latter is ignored.
      
      2) on amd64 (and itanic) the damn thing is broken - we have int + u64 + u64
      and layouts on i386 and amd64 are _not_ the same.  raw_ioctl() would
      work there, but it's never called due to (1).  As it is, i386 /sbin/raw
      definitely doesn't work on amd64 boxen.
      
      3) switching to raw_ioctl() as is would *not* work on e.g. sparc64 and ppc64,
      which would be rather sad, seeing that normal userland there is 32bit.
      The thing is, slapping __packed on the struct in question does not DTRT -
      it eliminates *all* padding.  The real solution is to use compat_u64.
      
      4) of course, all that stuff has no business being outside of raw.c in the
      first place - there should be ->compat_ioctl() for /dev/rawctl instead of
      messing with compat_ioctl.c.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      0c16377a
    • Miklos Szeredi's avatar
      vfs_rename_dir() doesn't properly account for filesystems with · f5dc5e07
      Miklos Szeredi authored
      FS_RENAME_DOES_D_MOVE.  If new_dentry has a target inode attached, it
      unhashes the new_dentry prior to the rename() iop and rehashes it after,
      but doesn't account for the possibility that rename() may have swapped
      {old,new}_dentry.  For FS_RENAME_DOES_D_MOVE filesystems, it rehashes
      new_dentry (now the old renamed-from name, which d_move() expected to go
      away), such that a subsequent lookup will find it.
      
      This was caught by the recently posted POSIX fstest suite, rename/10.t
      test 62 (and others) on ceph.
      
      The bug was introduced by: commit 349457cc
      "[PATCH] Allow file systems to manually d_move() inside of ->rename()"
      
      Fix by not rehashing the new dentry.  Rehashing used to be needed by
      d_move() but isn't anymore.
      Reported-by: default avatarSage Weil <sage@newdream.net>
      Cc: Zach Brown <zach.brown@oracle.com>
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f5dc5e07
  18. 10 Sep, 2009 1 commit
  19. 11 Sep, 2009 1 commit
  20. 03 Sep, 2009 3 commits
  21. 22 Aug, 2009 1 commit
  22. 13 Aug, 2009 1 commit
  23. 03 Sep, 2009 1 commit
  24. 02 Sep, 2009 2 commits
  25. 13 Sep, 2009 1 commit
  26. 31 Aug, 2009 1 commit
  27. 25 Aug, 2009 1 commit
  28. 24 Aug, 2009 2 commits