1. 29 Jun, 2009 1 commit
  2. 24 Aug, 2009 4 commits
  3. 03 Jun, 2009 2 commits
  4. 02 Jun, 2009 1 commit
    • Hisashi Hifumi authored · a940f91a
      I added blk_run_backing_dev on page_cache_async_readahead so readahead I/O
      is unplugged to improve throughput, especially in RAID environments.
      
      The normal case is, if page N becomes uptodate at time T(N), then T(N) <=
      T(N+1) holds.  With RAID (and NFS to some degree), there is no strict
      ordering: the data arrival time depends on the runtime status of individual
      disks, which breaks that formula.  So in do_generic_file_read(), just
      after submitting the async readahead IO request, the current page may well
      be uptodate, so the page won't be locked, and the block device won't be
      implicitly unplugged:
      
              if (PageReadahead(page))
                      page_cache_async_readahead()
              if (!PageUptodate(page))
                      goto page_not_up_to_date;
              //...
      page_not_up_to_date:
              lock_page_killable(page);
      
      Therefore explicit unplugging can help.
      
      The following are the test results with dd.
      
      # dd if=testdir/testfile of=/dev/null bs=16384
      
      -2.6.30-rc6
      1048576+0 records in
      1048576+0 records out
      17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s
      
      -2.6.30-rc6-patched
      1048576+0 records in
      1048576+0 records out
      17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s
      
      (7Disks RAID-0 Array)
      
      -2.6.30-rc6
      1054976+0 records in
      1054976+0 records out
      17284726784 bytes (17 GB) copied, 212.233 seconds, 81.4 MB/s
      
      -2.6.30-rc6-patched
      1054976+0 records out
      17284726784 bytes (17 GB) copied, 198.878 seconds, 86.9 MB/s
      
      (7Disks RAID-5 Array)
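      The throughput figures above follow directly from the byte counts and
      elapsed times that dd prints. The sketch below reproduces that arithmetic;
      the helper names are hypothetical and are not part of the patch. Under
      this calculation the patched kernel comes out roughly 8.6% faster on the
      RAID-0 array and roughly 6.7% faster on the RAID-5 array.

```c
#include <assert.h>

/* Throughput in decimal megabytes per second, the unit dd reports.
 * Hypothetical helper for illustration only. */
static double throughput_mb_per_s(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / 1e6;
}

/* Relative gain of the patched run over the baseline run, in percent. */
static double gain_percent(double baseline, double patched)
{
    assert(baseline > 0.0);
    return (patched - baseline) / baseline * 100.0;
}
```

      For example, throughput_mb_per_s(17179869184.0, 224.182) gives the
      unpatched RAID-0 figure, and gain_percent() compares two such results.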
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Acked-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  5. 24 Aug, 2009 4 commits
  6. 20 Aug, 2009 1 commit
  7. 08 Sep, 2009 1 commit
  8. 19 Aug, 2009 1 commit
    • James Toy authored · 8051620d
      The following commit makes console open fail while booting:
      
      	commit d966976924119acd35a431adbb95292082f73f8c
      	Author: Alan Cox <alan@linux.intel.com>
      	Date:   Tue Aug 11 10:23:05 2009 +1000
      
      	tty: make the kref destructor occur asynchronously
      
      Because the tty release routines now run in a workqueue, an error like
      the following is reported while booting:
      
      INIT open /dev/console Input/output error
      
      The reason is that closing now has some latency, and opening a tty whose
      close has not yet finished returns -EIO.
      
      Fix it as Alan suggested:
      
      Fun, but it's actually not a bug and the fix is wrong in itself as the port
      may be closing but not yet being destructed, in which case it seems to do
      the wrong thing.  Opening a tty that is closing (and could be closing for
      long periods) is supposed to return -EIO.
      
      I suspect a better way to deal with this and keep the old console timing
      is to split tty->shutdown into two functions.
      
      tty->shutdown() - called synchronously just before we dump the tty onto
      the waitqueue for destruction
      
      tty->cleanup() - called when the destructor runs.
      
      We would then do the shutdown part which can occur in IRQ context fine,
      before queueing the rest of the release (from tty->magic = 0 ...  the end)
      to occur asynchronously
      
      The USB update in -next would then need a call like
      
             if (tty->cleanup)
                     tty->cleanup(tty);
      
      at the top of the async function and the USB shutdown to be split between
      shutdown and cleanup as the USB resource cleanup and final tidy cannot
      occur synchronously as it needs to sleep.
      
      In other words the logic becomes
      
             final kref put
                     make object unfindable
      
             async
                     clean it up
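      The two-phase logic above can be modelled in user space. The following
      is only a sketch under stated assumptions: struct demo_tty, the pending
      list and all function names are hypothetical stand-ins, not the kernel's
      tty structures or its workqueue API. It shows the synchronous half
      making the object unfindable at the final put, and the deferred half
      doing the cleanup later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tty; not the kernel's struct tty_struct. */
struct demo_tty {
    int findable;                  /* 1 while lookups may return this object */
    int cleaned_up;                /* 1 once the deferred cleanup has run */
    struct demo_tty *next_pending; /* link in the deferred-work list */
};

/* Objects waiting for their asynchronous cleanup. */
static struct demo_tty *pending_head;

/* Synchronous half: runs at the final reference drop. */
static void demo_shutdown(struct demo_tty *tty)
{
    assert(tty != NULL);
    tty->findable = 0;             /* make the object unfindable right away */
    tty->next_pending = pending_head;
    pending_head = tty;            /* queue the rest of the release */
}

/* Asynchronous half: stands in for the deferred work callback. */
static void demo_run_deferred(void)
{
    while (pending_head) {
        struct demo_tty *tty = pending_head;

        pending_head = tty->next_pending;
        tty->next_pending = NULL;
        tty->cleaned_up = 1;       /* work that may sleep would go here */
    }
}

/* Lookup as an opener would see it: a shut-down object is not returned. */
static struct demo_tty *demo_lookup(struct demo_tty *tty)
{
    return tty->findable ? tty : NULL;
}
```

      In this model a lookup between demo_shutdown() and demo_run_deferred()
      finds nothing, so an opener never sees a half-released object.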
      Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Emmanuel Benisty <benisty.e@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  9. 13 Aug, 2009 1 commit
  10. 24 Jul, 2009 1 commit
  11. 09 Sep, 2009 3 commits
  12. 05 Jun, 2009 1 commit
  13. 10 Sep, 2009 1 commit
    • Miklos Szeredi authored · 8d51b62b
      John Johansen pointed out that getcwd(2) will give a garbled result if a
      bind mount of a non-filesystem-root directory is detached:
      
         > mkdir /mnt/foo
         > mount --bind /etc /mnt/foo
         > cd /mnt/foo/skel
         > umount -l /mnt/foo
         > /bin/pwd
         etcskel
      
      If it was the root of the filesystem which was detached, it will give a
      saner looking result, but it still won't be a valid absolute path by which
      the CWD can be reached (assuming the process's root is not also on the
      detached mount).
      
      A similar issue happens if the CWD is outside the process's root or in a
      different namespace.  These problems are relevant to symlinks under
      /proc/<pid>/ and /proc/<pid>/fd/ as well.
      
      This patch addresses all these issues, by prefixing such unreachable paths
      with "(unreachable)".  This isn't perfect since the returned path may
      still be a valid _relative_ path, and applications may not check the
      result of getcwd() for starting with a '/' before using it.
      
      For this reason Andreas Gruenbacher thinks getcwd(2) should return ENOENT
      in these cases, but that breaks /bin/pwd and bash in the above cases.
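      An application that wants to guard against such results can check the
      string returned by getcwd() before using it as an absolute path. The
      sketch below is one way to do that; the helper names are hypothetical,
      and the only facts taken from the text above are the leading '/' test
      and the "(unreachable)" prefix.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a path as returned by getcwd() is absolute, i.e. it
 * starts with '/'. Hypothetical helper for illustration only. */
static int cwd_is_absolute(const char *path)
{
    assert(path != NULL);
    return path[0] == '/';
}

/* Returns 1 if the path carries the "(unreachable)" prefix described
 * in this commit message. */
static int cwd_is_unreachable(const char *path)
{
    static const char prefix[] = "(unreachable)";

    assert(path != NULL);
    return strncmp(path, prefix, sizeof(prefix) - 1) == 0;
}
```

      A caller would pass the buffer filled in by getcwd() to
      cwd_is_absolute() and fall back to some other strategy when it
      returns 0.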
      Reported-by: John Johansen <jjohansen@suse.de>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Andreas Gruenbacher <agruen@suse.de>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  14. 20 Aug, 2009 1 commit
    • Nick Piggin authored · c774ca4d
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  15. 24 Aug, 2009 1 commit
    • Nick Piggin authored · 42d163d0
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  16. 20 Aug, 2009 1 commit
  17. 24 Aug, 2009 1 commit
    • Nick Piggin authored · 2bc602af
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  18. 20 Aug, 2009 1 commit
    • Nick Piggin authored · 3475ea42
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  19. 24 Aug, 2009 1 commit
    • Nick Piggin authored · 85a3602b
      On Fri, Aug 21, 2009 at 04:06:59PM +0200, Jan Kara wrote:
      > >   Hi,
      > >
      > > > I also have commented a possible bug in existing ext2 code, marked with XXX.
      > >   Looks good, except:
      > >
      > > > +int ext2_setsize(struct inode *inode, loff_t newsize)
      > >   This could be static.
      > >
      > > > @@ -1459,8 +1540,15 @@ int ext2_setattr(struct dentry *dentry,
      > > >  		if (error)
      > > >  			return error;
      > > >  	}
      > > > -	error = inode_setattr(inode, iattr);
      > > > +	if (iattr->ia_valid & ATTR_SIZE) {
      > > > +		error = ext2_setsize(inode, iattr->ia_size);
      > > > +		if (error)
      > > > +			return error;
      > > > +	}
      > > > +	generic_setattr(inode, iattr);
      > >   Here, we should store the error code I suppose...
      >   Ah, I was confused. generic_setattr() returns void. But then remove
      > the check !error from:
      >   if (!error && (iattr->ia_valid & ATTR_MODE))
      > which just follows the generic_setattr(). That's what made me think
      > generic_setattr() returns something :)
      
      Yep, good suggestion.
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  20. 20 Aug, 2009 3 commits
  21. 24 Aug, 2009 2 commits
    • Nick Piggin authored · 7b9e2af3
      Introduce a new truncate calling sequence into fs/mm subsystems. Rather
      than setattr > vmtruncate > truncate, have filesystems call their truncate
      sequence from ->setattr if filesystem specific operations are required.
      vmtruncate is deprecated, and the truncate_pagecache and inode_newsize_ok
      helpers introduced previously should be used instead.
      
      simple_setattr is introduced for simple in-ram filesystems to implement
      the new truncate sequence.  Eventually all filesystems should be converted
      to implement a setattr, and the default code in notify_change should go
      away.
      
      simple_setsize is also introduced to perform just the ATTR_SIZE portion of
      simple_setattr (i.e. changing i_size and trimming pagecache).
      
      A new attribute is introduced into the inode_operations structure:
      .new_truncate is a temporary hack to distinguish filesystems that
      implement the new truncate system.
      
      To implement the new truncate sequence:
      - set .new_truncate = 1
      - filesystem specific manipulations (e.g. freeing blocks) must be done in
        the setattr method rather than ->truncate.
      - vmtruncate cannot be used by core code to trim blocks past i_size in
        the event of write failure after allocation, so this must be performed
        in the fs code.
      - make use of the better opportunity to catch errors with the above 2 changes.
      - inode_setattr should not be used. generic_setattr is a new function
        to be used to copy simple attributes into the generic inode.
      
      The big problem with the previous calling sequence is that the filesystem
      is not called until i_size has already changed.  This means it is not
      allowed to fail the call, and it does not know what the previous i_size
      was.  Also, generic code calling vmtruncate to truncate allocated blocks in
      case of error had no good way to return a meaningful error (or, for
      example, atomically handle block deallocation).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Nick Piggin authored · 466c1857
      Update some fs code to make use of new helper functions introduced in the
      previous patch.  There should be no significant change in behaviour (except
      that CIFS now calls send_sig under i_lock, via inode_newsize_ok).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Miklos Szeredi <miklos@szeredi.hu>
      Cc: <linux-nfs@vger.kernel.org>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: <linux-cifs-client@lists.samba.org>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  22. 20 Aug, 2009 1 commit
  23. 24 Aug, 2009 3 commits
    • Andrew Morton authored · 537c370f
      repair comment layout
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Nick Piggin authored · 1bc98644
      Invalidate sb->s_bdev on remount,ro.
      Fixes a problem reported by Jorge Boncompte, who is seeing corruption
      trying to snapshot a minix filesystem image.  Some filesystems modify
      their metadata via a path other than the bdev buffer cache (e.g. they may
      use a private linear mapping for their metadata, or implement directories
      in pagecache, etc).  Also, file data modifications usually go to the bdev
      via their own mappings.
      
      These updates are not coherent with buffercache IO (e.g. via /dev/bdev)
      and never have been.  However, there could be a reasonable expectation that
      after a mount -oremount,ro operation the buffercache should
      subsequently be coherent with previous filesystem modifications.
      
      So invalidate the bdev mappings on a remount,ro operation to provide a
      coherency point.
      
      The problem was exposed when we switched the old rd to brd because old rd
      didn't really function like a normal block device and updates to rd via
      mappings other than the buffercache would still end up going into its
      buffercache.  But the same problem has always affected other "normal"
      block devices, including loop.
      Reported-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
      Tested-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Nick Piggin authored · 8fa31938
      Filesystems outside the regular namespace do not have to clear
      DCACHE_UNHASHED in order to have a working /proc/$pid/fd/XXX.  Nothing in
      proc prevents the fd link from being used if its dentry is not in the
      hash.
      
      Also, it does not get put into the dcache hash if DCACHE_UNHASHED is
      clear; that depends on the filesystem calling d_add or d_rehash.
      
      So delete the misleading comments and needless code.
      Acked-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  24. 19 Aug, 2009 1 commit
  25. 12 Aug, 2009 1 commit
    • Jeff Layton authored · 9600c605
      sb->s_maxbytes is supposed to indicate the maximum size of a file that can
      exist on the filesystem.  It's declared as an unsigned long long.
      
      Even if a filesystem has no inherent limit that prevents it from using
      every bit in that unsigned long long, it's still problematic to set it to
      anything larger than MAX_LFS_FILESIZE.  There are places in the kernel
      that cast s_maxbytes to a signed value.  If it's set too large then this
      cast makes it a negative number and generally breaks the comparison.
      
      Change s_maxbytes to be loff_t instead.  That should help eliminate the
      temptation to set it too large by making it a signed value.
      
      Also, add a warning for a couple of releases to help catch filesystems that
      set s_maxbytes too large.  Eventually we can either convert this to a
      BUG() or just remove it in the hope that no one will get it wrong now
      that it's a signed value.
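      The failure mode described above can be reproduced in a few lines of
      user-space C. This is a sketch, not the kernel code: both function names
      are hypothetical, and the out-of-range conversion from unsigned to
      signed is implementation-defined in C, so the negative result is an
      assumption that holds on common two's-complement compilers such as GCC
      and Clang.

```c
#include <assert.h>
#include <limits.h>

/* Mimics a caller that casts an unsigned limit to a signed value before
 * comparing. Hypothetical helper for illustration only. */
static int offset_ok_after_cast(long long offset, unsigned long long maxbytes)
{
    /* Implementation-defined when maxbytes > LLONG_MAX; on common
     * compilers the value wraps and becomes negative. */
    long long limit = (long long)maxbytes;

    return offset <= limit;
}

/* The same comparison with a limit that is signed from the start, so an
 * oversized value cannot be expressed at all. */
static int offset_ok_signed(long long offset, long long maxbytes)
{
    assert(maxbytes >= 0);
    return offset <= maxbytes;
}
```

      With a limit of ULLONG_MAX the first helper rejects even a small
      offset, which is the broken comparison the commit message refers to.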
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Robert Love <rlove@google.com>
      Cc: Mandeep Singh Baines <msb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  26. 17 Aug, 2009 1 commit