Commits · 6dad2ffb964afad0e04bb5737e035cd0c11046e2 · linux / linux-davinci

03 Jun, 2009 2 commits

This is yuk. It's just to make Randy happy for now. · 6dad2ffb

Andrew Morton authored Jun 04, 2009

Cc: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

6dad2ffb

unbust comment layout · cc1218d5

Andrew Morton authored Jun 04, 2009

Cc: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

cc1218d5

02 Jun, 2009 1 commit

I added blk_run_backing_dev on page_cache_async_readahead so readahead I/O · a940f91a

Hisashi Hifumi authored Jun 02, 2009

is unplugged to improve throughput, especially in RAID environments.

The normal case is, if page N becomes uptodate at time T(N), then T(N) <=
T(N+1) holds.  With RAID (and NFS to some degree), there is no strict
ordering: the data arrival time depends on the runtime status of individual
disks, which breaks that formula.  So in do_generic_file_read(), just
after submitting the async readahead IO request, the current page may well
be uptodate, so the page won't be locked, and the block device won't be
implicitly unplugged:

                if (PageReadahead(page))
                        page_cache_async_readahead()
                if (!PageUptodate(page))
                        goto page_not_up_to_date;
                //...
page_not_up_to_date:
                lock_page_killable(page);

Therefore explicit unplugging can help.

The following are the test results with dd:

# dd if=testdir/testfile of=/dev/null bs=16384

-2.6.30-rc6
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s

-2.6.30-rc6-patched
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s

(7Disks RAID-0 Array)

-2.6.30-rc6
1054976+0 records in
1054976+0 records out
17284726784 bytes (17 GB) copied, 212.233 seconds, 81.4 MB/s

-2.6.30-rc6-patched
1054976+0 records in
1054976+0 records out
17284726784 bytes (17 GB) copied, 198.878 seconds, 86.9 MB/s

(7Disks RAID-5 Array)
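As a quick check on the figures above, a small C sketch can recompute the throughput from the byte counts and elapsed times that dd reports, along with the relative gain from the patch. The helper names throughput_mb_per_s() and percent_gain() are made up for this illustration, and dd's "MB" is assumed to mean 10^6 bytes.

```c
#include <assert.h>

/* Throughput in MB/s, with MB = 1,000,000 bytes as dd reports it. */
double throughput_mb_per_s(double bytes, double seconds)
{
	assert(seconds > 0.0);
	return bytes / seconds / 1e6;
}

/* Relative improvement of `after` over `before`, in percent. */
double percent_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Feeding in the RAID-0 run (17179869184 bytes in 224.182 s unpatched and 206.465 s patched) gives roughly 76.6 and 83.2 MB/s, a gain of about 8.6%. The RAID-5 run works out to a gain of about 6.7%.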
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

a940f91a

24 Aug, 2009 4 commits

When a cpuset's nodemask is updated, all attached tasks have their cached · d2f0ac7c

David Rientjes authored Aug 25, 2009

task->mems_allowed updated by a heap instead of requiring an explicit call
to cpuset_update_task_memory_state(), which has since been removed in
58568d2a ("cpuset,mm: update tasks' mems_allowed in time").

Remove the obsoleted comment from the page allocator.

Cc: Paul Menage <menage@google.com>
Acked-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

d2f0ac7c

disable_swap_token() doesn't take an argument. This fixes the · 6a1ef588

Johannes Weiner authored Aug 25, 2009

!CONFIG_SWAP dummy.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

6a1ef588

Cc: Johannes Weiner <hannes@cmpxchg.org> · 5e7fde00

Andrew Morton authored Aug 25, 2009

Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

5e7fde00

Make use of the compiler's typechecking on !CONFIG_SWAP as well. · fd7c08d0

Johannes Weiner authored Aug 25, 2009

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fd7c08d0

20 Aug, 2009 1 commit

With 2.6.31 'crash' on x86_64 falls flat on its face as the '_end' symbol · 717c6305

Hannes Reinecke authored Aug 21, 2009

is missing from the System.map file.

The culprit is commit 091e52c3, which
moved the '_end' symbol into its own section.  Apparently this causes
kallsyms to not reference it properly.

So either we'd need to revert part of the patch to not include _end in
its own section.

Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

717c6305

08 Sep, 2009 1 commit

· 7e1ad39c

James Toy authored Sep 08, 2009

- add -mmN to EXTRAVERSION

- Add a marker to make the v4l build environment happier
Signed-off-by: Michael Krufky <mkrufky@m1k.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

7e1ad39c

19 Aug, 2009 1 commit

· 8051620d

James Toy authored Aug 19, 2009

The following commit makes console open fail while booting:

	commit d966976924119acd35a431adbb95292082f73f8c
	Author: Alan Cox <alan@linux.intel.com>
	Date:   Tue Aug 11 10:23:05 2009 +1000

	tty: make the kref destructor occur asynchronously

Because the tty release routines now run in a workqueue, an error like the
following is reported while booting:

INIT open /dev/console Input/output error

The reason is that closing now has some latency, and when we open a tty
whose close has not finished yet, -EIO is returned.

Fix it following Alan's suggestion:

Fun but it's actually not a bug and the fix is wrong in itself as the port
may be closing but not yet being destructed, in which case it seems to do
the wrong thing.  Opening a tty that is closing (and could be closing for
long periods) is supposed to return -EIO.

I suspect a better way to deal with this and keep the old console timing
is to split tty->shutdown into two functions.

tty->shutdown() - called synchronously just before we dump the tty onto
the waitqueue for destruction

tty->cleanup() - called when the destructor runs.

We would then do the shutdown part which can occur in IRQ context fine,
before queueing the rest of the release (from tty->magic = 0 ...  the end)
to occur asynchronously

The USB update in -next would then need a call like

       if (tty->cleanup)
               tty->cleanup(tty);

at the top of the async function and the USB shutdown to be split between
shutdown and cleanup as the USB resource cleanup and final tidy cannot
occur synchronously as it needs to sleep.

In other words the logic becomes

       final kref put
               make object unfindable

       async
               clean it up
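
The split Alan describes can be modelled in user space. The following is only a toy sketch: toy_tty, toy_tty_put() and toy_run_deferred() are hypothetical names, and a plain flag stands in for the kernel workqueue. It is not the tty layer's actual code.

```c
#include <assert.h>
#include <stddef.h>

struct toy_tty {
	int refcount;
	int findable;         /* can a new open() still look this tty up? */
	int cleaned;          /* has the deferred cleanup run? */
	int cleanup_pending;  /* stand-in for a queued work item */
	void (*shutdown)(struct toy_tty *tty);  /* synchronous part */
	void (*cleanup)(struct toy_tty *tty);   /* deferred part, may sleep */
};

/* Drop one reference; on the final put, shut down now and defer the rest. */
void toy_tty_put(struct toy_tty *tty)
{
	assert(tty != NULL && tty->refcount > 0);
	if (--tty->refcount > 0)
		return;
	if (tty->shutdown)
		tty->shutdown(tty);      /* make the object unfindable */
	tty->cleanup_pending = 1;    /* "queue" the asynchronous release */
}

/* What the asynchronous worker would do once it gets to run. */
void toy_run_deferred(struct toy_tty *tty)
{
	assert(tty != NULL);
	if (!tty->cleanup_pending)
		return;
	tty->cleanup_pending = 0;
	if (tty->cleanup)
		tty->cleanup(tty);       /* same shape as the call quoted above */
}

/* Example callbacks used to exercise the model. */
void toy_shutdown(struct toy_tty *tty) { tty->findable = 0; }
void toy_cleanup(struct toy_tty *tty)  { tty->cleaned = 1; }
```

In this model the object stops being findable as soon as the last reference is dropped, while the part that may sleep only runs when toy_run_deferred() is called later.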
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Emmanuel Benisty <benisty.e@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

8051620d

13 Aug, 2009 1 commit

Also remove lots of unused irq_cpustat fields. · 493f054b

Christoph Hellwig authored Aug 13, 2009

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

493f054b

24 Jul, 2009 1 commit

Amerigo Wang authored Jul 24, 2009

xtensa_pipe() for xtensa.
Signed-off-by: WANG Cong <amwang@redhat.com>
Reviewed-by: Johannes Weiner <jw@emlix.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

1e0546c4

09 Sep, 2009 3 commits

The patch · 18a22227

Miklos Szeredi authored Sep 10, 2009

  "vfs: fix d_path() for unreachable paths"

generally changed d_path() to report unreachable paths with a special
prefix.  This has an effect on /proc/${PID}/maps as well for memory maps
set up with shmem_file_setup() or hugetlb_file_setup().  These functions
set up unlinked files under a kernel-private vfsmount.  Since this
vfsmount is unreachable from userspace, these maps will be reported with
the "(unreachable)" prefix.

This is undesirable, because it changes the kernel ABI and might break
applications for no good reason.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

18a22227

"vfs: fix d_path() for unreachable paths" prefixes unreachable paths · 6dc1faec

Miklos Szeredi authored Sep 09, 2009

with "(unreachable)" in the result of getcwd(2), /proc/*/mounts,
/proc/*/cwd, /proc/*/fd/*, etc...

Hugh Dickins reported that an old version of gnome-vfs-daemon crashes
because it finds an entry in /proc/mounts where the mountpoint is
unreachable.

This patch reverts /proc/mounts to the old behavior (or rather a less
crazy version of the old behavior).
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

6dc1faec

Add two helpers that allow access to the seq_file's own buffer, but · d4dd167a

Miklos Szeredi authored Sep 09, 2009

hide the internal details of seq_files.

This allows easier implementation of special purpose filling
functions.  It also cleans up some existing functions which duplicated
the seq_file logic.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

d4dd167a

05 Jun, 2009 1 commit

seq_path_root() is returning a return value of successful __d_path() · cee0e35d

Tetsuo Handa authored Jun 06, 2009

instead of returning a negative value when mangle_path() failed.

This is not a bug so far because nobody uses the return value of
seq_path_root().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

cee0e35d

10 Sep, 2009 1 commit

John Johansen pointed out that getcwd(2) will give a garbled result if a · 8d51b62b

Miklos Szeredi authored Sep 10, 2009

bind mount of a non-filesystem-root directory is detached:

   > mkdir /mnt/foo
   > mount --bind /etc /mnt/foo
   > cd /mnt/foo/skel
   > umount -l /mnt/foo
   > /bin/pwd
   etcskel

If it was the root of the filesystem which was detached, it will give a
saner looking result, but it still won't be a valid absolute path by which
the CWD can be reached (assuming the process's root is not also on the
detached mount).

A similar issue happens if the CWD is outside the process's root or in a
different namespace.  These problems are relevant to symlinks under
/proc/<pid>/ and /proc/<pid>/fd/ as well.

This patch addresses all these issues, by prefixing such unreachable paths
with "(unreachable)".  This isn't perfect since the returned path may
still be a valid _relative_ path, and applications may not check the
result of getcwd() for starting with a '/' before using it.

For this reason Andreas Gruenbacher thinks getcwd(2) should return ENOENT
in these cases, but that breaks /bin/pwd and bash in the above cases.
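
Because the prefixed result is not an absolute path, a cautious application can check for a leading '/' before trusting what getcwd() returns. A minimal sketch follows; cwd_is_absolute() and get_reachable_cwd() are hypothetical helper names, and only the standard getcwd(3) call is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* An absolute path starts with '/'; "(unreachable)/x" and "etcskel" do not. */
int cwd_is_absolute(const char *path)
{
	return path != NULL && path[0] == '/';
}

/*
 * Fetch the current directory into buf. Returns buf on success, or NULL
 * if getcwd() fails or the result is not an absolute path.
 */
char *get_reachable_cwd(char *buf, size_t size)
{
	assert(buf != NULL && size > 0);
	if (getcwd(buf, size) == NULL)
		return NULL;
	return cwd_is_absolute(buf) ? buf : NULL;
}
```

A caller that gets NULL back can fall back to an error path instead of treating the string as a usable location.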
Reported-by: John Johansen <jjohansen@suse.de>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

8d51b62b

20 Aug, 2009 1 commit

Signed-off-by: Nick Piggin <npiggin@suse.de> · c774ca4d

Nick Piggin authored Aug 21, 2009

Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

c774ca4d

24 Aug, 2009 1 commit

Signed-off-by: Nick Piggin <npiggin@suse.de> · 42d163d0

Nick Piggin authored Aug 25, 2009

Acked-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

42d163d0

20 Aug, 2009 1 commit

Note I wasn't able to test jfs because the kernel wasn't mounting the · cfd8a4d6

Nick Piggin authored Aug 21, 2009

product of my mkfs.jfs for some reason.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

cfd8a4d6

24 Aug, 2009 1 commit

Signed-off-by: Nick Piggin <npiggin@suse.de> · 2bc602af

Nick Piggin authored Aug 25, 2009

Cc: Chris Mason <chris.mason@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2bc602af

20 Aug, 2009 1 commit

Signed-off-by: Nick Piggin <npiggin@suse.de> · 3475ea42

Nick Piggin authored Aug 21, 2009

Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

3475ea42

24 Aug, 2009 1 commit

On Fri, Aug 21, 2009 at 04:06:59PM +0200, Jan Kara wrote: · 85a3602b

Nick Piggin authored Aug 25, 2009

> >   Hi,
> >
> > > I also have commented a possible bug in existing ext2 code, marked with XXX.
> >   Looks good, except:
> >
> > > +int ext2_setsize(struct inode *inode, loff_t newsize)
> >   This could be static.
> >
> > > @@ -1459,8 +1540,15 @@ int ext2_setattr(struct dentry *dentry,
> > >  		if (error)
> > >  			return error;
> > >  	}
> > > -	error = inode_setattr(inode, iattr);
> > > +	if (iattr->ia_valid & ATTR_SIZE) {
> > > +		error = ext2_setsize(inode, iattr->ia_size);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +	generic_setattr(inode, iattr);
> >   Here, we should store the error code I suppose...
>   Ah, I was confused. generic_setattr() returns void. But then remove
> the check !error from:
>   if (!error && (iattr->ia_valid & ATTR_MODE))
> which just follows the generic_setattr(). That's what made me think
> generic_setattr() returns something :)

Yep, good suggestion.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

85a3602b

20 Aug, 2009 3 commits

I also have commented a possible bug in existing ext2 code, marked with · 3cb5c15d

Nick Piggin authored Aug 21, 2009

XXX.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

3cb5c15d

Signed-off-by: Nick Piggin <npiggin@suse.de> · 1e829880

Nick Piggin authored Aug 21, 2009

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

1e829880

Convert simple filesystems: ramfs, configfs, sysfs to new truncate · f4b723e8

Nick Piggin authored Aug 21, 2009

sequence.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

f4b723e8

24 Aug, 2009 2 commits

Introduce a new truncate calling sequence into fs/mm subsystems. Rather · 7b9e2af3

Nick Piggin authored Aug 25, 2009

than setattr > vmtruncate > truncate, have filesystems call their truncate
sequence from ->setattr if filesystem specific operations are required. 
vmtruncate is deprecated, and truncate_pagecache and inode_newsize_ok
helpers introduced previously should be used.

simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence.  Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion of
simple_setattr (ie.  changing i_size and trimming pagecache).

A new attribute is introduced into inode_operations structure;
.new_truncate is a temporary hack to distinguish filesystems that
implement the new truncate system.

To implement the new truncate sequence:
- set .new_truncate = 1
- filesystem specific manipulations (e.g. freeing blocks) must be done in
  the setattr method rather than ->truncate.
- vmtruncate cannot be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- make use of the better opportunity to catch errors with the above two changes.
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.

The big problem with the previous calling sequence is that the filesystem
is not called until i_size has already changed.  This means it is not
allowed to fail the call, and it also does not know what the previous
i_size was.  Also, generic code calling vmtruncate to truncate allocated
blocks in case of error had no good way to return a meaningful error (or,
for example, atomically handle block deallocation).
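
The ordering argument can be illustrated with a user-space toy model. Everything here (toy_inode, toy_newsize_ok(), toy_setsize(), the 4096-byte block size) is a hypothetical stand-in and not the kernel API. The sketch only shows a size change that is validated first, can fail with the old size intact, and trims blocks in filesystem code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_BLOCK_SIZE 4096LL

struct toy_inode {
	long long size;      /* stand-in for i_size */
	long long max_size;  /* largest size the toy filesystem supports */
	long long blocks;    /* blocks currently allocated */
};

/* Size check done before anything changes. */
int toy_newsize_ok(const struct toy_inode *inode, long long newsize)
{
	assert(inode != NULL);
	if (newsize < 0)
		return -EINVAL;
	if (newsize > inode->max_size)
		return -EFBIG;
	return 0;
}

/*
 * Filesystem-specific resize, called from the setattr path. It still sees
 * the old size, may fail before any state changes, and frees blocks past
 * the new size itself.
 */
int toy_setsize(struct toy_inode *inode, long long newsize)
{
	long long needed;
	int error = toy_newsize_ok(inode, newsize);

	if (error)
		return error;             /* old size is still intact */
	needed = newsize / TOY_BLOCK_SIZE + (newsize % TOY_BLOCK_SIZE != 0);
	if (needed < inode->blocks)
		inode->blocks = needed;   /* trim blocks past the new size */
	inode->size = newsize;
	return 0;
}
```

With the old sequence the equivalent of toy_setsize() would run only after the size had already been overwritten, so the error return in the first branch would have nowhere to go.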
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

7b9e2af3

Update some fs code to make use of new helper functions introduced in the · 466c1857

Nick Piggin authored Aug 25, 2009

previous patch.  Should be no significant change in behaviour (except CIFS
now calls send_sig under i_lock, via inode_newsize_ok).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: <linux-nfs@vger.kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: <linux-cifs-client@lists.samba.org>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

466c1857

20 Aug, 2009 1 commit

Introduce new truncate helpers truncate_pagecache and inode_newsize_ok. · 7f59bbe4

Nick Piggin authored Aug 21, 2009

vmtruncate is also consolidated from mm/memory.c and mm/nommu.c into
mm/truncate.c.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

7f59bbe4

24 Aug, 2009 3 commits

repair comment layout · 537c370f

Andrew Morton authored Aug 25, 2009

Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

537c370f

Invalidate sb->s_bdev on remount,ro. · 1bc98644

Nick Piggin authored Aug 25, 2009

Fixes a problem reported by Jorge Boncompte who is seeing corruption
trying to snapshot a minix filesystem image.  Some filesystems modify
their metadata via a path other than the bdev buffer cache (eg.  they may
use a private linear mapping for their metadata, or implement directories
in pagecache, etc).  Also, file data modifications usually go to the bdev
via their own mappings.

These updates are not coherent with buffercache IO (eg.  via /dev/bdev)
and never have been.  However, there could be a reasonable expectation that
after a mount -oremount,ro operation the buffercache should
subsequently be coherent with previous filesystem modifications.

So invalidate the bdev mappings on a remount,ro operation to provide a
coherency point.

The problem was exposed when we switched the old rd to brd because old rd
didn't really function like a normal block device and updates to rd via
mappings other than the buffercache would still end up going into its
buffercache.  But the same problem has always affected other "normal"
block devices, including loop.
Reported-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Tested-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

1bc98644

Filesystems outside the regular namespace do not have to clear · 8fa31938

Nick Piggin authored Aug 25, 2009

DCACHE_UNHASHED in order to have a working /proc/$pid/fd/XXX.  Nothing in
proc prevents the fd link from being used if its dentry is not in the
hash.

Also, it does not get put into the dcache hash if DCACHE_UNHASHED is
clear; that depends on the filesystem calling d_add or d_rehash.

So delete the misleading comments and needless code.
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

8fa31938

19 Aug, 2009 1 commit

As Johannes Weiner pointed out, one of the range checks in do_sendfile · 9de0f478

Jeff Layton authored Aug 19, 2009

is redundant and is already checked in rw_verify_area.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Mandeep Singh Baines <msb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

9de0f478

12 Aug, 2009 1 commit

sb->s_maxbytes is supposed to indicate the maximum size of a file that can · 9600c605

Jeff Layton authored Aug 12, 2009

exist on the filesystem.  It's declared as an unsigned long long.

Even if a filesystem has no inherent limit that prevents it from using
every bit in that unsigned long long, it's still problematic to set it to
anything larger than MAX_LFS_FILESIZE.  There are places in the kernel
that cast s_maxbytes to a signed value.  If it's set too large then this
cast makes it a negative number and generally breaks the comparison.

Change s_maxbytes to be loff_t instead.  That should help eliminate the
temptation to set it too large by making it a signed value.

Also, add a warning for a couple of releases to help catch filesystems that
set s_maxbytes too large.  Eventually we can either convert this to a
BUG() or just remove it in the hope that no one will get it wrong now
that it's a signed value.
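
The failure mode described above can be reproduced in a few lines of user-space C. This is a sketch only: the function names are invented for illustration, `long long` stands in for loff_t, and the result of converting an out-of-range unsigned value to a signed type is implementation-defined (GCC and Clang wrap it to a negative number).

```c
#include <assert.h>
#include <limits.h>

/*
 * Old-style check: the limit is unsigned, but the comparison is done on
 * signed values, so an oversized limit is cast to a negative number.
 */
int offset_too_big_cast(long long offset, unsigned long long maxbytes)
{
	return offset > (long long)maxbytes;
}

/* New-style check: the limit is a signed value to begin with. */
int offset_too_big_signed(long long offset, long long maxbytes)
{
	assert(maxbytes >= 0);
	return offset > maxbytes;
}
```

With a limit of ULLONG_MAX the first helper reports even offset 0 as too big on such compilers, whereas a signed limit of LLONG_MAX behaves as expected.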
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Mandeep Singh Baines <msb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

9600c605

17 Aug, 2009 1 commit

If fiemap_check_ranges is passed a large enough value, then it's · 7058be41

Jeff Layton authored Aug 18, 2009

possible that the value would be cast to a signed value for comparison
against s_maxbytes when we change it to loff_t. Make sure that doesn't
happen by explicitly casting s_maxbytes to an unsigned value for the
purposes of comparison.
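
A short sketch of the comparison hazard this guards against, again in user-space C with invented function names and `long long` standing in for loff_t. Converting a value above LLONG_MAX to a signed type is implementation-defined, and on GCC and Clang it yields a negative number.

```c
#include <assert.h>

/* Problematic form: a huge unsigned start turns negative and slips through. */
int start_past_limit_signed(unsigned long long start, long long maxbytes)
{
	return (long long)start > maxbytes;
}

/* Form described above: compare as unsigned so large values stay large. */
int start_past_limit_unsigned(unsigned long long start, long long maxbytes)
{
	assert(maxbytes >= 0);
	return start > (unsigned long long)maxbytes;
}
```

For a start value of 2^63 the signed comparison wrongly accepts the range on such compilers, while the unsigned comparison rejects it.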
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robert Love <rlove@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mandeep Singh Baines <msb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

7058be41

24 Aug, 2009 1 commit

> ============================================= · 90f4f90a

Roland Dreier authored Aug 25, 2009

 >  [ INFO: possible recursive locking detected ]
 >  2.6.31-2-generic #14~rbd3
 >  ---------------------------------------------
 >  firefox-3.5/4162 is trying to acquire lock:
 >   (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >
 >  but task is already holding lock:
 >   (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >
 >  other info that might help us debug this:
 >  3 locks held by firefox-3.5/4162:
 >   #0:  (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >   #1:  (&sb->s_type->i_mutex_key#11/1){+.+.+.}, at: [<ffffffff81139d5a>] lock_rename+0x6a/0xf0
 >   #2:  (&sb->s_type->i_mutex_key#11/2){+.+.+.}, at: [<ffffffff81139d6f>] lock_rename+0x7f/0xf0
 >
 >  stack backtrace:
 >  Pid: 4162, comm: firefox-3.5 Tainted: G         C 2.6.31-2-generic #14~rbd3
 >  Call Trace:
 >   [<ffffffff8108ae74>] print_deadlock_bug+0xf4/0x100
 >   [<ffffffff8108ce26>] validate_chain+0x4c6/0x750
 >   [<ffffffff8108d2e7>] __lock_acquire+0x237/0x430
 >   [<ffffffff8108d585>] lock_acquire+0xa5/0x150
 >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
 >   [<ffffffff815526ad>] __mutex_lock_common+0x4d/0x3d0
 >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
 >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
 >   [<ffffffff8120eaf9>] ? ecryptfs_rename+0x99/0x170
 >   [<ffffffff81552b36>] mutex_lock_nested+0x46/0x60
 >   [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >   [<ffffffff8120eb2a>] ecryptfs_rename+0xca/0x170
 >   [<ffffffff81139a9e>] vfs_rename_dir+0x13e/0x160
 >   [<ffffffff8113ac7e>] vfs_rename+0xee/0x290
 >   [<ffffffff8113c212>] ? __lookup_hash+0x102/0x160
 >   [<ffffffff8113d512>] sys_renameat+0x252/0x280
 >   [<ffffffff81133eb4>] ? cp_new_stat+0xe4/0x100
 >   [<ffffffff8101316a>] ? sysret_check+0x2e/0x69
 >   [<ffffffff8108c34d>] ? trace_hardirqs_on_caller+0x14d/0x190
 >   [<ffffffff8113d55b>] sys_rename+0x1b/0x20
 >   [<ffffffff81013132>] system_call_fastpath+0x16/0x1b

The trace above is totally reproducible by doing a cross-directory
rename on an ecryptfs directory.

The issue seems to be that sys_renameat() does lock_rename() then calls
into the filesystem; if the filesystem is ecryptfs, then
ecryptfs_rename() again does lock_rename() on the lower filesystem, and
lockdep can't tell that the two s_vfs_rename_mutexes are different.  It
seems an annotation like the following is sufficient to fix this (it
does get rid of the lockdep trace in my simple tests); however I would
like to make sure I'm not misunderstanding the locking, hence the CC
list...
Signed-off-by: Roland Dreier <rdreier@cisco.com>
Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

90f4f90a

03 Sep, 2009 1 commit

Recent mmotms give me WARNING: at fs/namespace.c:612 mntput_no_expire()+... · b14c81eb

Hugh Dickins authored Sep 03, 2009

when unmounting: __mntput()'s WARN_ON(count_mnt_writers(mnt)).

That's because vfs-optimize-touch_time-too.patch inverted the sense
of mnt_want_write_file(), which is error-returning, not a boolean.

Presumably filetime updates went missing too, but I didn't notice those.
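
The difference between an error-returning call and a boolean one is easy to get backwards. The toy model below uses hypothetical names throughout (toy_mount, toy_want_write() and so on) and only illustrates the kind of mistake described, not the actual patch: a function that returns 0 on success is tested as if non-zero meant "go ahead".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_mount {
	int read_only;
	int writers;   /* outstanding write references */
};

/* Error-returning: 0 on success, negative errno on failure. */
int toy_want_write(struct toy_mount *mnt)
{
	assert(mnt != NULL);
	if (mnt->read_only)
		return -EROFS;
	mnt->writers++;
	return 0;
}

void toy_drop_write(struct toy_mount *mnt)
{
	assert(mnt != NULL && mnt->writers > 0);
	mnt->writers--;
}

/* Inverted sense: success (0) is mistaken for failure. Returns 1 if updated. */
int toy_touch_inverted(struct toy_mount *mnt)
{
	if (toy_want_write(mnt) == 0)
		return 0;   /* no update, and the reference just taken leaks */
	return 0;       /* genuine failure: nothing to update either */
}

/* Correct sense: non-zero means failure. Returns 1 if updated. */
int toy_touch_correct(struct toy_mount *mnt)
{
	if (toy_want_write(mnt) != 0)
		return 0;            /* could not get write access */
	/* ...the timestamp update would happen here... */
	toy_drop_write(mnt);
	return 1;
}
```

In the inverted variant every successful call leaves the writer count one higher and skips the update, which matches both symptoms mentioned: a non-zero writer count at unmount and missing time updates.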
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Valerie Aurora <vaurora@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

b14c81eb

24 Aug, 2009 2 commits

Andi Kleen authored Aug 25, 2009

mnt_get_write is relatively costly, so try all avenues to avoid it first.

This patch is careful to still only update inode fields inside the lock
region.

This didn't show up in benchmarks, but it's easy enough to do.

[akpm@linux-foundation.org: fix typo in comment]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Valerie Aurora <vaurora@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

4ed80a09

Some benchmark testing shows touch_atime to be high up in profile logs for · fd50e7ec

Andi Kleen authored Aug 25, 2009

IO intensive workloads.  Most likely that's due to the lock in
mnt_want_write().  Unfortunately touch_atime first takes the lock, and
then does all the other tests that could avoid atime updates (like noatime
or relatime).

Do it the other way round -- first try to avoid the update and only then
if that didn't succeed take the lock.  That works because none of the
atime avoidance tests rely on locking.

This also eliminates a goto.
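
The reordering can be pictured with a small user-space model. The struct and function below are hypothetical stand-ins, and a counter takes the place of the real lock. The sketch only shows the pattern of running the cheap, lock-free tests first and taking the costly path last.

```c
#include <assert.h>
#include <stddef.h>

struct toy_file {
	int noatime;      /* cheap flag test, needs no lock */
	long atime;       /* last recorded access time */
	int lock_count;   /* how often the costly lock was taken */
};

/* Update atime, trying every lock-free way to skip the work first. */
void toy_touch_atime(struct toy_file *f, long now)
{
	assert(f != NULL);
	if (f->noatime)
		return;           /* avoided without touching the lock */
	if (f->atime == now)
		return;           /* nothing would change */
	f->lock_count++;      /* stand-in for taking the write reference */
	f->atime = now;       /* field is only modified after that point */
}
```

Calls that end up skipping the update never increment lock_count, which is the saving the commit is after.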
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Valerie Aurora <vaurora@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

fd50e7ec

22 Aug, 2009 1 commit

On 2009/6/17 Ingo Molnar <mingo@elte.hu> reported: · 95096a22

Vegard Nossum authored Aug 22, 2009

>
> btw., here's an old friend of a warning:
>
> async_continuing @ 1 after 0 usec
> WARNING: kmemcheck: Caught 8-bit read from freed memory (f5f33004)
> 0040f3f57400686f74706c756700000000000000000000000000000000000000
>  i i i i f f f f f f f f f f f f f f f f f f f f f f f f f f f f
>          ^
>
> Pid: 1, comm: swapper Not tainted (2.6.30-tip-04303-g5ada65e-dirty #767) P4DC6
> EIP: 0060:[<c1248df4>] EFLAGS: 00010246 CPU: 0
> EIP is at exact_copy_from_user+0x64/0x130
> EAX: 00000000 EBX: 00000001 ECX: 000000f5 EDX: 000000f5
> ESI: f5fdeffb EDI: f5f33004 EBP: f6c48ee8 ESP: c29598cc
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> CR0: 8005003b CR2: f6c20044 CR3: 0294d000 CR4: 000006d0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff4ff0 DR7: 00000400
>  [<c124916a>] copy_mount_options+0xba/0x1c0
>  [<c124dc0a>] sys_mount+0x1a/0x170
>  [<c263c937>] do_mount_root+0x27/0xe0
>  [<c263ca33>] mount_block_root+0x43/0x140
>  [<c263cc02>] mount_root+0xd2/0x160
>  [<c263ce49>] prepare_namespace+0x1b9/0x380
>  [<c263c4c8>] kernel_init+0xb8/0x110
>  [<c103ab13>] kernel_thread_helper+0x7/0x14
>  [<ffffffff>] 0xffffffff
> EXT3-fs: INFO: recovery required on readonly filesystem.
> EXT3-fs: write access will be enabled during recovery.

sys_mount() reads/copies a whole page for its "type" parameter.  When
do_mount_root() passes a kernel address that points to an object which is
smaller than a whole page, copy_mount_options() will happily go past this
memory object, possibly dereferencing "wild" pointers that could be in any
state (hence the kmemcheck warning, which shows that parts of the next
page are not even allocated).

(The likelihood of something going wrong here is pretty low -- first of
all this only applies to kernel calls to sys_mount(), which are mostly
found in the boot code.  Secondly, I guess if the page was not mapped,
exact_copy_from_user() _would_ in fact handle it correctly because of its
access_ok(), etc.  checks.)

But it is much nicer to avoid the dubious reads altogether, by stopping as
soon as we find a NUL byte.  Is there a good reason why we can't do
something like this, using the already existing strndup_from_user()?
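
The idea of stopping at the first NUL byte, rather than copying a fixed-size region, can be sketched in user-space C. copy_string_bounded() is a hypothetical helper written for this illustration. It is not strndup_from_user(), which operates on user-space pointers inside the kernel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy a NUL-terminated string of at most max-1 characters. The scan stops
 * at the first NUL, so bytes beyond the source string are never read.
 * Returns a malloc()ed copy, or NULL if no NUL is found within max bytes
 * or the allocation fails.
 */
char *copy_string_bounded(const char *src, size_t max)
{
	const char *nul;
	size_t len;
	char *dst;

	assert(src != NULL);
	nul = memchr(src, '\0', max);
	if (nul == NULL)
		return NULL;            /* unterminated within the limit */
	len = (size_t)(nul - src);
	dst = malloc(len + 1);
	if (dst == NULL)
		return NULL;
	memcpy(dst, src, len + 1);  /* includes the terminating NUL */
	return dst;
}
```

Copying a short string such as "ext3" with a page-sized limit reads only five bytes of the source, so the size of the object behind the pointer no longer matters.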

[akpm@linux-foundation.org: make copy_mount_string() static]
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

95096a22