Commits · 6c431a1332aa2c6b627c583de5a9db635d5431c9 · linux / linux-davinci

12 Oct, 2009 4 commits

KOSAKI Motohiro authored Oct 13, 2009 · 6c431a13

On ia64, the following test program exits abnormally because the glibc
thread library calls abort().

 ========================================================
 (gdb) bt
 #0  0xa000000000010620 in __kernel_syscall_via_break ()
 #1  0x20000000003208e0 in raise () from /lib/libc.so.6.1
 #2  0x2000000000324090 in abort () from /lib/libc.so.6.1
 #3  0x200000000027c3e0 in __deallocate_stack () from /lib/libpthread.so.0
 #4  0x200000000027f7c0 in start_thread () from /lib/libpthread.so.0
 #5  0x200000000047ef60 in __clone2 () from /lib/libc.so.6.1
 ========================================================

The fact is, glibc calls munmap() to free a thread's stack when the thread
exits, and it assumes munmap() never fails.  However, munmap() often has to
split a vma, and when the process already has many mappings that split
fails with -ENOMEM.

Oh well, that's crazy, because unmapping a stack never increases the map
count overall; the map count limit is exceeded only temporarily, and such an
internal, temporary excess shouldn't cause ENOMEM.

This patch fixes that.

 test_max_mapcount.c
 ==================================================================
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <pthread.h>
  #include <errno.h>
  #include <unistd.h>

  #define THREAD_NUM 30000
  #define MAL_SIZE (8*1024*1024)

 /* Grab a large block (glibc normally serves an 8 MiB malloc() with its
  * own mmap(), i.e. one more mapping), keep it, and exit after 10s. */
 void *wait_thread(void *args)
 {
 	void *addr;

 	addr = malloc(MAL_SIZE);	/* deliberately never freed */
 	(void)addr;
 	sleep(10);

 	return NULL;
 }

 /* Keep this thread's stack mapping alive for 60s. */
 void *wait_thread2(void *args)
 {
 	sleep(60);

 	return NULL;
 }

 int main(int argc, char *argv[])
 {
 	int i;
 	pthread_t thread[THREAD_NUM], th;
 	int ret, count = 0;
 	pthread_attr_t attr;

 	/*
 	 * pthread_*() functions return an error number instead of setting
 	 * errno, so report it with strerror(ret) rather than perror().
 	 */
 	ret = pthread_attr_init(&attr);
 	if (ret)
 		fprintf(stderr, "pthread_attr_init: %s\n", strerror(ret));

 	/* Detached threads release their own stacks when they exit. */
 	ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
 	if (ret)
 		fprintf(stderr, "pthread_attr_setdetachstate: %s\n",
 			strerror(ret));

 	for (i = 0; i < THREAD_NUM; i++) {
 		ret = pthread_create(&th, &attr, wait_thread, NULL);
 		if (ret)
 			fprintf(stderr, "[%d] pthread_create: %s\n", count,
 				strerror(ret));
 		else
 			printf("[%d] create OK.\n", count);
 		count++;

 		ret = pthread_create(&thread[i], &attr, wait_thread2, NULL);
 		if (ret)
 			fprintf(stderr, "[%d] pthread_create: %s\n", count,
 				strerror(ret));
 		else
 			printf("[%d] create OK.\n", count);
 		count++;
 	}

 	sleep(3600);
 	return 0;
 }
 ==================================================================
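To see how close a process such as the test program above gets to the mapping limit, its map count can be compared with the sysctl. This is a minimal sketch assuming a Linux /proc filesystem; count_mappings() and read_max_map_count() are hypothetical helpers, not part of the patch.

```c
#include <stdio.h>

/* Number of mappings (vmas) of the calling process: /proc/self/maps has
 * one line per vma.  Returns -1 if the file cannot be opened. */
long count_mappings(void)
{
	FILE *f = fopen("/proc/self/maps", "r");
	long lines = 0;
	int c;

	if (!f)
		return -1;
	while ((c = fgetc(f)) != EOF)
		if (c == '\n')
			lines++;
	fclose(f);
	return lines;
}

/* Per-process mapping limit from the vm.max_map_count sysctl.
 * Returns -1 if it cannot be read. */
long read_max_map_count(void)
{
	FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
	long limit = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &limit) != 1)
		limit = -1;
	fclose(f);
	return limit;
}
```

Printing both values periodically from inside the test program would show how much headroom is left before munmap() can start failing with -ENOMEM.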
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

6c431a13

Roel Kluin authored Oct 13, 2009 · e7ff8c38

If the variable that stores the read() return value is not signed, testing
that value in this function will not work.
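A minimal sketch of this bug class, assuming it is the usual one: read() returns ssize_t, and storing that in an unsigned type turns -1 into a huge positive value. read_exact() and unsigned_error_check_fires() are hypothetical illustrations, not the patched function.

```c
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes; return 0 on success, -1 on error or short read.
 * The result of read() is kept in a signed ssize_t so that -1 is seen. */
int read_exact(int fd, void *buf, size_t len)
{
	ssize_t n = read(fd, buf, len);

	if (n < 0)
		return -1;		/* read() failed; errno says why */
	if ((size_t)n != len)
		return -1;		/* short read */
	return 0;
}

/* The broken pattern for comparison: storing the result in an unsigned
 * size_t turns -1 into SIZE_MAX, so the "< 0" test can never be true. */
int unsigned_error_check_fires(int fd)
{
	char c;
	size_t n = read(fd, &c, 1);

	return n < 0;			/* always 0: n is unsigned */
}
```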
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

e7ff8c38

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> · 28a5fc7f

Tommi Rantala authored Oct 13, 2009

Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

28a5fc7f

Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> · dbd6585d

Tommi Rantala authored Oct 13, 2009

Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

dbd6585d

24 Aug, 2009 1 commit

Mel Gorman authored Aug 25, 2009 · 68e4838b

When a page is freed with PG_mlocked set, it is considered an unexpected
but recoverable situation.  A counter records how often this event
happens, but it is easy to miss that this event has occurred at all.
This patch warns once when a page is freed with PG_mlocked set, to prompt
debuggers to check the counter and see how often it is happening.
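The warn-once-but-always-count pattern described above can be sketched in userland as follows. note_mlocked_free() and mlocked_freed_count are hypothetical names, and the real patch presumably uses the kernel's own warning macros rather than fprintf().

```c
#include <stdio.h>

/* How often a page was freed with the mlocked flag still set. */
static unsigned long mlocked_freed_count;

/* Hypothetical hook called on each such free: always bump the counter,
 * but print a warning only the first time, in the spirit of the kernel's
 * WARN_ONCE(). */
void note_mlocked_free(void)
{
	static int warned;

	mlocked_freed_count++;
	if (!warned) {
		warned = 1;
		fprintf(stderr, "freed a page with PG_mlocked set; "
			"check the counter to see how often this happens\n");
	}
}
```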
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

68e4838b

22 Sep, 2009 1 commit

Hisashi Hifumi authored Sep 22, 2009 · 1a9aa809

I added blk_run_backing_dev() to page_cache_async_readahead() so that
readahead I/O is unplugged, which improves throughput, especially in RAID
environments.

The normal case is that if page N becomes uptodate at time T(N), then
T(N) <= T(N+1) holds.  With RAID (and NFS to some degree), there is no
strict ordering: the data arrival time depends on the runtime status of
individual disks, which breaks that formula.  So in do_generic_file_read(),
just after the async readahead I/O request is submitted, the current page
may well be uptodate already, so the page won't be locked and the block
device won't be implicitly unplugged:

                if (PageReadahead(page))
                        page_cache_async_readahead(...);
                if (!PageUptodate(page))
                        goto page_not_up_to_date;
                //...
page_not_up_to_date:
                lock_page_killable(page);

Therefore explicit unplugging can help.

Following is the test result with dd.

#dd if=testdir/testfile of=/dev/null bs=16384

-2.6.30-rc6
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 224.182 seconds, 76.6 MB/s

-2.6.30-rc6-patched
1048576+0 records in
1048576+0 records out
17179869184 bytes (17 GB) copied, 206.465 seconds, 83.2 MB/s

(7-disk RAID-0 array)

-2.6.30-rc6
1054976+0 records in
1054976+0 records out
17284726784 bytes (17 GB) copied, 212.233 seconds, 81.4 MB/s

-2.6.30-rc6-patched
1054976+0 records out
17284726784 bytes (17 GB) copied, 198.878 seconds, 86.9 MB/s

(7-disk RAID-5 array)
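The relative improvement in the dd runs above can be worked out from the reported times; the byte count is the same within each pair, so throughput scales with the inverse of the elapsed time:

```latex
\text{RAID-0: } \frac{224.182\ \mathrm{s}}{206.465\ \mathrm{s}} \approx 1.086
  \;\Rightarrow\; \approx 8.6\%\ \text{higher throughput } (76.6 \to 83.2\ \mathrm{MB/s})

\text{RAID-5: } \frac{212.233\ \mathrm{s}}{198.878\ \mathrm{s}} \approx 1.067
  \;\Rightarrow\; \approx 6.7\%\ \text{higher throughput } (81.4 \to 86.9\ \mathrm{MB/s})
```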

The patch was found to improve performance with the SCST scsi target
driver.  See
http://sourceforge.net/mailarchive/forum.php?thread_name=a0272b440906030714g67eabc5k8f847fb1e538cc62%40mail.gmail.com&forum_name=scst-devel

[akpm@linux-foundation.org: unbust comment layout]
[akpm@linux-foundation.org: "fix" CONFIG_BLOCK=n]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: Ronald <intercommit@gmail.com>
Cc: Bart Van Assche <bart.vanassche@gmail.com>
Cc: Vladislav Bolkhovitin <vst@vlnb.net>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

1a9aa809

09 Oct, 2009 1 commit

David Rientjes authored Oct 09, 2009 · 934b162c

The oom killer header, including information such as the allocation order
and gfp mask, current's cpuset and memory controller, call trace, and VM
state information, is currently only shown when the oom killer has
selected a task to kill.

This information is omitted, however, when the oom killer panics either
because of panic_on_oom sysctl settings or when no killable task was
found.  It is still relevant to know crucial pieces of information such as
the allocation order and VM state when diagnosing such issues, especially
at boot.

This patch displays the oom killer header whenever it panics so that bug
reports can include pertinent information to debug the issue, if possible.
Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

934b162c

28 Oct, 2009 2 commits

Antti Kaijanmäki authored Oct 29, 2009 · 8b3600c7

Fix a soft lockup in hso.c which is triggered on SMP machines when the
modem is removed while file descriptor(s) under /dev are still open:

  the old version called kref_put() too early, which resulted in destroying
  the hso_serial and hso_device objects that were still used later on.
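The invariant the fix restores is that an object may only be destroyed when its last user drops its reference. A minimal single-threaded sketch of that kref-style pattern, with hypothetical names (the kernel's struct kref uses atomic operations, which this sketch omits):

```c
#include <stdlib.h>

/* Hypothetical refcounted object; release runs only at refcount zero. */
struct refobj {
	int refcount;
	int *released;		/* set to 1 by refobj_release(), for testing */
};

static void refobj_release(struct refobj *o)
{
	*o->released = 1;
	free(o);
}

struct refobj *refobj_new(int *released)
{
	struct refobj *o = malloc(sizeof(*o));

	*released = 0;
	if (o) {
		o->refcount = 1;	/* the creator holds one reference */
		o->released = released;
	}
	return o;
}

void refobj_get(struct refobj *o)
{
	o->refcount++;
}

/* Drop one reference; free the object only when none are left. */
void refobj_put(struct refobj *o)
{
	if (--o->refcount == 0)
		refobj_release(o);
}
```

Putting a reference too early, as the old hso code did, would free the object while an open descriptor could still reach it.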
Signed-off-by: Antti Kaijanmäki <antti.kaijanmaki@nomovok.com>
Cc: Greg KH <greg@kroah.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

8b3600c7

Signed-off-by: Antti Kaijanmäki <antti.kaijanmaki@nomovok.com> · 6990c0b4

Antti Kaijanmäki authored Oct 29, 2009

Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

6990c0b4

14 Oct, 2009 1 commit

Thomas Dahlmann authored Oct 14, 2009 · 941b637c

 - fixed shared interrupt bug reported by Vadim Lobanov
 - fixed possible warning oops on driver unload when connected
 - prevent interrupt flood in PIO mode ("modprobe amd5536udc use_dma=0")
   when using gadget ether
Signed-off-by: Thomas Dahlmann <dahlmann.thomas@arcor.de>
Cc: Robert Richter <robert.richter@amd.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: Greg KH <greg@kroah.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

941b637c

06 Oct, 2009 1 commit

Stanislaw Gruszka authored Oct 07, 2009 · 48158447

new_timer is already initialized to all zeros, hence the in-function
initializations are not needed.  Also document the function's expectation
about the new_timer argument.
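Relying on a zeroed allocation looks like this in a userland sketch, assuming the caller allocates with a zeroing allocator (calloc() here); struct timer_sketch and its fields are hypothetical, not the kernel's timer structure.

```c
#include <stdlib.h>

/* Hypothetical timer object; field names are illustrative only. */
struct timer_sketch {
	unsigned long expires;
	unsigned long incr;
	int overrun;
};

/* The allocator hands back zero-filled memory, as calloc() does. */
struct timer_sketch *timer_alloc(void)
{
	return calloc(1, sizeof(struct timer_sketch));
}

/* Expects new_timer to be zero-filled already, so only fields with a
 * non-zero initial value are set; no "expires = 0" or "overrun = 0". */
void timer_init(struct timer_sketch *new_timer, unsigned long incr)
{
	new_timer->incr = incr;
}
```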
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

48158447

30 Sep, 2009 1 commit

Stanislaw Gruszka authored Oct 01, 2009 · f1af83cf

The incr_error and error fields of struct cpu_itimer are used when
calculating the next timer tick in check_cpu_itimers(), so they should not
be modified without tsk->sighand->siglock held.
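The rule can be sketched with a userland mutex standing in for siglock. The structure and the error accumulation below are simplified and hypothetical; they only show that the writer and the next-tick reader must take the same lock.

```c
#include <pthread.h>

/* Simplified, hypothetical stand-in for struct cpu_itimer. */
struct itimer_sketch {
	unsigned long long incr;
	unsigned int error;
	unsigned int incr_error;
};

/* Stands in for tsk->sighand->siglock. */
static pthread_mutex_t siglock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: (re)arm the timer; all fields change together under the lock. */
void itimer_set(struct itimer_sketch *it, unsigned long long incr,
		unsigned int incr_error)
{
	pthread_mutex_lock(&siglock);
	it->incr = incr;
	it->incr_error = incr_error;
	it->error = 0;
	pthread_mutex_unlock(&siglock);
}

/* Reader: the next-tick calculation reads incr and updates error, so it
 * takes the same lock and never sees a half-updated timer. */
unsigned long long itimer_next_tick(struct itimer_sketch *it)
{
	unsigned long long incr;

	pthread_mutex_lock(&siglock);
	it->error += it->incr_error;	/* simplified error accumulation */
	incr = it->incr;
	pthread_mutex_unlock(&siglock);
	return incr;
}
```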
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

f1af83cf

24 Sep, 2009 1 commit

Feng Tang authored Sep 25, 2009 · 68ad718d

Recent hrtimer code sets the start info of an hrtimer only when that flag
is set, so the start info of all hrtimers is always uninitialised before an
"echo 1 > /proc/timer_stats", and /proc/timer_list then shows something
like:

active timers:
 #0: <c27d46b0>, tick_sched_timer, S:01, <(null)>, /-1
 # expires at 91062000000-91062000000 nsecs [in 156071 to 156071 nsecs]
 #1: <efb81b6c>, hrtimer_wakeup, S:01, <(null)>, /-1
 # expires at 91062300331-91062350331 nsecs [in 456402 to 506402 nsecs]
 #2: <efac9b6c>, hrtimer_wakeup, S:01, <(null)>, /-1
 # expires at 91068699811-91068749811 nsecs [in 6855882 to 6905882 nsecs]
 #3: <efacdb6c>, hrtimer_wakeup, S:01, <(null)>, /-1
 # expires at 91068755511-91068805511 nsecs [in 6911582 to 6961582 nsecs]
 #4: <efa95b6c>, hrtimer_wakeup, S:01, <(null)>, /-1
 # expires at 91068806066-91068856066 nsecs [in 6962137 to 7012137 nsecs]
 .....

This patch fixes it.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

68ad718d

16 Oct, 2009 1 commit

David Howells authored Oct 16, 2009 · a48adeab

Ignore the address parameter in the various file_mmap() security checks
when CONFIG_MMU=n, as the address hint is ignored under those
circumstances, and in any case the minimum mapping address check is
pointless in NOMMU mode.
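A hypothetical sketch of an address check that only applies when an MMU is configured. MIN_MAP_ADDR stands in for the vm.mmap_min_addr sysctl, and -EPERM is an assumed error value, not necessarily what each security module returns.

```c
#include <errno.h>

/* Hypothetical lower bound; the real value comes from a sysctl. */
#define MIN_MAP_ADDR 4096UL

/* Hypothetical file_mmap()-style check.  With an MMU the requested
 * address matters; without one the hint is ignored by mmap() anyway,
 * so the check ignores it too. */
int mmap_addr_check(unsigned long addr)
{
#ifdef CONFIG_MMU
	if (addr < MIN_MAP_ADDR)
		return -EPERM;
#else
	(void)addr;		/* NOMMU: address hint is not honoured */
#endif
	return 0;
}
```

Compiled without CONFIG_MMU defined, as here, every address passes.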
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: Graff Yang <graf.yang@analog.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

a48adeab

24 Sep, 2009 1 commit

Alexander Strakh authored Sep 25, 2009 · c1b57296

drivers/scsi/scsi_lib.c might sleep in atomic context, because it calls
scsi_device_put() under spin_lock_irqsave().

drivers/scsi/scsi_lib.c:356:
	spin_lock_irqsave(shost->host_lock, flags);
	scsi_device_put(sdev);
Path to the might_sleep() macro from scsi_device_put():
1. scsi_device_put calls put_device at ./drivers/scsi/scsi.c:1111
2. put_device calls kobject_put at ./drivers/base/core.c:1038
3. kobject_put calls kref_put at ./lib/kobject.c
4. kref_put may call the callback function kobject_release at ./lib/kref.c
if the refcount becomes zero, which might sleep because it sends a uevent.
Details:
	4.1 kobject_cleanup calls kobject_uevent at ./lib/kobject.c:555
	4.2 kobject_uevent calls kobject_uevent_env at ./lib/kobject_uevent.c:282
	4.3 kobject_uevent_env calls call_usermodehelper_exec at
./include/linux/kmod.h:83
	4.4 call_usermodehelper_exec calls wait_for_completion at
./kernel/kmod.c:481
	4.5 wait_for_completion calls wait_for_common at ./kernel/sched.c:5710
	4.6 wait_for_common calls might_sleep at ./kernel/sched.c:5692

Found by the Linux Driver Verification project.

Delete the wrong sleeping function calls.
Signed-off-by: Alexander Strakh <strakh@ispras.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

c1b57296

25 Sep, 2009 1 commit

Hannes Reinecke authored Sep 25, 2009 · b5f4b28b

With 2.6.31, 'crash' on x86_64 falls flat on its face, as the '_end'
symbol is missing from the System.map file.

The culprit is commit 091e52c3, which moved the '_end' symbol into its own
section.  Apparently this causes kallsyms not to reference it properly.

So we need to revert part of that patch so that _end is not placed in its
own section.

Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

b5f4b28b

06 Oct, 2009 1 commit

Jeremy Fitzhardinge authored Oct 06, 2009 · 28213aca

Some architectures compute ->vm_page_prot depending on ->vm_flags, so we
need to update the protections after adjusting the flags.
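The point is that ->vm_page_prot is derived state. A hypothetical sketch (flag and protection values are made up; the kernel derives protections with helpers such as vm_get_page_prot()) that recomputes it whenever the flags change:

```c
/* Hypothetical flag bits; real VM_* values differ. */
#define SK_VM_READ	0x1UL
#define SK_VM_WRITE	0x2UL
#define SK_VM_IO	0x4UL	/* hypothetical: needs an uncached mapping */

/* Hypothetical protection bits derived from the flags. */
#define SK_PROT_READ	0x1UL
#define SK_PROT_WRITE	0x2UL
#define SK_PROT_NOCACHE	0x4UL

struct vma_sketch {
	unsigned long vm_flags;
	unsigned long vm_page_prot;	/* derived from vm_flags */
};

static unsigned long prot_from_flags(unsigned long flags)
{
	unsigned long prot = 0;

	if (flags & SK_VM_READ)
		prot |= SK_PROT_READ;
	if (flags & SK_VM_WRITE)
		prot |= SK_PROT_WRITE;
	if (flags & SK_VM_IO)
		prot |= SK_PROT_NOCACHE;	/* the arch-dependent part */
	return prot;
}

/* Adjust the flags and recompute the protections in the same step, so
 * the two can never disagree. */
void vma_set_flags(struct vma_sketch *vma, unsigned long set)
{
	vma->vm_flags |= set;
	vma->vm_page_prot = prot_from_flags(vma->vm_flags);
}
```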
Reported-by: Jan Beulich <JBeulich@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

28213aca

01 Oct, 2009 1 commit

Andrew Morton authored Oct 01, 2009 · 97a5a635

drivers/gpu/drm/i915/i915_dma.c: In function 'i915_driver_load':
drivers/gpu/drm/i915/i915_dma.c:1114: warning: 'll_base' may be used uninitialized in this function

Partly this is because gcc isn't smart enough.  But `ll_base' does get used
uninitialised in the DRM_DEBUG() call.
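A hypothetical reduction of the warning, assuming the fix is simply to give ll_base a defined initial value (the real driver code is not shown here):

```c
#include <stdio.h>

/* ll_base is assigned on one path only, but the debug print reads it on
 * every path.  Giving it an initial value silences gcc and removes the
 * real uninitialised read. */
unsigned long probe_ll_base(int have_ll)
{
	unsigned long ll_base = 0;	/* previously left uninitialised */

	if (have_ll)
		ll_base = 0x1000;	/* hypothetical base address */

	fprintf(stderr, "ll_base = 0x%lx\n", ll_base);	/* like DRM_DEBUG() */
	return ll_base;
}
```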

Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

97a5a635

29 Sep, 2009 1 commit

Krzysztof Halasa authored Sep 30, 2009 · bcdc415a

There is no need to perform a full BIDIR sync (copying the buffers in the
case of swiotlb and similar schemes) if we know that the owner (CPU or
device) hasn't altered the data.

Addresses the false positive reported at
http://bugzilla.kernel.org/show_bug.cgi?id=14169
Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Cc: David Miller <davem@davemloft.net>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

bcdc415a

13 Oct, 2009 3 commits

Dave Mueller authored Oct 13, 2009 · e9a0d8f0

It looks like commit ("cpumask: avoid playing with cpus_allowed in
speedstep-ich.c") broke the speedstep-ich driver.

The problem seems to be that speedstep-lib.c:speedstep_get_frequency() is
called with a wrong value as the "processor" parameter by the code below,
resulting in a return value of 0.  The "processor" parameter should be the
value returned by speedstep_detect_processor().

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14340

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dominik Brodowski <linux@brodo.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Eric Pielbug <e.a.b.piel@tudelft.nl>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

e9a0d8f0

Peter Zijlstra authored Oct 13, 2009 · 822a523d

This is for consistency with various ioctl() operations that include the
suffix "PGRP" in their names, and also for consistency with PRIO_PGRP,
used with setpriority() and getpriority().  Also, using PGRP instead of
GID avoids confusion with the common abbreviation of "group ID".

I'm fine with anything that makes it more consistent, and if PGRP is the
predominant abbreviation then I see no need to further confuse matters by
adding a third one.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

822a523d

Maxim Levitsky authored Oct 14, 2009 · 3eca9d98

I noticed that the RTC won't generate interrupts after a resume from disk.
Here HPET RTC emulation is used.  This is triggered when suspending with
an alarm set.

The problem is that the RTC HPET comparator isn't reinitialized after
resume.  The easiest way to solve this is to always mask all HPET
interrupts on suspend.  Otherwise, the hpet driver will think it doesn't
need to reinitialize the RTC comparator, and thus RTC interrupts won't
work.

This emulation isn't needed for wakealarm.
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: David Brownell <david-b@pacbell.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

3eca9d98

31 Oct, 2009 1 commit

Mike Hommey authored Oct 31, 2009 · 0736e0ee

Because of an integer overflow on start_blk, various kinds of wrong
results would be returned by the generic_block_fiemap() handler, such as
no extents when there is a 4GB+ hole at the beginning of the file, or a
wrong fe_logical when an extent starts after the first 4GB.
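The failure mode can be reproduced in a few lines, assuming a 32-bit block number that is shifted by the block-size bits in 32-bit arithmetic (the exact types in generic_block_fiemap() are not shown here, and the function names are hypothetical):

```c
#include <stdint.h>

/* Buggy: the shift is done in 32-bit arithmetic, so any byte offset at
 * or beyond 4 GiB wraps around before being widened to 64 bits. */
uint64_t blk_to_logical_32(uint32_t blk, unsigned int blkbits)
{
	return blk << blkbits;
}

/* Fixed: widen to 64 bits before shifting. */
uint64_t blk_to_logical_64(uint32_t blk, unsigned int blkbits)
{
	return (uint64_t)blk << blkbits;
}
```

With 4 KiB blocks (blkbits = 12), block 1 << 20 is exactly 4 GiB into the file; the 32-bit version returns 0 for it, which matches the symptoms described above.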
Signed-off-by: Mike Hommey <mh@glandium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@sgi.com>
Cc: Josef Bacik <jbacik@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

0736e0ee

16 Oct, 2009 1 commit

· defe24ae

james toy authored Oct 16, 2009

- add -mmN to EXTRAVERSION

- Add a marker to make the v4l build environment happier
Signed-off-by: Michael Krufky <mkrufky@m1k.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

defe24ae

29 Oct, 2009 1 commit

Andrew Morton authored Oct 29, 2009 · 83c11588

__pcpu_ptr_to_addr() can be overridden by the architecture and might not
behave well if passed a NULL pointer.  So avoid calling it until we have
verified that its arg is not NULL.
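A hypothetical sketch of the reordering: check for NULL before handing the pointer to an architecture-overridable translation. ptr_to_addr() stands in for __pcpu_ptr_to_addr(), and release_percpu() for the function being fixed.

```c
#include <stdlib.h>

/* Counts calls so a caller can check that NULL never reaches it. */
static int translations;

/* Identity here; architecture-specific (and maybe NULL-unsafe) in the
 * real code. */
static void *ptr_to_addr(void *ptr)
{
	translations++;
	return ptr;
}

/* Check for NULL first, translate only afterwards. */
void release_percpu(void *ptr)
{
	void *addr;

	if (!ptr)
		return;		/* like free(), NULL is a no-op */
	addr = ptr_to_addr(ptr);
	free(addr);
}
```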

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

83c11588

30 Sep, 2009 1 commit

Jaswinder Singh Rajput authored Sep 30, 2009 · db014b1d

Fix the following 'make includecheck' warnings:

  arch/xtensa/kernel/vectors.S: asm/processor.h is included more than once.
  arch/xtensa/kernel/vectors.S: asm/ptrace.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

db014b1d

13 Aug, 2009 1 commit

Also remove lots of unused irq_cpustat fields. · 1f175c85

Christoph Hellwig authored Aug 13, 2009

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

1f175c85

24 Jul, 2009 1 commit

Amerigo Wang authored Jul 24, 2009

xtensa_pipe() for xtensa.
Signed-off-by: WANG Cong <amwang@redhat.com>
Reviewed-by: Johannes Weiner <jw@emlix.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

b41aab29

30 Sep, 2009 2 commits

Sage Weil authored Oct 01, 2009 · b5f53b38

Get rid of the goto by flipping the if (!result) over.  Make the comments
a bit more descriptive.  Fix a few kernel style problems.  No functional
changes.

Cc: Ian Kent <raven@themaw.net>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Signed-off-by: Yehuda Sadeh <yehuda@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

b5f53b38

Sage Weil authored Oct 01, 2009 · 6220eef6

real_lookup() is called by do_lookup() if dentry revalidation fails.  If
the cache is re-populated while waiting for i_mutex, it may find that a
d_lookup() subsequently succeeds (see the "Uhhuh!  Nasty case" comment).

Previously, real_lookup() would drop i_mutex and do_revalidate() again.
If revalidate failed _again_, however, it would give up with -ENOENT.  The
problem here is that network file systems may be invalidating dentries via
server callbacks, e.g. due to concurrent access from another client, and
-ENOENT is frequently the wrong answer.

This problem has been seen with both Lustre and Ceph.  It seems possible
to hit this case with NFS as well if the cache lifetime is very short.

Instead, we should do_revalidate() while i_mutex is still held.  If
revalidation fails, we can move on to a ->lookup() and ensure a correct
result without worrying about any subsequent races.

Note that do_revalidate() is called with i_mutex held elsewhere.  For
example, do_filp_open(), lookup_create(), do_unlinkat(), do_rmdir(), and
possibly others all take the directory i_mutex, and then

-> lookup_hash
        -> __lookup_hash
                -> cached_lookup
                        -> do_revalidate

so this does not introduce any new locking rules for d_revalidate
implementations.

Yes, the goto is ugly.  A cleanup patch follows.

Cc: Ian Kent <raven@themaw.net>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Signed-off-by: Yehuda Sadeh <yehuda@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

6220eef6

24 Sep, 2009 2 commits

Invalidate sb->s_bdev on remount,ro. · d7dde392

Nick Piggin authored Sep 24, 2009

Fixes a problem reported by Jorge Boncompte who is seeing corruption
trying to snapshot a minix filesystem image.  Some filesystems modify
their metadata via a path other than the bdev buffer cache (eg.  they may
use a private linear mapping for their metadata, or implement directories
in pagecache, etc).  Also, file data modifications usually go to the bdev
via their own mappings.

These updates are not coherent with buffercache IO (eg.  via /dev/bdev)
and never have been.  However there could be a reasonable expectation that
after a mount -oremount,ro operation then the buffercache should
subsequently be coherent with previous filesystem modifications.

So invalidate the bdev mappings on a remount,ro operation to provide a
coherency point.

The problem was exposed when we switched the old rd to brd because old rd
didn't really function like a normal block device and updates to rd via
mappings other than the buffercache would still end up going into its
buffercache.  But the same problem has always affected other "normal"
block devices, including loop.

[akpm@linux-foundation.org: repair comment layout]
Reported-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Tested-by: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

d7dde392

Nick Piggin authored Sep 24, 2009 · 4bca9bd4

Filesystems outside the regular namespace do not have to clear
DCACHE_UNHASHED in order to have a working /proc/$pid/fd/XXX.  Nothing in
proc prevents the fd link from being used if its dentry is not in the
hash.

Also, it does not get put into the dcache hash if DCACHE_UNHASHED is
clear; that depends on the filesystem calling d_add or d_rehash.

So delete the misleading comments and needless code.
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

4bca9bd4

25 Sep, 2009 1 commit

Roland Dreier authored Sep 25, 2009 · a470a30a

 >  =============================================
 >  [ INFO: possible recursive locking detected ]
 >  2.6.31-2-generic #14~rbd3
 >  ---------------------------------------------
 >  firefox-3.5/4162 is trying to acquire lock:
 >   (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >
 >  but task is already holding lock:
 >   (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >
 >  other info that might help us debug this:
 >  3 locks held by firefox-3.5/4162:
 >   #0:  (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >   #1:  (&sb->s_type->i_mutex_key#11/1){+.+.+.}, at: [<ffffffff81139d5a>] lock_rename+0x6a/0xf0
 >   #2:  (&sb->s_type->i_mutex_key#11/2){+.+.+.}, at: [<ffffffff81139d6f>] lock_rename+0x7f/0xf0
 >
 >  stack backtrace:
 >  Pid: 4162, comm: firefox-3.5 Tainted: G         C 2.6.31-2-generic #14~rbd3
 >  Call Trace:
 >   [<ffffffff8108ae74>] print_deadlock_bug+0xf4/0x100
 >   [<ffffffff8108ce26>] validate_chain+0x4c6/0x750
 >   [<ffffffff8108d2e7>] __lock_acquire+0x237/0x430
 >   [<ffffffff8108d585>] lock_acquire+0xa5/0x150
 >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
 >   [<ffffffff815526ad>] __mutex_lock_common+0x4d/0x3d0
 >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
 >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
 >   [<ffffffff8120eaf9>] ? ecryptfs_rename+0x99/0x170
 >   [<ffffffff81552b36>] mutex_lock_nested+0x46/0x60
 >   [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >   [<ffffffff8120eb2a>] ecryptfs_rename+0xca/0x170
 >   [<ffffffff81139a9e>] vfs_rename_dir+0x13e/0x160
 >   [<ffffffff8113ac7e>] vfs_rename+0xee/0x290
 >   [<ffffffff8113c212>] ? __lookup_hash+0x102/0x160
 >   [<ffffffff8113d512>] sys_renameat+0x252/0x280
 >   [<ffffffff81133eb4>] ? cp_new_stat+0xe4/0x100
 >   [<ffffffff8101316a>] ? sysret_check+0x2e/0x69
 >   [<ffffffff8108c34d>] ? trace_hardirqs_on_caller+0x14d/0x190
 >   [<ffffffff8113d55b>] sys_rename+0x1b/0x20
 >   [<ffffffff81013132>] system_call_fastpath+0x16/0x1b

The trace above is totally reproducible by doing a cross-directory
rename on an ecryptfs directory.

The issue seems to be that sys_renameat() does lock_rename() then calls
into the filesystem; if the filesystem is ecryptfs, then
ecryptfs_rename() again does lock_rename() on the lower filesystem, and
lockdep can't tell that the two s_vfs_rename_mutexes are different.  It
seems an annotation like the following is sufficient to fix this (it
does get rid of the lockdep trace in my simple tests); however I would
like to make sure I'm not misunderstanding the locking, hence the CC
list...
Signed-off-by: Roland Dreier <rdreier@cisco.com>
Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

a470a30a

20 Apr, 2009 1 commit

Tony Battersby authored Apr 20, 2009 · b93e1046

Improve the description of fget_light(), which is currently incorrect
about needing a prior refcnt (judging by the way it is actually used).
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

b93e1046

24 Aug, 2009 1 commit

RAW_SETBIND and RAW_GETBIND 32bit versions are fscked in interesting ways. · cb8c8dac

Al Viro authored Aug 25, 2009

1) fs/compat_ioctl.c has COMPATIBLE_IOCTL(RAW_SETBIND) followed by
HANDLE_IOCTL(RAW_SETBIND, raw_ioctl).  The latter is ignored.

2) on amd64 (and itanic) the damn thing is broken - we have int + u64 + u64
and layouts on i386 and amd64 are _not_ the same.  raw_ioctl() would
work there, but it's never called due to (1).  As it is, i386 /sbin/raw
definitely doesn't work on amd64 boxen.

3) switching to raw_ioctl() as is would *not* work on e.g. sparc64 and ppc64,
which would be rather sad, seeing that normal userland there is 32bit.
The thing is, slapping __packed on the struct in question does not DTRT -
it eliminates *all* padding.  The real solution is to use compat_u64.

4) of course, all that stuff has no business being outside of raw.c in the
first place - there should be ->compat_ioctl() for /dev/rawctl instead of
messing with compat_ioctl.c.
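A sketch of the layout problem in point 2, modeled on the int + u64 + u64 structure described above (the struct names are hypothetical). It assumes an x86_64 host, where a u64 member is 8-byte aligned, and reproduces the i386 layout with a typedef whose alignment is lowered to 4, which is the idea behind compat_u64 on x86_64:

```c
#include <stddef.h>
#include <stdint.h>

/* Native 64-bit layout: each u64 member is 8-byte aligned, so there are
 * 4 bytes of padding after raw_minor (offsets 0, 8, 16; size 24). */
struct raw_req_native {
	int raw_minor;
	uint64_t block_major;
	uint64_t block_minor;
};

/* i386 layout: u64 is only 4-byte aligned.  With GCC, an aligned
 * attribute on a typedef may lower alignment (offsets 0, 4, 12; size 20). */
typedef uint64_t __attribute__((aligned(4))) u64_align4;

struct raw_req_compat {
	int raw_minor;
	u64_align4 block_major;
	u64_align4 block_minor;
};

/* __packed removes *all* padding on every architecture. */
struct raw_req_packed {
	int raw_minor;
	uint64_t block_major;
	uint64_t block_minor;
} __attribute__((packed));
```

On x86_64, __packed happens to match the i386 layout, but on architectures whose 32-bit ABI already aligns u64 to 8 bytes it would produce the wrong layout, which is why point 3 prefers compat_u64.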

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

cb8c8dac

25 Sep, 2009 1 commit

Miklos Szeredi authored Sep 25, 2009 · 07431498

vfs_rename_dir() doesn't properly account for filesystems with
FS_RENAME_DOES_D_MOVE.  If new_dentry has a target inode attached, it
unhashes the new_dentry prior to the rename() iop and rehashes it after,
but doesn't account for the possibility that rename() may have swapped
{old,new}_dentry.  For FS_RENAME_DOES_D_MOVE filesystems, it rehashes
new_dentry (now the old renamed-from name, which d_move() expected to go
away), such that a subsequent lookup will find it.

This was caught by the recently posted POSIX fstest suite, rename/10.t
test 62 (and others) on ceph.

The bug was introduced by: commit 349457cc
"[PATCH] Allow file systems to manually d_move() inside of ->rename()"

Fix by not rehashing the new dentry.  Rehashing used to be needed by
d_move() but isn't anymore.
Reported-by: Sage Weil <sage@newdream.net>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

07431498

06 Oct, 2009 1 commit

Julia Lawall authored Oct 06, 2009 · e5d56cc5

DAC960_LP_Controller and DAC960_V2_Controller have the same value, but
elsewhere it is DAC960_V1_Controller or DAC960_V2_Controller that is used
in the FirmwareType field.
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

e5d56cc5

13 Oct, 2009 1 commit

Stephen M. Cameron authored Oct 13, 2009 · 99e1764b

Fix use of unallocated memory for MSA2xxx enclosure device data.  If you
happened to have fewer physical devices reported by CCISS_REPORT_LUNS than
the total number of MSA2012 enclosures (unlikely), the data for some
enclosure(s) would get stored into, or cause other device data to be
stored into, unallocated territory.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mike Miller <mikem@beardog.cce.hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

99e1764b

09 Oct, 2009 1 commit

avoid helpful cleanup patches. · 71cc4d03

Andrew Morton authored Oct 09, 2009

Cc: "Stephen M. Cameron" <scameron@beardog.cce.hp.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mike Miller <mikem@beardog.cce.hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

71cc4d03

10 Oct, 2009 1 commit

Fix hpsa_allow_any test for vendor ID. · 4ddfe74e

Stephen M. Cameron authored Oct 10, 2009

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Cc: Mike Miller <mikem@beardog.cce.hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

4ddfe74e