1. 03 Sep, 2009 1 commit
    • David Rientjes's avatar
      FLEX_ARRAY_INIT(element_size, total_nr_elements) cannot determine if · f3055907
      David Rientjes authored
      either parameter is valid, so flex arrays which are statically allocated
      with this interface can easily become corrupted or reference beyond its
      allocated memory.
      
      This removes FLEX_ARRAY_INIT() as a struct flex_array initializer since no
      initializer may perform the required checking.  Instead, the array is now
      defined with a new interface:
      
      	DEFINE_FLEX_ARRAY(name, element_size, total_nr_elements)
      
      This may be prefixed with `static' for file scope.
      
      This interface includes compile-time checking of the parameters to ensure
      they are valid.  Since the validity of both element_size and
      total_nr_elements depend on FLEX_ARRAY_BASE_SIZE and FLEX_ARRAY_PART_SIZE,
      the kernel build will fail if either of these predefined values changes
      such that the array parameters are no longer valid.
      
      Since BUILD_BUG_ON() requires compile time constants, several of the
      static inline functions that were once local to lib/flex_array.c had to be
      moved to include/linux/flex_array.h.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarDave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f3055907
  2. 22 Aug, 2009 3 commits
    • David Rientjes's avatar
      Add a new function to the flex_array API: · fef32d89
      David Rientjes authored
      	int flex_array_shrink(struct flex_array *fa)
      
      This function will free all unused second-level pages.  Since elements are
      now poisoned if they are not allocated with __GFP_ZERO, it's possible to
      identify parts that consist solely of unused elements.
      
      flex_array_shrink() returns the number of pages freed.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      fef32d89
    • David Rientjes's avatar
      Newly initialized flex_array's and/or flex_array_part's are now poisoned · 8987e728
      David Rientjes authored
      with a new poison value, FLEX_ARRAY_FREE.  It's value is similar to
      POISON_FREE used in the various slab allocators, but is different to
      distinguish between flex array's poisoned kmem and slab allocator poisoned
      kmem.
      
      This will allow us to identify flex_array_part's that only contain free
      elements (and free them with an addition to the flex_array API).  This
      could also be extended in the future to identify `get' uses on elements
      that have not been `put'.
      
      If __GFP_ZERO is passed for a part's gfp mask, the poisoning is avoided. 
      These elements are considered to be in-use since they have been
      initialized.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      8987e728
    • David Rientjes's avatar
      Add a new function to the flex_array API: · cf31429c
      David Rientjes authored
      	int flex_array_clear(struct flex_array *fa,
      				unsigned int element_nr)
      
      This function will zero the element at element_nr in the flex_array.
      
      Although this is equivalent to using flex_array_put() and passing a
      pointer to zero'd memory, flex_array_clear() does not require such a
      pointer to memory that would most likely need to be allocated on the
      caller's stack which could be significantly large depending on
      element_size.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      cf31429c
  3. 24 Aug, 2009 2 commits
  4. 10 Sep, 2009 1 commit
    • Jiri Pirko's avatar
      Make ->ru_maxrss value in struct rusage filled accordingly to rss hiwater · d4fa8308
      Jiri Pirko authored
      mark.  This struct is filled as a parameter to getrusage syscall. 
      ->ru_maxrss value is set to KBs which is the way it is done in BSD
      systems.  /usr/bin/time (gnu time) application converts ->ru_maxrss to KBs
      which seems to be incorrect behavior.  Maintainer of this util was
      notified by me with the patch which corrects it and cc'ed.
      
      To make this happen we extend struct signal_struct by two fields.  The
      first one is ->maxrss which we use to store rss hiwater of the task.  The
      second one is ->cmaxrss which we use to store highest rss hiwater of all
      task childs.  These values are used in k_getrusage() to actually fill
      ->ru_maxrss.  k_getrusage() uses current rss hiwater value directly if mm
      struct exists.
      
      Note:
      exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
      it is intetionally behavior. *BSD getrusage have exec() inheriting.
      
      test programs
      ========================================================
      
      getrusage.c
      ===========
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/types.h>
       #include <sys/time.h>
       #include <sys/resource.h>
       #include <sys/types.h>
       #include <sys/wait.h>
       #include <unistd.h>
       #include <signal.h>
       #include <sys/mman.h>
      
       #include "common.h"
      
       #define err(str) perror(str), exit(1)
      
      int main(int argc, char** argv)
      {
      	int status;
      
      	printf("allocate 100MB\n");
      	consume(100);
      
      	printf("testcase1: fork inherit? \n");
      	printf("  expect: initial.self ~= child.self\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      	} else {
      		show_rusage("fork child");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase2: fork inherit? (cont.) \n");
      	printf("  expect: initial.children ~= 100MB, but child.children = 0\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      	} else {
      		show_rusage("child");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase3: fork + malloc \n");
      	printf("  expect: child.self ~= initial.self + 50MB\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      	} else {
      		printf("allocate +50MB\n");
      		consume(50);
      		show_rusage("fork child");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase4: grandchild maxrss\n");
      	printf("  expect: post_wait.children ~= 300MB\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      		show_rusage("post_wait");
      	} else {
      		system("./child -n 0 -g 300");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase5: zombie\n");
      	printf("  expect: pre_wait ~= initial, IOW the zombie process is not accounted.\n");
      	printf("          post_wait ~= 400MB, IOW wait() collect child's max_rss. \n");
      	show_rusage("initial");
      	if (__fork()) {
      		sleep(1); /* children become zombie */
      		show_rusage("pre_wait");
      		wait(&status);
      		show_rusage("post_wait");
      	} else {
      		system("./child -n 400");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase6: SIG_IGN\n");
      	printf("  expect: initial ~= after_zombie (child's 500MB alloc should be ignored).\n");
      	show_rusage("initial");
      	signal(SIGCHLD, SIG_IGN);
      	if (__fork()) {
      		sleep(1); /* children become zombie */
      		show_rusage("after_zombie");
      	} else {
      		system("./child -n 500");
      		_exit(0);
      	}
      	printf("\n");
      	signal(SIGCHLD, SIG_DFL);
      
      	printf("testcase7: exec (without fork) \n");
      	printf("  expect: initial ~= exec \n");
      	show_rusage("initial");
      	execl("./child", "child", "-v", NULL);
      
      	return 0;
      }
      
      child.c
      =======
       #include <sys/types.h>
       #include <unistd.h>
       #include <sys/types.h>
       #include <sys/wait.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/types.h>
       #include <sys/time.h>
       #include <sys/resource.h>
      
       #include "common.h"
      
      int main(int argc, char** argv)
      {
      	int status;
      	int c;
      	long consume_size = 0;
      	long grandchild_consume_size = 0;
      	int show = 0;
      
      	while ((c = getopt(argc, argv, "n:g:v")) != -1) {
      		switch (c) {
      		case 'n':
      			consume_size = atol(optarg);
      			break;
      		case 'v':
      			show = 1;
      			break;
      		case 'g':
      
      			grandchild_consume_size = atol(optarg);
      			break;
      		default:
      			break;
      		}
      	}
      
      	if (show)
      		show_rusage("exec");
      
      	if (consume_size) {
      		printf("child alloc %ldMB\n", consume_size);
      		consume(consume_size);
      	}
      
      	if (grandchild_consume_size) {
      		if (fork()) {
      			wait(&status);
      		} else {
      			printf("grandchild alloc %ldMB\n", grandchild_consume_size);
      			consume(grandchild_consume_size);
      
      			exit(0);
      		}
      	}
      
      	return 0;
      }
      
      common.c
      ========
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/types.h>
       #include <sys/time.h>
       #include <sys/resource.h>
       #include <sys/types.h>
       #include <sys/wait.h>
       #include <unistd.h>
       #include <signal.h>
       #include <sys/mman.h>
      
       #include "common.h"
       #define err(str) perror(str), exit(1)
      
      void show_rusage(char *prefix)
      {
          	int err, err2;
          	struct rusage rusage_self;
          	struct rusage rusage_children;
      
          	printf("%s: ", prefix);
          	err = getrusage(RUSAGE_SELF, &rusage_self);
          	if (!err)
          		printf("self %ld ", rusage_self.ru_maxrss);
          	err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
          	if (!err2)
          		printf("children %ld ", rusage_children.ru_maxrss);
      
          	printf("\n");
      }
      
      /* Some buggy OS need this worthless CPU waste. */
      void make_pagefault(void)
      {
      	void *addr;
      	int size = getpagesize();
      	int i;
      
      	for (i=0; i<1000; i++) {
      		addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
      		if (addr == MAP_FAILED)
      			err("make_pagefault");
      		memset(addr, 0, size);
      		munmap(addr, size);
      	}
      }
      
      void consume(int mega)
      {
          	size_t sz = mega * 1024 * 1024;
          	void *ptr;
      
          	ptr = malloc(sz);
          	memset(ptr, 0, sz);
      	make_pagefault();
      }
      
      pid_t __fork(void)
      {
      	pid_t pid;
      
      	pid = fork();
      	make_pagefault();
      
      	return pid;
      }
      
      common.h
      ========
      void show_rusage(char *prefix);
      void make_pagefault(void);
      void consume(int mega);
      pid_t __fork(void);
      
      FreeBSD result (expected result)
      ========================================================
      allocate 100MB
      testcase1: fork inherit?
        expect: initial.self ~= child.self
      initial: self 103492 children 0
      fork child: self 103540 children 0
      
      testcase2: fork inherit? (cont.)
        expect: initial.children ~= 100MB, but child.children = 0
      initial: self 103540 children 103540
      child: self 103564 children 0
      
      testcase3: fork + malloc
        expect: child.self ~= initial.self + 50MB
      initial: self 103564 children 103564
      allocate +50MB
      fork child: self 154860 children 0
      
      testcase4: grandchild maxrss
        expect: post_wait.children ~= 300MB
      initial: self 103564 children 154860
      grandchild alloc 300MB
      post_wait: self 103564 children 308720
      
      testcase5: zombie
        expect: pre_wait ~= initial, IOW the zombie process is not accounted.
                post_wait ~= 400MB, IOW wait() collect child's max_rss.
      initial: self 103564 children 308720
      child alloc 400MB
      pre_wait: self 103564 children 308720
      post_wait: self 103564 children 411312
      
      testcase6: SIG_IGN
        expect: initial ~= after_zombie (child's 500MB alloc should be ignored).
      initial: self 103564 children 411312
      child alloc 500MB
      after_zombie: self 103624 children 411312
      
      testcase7: exec (without fork)
        expect: initial ~= exec
      initial: self 103624 children 411312
      exec: self 103624 children 411312
      
      Linux result (actual test result)
      ========================================================
      allocate 100MB
      testcase1: fork inherit?
        expect: initial.self ~= child.self
      initial: self 102848 children 0
      fork child: self 102572 children 0
      
      testcase2: fork inherit? (cont.)
        expect: initial.children ~= 100MB, but child.children = 0
      initial: self 102876 children 102644
      child: self 102572 children 0
      
      testcase3: fork + malloc
        expect: child.self ~= initial.self + 50MB
      initial: self 102876 children 102644
      allocate +50MB
      fork child: self 153804 children 0
      
      testcase4: grandchild maxrss
        expect: post_wait.children ~= 300MB
      initial: self 102876 children 153864
      grandchild alloc 300MB
      post_wait: self 102876 children 307536
      
      testcase5: zombie
        expect: pre_wait ~= initial, IOW the zombie process is not accounted.
                post_wait ~= 400MB, IOW wait() collect child's max_rss.
      initial: self 102876 children 307536
      child alloc 400MB
      pre_wait: self 102876 children 307536
      post_wait: self 102876 children 410076
      
      testcase6: SIG_IGN
        expect: initial ~= after_zombie (child's 500MB alloc should be ignored).
      initial: self 102876 children 410076
      child alloc 500MB
      after_zombie: self 102880 children 410076
      
      testcase7: exec (without fork)
        expect: initial ~= exec
      initial: self 102880 children 410076
      exec: self 102880 children 410076
      Signed-off-by: default avatarJiri Pirko <jpirko@redhat.com>
      Signed-off-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      d4fa8308
  5. 27 Aug, 2009 1 commit
    • Joe Perches's avatar
      Previous behavior was "bottom-up" in each section from the pattern "F:" · 03d02353
      Joe Perches authored
      entry that matched.  Now information is entered into the various lists in
      the "as entered" order for each matched section.
      
      This also allows the F: entry to be put anywhere in a section, not just as
      the last entries in the section.
      
      And a couple of improvements:
      
      Don't alphabetically sort before outputting the matched scm, status,
      subsystem and web sections.
      
      Ignore content after a single email address so these entries are acceptable
      M:	name <address> whatever other comment
      
      And a fix:
      
      Make an M: entry without a name again use the name from an immediately
      preceding P: line if it exists.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      03d02353
  6. 14 Aug, 2009 7 commits
    • Joe Perches's avatar
      Allow control over the elimination of duplicate email names and addresses · 0b53507b
      Joe Perches authored
      --remove-duplicates will use the first email name or address presented
      --noremove-duplicates will emit all names and addresses
      
      --remove-duplicates is enabled by default
      
      For instance:
      
      $ ./scripts/get_maintainer.pl -f drivers/char/tty_ioctl.c
      Greg Kroah-Hartman <gregkh@suse.de>
      Alan Cox <alan@linux.intel.com>
      Mike Frysinger <vapier@gentoo.org>
      Alexey Dobriyan <adobriyan@gmail.com>
      linux-kernel@vger.kernel.org
      
      $ ./scripts/get_maintainer.pl -f --noremove-duplicates drivers/char/tty_ioctl.c
      Greg Kroah-Hartman <gregkh@suse.de>
      Alan Cox <alan@redhat.com>
      Alan Cox <alan@linux.intel.com>
      Alan Cox <alan@lxorguk.ukuu.org.uk>
      Mike Frysinger <vapier@gentoo.org>
      Alexey Dobriyan <adobriyan@gmail.com>
      linux-kernel@vger.kernel.org
      
      Using --remove-duplicates could eliminate multiple maintainers that
      share the same name but not the same email address.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      0b53507b
    • Joe Perches's avatar
      If a person sets a separator, it's only used if --nomultiline is set. · 37cd581c
      Joe Perches authored
      Don't make the command line also include --nomultiline in that case.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      37cd581c
    • Joe Perches's avatar
      Add reading and using .mailmap file if it exists · 926aae6e
      Joe Perches authored
      Convert address entries in .mailmap to first encountered address
      Don't terminate shell commands with \n
      Strip characters found after sign-offs by: name <address> [stripped]
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      926aae6e
    • Joe Perches's avatar
      Added format_email and parse_email routines to reduce inline use. · 1206481d
      Joe Perches authored
      Added email_address_inuse to eliminate multiple maintainer entries
      for the same email address, the first name encountered is used.
      
      Used internal perl equivalents of shell cmd use of grep|cut|sort|uniq
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      1206481d
    • Joe Perches's avatar
      --pattern-depth is used to control how many levels of directory traversal · 1389227b
      Joe Perches authored
      should be performed to find maintainers.  default is 0 (all directory levels).
      
      For instance:
      
      MAINTAINERS currently has multiple M: and F: entries that match
      net/netfilter/ipvs/ip_vs_app.c
      
      IPVS
      M:	Wensong Zhang <wensong@linux-vs.org>
      M:	Simon Horman <horms@verge.net.au>
      M:	Julian Anastasov <ja@ssi.bg>
      [...]
      F:	net/netfilter/ipvs/
      
      NETFILTER/IPTABLES/IPCHAINS
      [...]
      M:	Patrick McHardy <kaber@trash.net>
      [...]
      F:	net/netfilter/
      
      NETWORKING [GENERAL]
      M:	"David S. Miller" <davem@davemloft.net>
      [...]
      F:	net/
      
      THE REST
      M:	Linus Torvalds <torvalds@linux-foundation.org>
      [...]
      F:	*/
      
      Using this command will return all of those maintainers:
      (except Linus unless --git-chief-maintainers is specified)
      
      $ ./scripts/get_maintainer.pl --nogit -nol \
      	-f net/netfilter/ipvs/ip_vs_app.c
      Julian Anastasov <ja@ssi.bg>
      Simon Horman <horms@verge.net.au>
      Wensong Zhang <wensong@linux-vs.org>
      Patrick McHardy <kaber@trash.net>
      David S. Miller <davem@davemloft.net>
      
      Adding --pattern-depth=1 will match at the deepest level
      $ ./scripts/get_maintainer.pl --nogit -nol --pattern-depth=1 \
      	-f net/netfilter/ipvs/ip_vs_app.c
      Julian Anastasov <ja@ssi.bg>
      Simon Horman <horms@verge.net.au>
      Wensong Zhang <wensong@linux-vs.org>
      
      Adding --pattern-depth=2 will match at the deepest level and 1 higher
      $ ./scripts/get_maintainer.pl --nogit -nol --pattern-depth=2 \
      	-f net/netfilter/ipvs/ip_vs_app.c
      Julian Anastasov <ja@ssi.bg>
      Simon Horman <horms@verge.net.au>
      Wensong Zhang <wensong@linux-vs.org>
      Patrick McHardy <kaber@trash.net>
      
      and so on.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      1389227b
    • Joe Perches's avatar
      Before this change, matched sections were added in the order · f33669e9
      Joe Perches authored
      of appearance in the normally alphabetic section order of
      the MAINTAINERS file.
      
      For instance, finding the maintainer for drivers/scsi/wd7000.c
      would first find "SCSI SUBSYSTEM", then "WD7000 SCSI SUBSYSTEM",
      then "THE REST".
      
      before patch:
      
      $ ./scripts/get_maintainer.pl --nogit -f drivers/scsi/wd7000.c
      James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Miroslav Zagorac <zaga@fly.cc.fer.hr>
      linux-scsi@vger.kernel.org
      linux-kernel@vger.kernel.org
      
      get_maintainer.pl now selects matched sections by longest pattern match.
      Longest is the number of "/"s and any specific file pattern.
      
      This changes the example output order of MAINTAINERS to whatever is
      selected in "WD7000 SUBSYSTEM", then "SCSI SYSTEM", then "THE REST".
      
      after patch:
      
      $ ./scripts/get_maintainer.pl --nogit -f drivers/scsi/wd7000.c
      Miroslav Zagorac <zaga@fly.cc.fer.hr>
      James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      linux-scsi@vger.kernel.org
      linux-kernel@vger.kernel.org
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f33669e9
    • Joe Perches's avatar
      Julia Lawall suggested that get_maintainers.pl should have the · b7a9288a
      Joe Perches authored
      ability to include signatories of commits that are modified by
      a particular patch.
      
      Vegard Nossum did something similar once.
      http://lkml.org/lkml/2008/5/29/449
      
      The modified script looks the commits for all lines in the
      patch, and includes the "-by:" signatories for those commits.
      It uses the same git-min-percent, git-max-maintainers, and
      git-min-signatures options.  git-since is ignored.
      
      It can be used independently from the --git default, so
              ./scripts/get_maintainers.pl --nogit --git-blame <patch>
      or
              ./scripts/get_maintainers.pl --nogit --git-blame -f <file>
      is acceptable.
      
      If used with -f <file>, all lines/commits for the file are
      checked.
      
      --git-blame can be slow if used with -f <file>
      --git-blame does not work with -f <directory>
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      b7a9288a
  7. 24 Aug, 2009 1 commit
  8. 31 Jul, 2009 3 commits
    • Xiao Guangrong's avatar
      This patch is incomplete and thanks for Peter Zijlstra to point out · 3439617e
      Xiao Guangrong authored
      Signed-off-by: default avatarXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      3439617e
    • Xiao Guangrong's avatar
      There is a race between generic_smp_call_function_*() and hotplug_cfd() in · fa15d937
      Xiao Guangrong authored
      many cases, see below examples:
      
      1: hotplug_cfd() can free cfd->cpumask, the system will crash if the
         cpu's cfd still in the call_function list:
      
      
            CPU A:                         CPU B
      
       smp_call_function_many()	    ......
         cpu_down()                      ......
        hotplug_cfd() ->                 ......
       free_cpumask_var(cfd->cpumask)  (receive function IPI interrupte)
                                      /* read cfd->cpumask */
                                generic_smp_call_function_interrupt() ->
                               cpumask_test_and_clear_cpu(cpu, data->cpumask)
      
                               	CRASH!!!
      
      2: It's not handle call_function list when cpu down, It's will lead to
         dead-wait if other path is waiting this cpu to execute function
      
          CPU A:                           CPU B
      
       smp_call_function_many(wait=0)
              ......			    CPU B down
         smp_call_function_many() -->  (cpu down before recevie function
          csd_lock(&data->csd);         IPI interrupte)
      
          DEAD-WAIT!!!!
      
        So, CPU A will dead-wait in csd_lock(), the same as
        smp_call_function_single()
      Signed-off-by: default avatarXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      fa15d937
    • Andrew Morton's avatar
      Move hotplug_cfd() to the end of the file so that we can see what changes · 5d40e4ad
      Andrew Morton authored
      the next patch (generic-ipi: fix the race between
      generic_smp_call_function_*() and hotplug_cfd()) actually makes to that
      function.
      
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      5d40e4ad
  9. 24 Aug, 2009 1 commit
  10. 02 Sep, 2009 1 commit
    • Suzuki Poulose's avatar
      Compat utimensat() returns EINVAL when the tv_nsec is one of UTIME_OMIT or · f05da6b7
      Suzuki Poulose authored
      UTIME_NOW and the tv_sec is set to non-zero.  As per man pages, the tv_sec
      field should be ignored.
      
      sys_utimensat() works fine in this case.
      
      
      Test case:
      
      #define _GNU_SOURCE
      #define _ATFILE_SOURCE
      #include <stdio.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <sys/stat.h>
      #include <stdlib.h>
      
      main(int argc, char *argv[])
      {
      	struct timespec ts[2];
      	struct timespec *tsp;
      	
      	if (argc < 2) {
      		fprintf(stderr, "Usage : %s filename\n", argv[0]);
      		exit (-1);
      	}
      	
      	ts[0].tv_nsec = ts[1].tv_nsec = UTIME_NOW;
      	ts[0].tv_sec = ts[1].tv_sec = 1;
      
      	tsp = ts;
      	
      	if (utimensat(AT_FDCWD, argv[1],tsp,0) == -1)
      		perror("utimensat");
      	else
      		fprintf(stdout, "utimensat success\n");
      	return 0;
      }
      mjs22lp5:~ # cc -m64 utimensat-test.c -o utimensat_test64
      mjs22lp5:~ # cc -m32 utimensat-test.c -o utimensat_test32
      mjs22lp5:~ # ./utimensat_test32 /tmp/utimensat_test
      utimensat: Invalid argument
      mjs22lp5:~ # ./utimensat_test64 /tmp/utimensat_test
      utimensat success
      mjs22lp5:~ # uname -r
      2.6.31-rc8
      
      With the patch :
      
      mjs22lp5:~ # ./utimensat_test64 /tmp/utimensat_test
      utimensat success
      mjs22lp5:~ # ./utimensat_test32 /tmp/utimensat_test
      utimensat success
      mjs22lp5:~ # uname -r
      2.6.31-rc8utimensat
      Signed-off-by: default avatarSuzuki K P <suzuki@in.ibm.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f05da6b7
  11. 27 Aug, 2009 1 commit
  12. 24 Aug, 2009 1 commit
  13. 04 Sep, 2009 1 commit
  14. 24 Aug, 2009 2 commits
    • Davide Libenzi's avatar
      Split the anonfd interface into a bare file pointer creation one, and a · 4b23c37a
      Davide Libenzi authored
      file pointer creation plus install one.
      
      There are cases, like the usage of eventfds inside other kernel
      interfaces, where the file pointer created by anonfd needs to be used
      inside the initialization of other structures.
      
      As it is right now, as soon as anon_inode_getfd() returns, the kenrle can
      race with userspace closing the newly installed file descriptor.
      
      This patch, while keeping the old anon_inode_getfd(), introduces a new
      anon_inode_getfile() (whose services are reused in anon_inode_getfd())
      that allows to split the file creation phase and the fd install one.
      
      Once all the kernel structures are initialized, the code can call the
      proper fd_install().
      
      Gregory manifested the need for something like this inside KVM.
      Signed-off-by: default avatarDavide Libenzi <davidel@xmailserver.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Gregory Haskins <ghaskins@novell.com>
      Acked-by: default avatarRoland Dreier <rolandd@cisco.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      4b23c37a
    • Bartlomiej Zolnierkiewicz's avatar
      On Saturday 01 August 2009 00:30:39 Mail Delivery Subsystem wrote: · 2a27e1ae
      Bartlomiej Zolnierkiewicz authored
      > Delivery to the following recipient failed permanently:
      >
      >      linware@sh.cvut.cz
      >
      > Technical details of permanent failure:
      > Google tried to deliver your message, but it was rejected by the recipient
      > domain. We recommend contacting the other email provider for further
      > information about the cause of this error. The error that the other server
      > returned was: 450 450 <linware@sh.cvut.cz>: Recipient address rejected:
      > undeliverable address: unknown user: "linware" (state 14).
      
      Cc: Petr Vandrovec <vandrove@vc.cvut.cz>
      Signed-off-by: default avatarBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2a27e1ae
  15. 20 Aug, 2009 1 commit
  16. 05 Sep, 2009 1 commit
    • Jan Beulich's avatar
      gcc permitting variable length arrays makes the current construct used for · 78f95a76
      Jan Beulich authored
      BUILD_BUG_ON() useless, as that doesn't produce any diagnostic if the
      controlling expression isn't really constant.  Instead, this patch makes
      it so that a bit field gets used here.  Consequently, those uses where the
      condition isn't really constant now also need fixing.
      
      Note that in the gfp.h, kmemcheck.h, and virtio_config.h cases
      MAYBE_BUILD_BUG_ON() really just serves documentation purposes - even if
      the expression is compile time constant (__builtin_constant_p() yields
      true), the array is still deemed of variable length by gcc, and hence the
      whole expression doesn't have the intended effect.
      Signed-off-by: default avatarJan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      78f95a76
  17. 24 Aug, 2009 1 commit
  18. 19 Aug, 2009 2 commits
    • Nick Piggin's avatar
      We have had a report of bad memory allocation latency during DVD-RAM (UDF) · f2871d82
      Nick Piggin authored
      writing.  This is causing the user's desktop session to become unusable.
      
      Jan tracked the cause of this down to UDF inode reclaim blocking:
      
      gnome-screens D ffff810006d1d598     0 20686      1
       ffff810006d1d508 0000000000000082 ffff810037db6718 0000000000000800
       ffff810006d1d488 ffffffff807e4280 ffffffff807e4280 ffff810006d1a580
       ffff8100bccbc140 ffff810006d1a8c0 0000000006d1d4e8 ffff810006d1a8c0
      Call Trace:
       [<ffffffff804477f3>] io_schedule+0x63/0xa5
       [<ffffffff802c2587>] sync_buffer+0x3b/0x3f
       [<ffffffff80447d2a>] __wait_on_bit+0x47/0x79
       [<ffffffff80447dc6>] out_of_line_wait_on_bit+0x6a/0x77
       [<ffffffff802c24f6>] __wait_on_buffer+0x1f/0x21
       [<ffffffff802c442a>] __bread+0x70/0x86
       [<ffffffff88de9ec7>] :udf:udf_tread+0x38/0x3a
       [<ffffffff88de0fcf>] :udf:udf_update_inode+0x4d/0x68c
       [<ffffffff88de26e1>] :udf:udf_write_inode+0x1d/0x2b
       [<ffffffff802bcf85>] __writeback_single_inode+0x1c0/0x394
       [<ffffffff802bd205>] write_inode_now+0x7d/0xc4
       [<ffffffff88de2e76>] :udf:udf_clear_inode+0x3d/0x53
       [<ffffffff802b39ae>] clear_inode+0xc2/0x11b
       [<ffffffff802b3ab1>] dispose_list+0x5b/0x102
       [<ffffffff802b3d35>] shrink_icache_memory+0x1dd/0x213
       [<ffffffff8027ede3>] shrink_slab+0xe3/0x158
       [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232
       [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392
       [<ffffffff802951fa>] alloc_page_vma+0x176/0x189
       [<ffffffff802822d8>] __do_fault+0x10c/0x417
       [<ffffffff80284232>] handle_mm_fault+0x466/0x940
       [<ffffffff8044b922>] do_page_fault+0x676/0xabf
      
      This blocks with iprune_mutex held, which then blocks other reclaimers:
      
      X             D ffff81009d47c400     0 17285  14831
       ffff8100844f3728 0000000000000086 0000000000000000 ffff81000000e288
       ffff81000000da00 ffffffff807e4280 ffffffff807e4280 ffff81009d47c400
       ffffffff805ff890 ffff81009d47c740 00000000844f3808 ffff81009d47c740
      Call Trace:
       [<ffffffff80447f8c>] __mutex_lock_slowpath+0x72/0xa9
       [<ffffffff80447e1a>] mutex_lock+0x1e/0x22
       [<ffffffff802b3ba1>] shrink_icache_memory+0x49/0x213
       [<ffffffff8027ede3>] shrink_slab+0xe3/0x158
       [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232
       [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392
       [<ffffffff8029507f>] alloc_pages_current+0xd1/0xd6
       [<ffffffff80279ac0>] __get_free_pages+0xe/0x4d
       [<ffffffff802ae1b7>] __pollwait+0x5e/0xdf
       [<ffffffff8860f2b4>] :nvidia:nv_kern_poll+0x2e/0x73
       [<ffffffff802ad949>] do_select+0x308/0x506
       [<ffffffff802adced>] core_sys_select+0x1a6/0x254
       [<ffffffff802ae0b7>] sys_select+0xb5/0x157
      
      Now I think the main problem is having the filesystem block (and do IO) in
      inode reclaim.  The problem is that this doesn't get accounted well and
      penalizes a random allocator with a big latency spike caused by work
      generated from elsewhere.
      
      I think the best idea would be to avoid this.  By design if possible, or
      by deferring the hard work to an asynchronous context.  If the latter,
      then the fs would probably want to throttle creation of new work with
      queue size of the deferred work, but let's not get into those details.
      
      Anyway, the other obvious thing we looked at is the iprune_mutex which is
      causing the cascading blocking.  We could turn this into an rwsem to
      improve concurrency.  It is unreasonable to totally ban all potentially
      slow or blocking operations in inode reclaim, so I think this is a cheap
      way to get a small improvement.
      
      This doesn't solve the whole problem of course.  The process doing inode
      reclaim will still take the latency hit, and concurrent processes may end
      up contending on filesystem locks.  So fs developers should keep these
      problems in mind.
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f2871d82
    • Rusty Russell's avatar
      Impact: cleanup · ed53d1b3
      Rusty Russell authored
      No need for redeclaration.
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      ed53d1b3
  19. 24 Aug, 2009 1 commit
  20. 22 Jul, 2009 1 commit
    • Scott James Remnant's avatar
      The act of a process becoming a session leader is a useful signal to a · 614454c8
      Scott James Remnant authored
      supervising init daemon such as Upstart.
      
      While a daemon will normally do this as part of the process of becoming a
      daemon, it is rare for its children to do so.  When the children do, it is
      nearly always a sign that the child should be considered detached from the
      parent and not supervised along with it.
      
      The poster-child example is OpenSSH; the per-login children call setsid()
      so that they may control the pty connected to them.  If the primary daemon
      dies or is restarted, we do not want to consider the per-login children
      and want to respawn the primary daemon without killing the children.
      
      This patch adds a new PROC_SID_EVENT and associated structure to the
      proc_event event_data union, it arranges for this to be emitted when the
      special PIDTYPE_SID pid is set.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarScott James Remnant <scott@ubuntu.com>
      Acked-by: default avatarMatt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      614454c8
  21. 08 Sep, 2009 1 commit
  22. 04 Sep, 2009 1 commit
  23. 31 Jul, 2009 1 commit
  24. 28 Jul, 2009 1 commit
    • Andrew Morton's avatar
      Use atomic_dec_return(). · 8d45c1a8
      Andrew Morton authored
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      8d45c1a8
  25. 30 Jul, 2009 1 commit
    • Xiao Guangrong's avatar
      This patch can remove spinlock from struct call_function_data, the · 4de930a4
      Xiao Guangrong authored
      reasons are below:
      
      1: add a new interface for cpumask named cpumask_test_and_clear_cpu(),
         it can atomically test and clear specific cpu, we can use it instead
         of cpumask_test_cpu() and cpumask_clear_cpu() and no need data->lock
         to protect those in generic_smp_call_function_interrupt().
      
      2: in smp_call_function_many(), after csd_lock() return, the current's
         cfd_data is deleted from call_function list, so it not have race
         between other cpus, then cfs_data is only used in
         smp_call_function_many() that must disable preemption and not from
         a hardware interrupthandler or from a bottom half handler to call,
         only the correspond cpu can use it, so it not have race in current
         cpu, no need cfs_data->lock to protect it.
      
      3: after 1 and 2, cfs_data->lock is only use to protect cfs_data->refs in
         generic_smp_call_function_interrupt(), so we can define cfs_data->refs
         to atomic_t, and no need cfs_data->lock any more.
      Signed-off-by: default avatarXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      4de930a4
  26. 24 Jul, 2009 1 commit
  27. 23 Jul, 2009 1 commit