1. 24 Jun, 2009 1 commit
  2. 03 Sep, 2009 2 commits
  3. 22 Aug, 2009 3 commits
    • Add a new function to the flex_array API: · 570aa159
      David Rientjes authored
      	int flex_array_shrink(struct flex_array *fa)
      
      This function will free all unused second-level pages.  Since elements are
      now poisoned if they are not allocated with __GFP_ZERO, it's possible to
      identify parts that consist solely of unused elements.
      
      flex_array_shrink() returns the number of pages freed.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Newly initialized flex_array's and/or flex_array_part's are now poisoned · cfb22ee8
      David Rientjes authored
      with a new poison value, FLEX_ARRAY_FREE.  Its value is similar to
      POISON_FREE used in the various slab allocators, but is different to
      distinguish between flex array's poisoned kmem and slab allocator poisoned
      kmem.
      
      This will allow us to identify flex_array_part's that only contain free
      elements (and free them with an addition to the flex_array API).  This
      could also be extended in the future to identify `get' uses on elements
      that have not been `put'.
      
      If __GFP_ZERO is passed for a part's gfp mask, the poisoning is avoided. 
      These elements are considered to be in-use since they have been
      initialized.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Add a new function to the flex_array API: · 7e576440
      David Rientjes authored
      	int flex_array_clear(struct flex_array *fa,
      				unsigned int element_nr)
      
      This function will zero the element at element_nr in the flex_array.
      
      Although this is equivalent to calling flex_array_put() with a pointer to
      zeroed memory, flex_array_clear() does not need such a buffer.  The caller
      would most likely have to allocate that buffer on its stack, where it
      could be significantly large depending on element_size.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  4. 24 Aug, 2009 2 commits
  5. 10 Sep, 2009 1 commit
    • Make ->ru_maxrss value in struct rusage filled according to rss hiwater · 65ebc0c2
      Jiri Pirko authored
      mark.  This struct is filled in as a parameter to the getrusage syscall.
      ->ru_maxrss is reported in KB, which is the way it is done on BSD
      systems.  The /usr/bin/time (GNU time) application converts ->ru_maxrss
      to KB itself, which seems to be incorrect behavior.  I notified the
      maintainer of that utility with a patch that corrects it, and cc'ed.
      
      To make this happen we extend struct signal_struct by two fields.  The
      first one is ->maxrss, which stores the rss hiwater of the task.  The
      second one is ->cmaxrss, which stores the highest rss hiwater of all the
      task's children.  These values are used in k_getrusage() to fill
      ->ru_maxrss.  k_getrusage() uses the current rss hiwater value directly
      if the mm struct exists.
      
      Note:
      exec() clears mm->hiwater_rss, but doesn't clear sig->maxrss.
      This is intentional behavior: *BSD getrusage() inherits the value across exec().
      
      test programs
      ========================================================
      
      getrusage.c
      ===========
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/types.h>
       #include <sys/time.h>
       #include <sys/resource.h>
       #include <sys/types.h>
       #include <sys/wait.h>
       #include <unistd.h>
       #include <signal.h>
       #include <sys/mman.h>
      
       #include "common.h"
      
       #define err(str) perror(str), exit(1)
      
      int main(int argc, char** argv)
      {
      	int status;
      
      	printf("allocate 100MB\n");
      	consume(100);
      
      	printf("testcase1: fork inherit? \n");
      	printf("  expect: initial.self ~= child.self\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      	} else {
      		show_rusage("fork child");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase2: fork inherit? (cont.) \n");
      	printf("  expect: initial.children ~= 100MB, but child.children = 0\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      	} else {
      		show_rusage("child");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase3: fork + malloc \n");
      	printf("  expect: child.self ~= initial.self + 50MB\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      	} else {
      		printf("allocate +50MB\n");
      		consume(50);
      		show_rusage("fork child");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase4: grandchild maxrss\n");
      	printf("  expect: post_wait.children ~= 300MB\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      		show_rusage("post_wait");
      	} else {
      		system("./child -n 0 -g 300");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase5: zombie\n");
      	printf("  expect: pre_wait ~= initial, IOW the zombie process is not accounted.\n");
      	printf("          post_wait ~= 400MB, IOW wait() collect child's max_rss. \n");
      	show_rusage("initial");
      	if (__fork()) {
      		sleep(1); /* children become zombie */
      		show_rusage("pre_wait");
      		wait(&status);
      		show_rusage("post_wait");
      	} else {
      		system("./child -n 400");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase6: SIG_IGN\n");
      	printf("  expect: initial ~= after_zombie (child's 500MB alloc should be ignored).\n");
      	show_rusage("initial");
      	signal(SIGCHLD, SIG_IGN);
      	if (__fork()) {
      		sleep(1); /* children become zombie */
      		show_rusage("after_zombie");
      	} else {
      		system("./child -n 500");
      		_exit(0);
      	}
      	printf("\n");
      	signal(SIGCHLD, SIG_DFL);
      
      	printf("testcase7: exec (without fork) \n");
      	printf("  expect: initial ~= exec \n");
      	show_rusage("initial");
      	execl("./child", "child", "-v", NULL);
      
      	return 0;
      }
      
      child.c
      =======
       #include <sys/types.h>
       #include <unistd.h>
       #include <sys/types.h>
       #include <sys/wait.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/types.h>
       #include <sys/time.h>
       #include <sys/resource.h>
      
       #include "common.h"
      
      int main(int argc, char** argv)
      {
      	int status;
      	int c;
      	long consume_size = 0;
      	long grandchild_consume_size = 0;
      	int show = 0;
      
      	while ((c = getopt(argc, argv, "n:g:v")) != -1) {
      		switch (c) {
      		case 'n':
      			consume_size = atol(optarg);
      			break;
      		case 'v':
      			show = 1;
      			break;
      		case 'g':
      
      			grandchild_consume_size = atol(optarg);
      			break;
      		default:
      			break;
      		}
      	}
      
      	if (show)
      		show_rusage("exec");
      
      	if (consume_size) {
      		printf("child alloc %ldMB\n", consume_size);
      		consume(consume_size);
      	}
      
      	if (grandchild_consume_size) {
      		if (fork()) {
      			wait(&status);
      		} else {
      			printf("grandchild alloc %ldMB\n", grandchild_consume_size);
      			consume(grandchild_consume_size);
      
      			exit(0);
      		}
      	}
      
      	return 0;
      }
      
      common.c
      ========
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/types.h>
       #include <sys/time.h>
       #include <sys/resource.h>
       #include <sys/types.h>
       #include <sys/wait.h>
       #include <unistd.h>
       #include <signal.h>
       #include <sys/mman.h>
      
       #include "common.h"
       #define err(str) perror(str), exit(1)
      
      void show_rusage(char *prefix)
      {
          	int err, err2;
          	struct rusage rusage_self;
          	struct rusage rusage_children;
      
          	printf("%s: ", prefix);
          	err = getrusage(RUSAGE_SELF, &rusage_self);
          	if (!err)
          		printf("self %ld ", rusage_self.ru_maxrss);
          	err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
          	if (!err2)
          		printf("children %ld ", rusage_children.ru_maxrss);
      
          	printf("\n");
      }
      
      /* Some buggy OS need this worthless CPU waste. */
      void make_pagefault(void)
      {
      	void *addr;
      	int size = getpagesize();
      	int i;
      
      	for (i=0; i<1000; i++) {
      		addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
      		if (addr == MAP_FAILED)
      			err("make_pagefault");
      		memset(addr, 0, size);
      		munmap(addr, size);
      	}
      }
      
      void consume(int mega)
      {
          	size_t sz = mega * 1024 * 1024;
          	void *ptr;
      
          	ptr = malloc(sz);
          	memset(ptr, 0, sz);
      	make_pagefault();
      }
      
      pid_t __fork(void)
      {
      	pid_t pid;
      
      	pid = fork();
      	make_pagefault();
      
      	return pid;
      }
      
      common.h
      ========
      void show_rusage(char *prefix);
      void make_pagefault(void);
      void consume(int mega);
      pid_t __fork(void);
      
      FreeBSD result (expected result)
      ========================================================
      allocate 100MB
      testcase1: fork inherit?
        expect: initial.self ~= child.self
      initial: self 103492 children 0
      fork child: self 103540 children 0
      
      testcase2: fork inherit? (cont.)
        expect: initial.children ~= 100MB, but child.children = 0
      initial: self 103540 children 103540
      child: self 103564 children 0
      
      testcase3: fork + malloc
        expect: child.self ~= initial.self + 50MB
      initial: self 103564 children 103564
      allocate +50MB
      fork child: self 154860 children 0
      
      testcase4: grandchild maxrss
        expect: post_wait.children ~= 300MB
      initial: self 103564 children 154860
      grandchild alloc 300MB
      post_wait: self 103564 children 308720
      
      testcase5: zombie
        expect: pre_wait ~= initial, IOW the zombie process is not accounted.
                post_wait ~= 400MB, IOW wait() collect child's max_rss.
      initial: self 103564 children 308720
      child alloc 400MB
      pre_wait: self 103564 children 308720
      post_wait: self 103564 children 411312
      
      testcase6: SIG_IGN
        expect: initial ~= after_zombie (child's 500MB alloc should be ignored).
      initial: self 103564 children 411312
      child alloc 500MB
      after_zombie: self 103624 children 411312
      
      testcase7: exec (without fork)
        expect: initial ~= exec
      initial: self 103624 children 411312
      exec: self 103624 children 411312
      
      Linux result (actual test result)
      ========================================================
      allocate 100MB
      testcase1: fork inherit?
        expect: initial.self ~= child.self
      initial: self 102848 children 0
      fork child: self 102572 children 0
      
      testcase2: fork inherit? (cont.)
        expect: initial.children ~= 100MB, but child.children = 0
      initial: self 102876 children 102644
      child: self 102572 children 0
      
      testcase3: fork + malloc
        expect: child.self ~= initial.self + 50MB
      initial: self 102876 children 102644
      allocate +50MB
      fork child: self 153804 children 0
      
      testcase4: grandchild maxrss
        expect: post_wait.children ~= 300MB
      initial: self 102876 children 153864
      grandchild alloc 300MB
      post_wait: self 102876 children 307536
      
      testcase5: zombie
        expect: pre_wait ~= initial, IOW the zombie process is not accounted.
                post_wait ~= 400MB, IOW wait() collect child's max_rss.
      initial: self 102876 children 307536
      child alloc 400MB
      pre_wait: self 102876 children 307536
      post_wait: self 102876 children 410076
      
      testcase6: SIG_IGN
        expect: initial ~= after_zombie (child's 500MB alloc should be ignored).
      initial: self 102876 children 410076
      child alloc 500MB
      after_zombie: self 102880 children 410076
      
      testcase7: exec (without fork)
        expect: initial ~= exec
      initial: self 102880 children 410076
      exec: self 102880 children 410076
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  6. 13 Sep, 2009 2 commits
  7. 27 Aug, 2009 1 commit
    • Previous behavior was "bottom-up" in each section from the pattern "F:" · fb143188
      Joe Perches authored
      entry that matched.  Now information is entered into the various lists in
      the "as entered" order for each matched section.
      
      This also allows the F: entry to be put anywhere in a section, not just as
      the last entries in the section.
      
      And a couple of improvements:
      
      Don't alphabetically sort before outputting the matched scm, status,
      subsystem and web sections.
      
      Ignore content after a single email address so these entries are acceptable
      M:	name <address> whatever other comment
      
      And a fix:
      
      Make an M: entry without a name again use the name from an immediately
      preceding P: line if it exists.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  8. 14 Aug, 2009 7 commits
    • Allow control over the elimination of duplicate email names and addresses · 02d5e23d
      Joe Perches authored
      --remove-duplicates will use the first email name or address presented
      --noremove-duplicates will emit all names and addresses
      
      --remove-duplicates is enabled by default
      
      For instance:
      
      $ ./scripts/get_maintainer.pl -f drivers/char/tty_ioctl.c
      Greg Kroah-Hartman <gregkh@suse.de>
      Alan Cox <alan@linux.intel.com>
      Mike Frysinger <vapier@gentoo.org>
      Alexey Dobriyan <adobriyan@gmail.com>
      linux-kernel@vger.kernel.org
      
      $ ./scripts/get_maintainer.pl -f --noremove-duplicates drivers/char/tty_ioctl.c
      Greg Kroah-Hartman <gregkh@suse.de>
      Alan Cox <alan@redhat.com>
      Alan Cox <alan@linux.intel.com>
      Alan Cox <alan@lxorguk.ukuu.org.uk>
      Mike Frysinger <vapier@gentoo.org>
      Alexey Dobriyan <adobriyan@gmail.com>
      linux-kernel@vger.kernel.org
      
      Using --remove-duplicates could eliminate multiple maintainers that
      share the same name but not the same email address.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • If a person sets a separator, it's only used if --nomultiline is set. · 6d4dc584
      Joe Perches authored
      Don't make the command line also include --nomultiline in that case.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Add reading and using .mailmap file if it exists · 79a24bef
      Joe Perches authored
      Convert address entries in .mailmap to first encountered address
      Don't terminate shell commands with \n
      Strip characters found after sign-offs by: name <address> [stripped]
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Added format_email and parse_email routines to reduce inline use. · 9b893f39
      Joe Perches authored
      Added email_address_inuse to eliminate multiple maintainer entries
      for the same email address, the first name encountered is used.
      
      Used internal perl equivalents of shell cmd use of grep|cut|sort|uniq
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • --pattern-depth is used to control how many levels of directory traversal · 58521c9f
      Joe Perches authored
      should be performed to find maintainers.  default is 0 (all directory levels).
      
      For instance:
      
      MAINTAINERS currently has multiple M: and F: entries that match
      net/netfilter/ipvs/ip_vs_app.c
      
      IPVS
      M:	Wensong Zhang <wensong@linux-vs.org>
      M:	Simon Horman <horms@verge.net.au>
      M:	Julian Anastasov <ja@ssi.bg>
      [...]
      F:	net/netfilter/ipvs/
      
      NETFILTER/IPTABLES/IPCHAINS
      [...]
      M:	Patrick McHardy <kaber@trash.net>
      [...]
      F:	net/netfilter/
      
      NETWORKING [GENERAL]
      M:	"David S. Miller" <davem@davemloft.net>
      [...]
      F:	net/
      
      THE REST
      M:	Linus Torvalds <torvalds@linux-foundation.org>
      [...]
      F:	*/
      
      Using this command will return all of those maintainers:
      (except Linus unless --git-chief-maintainers is specified)
      
      $ ./scripts/get_maintainer.pl --nogit -nol \
      	-f net/netfilter/ipvs/ip_vs_app.c
      Julian Anastasov <ja@ssi.bg>
      Simon Horman <horms@verge.net.au>
      Wensong Zhang <wensong@linux-vs.org>
      Patrick McHardy <kaber@trash.net>
      David S. Miller <davem@davemloft.net>
      
      Adding --pattern-depth=1 will match at the deepest level
      $ ./scripts/get_maintainer.pl --nogit -nol --pattern-depth=1 \
      	-f net/netfilter/ipvs/ip_vs_app.c
      Julian Anastasov <ja@ssi.bg>
      Simon Horman <horms@verge.net.au>
      Wensong Zhang <wensong@linux-vs.org>
      
      Adding --pattern-depth=2 will match at the deepest level and 1 higher
      $ ./scripts/get_maintainer.pl --nogit -nol --pattern-depth=2 \
      	-f net/netfilter/ipvs/ip_vs_app.c
      Julian Anastasov <ja@ssi.bg>
      Simon Horman <horms@verge.net.au>
      Wensong Zhang <wensong@linux-vs.org>
      Patrick McHardy <kaber@trash.net>
      
      and so on.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Before this change, matched sections were added in the order · 732c81fb
      Joe Perches authored
      of appearance in the normally alphabetic section order of
      the MAINTAINERS file.
      
      For instance, finding the maintainer for drivers/scsi/wd7000.c
      would first find "SCSI SUBSYSTEM", then "WD7000 SCSI SUBSYSTEM",
      then "THE REST".
      
      before patch:
      
      $ ./scripts/get_maintainer.pl --nogit -f drivers/scsi/wd7000.c
      James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Miroslav Zagorac <zaga@fly.cc.fer.hr>
      linux-scsi@vger.kernel.org
      linux-kernel@vger.kernel.org
      
      get_maintainer.pl now selects matched sections by longest pattern match.
      Longest is the number of "/"s and any specific file pattern.
      
      This changes the example output order of MAINTAINERS to whatever is
      selected in "WD7000 SCSI SUBSYSTEM", then "SCSI SUBSYSTEM", then "THE REST".
      
      after patch:
      
      $ ./scripts/get_maintainer.pl --nogit -f drivers/scsi/wd7000.c
      Miroslav Zagorac <zaga@fly.cc.fer.hr>
      James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      linux-scsi@vger.kernel.org
      linux-kernel@vger.kernel.org
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Julia Lawall suggested that get_maintainer.pl should have the · 23deb9bd
      Joe Perches authored
      ability to include signatories of commits that are modified by
      a particular patch.
      
      Vegard Nossum did something similar once.
      http://lkml.org/lkml/2008/5/29/449
      
      The modified script looks up the commits for all lines in the
      patch, and includes the "-by:" signatories for those commits.
      It uses the same git-min-percent, git-max-maintainers, and
      git-min-signatures options.  git-since is ignored.
      
      It can be used independently from the --git default, so
              ./scripts/get_maintainer.pl --nogit --git-blame <patch>
      or
              ./scripts/get_maintainer.pl --nogit --git-blame -f <file>
      is acceptable.
      
      If used with -f <file>, all lines/commits for the file are
      checked.
      
      --git-blame can be slow if used with -f <file>
      --git-blame does not work with -f <directory>
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  9. 24 Aug, 2009 1 commit
  10. 31 Jul, 2009 1 commit
  11. 24 Aug, 2009 1 commit
  12. 10 Sep, 2009 1 commit
  13. 02 Sep, 2009 1 commit
    • Compat utimensat() returns EINVAL when the tv_nsec is one of UTIME_OMIT or · 44b9650d
      Suzuki Poulose authored
      UTIME_NOW and the tv_sec is set to non-zero.  As per man pages, the tv_sec
      field should be ignored.
      
      sys_utimensat() works fine in this case.
      
      
      Test case:
      
      #define _GNU_SOURCE
      #define _ATFILE_SOURCE
      #include <stdio.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <sys/stat.h>
      #include <stdlib.h>
      
      int main(int argc, char *argv[])
      {
      	struct timespec ts[2];
      	struct timespec *tsp;
      	
      	if (argc < 2) {
      		fprintf(stderr, "Usage : %s filename\n", argv[0]);
      		exit (-1);
      	}
      	
      	ts[0].tv_nsec = ts[1].tv_nsec = UTIME_NOW;
      	ts[0].tv_sec = ts[1].tv_sec = 1;
      
      	tsp = ts;
      	
      	if (utimensat(AT_FDCWD, argv[1],tsp,0) == -1)
      		perror("utimensat");
      	else
      		fprintf(stdout, "utimensat success\n");
      	return 0;
      }
      mjs22lp5:~ # cc -m64 utimensat-test.c -o utimensat_test64
      mjs22lp5:~ # cc -m32 utimensat-test.c -o utimensat_test32
      mjs22lp5:~ # ./utimensat_test32 /tmp/utimensat_test
      utimensat: Invalid argument
      mjs22lp5:~ # ./utimensat_test64 /tmp/utimensat_test
      utimensat success
      mjs22lp5:~ # uname -r
      2.6.31-rc8
      
      With the patch :
      
      mjs22lp5:~ # ./utimensat_test64 /tmp/utimensat_test
      utimensat success
      mjs22lp5:~ # ./utimensat_test32 /tmp/utimensat_test
      utimensat success
      mjs22lp5:~ # uname -r
      2.6.31-rc8utimensat
      Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  14. 27 Aug, 2009 1 commit
  15. 24 Aug, 2009 1 commit
  16. 04 Sep, 2009 1 commit
  17. 24 Aug, 2009 2 commits
    • Split the anonfd interface into a bare file pointer creation one, and a · 7d9ad6ac
      Davide Libenzi authored
      file pointer creation plus install one.
      
      There are cases, like the usage of eventfds inside other kernel
      interfaces, where the file pointer created by anonfd needs to be used
      inside the initialization of other structures.
      
      As it is right now, as soon as anon_inode_getfd() returns, the kernel can
      race with userspace closing the newly installed file descriptor.
      
      This patch, while keeping the old anon_inode_getfd(), introduces a new
      anon_inode_getfile() (whose services are reused in anon_inode_getfd())
      that allows splitting the file creation phase from the fd install one.
      
      Once all the kernel structures are initialized, the code can call the
      proper fd_install().
      
      Gregory expressed the need for something like this inside KVM.
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Gregory Haskins <ghaskins@novell.com>
      Acked-by: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • On Saturday 01 August 2009 00:30:39 Mail Delivery Subsystem wrote: · ee812a7a
      Bartlomiej Zolnierkiewicz authored
      > Delivery to the following recipient failed permanently:
      >
      >      linware@sh.cvut.cz
      >
      > Technical details of permanent failure:
      > Google tried to deliver your message, but it was rejected by the recipient
      > domain. We recommend contacting the other email provider for further
      > information about the cause of this error. The error that the other server
      > returned was: 450 450 <linware@sh.cvut.cz>: Recipient address rejected:
      > undeliverable address: unknown user: "linware" (state 14).
      
      Cc: Petr Vandrovec <vandrove@vc.cvut.cz>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  18. 20 Aug, 2009 1 commit
  19. 05 Sep, 2009 1 commit
    • gcc permitting variable length arrays makes the current construct used for · c9533f3c
      Jan Beulich authored
      BUILD_BUG_ON() useless, as that doesn't produce any diagnostic if the
      controlling expression isn't really constant.  Instead, this patch makes
      it so that a bit field gets used here.  Consequently, those uses where the
      condition isn't really constant now also need fixing.
      
      Note that in the gfp.h, kmemcheck.h, and virtio_config.h cases
      MAYBE_BUILD_BUG_ON() really just serves documentation purposes - even if
      the expression is compile time constant (__builtin_constant_p() yields
      true), the array is still deemed of variable length by gcc, and hence the
      whole expression doesn't have the intended effect.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  20. 24 Aug, 2009 1 commit
  21. 19 Aug, 2009 2 commits
    • We have had a report of bad memory allocation latency during DVD-RAM (UDF) · 5ecf6496
      Nick Piggin authored
      writing.  This is causing the user's desktop session to become unusable.
      
      Jan tracked the cause of this down to UDF inode reclaim blocking:
      
      gnome-screens D ffff810006d1d598     0 20686      1
       ffff810006d1d508 0000000000000082 ffff810037db6718 0000000000000800
       ffff810006d1d488 ffffffff807e4280 ffffffff807e4280 ffff810006d1a580
       ffff8100bccbc140 ffff810006d1a8c0 0000000006d1d4e8 ffff810006d1a8c0
      Call Trace:
       [<ffffffff804477f3>] io_schedule+0x63/0xa5
       [<ffffffff802c2587>] sync_buffer+0x3b/0x3f
       [<ffffffff80447d2a>] __wait_on_bit+0x47/0x79
       [<ffffffff80447dc6>] out_of_line_wait_on_bit+0x6a/0x77
       [<ffffffff802c24f6>] __wait_on_buffer+0x1f/0x21
       [<ffffffff802c442a>] __bread+0x70/0x86
       [<ffffffff88de9ec7>] :udf:udf_tread+0x38/0x3a
       [<ffffffff88de0fcf>] :udf:udf_update_inode+0x4d/0x68c
       [<ffffffff88de26e1>] :udf:udf_write_inode+0x1d/0x2b
       [<ffffffff802bcf85>] __writeback_single_inode+0x1c0/0x394
       [<ffffffff802bd205>] write_inode_now+0x7d/0xc4
       [<ffffffff88de2e76>] :udf:udf_clear_inode+0x3d/0x53
       [<ffffffff802b39ae>] clear_inode+0xc2/0x11b
       [<ffffffff802b3ab1>] dispose_list+0x5b/0x102
       [<ffffffff802b3d35>] shrink_icache_memory+0x1dd/0x213
       [<ffffffff8027ede3>] shrink_slab+0xe3/0x158
       [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232
       [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392
       [<ffffffff802951fa>] alloc_page_vma+0x176/0x189
       [<ffffffff802822d8>] __do_fault+0x10c/0x417
       [<ffffffff80284232>] handle_mm_fault+0x466/0x940
       [<ffffffff8044b922>] do_page_fault+0x676/0xabf
      
      This blocks with iprune_mutex held, which then blocks other reclaimers:
      
      X             D ffff81009d47c400     0 17285  14831
       ffff8100844f3728 0000000000000086 0000000000000000 ffff81000000e288
       ffff81000000da00 ffffffff807e4280 ffffffff807e4280 ffff81009d47c400
       ffffffff805ff890 ffff81009d47c740 00000000844f3808 ffff81009d47c740
      Call Trace:
       [<ffffffff80447f8c>] __mutex_lock_slowpath+0x72/0xa9
       [<ffffffff80447e1a>] mutex_lock+0x1e/0x22
       [<ffffffff802b3ba1>] shrink_icache_memory+0x49/0x213
       [<ffffffff8027ede3>] shrink_slab+0xe3/0x158
       [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232
       [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392
       [<ffffffff8029507f>] alloc_pages_current+0xd1/0xd6
       [<ffffffff80279ac0>] __get_free_pages+0xe/0x4d
       [<ffffffff802ae1b7>] __pollwait+0x5e/0xdf
       [<ffffffff8860f2b4>] :nvidia:nv_kern_poll+0x2e/0x73
       [<ffffffff802ad949>] do_select+0x308/0x506
       [<ffffffff802adced>] core_sys_select+0x1a6/0x254
       [<ffffffff802ae0b7>] sys_select+0xb5/0x157
      
      Now I think the main problem is having the filesystem block (and do IO) in
      inode reclaim.  The problem is that this doesn't get accounted well and
      penalizes a random allocator with a big latency spike caused by work
      generated from elsewhere.
      
      I think the best idea would be to avoid this.  By design if possible, or
      by deferring the hard work to an asynchronous context.  If the latter,
      then the fs would probably want to throttle creation of new work with
      queue size of the deferred work, but let's not get into those details.
      
      Anyway, the other obvious thing we looked at is the iprune_mutex which is
      causing the cascading blocking.  We could turn this into an rwsem to
      improve concurrency.  It is unreasonable to totally ban all potentially
      slow or blocking operations in inode reclaim, so I think this is a cheap
      way to get a small improvement.
      
      This doesn't solve the whole problem of course.  The process doing inode
      reclaim will still take the latency hit, and concurrent processes may end
      up contending on filesystem locks.  So fs developers should keep these
      problems in mind.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Impact: cleanup · 0db85948
      Rusty Russell authored
      No need for redeclaration.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  22. 24 Aug, 2009 1 commit
  23. 22 Jul, 2009 1 commit
    • The act of a process becoming a session leader is a useful signal to a · 8f58f837
      Scott James Remnant authored
      supervising init daemon such as Upstart.
      
      While a daemon will normally do this as part of the process of becoming a
      daemon, it is rare for its children to do so.  When the children do, it is
      nearly always a sign that the child should be considered detached from the
      parent and not supervised along with it.
      
      The poster-child example is OpenSSH; the per-login children call setsid()
      so that they may control the pty connected to them.  If the primary daemon
      dies or is restarted, we do not want to consider the per-login children
      and want to respawn the primary daemon without killing the children.
      
      This patch adds a new PROC_SID_EVENT and associated structure to the
      proc_event event_data union, it arranges for this to be emitted when the
      special PIDTYPE_SID pid is set.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Scott James Remnant <scott@ubuntu.com>
      Acked-by: Matt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  24. 08 Sep, 2009 1 commit
  25. 04 Sep, 2009 1 commit
  26. 31 Jul, 2009 1 commit
  27. 28 Jul, 2009 1 commit
    • Use atomic_dec_return(). · e27cbcc1
      Andrew Morton authored
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>