1. 12 Aug, 2009 1 commit
    • The periodic interrupt from drivers/char/hpet.c does not work correctly, · a4f612ce
      Nils Carlson authored
      both when using the periodic capability of the hardware and while
      emulating the periodic interrupt (when hardware does not support periodic
      mode).
      
      With timers capable of periodic interrupts, the comparator field is first
      set with the period value, followed by a write to the hidden accumulator,
      which has the side effect of overwriting the comparator value.  This
      results in wrong periodicity for the interrupts.  For periodic interrupts
      to work, the following steps are necessary, in this order.
      
      * Set config with Tn_VAL_SET_CNF bit
      
      * Write to hidden accumulator, the value written is the time when the
        first interrupt should be generated
      
      * Write comparator with period interval for subsequent interrupts
        (http://www.intel.com/hardwaredesign/hpetspec_1.pdf)
      
      When emulating a periodic timer with timers not capable of periodic
      interrupts, the driver adds the period to the counter value instead of
      the comparator value, which causes slow drift when using this emulation.
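
To illustrate the drift, here is a minimal user-space sketch, not the driver code. It assumes a hypothetical fixed handler latency in ticks, and the function names are made up for illustration. Re-arming from the current counter value accumulates the latency on every interrupt, while re-arming from the previous comparator value does not.

```c
#include <stdint.h>

/*
 * Hypothetical model: every interrupt is serviced `latency` ticks after
 * the comparator matched, and the handler re-arms the timer n times.
 */
static uint64_t rearm_from_counter(uint64_t start, uint64_t period,
				   uint64_t latency, unsigned int n)
{
	uint64_t comparator = start + period;
	unsigned int i;

	for (i = 0; i < n; i++) {
		uint64_t counter = comparator + latency; /* "now" in the handler */

		comparator = counter + period;	/* latency leaks into the period */
	}
	return comparator;
}

static uint64_t rearm_from_comparator(uint64_t start, uint64_t period,
				      unsigned int n)
{
	uint64_t comparator = start + period;
	unsigned int i;

	for (i = 0; i < n; i++)
		comparator += period;	/* independent of handler latency */
	return comparator;
}
```

With a period of 1000 ticks and a latency of 3 ticks, ten re-arms leave the counter-based variant 30 ticks behind the comparator-based one.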
      
      Also, the driver seems to add hpetp->hp_delta both while setting up the
      periodic interrupt and while emulating periodic interrupts with timers not
      capable of periodic interrupts.  This hp_delta results in a slower than
      expected interrupt rate and should not be used while setting the interval.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Nils Carlson <nils.carlson@ericsson.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. 04 Aug, 2009 1 commit
  3. 24 Aug, 2009 2 commits
  4. 18 Aug, 2009 1 commit
    • There are two useless lines in fs/char_dev.c. · 3796c12a
      Renzo Davoli authored
      In register_chrdev there is a loop to change all '/' into '!' in the
      kernel object name.
      This code is useless as the same substitution is in kobject_set_name_vargs in
      lib/kobject.c:
              /* ewww... some of these buggers have '/' in the name ... */
              while ((s = strchr(kobj->name, '/')))
                      s[0] = '!';
      
      kobject_set_name_vargs is called by kobject_set_name.
      kobject_set_name is called just above the useless loop.
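
For reference, the substitution in question can be tried in user space. This is only a sketch of the same strchr() loop on a plain string; the function name is hypothetical and it does not touch any kobject.

```c
#include <string.h>

/* Replace every '/' in name with '!', in place, and return name. */
static char *sanitize_name(char *name)
{
	char *s;

	while ((s = strchr(name, '/')))
		s[0] = '!';
	return name;
}
```

Running the loop a second time on the result changes nothing, which is why the copy in register_chrdev is redundant.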
      Signed-off-by: Renzo Davoli <renzo@cs.unibo.it>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  5. 21 Jul, 2009 1 commit
  6. 02 Jul, 2009 1 commit
  7. 20 Jul, 2009 1 commit
  8. 01 Jul, 2009 2 commits
  9. 28 Jul, 2009 1 commit
    • sys_delete_module() can set MODULE_STATE_GOING after · dfedcf2f
      Oleg Nesterov authored
      search_binary_handler() does try_module_get().  In this case
      set_binfmt()->try_module_get() fails but since none of the callers
      check the returned error, the task will run with the wrong old
      ->binfmt.
      
      The proper fix should change all ->load_binary() methods, but we can
      rely on the fact that the caller must hold a reference to binfmt->module
      and use __module_get(), which never fails.
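
The difference between the two calls can be modelled roughly as follows. This is a simplified user-space sketch with a hypothetical struct and hypothetical function names; the real module refcounting is more involved.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct module: only what this sketch needs. */
struct module_model {
	bool going;		/* models MODULE_STATE_GOING */
	unsigned int refcount;
};

/* Models try_module_get(): refuses new references once unload has begun. */
static bool model_try_get(struct module_model *m)
{
	if (m->going)
		return false;
	m->refcount++;
	return true;
}

/* Models __module_get(): the caller already holds a reference, so it cannot fail. */
static void model_get(struct module_model *m)
{
	m->refcount++;
}
```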
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  10. 22 Jul, 2009 2 commits
    • Allow core_pattern pipes to wait for user space to complete · f2119854
      Neil Horman authored
      One of the things that user space processes like to do is look at metadata
      for a crashing process in its /proc/<pid> directory.  This is racy,
      however, since do_coredump in the kernel doesn't wait for the user space
      process to complete before it reaps the crashing process.  This patch
      corrects that by allowing the kernel to wait for the user space process to
      complete before cleaning up the crashing process.  This is a bit tricky to
      do for a few reasons:
      
      1) The user space process isn't our child, so we can't sys_wait4 on it
      2) We need to close the pipe before waiting for the user process to complete,
      since the user process may rely on an EOF condition
      
      I've discussed several solutions with Oleg Nesterov off-list about this,
      and this is the one we've come up with.  We add ourselves as a pipe reader
      (to prevent premature cleanup of the pipe_inode_info), and remove
      ourselves as a writer (to provide an EOF condition to the reader in user
      space), then we iterate until the user space process exits (which we
      detect by pipe->readers == 1, hence the > 1 check in the loop).  When we
      exit the loop, we restore the proper reader/writer values, then we return
      and let filp_close in do_coredump clean up the pipe data properly.
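
The reader/writer bookkeeping described above can be sketched as a small user-space model. The struct, the callback type and both function names are hypothetical; the callback stands in for sleeping on the pipe, and the helper's exit is modelled as a drop in the reader count.

```c
/* Hypothetical model of the reader/writer counts in pipe_inode_info. */
struct pipe_model {
	int readers;
	int writers;
};

/* Stand-in for sleeping on the pipe until something changes. */
typedef void (*wait_fn)(struct pipe_model *pipe, void *ctx);

/* Returns how many times we had to wait before the helper went away. */
static int wait_for_helper(struct pipe_model *pipe, wait_fn wait, void *ctx)
{
	int iterations = 0;

	pipe->readers++;	/* keep the pipe alive while we wait */
	pipe->writers--;	/* no writers left: the helper sees EOF */

	while (pipe->readers > 1) {	/* > 1: someone besides us still reads */
		wait(pipe, ctx);
		iterations++;
	}

	pipe->readers--;	/* restore the counts for the normal close path */
	pipe->writers++;
	return iterations;
}

/* Example callback: the helper exits after *ctx wakeups. */
static void helper_exits_after(struct pipe_model *pipe, void *ctx)
{
	int *remaining = ctx;

	if (--*remaining == 0)
		pipe->readers--;
}
```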
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Earl Chew <earl_chew@agilent.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • ERROR: code indent should use tabs where possible · 38d11701
      Andrew Morton authored
      #115: FILE: fs/exec.c:1838:
      + ^I^Iif (call_usermodehelper_pipe(helper_argv[0], helper_argv, NULL,$
      
      ERROR: code indent should use tabs where possible
      #120: FILE: fs/exec.c:1842:
      + ^I^I^Igoto fail_dropcount;$
      
      WARNING: externs should be avoided in .c files
      #149: FILE: kernel/sysctl.c:80:
      +extern unsigned int core_pipe_limit;
      
      total: 2 errors, 1 warnings, 120 lines checked
      
      ./patches/exec-let-do_coredump-limit-the-number-of-concurrent-dumps-to-pipes-v9.patch has style problems, please review.  If any of these errors
      are false positives report them to the maintainer, see
      CHECKPATCH in MAINTAINERS.
      
      Please run checkpatch prior to sending patches
      
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  11. 24 Aug, 2009 1 commit
    • Introduce core pipe limiting sysctl. · 395c70c4
      Neil Horman authored
      Since we can dump cores to pipe, rather than directly to the filesystem,
      we create a condition in which a user can create a very high load on the
      system simply by running bad applications.
      
      If the pipe reader specified in core_pattern is poorly written, we can
      have lots of outstanding resources and processes in the system.
      
      This sysctl introduces an ability to limit that resource consumption.
      core_pipe_limit defines how many in-flight dumps may be run in parallel;
      dumps beyond this value are skipped and a note is made in the kernel log.
      A special value of 0 in core_pipe_limit means that an unlimited number of
      core dumps may be handled (this is the default value).
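
The limiting rule can be sketched in user space with C11 atomics. This is an illustration only: the counter and the two function names are hypothetical, and the kernel-log note for skipped dumps is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter of piped dumps currently in flight. */
static atomic_uint dumps_in_flight;

/* Returns true if a new piped dump may start; limit == 0 means unlimited. */
static bool dump_begin(unsigned int limit)
{
	unsigned int count = atomic_fetch_add(&dumps_in_flight, 1) + 1;

	if (limit && count > limit) {
		atomic_fetch_sub(&dumps_in_flight, 1);	/* over the limit: skip */
		return false;
	}
	return true;
}

/* Call once for every dump_begin() that returned true. */
static void dump_end(void)
{
	atomic_fetch_sub(&dumps_in_flight, 1);
}
```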
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Earl Chew <earl_chew@agilent.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  12. 22 Jul, 2009 2 commits
    • WARNING: suspect code indent for conditional statements (16, 25) · 661d9067
      Andrew Morton authored
      #48: FILE: fs/exec.c:1796:
      +		if (core_limit == 0) {
      +			 /*
      
      WARNING: line over 80 characters
      #57: FILE: fs/exec.c:1805:
      +			  * but it runs as root, and can do lots of stupid things
      
      WARNING: line over 80 characters
      #58: FILE: fs/exec.c:1806:
      +			  * Note that we use task_tgid_vnr here to grab the pid of the
      
      WARNING: line over 80 characters
      #59: FILE: fs/exec.c:1807:
      +			  * process group leader.  That way we get the right pid if a thread
      
      total: 0 errors, 4 warnings, 59 lines checked
      
      ./patches/exec-make-do_coredump-more-resilient-to-recursive-crashes-v9.patch has style problems, please review.  If any of these errors
      are false positives report them to the maintainer, see
      CHECKPATCH in MAINTAINERS.
      
      Please run checkpatch prior to sending patches
      
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • Change how we detect recursive dumps. · 0f7c9b6f
      Neil Horman authored
      Currently we have a mechanism by which we try to compare pathnames of the
      crashing process to the core_pattern path.  This is broken for a dozen
      reasons, and just doesn't work in any sort of robust way.
      
      I'm replacing it with the use of a 0 RLIMIT_CORE value.  Since helper apps
      set RLIMIT_CORE to zero, we don't write out core files for any process
      with that particular limit set.  If the core_pattern is a pipe, any
      non-zero limit is translated to RLIM_INFINITY.
      
      This allows complete dumps to be captured, but prevents infinite recursion
      in the event that the core_pattern process itself crashes.
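
The decision can be summarised as a small function. This is a sketch, not the do_coredump code: the function name is hypothetical, and the non-pipe branch (the limit is passed through unchanged) is an assumption made for the sake of a complete example.

```c
#include <stdbool.h>
#include <sys/resource.h>

/*
 * Decide whether to dump and with which effective limit.
 * ispipe: core_pattern names a pipe helper.
 * limit:  RLIMIT_CORE of the crashing process.
 */
static bool core_dump_allowed(bool ispipe, rlim_t limit, rlim_t *effective)
{
	if (ispipe) {
		if (limit == 0)
			return false;	/* e.g. a crashing helper: do not recurse */
		*effective = RLIM_INFINITY;	/* capture the complete dump */
		return true;
	}
	*effective = limit;	/* assumption: file dumps keep the caller's limit */
	return limit != 0;
}
```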
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Earl Chew <earl_chew@agilent.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  13. 24 Aug, 2009 5 commits
  14. 17 Aug, 2009 1 commit
    • In order to direct the SIGIO signal to a particular thread of a · a32b24d3
      Peter Zijlstra authored
      multi-threaded application we cannot, as suggested by the manpage, put a
      TID into the regular fcntl(F_SETOWN) call.  It will still be sent to the
      whole process of which that thread is part.
      
      Since people do want to properly direct SIGIO we introduce F_SETOWN_EX.
      
      The need to direct SIGIO comes from self-monitoring profiling such as with
      perf-counters.  Perf-counters uses SIGIO to notify that new sample data is
      available.  If the signal is delivered to the same task that generated the
      new sample it can augment that data by inspecting the task's user-space
      state right after it returns from the kernel.  This is especially
      convenient for interpreted or virtual machine driven environments.
      
      Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex
      as argument:
      
      struct f_owner_ex {
      	int   type;
      	pid_t pid;
      };
      
      Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID.
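
A possible user-space use of the new command, as a sketch. It assumes a libc that exposes F_SETOWN_EX, F_OWNER_TID and struct f_owner_ex under _GNU_SOURCE (as glibc does), and it obtains the thread ID through syscall(SYS_gettid). The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel to deliver SIGIO for fd to the calling thread only.
 * Returns 0 on success, -1 on error (errno set by fcntl).
 */
static int sigio_to_this_thread(int fd)
{
	struct f_owner_ex owner = {
		.type = F_OWNER_TID,
		.pid = (pid_t)syscall(SYS_gettid),
	};

	return fcntl(fd, F_SETOWN_EX, &owner);
}
```

The descriptor still needs O_ASYNC set with F_SETFL before any signal is generated; that step is unchanged by this patch.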
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Tested-by: stephane eranian <eranian@googlemail.com>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  15. 04 Aug, 2009 2 commits
  16. 24 Aug, 2009 2 commits
  17. 13 Jul, 2009 1 commit
  18. 24 Aug, 2009 1 commit
    • Thanks to Roland who pointed out de_thread() issues. · 91ece556
      Oleg Nesterov authored
      Currently we add sub-threads to ->real_parent->children list.  This buys
      nothing but slows down do_wait().
      
      With this patch ->children contains only main threads (group leaders). 
      The only complication is that forget_original_parent() should iterate over
      sub-threads by hand, and de_thread() needs another list_replace() when it
      changes ->group_leader.
      
      Henceforth do_wait_thread() can never see task_detached() && !EXIT_DEAD
      tasks, so we can remove this check (and we can unify do_wait_thread() and
      ptrace_do_wait()).
      
      This change can confuse the optimistic search in mm_update_next_owner(),
      but this is fixable and minor.
      
      Perhaps badness() and oom_kill_process() should be updated, but they
      should be fixed in any case.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ratan Nalumasu <rnalumasu@gmail.com>
      Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  19. 02 Sep, 2009 2 commits
  20. 13 Jul, 2009 1 commit
    • Suggested by Roland. · 8652f104
      Oleg Nesterov authored
      do_wait(__WNOTHREAD) can only succeed if the caller is either the ptracer,
      or it is ->real_parent and the child is not traced.  IOW, caller ==
      p->parent; otherwise we should not wake up.
      
      Change child_wait_callback() to check this.  Ratan reports a workload with
      CPU load >99% caused by unnecessary wakeups; this patch should fix it.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ratan Nalumasu <rnalumasu@gmail.com>
      Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  21. 24 Aug, 2009 1 commit
  22. 13 Jul, 2009 2 commits
  23. 24 Aug, 2009 1 commit
    • The bug is old, it wasn't caused by recent changes. · 9d6364b1
      Oleg Nesterov authored
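      Test case:
      
      	#include <assert.h>
      	#include <pthread.h>
      	#include <signal.h>
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      
      	static void *tfunc(void *arg)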
      	{
      		int pid = (long)arg;
      
      		assert(ptrace(PTRACE_ATTACH, pid, NULL, NULL) == 0);
      		kill(pid, SIGKILL);
      
      		sleep(1);
      		return NULL;
      	}
      
      	int main(void)
      	{
      		pthread_t th;
      		long pid = fork();
      
      		if (!pid)
      			pause();
      
      		signal(SIGCHLD, SIG_IGN);
      		assert(pthread_create(&th, NULL, tfunc, (void*)pid) == 0);
      
      		int r = waitpid(-1, NULL, __WNOTHREAD);
      		printf("waitpid: %d %m\n", r);
      
      		return 0;
      	}
      
      Before the patch this program hangs; after this patch waitpid() correctly
      fails with errno == ECHILD.
      
      The problem is, __ptrace_detach() reaps the EXIT_ZOMBIE tracee if its
      ->real_parent is our sub-thread and we ignore SIGCHLD.  But in this case
      we should wake up other threads which can sleep in do_wait().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  24. 04 Sep, 2009 1 commit
  25. 24 Aug, 2009 1 commit
  26. 14 Aug, 2009 1 commit
  27. 11 Aug, 2009 1 commit
    • ERROR: spaces required around that '?' (ctx:VxW) · 3e6e8789
      Andrew Morton authored
      #50: FILE: mm/memcontrol.c:485:
      +	int val = (charge)? 1 : -1;
       	                  ^
      
      total: 1 errors, 0 warnings, 171 lines checked
      
      ./patches/memcg-improve-resource-counter-scalability.patch has style problems, please review.  If any of these errors
      are false positives report them to the maintainer, see
      CHECKPATCH in MAINTAINERS.
      
      Please run checkpatch prior to sending patches
      
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  28. 14 Aug, 2009 1 commit
    • Reduce the resource counter overhead (mostly spinlock) associated with the · 2ae3312b
      Balbir Singh authored
      root cgroup.  This is part of a series of patches to reduce mem cgroup
      overhead.  I had posted other approaches earlier (including using percpu
      counters).  Those patches will be a natural addition and will be added
      iteratively on top of these.
      
      The patch stops resource counter accounting for the root cgroup.  The data
      for display is derived from the statistics we maintain via
      mem_cgroup_charge_statistics (which is more scalable).  What happens today
      is that we do double accounting, once using res_counter_charge() and once
      using mem_cgroup_charge_statistics().  For the root, since we don't
      implement limits any more, we don't need to track every charge via
      res_counter_charge(), check for the limit being exceeded, and reclaim.
      
      The main mem->res usage_in_bytes can be derived by summing the cache and
      rss usage data from memory statistics (MEM_CGROUP_STAT_RSS and
      MEM_CGROUP_STAT_CACHE).  However, for memsw->res usage_in_bytes, we need
      additional data about swapped out memory.  This patch adds a
      MEM_CGROUP_STAT_SWAPOUT and uses that along with MEM_CGROUP_STAT_RSS and
      MEM_CGROUP_STAT_CACHE to derive the memsw data.  This data is computed
      recursively when hierarchy is enabled.
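
The derivation amounts to two sums. The sketch below is illustrative only: the struct and function names are hypothetical, and it assumes the statistics are kept in pages and converted to bytes with the page size. It ignores the recursive walk over the hierarchy.

```c
#include <stdint.h>

/* Hypothetical per-cgroup statistics, counted in pages. */
struct memcg_stats {
	uint64_t rss;		/* models MEM_CGROUP_STAT_RSS */
	uint64_t cache;		/* models MEM_CGROUP_STAT_CACHE */
	uint64_t swapout;	/* models MEM_CGROUP_STAT_SWAPOUT */
};

/* usage_in_bytes for mem->res: cache plus rss. */
static uint64_t mem_usage_in_bytes(const struct memcg_stats *s,
				   uint64_t page_size)
{
	return (s->rss + s->cache) * page_size;
}

/* usage_in_bytes for memsw->res: additionally counts swapped-out memory. */
static uint64_t memsw_usage_in_bytes(const struct memcg_stats *s,
				     uint64_t page_size)
{
	return (s->rss + s->cache + s->swapout) * page_size;
}
```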
      
      The test results I see on a 24-way machine show that
      
      1. The lock contention disappears from /proc/lock_stat
      2. The results of the test are comparable to running with
         cgroup_disable=memory.
      
      Here is a sample of my program runs
      
      Without Patch
      
       Performance counter stats for '/home/balbir/parallel_pagefault':
      
       7192804.124144  task-clock-msecs         #     23.937 CPUs
               424691  context-switches         #      0.000 M/sec
                  267  CPU-migrations           #      0.000 M/sec
             28498113  page-faults              #      0.004 M/sec
        5826093739340  cycles                   #    809.989 M/sec
         408883496292  instructions             #      0.070 IPC
           7057079452  cache-references         #      0.981 M/sec
           3036086243  cache-misses             #      0.422 M/sec
      
        300.485365680  seconds time elapsed
      
      With cgroup_disable=memory
      
       Performance counter stats for '/home/balbir/parallel_pagefault':
      
       7182183.546587  task-clock-msecs         #     23.915 CPUs
               425458  context-switches         #      0.000 M/sec
                  203  CPU-migrations           #      0.000 M/sec
             92545093  page-faults              #      0.013 M/sec
        6034363609986  cycles                   #    840.185 M/sec
         437204346785  instructions             #      0.072 IPC
           6636073192  cache-references         #      0.924 M/sec
           2358117732  cache-misses             #      0.328 M/sec
      
        300.320905827  seconds time elapsed
      
      With this patch applied
      
       Performance counter stats for '/home/balbir/parallel_pagefault':
      
       7191619.223977  task-clock-msecs         #     23.955 CPUs
               422579  context-switches         #      0.000 M/sec
                   88  CPU-migrations           #      0.000 M/sec
             91946060  page-faults              #      0.013 M/sec
        5957054385619  cycles                   #    828.333 M/sec
        1058117350365  instructions             #      0.178 IPC
           9161776218  cache-references         #      1.274 M/sec
           1920494280  cache-misses             #      0.267 M/sec
      
        300.218764862  seconds time elapsed
      
      Data from Prarit (kernel compile with make -j64 on a 64
      CPU/32G machine)
      
      For a single run
      
      Without patch
      
      real 27m8.988s
      user 87m24.916s
      sys 382m6.037s
      
      With patch
      
      real    4m18.607s
      user    84m58.943s
      sys     50m52.682s
      
      With config turned off
      
      real    4m54.972s
      user    90m13.456s
      sys     50m19.711s
      
      NOTE: The data looks counterintuitive due to the increased performance
      with the patch, even over the config being turned off. We probably need
      more runs, but so far all testing has shown that the patches definitely
      help.
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>