1. 23 May, 2009 1 commit
    • perf_counter: Fix dynamic irq_period logging · e220d2dc
      Peter Zijlstra authored
      We call perf_adjust_freq() from perf_counter_task_tick(), which
      is called under the rq->lock, causing lock recursion. However,
      it no longer needs to run under the rq->lock, so move the call
      out from under it.
      
      Also, fix up some related comments.
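
      A minimal sketch of the shape of the change (function names from
      the text; the surrounding scheduler code is illustrative, not the
      actual kernel source):

        /* Illustrative sketch only -- not the real scheduler_tick(). */
        void scheduler_tick_sketch(struct rq *rq)
        {
                spin_lock(&rq->lock);
                /* ... per-tick work that genuinely needs rq->lock ... */
                spin_unlock(&rq->lock);

                /*
                 * perf_counter_task_tick() -> perf_adjust_freq() takes
                 * locks of its own; calling it here, outside rq->lock,
                 * avoids the lock recursion described above.
                 */
                perf_counter_task_tick(rq->curr, smp_processor_id());
        }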
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090523163012.476197912@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 22 May, 2009 4 commits
    • perf_counter tools: increase limits · c6eb1384
      Ingo Molnar authored
      I tried to run with 300 active counters and the tools bailed out
      because our limit was 64. So increase the counter limit to 1024
      and the CPU limit to 4096.
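
      As compile-time constants this amounts to something like the
      following (the macro names are assumptions for illustration):

        #define MAX_COUNTERS    1024    /* was 64 */
        #define MAX_NR_CPUS     4096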
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: fix !PERF_COUNTERS build failure · 910431c7
      Ingo Molnar authored
      Update the !CONFIG_PERF_COUNTERS prototype too, for
      perf_counter_task_sched_out().
      
      [ Impact: build fix ]
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18966.10666.517218.332164@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Optimize context switch between identical inherited contexts · 564c2b21
      Paul Mackerras authored
      When monitoring a process and its descendants with a set of inherited
      counters, we can often get the situation in a context switch where
      both the old (outgoing) and new (incoming) process have the same set
      of counters, and their values are ultimately going to be added together.
      In that situation it doesn't matter which set of counters is used to
      count the activity for the new process, so there is really no need
      to read the hardware counters, update the old task's counters, and
      then set up the PMU for the new task.
      
      This optimizes the context switch in this situation.  Instead of
      scheduling out the perf_counter_context for the old task and
      scheduling in the new context, we simply transfer the old context
      to the new task and keep using it without interruption.  The new
      context gets transferred to the old task.  This means that both
      tasks still have a valid perf_counter_context, so no special case
      is introduced when the old task gets scheduled in again, either on
      this CPU or another CPU.
      
      The equivalence of contexts is detected by keeping a pointer in
      each cloned context pointing to the context it was cloned from.
      To cope with the situation where a context is changed by adding
      or removing counters after it has been cloned, we also keep a
      generation number on each context which is incremented every time
      a context is changed.  When a context is cloned we take a copy
      of the parent's generation number, and two cloned contexts are
      equivalent only if they have the same parent and the same
      generation number.  In order that the parent context pointer
      remains valid (and is not reused), we increment the parent
      context's reference count for each context cloned from it.
      
      Since we don't have individual fds for the counters in a cloned
      context, the only thing that can make two clones of a given parent
      different after they have been cloned is enabling or disabling all
      counters with prctl.  To account for this, we keep a count of the
      number of enabled counters in each context.  Two contexts must have
      the same number of enabled counters to be considered equivalent.
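
      A minimal sketch of the equivalence test described above (field
      and function names are assumptions for illustration, not
      necessarily the kernel's):

        #include <stdint.h>

        struct ctx_sketch {
                struct ctx_sketch *parent_ctx; /* context we were cloned from */
                uint64_t parent_gen;  /* parent's generation at clone time */
                uint64_t generation;  /* bumped on every counter add/remove */
                int nr_enabled;       /* counters currently enabled (prctl) */
        };

        /*
         * Two cloned contexts count the same things iff they share a
         * parent, were cloned at the same generation, and have the
         * same number of enabled counters.
         */
        static int context_equiv(struct ctx_sketch *c1, struct ctx_sketch *c2)
        {
                return c1->parent_ctx && c1->parent_ctx == c2->parent_ctx
                        && c1->parent_gen == c2->parent_gen
                        && c1->nr_enabled == c2->nr_enabled;
        }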
      
      Here are some measurements of the context switch time as measured with
      the lat_ctx benchmark from lmbench, comparing the times obtained with
      and without this patch series:
      
      		-----Unmodified-----		With this patch series
      Counters:	none	2 HW	4H+4S	none	2 HW	4H+4S
      
      2 processes:
      Average		3.44	6.45	11.24	3.12	3.39	3.60
      St dev		0.04	0.04	0.13	0.05	0.17	0.19
      
      8 processes:
      Average		6.45	8.79	14.00	5.57	6.23	7.57
      St dev		1.27	1.04	0.88	1.42	1.46	1.42
      
      32 processes:
      Average		5.56	8.43	13.78	5.28	5.55	7.15
      St dev		0.41	0.47	0.53	0.54	0.57	0.81
      
      The numbers are the mean and standard deviation of 20 runs of
      lat_ctx.  The "none" columns are lat_ctx run directly without any
      counters.  The "2 HW" columns are with lat_ctx run under perfstat,
      counting cycles and instructions.  The "4H+4S" columns are lat_ctx run
      under perfstat with 4 hardware counters and 4 software counters
      (cycles, instructions, cache references, cache misses, task
      clock, context switch, cpu migrations, and page faults).
      
      [ Impact: performance optimization of counter context-switches ]
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18966.10666.517218.332164@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Dynamically allocate tasks' perf_counter_context struct · a63eaf34
      Paul Mackerras authored
      This replaces the struct perf_counter_context in the task_struct with
      a pointer to a dynamically allocated perf_counter_context struct.  The
      main reason for doing this is to allow us to transfer a
      perf_counter_context from one task to another when we do lazy PMU
      switching in a later patch.
      
      This has a few side-benefits: the task_struct becomes a little smaller,
      we save some memory because only tasks that have perf_counters attached
      get a perf_counter_context allocated for them, and we can remove the
      inclusion of <linux/perf_counter.h> in sched.h, meaning that we don't
      end up recompiling nearly everything whenever perf_counter.h changes.
      
      The perf_counter_context structures are reference-counted and freed
      when the last reference is dropped.  A context can have references
      from its task and the counters on its task.  Counters can outlive the
      task so it is possible that a context will be freed well after its
      task has exited.
      
      Contexts are allocated on fork if the parent had a context, or
      otherwise the first time that a per-task counter is created on a task.
      In the latter case, we set the context pointer in the task struct
      locklessly using an atomic compare-and-exchange operation in case we
      raced with some other task in creating a context for the subject task.
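
      A sketch of that lockless install (error handling trimmed;
      assuming the new task_struct pointer field is called
      perf_counter_ctxp):

        ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
        if (!ctx)
                return ERR_PTR(-ENOMEM);
        __perf_counter_init_context(ctx, task);
        /* Publish atomically; if someone beat us to it, use theirs. */
        if (cmpxchg(&task->perf_counter_ctxp, NULL, ctx) != NULL) {
                kfree(ctx);
                ctx = task->perf_counter_ctxp;
        }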
      
      This also removes the task pointer from the perf_counter struct.  The
      task pointer was not used anywhere and would make it harder to move a
      context from one task to another.  Anything that needed to know which
      task a counter was attached to was already using counter->ctx->task.
      
      The __perf_counter_init_context function moves up in perf_counter.c
      so that it can be called from find_get_context, and now initializes
      the refcount, but is otherwise unchanged.
      
      We were potentially calling list_del_counter twice: once from
      __perf_counter_exit_task when the task exits and once from
      __perf_counter_remove_from_context when the counter's fd gets closed.
      This adds a check in list_del_counter so it doesn't do anything if
      the counter has already been removed from the lists.
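
      One plausible shape of that guard (the actual check in the patch
      may differ):

        static void list_del_counter(struct perf_counter *counter,
                                     struct perf_counter_context *ctx)
        {
                if (list_empty(&counter->list_entry))
                        return;         /* already removed: do nothing */
                ctx->nr_counters--;
                list_del_init(&counter->list_entry);
        }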
      
      Since perf_counter_task_sched_in doesn't do anything if the task doesn't
      have a context, and leaves cpuctx->task_ctx = NULL, this adds code to
      __perf_install_in_context to set cpuctx->task_ctx if necessary, i.e. in
      the case where the current task adds the first counter to itself and
      thus creates a context for itself.
      
      This also adds similar code to __perf_counter_enable to handle a
      similar situation which can arise when the counters have been disabled
      using prctl; that also leaves cpuctx->task_ctx = NULL.
      
      [ Impact: refactor counter context management to prepare for new feature ]
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18966.10075.781053.231153@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 20 May, 2009 5 commits
    • perf_counter: Fix context removal deadlock · 34adc806
      Ingo Molnar authored
      Disable the PMU globally before removing a counter from a
      context. This fixes the following lockup:
      
      [22081.741922] ------------[ cut here ]------------
      [22081.746668] WARNING: at arch/x86/kernel/cpu/perf_counter.c:803 intel_pmu_handle_irq+0x9b/0x24e()
      [22081.755624] Hardware name: X8DTN
      [22081.758903] perfcounters: irq loop stuck!
      [22081.762985] Modules linked in:
      [22081.766136] Pid: 11082, comm: perf Not tainted 2.6.30-rc6-tip #226
      [22081.772432] Call Trace:
      [22081.774940]  <NMI>  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
      [22081.781993]  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
      [22081.788368]  [<ffffffff8104505c>] ? warn_slowpath_common+0x77/0xa3
      [22081.794649]  [<ffffffff810450d3>] ? warn_slowpath_fmt+0x40/0x45
      [22081.800696]  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
      [22081.807080]  [<ffffffff814d1a72>] ? perf_counter_nmi_handler+0x3f/0x4a
      [22081.813751]  [<ffffffff814d2d09>] ? notifier_call_chain+0x58/0x86
      [22081.819951]  [<ffffffff8105b250>] ? notify_die+0x2d/0x32
      [22081.825392]  [<ffffffff814d1414>] ? do_nmi+0x8e/0x242
      [22081.830538]  [<ffffffff814d0f0a>] ? nmi+0x1a/0x20
      [22081.835342]  [<ffffffff8117e102>] ? selinux_file_free_security+0x0/0x1a
      [22081.842105]  [<ffffffff81018793>] ? x86_pmu_disable_counter+0x15/0x41
      [22081.848673]  <<EOE>>  [<ffffffff81018f3d>] ? x86_pmu_disable+0x86/0x103
      [22081.855512]  [<ffffffff8108fedd>] ? __perf_counter_remove_from_context+0x0/0xfe
      [22081.862926]  [<ffffffff8108fcbc>] ? counter_sched_out+0x30/0xce
      [22081.868909]  [<ffffffff8108ff36>] ? __perf_counter_remove_from_context+0x59/0xfe
      [22081.876382]  [<ffffffff8106808a>] ? smp_call_function_single+0x6c/0xe6
      [22081.882955]  [<ffffffff81091b96>] ? perf_release+0x86/0x14c
      [22081.888600]  [<ffffffff810c4c84>] ? __fput+0xe7/0x195
      [22081.893718]  [<ffffffff810c213e>] ? filp_close+0x5b/0x62
      [22081.899107]  [<ffffffff81046a70>] ? put_files_struct+0x64/0xc2
      [22081.905031]  [<ffffffff8104841a>] ? do_exit+0x1e2/0x6ef
      [22081.910360]  [<ffffffff814d0a60>] ? _spin_lock_irqsave+0x9/0xe
      [22081.916292]  [<ffffffff8104898e>] ? do_group_exit+0x67/0x93
      [22081.921953]  [<ffffffff810489cc>] ? sys_exit_group+0x12/0x16
      [22081.927759]  [<ffffffff8100baab>] ? system_call_fastpath+0x16/0x1b
      [22081.934076] ---[ end trace 3a3936ce3e1b4505 ]---
      
      And could potentially also fix the lockup reported by Marcelo Tosatti.
      
      Also, print more debug info in case of a detected lockup.
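
      A plausible shape of the fix (the helper names perf_disable()/
      perf_enable() and the surrounding details are assumptions based
      on the description):

        /* in __perf_counter_remove_from_context(), roughly: */
        spin_lock(&ctx->lock);
        /*
         * Stop the whole PMU before touching the counter, so the NMI
         * handler cannot run against a half-removed counter:
         */
        perf_disable();
        counter_sched_out(counter, cpuctx, ctx);
        list_del_counter(counter, ctx);
        perf_enable();
        spin_unlock(&ctx->lock);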
      
      [ Impact: fix lockup ]
      Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Optimize sched in/out of counters · afedadf2
      Peter Zijlstra authored
      Avoid a function call for !group counters by directly calling the counter
      function.
      
      [ Impact: micro-optimize the code ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090520102553.511933670@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Optimize disable of time based sw counters · b986d7ec
      Peter Zijlstra authored
      Currently we call hrtimer_cancel() unconditionally when disabling
      time-based software counters. Avoid the call when possible.
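
      One plausible guard (the patch's actual condition may differ;
      hrtimer_active() is the generic kernel helper):

        /* Only cancel if the timer was ever started for this counter. */
        if (hrtimer_active(&counter->hw.hrtimer))
                hrtimer_cancel(&counter->hw.hrtimer);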
      
      [ Impact: micro-optimize the code ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090520102553.388185031@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Log irq_period changes · 26b119bc
      Peter Zijlstra authored
      For the dynamic irq_period code, log whenever we change the period so that
      analyzing code can normalize the event flow.
      
      [ Impact: add new feature to allow more precise profiling ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090520102553.298769743@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Solve the rotate_ctx vs inherit race differently · d7b629a3
      Peter Zijlstra authored
      Instead of disabling RR scheduling of the counters, use a different list
      that does not get rotated to iterate the counters on inheritance.
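
      The idea, sketched with assumed field names:

        struct perf_counter_context {
                struct list_head counter_list; /* rotated by rotate_ctx()
                                                  for RR scheduling */
                struct list_head event_list;   /* never rotated: safe to
                                                  iterate on inherit */
                /* ... */
        };

      Counters sit on both lists; inheritance walks the stable list, so
      a concurrent rotate_ctx() can no longer perturb the iteration.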
      
      [ Impact: cleanup, optimization ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090520102553.237504544@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 19 May, 2009 2 commits
  5. 18 May, 2009 3 commits
    • perf_counter, x86: speed up the scheduling fast-path · b68f1d2e
      Ingo Molnar authored
      We have to set up the LVT entry only at counter init time, not at
      every switch-in time.
      
      There's friction between NMI and non-NMI use here - we'll probably
      remove the per-counter configurability of it - but until then, don't
      slow things down ...
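
      Sketched on x86 (the APIC call is real; its placement here is
      illustrative):

        /*
         * Program the local APIC's performance-counter LVT entry once,
         * at counter init time - not on every sched-in:
         */
        apic_write(APIC_LVTPC, APIC_DM_NMI);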
      
      [ Impact: micro-optimization ]
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: powerpc: initialize cpuhw pointer before use · c0daaf3f
      Paul Mackerras authored
      Commit 9e35ad38 ("perf_counter: Rework the perf counter
      disable/enable") added code to the powerpc hw_perf_enable (renamed
      from hw_perf_restore) to test cpuhw->disabled and return immediately
      if it is not set (i.e. if the PMU is already enabled).
      
      Unfortunately the test got added before cpuhw was initialized,
      resulting in an oops the first time hw_perf_enable got called.
      This fixes it by moving the initialization of cpuhw to before
      cpuhw->disabled is tested.
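
      The reordering, sketched (names follow the text; the details are
      assumptions):

        void hw_perf_enable(void)
        {
                struct cpu_hw_counters *cpuhw;
                unsigned long flags;

                local_irq_save(flags);
                /* Moved up: initialize cpuhw before any use of it. */
                cpuhw = &__get_cpu_var(cpu_hw_counters);
                if (!cpuhw->disabled) { /* previously ran before cpuhw
                                           was assigned -> oops */
                        local_irq_restore(flags);
                        return;
                }
                /* ... actually enable the PMU using cpuhw ... */
                local_irq_restore(flags);
        }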
      
      [ Impact: fix oops-causing bug on powerpc ]
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <18960.56772.869734.304631@drongo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Merge commit 'v2.6.30-rc6' into perfcounters/core · dc3f81b1
      Ingo Molnar authored
      Merge reason: this branch was on an -rc4 base, merge it up to -rc6
                    to get the latest upstream fixes.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 17 May, 2009 4 commits
    • perf_counter, x86: fix zero irq_period counters · d2517a49
      Ingo Molnar authored
      The quirk to irq_period unearthed a robustness gap in the
      hw_counter initialization sequence: we left irq_period at 0, which
      was then quirked up to 2 ... which then generated a _lot_ of
      interrupts during 'perf stat' runs, slowed them down and skewed
      the counter results in general.
      
      Initialize irq_period to the maximum instead.
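
      A plausible shape of the fix (using x86_pmu.max_period as the
      source of the maximum is an assumption):

        if (!hwc->irq_period)
                hwc->irq_period = x86_pmu.max_period;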
      
      [ Impact: fix perf stat results ]
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: fix threaded task exit · 0203026b
      Ingo Molnar authored
      Flushing counters in __exit_signal() with irqs disabled is not
      a good idea, as perf_counter_exit_task() acquires mutexes. So
      flush them before acquiring the tasklist lock.
      
      (Note, we still need a fix for when the PID has been unhashed.)
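
      The ordering, sketched (the surrounding exit-path code is
      illustrative):

        /* May take mutexes, so run it before irqs go off: */
        perf_counter_exit_task(tsk);

        write_lock_irq(&tasklist_lock);  /* irqs disabled from here */
        /* ... __exit_signal() and the rest of the teardown ... */
        write_unlock_irq(&tasklist_lock);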
      
      [ Impact: fix crash with inherited counters ]
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Fix counter inheritance · 856d56b9
      Peter Zijlstra authored
      Srivatsa Vaddagiri reported that a Java workload triggers this
      warning in kernel/exit.c:
      
         WARN_ON_ONCE(!list_empty(&tsk->perf_counter_ctx.counter_list));
      
      Add the inherited counter propagation on self-detach; its absence
      could cause counter leaks and incomplete stats in threaded code
      like the below:
      
        #include <pthread.h>
        #include <unistd.h>

        void *thread(void *arg)
        {
                sleep(5);
                return NULL;
        }

        int main(void)
        {
                pthread_t thr;

                /* main() exits while the thread still runs */
                pthread_create(&thr, NULL, thread, NULL);
                return 0;
        }
      Reported-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counter: Fix inheritance cleanup code · 8bc20959
      Peter Zijlstra authored
      Clean up the code in __perf_counter_exit_task() that open-coded
      the list_{add,del}_counter() operations and had consequently
      diverged from them. This could lead to software counter crashes.

      Also, fold the ctx->nr_counters inc/dec into those functions and
      clean up some of the related code.
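
      The folded helpers, sketched with assumed field names:

        static void list_add_counter(struct perf_counter *counter,
                                     struct perf_counter_context *ctx)
        {
                list_add_tail(&counter->list_entry, &ctx->counter_list);
                ctx->nr_counters++;
        }

        static void list_del_counter(struct perf_counter *counter,
                                     struct perf_counter_context *ctx)
        {
                list_del_init(&counter->list_entry);
                ctx->nr_counters--;
        }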
      
      [ Impact: fix potential sw counter crash, cleanup ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 16 May, 2009 1 commit
  8. 15 May, 2009 20 commits