1. 09 Nov, 2007 17 commits
    • Peter Zijlstra's avatar
      sched: avoid large irq-latencies in smp-balancing · b82d9fdd
      Peter Zijlstra authored
      SMP balancing is done with IRQs disabled and can iterate the full rq.
      When rqs are large this can cause large irq-latencies. Limit the nr of
      iterations on each run.
      
      This fixes a scheduling latency regression reported by the -rt folks.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Tested-by: default avatarGregory Haskins <ghaskins@novell.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b82d9fdd
    • Srivatsa Vaddagiri's avatar
      sched: fix copy_namespace() <-> sched_fork() dependency in do_fork · 3c90e6e9
      Srivatsa Vaddagiri authored
      Sukadev Bhattiprolu reported a kernel crash with control groups.
      There are couple of problems discovered by Suka's test:
      
      - The test requires the cgroup filesystem to be mounted with
        atleast the cpu and ns options (i.e both namespace and cpu 
        controllers are active in the same hierarchy). 
      
      	# mkdir /dev/cpuctl
      	# mount -t cgroup -ocpu,ns none cpuctl
      	(or simply)
      	# mount -t cgroup none cpuctl -> Will activate all controllers
      					 in same hierarchy.
      
      - The test invokes clone() with CLONE_NEWNS set. This causes a a new child
        to be created, also a new group (do_fork->copy_namespaces->ns_cgroup_clone->
        cgroup_clone) and the child is attached to the new group (cgroup_clone->
        attach_task->sched_move_task). At this point in time, the child's scheduler 
        related fields are uninitialized (including its on_rq field, which it has
        inherited from parent). As a result sched_move_task thinks its on
        runqueue, when it isn't.
      
        As a solution to this problem, I moved sched_fork() call, which
        initializes scheduler related fields on a new task, before
        copy_namespaces(). I am not sure though whether moving up will
        cause other side-effects. Do you see any issue?
      
      - The second problem exposed by this test is that task_new_fair()
        assumes that parent and child will be part of the same group (which 
        needn't be as this test shows). As a result, cfs_rq->curr can be NULL
        for the child.
      
        The solution is to test for curr pointer being NULL in
        task_new_fair().
      
      With the patch below, I could run ns_exec() fine w/o a crash.
      Reported-by: default avatarSukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: default avatarSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3c90e6e9
    • Ingo Molnar's avatar
      sched: clean up the wakeup preempt check, #2 · 502d26b5
      Ingo Molnar authored
      clean up the preemption check to not use unnecessary 64-bit
      variables. This improves code size:
      
         text    data     bss     dec     hex filename
        44227    3326      36   47589    b9e5 sched.o.before
        44201    3326      36   47563    b9cb sched.o.after
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      502d26b5
    • Ingo Molnar's avatar
      sched: clean up the wakeup preempt check · 77d9cc44
      Ingo Molnar authored
      clean up the wakeup preemption check. No code changed:
      
         text    data     bss     dec     hex filename
        44227    3326      36   47589    b9e5 sched.o.before
        44227    3326      36   47589    b9e5 sched.o.after
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      77d9cc44
    • Ingo Molnar's avatar
      sched: wakeup preemption fix · 8bc6767a
      Ingo Molnar authored
      wakeup preemption fix: do not make it dependent on p->prio.
      Preemption purely depends on ->vruntime.
      
      This improves preemption in mixed-nice-level workloads.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8bc6767a
    • Ingo Molnar's avatar
      sched: remove PREEMPT_RESTRICT · 3e3e13f3
      Ingo Molnar authored
      remove PREEMPT_RESTRICT. (this is a separate commit so that any
      regression related to the removal itself is bisectable)
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3e3e13f3
    • Ingo Molnar's avatar
      sched: turn off PREEMPT_RESTRICT · 52d3da1a
      Ingo Molnar authored
      PREEMPT_RESTRICT was a method aimed at reducing the amount of wakeup
      related preemption. It has a disadvantage though, it can prevent
      legitimate wakeups if a task is 'unlucky' to be hit too early by a tick
      that clears peer_preempt.
      
      Now that the wakeup preemption has been cleaned up we dont seem to have
      excessive preemptions anymore, so this feature can be turned off. (and
      removed in the next patch)
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      52d3da1a
    • Ingo Molnar's avatar
      KVM: fix !SMP build error · a5fbb6d1
      Ingo Molnar authored
      fix a !SMP build error:
      
      drivers/kvm/kvm_main.c: In function 'kvm_flush_remote_tlbs':
      drivers/kvm/kvm_main.c:220: error: implicit declaration of function 'smp_call_function_mask'
      
      (and also avoid unused function warning related to up_smp_call_function()
      not making use of the 'func' parameter.)
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      a5fbb6d1
    • Ingo Molnar's avatar
      x86: make nmi_cpu_busy() always defined · 0492007e
      Ingo Molnar authored
      nmi_cpu_busy() must be available on !SMP too.
      
      this is in preparation to a smp_call_function_mask() fix.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      0492007e
    • Ingo Molnar's avatar
      x86: make ipi_handler() always defined · 4e2947f1
      Ingo Molnar authored
      prepare for up_smp_call_function() to ensure that the 'func'
      pointer is unused. (which is related to a KVM build fix)
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      4e2947f1
    • Eric Dumazet's avatar
      sched: cleanup, use NSEC_PER_MSEC and NSEC_PER_SEC · d6322faf
      Eric Dumazet authored
      1) hardcoded 1000000000 value is used five times in places where
         NSEC_PER_SEC might be more readable.
      
      2) A conversion from nsec to msec uses the hardcoded 1000000 value,
         which is a candidate for NSEC_PER_MSEC.
      
      no code changed:
      
          text    data     bss     dec     hex filename
         44359    3326      36   47721    ba69 sched.o.before
         44359    3326      36   47721    ba69 sched.o.after
      Signed-off-by: default avatarEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d6322faf
    • Ingo Molnar's avatar
      sched: reintroduce SMP tunings again · 19978ca6
      Ingo Molnar authored
      Yanmin Zhang reported an aim7 regression and bisected it down to:
      
       |  commit 38ad464d
       |  Author: Ingo Molnar <mingo@elte.hu>
       |  Date:   Mon Oct 15 17:00:02 2007 +0200
       |
       |     sched: uniform tunings
       |
       |     use the same defaults on both UP and SMP.
      
      fix this by reintroducing similar SMP tunings again. This resolves
      the regression.
      
      (also update the comments to match the ilog2(nr_cpus) tuning effect)
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      19978ca6
    • Paul Mackerras's avatar
      sched: restore deterministic CPU accounting on powerpc · fa13a5a1
      Paul Mackerras authored
      Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the
      deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been
      broken on powerpc, because we end up counting user time twice: once in
      timer_interrupt() and once in update_process_times().
      
      This fixes the problem by pulling the code in update_process_times
      that updates utime and stime into a separate function called
      account_process_tick.  If CONFIG_VIRT_CPU_ACCOUNTING is not defined,
      there is a version of account_process_tick in kernel/timer.c that
      simply accounts a whole tick to either utime or stime as before.  If
      CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to
      implement account_process_tick.
      
      This also lets us simplify the s390 code a bit; it means that the s390
      timer interrupt can now call update_process_times even when
      CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a
      suitable account_process_tick().
      
      account_process_tick() now takes the task_struct * as an argument.
      Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      fa13a5a1
    • Balbir Singh's avatar
      sched: fix delay accounting regression · 9a41785c
      Balbir Singh authored
      Fix the delay accounting regression introduced by commit
      75d4ef16. rq no longer has sched_info
      data associated with it. task_struct sched_info structure is used by delay
      accounting to provide back statistics to user space.
      
      also remove direct use of sched_clock() (which is not a valid thing to
      do anymore) and use rq->clock instead.
      Signed-off-by: default avatarBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9a41785c
    • Peter Zijlstra's avatar
      sched: reintroduce the sched_min_granularity tunable · b2be5e96
      Peter Zijlstra authored
      we lost the sched_min_granularity tunable to a clever optimization
      that uses the sched_latency/min_granularity ratio - but the ratio
      is quite unintuitive to users and can also crash the kernel if the
      ratio is set to 0. So reintroduce the min_granularity tunable,
      while keeping the ratio maintained internally.
      
      no functionality changed.
      
      [ mingo@elte.hu: some fixlets. ]
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b2be5e96
    • Peter Zijlstra's avatar
      sched: documentation: place_entity() comments · 2cb8600e
      Peter Zijlstra authored
      Add a few comments to place_entity(). No code changed.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      2cb8600e
    • Peter Zijlstra's avatar
      sched: fix vslice · 10b77724
      Peter Zijlstra authored
      vslice was missing a factor NICE_0_LOAD, as weight is in
      weight*NICE_0_LOAD units.
      
      the effect of this bug was larger initial slices and
      thus latency-noisier forks.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      10b77724
  2. 06 Nov, 2007 13 commits
  3. 05 Nov, 2007 10 commits