    rcu: fix rcu_try_flip_waitack_needed() to prevent grace-period stall · d7c06513
    Paul E. McKenney authored
    The comment was correct -- need to make the code match the comment.
    Without this patch, if a CPU goes dynticks idle (and stays there forever)
    in just the right phase of preemptible-RCU grace-period processing,
    grace periods stall.  The offending sequence of events (courtesy
    of Promela/spin, at least after I got the liveness criterion coded
    correctly...) is as follows:
    
    o	CPU 0 is in dynticks-idle mode.  Its dynticks_progress_counter
    	is (say) 10.
    
    o	CPU 0 takes an interrupt, so rcu_irq_enter() increments CPU 0's
    	dynticks_progress_counter to 11.
    
    o	CPU 1 is doing RCU grace-period processing in rcu_try_flip_idle(),
    	sees rcu_pending(), so invokes dyntick_save_progress_counter(),
    	which in turn takes a snapshot of CPU 0's dynticks_progress_counter
    	into CPU 0's rcu_dyntick_snapshot -- now set to 11.  CPU 1 then
    	updates the RCU grace-period state to rcu_try_flip_waitack().
    
    o	CPU 0 returns from its interrupt, so rcu_irq_exit() increments
    	CPU 0's dynticks_progress_counter to 12.
    
    o	CPU 1 later invokes rcu_try_flip_waitack(), which notices that
    	CPU 0 has not yet responded, and hence in turn invokes
    	rcu_try_flip_waitack_needed().  This function examines the
    	state of CPU 0's dynticks_progress_counter and rcu_dyntick_snapshot
    	variables, which it copies to curr (== 12) and snap (== 11),
    	respectively.
    
    	Because curr!=snap, the first condition fails.
    
    	Because curr-snap is only 1 and snap is odd, the second
    	condition fails.
    
    	rcu_try_flip_waitack_needed() therefore incorrectly concludes
    	that it must wait for CPU 0 to explicitly acknowledge the
    	counter flip.
    
    o	CPU 0 remains forever in dynticks-idle mode, never taking
    	any more hardware interrupts or any NMIs, and never running
    	any more tasks.  (Of course, -something- will usually eventually
    	happen, which might be why we haven't seen this one in the
    	wild.  Still should be fixed!)
    
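    Therefore the grace period never ends.  Fix is to make the code match
    the comment, as shown below.  With this fix, rcu_try_flip_waitack_needed()
    in the above scenario sees that curr is even (CPU 0 is back in
    dynticks-idle mode), concludes that no explicit acknowledgment is
    needed, and allows the grace period to proceed.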
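
    The scenario above can be replayed in user space.  The following is a
    minimal sketch, not the code in rcupreempt.c: struct dyntick_model,
    replay_scenario(), and the two waitack_needed_*() helpers are
    hypothetical names, and the exact form of the conditions (the evenness
    test in the first one and the "> 2" threshold in the second) is an
    assumption reconstructed from the description above.  Only the parity
    test in the second condition differs between the two variants.

```c
/* Hypothetical model of one CPU's dynticks state; not the kernel's struct. */
struct dyntick_model {
	long progress_counter;	/* even: dynticks-idle, odd: not idle */
	long snapshot;		/* value saved by the grace-period CPU */
};

/* Assumed pre-patch logic: the second condition tests the parity of snap. */
static int waitack_needed_old(long curr, long snap)
{
	/* First condition: CPU stayed in dynticks-idle mode the whole time. */
	if (curr == snap && (curr & 0x1) == 0)
		return 0;

	/* Second condition, with the bug: checks snap rather than curr. */
	if ((curr - snap) > 2 || (snap & 0x1) == 0)
		return 0;

	return 1;	/* wait for an explicit acknowledgment */
}

/* Assumed post-patch logic: the second condition tests the parity of curr. */
static int waitack_needed_fixed(long curr, long snap)
{
	if (curr == snap && (curr & 0x1) == 0)
		return 0;

	/* An even curr means the CPU is in dynticks-idle mode right now. */
	if ((curr - snap) > 2 || (curr & 0x1) == 0)
		return 0;

	return 1;
}

/* Replay the sequence of events from the commit message for CPU 0. */
static struct dyntick_model replay_scenario(void)
{
	struct dyntick_model cpu0 = { .progress_counter = 10, .snapshot = 0 };

	cpu0.progress_counter++;		/* rcu_irq_enter(): 10 -> 11 */
	cpu0.snapshot = cpu0.progress_counter;	/* CPU 1 snapshots 11 */
	cpu0.progress_counter++;		/* rcu_irq_exit(): 11 -> 12 */

	return cpu0;
}
```

    Feeding the replayed state (curr == 12, snap == 11) to
    waitack_needed_old() returns 1, so the grace period waits on a CPU
    that will never respond; waitack_needed_fixed() returns 0 for the
    same inputs.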
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Josh Triplett <josh@kernel.org>
    Cc: Dipankar Sarma <dipankar@in.ibm.com>
    Cc: <stable@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
rcupreempt.c 40.9 KB