    nohz: prevent tick stop outside of the idle loop · b8f8c3cf
    Thomas Gleixner authored
    Jack Ren and Eric Miao tracked down the following long standing
    problem in the NOHZ code:
    
    	scheduler switch to idle task
    	enable interrupts
    
    Window starts here
    
    	----> interrupt happens (does not set NEED_RESCHED)
    	      	irq_exit() stops the tick
    
    	----> interrupt happens (does set NEED_RESCHED)
    
    	return from schedule()
    	
    	cpu_idle(): preempt_disable();
    
    Window ends here
    
    The interrupts can happen at any point inside the race window. The
    first interrupt stops the tick; the second one causes the scheduler to
    rerun and switch away from idle again, so we end up running a task
    with the tick disabled.
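
The window can be replayed in a small user-space model. This is only a sketch: `struct cpu_model`, `model_irq_exit()` and `model_race_window()` are hypothetical names, and the condition under which the old interrupt-exit path stopped the tick (idle task current, NEED_RESCHED clear) is an assumption based on the description above, not a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; all names here are hypothetical, not kernel API. */
struct cpu_model {
	bool idle_task_running;	/* the idle task is the current task */
	bool need_resched;	/* models the NEED_RESCHED flag */
	bool tick_stopped;	/* the periodic tick is switched off */
};

/* Assumed old behaviour: interrupt exit may stop the tick whenever the
 * idle task is current, even before cpu_idle() has reached its loop. */
static void model_irq_exit(struct cpu_model *cpu, bool sets_need_resched)
{
	if (sets_need_resched)
		cpu->need_resched = true;
	if (cpu->idle_task_running && !cpu->need_resched)
		cpu->tick_stopped = true;
}

/* Replays the race window; returns the final state of tick_stopped. */
static bool model_race_window(void)
{
	struct cpu_model cpu = { .idle_task_running = true };

	model_irq_exit(&cpu, false);	/* first interrupt: stops the tick */
	model_irq_exit(&cpu, true);	/* second interrupt: sets NEED_RESCHED */

	/* The scheduler reruns before cpu_idle() reaches preempt_disable(). */
	if (cpu.need_resched) {
		cpu.idle_task_running = false;
		cpu.need_resched = false;
	}
	assert(!cpu.idle_task_running);

	return cpu.tick_stopped;	/* true: tick is off on a busy CPU */
}
```

Calling `model_race_window()` returns true in this model: nothing restarts the tick after the switch away from idle, which is the state the commit describes.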
    
    The fact that it needs two interrupts where the first one does not set
    NEED_RESCHED and the second one does made the bug obscure and extremely
    hard to reproduce and analyse. Kudos to Jack and Eric.
    
    Solution: Limit the NOHZ functionality to the idle loop to make sure
    that we cannot run into such a situation ever again.
    
    cpu_idle()
    {
    	preempt_disable();

    	while (1) {
    		tick_nohz_stop_sched_tick(1);	/* tell NOHZ code that we
    						   are in the idle loop */

    		while (!need_resched())
    			halt();

    		tick_nohz_restart_sched_tick();	/* disables NOHZ mode */
    		preempt_enable_no_resched();
    		schedule();
    		preempt_disable();
    	}
    }
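
One way to picture the restriction is a per-CPU flag that only the idle loop can set, so that a stop request from interrupt exit is ignored unless the idle loop has already announced itself. The sketch below is a user-space model under that assumption: `struct tick_model`, its `inidle` field and the `model_*` functions are hypothetical, and passing 0 from the interrupt path is assumed by analogy with the `1` passed in the loop above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; field and function names are hypothetical. */
struct tick_model {
	bool inidle;		/* set only from within the idle loop */
	bool tick_stopped;	/* the periodic tick is switched off */
};

/* Models a stop request; interrupt-exit callers are assumed to pass 0. */
static void model_stop_tick(struct tick_model *ts, bool inidle)
{
	if (!inidle && !ts->inidle)
		return;		/* outside the idle loop: leave the tick alone */
	ts->inidle = true;
	ts->tick_stopped = true;
}

/* Models the restart call at the end of the idle loop: leave NOHZ mode. */
static void model_restart_tick(struct tick_model *ts)
{
	ts->inidle = false;
	ts->tick_stopped = false;
}

/* Returns true if the tick is never left stopped outside the idle loop. */
static bool model_fix_holds(void)
{
	struct tick_model ts = { false, false };

	model_stop_tick(&ts, false);	/* interrupt in the old race window */
	if (ts.tick_stopped)
		return false;

	model_stop_tick(&ts, true);	/* idle loop entry */
	model_stop_tick(&ts, false);	/* interrupt while halted: allowed */
	assert(ts.tick_stopped);

	model_restart_tick(&ts);	/* need_resched() became true */
	model_stop_tick(&ts, false);	/* interrupt before schedule() */
	return !ts.tick_stopped;
}
```

In this model `model_fix_holds()` returns true: an interrupt can only stop the tick between the idle loop's stop and restart calls, which matches the intent stated in the solution.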
    
    In hindsight we should have done this from the start, but ...
    
    /me grabs a large brown paperbag.
    
    Debugged-by: Jack Ren <jack.ren@marvell.com>
    Debugged-by: eric miao <eric.y.miao@gmail.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>