    softlockup: add sched_clock_tick() to avoid kernel warning on kgdb resume · b460e199
    Jason Wessel authored
    When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, sched_clock() gets the
    time from hardware, such as from the TSC.  In this configuration a
    softlockup warning is reported on resuming from or detaching from a
    kgdb debug session.
    
    Sequence of events in the problem case:
    
    1) "cpu sched clock" and "hardware time" are both at 100 sec prior
       to a call to kgdb_handle_exception()
    
    2) The debugger waits in kgdb_handle_exception() for 80 sec, and on
       exit the following is called: touch_softlockup_watchdog() -->
       __raw_get_cpu_var(touch_timestamp) = 0;
    
    3) "cpu sched clock" = 100s (it was not updated, because interrupts
       were disabled in kgdb), but "hardware time" = 180 sec
    
    4) The first timer interrupt after resuming from kgdb_handle_exception
       updates the watchdog from the "cpu sched clock"
    
       update_process_times() {
           ...
           run_local_timers() --> softlockup_tick()
             --> check (touch_timestamp == 0) (it is "YES" here; kgdb
                 set "touch_timestamp = 0")
             --> __touch_softlockup_watchdog()
             ***(A)--> reset "touch_timestamp" to "get_timestamp()"
                 (here "touch_timestamp" is still set to 100s)
           ...
           scheduler_tick()
             ***(B)--> sched_clock_tick() (updates "cpu sched clock"
                 to "hardware time" = 180s)
           ...
       }
    
    5) The second timer interrupt handler sees a large jump in time and
       trips the softlockup warning.
    
       update_process_times() {
           ...
           run_local_timers() --> softlockup_tick()
             --> "cpu sched clock" - "touch_timestamp" = 180s - 100s > 60s
             --> printk "soft lockup error messages"
           ...
       }
    
    Note: at ***(A), "touch_timestamp" is reset to "get_timestamp(this_cpu)".
    
    Why is "touch_timestamp" 100 sec instead of 180 sec?
    
    With CONFIG_HAVE_UNSTABLE_SCHED_CLOCK set, the call trace of
    get_timestamp() is:
    
        get_timestamp(this_cpu)
          --> cpu_clock(this_cpu)
          --> sched_clock_cpu(this_cpu)
          --> __update_sched_clock(sched_clock_data, now)
    
    The __update_sched_clock() function uses the GTOD tick value to create
    a window in which the "now" values are normalized.  So if a "now"
    value is too far ahead of sched_clock_data, the excess is ignored
    until sched_clock_tick() moves the window forward.
    
    The fix is to invoke sched_clock_tick() to update the "cpu sched clock"
    in order to recover from this state.  This is done by introducing the
    function touch_softlockup_watchdog_sync(), which allows kgdb to
    request that the sched clock be updated the first time softlockup_tick()
    runs after a resume from kgdb.

    Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    Signed-off-by: Dongdong Deng <Dongdong.Deng@windriver.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: peterz@infradead.org