    x86, mce, cmci: add CMCI support · 88ccbedd
    Andi Kleen authored
    Impact: Major new feature
    
    Intel CMCI (Corrected Machine Check Interrupt) is a new
    feature on Nehalem CPUs. It allows the CPU to trigger
    interrupts on corrected events, so the kernel can react
    to them faster than with the traditional polling timer.
    
    Also use CMCI to discover shared banks. Machine check banks
    can be shared by CPU threads or even cores. Using the CMCI enable
    bit, it is possible to detect that another CPU has already
    claimed a specific bank. Use this to assign each shared bank to
    only one CPU, to avoid reporting duplicated events.
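
    The claim step described above can be sketched roughly as follows.
    This is a user-space simulation, not the code in this patch: the
    register array, the sim_rdmsr()/sim_wrmsr() helpers, NUM_BANKS and
    the capability table are all hypothetical stand-ins, and every
    simulated bank is assumed to be shared by all CPUs. Only the
    position of the CMCI enable bit (bit 30 of IA32_MCi_CTL2) is taken
    from the SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BANKS 4                /* hypothetical bank count for this sketch */
#define CMCI_EN   (1ULL << 30)     /* CMCI enable bit in IA32_MCi_CTL2 */

/* Simulated IA32_MCi_CTL2 registers; every bank is shared by all CPUs here. */
static uint64_t sim_ctl2[NUM_BANKS];

/* Hypothetical capability table: bank 2 does not implement CMCI. */
static const bool sim_cmci_capable[NUM_BANKS] = { true, true, false, true };

static uint64_t sim_rdmsr(int bank)
{
    assert(bank >= 0 && bank < NUM_BANKS);
    return sim_ctl2[bank];
}

static void sim_wrmsr(int bank, uint64_t val)
{
    assert(bank >= 0 && bank < NUM_BANKS);
    /* A bank without CMCI support ignores writes to the enable bit. */
    if (!sim_cmci_capable[bank])
        val &= ~CMCI_EN;
    sim_ctl2[bank] = val;
}

/* Returns a bitmask of the banks the calling CPU managed to claim. */
static unsigned int discover_owned_banks(void)
{
    unsigned int owned = 0;

    for (int bank = 0; bank < NUM_BANKS; bank++) {
        uint64_t val = sim_rdmsr(bank);

        /* Enable bit already set: another CPU claimed this bank. */
        if (val & CMCI_EN)
            continue;

        /* Try to claim the bank, then read back to see if the bit stuck. */
        sim_wrmsr(bank, val | CMCI_EN);
        if (sim_rdmsr(bank) & CMCI_EN)
            owned |= 1u << bank;
        /* Otherwise the bank has no CMCI support and is left to the poller. */
    }
    return owned;
}
```

    In this sketch the first CPU to run discover_owned_banks() claims
    every CMCI capable bank, and a second caller claims none, so each
    corrected event would be reported by one CPU only.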
    
    On CPU hot unplug, bank sharing is rediscovered. This is done
    using a thread that cycles through all the CPUs.
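
    A minimal sketch of the rediscovery idea, under simplifying
    assumptions: all names here (bank_claimed, owned, online,
    claim_free_banks(), cpu_up(), cpu_down()) are hypothetical, every
    bank is shared by every CPU, and a plain loop stands in for the
    thread that visits each CPU in the real patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_CPUS  2   /* hypothetical sizes for this sketch */
#define NUM_BANKS 3

/* Stands in for the CMCI enable bit of a bank shared by all CPUs. */
static bool bank_claimed[NUM_BANKS];
/* Per-CPU bitmask of owned banks. */
static uint32_t owned[NUM_CPUS];
static bool online[NUM_CPUS];

/* Claim every bank that no other CPU currently owns. */
static void claim_free_banks(int cpu)
{
    for (int bank = 0; bank < NUM_BANKS; bank++) {
        if (bank_claimed[bank])
            continue;
        bank_claimed[bank] = true;
        owned[cpu] |= 1u << bank;
    }
}

static void cpu_up(int cpu)
{
    online[cpu] = true;
    claim_free_banks(cpu);
}

static void cpu_down(int cpu)
{
    /* Release the departing CPU's banks ... */
    for (int bank = 0; bank < NUM_BANKS; bank++)
        if (owned[cpu] & (1u << bank))
            bank_claimed[bank] = false;
    owned[cpu] = 0;
    online[cpu] = false;

    /* ... then cycle through the remaining CPUs so one of them adopts them. */
    for (int other = 0; other < NUM_CPUS; other++)
        if (online[other])
            claim_free_banks(other);
}
```

    After cpu_up(0), cpu_up(1) and cpu_down(0), CPU 1 ends up owning
    the banks CPU 0 held, so no shared bank is left without an owner.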
    
    To avoid races between the poller and CMCI we only poll
    for banks that are not CMCI capable and only check CMCI
    owned banks on an interrupt.
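
    The split between the two paths can be illustrated with bank
    bitmasks. This is only a sketch of the rule stated above; the
    struct and function names are hypothetical and do not appear in
    the patch.

```c
#include <stdint.h>

/* Which banks each path is allowed to look at (one bit per bank). */
struct bank_masks {
    uint32_t poll;   /* checked by the polling timer */
    uint32_t irq;    /* checked by the CMCI interrupt handler */
};

static struct bank_masks split_banks(uint32_t all_banks,
                                     uint32_t cmci_capable,
                                     uint32_t cmci_owned)
{
    struct bank_masks m;

    /* The poller only visits banks that cannot raise CMCI. */
    m.poll = all_banks & ~cmci_capable;
    /* The interrupt path only visits CMCI banks this CPU owns. */
    m.irq = cmci_owned & cmci_capable & all_banks;
    return m;
}
```

    For example, with four banks (0xF), banks 0, 1 and 3 CMCI capable
    (0xB) and this CPU owning banks 0 and 1 (0x3), the poller gets
    bank 2 only and the interrupt handler gets banks 0 and 1. The two
    masks never overlap, which is what keeps the paths from racing.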
    
    The shared bank ownership information is currently only used for
    CMCI interrupts, not polled banks.
    
    The sharing discovery code follows the algorithm recommended in the
    IA32 SDM Vol3a 14.5.2.1.
    
    The CMCI interrupt handler just calls the machine check poller to
    pick up the machine check event that caused the interrupt.
    
    I decided not to implement a separate threshold event like
    the AMD version has, because the threshold is always one currently
    and adding another event didn't seem to add any value.
    
    Some code inspired by Yunhong Jiang's Xen implementation,
    which was in turn inspired by an earlier CMCI implementation
    by me.
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
mce_64.c 27.1 KB