• Brandon Phiilps's avatar
    x86: Avoid race condition in pci_enable_msix() · ced5b697
    Brandon Phiilps authored
    Keep chip_data in create_irq_nr and destroy_irq.
    
    When two drivers are setting up MSI-X at the same time via
    pci_enable_msix() there is a race.  See this dmesg excerpt:
    
    [   85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
    [   85.170611]   alloc irq_desc for 99 on node -1
    [   85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
    [   85.170614]   alloc kstat_irqs on node -1
    [   85.170616] alloc irq_2_iommu on node -1
    [   85.170617]   alloc irq_desc for 100 on node -1
    [   85.170619]   alloc kstat_irqs on node -1
    [   85.170621] alloc irq_2_iommu on node -1
    [   85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
    [   85.170626]   alloc irq_desc for 101 on node -1
    [   85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
    [   85.170630]   alloc kstat_irqs on node -1
    [   85.170631] alloc irq_2_iommu on node -1
    [   85.170635]   alloc irq_desc for 102 on node -1
    [   85.170636]   alloc kstat_irqs on node -1
    [   85.170639] alloc irq_2_iommu on node -1
    [   85.170646] BUG: unable to handle kernel NULL pointer dereference
    at 0000000000000088
    
    As you can see igb and ixgbe are both alternating on create_irq_nr()
    via pci_enable_msix() in their probe function.
    
    ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
    choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
    calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
    NULL via dynamic_irq_init().
    
    igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
    via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:
    
    	cfg_new = irq_desc_ptrs[102]->chip_data;
    	if (cfg_new->vector != 0)
    		continue;
    
    This hits the NULL deref.
    
    Another possible race exists via pci_disable_msix() in a driver or in
    the number of error paths that call free_msi_irqs():
    
    destroy_irq()
    dynamic_irq_cleanup() which sets desc->chip_data = NULL
    ...race window...
    desc->chip_data = cfg;
    
    Remove the save and restore code for cfg in create_irq_nr() and
    destroy_irq() and take the desc->lock when checking the irq_cfg.
    Reported-and-analyzed-by: default avatarBrandon Philips <bphilips@suse.de>
    Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
    LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org>
    Signed-off-by: default avatarBrandon Phililps <bphilips@suse.de>
    Cc: stable@kernel.org
    Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
    ced5b697
chip.c 17.9 KB