    gianfar: Fix soft lockup with multi-interrupt TSECs · a6d0b91a
    Anton Vorontsov authored
    This patch fixes the following bug:
    
    BUG: soft lockup - CPU#0 stuck for 61s! [S03mountvirtfs-:922]
    Modules linked in:
    NIP: c006505c LR: c00675f0 CTR: c0020438
    REGS: c7a1db90 TRAP: 0901   Not tainted  (2.6.28-rc8-01311-g8c7396ae)
    MSR: 00009032 <EE,ME,IR,DR>  CR: 28248442  XER: 20000000
    TASK = c7a288a0[922] 'S03mountvirtfs-' THREAD: c7a1c000
    GPR00: 00009032 c7a1dc40 c7a288a0 00000024 c79a1840 00000000 00000300 00000020
    GPR08: c035f97c 00000000 00004008 c04d5210 00000000
    NIP [c006505c] handle_IRQ_event+0x34/0xb0
    LR [c00675f0] handle_level_irq+0xa8/0x144
    Call Trace:
    [c7a1dc40] [c00204d8] ipic_mask_irq+0xa0/0xb4 (unreliable)
    [c7a1dc60] [c00675f0] handle_level_irq+0xa8/0x144
    [c7a1dc80] [c00067f8] do_IRQ+0x78/0x108
    [c7a1dc90] [c0014d7c] ret_from_except+0x0/0x14
    --- Exception: 501 at gfar_schedule_cleanup+0x54/0x7c
        LR = gfar_transmit+0x14/0x28
    [c7a1dd50] [c0352a3c] _spin_unlock_irqrestore+0x18/0x30 (unreliable)
    [c7a1dd60] [c01f49a8] gfar_transmit+0x14/0x28
    [c7a1dd70] [c0065084] handle_IRQ_event+0x5c/0xb0
    [c7a1dd90] [c00675f0] handle_level_irq+0xa8/0x144
    [c7a1ddb0] [c00067f8] do_IRQ+0x78/0x108
    [c7a1ddc0] [c0014d7c] ret_from_except+0x0/0x14
    --- Exception: 501 at up_read+0x10/0x48
        LR = do_page_fault+0x2b0/0x3e0
    [c7a1de80] [c7a177e8] 0xc7a177e8 (unreliable)
    [c7a1de90] [c0017964] do_page_fault+0x2b0/0x3e0
    [c7a1df40] [c0014b14] handle_page_fault+0xc/0x80
    --- Exception: 301 at 0xfe98b7c
        LR = 0xfe989c0
    Instruction dump:
    7c0802a6 bf810010 7c9f2378 7c7c1b78 90010024 80040004 70090020 40820010
    7c0000a6 60008000 7c000124 3bc00000 <3ba00000> 48000010 83ff0014 2f9f0000
    
    
    The bug was introduced by commit 8c7396ae
    ("gianfar: Merge Tx and Rx interrupt for scheduling clean up ring").
    
    The commit merged the TX and RX interrupt code into a single routine that
    schedules NAPI, but introduced no locking. This causes IRQ races: when
    interrupts are enabled and netif_rx_schedule_prep() returns 0, nobody
    disables the interrupts again. This leads to an interrupt storm and
    finally to the lockup.

    Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
gianfar.c 60.1 KB