    IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop() · e8224e4b
    Yossi Etigin authored
    Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
    ipoib_stop().  We avoid it by scheduling the piece of code that takes
    the lock on ipoib_workqueue instead of executing it directly.  This
    works because we only flush the ipoib_workqueue with the RTNL not held.
    
    The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
    which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
    which calls ipoib_mcast_leave(). The latter calls
    ib_sa_free_multicast(), and this waits until the multicast completion
    handler finishes.  This handler is ipoib_mcast_join_complete(), which
    waits for the rtnl_lock(), which was already taken by ipoib_stop().
    
    This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
    RTNL in ipoib_stop()").

    Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipoib.h 20.2 KB
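
The locking pattern described in the commit message can be illustrated with a small userspace analogue. This is only a sketch using POSIX threads, not the driver code: `fake_rtnl`, `queue_lock`, `join_complete()`, `deferred_task()`, `flush_queue()` and `run_stop_sequence()` are hypothetical names standing in for the RTNL, the multicast join completion handler, the piece of code that takes the lock, the flush of ipoib_workqueue, and ipoib_stop() respectively. The stop path holds the lock while waiting for the handler, so the handler must not take that lock itself; it only queues the work, which runs later once the lock has been released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-ins only; none of these names are kernel APIs. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;  /* plays the RTNL */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER; /* guards the queue */
static bool work_pending = false;      /* a one-slot "workqueue" */
static bool deferred_work_ran = false; /* set by the deferred task */

/* Deferred work item: the only code that needs fake_rtnl. */
static void deferred_task(void)
{
    pthread_mutex_lock(&fake_rtnl);
    deferred_work_ran = true;
    pthread_mutex_unlock(&fake_rtnl);
}

/* Completion handler: queues the work instead of locking fake_rtnl.
 * Locking fake_rtnl here would block forever, because the stop path
 * holds it while waiting for this handler to finish. */
static void *join_complete(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&queue_lock);
    work_pending = true;
    pthread_mutex_unlock(&queue_lock);
    return NULL;
}

/* Run any queued work. The caller must not hold fake_rtnl. */
static void flush_queue(void)
{
    bool run;

    pthread_mutex_lock(&queue_lock);
    run = work_pending;
    work_pending = false;
    pthread_mutex_unlock(&queue_lock);

    if (run)
        deferred_task();
}

/* Mimics the stop path. Returns 1 if the whole sequence completed
 * and the deferred work ran, 0 otherwise. */
int run_stop_sequence(void)
{
    pthread_t handler;
    int rc;

    pthread_mutex_lock(&fake_rtnl);     /* the stop path runs under the lock */
    if (pthread_create(&handler, NULL, join_complete, NULL) != 0) {
        pthread_mutex_unlock(&fake_rtnl);
        return 0;
    }
    rc = pthread_join(handler, NULL);   /* wait for the handler to finish */
    assert(rc == 0);
    pthread_mutex_unlock(&fake_rtnl);

    flush_queue();                      /* flush only with the lock released */
    return deferred_work_ran ? 1 : 0;
}
```

Calling `run_stop_sequence()` from a `main()` and linking with `-pthread` should return without hanging. If `join_complete()` instead locked `fake_rtnl` directly, the `pthread_join()` call would never return, which mirrors the deadlock the commit message describes.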