    IB/sa: Fail requests made while creating new SM AH · 164ba089
    Moni Shoua authored
    This patch fixes a race that follows an event which causes the SA
    query module to flush its SM address handle (AH).  When the SM AH
    becomes invalid, the update is handled by work queued on the global
    workqueue.  The same event is also handled in the IPoIB driver, which
    queues work on ipoib_workqueue to do multicast joins.  Although the
    work items are queued in the right order, they go to two different
    workqueues, so there is no guarantee that the first to be queued is
    the first to be executed.
    
    This causes a problem because IPoIB may end up sending a request to
    the old SM, which will take a long time to time out (since the old SM
    is gone); this leads to a much longer than necessary interruption in
    multicast traffic.
    
    The patch sets the SA query module's SM AH to NULL when the event
    occurs, and until update_sm_ah() is done, any request that needs sm_ah
    fails with -EAGAIN return status.
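
The behaviour described above can be modelled in user space roughly as follows. This is only a sketch: the struct and function names (sa_port_model, port_event, update_sm_ah_model, send_query) are hypothetical stand-ins rather than the identifiers used in sa_query.c, and the locking and reference counting a real driver would need are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real types. */
struct sm_ah_model {
	int sm_lid;			/* destination the handle points at */
};

struct sa_port_model {
	struct sm_ah_model *sm_ah;	/* NULL while an update is pending */
};

/* Event handler: drop the stale handle at once; the refresh is deferred. */
static void port_event(struct sa_port_model *port)
{
	port->sm_ah = NULL;
}

/* Deferred work: install a fresh handle that points at the new SM. */
static void update_sm_ah_model(struct sa_port_model *port,
			       struct sm_ah_model *fresh)
{
	assert(fresh != NULL);
	port->sm_ah = fresh;
}

/* Request path: fail fast instead of sending to an SM that may be gone. */
static int send_query(struct sa_port_model *port, int *dest_lid)
{
	if (port->sm_ah == NULL)
		return -EAGAIN;

	*dest_lid = port->sm_ah->sm_lid;
	return 0;
}
```

In this model a request issued between port_event() and update_sm_ah_model() returns -EAGAIN immediately, whichever workqueue happens to run first.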
    
    For consumers, the patch doesn't make things worse.  Before the patch,
    MADs were sent to the wrong SM, so the request was lost.  Consumers
    can be improved to examine the return code and respond to -EAGAIN
    properly, but even without that improvement they are no worse off
    than before.
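
One way a consumer might respond to -EAGAIN is a bounded retry, sketched below. The names request_fn, request_with_retry and flaky_request are hypothetical and exist only for this illustration; a real kernel consumer would more likely requeue the request with a delay than loop immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical request callback: returns 0 or a negative errno. */
typedef int (*request_fn)(void *ctx);

/*
 * Retry only while the request reports -EAGAIN, for at most max_tries
 * attempts.  Any other result, success or failure, is returned at once.
 */
static int request_with_retry(request_fn fn, void *ctx, int max_tries)
{
	int ret = -EAGAIN;
	int i;

	assert(fn != NULL);
	for (i = 0; i < max_tries && ret == -EAGAIN; i++)
		ret = fn(ctx);

	return ret;
}

/* Toy request: fails with -EAGAIN until the counter in ctx reaches 0. */
static int flaky_request(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

Capping the number of attempts keeps the consumer from spinning forever if the SM AH is never refreshed.
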
    
    Signed-off-by: Moni Levy <monil@voltaire.com>
    Signed-off-by: Moni Shoua <monis@voltaire.com>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>
sa_query.c 28.7 KB