    RDMA/nes: Fix nes_nic_cm_xmit() error handling · 5962c2c8
    Faisal Latif authored
    We see crashes or hangs when running network cable pull tests
    during RDMA traffic.
    
    schedule_nes_timer() returned an error when nes_nic_cm_xmit()
    failed.  Change it to return success, because the skb has already
    been queued on the timer list and will be processed later.  In the
    send_syn() case, the error caused connect failure to be indicated
    twice: once from nes_connect() and again when the rexmit retries
    expired.
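The double indication described above can be sketched in a small userspace model. This is not the driver code: `fake_schedule_timer()` and `connect_failure_events()` are hypothetical names, and the model assumes the cable stays pulled so the queued skb eventually exhausts its retries.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for schedule_nes_timer(). The skb is assumed to be
 * queued for retransmission before the transmit result is known, so the
 * only question is whether a transmit failure is passed up to the caller.
 */
static int fake_schedule_timer(bool xmit_fails, bool propagate_xmit_error)
{
    if (xmit_fails && propagate_xmit_error)
        return -1;  /* old behavior: report the transmit failure */
    return 0;       /* new behavior: the timer will retry later */
}

/* Counts how many times a connect failure is indicated for one SYN. */
static int connect_failure_events(bool propagate_xmit_error)
{
    int events = 0;

    /* Connect path: an error here is reported to the user immediately. */
    if (fake_schedule_timer(true, propagate_xmit_error) != 0)
        events++;

    /* Timer path: the queued skb runs out of retries and reports failure. */
    events++;

    return events;
}
```

In this model, propagating the error yields two failure events for a single connect attempt, while returning success leaves only the one from the retry timer.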
    
    The other issue is skb->users.  We increment it before calling
    nes_nic_cm_xmit(), which calls dev_queue_xmit().  On failure we
    decremented skb->users while also leaving the skb on the rexmit
    path.  But even when dev_queue_xmit() fails, skb->users has already
    been decremented, so the reference was dropped twice.  Remove the
    decrement of skb->users on failure from both schedule_nes_timer()
    and nes_cm_timer_tick().
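The reference counting problem can be illustrated with a minimal userspace sketch. `struct fake_skb`, `fake_xmit()` and `send_and_queue()` are hypothetical stand-ins, and a plain `int` replaces the kernel's atomic counter. The one behavior taken from the description above is that the transmit call drops a reference whether or not it succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff; only the user count is modeled. */
struct fake_skb {
    int users;
};

/*
 * Stand-in for the transmit call. As the commit message states, the
 * reference is consumed even when the transmit fails.
 */
static int fake_xmit(struct fake_skb *skb, bool fail)
{
    assert(skb->users > 0);
    skb->users--;
    return fail ? -1 : 0;
}

/*
 * Models one send: take an extra reference, transmit, and keep the skb on
 * the rexmit queue. Returns the user count afterwards; the rexmit queue
 * needs it to be 1.
 */
static int send_and_queue(bool xmit_fails, bool drop_ref_on_failure)
{
    struct fake_skb skb = { .users = 1 };  /* reference held by the rexmit queue */

    skb.users++;                           /* extra reference for the transmit */
    if (fake_xmit(&skb, xmit_fails) != 0 && drop_ref_on_failure)
        skb.users--;                       /* old behavior: second drop */

    return skb.users;
}
```

With the extra decrement, a failed transmit leaves the count at 0 while the skb is still queued for retransmission. Without it, the count stays at 1 in both the success and the failure case.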
    
    Also remove an extra check in nes_cm_timer_tick() that breaks out
    of the loop on rexmit failure.  The break causes problems because
    the remaining nodes have already had their cm_node->ref_count
    incremented and are then never processed.
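The effect of the early break can be sketched as a two-pass loop in userspace C. `unprocessed_after_tick()` and `MAX_NODES` are hypothetical, and the model assumes each node gets one extra reference in the first pass that is dropped again when the node is processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8  /* arbitrary bound for this sketch */

/*
 * Models one timer tick over n nodes and returns how many nodes still
 * hold the extra timer reference afterwards.
 */
static size_t unprocessed_after_tick(const bool *rexmit_fails, size_t n,
                                     bool break_on_failure)
{
    int ref_count[MAX_NODES];
    size_t i, leaked = 0;

    assert(n <= MAX_NODES);

    /* First pass: every node starts at 1 and is pinned with one more. */
    for (i = 0; i < n; i++)
        ref_count[i] = 1 + 1;

    /* Second pass: process each node and drop the extra reference. */
    for (i = 0; i < n; i++) {
        if (rexmit_fails[i] && break_on_failure)
            break;          /* old behavior: later nodes are skipped */
        ref_count[i]--;
    }

    for (i = 0; i < n; i++)
        if (ref_count[i] != 1)
            leaked++;

    return leaked;
}
```

For four nodes where only the second one fails to retransmit, breaking out of the loop leaves three nodes pinned. Continuing the loop leaves none.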

    Signed-off-by: Faisal Latif <faisal.latif@intel.com>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>
nes_cm.c 99 KB