Commit 5962c2c8 authored by Faisal Latif, committed by Roland Dreier

RDMA/nes: Fix nes_nic_cm_xmit() error handling

We see a crash or hang when running network cable pull tests during
RDMA traffic.

In schedule_nes_timer(), we returned an error if nes_nic_cm_xmit()
failed.  This is changed to return success, because the skb is still
put on the timer list and will be retransmitted later.  With the error
return, the send_syn() case indicated connect failure twice: once from
nes_connect() and again when the rexmit retries expired.
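
The double failure indication described above can be modelled with a
small user-space sketch.  Everything here (struct fake_conn,
schedule_send(), connect_attempt(), retries_expired()) is a
hypothetical stand-in, not the driver's code; it only assumes what the
paragraph states, namely that the packet stays queued for retransmit
even when the first send fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one connection attempt. */
struct fake_conn {
	int failure_events;   /* times a connect failure was reported */
	bool on_rexmit_list;  /* packet is queued for retransmission */
};

/*
 * Models the scheduling step: the packet is queued for retry either
 * way.  propagate_error = true models the old behaviour of returning
 * the transmit error to the caller.
 */
static int schedule_send(struct fake_conn *c, bool xmit_fails,
			 bool propagate_error)
{
	assert(c != NULL);
	c->on_rexmit_list = true;
	if (xmit_fails && propagate_error)
		return -1;
	return 0;
}

/* Models the caller: it reports failure as soon as it sees an error. */
static void connect_attempt(struct fake_conn *c, bool xmit_fails,
			    bool propagate_error)
{
	if (schedule_send(c, xmit_fails, propagate_error) != 0)
		c->failure_events++;
}

/* Models the timer giving up after the last retry. */
static void retries_expired(struct fake_conn *c)
{
	if (c->on_rexmit_list) {
		c->on_rexmit_list = false;
		c->failure_events++;
	}
}
```

In this model, calling connect_attempt() with a failing send and then
retries_expired() reports failure twice when the error is propagated,
and once when the scheduling step returns success.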

The other issue is skb->users, which we increment before calling
nes_nic_cm_xmit(), which in turn calls dev_queue_xmit().  On failure
we decremented skb->users while also putting the skb on the rexmit
path.  But even if dev_queue_xmit() fails, skb->users has already been
decremented.  Remove the decrement of skb->users on failure from both
schedule_nes_timer() and nes_cm_timer_tick().
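
The reference-count problem above can be illustrated with a minimal
user-space sketch.  struct fake_skb, fake_skb_put(), fake_xmit() and
send_and_keep() are hypothetical names invented for the illustration;
the only behaviour taken from this message is that the transmit call
drops one reference whether or not it succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sk_buff: only the reference count. */
struct fake_skb {
	int users;   /* models skb->users */
	bool freed;  /* set when the last reference is dropped */
};

/* Drop one reference; mark the buffer freed when the count hits zero. */
static void fake_skb_put(struct fake_skb *skb)
{
	assert(skb->users > 0);
	if (--skb->users == 0)
		skb->freed = true;
}

/*
 * Models the transmit call: it consumes one reference whether or not
 * the send succeeds.  Returns 0 on success, nonzero on failure.
 */
static int fake_xmit(struct fake_skb *skb, bool fail)
{
	fake_skb_put(skb);
	return fail ? 1 : 0;
}

/*
 * Send once while keeping the skb for retransmission.
 * extra_dec_on_failure = true models the old error path.
 * Returns true if the skb is still alive and safe to retransmit.
 */
static bool send_and_keep(struct fake_skb *skb, bool fail,
			  bool extra_dec_on_failure)
{
	skb->users++;  /* reference handed to the transmit path */
	if (fake_xmit(skb, fail) != 0 && extra_dec_on_failure)
		fake_skb_put(skb);  /* drops the same reference again */
	return !skb->freed;
}
```

Starting from users == 1, a failed send with the extra decrement
leaves the count at zero while the skb is still queued for rexmit;
without it, the count returns to 1 and the skb stays valid.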

Also remove an extra check in nes_cm_timer_tick() that breaks out of
the loop on rexmit failure.  The break causes problems because the
remaining nodes have had their cm_node->ref_count incremented but are
never processed.
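
The effect of the early break can be sketched as follows.  This is a
simplified model under assumptions: struct fake_node and timer_tick()
are hypothetical, and the two-pass shape (take a reference on every
node, then process each node and drop its reference) is inferred from
the description above rather than copied from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection node. */
struct fake_node {
	int ref_count;    /* models cm_node->ref_count */
	bool xmit_fails;  /* whether this node's rexmit fails */
};

/*
 * One timer pass.  Returns the number of nodes that were referenced
 * in the first pass but never reached in the second.
 * break_on_failure = true models the check removed by this commit.
 */
static size_t timer_tick(struct fake_node *nodes, size_t n,
			 bool break_on_failure)
{
	size_t i, processed = 0;

	assert(nodes != NULL || n == 0);

	/* Pass 1: take a reference on every node to be processed. */
	for (i = 0; i < n; i++)
		nodes[i].ref_count++;

	/* Pass 2: process each node, then drop its reference. */
	for (i = 0; i < n; i++) {
		bool failed = nodes[i].xmit_fails;

		nodes[i].ref_count--;
		processed++;
		if (failed && break_on_failure)
			break;  /* later nodes keep their extra reference */
	}
	return n - processed;
}
```

With three nodes and a failure on the second, the breaking variant
leaves the third node with an extra reference, while the non-breaking
variant releases all of them.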
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
parent 79fc3d74
@@ -446,8 +446,8 @@ int schedule_nes_timer(struct nes_cm_node *cm_node, struct sk_buff *skb,
 		if (ret != NETDEV_TX_OK) {
 			nes_debug(NES_DBG_CM, "Error sending packet %p "
 				"(jiffies = %lu)\n", new_send, jiffies);
-			atomic_dec(&new_send->skb->users);
 			new_send->timetosend = jiffies;
+			ret = NETDEV_TX_OK;
 		} else {
 			cm_packets_sent++;
 			if (!send_retrans) {
@@ -631,7 +631,6 @@ static void nes_cm_timer_tick(unsigned long pass)
 				nes_debug(NES_DBG_CM, "rexmit failed for "
 					"node=%p\n", cm_node);
 				cm_packets_bounced++;
-				atomic_dec(&send_entry->skb->users);
 				send_entry->retrycount--;
 				nexttimeout = jiffies + NES_SHORT_TIME;
 				settimer = 1;
@@ -667,11 +666,6 @@ static void nes_cm_timer_tick(unsigned long pass)
 		spin_unlock_irqrestore(&cm_node->retrans_list_lock, flags);
 		rem_ref_cm_node(cm_node->cm_core, cm_node);
-		if (ret != NETDEV_TX_OK) {
-			nes_debug(NES_DBG_CM, "rexmit failed for cm_node=%p\n",
-				cm_node);
-			break;
-		}
 	}
 	if (settimer) {
......