RDMA/cxgb3: Correctly serialize peer abort path
Open MPI and other stress testing exposed a few bad bugs in handling
aborts in the middle of a normal close. Fix these by:
- serializing abort reply and peer abort processing with disconnect
processing
- warning (and ignoring) if ep timer is stopped when it wasn't running
- cleaning up disconnect path to correctly deal with aborting and
dead endpoints
- in iwch_modify_qp(), taking a ref on the ep before releasing the qp
lock if iwch_ep_disconnect() will be called. The ref is dropped
after calling disconnect.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index 8891c3b..6cd484e 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -832,6 +832,7 @@
abort=0;
disconnect = 1;
ep = qhp->ep;
+ get_ep(&ep->com);
}
flush_qp(qhp, &flag);
break;
@@ -848,6 +849,7 @@
abort=1;
disconnect = 1;
ep = qhp->ep;
+ get_ep(&ep->com);
}
goto err;
break;
@@ -929,8 +931,10 @@
* on the EP. This can be a normal close (RTS->CLOSING) or
* an abnormal close (RTS/CLOSING->ERROR).
*/
- if (disconnect)
+ if (disconnect) {
iwch_ep_disconnect(ep, abort, GFP_KERNEL);
+ put_ep(&ep->com);
+ }
/*
* If free is 1, then we've disassociated the EP from the QP