[TCP]: Fix sock_orphan dead lock

Calling sock_orphan inside bh_lock_sock in tcp_close can lead to dead locks. For example, the inet_diag code holds sk_callback_lock without disabling BH. If an inbound packet arrives during that admittedly tiny window, it will cause a dead lock on bh_lock_sock. Another possible path would be through sock_wfree if the network device driver frees the tx skb in process context with BH enabled. We can fix this by moving sock_orphan out of bh_lock_sock. The tricky bit is to work out when we need to destroy the socket ourselves and when it has already been destroyed by someone else. By moving sock_orphan before the release_sock we can solve this problem. This is because as long as we own the socket lock its state cannot change. So we simply record the socket state before the release_sock and then check the state again after we regain the socket lock. If the socket state has transitioned to TCP_CLOSE in the time being, we know that the socket has been destroyed. Otherwise the socket is still ours to keep. Note that I've also moved the increment on the orphan count forward. This may look like a problem as we're increasing it even if the socket is just about to be destroyed where it'll be decreased again. However, this simply enlarges a window that already exists. This also changes the orphan count test by one. Considering what the orphan count is meant to do this is no big deal. This problem was discoverd by Ingo Molnar using his lock validator. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Fix sock_orphan dead lock
Calling sock_orphan inside bh_lock_sock in tcp_close can lead to dead locks. For example, the inet_diag code holds sk_callback_lock without disabling BH. If an inbound packet arrives during that admittedly tiny window, it will cause a dead lock on bh_lock_sock. Another possible path would be through sock_wfree if the network device driver frees the tx skb in process context with BH enabled. We can fix this by moving sock_orphan out of bh_lock_sock. The tricky bit is to work out when we need to destroy the socket ourselves and when it has already been destroyed by someone else. By moving sock_orphan before the release_sock we can solve this problem. This is because as long as we own the socket lock its state cannot change. So we simply record the socket state before the release_sock and then check the state again after we regain the socket lock. If the socket state has transitioned to TCP_CLOSE in the time being, we know that the socket has been destroyed. Otherwise the socket is still ours to keep. Note that I've also moved the increment on the orphan count forward. This may look like a problem as we're increasing it even if the socket is just about to be destroyed where it'll be decreased again. However, this simply enlarges a window that already exists. This also changes the orphan count test by one. Considering what the orphan count is meant to do this is no big deal. This problem was discoverd by Ingo Molnar using his lock validator. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
75c2d907 · Herbert Xu · David S. Miller · 82e84249 · 75c2d907
Commit 75c2d907 authored May 03, 2006 by Herbert Xu Committed by David S. Miller May 03, 2006
Hide whitespace changes
Inline Side-by-side

Showing with 9 additions and 4 deletions

net/ipv4/tcp.c net/ipv4/tcp.c +9 -4

No files found.
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1468,6 +1468,7 @@ void tcp_close(struct sock *sk, long timeout)
 {
 	struct sk_buff *skb;
 	int data_was_unread = 0;
+	int state;

 	lock_sock(sk);
 	sk->sk_shutdown = SHUTDOWN_MASK;
@@ -1544,6 +1545,11 @@ void tcp_close(struct sock *sk, long timeout)
 	sk_stream_wait_close(sk, timeout);

 adjudge_to_death:
+	state = sk->sk_state;
+	sock_hold(sk);
+	sock_orphan(sk);
+	atomic_inc(sk->sk_prot->orphan_count);
+
 	/* It is the last release_sock in its life. It will remove backlog. */
 	release_sock(sk);

@@ -1555,8 +1561,9 @@ adjudge_to_death:
 	bh_lock_sock(sk);
 	BUG_TRAP(!sock_owned_by_user(sk));

-	sock_hold(sk);
-	sock_orphan(sk);
+	/* Have we already been destroyed by a softirq or backlog? */
+	if (state != TCP_CLOSE && sk->sk_state == TCP_CLOSE)
+		goto out;

 	/*	This is a (useful) BSD violating of the RFC. There is a
 	 *	problem with TCP as specified in that the other end could
@@ -1584,7 +1591,6 @@ adjudge_to_death:
 			if (tmo > TCP_TIMEWAIT_LEN) {
 				inet_csk_reset_keepalive_timer(sk, tcp_fin_time(sk));
 			} else {
-				atomic_inc(sk->sk_prot->orphan_count);
 				tcp_time_wait(sk, TCP_FIN_WAIT2, tmo);
 				goto out;
 			}
@@ -1603,7 +1609,6 @@ adjudge_to_death:
 			NET_INC_STATS_BH(LINUX_MIB_TCPABORTONMEMORY);
 		}
 	}
-	atomic_inc(sk->sk_prot->orphan_count);

 	if (sk->sk_state == TCP_CLOSE)
 		inet_csk_destroy_sock(sk);