From a9dcaece7756f8e7fd10fd577b0d4e99b01cfbc5 Mon Sep 17 00:00:00 2001
From: Craig Gallek
Date: Wed, 10 Feb 2016 11:50:39 -0500
Subject: [PATCH] soreuseport: Prep for fast reuseport TCP socket selection

Both of the lines in this patch probably should have been included
in the initial implementation of this code for generic socket
support, but weren't technically necessary since only UDP sockets
were supported.

First, the sk_reuseport_cb points to a structure which assumes
each socket in the group has this pointer assigned at the same
time it's added to the array in the structure. The sk_clone_lock
function breaks this assumption. Since a child socket shouldn't
implicitly be in a reuseport group, the simple fix is to clear
the field in the clone.

Second, the SO_ATTACH_REUSEPORT_xBPF socket options require that
SO_REUSEPORT also be set first. For UDP sockets, this is easily
enforced at bind-time since that process both puts the socket in
the appropriate receive hlist and updates the reuseport structures.
Since these operations can happen at two different times for TCP
sockets (bind and listen) it must be explicitly checked to enforce
the use of SO_REUSEPORT with SO_ATTACH_REUSEPORT_xBPF in the
setsockopt call.

Signed-off-by: Craig Gallek
Signed-off-by: David S. Miller
---
 net/core/filter.c | 2 +-
 net/core/sock.c   | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 0a436d5bbd0d..26a41a27c91d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1210,7 +1210,7 @@ static int __reuseport_attach_prog(struct bpf_prog *prog, struct sock *sk)
 	if (bpf_prog_size(prog->len) > sysctl_optmem_max)
 		return -ENOMEM;
 
-	if (sk_unhashed(sk)) {
+	if (sk_unhashed(sk) && sk->sk_reuseport) {
 		err = reuseport_alloc(sk);
 		if (err)
 			return err;
diff --git a/net/core/sock.c b/net/core/sock.c
index a0edd6121902..0ac6e6d306f7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1622,6 +1622,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 			newsk = NULL;
 			goto out;
 		}
+		RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
 
 		newsk->sk_err = 0;
 		newsk->sk_err_soft = 0;