Currently, SO_BINDTODEVICE requires CAP_NET_RAW. This change allows a
non-root user to bind a socket to an interface if it is not already
bound. This is useful to allow an application to bind itself to a
specific VRF for outgoing or incoming connections. Currently, an
application wanting to manage connections through several VRF need to
be privileged.
Previously, IP_UNICAST_IF and IPV6_UNICAST_IF were added for
Wine (76e21053b5 and c4062dfc42) specifically for use by
non-root processes. However, they are restricted to sendmsg() and not
usable with TCP. Allowing SO_BINDTODEVICE would allow TCP clients to
get the same privilege. As for TCP servers, outside the VRF use case,
SO_BINDTODEVICE would only further restrict connections a server could
accept.
When an application is restricted to a VRF (with `ip vrf exec`), the
socket is bound to an interface at creation and therefore, a
non-privileged call to SO_BINDTODEVICE to escape the VRF fails.
When an application bound a socket to SO_BINDTODEVICE and transmit it
to a non-privileged process through a Unix socket, a tentative to
change the bound device also fails.
Before:
>>> import socket
>>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
PermissionError: [Errno 1] Operation not permitted
After:
>>> import socket
>>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
>>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
PermissionError: [Errno 1] Operation not permitted
Bug: 323792489
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c427bfec18f2190b8f4718785ee8ed2db4f84ee6)
Change-Id: Ie3f4c536b78da12dd961dc681c6dfb0cdf1a06b7
Signed-off-by: Peiyong Wang <wangpeiyong@xiaomi.corp-partner.google.com>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
int type = nla_type(nla);
if (type > XFRMA_MAX) {
return -EOPNOTSUPP;
}
@type is then used as an array index and can be used
as a Spectre v1 gadget.
if (nla_len(nla) < compat_policy[type].len) {
array_index_nospec() can be used to prevent leaking
content of kernel memory to malicious users.
Bug: 254441685
Fixes: 5106f4a8acff ("xfrm/compat: Add 32=>64-bit messages translator")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
(cherry picked from commit b6ee896385380aa621102e8ea402ba12db1cabff)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: Iac8d61100685ad513e04d2623fe0b79ba331167a
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
[ Upstream commit 932021a11805b9da4bd6abf66fe233cccd59fe0e ]
Function hci_sched_le needs to update the respective counter variable
inplace other the likes of hci_quote_sent would attempt to use the
possible outdated value of conn->{le_cnt,acl_cnt}.
Link: https://github.com/bluez/bluez/issues/915
Fixes: 73d80deb7b ("Bluetooth: prioritizing data over HCI")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 08829a8ff1303b1a903d1417dc0a06ffc7d17044)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 116523c8fac05d1d26f748fee7919a4ec5df67ea ]
Change that introduced the use of __check_timeout did not account for
link types properly, it always assumes ACL_LINK is used thus causing
hdev->acl_last_tx to be used even in case of LE_LINK and then again
uses ACL_LINK with hci_link_tx_to.
To fix this __check_timeout now takes the link type as parameter and
then procedure to use the right last_tx based on the link type and pass
it to hci_link_tx_to.
Fixes: 1b1d29e51499 ("Bluetooth: Make use of __check_timeout on hci_sched_le")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: David Beinder <david@beinder.at>
Stable-dep-of: 932021a11805 ("Bluetooth: hci_core: Fix LE quote calculation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit edb7dbcf8c1e95dc18ada839526ff86df3258d11)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 1b1d29e5149990e44634b2e681de71effd463591 ]
This reuse __check_timeout on hci_sched_le following the same logic
used hci_sched_acl.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Stable-dep-of: 932021a11805 ("Bluetooth: hci_core: Fix LE quote calculation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 67cddb2a1b256941952ebf501f8fc4936b704c8b)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit 806a5198c05987b748b50f3d0c0cfb3d417381a4 ]
This removes the bogus check for max > hcon->le_conn_max_interval since
the later is just the initial maximum conn interval not the maximum the
stack could support which is really 3200=4000ms.
In order to pass GAP/CONN/CPUP/BV-05-C one shall probably enter values
of the following fields in IXIT that would cause hci_check_conn_params
to fail:
TSPX_conn_update_int_min
TSPX_conn_update_int_max
TSPX_conn_update_peripheral_latency
TSPX_conn_update_supervision_timeout
Link: https://github.com/bluez/bluez/issues/847
Fixes: e4b019515f95 ("Bluetooth: Enforce validation on max value of connection interval")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a1f9c8328219b9bb2c29846d6c74ea981e60b5a7)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
[ Upstream commit e4b019515f950b4e6e5b74b2e1bb03a90cb33039 ]
Right now Linux BT stack cannot pass test case "GAP/CONN/CPUP/BV-05-C
'Connection Parameter Update Procedure Invalid Parameters Central
Responder'" in Bluetooth Test Suite revision GAP.TS.p44. [0]
That was revoled by commit c49a8682fc5d ("Bluetooth: validate BLE
connection interval updates"), but later got reverted due to devices
like keyboards and mice may require low connection interval.
So only validate the max value connection interval to pass the Test
Suite, and let devices to request low connection interval if needed.
[0] https://www.bluetooth.org/docman/handlers/DownloadDoc.ashx?doc_id=229869
Fixes: 68d19d7d9957 ("Revert "Bluetooth: validate BLE connection interval updates"")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4debb1e930570f20caa59d815c50a89fa33124d7)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
In case of using BT_ERR and BT_INFO, convert to bt_dev_err and
bt_dev_info when possible. This allows for controller specific
reporting.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The bpfilter_umh will be stopped via __stop_umh() when the bpfilter
error occurred.
The bpfilter_umh() couldn't start again because there is no restart
routine.
The section of the bpfilter_umh_{start/end} is no longer .init.rodata
because these area should be reused in the restart routine. hence
the section name is changed to .bpfilter_umh.
The bpfilter_ops->start() is restart callback. it will be called when
bpfilter_umh is stopped.
The stop bit means bpfilter_umh is stopped. this bit is set by both
start and stop routine.
Before this patch,
Test commands:
$ iptables -vnL
$ kill -9 <pid of bpfilter_umh>
$ iptables -vnL
[ 480.045136] bpfilter: write fail -32
$ iptables -vnL
All iptables commands will fail.
After this patch,
Test commands:
$ iptables -vnL
$ kill -9 <pid of bpfilter_umh>
$ iptables -vnL
$ iptables -vnL
Now, all iptables commands will work.
Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Change-Id: I4c1fa7fd3af172967e450f3db8fab77434b341b6
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
Now, UMH process is killed, do_exit() calls the umh_info->cleanup callback
to release members of the umh_info.
This patch makes bpfilter_umh's cleanup routine to use the
umh_info->cleanup callback.
Change-Id: I865aa89eb58d9bba883ed59e4b921fe401413853
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
[ Upstream commit 1dfd1711de2952fd1bfeea7152bd1687a4eea771 ]
The locking in force_sig_info is not prepared to deal with
a task that exits or execs (as sighand may change). As force_sig
is only built to handle synchronous exceptions.
Further the function force_sig_info changes the signal state if the
signal is ignored, or blocked or if SIGNAL_UNKILLABLE will prevent the
delivery of the signal. The signal SIGKILL can not be ignored and can
not be blocked and SIGNAL_UNKILLABLE won't prevent it from being
delivered.
So using force_sig rather than send_sig for SIGKILL is pointless.
Because it won't impact the sending of the signal and and because
using force_sig is wrong, replace force_sig with send_sig.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Change-Id: Id62ece4eade762ecc94dd3e4a940907e25c48fab
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
Fixes the following Sparse warnings:
net/bpfilter/bpfilter_kern.c:62:21: warning: cast removes address space
of expression
net/bpfilter/bpfilter_kern.c:101:49: warning: Using plain integer as
NULL pointer
Change-Id: I18ad265abd911bab943a3c84a3dac540e28a1da6
Signed-off-by: Shanthosh RK <shanthosh.rk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
What we want here is to embed a user-space program into the kernel.
Instead of the complex ELF magic, let's simply wrap it in the assembly
with the '.incbin' directive.
Change-Id: I8a532c9f9c5d2bc853821becb40832d56cdbb72c
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
syzbot reported the following crash
[ 338.293946] bpfilter: read fail -512
[ 338.304515] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 338.311863] general protection fault: 0000 [#1] SMP KASAN
[ 338.344360] RIP: 0010:__vfs_write+0x4a6/0x960
[ 338.426363] Call Trace:
[ 338.456967] __kernel_write+0x10c/0x380
[ 338.460928] __bpfilter_process_sockopt+0x1d8/0x35b
[ 338.487103] bpfilter_mbox_request+0x4d/0xb0
[ 338.491492] bpfilter_ip_get_sockopt+0x6b/0x90
This can happen when multiple cpus trying to talk to user mode process
via bpfilter_mbox_request(). One cpu grabs the mutex while another goes to
sleep on the same mutex. Then former cpu sees that umh pipe is down and
shuts down the pipes. Later cpu finally acquires the mutex and crashes
on freed pipe.
Fix the race by using info.pid as an indicator that umh and pipes are healthy
and check it after acquiring the mutex.
Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Reported-by: syzbot+7ade6c94abb2774c0fee@syzkaller.appspotmail.com
Change-Id: Ic3f5aa80143601b532f6ca2feb4ef97b249805f0
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
bpfilter_process_sockopt is a callback that gets called from
ip_setsockopt() and ip_getsockopt(). However, when CONFIG_INET is
disabled, it never gets called at all, and assigning a function to the
callback pointer results in a link failure:
net/bpfilter/bpfilter_kern.o: In function `__stop_umh':
bpfilter_kern.c:(.text.unlikely+0x3): undefined reference to `bpfilter_process_sockopt'
net/bpfilter/bpfilter_kern.o: In function `load_umh':
bpfilter_kern.c:(.init.text+0x73): undefined reference to `bpfilter_process_sockopt'
Since there is no caller in this configuration, I assume we can
simply make the assignment conditional.
Change-Id: Ia233a7b3be4b060f653aad9d4a3f045ba3b8a07e
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
[ Upstream commit 114039b342014680911c35bd6b72624180fd669a ]
To avoid potentially breaking existing users.
Both mac/no-mac cases have to be amended; mac_header >= network_header
is not enough (verified with a new test, see next patch).
Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
Change-Id: If63ce98321997875863f3fce255061e8a64bba5e
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221121180340.1983627-1-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
C flow dissector supports input flags that tell it to customize parsing
by either stopping early or trying to parse as deep as possible. Pass
those flags to the BPF flow dissector so it can make the same
decisions. In the next commits I'll add support for those flags to
our reference bpf_flow.c
v3:
* Export copy of flow dissector flags instead of moving (Alexei Starovoitov)
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Petar Penkov <ppenkov@google.com>
Change-Id: I46a68f8b2249915fff5d97a1394ea662d9a0ac46
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
When called without skb, gather all required data from the
__skb_flow_dissect's arguments and use recently introduces
no-skb mode of bpf flow dissector.
Note: WARN_ON_ONCE(!net) will now trigger for eth_get_headlen users.
Change-Id: Ice7d27b1adaeb9adb02ea52fda999450a6d765af
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
This new argument will be used in the next patches for the
eth_get_headlen use case. eth_get_headlen calls flow dissector
with only data (without skb) so there is currently no way to
pull attached BPF flow dissector program. With this new argument,
we can amend the callers to explicitly pass network namespace
so we can use attached BPF program.
Change-Id: I07dbe25c77a13f8cc33b037e63c532979d290ad0
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
When the core networking needs to detect the transport offset in a given
packet and parse it explicitly, a full-blown flow_keys struct is used for
storage.
This patch introduces a smaller keys store, rework the basic flow dissect
helper to use it, and apply this new helper where possible - namely in
skb_probe_transport_header(). The used flow dissector data structures
are renamed to match more closely the new role.
The above gives ~50% performance improvement in micro benchmarking around
skb_probe_transport_header() and ~30% around eth_get_headlen(), mostly due
to the smaller memset. Small, but measurable improvement is measured also
in macro benchmarking.
v1 -> v2: use the new helper in eth_get_headlen() and skb_get_poff(),
as per DaveM suggestion
[Linux4: Apply to include/linux/virtio_net.h]
Suggested-by: David Miller <davem@davemloft.net>
Change-Id: Iebe280d3793028c9dd640b5dc59d046601f02cca
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
This tests that:
* a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it
uses either:
* a variable offset to the tracepoint buffer, or
* an offset beyond the size of the tracepoint buffer
* a tracer can modify the buffer provided when attached to a writable
tracepoint in bpf_prog_test_run
Change-Id: I3fdf650c1d9a68d64d08b170b14e46a35d180dd7
Signed-off-by: Matt Mullins <mmullins@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
Now that we use skb-less flow dissector let's return true nhoff and
thoff. We used to adjust them by ETH_HLEN because that's how it was
done in the skb case. For VLAN tests that looks confusing: nhoff is
pointing to vlan parts :-\
Warning, this is an API change for BPF_PROG_TEST_RUN! Feel free to drop
if you think that it's too late at this point to fix it.
Change-Id: Icae82218f62c91b7860981bc62269eba8e58c25c
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
Now that we have bpf_flow_dissect which can work on raw data,
use it when doing BPF_PROG_TEST_RUN for flow dissector.
Simplifies bpf_prog_test_run_flow_dissector and allows us to
test no-skb mode.
Note, that previously, with bpf_flow_dissect_skb we used to call
eth_type_trans which pulled L2 (ETH_HLEN) header and we explicitly called
skb_reset_network_header. That means flow_keys->nhoff would be
initialized to 0 (skb_network_offset) in init_flow_keys.
Now we call bpf_flow_dissect with nhoff set to ETH_HLEN and need
to undo it once the dissection is done to preserve the existing behavior.
Change-Id: Icaf9c2a76c6176c647081535f7d48d94fff2e126
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
This should allow us later to extend BPF_PROG_TEST_RUN for non-skb case
and be sure that nobody is erroneously setting ctx_{in,out}.
Fixes: b0b9395d865e ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Change-Id: Ibe85a9217cd265966ef57e1298b37a6b9baa50c4
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff
makes process unkillable. The problem is that when CONFIG_PREEMPT is
enabled, we never see need_resched() return true. This is due to the
fact that preempt_enable() (which we do in bpf_test_run_one on each
iteration) now handles resched if it's needed.
Let's disable preemption for the whole run, not per test. In this case
we can properly see whether resched is needed.
Let's also properly return -EINTR to the userspace in case of a signal
interrupt.
This is a follow up for a recently fixed issue in bpf_test_run, see
commit df1a2cb7c74b ("bpf/test_run: fix unkillable
BPF_PROG_TEST_RUN").
Reported-by: syzbot <syzkaller@googlegroups.com>
Change-Id: I38d1d9553d1ee281666de7000d06684962658397
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff
makes process unkillable. The problem is that when CONFIG_PREEMPT is
enabled, we never see need_resched() return true. This is due to the
fact that preempt_enable() (which we do in bpf_test_run_one on each
iteration) now handles resched if it's needed.
Let's disable preemption for the whole run, not per test. In this case
we can properly see whether resched is needed.
Let's also properly return -EINTR to the userspace in case of a signal
interrupt.
See recent discussion:
http://lore.kernel.org/netdev/CAH3MdRWHr4N8jei8jxDppXjmw-Nw=puNDLbu1dQOFQHxfU2onA@mail.gmail.com
I'll follow up with the same fix bpf_prog_test_run_flow_dissector in
bpf-next.
Reported-by: syzbot <syzkaller@googlegroups.com>
Change-Id: I6744e83dc918d20f44cdf9f37cbb0f223a970aa4
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
Use data_size_out as a size hint when copying test output to user space.
ENOSPC is returned if the output buffer is too small.
Callers which so far did not set data_size_out are not affected.
Change-Id: Ic1a42d1903e96a26a27a56489b75be05c58996ff
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
Commit b0b9395d865e ("bpf: support input __sk_buff context in
BPF_PROG_TEST_RUN") started using bpf_check_uarg_tail_zero in
BPF_PROG_TEST_RUN. However, bpf_check_uarg_tail_zero is not defined
for !CONFIG_BPF_SYSCALL:
net/bpf/test_run.c: In function ‘bpf_ctx_init’:
net/bpf/test_run.c:142:9: error: implicit declaration of function ‘bpf_check_uarg_tail_zero’ [-Werror=implicit-function-declaration]
err = bpf_check_uarg_tail_zero(data_in, max_size, size);
^~~~~~~~~~~~~~~~~~~~~~~~
Let's not build net/bpf/test_run.c when CONFIG_BPF_SYSCALL is not set.
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: b0b9395d865e ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN")
Change-Id: Iddc8eb00de3fbde87f765f533b91d2fbd3b41603
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
[ Upstream commit d4bac0288a2b444e468e6df9cb4ed69479ddf14a ]
Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to
read when these offsets extend into frags.
This has been observed with iwlwifi and reproduced with tun with
IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port,
applied to a RAW socket, will silently miss matching packets.
const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt);
const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest);
struct sock_filter filter_code[] = {
BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4),
BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2),
BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport),
This is unexpected behavior. Socket filter programs should be
consistent regardless of environment. Silent misses are
particularly concerning as hard to detect.
Use skb_copy_bits for offsets outside linear, same as done for
non-SKF_(LL|NET) offsets.
Offset is always positive after subtracting the reference threshold
SKB_(LL|NET)_OFF, so is always >= skb_(mac|network)_offset. The sum of
the two is an offset against skb->data, and may be negative, but it
cannot point before skb->head, as skb_(mac|network)_offset would too.
This appears to go back to when frag support was introduced to
sk_run_filter in linux-2.4.4, before the introduction of git.
The amount of code change and 8/16/32 bit duplication are unfortunate.
But any attempt I made to be smarter saved very few LoC while
complicating the code.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/
Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244
Reported-by: Matt Moeller <moeller.matt@gmail.com>
Co-developed-by: Maciej Żenczykowski <maze@google.com>
Change-Id: Ia1ff7be23c39d872596848731818740696c86c29
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20250408132833.195491-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
[ Upstream commit 96a233e600df351bcb06e3c20efe048855552926 ]
A couple of the syscalls which load values (bpf_skb_load_helper_16() and
bpf_skb_load_helper_32()) are using u16/u32 types which are triggering
warnings as they are then converted from big-endian to CPU-endian. Fix
these by making the types __be instead.
Fixes the following sparse warnings:
net/core/filter.c:246:32: warning: cast to restricted __be16
net/core/filter.c:246:32: warning: cast to restricted __be16
net/core/filter.c:246:32: warning: cast to restricted __be16
net/core/filter.c:246:32: warning: cast to restricted __be16
net/core/filter.c:273:32: warning: cast to restricted __be32
net/core/filter.c:273:32: warning: cast to restricted __be32
net/core/filter.c:273:32: warning: cast to restricted __be32
net/core/filter.c:273:32: warning: cast to restricted __be32
net/core/filter.c:273:32: warning: cast to restricted __be32
net/core/filter.c:273:32: warning: cast to restricted __be32
Change-Id: I6486ac70a70ffae777a6cd01f4f4b6926f44797f
Signed-off-by: Ben Dooks <ben.dooks@sifive.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220714105101.297304-1-ben.dooks@sifive.com
Stable-dep-of: d4bac0288a2b ("bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
[ Upstream commit 955afd57dc4bf7e8c620a0a9e3af3c881c2c6dff ]
Found in the test_txmsg_pull in test_sockmap,
```
txmsg_cork = 512; // corking is importrant here
opt->iov_length = 3;
opt->iov_count = 1;
opt->rate = 512; // sendmsg will be invoked 512 times
```
The first sendmsg will send an sk_msg with size 3, and bpf_msg_pull_data
will be invoked the first time. sk_msg_reset_curr will reset the copybreak
from 3 to 0. In the second sendmsg, since we are in the stage of corking,
psock->cork will be reused in func sk_msg_alloc. msg->sg.copybreak is 0
now, the second msg will overwrite the first msg. As a result, we could
not pass the data integrity test.
The same problem happens in push and pop test. Thus, fix sk_msg_reset_curr
to restore the correct copybreak.
Fixes: bb9aefde5bba ("bpf: sockmap, updating the sg structure should also update curr")
Change-Id: Ibbac29c62d1b26f03d5f3588c14316829aac017f
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Link: https://lore.kernel.org/r/20241106222520.527076-9-zijianzhang@bytedance.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
[ Upstream commit 5d609ba262475db450ba69b8e8a557bd768ac07a ]
Several fixes to bpf_msg_pop_data,
1. In sk_msg_shift_left, we should put_page
2. if (len == 0), return early is better
3. pop the entire sk_msg (last == msg->sg.size) should be supported
4. Fix for the value of variable "a"
5. In sk_msg_shift_left, after shifting, i has already pointed to the next
element. Addtional sk_msg_iter_var_next may result in BUG.
Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages")
Change-Id: I03032e0683aec3b938bb1340b493e280c8e871b8
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20241106222520.527076-8-zijianzhang@bytedance.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
[ Upstream commit 15ab0548e3107665c34579ae523b2b6e7c22082a ]
Several fixes to bpf_msg_push_data,
1. test_sockmap has tests where bpf_msg_push_data is invoked to push some
data at the end of a message, but -EINVAL is returned. In this case, in
bpf_msg_push_data, after the first loop, i will be set to msg->sg.end, add
the logic to handle it.
2. In the code block of "if (start - offset)", it's possible that "i"
points to the last of sk_msg_elem. In this case, "sk_msg_iter_next(msg,
end)" might still be called twice, another invoking is in "if (!copy)"
code block, but actually only one is needed. Add the logic to handle it,
and reconstruct the code to make the logic more clear.
Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data")
Change-Id: Ie497b74e5b7e5ff9b8332096e8b4611109ff51be
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Link: https://lore.kernel.org/r/20241106222520.527076-7-zijianzhang@bytedance.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>