kernel_realme_sm7125

Evolution-X-Devices/kernel_realme_sm7125

Author	SHA1	Message	Date
basamaryan	8d60ed7eac	wireguard: compat: guard __kernel_timespec to <4.14 Change-Id: Iac483a5dd995b3e25e6f10c7fc1c6f3ebd6b7796	2026-01-15 15:45:46 +05:30
Arnd Bergmann	224943ee8f	UPSTREAM: libceph: use timespec64 in for keepalive2 and ticket validity ceph_con_keepalive_expired() is the last user of timespec_add() and some of the last uses of ktime_get_real_ts(). Replacing this with timespec64 based interfaces lets us remove that deprecated API. I'm introducing new ceph_encode_timespec64()/ceph_decode_timespec64() here that take timespec64 structures and convert to/from ceph_timespec, which is defined to have an unsigned 32-bit tv_sec member. This extends the range of valid times to year 2106, avoiding the year 2038 overflow. The ceph file system portion still uses the old functions for inode timestamps, this will be done separately after the VFS layer is converted. Change-Id: I638be3e081770e6b766b972343ed7652a473383d Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2026-01-15 15:45:44 +05:30
theshaenix	9fbf85f1ff	tcp: restore missing OPLUS modem TCP hook declarations Restore OPLUS modem network power hook declarations in tcp_output.c that were accidentally removed during previous cleanups. These extern declarations and headers are required when OPLUS_FEATURE_MODEM_DATA_NWPOWER is enabled to provide the corresponding modem wakeup and TCP output hook infrastructure. Fixes build failures caused by unresolved OPLUS TCP hook symbols.	2025-12-28 19:20:51 +05:30
Taehee Yoo	efcf368e9e	UPSTREAM: net: bpfilter: disallow to remove bpfilter module while being used The bpfilter.ko module can be removed while functions of the bpfilter.ko are executing. so panic can occurred. in order to protect that, locks can be used. a bpfilter_lock protects routines in the __bpfilter_process_sockopt() but it's not enough because __exit routine can be executed concurrently. Now, the bpfilter_umh can not run in parallel. So, the module do not removed while it's being used and it do not double-create UMH process. The members of the umh_info and the bpfilter_umh_ops are protected by the bpfilter_umh_ops.lock. test commands: while : do iptables -I FORWARD -m string --string ap --algo kmp & modprobe -rv bpfilter & done splat looks like: [ 298.623435] BUG: unable to handle kernel paging request at fffffbfff807440b [ 298.628512] #PF error: [normal kernel read fault] [ 298.633018] PGD 124327067 P4D 124327067 PUD 11c1a3067 PMD 119eb2067 PTE 0 [ 298.638859] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 298.638859] CPU: 0 PID: 2997 Comm: iptables Not tainted 4.20.0+ #154 [ 298.638859] RIP: 0010:__mutex_lock+0x6b9/0x16a0 [ 298.638859] Code: c0 00 00 e8 89 82 ff ff 80 bd 8f fc ff ff 00 0f 85 d9 05 00 00 48 8b 85 80 fc ff ff 48 bf 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 38 00 0f 85 1d 0e 00 00 48 8b 85 c8 fc ff ff 49 39 47 58 c6 [ 298.638859] RSP: 0018:ffff88810e7777a0 EFLAGS: 00010202 [ 298.638859] RAX: 1ffffffff807440b RBX: ffff888111bd4d80 RCX: 0000000000000000 [ 298.638859] RDX: 1ffff110235ff806 RSI: ffff888111bd5538 RDI: dffffc0000000000 [ 298.638859] RBP: ffff88810e777b30 R08: 0000000080000002 R09: 0000000000000000 [ 298.638859] R10: 0000000000000000 R11: 0000000000000000 R12: fffffbfff168a42c [ 298.638859] R13: ffff888111bd4d80 R14: ffff8881040e9a05 R15: ffffffffc03a2000 [ 298.638859] FS: 00007f39e3758700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 [ 298.638859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 298.638859] CR2: fffffbfff807440b CR3: 000000011243e000 CR4: 00000000001006f0 [ 298.638859] Call Trace: [ 298.638859] ? mutex_lock_io_nested+0x1560/0x1560 [ 298.638859] ? kasan_kmalloc+0xa0/0xd0 [ 298.638859] ? kmem_cache_alloc+0x1c2/0x260 [ 298.638859] ? __alloc_file+0x92/0x3c0 [ 298.638859] ? alloc_empty_file+0x43/0x120 [ 298.638859] ? alloc_file_pseudo+0x220/0x330 [ 298.638859] ? sock_alloc_file+0x39/0x160 [ 298.638859] ? __sys_socket+0x113/0x1d0 [ 298.638859] ? __x64_sys_socket+0x6f/0xb0 [ 298.638859] ? do_syscall_64+0x138/0x560 [ 298.638859] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 298.638859] ? __alloc_file+0x92/0x3c0 [ 298.638859] ? init_object+0x6b/0x80 [ 298.638859] ? cyc2ns_read_end+0x10/0x10 [ 298.638859] ? cyc2ns_read_end+0x10/0x10 [ 298.638859] ? hlock_class+0x140/0x140 [ 298.638859] ? sched_clock_local+0xd4/0x140 [ 298.638859] ? sched_clock_local+0xd4/0x140 [ 298.638859] ? check_flags.part.37+0x440/0x440 [ 298.638859] ? __lock_acquire+0x4f90/0x4f90 [ 298.638859] ? set_rq_offline.part.89+0x140/0x140 [ ... ] Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Change-Id: I3186f2ec6cbf8263aed16d562021f741ab25bf86 Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:12 +05:30
Taehee Yoo	fd18f996d9	UPSTREAM: net: bpfilter: restart bpfilter_umh when error occurred The bpfilter_umh will be stopped via __stop_umh() when the bpfilter error occurred. The bpfilter_umh() couldn't start again because there is no restart routine. The section of the bpfilter_umh_{start/end} is no longer .init.rodata because these area should be reused in the restart routine. hence the section name is changed to .bpfilter_umh. The bpfilter_ops->start() is restart callback. it will be called when bpfilter_umh is stopped. The stop bit means bpfilter_umh is stopped. this bit is set by both start and stop routine. Before this patch, Test commands: $ iptables -vnL $ kill -9 <pid of bpfilter_umh> $ iptables -vnL [ 480.045136] bpfilter: write fail -32 $ iptables -vnL All iptables commands will fail. After this patch, Test commands: $ iptables -vnL $ kill -9 <pid of bpfilter_umh> $ iptables -vnL $ iptables -vnL Now, all iptables commands will work. Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Change-Id: I4c1fa7fd3af172967e450f3db8fab77434b341b6 Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:12 +05:30
Taehee Yoo	cccaf12d45	UPSTREAM: net: bpfilter: use cleanup callback to release umh_info Now, UMH process is killed, do_exit() calls the umh_info->cleanup callback to release members of the umh_info. This patch makes bpfilter_umh's cleanup routine to use the umh_info->cleanup callback. Change-Id: I865aa89eb58d9bba883ed59e4b921fe401413853 Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:11 +05:30
Eric W. Biederman	274c4e577c	UPSTREAM: signal/bpfilter: Fix bpfilter_kernl to use send_sig not force_sig [ Upstream commit 1dfd1711de2952fd1bfeea7152bd1687a4eea771 ] The locking in force_sig_info is not prepared to deal with a task that exits or execs (as sighand may change). As force_sig is only built to handle synchronous exceptions. Further the function force_sig_info changes the signal state if the signal is ignored, or blocked or if SIGNAL_UNKILLABLE will prevent the delivery of the signal. The signal SIGKILL can not be ignored and can not be blocked and SIGNAL_UNKILLABLE won't prevent it from being delivered. So using force_sig rather than send_sig for SIGKILL is pointless. Because it won't impact the sending of the signal and and because using force_sig is wrong, replace force_sig with send_sig. Cc: Alexei Starovoitov <ast@kernel.org> Cc: David S. Miller <davem@davemloft.net> Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Change-Id: Id62ece4eade762ecc94dd3e4a940907e25c48fab Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:42:11 +05:30
Taehee Yoo	24bcc1cd34	UPSTREAM: net: bpfilter: use get_pid_task instead of pid_task pid_task() dereferences rcu protected tasks array. But there is no rcu_read_lock() in shutdown_umh() routine so that rcu_read_lock() is needed. get_pid_task() is wrapper function of pid_task. it holds rcu_read_lock() then calls pid_task(). if task isn't NULL, it increases reference count of task. test commands: %modprobe bpfilter %modprobe -rv bpfilter splat looks like: [15102.030932] ============================= [15102.030957] WARNING: suspicious RCU usage [15102.030985] 4.19.0-rc7+ #21 Not tainted [15102.031010] ----------------------------- [15102.031038] kernel/pid.c:330 suspicious rcu_dereference_check() usage! [15102.031063] other info that might help us debug this: [15102.031332] rcu_scheduler_active = 2, debug_locks = 1 [15102.031363] 1 lock held by modprobe/1570: [15102.031389] #0: 00000000580ef2b0 (bpfilter_lock){+.+.}, at: stop_umh+0x13/0x52 [bpfilter] [15102.031552] stack backtrace: [15102.031583] CPU: 1 PID: 1570 Comm: modprobe Not tainted 4.19.0-rc7+ #21 [15102.031607] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015 [15102.031628] Call Trace: [15102.031676] dump_stack+0xc9/0x16b [15102.031723] ? show_regs_print_info+0x5/0x5 [15102.031801] ? lockdep_rcu_suspicious+0x117/0x160 [15102.031855] pid_task+0x134/0x160 [15102.031900] ? find_vpid+0xf0/0xf0 [15102.032017] shutdown_umh.constprop.1+0x1e/0x53 [bpfilter] [15102.032055] stop_umh+0x46/0x52 [bpfilter] [15102.032092] __x64_sys_delete_module+0x47e/0x570 [ ... ] Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Change-Id: Ideb9fdae191efb594fa2a258d5e559fb0bba39ce Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:10 +05:30
Shanthosh RK	927a815854	UPSTREAM: net: bpfilter: Fix type cast and pointer warnings Fixes the following Sparse warnings: net/bpfilter/bpfilter_kern.c:62:21: warning: cast removes address space of expression net/bpfilter/bpfilter_kern.c:101:49: warning: Using plain integer as NULL pointer Change-Id: I18ad265abd911bab943a3c84a3dac540e28a1da6 Signed-off-by: Shanthosh RK <shanthosh.rk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:10 +05:30
Masahiro Yamada	635b27ce03	BACKPORT: bpfilter: include bpfilter_umh in assembly instead of using objcopy What we want here is to embed a user-space program into the kernel. Instead of the complex ELF magic, let's simply wrap it in the assembly with the '.incbin' directive. Change-Id: I8a532c9f9c5d2bc853821becb40832d56cdbb72c Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:10 +05:30
Alexei Starovoitov	0e165cc9cc	UPSTREAM: bpfilter: fix race in pipe access syzbot reported the following crash [ 338.293946] bpfilter: read fail -512 [ 338.304515] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 338.311863] general protection fault: 0000 [#1] SMP KASAN [ 338.344360] RIP: 0010:__vfs_write+0x4a6/0x960 [ 338.426363] Call Trace: [ 338.456967] __kernel_write+0x10c/0x380 [ 338.460928] __bpfilter_process_sockopt+0x1d8/0x35b [ 338.487103] bpfilter_mbox_request+0x4d/0xb0 [ 338.491492] bpfilter_ip_get_sockopt+0x6b/0x90 This can happen when multiple cpus trying to talk to user mode process via bpfilter_mbox_request(). One cpu grabs the mutex while another goes to sleep on the same mutex. Then former cpu sees that umh pipe is down and shuts down the pipes. Later cpu finally acquires the mutex and crashes on freed pipe. Fix the race by using info.pid as an indicator that umh and pipes are healthy and check it after acquiring the mutex. Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Reported-by: syzbot+7ade6c94abb2774c0fee@syzkaller.appspotmail.com Change-Id: Ic3f5aa80143601b532f6ca2feb4ef97b249805f0 Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:09 +05:30
Arnd Bergmann	ef17e27535	UPSTREAM: bpfilter: fix building without CONFIG_INET bpfilter_process_sockopt is a callback that gets called from ip_setsockopt() and ip_getsockopt(). However, when CONFIG_INET is disabled, it never gets called at all, and assigning a function to the callback pointer results in a link failure: net/bpfilter/bpfilter_kern.o: In function `__stop_umh': bpfilter_kern.c:(.text.unlikely+0x3): undefined reference to `bpfilter_process_sockopt' net/bpfilter/bpfilter_kern.o: In function `load_umh': bpfilter_kern.c:(.init.text+0x73): undefined reference to `bpfilter_process_sockopt' Since there is no caller in this configuration, I assume we can simply make the assignment conditional. Change-Id: Ia233a7b3be4b060f653aad9d4a3f045ba3b8a07e Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:09 +05:30
Stanislav Fomichev	d693449e3e	BACKPORT: bpf/flow_dissector: support ipv6 flow_label and BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL Add support for exporting ipv6 flow label via bpf_flow_keys. Export flow label from bpf_flow.c and also return early when BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL is passed. Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Petar Penkov <ppenkov@google.com> Change-Id: I6b4c3771022f19c184867fb6045351d59cfde68b Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 18:42:07 +05:30
Stanislav Fomichev	9287f9616f	UPSTREAM: bpf: Move skb->len == 0 checks into __bpf_redirect [ Upstream commit 114039b342014680911c35bd6b72624180fd669a ] To avoid potentially breaking existing users. Both mac/no-mac cases have to be amended; mac_header >= network_header is not enough (verified with a new test, see next patch). Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len") Change-Id: If63ce98321997875863f3fce255061e8a64bba5e Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221121180340.1983627-1-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:42:06 +05:30
Zhengchao Shao	457d766adf	UPSTREAM: bpf: Don't redirect packets with invalid pkt_len commit fd1894224407c484f652ad456e1ce423e89bb3eb upstream. Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any skbs, that is, the flow->head is null. The root cause, as the [2] says, is because that bpf_prog_test_run_skb() run a bpf prog which redirects empty skbs. So we should determine whether the length of the packet modified by bpf prog or others like bpf_prog_test is valid before forwarding it directly. LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5 LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com Change-Id: Ieba6ad52b2eb657fb02ef7d3e414932d7743f073 Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-12-28 18:42:06 +05:30
Stanislav Fomichev	0e87beec78	UPSTREAM: bpf/flow_dissector: support flags in BPF_PROG_TEST_RUN This will allow us to write tests for those flags. v2: * Swap kfree(data) and kfree(user_ctx) (Song Liu) Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Petar Penkov <ppenkov@google.com> Change-Id: Ic1b6920aa27ee21945df37585af2b0dfe37dca6e Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 18:42:05 +05:30
Stanislav Fomichev	4815ef912a	UPSTREAM: bpf/flow_dissector: pass input flags to BPF flow dissector program C flow dissector supports input flags that tell it to customize parsing by either stopping early or trying to parse as deep as possible. Pass those flags to the BPF flow dissector so it can make the same decisions. In the next commits I'll add support for those flags to our reference bpf_flow.c v3: * Export copy of flow dissector flags instead of moving (Alexei Starovoitov) Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Song Liu <songliubraving@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Petar Penkov <ppenkov@google.com> Change-Id: I46a68f8b2249915fff5d97a1394ea662d9a0ac46 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 18:42:05 +05:30
Stanislav Fomichev	3807c70734	UPSTREAM: flow_dissector: remove unused FLOW_DISSECTOR_F_STOP_AT_L3 flag This flag is not used by any caller, remove it. Change-Id: I280df0c749d35d6259fdcf3c7f9d3d142e6289a4 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:04 +05:30
Stanislav Fomichev	bca0f440f0	UPSTREAM: flow_dissector: handle no-skb use case When called without skb, gather all required data from the __skb_flow_dissect's arguments and use recently introduces no-skb mode of bpf flow dissector. Note: WARN_ON_ONCE(!net) will now trigger for eth_get_headlen users. Change-Id: Ice7d27b1adaeb9adb02ea52fda999450a6d765af Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 18:42:04 +05:30
Stanislav Fomichev	13988727f4	UPSTREAM: net: plumb network namespace into __skb_flow_dissect This new argument will be used in the next patches for the eth_get_headlen use case. eth_get_headlen calls flow dissector with only data (without skb) so there is currently no way to pull attached BPF flow dissector program. With this new argument, we can amend the callers to explicitly pass network namespace so we can use attached BPF program. Change-Id: I07dbe25c77a13f8cc33b037e63c532979d290ad0 Signed-off-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 18:42:04 +05:30
Paolo Abeni	b0f67d760b	BACKPORT: flow_dissector: do not rely on implicit casts This change fixes a couple of type mismatch reported by the sparse tool, explicitly using the requested type for the offending arguments. Change-Id: I73920d0a210f1ef2229a406c49844970208d6ada Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:03 +05:30
Paolo Abeni	036ccf95fb	BACKPORT: net: core: rework basic flow dissection helper When the core networking needs to detect the transport offset in a given packet and parse it explicitly, a full-blown flow_keys struct is used for storage. This patch introduces a smaller keys store, rework the basic flow dissect helper to use it, and apply this new helper where possible - namely in skb_probe_transport_header(). The used flow dissector data structures are renamed to match more closely the new role. The above gives ~50% performance improvement in micro benchmarking around skb_probe_transport_header() and ~30% around eth_get_headlen(), mostly due to the smaller memset. Small, but measurable improvement is measured also in macro benchmarking. v1 -> v2: use the new helper in eth_get_headlen() and skb_get_poff(), as per DaveM suggestion [Linux4: Apply to include/linux/virtio_net.h] Suggested-by: David Miller <davem@davemloft.net> Change-Id: Iebe280d3793028c9dd640b5dc59d046601f02cca Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:02 +05:30
Eric Dumazet	80f5987ed0	UPSTREAM: flow_dissector: disable preemption around BPF calls Various things in eBPF really require us to disable preemption before running an eBPF program. syzbot reported : BUG: assuming atomic context at net/core/flow_dissector.c:737 in_atomic(): 0, irqs_disabled(): 0, pid: 24710, name: syz-executor.3 2 locks held by syz-executor.3/24710: #0: 00000000e81a4bf1 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x168e/0x3ff0 drivers/net/tun.c:1850 #1: 00000000254afebd (rcu_read_lock){....}, at: __skb_flow_dissect+0x1e1/0x4bb0 net/core/flow_dissector.c:822 CPU: 1 PID: 24710 Comm: syz-executor.3 Not tainted 5.1.0+ #6 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 __cant_sleep kernel/sched/core.c:6165 [inline] __cant_sleep.cold+0xa3/0xbb kernel/sched/core.c:6142 bpf_flow_dissect+0xfe/0x390 net/core/flow_dissector.c:737 __skb_flow_dissect+0x362/0x4bb0 net/core/flow_dissector.c:853 skb_flow_dissect_flow_keys_basic include/linux/skbuff.h:1322 [inline] skb_probe_transport_header include/linux/skbuff.h:2500 [inline] skb_probe_transport_header include/linux/skbuff.h:2493 [inline] tun_get_user+0x2cfe/0x3ff0 drivers/net/tun.c:1940 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037 call_write_iter include/linux/fs.h:1872 [inline] do_iter_readv_writev+0x5fd/0x900 fs/read_write.c:693 do_iter_write fs/read_write.c:970 [inline] do_iter_write+0x184/0x610 fs/read_write.c:951 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015 do_writev+0x15b/0x330 fs/read_write.c:1058 __do_sys_writev fs/read_write.c:1131 [inline] __se_sys_writev fs/read_write.c:1128 [inline] __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128 do_syscall_64+0x103/0x670 arch/x86/entry/common.c:298 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook") Change-Id: I326ce1e572ae9c2dc060c172dbd0227bb8ee9eef Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Petar Penkov <ppenkov@google.com> Cc: Stanislav Fomichev <sdf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 18:42:01 +05:30
Matt Mullins	632a7711e4	UPSTREAM: selftests: bpf: test writable buffers in raw tps This tests that: * a BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE cannot be attached if it uses either: * a variable offset to the tracepoint buffer, or * an offset beyond the size of the tracepoint buffer * a tracer can modify the buffer provided when attached to a writable tracepoint in bpf_prog_test_run Change-Id: I3fdf650c1d9a68d64d08b170b14e46a35d180dd7 Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 18:42:01 +05:30
Stanislav Fomichev	540ab7d6c0	UPSTREAM: bpf/flow_dissector: don't adjust nhoff by ETH_HLEN in BPF_PROG_TEST_RUN Now that we use skb-less flow dissector let's return true nhoff and thoff. We used to adjust them by ETH_HLEN because that's how it was done in the skb case. For VLAN tests that looks confusing: nhoff is pointing to vlan parts :-\ Warning, this is an API change for BPF_PROG_TEST_RUN! Feel free to drop if you think that it's too late at this point to fix it. Change-Id: Icae82218f62c91b7860981bc62269eba8e58c25c Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 18:42:00 +05:30
Stanislav Fomichev	e21018e474	UPSTREAM: bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode Now that we have bpf_flow_dissect which can work on raw data, use it when doing BPF_PROG_TEST_RUN for flow dissector. Simplifies bpf_prog_test_run_flow_dissector and allows us to test no-skb mode. Note, that previously, with bpf_flow_dissect_skb we used to call eth_type_trans which pulled L2 (ETH_HLEN) header and we explicitly called skb_reset_network_header. That means flow_keys->nhoff would be initialized to 0 (skb_network_offset) in init_flow_keys. Now we call bpf_flow_dissect with nhoff set to ETH_HLEN and need to undo it once the dissection is done to preserve the existing behavior. Change-Id: Icaf9c2a76c6176c647081535f7d48d94fff2e126 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 18:42:00 +05:30
Stanislav Fomichev	fc4ff96a42	UPSTREAM: bpf: explicitly prohibit ctx_{in, out} in non-skb BPF_PROG_TEST_RUN This should allow us later to extend BPF_PROG_TEST_RUN for non-skb case and be sure that nobody is erroneously setting ctx_{in,out}. Fixes: b0b9395d865e ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Change-Id: Ibe85a9217cd265966ef57e1298b37a6b9baa50c4 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 18:42:00 +05:30
Bo YU	c556d97c4d	UPSTREAM: bpf: fix warning about using plain integer as NULL Sparse warning below: sudo make C=2 CF=-D__CHECK_ENDIAN__ M=net/bpf/ CHECK net/bpf//test_run.c net/bpf//test_run.c:19:77: warning: Using plain integer as NULL pointer ./include/linux/bpf-cgroup.h:295:77: warning: Using plain integer as NULL pointer Fixes: 8bad74f9840f ("bpf: extend cgroup bpf core to allow multiple cgroup storage types") Acked-by: Yonghong Song <yhs@fb.com> Change-Id: I1dba06e69eb1e3b4a3163dc32b48c4fc6bdc755d Signed-off-by: Bo YU <tsu.yubo@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 18:41:59 +05:30
Stanislav Fomichev	027a301e01	UPSTREAM: bpf/test_run: fix unkillable BPF_PROG_TEST_RUN for flow dissector Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff makes process unkillable. The problem is that when CONFIG_PREEMPT is enabled, we never see need_resched() return true. This is due to the fact that preempt_enable() (which we do in bpf_test_run_one on each iteration) now handles resched if it's needed. Let's disable preemption for the whole run, not per test. In this case we can properly see whether resched is needed. Let's also properly return -EINTR to the userspace in case of a signal interrupt. This is a follow up for a recently fixed issue in bpf_test_run, see commit df1a2cb7c74b ("bpf/test_run: fix unkillable BPF_PROG_TEST_RUN"). Reported-by: syzbot <syzkaller@googlegroups.com> Change-Id: I38d1d9553d1ee281666de7000d06684962658397 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 18:41:59 +05:30
Stanislav Fomichev	8bb7a7489c	UPSTREAM: bpf/test_run: fix unkillable BPF_PROG_TEST_RUN Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff makes process unkillable. The problem is that when CONFIG_PREEMPT is enabled, we never see need_resched() return true. This is due to the fact that preempt_enable() (which we do in bpf_test_run_one on each iteration) now handles resched if it's needed. Let's disable preemption for the whole run, not per test. In this case we can properly see whether resched is needed. Let's also properly return -EINTR to the userspace in case of a signal interrupt. See recent discussion: http://lore.kernel.org/netdev/CAH3MdRWHr4N8jei8jxDppXjmw-Nw=puNDLbu1dQOFQHxfU2onA@mail.gmail.com I'll follow up with the same fix bpf_prog_test_run_flow_dissector in bpf-next. Reported-by: syzbot <syzkaller@googlegroups.com> Change-Id: I6744e83dc918d20f44cdf9f37cbb0f223a970aa4 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 18:41:58 +05:30
Lorenz Bauer	85c8c0d034	UPSTREAM: bpf: respect size hint to BPF_PROG_TEST_RUN if present Use data_size_out as a size hint when copying test output to user space. ENOSPC is returned if the output buffer is too small. Callers which so far did not set data_size_out are not affected. Change-Id: Ic1a42d1903e96a26a27a56489b75be05c58996ff Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 18:41:58 +05:30
Stanislav Fomichev	0d382b1663	BACKPORT: bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT Performance impact should be minimal because it's under a new BPF_SOCK_OPS_RTT_CB_FLAG flag that has to be explicitly enabled. Suggested-by: Eric Dumazet <edumazet@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Priyaranjan Jha <priyarjha@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Change-Id: I19814edf78101f87e6dc364343f8173d9e230850 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 18:41:57 +05:30
Stanislav Fomichev	d9d8a2e741	UPSTREAM: bpf: support cloning sk storage on accept() Add new helper bpf_sk_storage_clone which optionally clones sk storage and call it from sk_clone_lock. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Change-Id: Iee30d2442b76f6fd7904829314e63f3391e7d811 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 18:41:56 +05:30
Stanislav Fomichev	510d66ef1a	UPSTREAM: bpf: fix missing bpf_check_uarg_tail_zero in BPF_PROG_TEST_RUN Commit b0b9395d865e ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN") started using bpf_check_uarg_tail_zero in BPF_PROG_TEST_RUN. However, bpf_check_uarg_tail_zero is not defined for !CONFIG_BPF_SYSCALL: net/bpf/test_run.c: In function ‘bpf_ctx_init’: net/bpf/test_run.c:142:9: error: implicit declaration of function ‘bpf_check_uarg_tail_zero’ [-Werror=implicit-function-declaration] err = bpf_check_uarg_tail_zero(data_in, max_size, size); ^~~~~~~~~~~~~~~~~~~~~~~~ Let's not build net/bpf/test_run.c when CONFIG_BPF_SYSCALL is not set. Reported-by: kbuild test robot <lkp@intel.com> Fixes: b0b9395d865e ("bpf: support input __sk_buff context in BPF_PROG_TEST_RUN") Change-Id: Iddc8eb00de3fbde87f765f533b91d2fbd3b41603 Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 18:41:55 +05:30
Willem de Bruijn	a34192bb10	UPSTREAM: bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags [ Upstream commit d4bac0288a2b444e468e6df9cb4ed69479ddf14a ] Classic BPF socket filters with SKB_NET_OFF and SKB_LL_OFF fail to read when these offsets extend into frags. This has been observed with iwlwifi and reproduced with tun with IFF_NAPI_FRAGS. The below straightforward socket filter on UDP port, applied to a RAW socket, will silently miss matching packets. const int offset_proto = offsetof(struct ip6_hdr, ip6_nxt); const int offset_dport = sizeof(struct ip6_hdr) + offsetof(struct udphdr, dest); struct sock_filter filter_code[] = { BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_AD_OFF + SKF_AD_PKTTYPE), BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, PACKET_HOST, 0, 4), BPF_STMT(BPF_LD + BPF_B + BPF_ABS, SKF_NET_OFF + offset_proto), BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 2), BPF_STMT(BPF_LD + BPF_H + BPF_ABS, SKF_NET_OFF + offset_dport), This is unexpected behavior. Socket filter programs should be consistent regardless of environment. Silent misses are particularly concerning as hard to detect. Use skb_copy_bits for offsets outside linear, same as done for non-SKF_(LL\|NET) offsets. Offset is always positive after subtracting the reference threshold SKB_(LL\|NET)_OFF, so is always >= skb_(mac\|network)_offset. The sum of the two is an offset against skb->data, and may be negative, but it cannot point before skb->head, as skb_(mac\|network)_offset would too. This appears to go back to when frag support was introduced to sk_run_filter in linux-2.4.4, before the introduction of git. The amount of code change and 8/16/32 bit duplication are unfortunate. But any attempt I made to be smarter saved very few LoC while complicating the code. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/netdev/20250122200402.3461154-1-maze@google.com/ Link: https://elixir.bootlin.com/linux/2.4.4/source/net/core/filter.c#L244 Reported-by: Matt Moeller <moeller.matt@gmail.com> Co-developed-by: Maciej Żenczykowski <maze@google.com> Change-Id: Ia1ff7be23c39d872596848731818740696c86c29 Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://lore.kernel.org/r/20250408132833.195491-2-willemdebruijn.kernel@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:55 +05:30
Ben Dooks	a576634920	UPSTREAM: bpf: Add endian modifiers to fix endian warnings [ Upstream commit 96a233e600df351bcb06e3c20efe048855552926 ] A couple of the syscalls which load values (bpf_skb_load_helper_16() and bpf_skb_load_helper_32()) are using u16/u32 types which are triggering warnings as they are then converted from big-endian to CPU-endian. Fix these by making the types __be instead. Fixes the following sparse warnings: net/core/filter.c:246:32: warning: cast to restricted __be16 net/core/filter.c:246:32: warning: cast to restricted __be16 net/core/filter.c:246:32: warning: cast to restricted __be16 net/core/filter.c:246:32: warning: cast to restricted __be16 net/core/filter.c:273:32: warning: cast to restricted __be32 net/core/filter.c:273:32: warning: cast to restricted __be32 net/core/filter.c:273:32: warning: cast to restricted __be32 net/core/filter.c:273:32: warning: cast to restricted __be32 net/core/filter.c:273:32: warning: cast to restricted __be32 net/core/filter.c:273:32: warning: cast to restricted __be32 Change-Id: I6486ac70a70ffae777a6cd01f4f4b6926f44797f Signed-off-by: Ben Dooks <ben.dooks@sifive.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220714105101.297304-1-ben.dooks@sifive.com Stable-dep-of: d4bac0288a2b ("bpf: support SKF_NET_OFF and SKF_LL_OFF on skb frags") Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:54 +05:30
Cong Wang	ca4bde13e5	UPSTREAM: bpf: Check negative offsets in __bpf_skb_min_len() [ Upstream commit 9ecc4d858b92c1bb0673ad9c327298e600c55659 ] skb_network_offset() and skb_transport_offset() can be negative when they are called after we pull the transport header, for example, when we use eBPF sockmap at the point of ->sk_data_ready(). __bpf_skb_min_len() uses an unsigned int to get these offsets, this leads to a very large number which then causes bpf_skb_change_tail() failed unexpectedly. Fix this by using a signed int to get these offsets and ensure the minimum is at least zero. Fixes: `5293efe62d` ("bpf: add bpf_skb_change_tail helper") Change-Id: I6267f793ec92ff120f2a5b689976ba19443dae9a Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241213034057.246437-2-xiyou.wangcong@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:54 +05:30
Zijian Zhang	3051bf87f5	UPSTREAM: bpf, sockmap: Fix sk_msg_reset_curr [ Upstream commit 955afd57dc4bf7e8c620a0a9e3af3c881c2c6dff ] Found in the test_txmsg_pull in test_sockmap, ``` txmsg_cork = 512; // corking is importrant here opt->iov_length = 3; opt->iov_count = 1; opt->rate = 512; // sendmsg will be invoked 512 times ``` The first sendmsg will send an sk_msg with size 3, and bpf_msg_pull_data will be invoked the first time. sk_msg_reset_curr will reset the copybreak from 3 to 0. In the second sendmsg, since we are in the stage of corking, psock->cork will be reused in func sk_msg_alloc. msg->sg.copybreak is 0 now, the second msg will overwrite the first msg. As a result, we could not pass the data integrity test. The same problem happens in push and pop test. Thus, fix sk_msg_reset_curr to restore the correct copybreak. Fixes: bb9aefde5bba ("bpf: sockmap, updating the sg structure should also update curr") Change-Id: Ibbac29c62d1b26f03d5f3588c14316829aac017f Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Link: https://lore.kernel.org/r/20241106222520.527076-9-zijianzhang@bytedance.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:53 +05:30
Zijian Zhang	1ebe3da854	UPSTREAM: bpf, sockmap: Several fixes to bpf_msg_pop_data [ Upstream commit 5d609ba262475db450ba69b8e8a557bd768ac07a ] Several fixes to bpf_msg_pop_data, 1. In sk_msg_shift_left, we should put_page 2. if (len == 0), return early is better 3. pop the entire sk_msg (last == msg->sg.size) should be supported 4. Fix for the value of variable "a" 5. In sk_msg_shift_left, after shifting, i has already pointed to the next element. Addtional sk_msg_iter_var_next may result in BUG. Fixes: 7246d8ed4dcc ("bpf: helper to pop data from messages") Change-Id: I03032e0683aec3b938bb1340b493e280c8e871b8 Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20241106222520.527076-8-zijianzhang@bytedance.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:53 +05:30
Zijian Zhang	d1493947c0	UPSTREAM: bpf, sockmap: Several fixes to bpf_msg_push_data [ Upstream commit 15ab0548e3107665c34579ae523b2b6e7c22082a ] Several fixes to bpf_msg_push_data, 1. test_sockmap has tests where bpf_msg_push_data is invoked to push some data at the end of a message, but -EINVAL is returned. In this case, in bpf_msg_push_data, after the first loop, i will be set to msg->sg.end, add the logic to handle it. 2. In the code block of "if (start - offset)", it's possible that "i" points to the last of sk_msg_elem. In this case, "sk_msg_iter_next(msg, end)" might still be called twice, another invoking is in "if (!copy)" code block, but actually only one is needed. Add the logic to handle it, and reconstruct the code to make the logic more clear. Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data") Change-Id: Ie497b74e5b7e5ff9b8332096e8b4611109ff51be Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Link: https://lore.kernel.org/r/20241106222520.527076-7-zijianzhang@bytedance.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:52 +05:30
Fred Li	f3f691f1ba	UPSTREAM: bpf: Fix a segment issue when downgrading gso_size [ Upstream commit fa5ef655615a01533035c6139248c5b33aa27028 ] Linearize the skb when downgrading gso_size because it may trigger a BUG_ON() later when the skb is segmented as described in [1,2]. Fixes: `2be7e212d5` ("bpf: add bpf_skb_adjust_room helper") Change-Id: I130565140af5a4ca59c1742d7cff5fa2dacb7817 Signed-off-by: Fred Li <dracodingfly@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/all/20240626065555.35460-2-dracodingfly@gmail.com [1] Link: https://lore.kernel.org/all/668d5cf1ec330_1c18c32947@willemb.c.googlers.com.notmuch [2] Link: https://lore.kernel.org/bpf/20240719024653.77006-1-dracodingfly@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:52 +05:30
John Fastabend	401e6f7e7f	UPSTREAM: bpf: sockmap, updating the sg structure should also update curr [ Upstream commit bb9aefde5bbaf6c168c77ba635c155b4980c2287 ] Curr pointer should be updated when the sg structure is shifted. Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages") Change-Id: I3c5d1dd94c2b874b6f0c34010f8e2a766f6081c1 Signed-off-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20231206232706.374377-3-john.fastabend@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:51 +05:30
Martin KaFai Lau	e04cccd4fc	BACKPORT: bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state commit 1fe4850b34ab512ff911e2c035c75fb6438f7307 upstream. The bpf_fib_lookup() helper does not only look up the fib (ie. route) but it also looks up the neigh. Before returning the neigh, the helper does not check for NUD_VALID. When a neigh state (neigh->nud_state) is in NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper still returns SUCCESS instead of NO_NEIGH in this case. Because of the SUCCESS return value, the bpf prog directly uses the returned dmac and ends up filling all zero in the eth header. This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is not valid. Change-Id: I1d81fbe4e026d2572099c684582a30a0e62e5f03 Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230217004150.2980689-3-martin.lau@linux.dev Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-12-28 18:41:51 +05:30
Stanislav Fomichev	c534323a39	UPSTREAM: bpf: Move skb->len == 0 checks into __bpf_redirect [ Upstream commit 114039b342014680911c35bd6b72624180fd669a ] To avoid potentially breaking existing users. Both mac/no-mac cases have to be amended; mac_header >= network_header is not enough (verified with a new test, see next patch). Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len") Change-Id: I7f16f664a202d1f962bd9e2d8d8e60064f62ef4f Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221121180340.1983627-1-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:50 +05:30
Jon Maxwell	afda5c7962	UPSTREAM: bpf: Fix request_sock leak in sk lookup helpers [ Upstream commit 3046a827316c0e55fc563b4fb78c93b9ca5c7c37 ] A customer reported a request_socket leak in a Calico cloud environment. We found that a BPF program was doing a socket lookup with takes a refcnt on the socket and that it was finding the request_socket but returning the parent LISTEN socket via sk_to_full_sk() without decrementing the child request socket 1st, resulting in request_sock slab object leak. This patch retains the existing behaviour of returning full socks to the caller but it also decrements the child request_socket if one is present before doing so to prevent the leak. Thanks to Curtis Taylor for all the help in diagnosing and testing this. And thanks to Antoine Tenart for the reproducer and patch input. v2 of this patch contains, refactor as per Daniel Borkmann's suggestions to validate RCU flags on the listen socket so that it balances with bpf_sk_release() and update comments as per Martin KaFai Lau's suggestion. One small change to Daniels suggestion, put "sk = sk2" under "if (sk2 != sk)" to avoid an extra instruction. Fixes: f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()") Fixes: edbf8c01de5a ("bpf: add skc_lookup_tcp helper") Co-developed-by: Antoine Tenart <atenart@kernel.org> Change-Id: I4867220345184d2c036b76988b3267f5bc6aeb63 Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Curtis Taylor <cutaylor-pub@yahoo.com> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/56d6f898-bde0-bb25-3427-12a330b29fb8@iogearbox.net Link: https://lore.kernel.org/bpf/20220615011540.813025-1-jmaxwell37@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:49 +05:30
Maxim Mikityanskiy	6f28e767ab	UPSTREAM: bpf: Support dual-stack sockets in bpf_tcp_check_syncookie [ Upstream commit 2e8702cc0cfa1080f29fd64003c00a3e24ac38de ] bpf_tcp_gen_syncookie looks at the IP version in the IP header and validates the address family of the socket. It supports IPv4 packets in AF_INET6 dual-stack sockets. On the other hand, bpf_tcp_check_syncookie looks only at the address family of the socket, ignoring the real IP version in headers, and validates only the packet size. This implementation has some drawbacks: 1. Packets are not validated properly, allowing a BPF program to trick bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4 socket. 2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end up receiving a SYNACK with the cookie, but the following ACK gets dropped. This patch fixes these issues by changing the checks in bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP version from the header is taken into account, and it is validated properly with address family. Fixes: 399040847084 ("bpf: add helper to check for a valid SYN cookie") Change-Id: Id130e436818a58b9775e61f9d11a029561b1e7a4 Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Arthur Fabre <afabre@cloudflare.com> Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:49 +05:30
Jakub Sitnicki	1afd149e04	UPSTREAM: bpf: Make dst_port field in struct bpf_sock 16-bit wide [ Upstream commit 4421a582718ab81608d8486734c18083b822390d ] Menglong Dong reports that the documentation for the dst_port field in struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the field is a zero-padded 16-bit integer in network byte order. The value appears to the BPF user as if laid out in memory as so: offsetof(struct bpf_sock, dst_port) + 0 <port MSB> + 8 <port LSB> +16 0x00 +24 0x00 32-, 16-, and 8-bit wide loads from the field are all allowed, but only if the offset into the field is 0. 32-bit wide loads from dst_port are especially confusing. The loaded value, after converting to host byte order with bpf_ntohl(dst_port), contains the port number in the upper 16-bits. Remove the confusion by splitting the field into two 16-bit fields. For backward compatibility, allow 32-bit wide loads from offsetof(struct bpf_sock, dst_port). While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port. Reported-by: Menglong Dong <imagedong@tencent.com> Change-Id: Id86817d538b4f552ca112639c0a40fb2d8bd9eb9 Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20220130115518.213259-2-jakub@cloudflare.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:48 +05:30
Felix Maurer	171f00c51c	UPSTREAM: bpf: Do not try bpf_msg_push_data with len 0 commit 4a11678f683814df82fca9018d964771e02d7e6d upstream. If bpf_msg_push_data() is called with len 0 (as it happens during selftests/bpf/test_sockmap), we do not need to do anything and can return early. Calling bpf_msg_push_data() with len 0 previously lead to a wrong ENOMEM error: we later called get_order(copy + len); if len was 0, copy + len was also often 0 and get_order() returned some undefined value (at the moment 52). alloc_pages() caught that and failed, but then bpf_msg_push_data() returned ENOMEM. This was wrong because we are most probably not out of memory and actually do not need any additional memory. Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data") Change-Id: I7d37f4ebfd344bb8c372a3376f33e02afef024e9 Signed-off-by: Felix Maurer <fmaurer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/df69012695c7094ccb1943ca02b4920db3537466.1644421921.git.fmaurer@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-12-28 18:41:48 +05:30
Kuniyuki Iwashima	3c7b131958	UPSTREAM: bpf: Fix SO_RCVBUF/SO_SNDBUF handling in _bpf_setsockopt(). [ Upstream commit 04c350b1ae6bdb12b84009a4d0bf5ab4e621c47b ] The commit 4057765f2dee ("sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values") added a change to prevent underflow in setsockopt() around SO_SNDBUF/SO_RCVBUF. This patch adds the same change to _bpf_setsockopt(). Fixes: 4057765f2dee ("sock: consistent handling of extreme SO_SNDBUF/SO_RCVBUF values") Change-Id: I797792499bbaee6cc5058519bfb6e33633d76af5 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220104013153.97906-2-kuniyu@amazon.co.jp Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:47 +05:30
John Fastabend	910788a608	UPSTREAM: bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding [ Upstream commit e0dc3b93bd7bcff8c3813d1df43e0908499c7cf0 ] Strparser is reusing the qdisc_skb_cb struct to stash the skb message handling progress, e.g. offset and length of the skb. First this is poorly named and inherits a struct from qdisc that doesn't reflect the actual usage of cb[] at this layer. But, more importantly strparser is using the following to access its metadata. (struct _strp_msg )((void )skb->cb + offsetof(struct qdisc_skb_cb, data)) Where _strp_msg is defined as: struct _strp_msg { struct strp_msg strp; /* 0 8 / int accum_len; / 8 4 / / size: 12, cachelines: 1, members: 2 / / last cacheline: 12 bytes */ }; So we use 12 bytes of ->data[] in struct. However in BPF code running parser and verdict the user has read capabilities into the data[] array as well. Its not too problematic, but we should not be exposing internal state to BPF program. If its really needed then we can use the probe_read() APIs which allow reading kernel memory. And I don't believe cb[] layer poses any API breakage by moving this around because programs can't depend on cb[] across layers. In order to fix another issue with a ctx rewrite we need to stash a temp variable somewhere. To make this work cleanly this patch builds a cb struct for sk_skb types called sk_skb_cb struct. Then we can use this consistently in the strparser, sockmap space. Additionally we can start allowing ->cb[] write access after this. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Change-Id: I2512e377df9d6f73136b43e59c364c6c87be5d1d Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jussi Maki <joamaki@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20211103204736.248403-5-john.fastabend@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 18:41:46 +05:30

1 2 3 4 5 ...

53396 Commits