kernel_realme_sm7125

Evolution-X-Devices/kernel_realme_sm7125

Author	SHA1	Message	Date
Greg Kroah-Hartman	56f57b9ff9	UPSTREAM: Revert "bpf: Add map and need_defer parameters to .map_fd_put_ptr()" This reverts commit eb6f68ec92ab60b0540ebf64fe851e99d846e086 which is commit 20c20bd11a0702ce4dc9300c3da58acf551d9725 upstream. It breaks the Android kernel abi and can be brought back in the future in an abi-safe way if it is really needed. Bug: 161946584 Change-Id: I4611eed3677738ab29469733e2b4f6734ef3d605 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2025-12-28 18:39:24 +05:30
Thomas Gleixner	0ca2ddec81	BACKPORT: treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 64 file(s). Change-Id: Ic7cca08bbba3c38e0d53d3374c43ee8bf1e24172 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-12-28 16:55:54 +05:30
Roman Gushchin	1b3dacdeb4	UPSTREAM: bpf: move memory size checks to bpf_map_charge_init() Most bpf map types doing similar checks and bytes to pages conversion during memory allocation and charging. Let's unify these checks by moving them into bpf_map_charge_init(). Change-Id: I55ceded2303102feba9e485042e8f5169f490609 Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 16:55:52 +05:30
Roman Gushchin	9c99f47f18	UPSTREAM: bpf: rework memlock-based memory accounting for maps In order to unify the existing memlock charging code with the memcg-based memory accounting, which will be added later, let's rework the current scheme. Currently the following design is used: 1) .alloc() callback optionally checks if the allocation will likely succeed using bpf_map_precharge_memlock() 2) .alloc() performs actual allocations 3) .alloc() callback calculates map cost and sets map.memory.pages 4) map_create() calls bpf_map_init_memlock() which sets map.memory.user and performs actual charging; in case of failure the map is destroyed <map is in use> 1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which performs uncharge and releases the user 2) .map_free() callback releases the memory The scheme can be simplified and made more robust: 1) .alloc() calculates map cost and calls bpf_map_charge_init() 2) bpf_map_charge_init() sets map.memory.user and performs actual charge 3) .alloc() performs actual allocations <map is in use> 1) .map_free() callback releases the memory 2) bpf_map_charge_finish() performs uncharge and releases the user The new scheme also allows to reuse bpf_map_charge_init()/finish() functions for memcg-based accounting. Because charges are performed before actual allocations and uncharges after freeing the memory, no bogus memory pressure can be created. In cases when the map structure is not available (e.g. it's not created yet, or is already destroyed), on-stack bpf_map_memory structure is used. The charge can be transferred with the bpf_map_charge_move() function. Change-Id: I299bfa9d3e74f366861b6de3bf17951a1374824b Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 16:55:49 +05:30
Roman Gushchin	17cb21617c	UPSTREAM: bpf: group memory related fields in struct bpf_map_memory Group "user" and "pages" fields of bpf_map into the bpf_map_memory structure. Later it can be extended with "memcg" and other related information. The main reason for a such change (beside cosmetics) is to pass bpf_map_memory structure to charging functions before the actual allocation of bpf_map. Change-Id: I04e4edf805bfe4c26fce45f7166317fe00dd0dfa Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 16:55:32 +05:30
Daniel Borkmann	7e651b7c95	UPSTREAM: bpf: allow for key-less BTF in array map Given we'll be reusing BPF array maps for global data/bss/rodata sections, we need a way to associate BTF DataSec type as its map value type. In usual cases we have this ugly BPF_ANNOTATE_KV_PAIR() macro hack e.g. via 38d5d3b3d5db ("bpf: Introduce BPF_ANNOTATE_KV_PAIR") to get initial map to type association going. While more use cases for it are discouraged, this also won't work for global data since the use of array map is a BPF loader detail and therefore unknown at compilation time. For array maps with just a single entry we make an exception in terms of BTF in that key type is declared optional if value type is of DataSec type. The latter LLVM is guaranteed to emit and it also aligns with how we regard global data maps as just a plain buffer area reusing existing map facilities for allowing things like introspection with existing tools. Change-Id: I6fd7e20b453529e07aa1c77beacff4e62c7500bd Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 16:54:13 +05:30
Daniel Borkmann	c0b753938a	UPSTREAM: bpf: add program side {rd, wr}only support for maps This work adds two new map creation flags BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG in order to allow for read-only or write-only BPF maps from a BPF program side. Today we have BPF_F_RDONLY and BPF_F_WRONLY, but this only applies to system call side, meaning the BPF program has full read/write access to the map as usual while bpf(2) calls with map fd can either only read or write into the map depending on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allows for the exact opposite such that verifier is going to reject program loads if write into a read-only map or a read into a write-only map is detected. For read-only map case also some helpers are forbidden for programs that would alter the map state such as map deletion, update, etc. As opposed to the two BPF_F_RDONLY / BPF_F_WRONLY flags, BPF_F_RDONLY_PROG as well as BPF_F_WRONLY_PROG really do correspond to the map lifetime. We've enabled this generic map extension to various non-special maps holding normal user data: array, hash, lru, lpm, local storage, queue and stack. Further generic map types could be followed up in future depending on use-case. Main use case here is to forbid writes into .rodata map values from verifier side. Change-Id: Iad96790cec92137902fe3ad12f53f1a94d58bc61 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 16:53:37 +05:30
Daniel Borkmann	a4fa738008	BACKPORT: bpf: implement lookup-free direct value access for maps This generic extension to BPF maps allows for directly loading an address residing inside a BPF map value as a single BPF ldimm64 instruction! The idea is similar to what BPF_PSEUDO_MAP_FD does today, which is a special src_reg flag for ldimm64 instruction that indicates that inside the first part of the double insns's imm field is a file descriptor which the verifier then replaces as a full 64bit address of the map into both imm parts. For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following: the first part of the double insns's imm field is again a file descriptor corresponding to the map, and the second part of the imm field is an offset into the value. The verifier will then replace both imm parts with an address that points into the BPF map value at the given value offset for maps that support this operation. Currently supported is array map with single entry. It is possible to support more than just single map element by reusing both 16bit off fields of the insns as a map index, so full array map lookup could be expressed that way. It hasn't been implemented here due to lack of concrete use case, but could easily be done so in future in a compatible way, since both off fields right now have to be 0 and would correctly denote a map index 0. The BPF_PSEUDO_MAP_VALUE is a distinct flag as otherwise with BPF_PSEUDO_MAP_FD we could not differ offset 0 between load of map pointer versus load of map's value at offset 0, and changing BPF_PSEUDO_MAP_FD's encoding into off by one to differ between regular map pointer and map value pointer would add unnecessary complexity and increases barrier for debugability thus less suitable. Using the second part of the imm field as an offset into the value does /not/ come with limitations since maximum possible value size is in u32 universe anyway. This optimization allows for efficiently retrieving an address to a map value memory area without having to issue a helper call which needs to prepare registers according to calling convention, etc, without needing the extra NULL test, and without having to add the offset in an additional instruction to the value base pointer. The verifier then treats the destination register as PTR_TO_MAP_VALUE with constant reg->off from the user passed offset from the second imm field, and guarantees that this is within bounds of the map value. Any subsequent operations are normally treated as typical map value handling without anything extra needed from verification side. The two map operations for direct value access have been added to array map for now. In future other types could be supported as well depending on the use case. The main use case for this commit is to allow for BPF loader support for global variables that reside in .data/.rodata/.bss sections such that we can directly load the address of them with minimal additional infrastructure required. Loader support has been added in subsequent commits for libbpf library. Change-Id: I51974f2fe227ba837b338b8b3ebb44c145583673 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 16:53:35 +05:30
Alexei Starovoitov	c738604c2e	UPSTREAM: bpf: introduce BPF_F_LOCK flag Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands and for map_update() helper function. In all these cases take a lock of existing element (which was provided in BTF description) before copying (in or out) the rest of map value. Implementation details that are part of uapi: Array: The array map takes the element lock for lookup/update. Hash: hash map also takes the lock for lookup/update and tries to avoid the bucket lock. If old element exists it takes the element lock and updates the element in place. If element doesn't exist it allocates new one and inserts into hash table while holding the bucket lock. In rare case the hashmap has to take both the bucket lock and the element lock to update old value in place. Cgroup local storage: It is similar to array. update in place and lookup are done with lock taken. Change-Id: I76b13e23e1f6241c1f919a1c24650530f7705d9e Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 16:52:34 +05:30
Alexei Starovoitov	d65ea532aa	BACKPORT: bpf: introduce bpf_spin_lock Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let bpf program serialize access to other variables. Example: struct hash_elem { int cnt; struct bpf_spin_lock lock; }; struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key); if (val) { bpf_spin_lock(&val->lock); val->cnt++; bpf_spin_unlock(&val->lock); } Restrictions and safety checks: - bpf_spin_lock is only allowed inside HASH and ARRAY maps. - BTF description of the map is mandatory for safety analysis. - bpf program can take one bpf_spin_lock at a time, since two or more can cause dead locks. - only one 'struct bpf_spin_lock' is allowed per map element. It drastically simplifies implementation yet allows bpf program to use any number of bpf_spin_locks. - when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed. - bpf program must bpf_spin_unlock() before return. - bpf program can access 'struct bpf_spin_lock' only via bpf_spin_lock()/bpf_spin_unlock() helpers. - load/store into 'struct bpf_spin_lock lock;' field is not allowed. - to use bpf_spin_lock() helper the BTF description of map value must be a struct and have 'struct bpf_spin_lock anyname;' field at the top level. Nested lock inside another struct is not allowed. - syscall map_lookup doesn't copy bpf_spin_lock field to user space. - syscall map_update and program map_update do not update bpf_spin_lock field. - bpf_spin_lock cannot be on the stack or inside networking packet. bpf_spin_lock can only be inside HASH or ARRAY map value. - bpf_spin_lock is available to root only and to all program types. - bpf_spin_lock is not allowed in inner maps of map-in-map. - ld_abs is not allowed inside spin_lock-ed region. - tracing progs and socket filter progs cannot use bpf_spin_lock due to insufficient preemption checks Implementation details: - cgroup-bpf class of programs can nest with xdp/tc programs. Hence bpf_spin_lock is equivalent to spin_lock_irqsave. Other solutions to avoid nested bpf_spin_lock are possible. Like making sure that all networking progs run with softirq disabled. spin_lock_irqsave is the simplest and doesn't add overhead to the programs that don't use it. - arch_spinlock_t is used when its implemented as queued_spin_lock - archs can force their own arch_spinlock_t - on architectures where queued_spin_lock is not available and sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used. - presence of bpf_spin_lock inside map value could have been indicated via extra flag during map_create, but specifying it via BTF is cleaner. It provides introspection for map key/value and reduces user mistakes. Next steps: - allow bpf_spin_lock in other map types (like cgroup local storage) - introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper to request kernel to grab bpf_spin_lock before rewriting the value. That will serialize access to map elements. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Change-Id: Id03322189a8f05c006a05479f7078b23c8c020ea Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 16:52:33 +05:30
Roman Gushchin	f2c3f40d17	UPSTREAM: bpf: pass struct btf pointer to the map_check_btf() callback If key_type or value_type are of non-trivial data types (e.g. structure or typedef), it's not possible to check them without the additional information, which can't be obtained without a pointer to the btf structure. So, let's pass btf pointer to the map_check_btf() callbacks. Change-Id: I95716060b450288d4ffcbe231d1cf5fdb530e292 Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 16:51:44 +05:30
Prashant Bhole	22e4ac9e11	UPSTREAM: bpf: return EOPNOTSUPP when map lookup isn't supported Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below map types: - BPF_MAP_TYPE_PROG_ARRAY - BPF_MAP_TYPE_STACK_TRACE - BPF_MAP_TYPE_XSKMAP - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH Change-Id: I13937c36055b419f4446d8bfa06f139c757480c9 Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 16:50:04 +05:30
Yonghong Song	abc203c232	UPSTREAM: bpf: add bpffs pretty print for program array map Added bpffs pretty print for program array map. For a particular array index, if the program array points to a valid program, the "<index>: <prog_id>" will be printed out like 0: 6 which means bpf program with id "6" is installed at index "0". Change-Id: Ibfeac1777df6dc8742debe574ba259d212e7ecea Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-12-28 16:49:33 +05:30
Yonghong Song	6f8425660b	UPSTREAM: bpf: add bpffs pretty print for percpu arraymap/hash/lru_hash Added bpffs pretty print for percpu arraymap, percpu hashmap and percpu lru hashmap. For each map <key, value> pair, the format is: <key_value>: { cpu0: <value_on_cpu0> cpu1: <value_on_cpu1> ... cpun: <value_on_cpun> } For example, on my VM, there are 4 cpus, and for test_btf test in the next patch: cat /sys/fs/bpf/pprint_test_percpu_hash You may get: ... 43602: { cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602\|[82,170,0,0,0,0,0,0]},ENUM_TWO} } 72847: { cpu0: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu1: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu2: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} cpu3: {72847,0,-72847,0x3,0x11c8f,0x3,{72847\|[143,28,1,0,0,0,0,0]},ENUM_THREE} } ... Change-Id: I286e7505765aa92ea9a8919ddecf8434a24fc187 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 16:49:33 +05:30
Tao Chen	ecdf6ce03b	UPSTREAM: bpf: Check percpu map value size first [ Upstream commit 1d244784be6b01162b732a5a7d637dfc024c3203 ] Percpu map is often used, but the map value size limit often ignored, like issue: https://github.com/iovisor/bcc/issues/2519. Actually, percpu map value size is bound by PCPU_MIN_UNIT_SIZE, so we can check the value size whether it exceeds PCPU_MIN_UNIT_SIZE first, like percpu map of local_storage. Maybe the error message seems clearer compared with "cannot allocate memory". Change-Id: Iadd0604341b9af7bbada45665f06ae1e529a7882 Signed-off-by: Jinke Han <jinkehan@didiglobal.com> Signed-off-by: Tao Chen <chen.dylane@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240910144111.1464912-2-chen.dylane@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2025-12-28 16:48:51 +05:30
Daniel Borkmann	f8cdf99834	UPSTREAM: bpf: decouple btf from seq bpf fs dump and enable more maps Commit a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap") and 699c86d6ec21 ("bpf: btf: add pretty print for hash/lru_hash maps") enabled support for BTF and dumping via BPF fs for array and hash/lru map. However, both can be decoupled from each other such that regular BPF maps can be supported for attaching BTF key/value information, while not all maps necessarily need to dump via map_seq_show_elem() callback. The basic sanity check which is a prerequisite for all maps is that key/value size has to match in any case, and some maps can have extra checks via map_check_btf() callback, e.g. probing certain types or indicating no support in general. With that we can also enable retrieving BTF info for per-cpu map types and lpm. Change-Id: Ic26e072b39b443f3ffd9c1b53a1cd45a3ecea360 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com>	2025-12-28 16:44:30 +05:30
Martin KaFai Lau	3455a0099d	BACKPORT: bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY This patch introduces a new map type BPF_MAP_TYPE_REUSEPORT_SOCKARRAY. To unleash the full potential of a bpf prog, it is essential for the userspace to be capable of directly setting up a bpf map which can then be consumed by the bpf prog to make decision. In this case, decide which SO_REUSEPORT sk to serve the incoming request. By adding BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, the userspace has total control and visibility on where a SO_REUSEPORT sk should be located in a bpf map. The later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT such that the bpf prog can directly select a sk from the bpf map. That will raise the programmability of the bpf prog attached to a reuseport group (a group of sk serving the same IP:PORT). For example, in UDP, the bpf prog can peek into the payload (e.g. through the "data" pointer introduced in the later patch) to learn the application level's connection information and then decide which sk to pick from a bpf map. The userspace can tightly couple the sk's location in a bpf map with the application logic in generating the UDP payload's connection information. This connection info contact/API stays within the userspace. Also, when used with map-in-map, the userspace can switch the old-server-process's inner map to a new-server-process's inner map in one call "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)". The bpf prog will then direct incoming requests to the new process instead of the old process. The old process can finish draining the pending requests (e.g. by "accept()") before closing the old-fds. [Note that deleting a fd from a bpf map does not necessary mean the fd is closed] During map_update_elem(), Only SO_REUSEPORT sk (i.e. which has already been added to a reuse->socks[]) can be used. That means a SO_REUSEPORT sk that is "bind()" for UDP or "bind()+listen()" for TCP. These conditions are ensured in "reuseport_array_update_check()". A SO_REUSEPORT sk can only be added once to a map (i.e. the same sk cannot be added twice even to the same map). SO_REUSEPORT already allows another sk to be created for the same IP:PORT. There is no need to re-create a similar usage in the BPF side. When a SO_REUSEPORT is deleted from the "reuse->socks[]" (e.g. "close()"), it will notify the bpf map to remove it from the map also. It is done through "bpf_sk_reuseport_detach()" and it will only be called if >=1 of the "reuse->sock[]" has ever been added to a bpf map. The map_update()/map_delete() has to be in-sync with the "reuse->socks[]". Hence, the same "reuseport_lock" used by "reuse->socks[]" has to be used here also. Care has been taken to ensure the lock is only acquired when the adding sk passes some strict tests. and freeing the map does not require the reuseport_lock. The reuseport_array will also support lookup from the syscall side. It will return a sock_gen_cookie(). The sock_gen_cookie() is on-demand (i.e. a sk's cookie is not generated until the very first map_lookup_elem()). The lookup cookie is 64bits but it goes against the logical userspace expectation on 32bits sizeof(fd) (and as other fd based bpf maps do also). It may catch user in surprise if we enforce value_size=8 while userspace still pass a 32bits fd during update. Supporting different value_size between lookup and update seems unintuitive also. We also need to consider what if other existing fd based maps want to return 64bits value from syscall's lookup in the future. Hence, reuseport_array supports both value_size 4 and 8, and assuming user will usually use value_size=4. The syscall's lookup will return ENOSPC on value_size=4. It will will only return 64bits value from sock_gen_cookie() when user consciously choose value_size=8 (as a signal that lookup is desired) which then requires a 64bits value in both lookup and update. Change-Id: Iacf628194cf89b00b45a75ced7d2dbddfb4c6a8a Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 16:43:43 +05:30
Martin KaFai Lau	320482377c	BACKPORT: bpf: btf: Use exact btf value_size match in map_check_btf() The current map_check_btf() in BPF_MAP_TYPE_ARRAY rejects '> map->value_size' to ensure map_seq_show_elem() will not access things beyond an array element. Yonghong suggested that using '!=' is a more correct check. The 8 bytes round_up on value_size is stored in array->elem_size. Hence, using '!=' on map->value_size is a proper check. This patch also adds new tests to check the btf array key type and value type. Two of these new tests verify the btf's value_size (the change in this patch). It also fixes two existing tests that wrongly encoded a btf's type size (pprint_test) and the value_type_id (in one of the raw_tests[]). However, that do not affect these two BTF verification tests before or after this test changes. These two tests mainly failed at array creation time after this patch. Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap") Suggested-by: Yonghong Song <yhs@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Change-Id: I0ded0933cb7ba85373a700f746e9a2901db791e1 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 16:42:58 +05:30
Martin KaFai Lau	4fd21497df	UPSTREAM: bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info In "struct bpf_map_info", the name "btf_id", "btf_key_id" and "btf_value_id" could cause confusion because the "id" of "btf_id" means the BPF obj id given to the BTF object while "btf_key_id" and "btf_value_id" means the BTF type id within that BTF object. To make it clear, btf_key_id and btf_value_id are renamed to btf_key_type_id and btf_value_type_id. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Change-Id: Ib10a5ee00041fb5b3ce1c80a407bb9277386c85b Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 16:32:19 +05:30
Martin KaFai Lau	9538c84ec3	UPSTREAM: bpf: btf: Add pretty print support to the basic arraymap This patch adds pretty print support to the basic arraymap. Support for other bpf maps can be added later. This patch adds new attrs to the BPF_MAP_CREATE command to allow specifying the btf_fd, btf_key_id and btf_value_id. The BPF_MAP_CREATE can then associate the btf to the map if the creating map supports BTF. A BTF supported map needs to implement two new map ops, map_seq_show_elem() and map_check_btf(). This patch has implemented these new map ops for the basic arraymap. It also adds file_operations, bpffs_map_fops, to the pinned map such that the pinned map can be opened and read. After that, the user has an intuitive way to do "cat bpffs/pathto/a-pinned-map" instead of getting an error. bpffs_map_fops should not be extended further to support other operations. Other operations (e.g. write/key-lookup...) should be realized by the userspace tools (e.g. bpftool) through the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc. Follow up patches will allow the userspace to obtain the BTF from a map-fd. Here is a sample output when reading a pinned arraymap with the following map's value: struct map_value { int count_a; int count_b; }; cat /sys/fs/bpf/pinned_array_map: 0: {1,2} 1: {3,4} 2: {5,6} ... Change-Id: I57605a80ce783f8545f7947d0a3445ccd81147e4 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 16:19:41 +05:30
Jakub Kicinski	5cc7e77e1d	BACKPORT: bpf: arraymap: use bpf_map_init_from_attr() Arraymap was not converted to use bpf_map_init_from_attr() to avoid merge conflicts with emergency fixes. Do it now. Change-Id: I6e385e0af32a1837b23d352c19824492f8c86764 Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 16:11:43 +05:30
Jakub Kicinski	144ef06777	UPSTREAM: bpf: arraymap: move checks out of alloc function Use the new callback to perform allocation checks for array maps. The fd maps don't need a special allocation callback, they only need a special check callback. Change-Id: Idaed60c9b368f29ce7994a37d8bb381958ab0aeb Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2025-12-28 16:11:43 +05:30
Yonghong Song	174c5bdce6	BACKPORT: bpf: perf event change needed for subsequent bpf helpers This patch does not impact existing functionalities. It contains the changes in perf event area needed for subsequent bpf_perf_event_read_value and bpf_perf_prog_read_value helpers. Change-Id: I3cb46eff8eb241ad744ab796bc3300e7560fe05d Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2025-12-28 15:58:40 +05:30
Hou Tao	ec5141d441	bpf: Add map and need_defer parameters to .map_fd_put_ptr() [ Upstream commit 20c20bd11a0702ce4dc9300c3da58acf551d9725 ] map is the pointer of outer map, and need_defer needs some explanation. need_defer tells the implementation to defer the reference release of the passed element and ensure that the element is still alive before the bpf program, which may manipulate it, exits. The following three cases will invoke map_fd_put_ptr() and different need_defer values will be passed to these callers: 1) release the reference of the old element in the map during map update or map deletion. The release must be deferred, otherwise the bpf program may incur use-after-free problem, so need_defer needs to be true. 2) release the reference of the to-be-added element in the error path of map update. The to-be-added element is not visible to any bpf program, so it is OK to pass false for need_defer parameter. 3) release the references of all elements in the map during map release. Any bpf program which has access to the map must have been exited and released, so need_defer=false will be OK. These two parameters will be used by the following patches to fix the potential use-after-free problem for map-in-map. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20231204140425.1480317-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 5aa1e7d3f6d0db96c7139677d9e898bbbd6a7dcf) Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>	2025-12-17 18:08:37 +05:30
Greg Kroah-Hartman	f9cf23e1ff	Merge 4.14.79 into android-4.14-p Changes in 4.14.79 xfrm: Validate address prefix lengths in the xfrm selector. xfrm6: call kfree_skb when skb is toobig xfrm: reset transport header back to network header after all input transforms ahave been applied xfrm: reset crypto_done when iterating over multiple input xfrms mac80211: Always report TX status cfg80211: reg: Init wiphy_idx in regulatory_hint_core() mac80211: fix pending queue hang due to TX_DROP cfg80211: Address some corner cases in scan result channel updating mac80211: TDLS: fix skb queue/priority assignment mac80211: fix TX status reporting for ieee80211s xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry. ARM: 8799/1: mm: fix pci_ioremap_io() offset check xfrm: validate template mode netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev arm64: hugetlb: Fix handling of young ptes ARM: dts: BCM63xx: Fix incorrect interrupt specifiers net: macb: Clean 64b dma addresses if they are not detected soc: fsl: qbman: qman: avoid allocating from non existing gen_pool soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift() nl80211: Fix possible Spectre-v1 for NL80211_TXRATE_HT mac80211_hwsim: do not omit multicast announce of first added radio Bluetooth: SMP: fix crash in unpairing pxa168fb: prepare the clock qed: Avoid implicit enum conversion in qed_set_tunn_cls_info qed: Fix mask parameter in qed_vf_prep_tunn_req_tlv qed: Avoid implicit enum conversion in qed_roce_mode_to_flavor qed: Avoid constant logical operation warning in qed_vf_pf_acquire qed: Avoid implicit enum conversion in qed_iwarp_parse_rx_pkt nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds asix: Check for supported Wake-on-LAN modes ax88179_178a: Check for supported Wake-on-LAN modes lan78xx: Check for supported Wake-on-LAN modes sr9800: Check for supported Wake-on-LAN modes r8152: Check for supported Wake-on-LAN Modes smsc75xx: Check for Wake-on-LAN modes smsc95xx: Check for Wake-on-LAN modes cfg80211: fix use-after-free in reg_process_hint() perf/core: Fix perf_pmu_unregister() locking perf/ring_buffer: Prevent concurent ring buffer access perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events net: fec: fix rare tx timeout declance: Fix continuation with the adapter identification message net: qualcomm: rmnet: Skip processing loopback packets locking/ww_mutex: Fix runtime warning in the WW mutex selftest be2net: don't flip hw_features when VXLANs are added/deleted net: cxgb3_main: fix a missing-check bug yam: fix a missing-check bug ocfs2: fix crash in ocfs2_duplicate_clusters_by_page() iwlwifi: mvm: check for short GI only for OFDM iwlwifi: dbg: allow wrt collection before ALIVE iwlwifi: fix the ALIVE notification layout tools/testing/nvdimm: unit test clear-error commands usbip: vhci_hcd: update 'status' file header and format scsi: aacraid: address UBSAN warning regression IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush IB/rxe: put the pool on allocation failure s390/qeth: fix error handling in adapter command callbacks net/mlx5: Fix mlx5_get_vector_affinity function powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=n dm integrity: fail early if required HMAC key is not available net: phy: realtek: Use the dummy stubs for MMD register access for rtl8211b net: phy: Add general dummy stubs for MMD register access net/mlx5e: Refine ets validation function scsi: qla2xxx: Avoid double completion of abort command kbuild: set no-integrated-as before incl. arch Makefile IB/mlx5: Avoid passing an invalid QP type to firmware ARM: tegra: Fix ULPI regression on Tegra20 l2tp: remove configurable payload offset cifs: Use ULL suffix for 64-bit constant test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches KVM: x86: Update the exit_qualification access bits while walking an address sparc64: Fix regression in pmdp_invalidate(). tpm: move the delay_msec increment after sleep in tpm_transmit() bpf: sockmap, map_release does not hold refcnt for pinned maps tpm: tpm_crb: relinquish locality on error path. xen-netfront: Update features after registering netdev xen-netfront: Fix mismatched rtnl_unlock IB/usnic: Update with bug fixes from core code mmc: dw_mmc-rockchip: correct property names in debug MIPS: Workaround GCC __builtin_unreachable reordering bug lan78xx: Don't reset the interface on open enic: do not overwrite error code iio: buffer: fix the function signature to match implementation selftests/powerpc: Add ptrace hw breakpoint test scsi: ibmvfc: Avoid unnecessary port relogin scsi: sd: Remember that READ CAPACITY(16) succeeded btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf net: phy: phylink: Don't release NULL GPIO x86/paravirt: Fix some warning messages net: stmmac: mark PM functions as __maybe_unused kconfig: fix the rule of mainmenu_stmt symbol libertas: call into generic suspend code before turning off power perf tests: Fix indexing when invoking subtests compiler.h: Allow arch-specific asm/compiler.h ARM: dts: imx53-qsb: disable 1.2GHz OPP perf python: Use -Wno-redundant-decls to build with PYTHON=python3 rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window() rxrpc: Only take the rwind and mtu values from latest ACK rxrpc: Fix connection-level abort handling net: ena: fix warning in rmmod caused by double iounmap net: ena: fix NULL dereference due to untimely napi initialization selftests: rtnetlink.sh explicitly requires bash. fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters() sch_netem: restore skb->dev after dequeuing from the rbtree mtd: spi-nor: Add support for is25wp series chips kvm: x86: fix WARN due to uninitialized guest FPU state ARM: dts: r8a7790: Correct critical CPU temperature media: uvcvideo: Fix driver reference counting ALSA: usx2y: Fix invalid stream URBs Revert "netfilter: ipv6: nf_defrag: drop skb dst before queueing" perf tools: Disable parallelism for 'make clean' drm/i915/gvt: fix memory leak of a cmd_entry struct on error exit path bridge: do not add port to router list when receives query with source 0.0.0.0 net: bridge: remove ipv6 zero address check in mcast queries ipv6: mcast: fix a use-after-free in inet6_mc_check ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called llc: set SOCK_RCU_FREE in llc_sap_add_socket() net: fec: don't dump RX FIFO register when not available net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs net: sched: gred: pass the right attribute to gred_change_table_def() net: socket: fix a missing-check bug net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules net: udp: fix handling of CHECKSUM_COMPLETE packets r8169: fix NAPI handling under high load sctp: fix race on sctp_id2asoc udp6: fix encap return code for resubmitting vhost: Fix Spectre V1 vulnerability virtio_net: avoid using netif_tx_disable() for serializing tx routine ethtool: fix a privilege escalation bug bonding: fix length of actor system ip6_tunnel: Fix encapsulation layout openvswitch: Fix push/pop ethernet validation net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault type net: sched: Fix for duplicate class dump net: drop skb on failure in ip_check_defrag() net: fix pskb_trim_rcsum_slow() with odd trim offset net/mlx5e: fix csum adjustments caused by RXFCS rtnetlink: Disallow FDB configuration for non-Ethernet device net: ipmr: fix unresolved entry dumps net: bcmgenet: Poll internal PHY for GENETv5 net/sched: cls_api: add missing validation of netlink attributes net/mlx5: Fix build break when CONFIG_SMP=n Linux 4.14.79 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-11-08 07:44:15 -08:00
John Fastabend	3c0cff34e9	bpf: sockmap, map_release does not hold refcnt for pinned maps [ Upstream commit ba6b8de423f8d0dee48d6030288ed81c03ddf9f0 ] Relying on map_release hook to decrement the reference counts when a map is removed only works if the map is not being pinned. In the pinned case the ref is decremented immediately and the BPF programs released. After this BPF programs may not be in-use which is not what the user would expect. This patch moves the release logic into bpf_map_put_uref() and brings sockmap in-line with how a similar case is handled in prog array maps. Fixes: 3d9e952697de ("bpf: sockmap, fix leaking maps with attached but not detached progs") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2018-11-04 14:52:44 +01:00
Greg Kroah-Hartman	4576e0eca9	Merge 4.14.26 into android-4.14 Changes in 4.14.26 bpf: fix mlock precharge on arraymaps bpf: fix memory leak in lpm_trie map_free callback function bpf: fix rcu lockdep warning for lpm_trie map_free callback bpf, x64: implement retpoline for tail call bpf, arm64: fix out of bounds access in tail call bpf: add schedule points in percpu arrays management bpf: allow xadd only on aligned memory bpf, ppc64: fix out of bounds access in tail call KVM: x86: fix backward migration with async_PF Linux 4.14.26 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-03-11 17:37:01 +01:00
Eric Dumazet	e1760b3563	bpf: add schedule points in percpu arrays management [ upstream commit 32fff239de37ef226d5b66329dd133f64d63b22d ] syszbot managed to trigger RCU detected stalls in bpf_array_free_percpu() It takes time to allocate a huge percpu map, but even more time to free it. Since we run in process context, use cond_resched() to yield cpu if needed. Fixes: `a10423b87a` ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-03-11 16:23:22 +01:00
Daniel Borkmann	d9fd73c60b	bpf: fix mlock precharge on arraymaps [ upstream commit 9c2d63b843a5c8a8d0559cc067b5398aa5ec3ffc ] syzkaller recently triggered OOM during percpu map allocation; while there is work in progress by Dennis Zhou to add __GFP_NORETRY semantics for percpu allocator under pressure, there seems also a missing bpf_map_precharge_memlock() check in array map allocation. Given today the actual bpf_map_charge_memlock() happens after the find_and_alloc_map() in syscall path, the bpf_map_precharge_memlock() is there to bail out early before we go and do the map setup work when we find that we hit the limits anyway. Therefore add this for array map as well. Fixes: `6c90598174` ("bpf: pre-allocate hash map elements") Fixes: `a10423b87a` ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map") Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Dennis Zhou <dennisszhou@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-03-11 16:23:21 +01:00
Greg Kroah-Hartman	9b68347c35	Merge 4.14.14 into android-4.14 Changes in 4.14.14 dm bufio: fix shrinker scans when (nr_to_scan < retain_target) KVM: Fix stack-out-of-bounds read in write_mmio can: vxcan: improve handling of missing peer name attribute can: gs_usb: fix return value of the "set_bittiming" callback IB/srpt: Disable RDMA access by the initiator IB/srpt: Fix ACL lookup during login MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task MIPS: Factor out NT_PRFPREG regset access helpers MIPS: Guard against any partial write attempt with PTRACE_SETREGSET MIPS: Consistently handle buffer counter with PTRACE_SETREGSET MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC kvm: vmx: Scrub hardware GPRs at VM-exit platform/x86: wmi: Call acpi_wmi_init() later iw_cxgb4: only call the cq comp_handler when the cq is armed iw_cxgb4: atomically flush the qp iw_cxgb4: only clear the ARMED bit if a notification is needed iw_cxgb4: reflect the original WR opcode in drain cqes iw_cxgb4: when flushing, complete all wrs in a chain x86/acpi: Handle SCI interrupts above legacy space gracefully ALSA: pcm: Remove incorrect snd_BUG_ON() usages ALSA: pcm: Workaround for weird PulseAudio behavior on rewind error ALSA: pcm: Add missing error checks in OSS emulation plugin builder ALSA: pcm: Abort properly at pending signal in OSS read/write loops ALSA: pcm: Allow aborting mutex lock at OSS read/write loops ALSA: aloop: Release cable upon open error path ALSA: aloop: Fix inconsistent format due to incomplete rule ALSA: aloop: Fix racy hw constraints adjustment x86/acpi: Reduce code duplication in mp_override_legacy_irq() 8021q: fix a memory leak for VLAN 0 device ip6_tunnel: disable dst caching if tunnel is dual-stack net: core: fix module type in sock_diag_bind phylink: ensure we report link down when LOS asserted RDS: Heap OOB write in rds_message_alloc_sgs() RDS: null pointer dereference in rds_atomic_free_op net: fec: restore dev_id in the cases of probe error net: fec: defer probe if regulator is not ready net: fec: free/restore resource in related probe error pathes sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled sctp: fix the handling of ICMP Frag Needed for too small MTUs sh_eth: fix TSU resource handling net: stmmac: enable EEE in MII, GMII or RGMII only sh_eth: fix SH7757 GEther initialization ipv6: fix possible mem leaks in ipv6_make_skb() ethtool: do not print warning for applications using legacy API mlxsw: spectrum_router: Fix NULL pointer deref net/sched: Fix update of lastuse in act modules implementing stats_update ipv6: sr: fix TLVs not being copied using setsockopt mlxsw: spectrum: Relax sanity checks during enslavement sfp: fix sfp-bus oops when removing socket/upstream membarrier: Disable preemption when calling smp_call_function_many() crypto: algapi - fix NULL dereference in crypto_remove_spawns() mmc: renesas_sdhi: Add MODULE_LICENSE rbd: reacquire lock should update lock owner client id rbd: set max_segments to USHRT_MAX iwlwifi: pcie: fix DMA memory mapping / unmapping x86/microcode/intel: Extend BDW late-loading with a revision check KVM: x86: Add memory barrier on vmcs field lookup KVM: PPC: Book3S PR: Fix WIMG handling under pHyp KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hpt KVM: PPC: Book3S HV: Fix use after free in case of multiple resize requests KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt() drm/vmwgfx: Don't cache framebuffer maps drm/vmwgfx: Potential off by one in vmw_view_add() drm/i915/gvt: Clear the shadow page table entry after post-sync drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake. drm/i915: Move init_clock_gating() back to where it was drm/i915: Fix init_clock_gating for resume bpf: prevent out-of-bounds speculation bpf, array: fix overflow in max_entries and undefined behavior in index_mask bpf: arsh is not supported in 32 bit alu thus reject it USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ USB: serial: cp210x: add new device ID ELV ALC 8xxx usb: misc: usb3503: make sure reset is low for at least 100us USB: fix usbmon BUG trigger USB: UDC core: fix double-free in usb_add_gadget_udc_release usbip: remove kernel addresses from usb device and urb debug msgs usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl Bluetooth: Prevent stack info leak from the EFS element. uas: ignore UAS for Norelsys NS1068(X) chips mux: core: fix double get_device() kdump: write correct address of mem_section into vmcoreinfo apparmor: fix ptrace label match when matching stacked labels e1000e: Fix e1000_check_for_copper_link_ich8lan return value. x86/pti: Unbreak EFI old_memmap x86/Documentation: Add PTI description x86/cpufeatures: Add X86_BUG_SPECTRE_V[12] sysfs/cpu: Add vulnerability folder x86/cpu: Implement CPU vulnerabilites sysfs functions x86/tboot: Unbreak tboot with PTI enabled x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*() x86/cpu/AMD: Make LFENCE a serializing instruction x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC sysfs/cpu: Fix typos in vulnerability documentation x86/alternatives: Fix optimize_nops() checking x86/pti: Make unpoison of pgd for trusted boot work for real objtool: Detect jumps to retpoline thunks objtool: Allow alternatives to be ignored x86/retpoline: Add initial retpoline support x86/spectre: Add boot time option to select Spectre v2 mitigation x86/retpoline/crypto: Convert crypto assembler indirect jumps x86/retpoline/entry: Convert entry assembler indirect jumps x86/retpoline/ftrace: Convert ftrace assembler indirect jumps x86/retpoline/hyperv: Convert assembler indirect jumps x86/retpoline/xen: Convert Xen hypercall indirect jumps x86/retpoline/checksum32: Convert assembler indirect jumps x86/retpoline/irq32: Convert assembler indirect jumps x86/retpoline: Fill return stack buffer on vmexit selftests/x86: Add test_vsyscall x86/pti: Fix !PCID and sanitize defines security/Kconfig: Correct the Documentation reference for PTI x86,perf: Disable intel_bts when PTI x86/retpoline: Remove compile time warning Linux 4.14.14 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>	2018-01-17 10:33:24 +01:00
Daniel Borkmann	67c05d9414	bpf, array: fix overflow in max_entries and undefined behavior in index_mask commit bbeb6e4323dad9b5e0ee9f60c223dd532e2403b1 upstream. syzkaller tried to alloc a map with 0xfffffffd entries out of a userns, and thus unprivileged. With the recently added logic in b2157399cc98 ("bpf: prevent out-of-bounds speculation") we round this up to the next power of two value for max_entries for unprivileged such that we can apply proper masking into potentially zeroed out map slots. However, this will generate an index_mask of 0xffffffff, and therefore a + 1 will let this overflow into new max_entries of 0. This will pass allocation, etc, and later on map access we still enforce on the original attr->max_entries value which was 0xfffffffd, therefore triggering GPF all over the place. Thus bail out on overflow in such case. Moreover, on 32 bit archs roundup_pow_of_two() can also not be used, since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit space is undefined. Therefore, do this by hand in a 64 bit variable. This fixes all the issues triggered by syzkaller's reproducers. Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-17 09:45:25 +01:00
Alexei Starovoitov	a5dbaf8768	bpf: prevent out-of-bounds speculation commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream. Under speculation, CPUs may mis-predict branches in bounds checks. Thus, memory accesses under a bounds check may be speculated even if the bounds check fails, providing a primitive for building a side channel. To avoid leaking kernel data round up array-based maps and mask the index after bounds check, so speculated load with out of bounds index will load either valid value from the array or zero from the padded area. Unconditionally mask index for all array types even when max_entries are not rounded to power of 2 for root user. When map is created by unpriv user generate a sequence of bpf insns that includes AND operation to make sure that JITed code includes the same 'index & index_mask' operation. If prog_array map is created by unpriv user replace bpf_tail_call(ctx, map, index); with if (index >= max_entries) { index &= map->index_mask; bpf_tail_call(ctx, map, index); } (along with roundup to power 2) to prevent out-of-bounds speculation. There is secondary redundant 'if (index >= max_entries)' in the interpreter and in all JITs, but they can be optimized later if necessary. Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array) cannot be used by unpriv, so no changes there. That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on all architectures with and without JIT. v2->v3: Daniel noticed that attack potentially can be crafted via syscall commands without loading the program, so add masking to those paths as well. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-01-17 09:45:25 +01:00
Chenbo Feng	cace572e16	BACKPORT: bpf: Add file mode configuration into bpf maps Introduce the map read/write flags to the eBPF syscalls that returns the map fd. The flags is used to set up the file mode when construct a new file descriptor for bpf maps. To not break the backward capability, the f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise it should be O_RDONLY or O_WRONLY. When the userspace want to modify or read the map content, it will check the file mode to see if it is allowed to make the change. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Bug: 30950746 Change-Id: Icfad20f1abb77f91068d244fb0d87fa40824dd1b (cherry picked from commit 6e71b04a82248ccf13a94b85cbc674a9fefe53f5) Signed-off-by: Amit Pundir <amit.pundir@linaro.org>	2017-12-18 21:11:22 +05:30
Daniel Borkmann	bc6d5031b4	bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu allocator. Given we support __GFP_NOWARN now, lets just let the allocation request fail naturally instead. The two call sites from BPF mistakenly assumed __GFP_NOWARN would work, so no changes needed to their actual __alloc_percpu_gfp() calls which use the flag already. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-10-19 13:13:50 +01:00
Daniel Borkmann	7b0c2a0508	bpf: inline map in map lookup functions for array and htab Avoid two successive functions calls for the map in map lookup, first is the bpf_map_lookup_elem() helper call, and second the callback via map->ops->map_lookup_elem() to get to the map in map implementation. Implementation inlines array and htab flavor for map in map lookups. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-19 21:56:34 -07:00
Martin KaFai Lau	96eabe7a40	bpf: Allow selecting numa node during map creation The current map creation API does not allow to provide the numa-node preference. The memory usually comes from where the map-creation-process is running. The performance is not ideal if the bpf_prog is known to always run in a numa node different from the map-creation-process. One of the use case is sharding on CPU to different LRU maps (i.e. an array of LRU maps). Here is the test result of map_perf_test on the INNER_LRU_HASH_PREALLOC test if we force the lru map used by CPU0 to be allocated from a remote numa node: [ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ] ># taskset -c 10 ./map_perf_test 512 8 1260000 8000000 5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec 4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec 3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec 6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec 2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec 1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec 7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec 0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec #<<< After specifying numa node: ># taskset -c 10 ./map_perf_test 512 8 1260000 8000000 5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec 3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec 1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec 6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec 2:inner_lru_hash_map_perf pre-alloc `1614630` events per sec 4:inner_lru_hash_map_perf pre-alloc 1612651 events per sec 7:inner_lru_hash_map_perf pre-alloc 1609337 events per sec 0:inner_lru_hash_map_perf pre-alloc 1619340 events per sec #<<< This patch adds one field, numa_node, to the bpf_attr. Since numa node 0 is a valid node, a new flag BPF_F_NUMA_NODE is also added. The numa_node field is honored if and only if the BPF_F_NUMA_NODE flag is set. Numa node selection is not supported for percpu map. This patch does not change all the kmalloc. F.e. 'htab = kzalloc()' is not changed since the object is small enough to stay in the cache. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-19 21:35:43 -07:00
Martin KaFai Lau	14dc6f04f4	bpf: Add syscall lookup support for fd array and htab This patch allows userspace to do BPF_MAP_LOOKUP_ELEM on BPF_MAP_TYPE_PROG_ARRAY, BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS. The lookup returns a prog-id or map-id to the userspace. The userspace can then use the BPF_PROG_GET_FD_BY_ID or BPF_MAP_GET_FD_BY_ID to get a fd. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-29 13:13:25 -04:00
Alexei Starovoitov	f91840a32d	perf, bpf: Add BPF support to all perf_event types Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all perf_event types, including HW_CACHE, RAW, and dynamic pmu events. Only tracepoint/kprobe events are treated differently which require BPF_PROG_TYPE_TRACEPOINT/BPF_PROG_TYPE_KPROBE program types accordingly. Also add support for reading all event counters using bpf_perf_event_read() helper. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-04 21:58:01 -04:00
Daniel Borkmann	a316338cb7	bpf: fix wrong exposure of map_flags into fdinfo for lpm trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via attr->map_flags, since it does not support preallocation yet. We check the flag, but we never copy the flag into trie->map.map_flags, which is later on exposed into fdinfo and used by loaders such as iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test whether a pinned map has the same spec as the one from the BPF obj file and if not, bails out, which is currently the case for lpm since it exposes always 0 as flags. Also copy over flags in array_map_alloc() and stack_map_alloc(). They always have to be 0 right now, but we should make sure to not miss to copy them over at a later point in time when we add actual flags for them to use. Fixes: `b95a5c4db0` ("bpf: add a longest prefix match trie map implementation") Reported-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-05-25 13:44:28 -04:00
Teng Qin	8fe4592438	bpf: map_get_next_key to return first key on NULL When iterating through a map, we need to find a key that does not exist in the map so map_get_next_key will give us the first key of the map. This often requires a lot of guessing in production systems. This patch makes map_get_next_key return the first key when the key pointer in the parameter is NULL. Signed-off-by: Teng Qin <qinteng@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-25 11:57:45 -04:00
Johannes Berg	40077e0cf6	bpf: remove struct bpf_map_type_list There's no need to have struct bpf_map_type_list since it just contains a list_head, the type, and the ops pointer. Since the types are densely packed and not actually dynamically registered, it's much easier and smaller to have an array of type->ops pointer. Also initialize this array statically to remove code needed to initialize it. In order to save duplicating the list, move it to the types header file added by the previous patch and include it in the same fashion. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-11 14:38:43 -04:00
Martin KaFai Lau	56f668dfe0	bpf: Add array of maps support This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 15:45:45 -07:00
Martin KaFai Lau	fad73a1a35	bpf: Fix and simplifications on inline map lookup Fix in verifier: For the same bpf_map_lookup_elem() instruction (i.e. "call 1"), a broken case is "a different type of map could be used for the same lookup instruction". For example, an array in one case and a hashmap in another. We have to resort to the old dynamic call behavior in this case. The fix is to check for collision on insn_aux->map_ptr. If there is collision, don't inline the map lookup. Please see the "do_reg_lookup()" in test_map_in_map_kern.c in the later patch for how-to trigger the above case. Simplifications on array_map_gen_lookup(): 1. Calculate elem_size from map->value_size. It removes the need for 'struct bpf_array' which makes the later map-in-map implementation easier. 2. Remove the 'elem_size == 1' test Fixes: `81ed18ab30` ("bpf: add helper inlining infra and optimize map_array lookup") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-22 15:45:45 -07:00
Alexei Starovoitov	81ed18ab30	bpf: add helper inlining infra and optimize map_array lookup Optimize bpf_call -> bpf_map_lookup_elem() -> array_map_lookup_elem() into a sequence of bpf instructions. When JIT is on the sequence of bpf instructions is the sequence of native cpu instructions with significantly faster performance than indirect call and two function's prologue/epilogue. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-03-16 20:44:11 -07:00
Daniel Borkmann	c78f8bdfa1	bpf: mark all registered map/prog types as __ro_after_init All map types and prog types are registered to the BPF core through bpf_register_map_type() and bpf_register_prog_type() during init and remain unchanged thereafter. As by design we don't (and never will) have any pluggable code that can register to that at any later point in time, lets mark all the existing bpf_{map,prog}_type_list objects in the tree as __ro_after_init, so they can be moved to read-only section from then onwards. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-02-17 13:40:04 -05:00
Daniel Borkmann	d407bd25a2	bpf: don't trigger OOM killer under pressure with map alloc This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(), that are to be used for map allocations. Using kmalloc() for very large allocations can cause excessive work within the page allocator, so i) fall back earlier to vmalloc() when the attempt is considered costly anyway, and even more importantly ii) don't trigger OOM killer with any of the allocators. Since this is based on a user space request, for example, when creating maps with element pre-allocation, we really want such requests to fail instead of killing other user space processes. Also, don't spam the kernel log with warnings should any of the allocations fail under pressure. Given that, we can make backend selection in bpf_map_area_alloc() generic, and convert all maps over to use this API for spots with potentially large allocation requests. Note, replacing the one kmalloc_array() is fine as overflow checks happen earlier in htab_map_alloc(), since it must also protect the multiplication for vmalloc() should kmalloc_array() fail. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-18 17:12:26 -05:00
Michal Hocko	7984c27c2c	bpf: do not use KMALLOC_SHIFT_MAX Commit `01b3f52157` ("bpf: fix allocation warnings in bpf maps and integer overflow") has added checks for the maximum allocateable size. It (ab)used KMALLOC_SHIFT_MAX for that purpose. While this is not incorrect it is not very clean because we already have KMALLOC_MAX_SIZE for this very reason so let's change both checks to use KMALLOC_MAX_SIZE instead. The original motivation for using KMALLOC_SHIFT_MAX was to work around an incorrect KMALLOC_MAX_SIZE which could lead to allocation warnings but it is no longer needed since "slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER". Link: http://lkml.kernel.org/r/20161220130659.16461-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-01-10 18:31:54 -08:00
Sargun Dhillon	60d20f9195	bpf: Add bpf_current_task_under_cgroup helper This adds a bpf helper that's similar to the skb_in_cgroup helper to check whether the probe is currently executing in the context of a specific subset of the cgroupsv2 hierarchy. It does this based on membership test for a cgroup arraymap. It is invalid to call this in an interrupt, and it'll return an error. The helper is primarily to be used in debugging activities for containers, where you may have multiple programs running in a given top-level "container". Signed-off-by: Sargun Dhillon <sargun@sargun.me> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-08-12 21:49:41 -07:00
Daniel Borkmann	858d68f102	bpf: bpf_event_entry_gen's alloc needs to be in atomic context Should have been obvious, only called from bpf() syscall via map_update_elem() that calls bpf_fd_array_map_update_elem() under RCU read lock and thus this must also be in GFP_ATOMIC, of course. Fixes: `3b1efb196e` ("bpf, maps: flush own entries on perf map release") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-16 22:03:39 -07:00
Martin KaFai Lau	4ed8ec521e	cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations. To update an element, the caller is expected to obtain a cgroup2 backed fd by open(cgroup2_dir) and then update the array with that fd. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-01 16:30:38 -04:00

1 2

70 Commits