78 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
b758102651 UPSTREAM: Revert "bpf: Add map and need_defer parameters to .map_fd_put_ptr()"
This reverts commit eb6f68ec92ab60b0540ebf64fe851e99d846e086 which is
commit 20c20bd11a0702ce4dc9300c3da58acf551d9725 upstream.

It breaks the Android kernel abi and can be brought back in the future
in an abi-safe way if it is really needed.

Bug: 161946584
Change-Id: I4611eed3677738ab29469733e2b4f6734ef3d605
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2025-10-02 22:15:11 +08:00
Thomas Gleixner
3deb30fb74 BACKPORT: treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of version 2 of the gnu general public license as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 64 file(s).

Change-Id: Ic7cca08bbba3c38e0d53d3374c43ee8bf1e24172
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-02 22:14:45 +08:00
Roman Gushchin
1a371f6225 UPSTREAM: bpf: move memory size checks to bpf_map_charge_init()
Most bpf map types do similar checks and a bytes-to-pages
conversion during memory allocation and charging.

Let's unify these checks by moving them into bpf_map_charge_init().

Change-Id: I55ceded2303102feba9e485042e8f5169f490609
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 22:14:45 +08:00
Roman Gushchin
632d849a6d UPSTREAM: bpf: rework memlock-based memory accounting for maps
In order to unify the existing memlock charging code with the
memcg-based memory accounting, which will be added later, let's
rework the current scheme.

Currently the following design is used:
  1) .alloc() callback optionally checks if the allocation will likely
     succeed using bpf_map_precharge_memlock()
  2) .alloc() performs actual allocations
  3) .alloc() callback calculates map cost and sets map.memory.pages
  4) map_create() calls bpf_map_init_memlock() which sets map.memory.user
     and performs actual charging; in case of failure the map is
     destroyed
  <map is in use>
  1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which
     performs uncharge and releases the user
  2) .map_free() callback releases the memory

The scheme can be simplified and made more robust:
  1) .alloc() calculates map cost and calls bpf_map_charge_init()
  2) bpf_map_charge_init() sets map.memory.user and performs actual
     charge
  3) .alloc() performs actual allocations
  <map is in use>
  1) .map_free() callback releases the memory
  2) bpf_map_charge_finish() performs uncharge and releases the user

The new scheme also allows to reuse bpf_map_charge_init()/finish()
functions for memcg-based accounting. Because charges are performed
before actual allocations and uncharges after freeing the memory,
no bogus memory pressure can be created.

In cases when the map structure is not available (e.g. it's not
created yet, or is already destroyed), on-stack bpf_map_memory
structure is used. The charge can be transferred with the
bpf_map_charge_move() function.

Change-Id: I299bfa9d3e74f366861b6de3bf17951a1374824b
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 22:14:44 +08:00
Roman Gushchin
7032f89546 UPSTREAM: bpf: group memory related fields in struct bpf_map_memory
Group "user" and "pages" fields of bpf_map into the bpf_map_memory
structure. Later it can be extended with "memcg" and other related
information.

The main reason for such a change (besides cosmetics) is to pass
the bpf_map_memory structure to the charging functions before the
actual allocation of bpf_map.

Change-Id: I04e4edf805bfe4c26fce45f7166317fe00dd0dfa
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 22:14:44 +08:00
Daniel Borkmann
b0c33de60f UPSTREAM: bpf: allow for key-less BTF in array map
Given we'll be reusing BPF array maps for global data/bss/rodata
sections, we need a way to associate BTF DataSec type as its map
value type. In usual cases we have the ugly BPF_ANNOTATE_KV_PAIR()
macro hack, e.g. via 38d5d3b3d5db ("bpf: Introduce BPF_ANNOTATE_KV_PAIR"),
to get the initial map-to-type association going. While more use cases
for it are discouraged, this also won't work for global data, since
the use of an array map is a BPF loader detail and therefore unknown
at compilation time. For array maps with just a single entry we make
an exception in terms of BTF, in that the key type is declared optional
if the value type is of DataSec type. LLVM is guaranteed to emit the
latter, and it also aligns with how we regard global data maps as just
a plain buffer area, reusing existing map facilities to allow things
like introspection with existing tools.

Change-Id: I6fd7e20b453529e07aa1c77beacff4e62c7500bd
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 22:14:38 +08:00
Daniel Borkmann
6851cbec41 UPSTREAM: bpf: add program side {rd, wr}only support for maps
This work adds two new map creation flags BPF_F_RDONLY_PROG
and BPF_F_WRONLY_PROG in order to allow for read-only or
write-only BPF maps from a BPF program side.

Today we have BPF_F_RDONLY and BPF_F_WRONLY, but this only
applies to system call side, meaning the BPF program has full
read/write access to the map as usual while bpf(2) calls with
map fd can either only read or write into the map depending
on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allow
for the exact opposite, such that the verifier is going to reject
program loads if a write into a read-only map or a read into a
write-only map is detected. For the read-only map case, some
helpers that would alter the map state, such as map deletion and
update, are also forbidden for programs. As opposed to the two
BPF_F_RDONLY / BPF_F_WRONLY flags, BPF_F_RDONLY_PROG as well
as BPF_F_WRONLY_PROG really do apply over the whole map lifetime.

We've enabled this generic map extension to various non-special
maps holding normal user data: array, hash, lru, lpm, local
storage, queue and stack. Further generic map types could be
followed up in future depending on use-case. Main use case
here is to forbid writes into .rodata map values from verifier
side.

Change-Id: Iad96790cec92137902fe3ad12f53f1a94d58bc61
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 22:14:37 +08:00
Daniel Borkmann
5489474293 BACKPORT: bpf: implement lookup-free direct value access for maps
This generic extension to BPF maps allows for directly loading
an address residing inside a BPF map value as a single BPF
ldimm64 instruction!

The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
is a special src_reg flag for ldimm64 instruction that indicates
that inside the first part of the double insns's imm field is a
file descriptor which the verifier then replaces as a full 64bit
address of the map into both imm parts. For the newly added
BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following:
the first part of the double insns's imm field is again a file
descriptor corresponding to the map, and the second part of the
imm field is an offset into the value. The verifier will then
replace both imm parts with an address that points into the BPF
map value at the given value offset for maps that support this
operation. Currently supported is the array map with a single entry.
It is possible to support more than just a single map element by
reusing both 16-bit off fields of the insns as a map index, so a
full array map lookup could be expressed that way. It hasn't
been implemented here due to lack of a concrete use case, but it
could easily be done in the future in a compatible way, since
both off fields right now have to be 0 and would correctly
denote a map index of 0.

The BPF_PSEUDO_MAP_VALUE is a distinct flag because otherwise, with
BPF_PSEUDO_MAP_FD, we could not distinguish at offset 0 between a load
of the map pointer and a load of the map's value at offset 0, and
changing BPF_PSEUDO_MAP_FD's encoding into off by one to differ between
a regular map pointer and a map value pointer would add unnecessary
complexity and raise the barrier for debuggability, thus making it less
suitable. Using the second part of the imm field as an offset
into the value does /not/ come with limitations, since the maximum
possible value size is in the u32 universe anyway.

This optimization allows for efficiently retrieving an address
to a map value memory area without having to issue a helper call
which needs to prepare registers according to calling convention,
etc, without needing the extra NULL test, and without having to
add the offset in an additional instruction to the value base
pointer. The verifier then treats the destination register as
PTR_TO_MAP_VALUE with constant reg->off from the user passed
offset from the second imm field, and guarantees that this is
within bounds of the map value. Any subsequent operations are
normally treated as typical map value handling without anything
extra needed from verification side.

The two map operations for direct value access have been added to
array map for now. In future other types could be supported as
well depending on the use case. The main use case for this commit
is to allow for BPF loader support for global variables that
reside in .data/.rodata/.bss sections such that we can directly
load the address of them with minimal additional infrastructure
required. Loader support has been added in subsequent commits for
libbpf library.

Change-Id: I51974f2fe227ba837b338b8b3ebb44c145583673
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 22:14:36 +08:00
Alexei Starovoitov
eb322f919d UPSTREAM: bpf: introduce BPF_F_LOCK flag
Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands
and for map_update() helper function.
In all these cases take a lock of existing element (which was provided
in BTF description) before copying (in or out) the rest of map value.

Implementation details that are part of uapi:

Array:
The array map takes the element lock for lookup/update.

Hash:
The hash map also takes the lock for lookup/update and tries to avoid the bucket lock.
If an old element exists, it takes the element lock and updates the element in place.
If the element doesn't exist, it allocates a new one and inserts it into the hash table
while holding the bucket lock.
In rare cases the hashmap has to take both the bucket lock and the element lock
to update an old value in place.

Cgroup local storage:
It is similar to array: update in place and lookup are done with the lock taken.

Change-Id: I76b13e23e1f6241c1f919a1c24650530f7705d9e
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 22:14:32 +08:00
Alexei Starovoitov
eb0dfde540 BACKPORT: bpf: introduce bpf_spin_lock
Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
bpf program serialize access to other variables.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem *val = bpf_map_lookup_elem(&hash_map, &key);
if (val) {
    bpf_spin_lock(&val->lock);
    val->cnt++;
    bpf_spin_unlock(&val->lock);
}

Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
  cause deadlocks.
- only one 'struct bpf_spin_lock' is allowed per map element.
  It drastically simplifies implementation yet allows bpf program to use
  any number of bpf_spin_locks.
- when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
  bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
  a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
  Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
  bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
  insufficient preemption checks

Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
  Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
  Other solutions to avoid nested bpf_spin_lock are possible.
  Like making sure that all networking progs run with softirq disabled.
  spin_lock_irqsave is the simplest and doesn't add overhead to the
  programs that don't use it.
- arch_spinlock_t is used when it's implemented as queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
  sizeof(arch_spinlock_t) != sizeof(__u32), a trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
  extra flag during map_create, but specifying it via BTF is cleaner.
  It provides introspection for map key/value and reduces user mistakes.

Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
  to request kernel to grab bpf_spin_lock before rewriting the value.
  That will serialize access to map elements.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: Id03322189a8f05c006a05479f7078b23c8c020ea
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 22:14:32 +08:00
Roman Gushchin
7a056468d8 UPSTREAM: bpf: pass struct btf pointer to the map_check_btf() callback
If key_type or value_type are non-trivial data types
(e.g. a structure or typedef), it's not possible to check them without
additional information, which can't be obtained without a pointer
to the btf structure.

So, let's pass the btf pointer to the map_check_btf() callbacks.

Change-Id: I95716060b450288d4ffcbe231d1cf5fdb530e292
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 22:14:28 +08:00
Prashant Bhole
b61bfd4593 UPSTREAM: bpf: return EOPNOTSUPP when map lookup isn't supported
Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below
map types:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH

Change-Id: I13937c36055b419f4446d8bfa06f139c757480c9
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 22:14:21 +08:00
Yonghong Song
84ace6305a UPSTREAM: bpf: add bpffs pretty print for program array map
Added bpffs pretty print for the program array map. For a particular
array index, if the program array points to a valid program,
"<index>: <prog_id>" will be printed out, like
   0: 6
which means the bpf program with id "6" is installed at index "0".

Change-Id: Ibfeac1777df6dc8742debe574ba259d212e7ecea
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 22:14:19 +08:00
Yonghong Song
c71bbb5912 UPSTREAM: bpf: add bpffs pretty print for percpu arraymap/hash/lru_hash
Added bpffs pretty print for percpu arraymap, percpu hashmap
and percpu lru hashmap.

For each map <key, value> pair, the format is:
   <key_value>: {
	cpu0: <value_on_cpu0>
	cpu1: <value_on_cpu1>
	...
	cpun: <value_on_cpun>
   }

For example, on my VM, there are 4 cpus, and
for test_btf test in the next patch:
   cat /sys/fs/bpf/pprint_test_percpu_hash

You may get:
   ...
   43602: {
	cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
   }
   72847: {
	cpu0: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu1: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu2: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu3: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
   }
   ...

Change-Id: I286e7505765aa92ea9a8919ddecf8434a24fc187
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 22:14:19 +08:00
Tao Chen
aac2e51438 UPSTREAM: bpf: Check percpu map value size first
[ Upstream commit 1d244784be6b01162b732a5a7d637dfc024c3203 ]

The percpu map is often used, but its map value size limit is often
ignored, as in this issue: https://github.com/iovisor/bcc/issues/2519.
Actually, the percpu map value size is bounded by PCPU_MIN_UNIT_SIZE, so
we can first check whether the value size exceeds PCPU_MIN_UNIT_SIZE,
like the percpu map of local_storage does. The resulting error message
is clearer than "cannot allocate memory".

Signed-off-by: Jinke Han <jinkehan@didiglobal.com>
Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240910144111.1464912-2-chen.dylane@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-02 22:14:18 +08:00
Daniel Borkmann
716ee03915 UPSTREAM: bpf: decouple btf from seq bpf fs dump and enable more maps
Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
the basic arraymap") and 699c86d6ec21 ("bpf: btf: add pretty
print for hash/lru_hash maps") enabled support for BTF and
dumping via BPF fs for array and hash/lru map. However, both
can be decoupled from each other such that regular BPF maps
can be supported for attaching BTF key/value information,
while not all maps necessarily need to dump via map_seq_show_elem()
callback.

The basic sanity check which is a prerequisite for all maps
is that key/value size has to match in any case, and some maps
can have extra checks via map_check_btf() callback, e.g.
probing certain types or indicating no support in general. With
that we can also enable retrieving BTF info for per-cpu map
types and lpm.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
2025-10-02 22:14:02 +08:00
Martin KaFai Lau
54514c7d06 BACKPORT: bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY
This patch introduces a new map type BPF_MAP_TYPE_REUSEPORT_SOCKARRAY.

To unleash the full potential of a bpf prog, it is essential for the
userspace to be capable of directly setting up a bpf map which can then
be consumed by the bpf prog to make decision.  In this case, decide which
SO_REUSEPORT sk to serve the incoming request.

By adding BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, the userspace has total control
and visibility on where a SO_REUSEPORT sk should be located in a bpf map.
The later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT such that
the bpf prog can directly select a sk from the bpf map.  That will
raise the programmability of the bpf prog attached to a reuseport
group (a group of sk serving the same IP:PORT).

For example, in UDP, the bpf prog can peek into the payload (e.g.
through the "data" pointer introduced in the later patch) to learn
the application level's connection information and then decide which sk
to pick from a bpf map.  The userspace can tightly couple the sk's location
in a bpf map with the application logic in generating the UDP payload's
connection information.  This connection info contract/API stays within the
userspace.

Also, when used with map-in-map, the userspace can switch the
old-server-process's inner map to a new-server-process's inner map
in one call "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)".
The bpf prog will then direct incoming requests to the new process instead
of the old process.  The old process can finish draining the pending
requests (e.g. by "accept()") before closing the old-fds.  [Note that
deleting a fd from a bpf map does not necessarily mean the fd is closed]

During map_update_elem(),
Only SO_REUSEPORT sk (i.e. which has already been added
to a reuse->socks[]) can be used.  That means a SO_REUSEPORT sk that is
"bind()" for UDP or "bind()+listen()" for TCP.  These conditions are
ensured in "reuseport_array_update_check()".

A SO_REUSEPORT sk can only be added once to a map (i.e. the
same sk cannot be added twice even to the same map).  SO_REUSEPORT
already allows another sk to be created for the same IP:PORT.
There is no need to re-create a similar usage in the BPF side.

When a SO_REUSEPORT is deleted from the "reuse->socks[]" (e.g. "close()"),
it will notify the bpf map to remove it from the map also.  It is
done through "bpf_sk_reuseport_detach()" and it will only be called
if >=1 of the "reuse->sock[]" has ever been added to a bpf map.

The map_update()/map_delete() has to be in-sync with the
"reuse->socks[]".  Hence, the same "reuseport_lock" used
by "reuse->socks[]" has to be used here also. Care has
been taken to ensure the lock is only acquired when the
adding sk passes some strict tests, and that
freeing the map does not require the reuseport_lock.

The reuseport_array will also support lookup from the syscall
side.  It will return a sock_gen_cookie().  The sock_gen_cookie()
is on-demand (i.e. a sk's cookie is not generated until the very
first map_lookup_elem()).

The lookup cookie is 64 bits, but it goes against the logical userspace
expectation of a 32-bit sizeof(fd) (as other fd-based bpf maps also use).
It may catch users by surprise if we enforce value_size=8 while
userspace still passes a 32-bit fd during update.  Supporting different
value_size between lookup and update seems unintuitive as well.

We also need to consider what if other existing fd based maps want
to return 64bits value from syscall's lookup in the future.
Hence, reuseport_array supports both value_size 4 and 8, and
assumes users will usually use value_size=4.  The syscall's lookup
will return ENOSPC on value_size=4.  It will only
return the 64-bit value from sock_gen_cookie() when the user consciously
chooses value_size=8 (as a signal that lookup is desired), which then
requires a 64-bit value in both lookup and update.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 22:14:01 +08:00
Martin KaFai Lau
e3ecf4c219 BACKPORT: bpf: btf: Use exact btf value_size match in map_check_btf()
The current map_check_btf() in BPF_MAP_TYPE_ARRAY rejects
'> map->value_size' to ensure map_seq_show_elem() will not
access things beyond an array element.

Yonghong suggested that using '!=' is a more correct
check.  The 8 bytes round_up on value_size is stored
in array->elem_size.  Hence, using '!=' on map->value_size
is a proper check.

This patch also adds new tests to check the btf array
key type and value type.  Two of these new tests verify
the btf's value_size (the change in this patch).

It also fixes two existing tests that wrongly encoded
a btf type size (pprint_test) and a value_type_id (in one
of the raw_tests[]).  However, those mistakes did not affect these two
BTF verification tests before or after this patch's changes.
Those two tests mainly failed at array creation time after
this patch.

Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap")
Suggested-by: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 22:13:59 +08:00
Martin KaFai Lau
7f82a95bf2 UPSTREAM: bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info
In "struct bpf_map_info", the names "btf_id", "btf_key_id" and "btf_value_id"
could cause confusion because the "id" in "btf_id" means the BPF obj id
given to the BTF object, while
"btf_key_id" and "btf_value_id" mean the BTF type id within
that BTF object.

To make it clear, btf_key_id and btf_value_id are
renamed to btf_key_type_id and btf_value_type_id.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 22:13:20 +08:00
Martin KaFai Lau
a469002199 UPSTREAM: bpf: btf: Add pretty print support to the basic arraymap
This patch adds pretty print support to the basic arraymap.
Support for other bpf maps can be added later.

This patch adds new attrs to the BPF_MAP_CREATE command to allow
specifying the btf_fd, btf_key_id and btf_value_id.  The
BPF_MAP_CREATE command can then associate the btf with the map if
the map being created supports BTF.

A BTF supported map needs to implement two new map ops,
map_seq_show_elem() and map_check_btf().  This patch has
implemented these new map ops for the basic arraymap.

It also adds file_operations, bpffs_map_fops, to the pinned
map such that the pinned map can be opened and read.
After that, the user has an intuitive way to do
"cat bpffs/pathto/a-pinned-map" instead of getting
an error.

bpffs_map_fops should not be extended further to support
other operations.  Other operations (e.g. write/key-lookup...)
should be realized by the userspace tools (e.g. bpftool) through
the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
Follow up patches will allow the userspace to obtain
the BTF from a map-fd.

Here is a sample output when reading a pinned arraymap
with the following map's value:

struct map_value {
	int count_a;
	int count_b;
};

cat /sys/fs/bpf/pinned_array_map:

0: {1,2}
1: {3,4}
2: {5,6}
...

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 22:12:58 +08:00
Jakub Kicinski
9dffd994d6 BACKPORT: bpf: arraymap: use bpf_map_init_from_attr()
Arraymap was not converted to use bpf_map_init_from_attr()
to avoid merge conflicts with emergency fixes.  Do it now.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 22:12:47 +08:00
Jakub Kicinski
0daa92a890 UPSTREAM: bpf: arraymap: move checks out of alloc function
Use the new callback to perform allocation checks for array maps.
The fd maps don't need a special allocation callback, they only
need a special check callback.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 22:12:47 +08:00
Yonghong Song
456de77985 BACKPORT: bpf: perf event change needed for subsequent bpf helpers
This patch does not impact existing functionalities.
It contains the changes in perf event area needed for
subsequent bpf_perf_event_read_value and
bpf_perf_prog_read_value helpers.

Change-Id: I066312fce9ebb0185b02ce6904e057d728473f90
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-10-02 22:12:32 +08:00
Tim Zimmermann
3996f04715 Squashed revert of BPF backports
Revert "Partially revert "fixup: add back code missed during BPF picking""

This reverts commit cc477455f73d317733850a9e4818dfd90be4d33d.

Revert "bpf: lpm_trie: check left child of last leftmost node for NULL"

This reverts commit e89007b7df49292c5ae52b3d165c0d815a61cd10.

Revert "BACKPORT: bpf: Fix out-of-bounds write in trie_get_next_key()"

This reverts commit a1c4f565bb00b05ab3734a64451c08b0b965ce42.

Revert "bpf: Fix exact match conditions in trie_get_next_key()"

This reverts commit 4356a64dad3d38372147457b3004930c6e2e9c51.

Revert "bpf: fix kernel page fault in lpm map trie_get_next_key"

This reverts commit df4649b5d6cb374edbb67e5a5ecbd102a2e6c897.

Revert "bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map"

This reverts commit fe6656a5d48df6144fe9929399c648957166edd0.

Revert "bpf: allow helpers to return PTR_TO_SOCK_COMMON"

This reverts commit b24d1ae9ccbf3ebe6f4baa50d2d48c03be02bc17.

Revert "bpf: implement lookup-free direct value access for maps"

This reverts commit de1959fcd3df0629380894d9c47ebb253c920ad1.

Revert "bpf: Add bpf_verifier_vlog() and bpf_verifier_log_needed()"

This reverts commit b777824607bd3eb8c9130f4639d97d15bcac9af5.

Revert "bpf: Don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE"

This reverts commit 4cfef728c1eac6cce34f4fff1fbab3e66dc430d9.

Revert "bpf: always allocate at least 16 bytes for setsockopt hook"

This reverts commit 59817f83c964c753e93a75128ecaad4eeaa769fc.

Revert "bpf, sockmap: convert to generic sk_msg interface"

This reverts commit fe4ef742e22924b21749de333211941d0205501e.

Revert "bpf: sockmap, convert bpf_compute_data_pointers to bpf_*_sk_skb"

This reverts commit d17c8c2c2f623e087d6c297de50c173a006e6e55.

Revert "bpf: sockmap: fix typos"

This reverts commit 07e31378d7795371cdbccce06b4125b27ffce536.

Revert "sockmap: convert refcnt to an atomic refcnt"

This reverts commit c1fa11ec9da5dc0e8cae4334c550264cff77eef9.

Revert "bpf: sockmap, add hash map support"

This reverts commit 3f43379c38e329e9a7d4b5a1640670de37ba317b.

Revert "bpf: sockmap, refactor sockmap routines to work with hashmap"

This reverts commit 41a2b6e925db031978eb2484835f60908de884d7.

Revert "bpf: implement getsockopt and setsockopt hooks"

This reverts commit 9526fe6ff3e06939c12bb781e0dda01a8f3017ec.

Revert "bpf: Introduce bpf sk local storage"

This reverts commit ffedc38a46ddaca40de672fafe78c45fbfae9839.

Revert "bpf: introduce BPF_F_LOCK flag"

This reverts commit e7f5758fbcb1674e17c645837f7bff3b1febbad5.

Revert "bpf: Introduce ARG_PTR_TO_{INT,LONG} arg types"

This reverts commit e29b4e3c2bdd3b5d0d34668836ae8e5115cb31af.

Revert "bpf/verifier: add ARG_PTR_TO_UNINIT_MAP_VALUE"

This reverts commit f25c66c27cd6a774fb73769d804f91e969dd5f7b.

Revert "bpf: allow map helpers access to map values directly"

This reverts commit 7af696635219d0c5cdf1a166bb7543cae9e50328.

Revert "bpf: add writable context for raw tracepoints"

This reverts commit a546d8f0433039cee0de6ce96d5d35c4033a7b98.

Revert "bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock"

This reverts commit 03093478c52e79c94791a04f8138d5c019119087.

Revert "bpf: Support socket lookup in CGROUP_SOCK_ADDR progs"

This reverts commit 8047013945361fbff0e449c8a212cb6fc93a5245.

Revert "bpf: Extend the sk_lookup() helper to XDP hookpoint."

This reverts commit 8315368983086e70ccc6f103d710903c63cca7df.

Revert "xdp: generic XDP handling of xdp_rxq_info"

This reverts commit 11d9514e6e6801941abf1c0485fd4ef53082d970.

Revert "xdp: move struct xdp_buff from filter.h to xdp.h"

This reverts commit a1795f54e4d99e02d5cb84a46fac0240cf29e206.

Revert "net: avoid including xdp.h in filter.h"

This reverts commit a39c59398f3ab64de44e5953ee0bd23c5136bb48.

Revert "xdp: base API for new XDP rx-queue info concept"

This reverts commit 49fb5bae77ab2041a2ad9f9f87ad7e0a6e215fdf.

Revert "net: Add asynchronous callbacks for xfrm on layer 2."

This reverts commit d0656f64d7719993d5634a9fc6600026e9a805ee.

Revert "xfrm: Separate ESP handling from segmentation for GRO packets."

This reverts commit c8afadf7f5ed8786652d307558345ef90ea91726.

Revert "net: move secpath_exist helper to sk_buff.h"

This reverts commit 0e5483057121dad47567b01845c656955e51989e.

Revert "sk_buff: add skb extension infrastructure"

This reverts commit 3a9ae74b075757495c4becf4dd1eec056d364801.

Revert "fixup: add back code missed during BPF picking"

This reverts commit 74ec8cef7051b5af72f2a6d83ca8c51c3c61c444.

Revert "bpf: undo prog rejection on read-only lock failure"

This reverts commit af2dc6e4993c4221603dbe6e81a3d0c8269f3171.

Revert "bpf: Add helper to retrieve socket in BPF"

This reverts commit 53495e3bc33cb46d9961ea122f576faded058aa1.

Revert "SQUASH! bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helpe"

This reverts commit 3b25fbf81c041af954d9f5ac1c7867eb07c40b07.

Revert "bpf: introduce bpf_spin_lock"

This reverts commit 0095fb54160e4f8b326fa8df103e334f90c5ab56.

Revert "bpf: enable cgroup local storage map pretty print with kind_flag"

This reverts commit 3fe92cb79b5eae557b113c37b03e78efee2280db.

Revert "bpf: btf: fix struct/union/fwd types with kind_flag"

This reverts commit 2bd4856277f459974dd6234a849cbe20fd475b8f.

Revert "bpf: add bpffs pretty print for cgroup local storage maps"

This reverts commit e07d8c8279f37cee8471846a63acc51f1ab7ce03.

Revert "bpf: pass struct btf pointer to the map_check_btf() callback"

This reverts commit 78a8140faf32710799c19495db28d71693c98030.

Revert "bpf: Define cgroup_bpf_enabled for CONFIG_CGROUP_BPF=n"

This reverts commit aada945d89950c67099e490af1c4c25eef7f31e6.

Revert "bpf: introduce per-cpu cgroup local storage"

This reverts commit d37432968663559f06c7fd7df44197a807fb84ca.

Revert "bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info"

This reverts commit 063c5a25e5f47e8b82b6c43a44ed7be851884abb.

Revert "bpf: fix a compilation error when CONFIG_BPF_SYSCALL is not defined"

This reverts commit bcf5bfaf50bb6f1f981d5c538f87e6da7aab78f2.

Revert "bpf: Create a new btf_name_by_offset() for non type name use case"

This reverts commit 52b4739d0bdd763e1b00feb50bef8a821f5c7570.

Revert "bpf: reject any prog that failed read-only lock"

This reverts commit 30d1bfec06a3bcaa773213113904580e3046a57a.

Revert "bpf: Add bpf_line_info support"

This reverts commit 50b094eeeb1ced32c62b3a10045bbf43126de760.

Revert "bpf: don't leave partial mangled prog in jit_subprogs error path"

This reverts commit a466f85be89f5daab4bd748f92915ea713d63934.

Revert "bpf: btf: support proper non-jit func info"

This reverts commit 492a556de94c502376ec3b0d5a724ec9fe9f6996.

Revert "bpf: Introduce bpf_func_info"

This reverts commit 39cade88686b0d9b7befc1f14e9d2c2cad19a769.

Revert "bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO"

This reverts commit 2010b6bacc271a48e74942506f3cf45268b6c264.

Revert "bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv"

This reverts commit a0ea14ac88a0f5529a635fc6e20277942fc6bb99.

Revert "bpf: Expose check_uarg_tail_zero()"

This reverts commit 1190aaae686534c2854838b3d642dac45d26b1f4.

Revert "bpf: Append prog->aux->name in bpf_get_prog_name()"

This reverts commit 8b82528df4a11a8501393c854978662fc218014e.

Revert "bpf: get JITed image lengths of functions via syscall"

This reverts commit 0722dbc626915fcb9acb952ebc1fcb0c4554cb07.

Revert "bpf: get kernel symbol addresses via syscall"

This reverts commit 6736ec7558dd262fef6669eec02a9797c7c4ecb7.

Revert "bpf: Add gpl_compatible flag to struct bpf_prog_info"

This reverts commit b60c7a51fd3692259c93413f3e87150078be1dac.

Revert "bpf: centre subprog information fields"

This reverts commit b5186fdf6f3e1bb38d7e4abfed5bf7dd6f85a6c3.

Revert "bpf: unify main prog and subprog"

This reverts commit e8e2ad5d9ae98bc7b85b99c0712a5dfbfc151a41.

Revert "bpf: fix maximum stack depth tracking logic"

This reverts commit 10c7127615dc2c00b724069a1620b2232d905113.

Revert "bpf, x64: fix memleak when not converging on calls"

This reverts commit 6bc867f718ef2656266f984b605151971026cc98.

Revert "bpf: decouple btf from seq bpf fs dump and enable more maps"

This reverts commit 3036e2c4384d3f43c695b88c8a1cf97b8337e3bd.

Revert "bpf: Add reference tracking to verifier"

This reverts commit 3a4900a188ac4de817dc6f114f01159d7bdd2f3e.

Revert "bpf: properly enforce index mask to prevent out-of-bounds speculation"

This reverts commit ef85925d5c07b46f7447487605da601fc7be026e.

Revert "bpf, verifier: detect misconfigured mem, size argument pair"

This reverts commit c3853ee3cb96833e907f18bf90e78040fe4cf06f.

Revert "bpf: introduce ARG_PTR_TO_MEM_OR_NULL"

This reverts commit 58560e13f545f2a079bbce17ac1b731d8b94fec7.

Revert "bpf: Macrofy stack state copy"

This reverts commit 88d98d8c2ae320ab248150eb86e1c89427e5017c.

Revert "bpf: Generalize ptr_or_null regs check"

This reverts commit d2cbc2e57b8624699a1548e67b7b3ce992b396fc.

Revert "bpf: Add iterator for spilled registers"

This reverts commit d956e1ba51a7e5ce86bb35002e26d4c1e0a2497c.

Revert "bpf/verifier: refine retval R0 state for bpf_get_stack helper"

This reverts commit ceaf6d678ccb60da107b0455da64c7bf90c5102d.

Revert "bpf: Remove struct bpf_verifier_env argument from print_bpf_insn"

This reverts commit 058fd54c07a289f9b506f2d2326434e411fa65fe.

Revert "bpf: annotate bpf_insn_print_t with __printf"

This reverts commit 9b07d2ccf07855d62446e274d817672713f15be4.

Revert "bpf: allow for correlation of maps and helpers in dump"

This reverts commit af690c2e2d177352f7270f77d8a6bc9e9f60c98c.

Revert "bpf: Add bpf_patch_call_args prototype to include/linux/bpf.h"

This reverts commit 8a2c588b3ab98916147fe4a449312ce8db70c471.

Revert "bpf: x64: add JIT support for multi-function programs"

This reverts commit 752f261e545f80942272c6becf82def1729f84be.

Revert "bpf: fix net.core.bpf_jit_enable race"

This reverts commit 4720901114c20204aa3ffa2076265d2c8cc9e81b.

Revert "bpf: add support for bpf_call to interpreter"

This reverts commit c79b2e547adc8e50dabc72244370cfd37ac6a6bd.

Revert "bpf: introduce function calls (verification)"

This reverts commit f779fda96c7d9e921525f48d67fa2e9c68b4bd48.

Revert "bpf: cleanup register_is_null()"

This reverts commit 1c81f751670b4feb3102e4de136e25fa24e303fe.

Revert "bpf: print liveness info to verifier log"

This reverts commit fdc851301b33b9d646bd1d37124cbd45cedcd62b.

Revert "bpf: also improve pattern matches for meta access"

This reverts commit 9aa150d07927b911f26e0db2af0efd6aa07b8707.

Revert "bpf: add meta pointer for direct access"

This reverts commit 94f3f502ef9ef150ed687113cfbd38e91b5edc44.

Revert "bpf: rename bpf_compute_data_end into bpf_compute_data_pointers"

This reverts commit 9573c6feb301346cd1493eea4e363c6d8345e899.

Revert "bpf: squash of log related commits"

This reverts commit b08f2111e030a72a92eec4ebd6201165d03a20b8.

Revert "bpf: move instruction printing into a separate file"

This reverts commit 8fcbd39afb58847914f3f84d9c076000e09d2fb9.

Revert "bpf: btf: Introduce BTF ID"

This reverts commit 423c40d67dfc783c3b0cb227d9da53e725e0f35c.

Revert "bpf: btf: Add pretty print support to the basic arraymap"

This reverts commit 6cd4d5bba662ca0d8980e5806ef37e0341eab929.

Revert "nsfs: clean-up ns_get_path() signature to return int"

This reverts commit ec1ce41701f411c5dee396cec2931fb651f447cc.

Revert "bpf_obj_do_pin(): switch to vfs_mkobj(), quit abusing ->mknod()"

This reverts commit 8fbcb4ebf5a751f4685cdd2757cff2264032a5d9.

Revert "bpf: offload: report device information about offloaded maps"

This reverts commit 1105e63f25a9db675671288b583a5ce2c7d10b1f.

Revert "bpf: offload: add map offload infrastructure"

This reverts commit 20cdf9df3d5bd010d799ea3c80219f625c998307.

Revert "bpf: add map_alloc_check callback"

This reverts commit 6feb4121ea083053ac9587ac426195efe9fb143d.

Revert "bpf: offload: factor out netdev checking at allocation time"

This reverts commit 1425fb5676b8fe9d761f2f6545e4be8880ce0ac8.

Revert "bpf: rename bpf_dev_offload -> bpf_prog_offload"

This reverts commit a03ae0ec508200433fd6c35b87e342df4de0b320.

Revert "bpf: offload: allow netdev to disappear while verifier is running"

This reverts commit f6cf7214fd1ff3a018009ba90c33eac1d8de21de.

Revert "bpf: offload: free program id when device disappears"

This reverts commit b12b5e56b799cfe900ab8f0ee4177c6c08a904c6.

Revert "bpf: offload: report device information for offloaded programs"

This reverts commit c73c9a0ffa332eeb49927a48780f5537597e2d42.

Revert "bpf: offload: don't require rtnl for dev list manipulation"

This reverts commit 1993f08662f07581a370899a2da209ba0c996dbb.

Revert "bpf: offload: ignore namespace moves"

This reverts commit 9fefb21d8aa2691019f9c4f0b8025fb45ba60b49.

Revert "bpf: Add PTR_TO_SOCKET verifier type"

This reverts commit 55fdbc844801cd4007237fa6c5842b46985a5c9a.

Revert "bpf: extend cgroup bpf core to allow multiple cgroup storage types"

This reverts commit a6d82e371ef32fb24d493cff32765b4607581dd4.

Revert "bpf: permit CGROUP_DEVICE programs accessing helper bpf_get_current_cgroup_id()"

This reverts commit 1bfd0a07a8317004a89d6de736e24861db8281b5.

Revert "bpf: implement bpf_get_current_cgroup_id() helper"

This reverts commit 23603ed6d7df86392701a7ea7d9a1dba66f28d4b.

Revert "bpf: introduce the bpf_get_local_storage() helper function"

This reverts commit 3d777256b1c9f34975c5230d836023ea3e0d4cfd.

Revert "bpf/verifier: introduce BPF_PTR_TO_MAP_VALUE"

This reverts commit 93c12733dc97984f7bf57a77160eacc480bfc3de.

Revert "bpf: extend bpf_prog_array to store pointers to the cgroup storage"

This reverts commit b26baff1fb34607938c9ac0e421e3f4b5fedad4d.

Revert "BACKPORT: bpf: allocate cgroup storage entries on attaching bpf programs"

This reverts commit 804605c21a3be3277c0031504dcd3fdd1be64290.

Revert "bpf: include errno.h from bpf-cgroup.h"

This reverts commit 6b4df332b357e9a5942ca4c6f985cd33dfc30e25.

Revert "bpf: pass a pointer to a cgroup storage using pcpu variable"

This reverts commit c8af92dc9fc00e49f06f6997969284ef5e5c5af5.

Revert "bpf: introduce cgroup storage maps"

This reverts commit c61c2271cb8a1e47678bddc8cdfae83035a07fec.

Revert "bpf: add ability to charge bpf maps memory dynamically"

This reverts commit 3a430745e9f675b450477fffead5568046432f29.

Revert "bpf: add helper for copying attrs to struct bpf_map"

This reverts commit 6d7be0ae93371692e564c00003ce184cbaefbb8d.

Revert "bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP"

This reverts commit 15f584d2d3d4814cfbd3059ab810db02af8773a0.

Revert "bpf/tracing: fix a deadlock in perf_event_detach_bpf_prog"

This reverts commit fc9bf5e48985f7c3a39bf34a27477a2607a5dc6d.

Revert "bpf: set maximum number of attached progs to 64 for a single perf tp"

This reverts commit 0d5fc9795d824fbca21b81c8d91748ba21313d4c.

Revert "bpf: avoid rcu_dereference inside bpf_event_mutex lock region"

This reverts commit 948e200e3173dd959de907e326f2a2c90eda4b28.

Revert "bpf: fix bpf_prog_array_copy_to_user() issues"

This reverts commit 66811698b8de9b3cf13c09730d287b6d1d5d3699.

Revert "bpf: fix pointer offsets in context for 32 bit"

This reverts commit 99661813c136c52e56b328a2a8ecd2bc0e187eba.

Revert "BACKPORT: bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data"

This reverts commit 36f0ea00dd121b13f80617e5b2eb93ba160df85a.

Revert "BACKPORT: bpf: Sysctl hook"

This reverts commit 4a543990e03b5de4a2c23777abd0f77afd61cc2d.

Revert "BACKPORT: flow_dissector: implements flow dissector BPF hook"

This reverts commit de610a8a4324170a0deaf12e2e64c2ff068785fb.

Revert "BACKPORT: bpf: Add base proto function for cgroup-bpf programs"

This reverts commit f3ac0a6cbec3472ff2e3808a436891881f3cbf87.

Revert "FROMLIST: [net-next,v2,1/2] bpf: Allow CGROUP_SKB eBPF program to access sk_buff"

This reverts commit 6d4dcc0e3de628003d91075e4b1ab1a128b8892e.

Revert "BACKPORT: bpf: introduce BPF_RAW_TRACEPOINT"

This reverts commit b2a5c6b4958c8250e58ddb6c334018a5f7ee5437.

Revert "bpf/tracing: fix kernel/events/core.c compilation error"

This reverts commit 70249d4eb7359e9dc59e044951beb99d0d8725cd.

Revert "BACKPORT: bpf/tracing: allow user space to query prog array on the same tp"

This reverts commit 08a6d8c01372940bfec78fdc6cb8a47e08c745b0.

Revert "bpf: sockmap, add sock close() hook to remove socks"

This reverts commit e6b363b8d09d9740dff309fb4dc88e7a1e90726b.

Revert "BACKPORT: bpf: remove the verifier ops from program structure"

This reverts commit 94c2f61efa741bf6a97415f42cfbfb9ec83dfd8e.

Revert "bpf, cgroup: implement eBPF-based device controller for cgroup v2"

This reverts commit 22faa9c56550a34488e607ca3aca59c68b1f7938.

Revert "BACKPORT: bpf: split verifier and program ops"

This reverts commit d2b1388504c1129d5756bb9b20af9bd64e75d015.

Revert "bpf: btf: Break up btf_type_is_void()"

This reverts commit 052989c47b68feaf381d371ec1e6a169edc26d30.

Revert "bpf: btf: refactor btf_int_bits_seq_show()"

This reverts commit 8cc3fb30656cfab91205194a8ee7661bdd95e005.

Revert "BACKPORT: bpf: fix unconnected udp hooks"

This reverts commit b108e725aa70e39cfd37296d1a1d31e8896fa7b7.

Revert "BACKPORT: bpf: enforce return code for cgroup-bpf programs"

This reverts commit 10215080915bfbdaa9f666a95ffda02cc1ef7a29.

Revert "bpf: Hooks for sys_sendmsg"

This reverts commit cd847db1be8a37e0e7e9c813b5d8f93697dc5af0.

Revert "BACKPORT: devmap: Allow map lookups from eBPF"

This reverts commit 37da95fde647e8967b362e0769136bfbebc03628.

Revert "BACKPORT: xdp: Add devmap_hash map type for looking up devices by hashed index"

This reverts commit ae6a87f44c4ef20ac290ce68c4d5b542cf46f3d7.

Revert "kernel: bpf: devmap: Create __dev_map_alloc_node"

This reverts commit 15928a97ed93cf9f606a21bf869ff421b997a2c5.

Revert "BACKPORT: bpf: Post-hooks for sys_bind"

This reverts commit c221d44e76c3ab69285c9986680e5eb726cf157b.

Revert "BACKPORT: bpf: Hooks for sys_connect"

This reverts commit 003311ea43163c77e4e0c1921b81438286925baa.

Revert "BACKPORT: net: Introduce __inet_bind() and __inet6_bind"

This reverts commit 74f1eb60012c13bd606e4dc718e63aec7f8cce8f.

Revert "BACKPORT: bpf: Hooks for sys_bind"

This reverts commit cef0bd97f2fec8363c3ef58b2cb508deaa9bc5b2.

Revert "BACKPORT: bpf: introduce BPF_PROG_QUERY command"

This reverts commit a4ef81ce48cb25843ddb4d4331dacf2742215909.

Revert "BACKPORT: bpf: Check attach type at prog load time"

This reverts commit 750a3f976c75797e572a6dfdd2e8865b8b49964a.

Revert "bpf: offload: rename the ifindex field"

This reverts commit 921e6becfb28fbe505603bf927f195d1d72a0eea.

Revert "BACKPORT: bpf: offload: add infrastructure for loading programs for a specific netdev"

This reverts commit cb1607a58d026a4ac1d9e71f6c3cd1dc23820e2f.

Revert "BACKPORT: net: bpf: rename ndo_xdp to ndo_bpf"

This reverts commit 932d47ebc5910bb1ec954002206b1ce8749a9cd6.

Revert "bpf: btf: fix truncated last_member_type_id in btf_struct_resolve"

This reverts commit e7af669fe00a8e2030913088836189a9f65a04d8.

Revert "bpf/btf: Fix BTF verification of enum members in struct/union"

This reverts commit a098516b98fe35e8f0e89709443fff8b37eb04b8.

Revert "bpf: fix BTF limits"

This reverts commit 794ad07fab9540989f96351c11b039e2229c2a8e.

Revert "bpf, btf: fix a missing check bug in btf_parse"

This reverts commit 27c4178ecc8edbb2306fa479f275ffd35f5b57c9.

Revert "bpf: btf: Fix a missing check bug"

This reverts commit 71f5a7d140aa5a37d164e217b2fefcb2d409b894.

Revert "bpf: btf: Fix end boundary calculation for type section"

This reverts commit 549615befd671b6877677acb009b66cd374408d3.

Revert "bpf: fix bpf_skb_load_bytes_relative pkt length check"

This reverts commit 5f3d68c4da18dfbcde4c02cb34c63599709fcf3c.

Revert "bpf: btf: Ensure the member->offset is in the right order"

This reverts commit 4f9d26cbc747a4728c4944b7dc9725fc2737f892.

Revert "bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h"

This reverts commit 480c6f80a14431f6d680a687363dcb0d9cd1d7a8.

Revert "bpf: btf: Fix bitfield extraction for big endian"

This reverts commit 0463c259aa21e99d1bf798c8cf54da18b5906938.

Revert "bpf: btf: Ensure t->type == 0 for BTF_KIND_FWD"

This reverts commit ecc54be6970a3484eb163ac09996856c9ece5727.

Revert "bpf: btf: Check array t->size"

This reverts commit 3cda848b9be9fbb6dfa8912a425801c263bcbff7.

Revert "bpf: btf: avoid -Wreturn-type warning"

This reverts commit fd7fede5952004dcacb39f318249c4cf8e5c51e0.

Revert "bpf: btf: Avoid variable length array"

This reverts commit 2826641eb171c705d0b2db86d8834eff33945d0e.

Revert "bpf: btf: Remove unused bits from uapi/linux/btf.h"

This reverts commit 2d9e7a574f7e47a027974ec616ac812ad6a2d086.

Revert "bpf: btf: Check array->index_type"

This reverts commit f9ee68f7e8a471450536a70b43bd96d4bdfbfb81.

Revert "bpf: btf: Change how section is supported in btf_header"

This reverts commit 63a4474da4bf56c8a700d542bcf3a57a4b737ed6.

Revert "bpf: Fix compiler warning on info.map_ids for 32bit platform"

This reverts commit a4f706ea7d2b874ef739168a12a30ae5454487a6.

Revert "BACKPORT: bpf: Use char in prog and map name"

This reverts commit 8d4ad88eabb5d1500814c5f5b76a11f80346669c.

Revert "bpf: Change bpf_obj_name_cpy() to better ensure map's name is init by 0"

This reverts commit c4acfd3c9f5a97123c240676750f3e4ae2a2c24c.

Revert "BACKPORT: bpf: Add map_name to bpf_map_info"

This reverts commit 0e03a4e584eabe3f4c448f06f271753cdaae3aab.

Revert "BACKPORT: bpf: Add name, load_time, uid and map_ids to bpf_prog_info"

This reverts commit 16872f60e6c1fc6b10e905ff18c14d8aaeb4e09d.

Revert "bpf: btf: Avoid WARN_ON when CONFIG_REFCOUNT_FULL=y"

This reverts commit 0b618ec6e162e650aaa583a31f4de4c4558148bf.

Revert "BACKPORT: bpf: btf: Clean up btf.h in uapi"

This reverts commit ea0c0ad08c18ddf62dbb6c8edc814c75cbb3e8b9.

Revert "bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd"

This reverts commit f51fe1d1edb742176c622bc93301e98a1cbf2e63.

Revert "BACKPORT: bpf: btf: Add BPF_BTF_LOAD command"

This reverts commit 85db8f764069f15d1b181bea67336ce4d66a58c1.

Revert "bpf: btf: Add pretty print capability for data with BTF type info"

This reverts commit 0a8aae433c53b1f441cab70979517660fb6a6038.

Revert "bpf: btf: Check members of struct/union"

This reverts commit ce2e8103ac1a977ce32db51ec042faea6f100a3d.

Revert "bpf: btf: Validate type reference"

This reverts commit a1aa96e6dae2b4c8c0b0a4dedab3006d3f697460.

Revert "bpf: Update logging functions to work with BTF"

This reverts commit b9289460f0a6b5c261ec0b6dcafa6fcd09d4957e.

Revert "BACKPORT: bpf: btf: Introduce BPF Type Format (BTF)"

This reverts commit ceebd58f6470e8ec6d9d694ab382fe88f43b998b.

Revert "BACKPORT: bpf: Rename bpf_verifer_log"

This reverts commit 50bdc7513d966811fb418d24a0e5797ffd8c907c.

Revert "BACKPORT: bpf: encapsulate verifier log state into a structure"

This reverts commit 0bcb397bde4675fdeb977d9debed20ed213f9ecd.

Change-Id: Iecaa276b078c6d2db773a8071e7da9e6195277d6
2025-10-02 22:12:00 +08:00
Daniel Borkmann
3e0f9ad71f bpf: implement lookup-free direct value access for maps
This generic extension to BPF maps allows for directly loading
an address residing inside a BPF map value via a single BPF
ldimm64 instruction.

The idea is similar to what BPF_PSEUDO_MAP_FD does today: it is
a special src_reg flag for the ldimm64 instruction indicating
that the first part of the double insn's imm field holds a file
descriptor, which the verifier then replaces with the full 64-bit
address of the map across both imm parts. For the newly added
BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following:
the first part of the double insn's imm field is again a file
descriptor corresponding to the map, and the second part of the
imm field is an offset into the value. The verifier then replaces
both imm parts with an address that points into the BPF map value
at the given offset, for maps that support this operation.
Currently supported is the array map with a single entry.
It is possible to support more than just a single map element by
reusing both 16-bit off fields of the insns as a map index, so a
full array map lookup could be expressed that way. It hasn't
been implemented here due to the lack of a concrete use case, but
it could easily be done in the future in a compatible way, since
both off fields currently have to be 0 and would then correctly
denote map index 0.

BPF_PSEUDO_MAP_VALUE is a distinct flag because with
BPF_PSEUDO_MAP_FD alone we could not distinguish, at offset 0,
a load of the map pointer from a load of the map's value at
offset 0. Changing BPF_PSEUDO_MAP_FD's encoding to an off-by-one
scheme to tell a regular map pointer apart from a map value
pointer would add unnecessary complexity and raise the barrier
for debuggability, making it less suitable. Using the second part
of the imm field as an offset into the value does /not/ impose a
limitation, since the maximum possible value size fits within the
u32 universe anyway.
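The two-part ldimm64 encoding described above can be sketched in plain userspace C. The struct below is a simplified stand-in for the kernel's struct bpf_insn, and the pseudo-flag values mirror the uapi constants (BPF_PSEUDO_MAP_FD = 1, BPF_PSEUDO_MAP_VALUE = 2); everything else here is illustrative, not the kernel's code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct bpf_insn layout. */
struct insn {
    uint8_t  code;
    uint8_t  dst_reg:4;
    uint8_t  src_reg:4;
    int16_t  off;
    int32_t  imm;
};

#define LD_IMM64          0x18  /* BPF_LD | BPF_DW | BPF_IMM */
#define PSEUDO_MAP_FD     1     /* imm = map fd, replaced by map address */
#define PSEUDO_MAP_VALUE  2     /* imm = map fd, second imm = value offset */

/* Build the two-insn ldimm64 pair: the first imm carries the map fd,
 * the second imm carries the offset into the map value. */
static void ld_map_value(struct insn pair[2], int map_fd, int32_t value_off)
{
    memset(pair, 0, 2 * sizeof(*pair));
    pair[0].code = LD_IMM64;
    pair[0].src_reg = PSEUDO_MAP_VALUE;
    pair[0].imm = map_fd;
    pair[1].imm = value_off;  /* second half of the 64-bit imm */
    /* both off fields stay 0, leaving room for a future map-index
     * extension as described above */
}
```

The verifier would then rewrite both imm parts with the resolved 64-bit address of the value plus the offset.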

This optimization allows for efficiently retrieving an address
to a map value memory area without having to issue a helper call
which needs to prepare registers according to calling convention,
etc, without needing the extra NULL test, and without having to
add the offset in an additional instruction to the value base
pointer. The verifier then treats the destination register as
PTR_TO_MAP_VALUE with constant reg->off from the user passed
offset from the second imm field, and guarantees that this is
within bounds of the map value. Any subsequent operations are
normally treated as typical map value handling without anything
extra needed from verification side.

The two map operations for direct value access have been added to
array map for now. In future other types could be supported as
well depending on the use case. The main use case for this commit
is to allow for BPF loader support for global variables that
reside in .data/.rodata/.bss sections such that we can directly
load the address of them with minimal additional infrastructure
required. Loader support has been added in subsequent commits for
libbpf library.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-17 16:58:07 +08:00
Alexei Starovoitov
b795918c20 bpf: introduce BPF_F_LOCK flag
Introduce the BPF_F_LOCK flag for the map_lookup and map_update syscall
commands and for the map_update() helper function.
In all these cases, take the lock of the existing element (which was
provided in the BTF description) before copying the rest of the map
value in or out.

Implementation details that are part of uapi:

Array:
The array map takes the element lock for lookup/update.

Hash:
The hash map also takes the lock for lookup/update and tries to avoid the bucket lock.
If an old element exists, it takes the element lock and updates the element in place.
If the element doesn't exist, it allocates a new one and inserts it into the hash
table while holding the bucket lock.
In rare cases the hashmap has to take both the bucket lock and the element lock
to update an old value in place.

Cgroup local storage:
It is similar to the array map: update-in-place and lookup are done with the lock taken.

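The copy protocol above can be modeled in userspace with a plain mutex standing in for the element's embedded bpf_spin_lock; this is a toy sketch of the idea (take the element's lock, copy the rest of the value, release), not the kernel implementation:

```c
#include <pthread.h>
#include <string.h>

/* Toy model of a map element whose value embeds its own lock,
 * as required for BPF_F_LOCK; pthread_mutex_t stands in for
 * struct bpf_spin_lock. */
struct elem {
    pthread_mutex_t lock;
    int payload[4];
};

/* BPF_F_LOCK update: copy the new value in under the element lock. */
static void locked_update(struct elem *e, const int *src)
{
    pthread_mutex_lock(&e->lock);
    memcpy(e->payload, src, sizeof(e->payload));
    pthread_mutex_unlock(&e->lock);
}

/* BPF_F_LOCK lookup: copy the value out under the element lock,
 * so the reader never sees a torn value. */
static void locked_lookup(struct elem *e, int *dst)
{
    pthread_mutex_lock(&e->lock);
    memcpy(dst, e->payload, sizeof(e->payload));
    pthread_mutex_unlock(&e->lock);
}
```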
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-09-17 16:58:05 +08:00
Alexei Starovoitov
0b7048f2ba bpf: introduce bpf_spin_lock
Introduce 'struct bpf_spin_lock' and the bpf_spin_lock/unlock() helpers to let
bpf programs serialize access to other variables.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
if (val) {
    bpf_spin_lock(&val->lock);
    val->cnt++;
    bpf_spin_unlock(&val->lock);
}

Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
  cause deadlocks.
- only one 'struct bpf_spin_lock' is allowed per map element.
  It drastically simplifies implementation yet allows bpf program to use
  any number of bpf_spin_locks.
- when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
  bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
  a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
  Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
  bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
  insufficient preemption checks.

Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
  Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
  Other solutions to avoid nested bpf_spin_lock are possible.
  Like making sure that all networking progs run with softirq disabled.
  spin_lock_irqsave is the simplest and doesn't add overhead to the
  programs that don't use it.
- arch_spinlock_t is used when it is implemented as a queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
  sizeof(arch_spinlock_t) != sizeof(__u32), a trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
  extra flag during map_create, but specifying it via BTF is cleaner.
  It provides introspection for map key/value and reduces user mistakes.

Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
  to request kernel to grab bpf_spin_lock before rewriting the value.
  That will serialize access to map elements.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-09-17 16:58:03 +08:00
Roman Gushchin
382aa77db5 bpf: pass struct btf pointer to the map_check_btf() callback
If key_type or value_type is a non-trivial data type
(e.g. a structure or a typedef), it's not possible to check it without
additional information, which can't be obtained without a pointer
to the btf structure.

So, let's pass btf pointer to the map_check_btf() callbacks.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-09-17 16:58:02 +08:00
Martin KaFai Lau
fa392d4082 bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info
In "struct bpf_map_info", the names "btf_id", "btf_key_id" and "btf_value_id"
could cause confusion, because the "id" in "btf_id" is the BPF obj id
given to the BTF object, while
"btf_key_id" and "btf_value_id" are BTF type ids within
that BTF object.

To make it clear, btf_key_id and btf_value_id are
renamed to btf_key_type_id and btf_value_type_id.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-09-17 16:58:01 +08:00
Daniel Borkmann
ebe70c600b bpf: decouple btf from seq bpf fs dump and enable more maps
Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
the basic arraymap") and 699c86d6ec21 ("bpf: btf: add pretty
print for hash/lru_hash maps") enabled support for BTF and
dumping via BPF fs for array and hash/lru map. However, both
can be decoupled from each other such that regular BPF maps
can be supported for attaching BTF key/value information,
while not all maps necessarily need to dump via map_seq_show_elem()
callback.

The basic sanity check, which is a prerequisite for all maps,
is that the key/value sizes have to match in any case; some maps
can add extra checks via the map_check_btf() callback, e.g.
probing certain types or indicating no support in general. With
that, we can also enable retrieving BTF info for per-cpu map
types and lpm.
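The prerequisite size check can be sketched as follows; the function name and parameters are illustrative, assuming the BTF type sizes have already been resolved, and per-map extra checks would still go through map_check_btf():

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical sketch of the basic sanity check described above:
 * a map can only attach BTF key/value info if the map's declared
 * key/value sizes match the sizes derived from the BTF types. */
static int check_btf_sizes(uint32_t key_size, uint32_t value_size,
                           uint32_t btf_key_size, uint32_t btf_value_size)
{
    if (key_size != btf_key_size || value_size != btf_value_size)
        return -EINVAL;  /* mismatch: reject the BTF association */
    return 0;            /* ok; map-specific checks may still apply */
}
```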

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
2025-09-17 16:57:58 +08:00
Martin KaFai Lau
adceed2a9c bpf: btf: Add pretty print support to the basic arraymap
This patch adds pretty print support to the basic arraymap.
Support for other bpf maps can be added later.

This patch adds new attrs to the BPF_MAP_CREATE command to allow
specifying the btf_fd, btf_key_id and btf_value_id.  BPF_MAP_CREATE
can then associate the btf with the map if the map being created
supports BTF.

A BTF supported map needs to implement two new map ops,
map_seq_show_elem() and map_check_btf().  This patch has
implemented these new map ops for the basic arraymap.

It also adds file_operations, bpffs_map_fops, to the pinned
map such that the pinned map can be opened and read.
After that, the user has an intuitive way to do
"cat bpffs/pathto/a-pinned-map" instead of getting
an error.

bpffs_map_fops should not be extended further to support
other operations.  Other operations (e.g. write/key-lookup...)
should be realized by the userspace tools (e.g. bpftool) through
the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
Follow up patches will allow the userspace to obtain
the BTF from a map-fd.

Here is a sample output when reading a pinned arraymap
with the following map's value:

struct map_value {
	int count_a;
	int count_b;
};

cat /sys/fs/bpf/pinned_array_map:

0: {1,2}
1: {3,4}
2: {5,6}
...
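The per-element lines in the sample output above can be reproduced with a toy userspace formatter; in the kernel the real formatting is done by the arraymap's map_seq_show_elem() using the BTF type info, so the helper below is only an illustration:

```c
#include <stdio.h>

/* The value type from the sample above. */
struct map_value {
    int count_a;
    int count_b;
};

/* Toy formatter producing one "index: {a,b}" line per element,
 * matching the bpffs sample output shown above. */
static int show_elem(char *buf, size_t n, int idx,
                     const struct map_value *v)
{
    return snprintf(buf, n, "%d: {%d,%d}", idx, v->count_a, v->count_b);
}
```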

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-09-17 16:57:55 +08:00
Hou Tao
ffb6438211 bpf: Add map and need_defer parameters to .map_fd_put_ptr()
[ Upstream commit 20c20bd11a0702ce4dc9300c3da58acf551d9725 ]

map is a pointer to the outer map, and need_defer needs some explanation.
need_defer tells the implementation to defer releasing the reference on
the passed element and to ensure that the element stays alive until any
bpf program that may manipulate it has exited.

The following three cases will invoke map_fd_put_ptr() and different
need_defer values will be passed to these callers:

1) release the reference of the old element in the map during map update
   or map deletion. The release must be deferred, otherwise the bpf
   program may incur a use-after-free problem, so need_defer needs to be
   true.
2) release the reference of the to-be-added element in the error path of
   map update. The to-be-added element is not visible to any bpf
   program, so it is OK to pass false for the need_defer parameter.
3) release the references of all elements in the map during map release.
   Any bpf program which has access to the map must have exited and been
   released, so need_defer=false will be OK.

These two parameters will be used by the following patches to fix the
potential use-after-free problem for map-in-map.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5aa1e7d3f6d0db96c7139677d9e898bbbd6a7dcf)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Change-Id: Ifb87b3a6a590d0deab4aaa4cf5510753a42ef9ce
2024-07-31 15:16:01 +02:00
Greg Kroah-Hartman
f9cf23e1ff Merge 4.14.79 into android-4.14-p
Changes in 4.14.79
	xfrm: Validate address prefix lengths in the xfrm selector.
	xfrm6: call kfree_skb when skb is toobig
	xfrm: reset transport header back to network header after all input transforms have been applied
	xfrm: reset crypto_done when iterating over multiple input xfrms
	mac80211: Always report TX status
	cfg80211: reg: Init wiphy_idx in regulatory_hint_core()
	mac80211: fix pending queue hang due to TX_DROP
	cfg80211: Address some corner cases in scan result channel updating
	mac80211: TDLS: fix skb queue/priority assignment
	mac80211: fix TX status reporting for ieee80211s
	xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
	ARM: 8799/1: mm: fix pci_ioremap_io() offset check
	xfrm: validate template mode
	netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev
	arm64: hugetlb: Fix handling of young ptes
	ARM: dts: BCM63xx: Fix incorrect interrupt specifiers
	net: macb: Clean 64b dma addresses if they are not detected
	soc: fsl: qbman: qman: avoid allocating from non existing gen_pool
	soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift()
	nl80211: Fix possible Spectre-v1 for NL80211_TXRATE_HT
	mac80211_hwsim: do not omit multicast announce of first added radio
	Bluetooth: SMP: fix crash in unpairing
	pxa168fb: prepare the clock
	qed: Avoid implicit enum conversion in qed_set_tunn_cls_info
	qed: Fix mask parameter in qed_vf_prep_tunn_req_tlv
	qed: Avoid implicit enum conversion in qed_roce_mode_to_flavor
	qed: Avoid constant logical operation warning in qed_vf_pf_acquire
	qed: Avoid implicit enum conversion in qed_iwarp_parse_rx_pkt
	nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds
	asix: Check for supported Wake-on-LAN modes
	ax88179_178a: Check for supported Wake-on-LAN modes
	lan78xx: Check for supported Wake-on-LAN modes
	sr9800: Check for supported Wake-on-LAN modes
	r8152: Check for supported Wake-on-LAN Modes
	smsc75xx: Check for Wake-on-LAN modes
	smsc95xx: Check for Wake-on-LAN modes
	cfg80211: fix use-after-free in reg_process_hint()
	perf/core: Fix perf_pmu_unregister() locking
	perf/ring_buffer: Prevent concurent ring buffer access
	perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
	perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events
	net: fec: fix rare tx timeout
	declance: Fix continuation with the adapter identification message
	net: qualcomm: rmnet: Skip processing loopback packets
	locking/ww_mutex: Fix runtime warning in the WW mutex selftest
	be2net: don't flip hw_features when VXLANs are added/deleted
	net: cxgb3_main: fix a missing-check bug
	yam: fix a missing-check bug
	ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()
	iwlwifi: mvm: check for short GI only for OFDM
	iwlwifi: dbg: allow wrt collection before ALIVE
	iwlwifi: fix the ALIVE notification layout
	tools/testing/nvdimm: unit test clear-error commands
	usbip: vhci_hcd: update 'status' file header and format
	scsi: aacraid: address UBSAN warning regression
	IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
	IB/rxe: put the pool on allocation failure
	s390/qeth: fix error handling in adapter command callbacks
	net/mlx5: Fix mlx5_get_vector_affinity function
	powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=n
	dm integrity: fail early if required HMAC key is not available
	net: phy: realtek: Use the dummy stubs for MMD register access for rtl8211b
	net: phy: Add general dummy stubs for MMD register access
	net/mlx5e: Refine ets validation function
	scsi: qla2xxx: Avoid double completion of abort command
	kbuild: set no-integrated-as before incl. arch Makefile
	IB/mlx5: Avoid passing an invalid QP type to firmware
	ARM: tegra: Fix ULPI regression on Tegra20
	l2tp: remove configurable payload offset
	cifs: Use ULL suffix for 64-bit constant
	test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches
	KVM: x86: Update the exit_qualification access bits while walking an address
	sparc64: Fix regression in pmdp_invalidate().
	tpm: move the delay_msec increment after sleep in tpm_transmit()
	bpf: sockmap, map_release does not hold refcnt for pinned maps
	tpm: tpm_crb: relinquish locality on error path.
	xen-netfront: Update features after registering netdev
	xen-netfront: Fix mismatched rtnl_unlock
	IB/usnic: Update with bug fixes from core code
	mmc: dw_mmc-rockchip: correct property names in debug
	MIPS: Workaround GCC __builtin_unreachable reordering bug
	lan78xx: Don't reset the interface on open
	enic: do not overwrite error code
	iio: buffer: fix the function signature to match implementation
	selftests/powerpc: Add ptrace hw breakpoint test
	scsi: ibmvfc: Avoid unnecessary port relogin
	scsi: sd: Remember that READ CAPACITY(16) succeeded
	btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
	net: phy: phylink: Don't release NULL GPIO
	x86/paravirt: Fix some warning messages
	net: stmmac: mark PM functions as __maybe_unused
	kconfig: fix the rule of mainmenu_stmt symbol
	libertas: call into generic suspend code before turning off power
	perf tests: Fix indexing when invoking subtests
	compiler.h: Allow arch-specific asm/compiler.h
	ARM: dts: imx53-qsb: disable 1.2GHz OPP
	perf python: Use -Wno-redundant-decls to build with PYTHON=python3
	rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window()
	rxrpc: Only take the rwind and mtu values from latest ACK
	rxrpc: Fix connection-level abort handling
	net: ena: fix warning in rmmod caused by double iounmap
	net: ena: fix NULL dereference due to untimely napi initialization
	selftests: rtnetlink.sh explicitly requires bash.
	fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters()
	sch_netem: restore skb->dev after dequeuing from the rbtree
	mtd: spi-nor: Add support for is25wp series chips
	kvm: x86: fix WARN due to uninitialized guest FPU state
	ARM: dts: r8a7790: Correct critical CPU temperature
	media: uvcvideo: Fix driver reference counting
	ALSA: usx2y: Fix invalid stream URBs
	Revert "netfilter: ipv6: nf_defrag: drop skb dst before queueing"
	perf tools: Disable parallelism for 'make clean'
	drm/i915/gvt: fix memory leak of a cmd_entry struct on error exit path
	bridge: do not add port to router list when receives query with source 0.0.0.0
	net: bridge: remove ipv6 zero address check in mcast queries
	ipv6: mcast: fix a use-after-free in inet6_mc_check
	ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called
	llc: set SOCK_RCU_FREE in llc_sap_add_socket()
	net: fec: don't dump RX FIFO register when not available
	net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs
	net: sched: gred: pass the right attribute to gred_change_table_def()
	net: socket: fix a missing-check bug
	net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
	net: udp: fix handling of CHECKSUM_COMPLETE packets
	r8169: fix NAPI handling under high load
	sctp: fix race on sctp_id2asoc
	udp6: fix encap return code for resubmitting
	vhost: Fix Spectre V1 vulnerability
	virtio_net: avoid using netif_tx_disable() for serializing tx routine
	ethtool: fix a privilege escalation bug
	bonding: fix length of actor system
	ip6_tunnel: Fix encapsulation layout
	openvswitch: Fix push/pop ethernet validation
	net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault type
	net: sched: Fix for duplicate class dump
	net: drop skb on failure in ip_check_defrag()
	net: fix pskb_trim_rcsum_slow() with odd trim offset
	net/mlx5e: fix csum adjustments caused by RXFCS
	rtnetlink: Disallow FDB configuration for non-Ethernet device
	net: ipmr: fix unresolved entry dumps
	net: bcmgenet: Poll internal PHY for GENETv5
	net/sched: cls_api: add missing validation of netlink attributes
	net/mlx5: Fix build break when CONFIG_SMP=n
	Linux 4.14.79

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-11-08 07:44:15 -08:00
John Fastabend
3c0cff34e9 bpf: sockmap, map_release does not hold refcnt for pinned maps
[ Upstream commit ba6b8de423f8d0dee48d6030288ed81c03ddf9f0 ]

Relying on map_release hook to decrement the reference counts when a
map is removed only works if the map is not being pinned. In the
pinned case the ref is decremented immediately and the BPF programs
released. After this, the BPF programs may no longer be in use, which
is not what the user would expect.

This patch moves the release logic into bpf_map_put_uref() and brings
sockmap in-line with how a similar case is handled in prog array maps.

Fixes: 3d9e952697de ("bpf: sockmap, fix leaking maps with attached but not detached progs")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-04 14:52:44 +01:00
Greg Kroah-Hartman
4576e0eca9 Merge 4.14.26 into android-4.14
Changes in 4.14.26
	bpf: fix mlock precharge on arraymaps
	bpf: fix memory leak in lpm_trie map_free callback function
	bpf: fix rcu lockdep warning for lpm_trie map_free callback
	bpf, x64: implement retpoline for tail call
	bpf, arm64: fix out of bounds access in tail call
	bpf: add schedule points in percpu arrays management
	bpf: allow xadd only on aligned memory
	bpf, ppc64: fix out of bounds access in tail call
	KVM: x86: fix backward migration with async_PF
	Linux 4.14.26

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-03-11 17:37:01 +01:00
Eric Dumazet
e1760b3563 bpf: add schedule points in percpu arrays management
[ upstream commit 32fff239de37ef226d5b66329dd133f64d63b22d ]

syzbot managed to trigger RCU-detected stalls in
bpf_array_free_percpu()

It takes time to allocate a huge percpu map, but even more time to free
it.

Since we run in process context, use cond_resched() to yield cpu if
needed.

Fixes: a10423b87a ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11 16:23:22 +01:00
Daniel Borkmann
d9fd73c60b bpf: fix mlock precharge on arraymaps
[ upstream commit 9c2d63b843a5c8a8d0559cc067b5398aa5ec3ffc ]

syzkaller recently triggered OOM during percpu map allocation;
while there is work in progress by Dennis Zhou to add __GFP_NORETRY
semantics for the percpu allocator under pressure, there also seems
to be a missing bpf_map_precharge_memlock() check in array map
allocation.

Given that today the actual bpf_map_charge_memlock() happens after
find_and_alloc_map() in the syscall path, bpf_map_precharge_memlock()
is there to bail out early, before we go and do the map setup work,
when we find that we would hit the limits anyway. Therefore add this
for the array map as well.

Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Fixes: a10423b87a ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11 16:23:21 +01:00
Greg Kroah-Hartman
9b68347c35 Merge 4.14.14 into android-4.14
Changes in 4.14.14
	dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
	KVM: Fix stack-out-of-bounds read in write_mmio
	can: vxcan: improve handling of missing peer name attribute
	can: gs_usb: fix return value of the "set_bittiming" callback
	IB/srpt: Disable RDMA access by the initiator
	IB/srpt: Fix ACL lookup during login
	MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
	MIPS: Factor out NT_PRFPREG regset access helpers
	MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
	MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
	MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
	MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
	MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
	cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC
	kvm: vmx: Scrub hardware GPRs at VM-exit
	platform/x86: wmi: Call acpi_wmi_init() later
	iw_cxgb4: only call the cq comp_handler when the cq is armed
	iw_cxgb4: atomically flush the qp
	iw_cxgb4: only clear the ARMED bit if a notification is needed
	iw_cxgb4: reflect the original WR opcode in drain cqes
	iw_cxgb4: when flushing, complete all wrs in a chain
	x86/acpi: Handle SCI interrupts above legacy space gracefully
	ALSA: pcm: Remove incorrect snd_BUG_ON() usages
	ALSA: pcm: Workaround for weird PulseAudio behavior on rewind error
	ALSA: pcm: Add missing error checks in OSS emulation plugin builder
	ALSA: pcm: Abort properly at pending signal in OSS read/write loops
	ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
	ALSA: aloop: Release cable upon open error path
	ALSA: aloop: Fix inconsistent format due to incomplete rule
	ALSA: aloop: Fix racy hw constraints adjustment
	x86/acpi: Reduce code duplication in mp_override_legacy_irq()
	8021q: fix a memory leak for VLAN 0 device
	ip6_tunnel: disable dst caching if tunnel is dual-stack
	net: core: fix module type in sock_diag_bind
	phylink: ensure we report link down when LOS asserted
	RDS: Heap OOB write in rds_message_alloc_sgs()
	RDS: null pointer dereference in rds_atomic_free_op
	net: fec: restore dev_id in the cases of probe error
	net: fec: defer probe if regulator is not ready
	net: fec: free/restore resource in related probe error pathes
	sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
	sctp: fix the handling of ICMP Frag Needed for too small MTUs
	sh_eth: fix TSU resource handling
	net: stmmac: enable EEE in MII, GMII or RGMII only
	sh_eth: fix SH7757 GEther initialization
	ipv6: fix possible mem leaks in ipv6_make_skb()
	ethtool: do not print warning for applications using legacy API
	mlxsw: spectrum_router: Fix NULL pointer deref
	net/sched: Fix update of lastuse in act modules implementing stats_update
	ipv6: sr: fix TLVs not being copied using setsockopt
	mlxsw: spectrum: Relax sanity checks during enslavement
	sfp: fix sfp-bus oops when removing socket/upstream
	membarrier: Disable preemption when calling smp_call_function_many()
	crypto: algapi - fix NULL dereference in crypto_remove_spawns()
	mmc: renesas_sdhi: Add MODULE_LICENSE
	rbd: reacquire lock should update lock owner client id
	rbd: set max_segments to USHRT_MAX
	iwlwifi: pcie: fix DMA memory mapping / unmapping
	x86/microcode/intel: Extend BDW late-loading with a revision check
	KVM: x86: Add memory barrier on vmcs field lookup
	KVM: PPC: Book3S PR: Fix WIMG handling under pHyp
	KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hpt
	KVM: PPC: Book3S HV: Fix use after free in case of multiple resize requests
	KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt()
	drm/vmwgfx: Don't cache framebuffer maps
	drm/vmwgfx: Potential off by one in vmw_view_add()
	drm/i915/gvt: Clear the shadow page table entry after post-sync
	drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.
	drm/i915: Move init_clock_gating() back to where it was
	drm/i915: Fix init_clock_gating for resume
	bpf: prevent out-of-bounds speculation
	bpf, array: fix overflow in max_entries and undefined behavior in index_mask
	bpf: arsh is not supported in 32 bit alu thus reject it
	USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
	USB: serial: cp210x: add new device ID ELV ALC 8xxx
	usb: misc: usb3503: make sure reset is low for at least 100us
	USB: fix usbmon BUG trigger
	USB: UDC core: fix double-free in usb_add_gadget_udc_release
	usbip: remove kernel addresses from usb device and urb debug msgs
	usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
	usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
	staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
	Bluetooth: Prevent stack info leak from the EFS element.
	uas: ignore UAS for Norelsys NS1068(X) chips
	mux: core: fix double get_device()
	kdump: write correct address of mem_section into vmcoreinfo
	apparmor: fix ptrace label match when matching stacked labels
	e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
	x86/pti: Unbreak EFI old_memmap
	x86/Documentation: Add PTI description
	x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
	sysfs/cpu: Add vulnerability folder
	x86/cpu: Implement CPU vulnerabilites sysfs functions
	x86/tboot: Unbreak tboot with PTI enabled
	x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*()
	x86/cpu/AMD: Make LFENCE a serializing instruction
	x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
	sysfs/cpu: Fix typos in vulnerability documentation
	x86/alternatives: Fix optimize_nops() checking
	x86/pti: Make unpoison of pgd for trusted boot work for real
	objtool: Detect jumps to retpoline thunks
	objtool: Allow alternatives to be ignored
	x86/retpoline: Add initial retpoline support
	x86/spectre: Add boot time option to select Spectre v2 mitigation
	x86/retpoline/crypto: Convert crypto assembler indirect jumps
	x86/retpoline/entry: Convert entry assembler indirect jumps
	x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
	x86/retpoline/hyperv: Convert assembler indirect jumps
	x86/retpoline/xen: Convert Xen hypercall indirect jumps
	x86/retpoline/checksum32: Convert assembler indirect jumps
	x86/retpoline/irq32: Convert assembler indirect jumps
	x86/retpoline: Fill return stack buffer on vmexit
	selftests/x86: Add test_vsyscall
	x86/pti: Fix !PCID and sanitize defines
	security/Kconfig: Correct the Documentation reference for PTI
	x86,perf: Disable intel_bts when PTI
	x86/retpoline: Remove compile time warning
	Linux 4.14.14

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-17 10:33:24 +01:00
Daniel Borkmann
67c05d9414 bpf, array: fix overflow in max_entries and undefined behavior in index_mask
commit bbeb6e4323dad9b5e0ee9f60c223dd532e2403b1 upstream.

syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc98
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.

However, this will generate an index_mask of 0xffffffff, and therefore
a + 1 will let this overflow into new max_entries of 0. This will pass
allocation, etc, and later on map access we still enforce on the original
attr->max_entries value which was 0xfffffffd, therefore triggering GPF
all over the place. Thus bail out on overflow in such case.

Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable.

This fixes all the issues triggered by syzkaller's reproducers.

Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:45:25 +01:00
Alexei Starovoitov
a5dbaf8768 bpf: prevent out-of-bounds speculation
commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index >= max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:45:25 +01:00
Chenbo Feng
cace572e16 BACKPORT: bpf: Add file mode configuration into bpf maps
Introduce map read/write flags to the eBPF syscalls that return a
map fd. The flags are used to set up the file mode when constructing
a new file descriptor for bpf maps. To not break backward
compatibility, f_flags is set to O_RDWR if the flag passed by the
syscall is 0. Otherwise it should be O_RDONLY or O_WRONLY. When
userspace wants to modify or read the map content, the kernel will
check the file mode to see if the operation is allowed.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Bug: 30950746
Change-Id: Icfad20f1abb77f91068d244fb0d87fa40824dd1b

(cherry picked from commit 6e71b04a82248ccf13a94b85cbc674a9fefe53f5)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2017-12-18 21:11:22 +05:30
Daniel Borkmann
bc6d5031b4 bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu
allocator. Given we support __GFP_NOWARN now, let's just let
the allocation request fail naturally instead. The two call
sites from BPF mistakenly assumed __GFP_NOWARN would work, so
no changes needed to their actual __alloc_percpu_gfp() calls
which use the flag already.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19 13:13:50 +01:00
Daniel Borkmann
7b0c2a0508 bpf: inline map in map lookup functions for array and htab
Avoid two successive function calls for the map-in-map lookup: the
first is the bpf_map_lookup_elem() helper call, and the second the
callback via map->ops->map_lookup_elem() to get to the map-in-map
implementation.
Implementation inlines array and htab flavor for map in map lookups.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19 21:56:34 -07:00
Martin KaFai Lau
96eabe7a40 bpf: Allow selecting numa node during map creation
The current map creation API does not allow providing a numa-node
preference.  The memory usually comes from the node where the
map-creation process is running.  The performance is not ideal if the
bpf_prog is known to always run on a numa node different from that of
the map-creation process.

One of the use case is sharding on CPU to different LRU maps (i.e.
an array of LRU maps).  Here is the test result of map_perf_test on
the INNER_LRU_HASH_PREALLOC test if we force the lru map used by
CPU0 to be allocated from a remote numa node:

[ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ]

># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec
4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec
3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec
6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec
2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec
1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec
7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec
0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec  #<<<

After specifying numa node:
># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec
3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec
1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec
6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec
2:inner_lru_hash_map_perf pre-alloc 1614630 events per sec
4:inner_lru_hash_map_perf pre-alloc 1612651 events per sec
7:inner_lru_hash_map_perf pre-alloc 1609337 events per sec
0:inner_lru_hash_map_perf pre-alloc 1619340 events per sec #<<<

This patch adds one field, numa_node, to the bpf_attr.  Since numa node 0
is a valid node, a new flag BPF_F_NUMA_NODE is also added.  The numa_node
field is honored if and only if the BPF_F_NUMA_NODE flag is set.

Numa node selection is not supported for percpu map.

This patch does not change every kmalloc.  E.g.
'htab = kzalloc()' is not changed since the object
is small enough to stay in the cache.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19 21:35:43 -07:00
Martin KaFai Lau
14dc6f04f4 bpf: Add syscall lookup support for fd array and htab
This patch allows userspace to do BPF_MAP_LOOKUP_ELEM on
BPF_MAP_TYPE_PROG_ARRAY,
BPF_MAP_TYPE_ARRAY_OF_MAPS and
BPF_MAP_TYPE_HASH_OF_MAPS.

The lookup returns a prog-id or map-id to userspace.
Userspace can then use BPF_PROG_GET_FD_BY_ID
or BPF_MAP_GET_FD_BY_ID to get an fd.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29 13:13:25 -04:00
Alexei Starovoitov
f91840a32d perf, bpf: Add BPF support to all perf_event types
Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all
perf_event types, including HW_CACHE, RAW, and dynamic pmu events.
Only tracepoint/kprobe events are treated differently which require
BPF_PROG_TYPE_TRACEPOINT/BPF_PROG_TYPE_KPROBE program types accordingly.

Also add support for reading all event counters using
bpf_perf_event_read() helper.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 21:58:01 -04:00
Daniel Borkmann
a316338cb7 bpf: fix wrong exposure of map_flags into fdinfo for lpm
trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. The latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and, if not, bails out, which is currently the case for lpm
since it always exposes 0 as flags.

Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure not to
miss copying them over at a later point in time when we add actual
flags for them to use.

Fixes: b95a5c4db0 ("bpf: add a longest prefix match trie map implementation")
Reported-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 13:44:28 -04:00
Teng Qin
8fe4592438 bpf: map_get_next_key to return first key on NULL
When iterating through a map, we need to find a key that does not exist
in the map so map_get_next_key will give us the first key of the map.
This often requires a lot of guessing in production systems.

This patch makes map_get_next_key return the first key when the key
pointer in the parameter is NULL.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25 11:57:45 -04:00
Johannes Berg
40077e0cf6 bpf: remove struct bpf_map_type_list
There's no need to have struct bpf_map_type_list since
it just contains a list_head, the type, and the ops
pointer. Since the types are densely packed and not
actually dynamically registered, it's much easier and
smaller to have an array of type->ops pointer. Also
initialize this array statically to remove code needed
to initialize it.

In order to save duplicating the list, move it to the
types header file added by the previous patch and
include it in the same fashion.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 14:38:43 -04:00
Martin KaFai Lau
56f668dfe0 bpf: Add array of maps support
This patch adds a few helper funcs to enable map-in-map
support (i.e. outer_map->inner_map).  The first outer_map type
BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch.
The next patch will introduce a hash of maps type.

Any bpf map type can act as an inner_map.  The exception
is BPF_MAP_TYPE_PROG_ARRAY because the extra level of
indirection makes it harder to verify the owner_prog_type
and owner_jited.

Multi-level map-in-map is not supported (i.e. map->map is ok
but not map->map->map).

When adding an inner_map to an outer_map, it currently checks the
map_type, key_size, value_size, map_flags, max_entries and ops.
The verifier also uses those map's properties to do static analysis.
map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT
is using a preallocated hashtab for the inner_hash also.  ops and
max_entries are needed to generate inlined map-lookup instructions.
For simplicity, a simple '==' test is used for both map_flags
and max_entries.  The equality of ops is implied by the equality of
map_type.

During outer_map creation time, an inner_map_fd is needed to create an
outer_map.  However, the inner_map_fd's life time does not depend on the
outer_map.  The inner_map_fd is merely used to initialize
the inner_map_meta of the outer_map.

Also, for the outer_map:

* It allows element update and delete from syscall
* It allows element lookup from bpf_prog

The above is similar to the current fd_array pattern.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 15:45:45 -07:00