bka
71 Commits

5f397569e0
UPSTREAM: Revert "bpf: Add map and need_defer parameters to .map_fd_put_ptr()"
This reverts commit eb6f68ec92ab60b0540ebf64fe851e99d846e086, which is commit 20c20bd11a0702ce4dc9300c3da58acf551d9725 upstream. It breaks the Android kernel ABI and can be brought back in the future in an ABI-safe way if it is really needed.
Bug: 161946584
Change-Id: I4611eed3677738ab29469733e2b4f6734ef3d605
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>

543ff330a5
BACKPORT: treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of version 2 of the gnu general public license as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details
extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 64 file(s).
Change-Id: Ic7cca08bbba3c38e0d53d3374c43ee8bf1e24172
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

734cff0f05
UPSTREAM: bpf: move memory size checks to bpf_map_charge_init()
Most BPF map types do similar checks and bytes-to-pages conversion during memory allocation and charging. Let's unify these checks by moving them into bpf_map_charge_init().
Change-Id: I55ceded2303102feba9e485042e8f5169f490609
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
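To make the bytes-to-pages conversion concrete, here is a minimal user-space sketch, not the kernel's code. The helper name `sketch_size_to_pages()` is hypothetical, 4 KiB pages are assumed, and the exact overflow bound (`U32_MAX - PAGE_SIZE`) is my assumption about the kind of check being centralised.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses the arch's PAGE_SIZE/PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical helper: reject sizes whose page count could overflow a u32,
 * then round the byte size up to whole pages. Returns 0 and stores the
 * page count on success, or -E2BIG when the request is too large.
 */
static int sketch_size_to_pages(uint64_t size, uint32_t *pages)
{
	if (size >= UINT32_MAX - SKETCH_PAGE_SIZE)
		return -E2BIG;
	*pages = (uint32_t)((size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT);
	return 0;
}
```

For example, a 4097-byte request rounds up to two pages, while a size close to `UINT32_MAX` is refused with `-E2BIG` before anything is charged.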

f2d4363b7c
UPSTREAM: bpf: rework memlock-based memory accounting for maps
In order to unify the existing memlock charging code with the
memcg-based memory accounting, which will be added later, let's
rework the current scheme.
Currently the following design is used:
1) .alloc() callback optionally checks if the allocation will likely
succeed using bpf_map_precharge_memlock()
2) .alloc() performs actual allocations
3) .alloc() callback calculates map cost and sets map.memory.pages
4) map_create() calls bpf_map_init_memlock() which sets map.memory.user
and performs actual charging; in case of failure the map is
destroyed
<map is in use>
1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which
performs uncharge and releases the user
2) .map_free() callback releases the memory
The scheme can be simplified and made more robust:
1) .alloc() calculates map cost and calls bpf_map_charge_init()
2) bpf_map_charge_init() sets map.memory.user and performs actual
charge
3) .alloc() performs actual allocations
<map is in use>
1) .map_free() callback releases the memory
2) bpf_map_charge_finish() performs uncharge and releases the user
The new scheme also allows reusing the bpf_map_charge_init()/finish()
functions for memcg-based accounting. Because charges are performed
before actual allocations and uncharges after freeing the memory,
no bogus memory pressure can be created.
In cases when the map structure is not available (e.g. it's not
created yet, or is already destroyed), on-stack bpf_map_memory
structure is used. The charge can be transferred with the
bpf_map_charge_move() function.
Change-Id: I299bfa9d3e74f366861b6de3bf17951a1374824b
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
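The charge-before-allocate, uncharge-after-free lifecycle described above can be sketched in user space as follows. All names (`sketch_user`, `sketch_map_memory`, `sketch_charge_init()`, `sketch_charge_move()`, `sketch_charge_finish()`) are hypothetical stand-ins for the per-user memlock counter, `struct bpf_map_memory` and the `bpf_map_charge_*()` helpers; returning `-EPERM` when the limit is exceeded is an assumption about the error code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-user memlock counter (the kernel uses user_struct). */
struct sketch_user {
	uint32_t locked_pages;	/* pages currently charged */
	uint32_t limit_pages;	/* stand-in for the RLIMIT_MEMLOCK budget */
};

/* Modelled on struct bpf_map_memory: who is charged, and for how much. */
struct sketch_map_memory {
	struct sketch_user *user;
	uint32_t pages;
};

/* Charge first; the caller allocates only if this succeeds. */
static int sketch_charge_init(struct sketch_map_memory *mem,
			      struct sketch_user *user, uint32_t pages)
{
	/* Assumes locked_pages <= limit_pages, so this cannot wrap. */
	if (pages > user->limit_pages - user->locked_pages)
		return -EPERM;
	user->locked_pages += pages;
	mem->user = user;
	mem->pages = pages;
	return 0;
}

/* Hand a charge over, e.g. from an on-stack struct to the new map. */
static void sketch_charge_move(struct sketch_map_memory *dst,
			       struct sketch_map_memory *src)
{
	*dst = *src;
	*src = (struct sketch_map_memory){ 0 };
}

/* Uncharge; called only after the map memory has been freed. */
static void sketch_charge_finish(struct sketch_map_memory *mem)
{
	if (mem->user)
		mem->user->locked_pages -= mem->pages;
	*mem = (struct sketch_map_memory){ 0 };
}
```

The on-stack `struct sketch_map_memory` plays the role described in the message: it carries the charge while no map exists yet, and `sketch_charge_move()` hands it over once the map has been allocated.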

931193c1c1
UPSTREAM: bpf: group memory related fields in struct bpf_map_memory
Group the "user" and "pages" fields of bpf_map into the bpf_map_memory structure. Later it can be extended with "memcg" and other related information. The main reason for such a change (besides cosmetics) is to pass the bpf_map_memory structure to charging functions before the actual allocation of bpf_map.
Change-Id: I04e4edf805bfe4c26fce45f7166317fe00dd0dfa
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

7bd4799ebb
UPSTREAM: bpf: allow for key-less BTF in array map
Given we'll be reusing BPF array maps for global data/bss/rodata
sections, we need a way to associate BTF DataSec type as its map
value type. In usual cases we have this ugly BPF_ANNOTATE_KV_PAIR()
macro hack e.g. via 38d5d3b3d5db ("bpf: Introduce BPF_ANNOTATE_KV_PAIR")
to get initial map to type association going. While more use cases
for it are discouraged, this also won't work for global data since
the use of array map is a BPF loader detail and therefore unknown
at compilation time. For array maps with just a single entry, we make
an exception in terms of BTF in that the key type is declared optional
if the value type is of DataSec type. LLVM is guaranteed to emit the
latter, and it also aligns with how we regard global data maps: as just
a plain buffer area reusing existing map facilities, allowing
things like introspection with existing tools.
Change-Id: I6fd7e20b453529e07aa1c77beacff4e62c7500bd
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
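The keyless exception can be modelled as a small validation function. This is a hedged sketch: the enum and struct below are hypothetical simplifications of BTF types, and the non-keyless branch (requiring a 32-bit integer key, matching the array map's 4-byte index) is my assumption about the surrounding checks, not something stated in this message.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified BTF kinds; real BTF has many more. */
enum sketch_btf_kind { SKETCH_KIND_VOID, SKETCH_KIND_INT, SKETCH_KIND_DATASEC };

struct sketch_btf_type {
	enum sketch_btf_kind kind;
	uint32_t int_bits;	/* only meaningful for SKETCH_KIND_INT */
};

/*
 * A missing (void) key type is accepted only for a single-entry array
 * whose value type is a DataSec; otherwise the key must be a 32-bit int.
 */
static int sketch_array_check_btf(uint32_t max_entries,
				  const struct sketch_btf_type *key,
				  const struct sketch_btf_type *value)
{
	if (key->kind == SKETCH_KIND_VOID) {
		if (max_entries != 1 || value->kind != SKETCH_KIND_DATASEC)
			return -EINVAL;
		return 0;
	}
	if (key->kind != SKETCH_KIND_INT || key->int_bits != 32)
		return -EINVAL;
	return 0;
}
```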

df4c99d80b
UPSTREAM: bpf: add program side {rd, wr}only support for maps
This work adds two new map creation flags, BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG, in order to allow for read-only or write-only BPF maps from the BPF program side. Today we have BPF_F_RDONLY and BPF_F_WRONLY, but these only apply to the system call side: the BPF program has full read/write access to the map as usual, while bpf(2) calls with the map fd can either only read or only write into the map, depending on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allow for the exact opposite, such that the verifier rejects program loads if a write into a read-only map or a read from a write-only map is detected. For the read-only map case, some helpers that would alter the map state, such as map deletion and update, are also forbidden for programs. As opposed to the two BPF_F_RDONLY / BPF_F_WRONLY flags, BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG really do correspond to the map lifetime.
We've enabled this generic map extension for various non-special maps holding normal user data: array, hash, lru, lpm, local storage, queue and stack. Further generic map types could follow in the future, depending on the use case. The main use case here is to forbid writes into .rodata map values from the verifier side.
Change-Id: Iad96790cec92137902fe3ad12f53f1a94d58bc61
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
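A minimal model of how the two flag families act on different sides. The flag bit values are assumed to mirror include/uapi/linux/bpf.h, all function names are hypothetical, and the syscall-side helpers are a simplification: in the kernel, BPF_F_RDONLY/BPF_F_WRONLY restrict the returned fd rather than the map for its whole lifetime. Rejecting both *_PROG flags together is likewise an assumption about the creation-time check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the BPF_F_* bit values in include/uapi/linux/bpf.h. */
#define SKETCH_F_RDONLY      (1U << 3)
#define SKETCH_F_WRONLY      (1U << 4)
#define SKETCH_F_RDONLY_PROG (1U << 7)
#define SKETCH_F_WRONLY_PROG (1U << 8)

/* A map cannot be both read-only and write-only for programs. */
static bool sketch_flags_access_ok(uint32_t flags)
{
	return (flags & (SKETCH_F_RDONLY_PROG | SKETCH_F_WRONLY_PROG)) !=
	       (SKETCH_F_RDONLY_PROG | SKETCH_F_WRONLY_PROG);
}

/* Program side: what the verifier would let a BPF program do. */
static bool sketch_prog_may_read(uint32_t flags)
{
	return !(flags & SKETCH_F_WRONLY_PROG);
}

static bool sketch_prog_may_write(uint32_t flags)
{
	return !(flags & SKETCH_F_RDONLY_PROG);
}

/* Syscall side (simplified): what bpf(2) on the map fd would allow. */
static bool sketch_syscall_may_read(uint32_t flags)
{
	return !(flags & SKETCH_F_WRONLY);
}

static bool sketch_syscall_may_write(uint32_t flags)
{
	return !(flags & SKETCH_F_RDONLY);
}
```

With `SKETCH_F_RDONLY_PROG` alone, as for a .rodata map, a program may read but not write, while user space can still populate the map through bpf(2).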

4e82568bf0
BACKPORT: bpf: implement lookup-free direct value access for maps
This generic extension to BPF maps allows for directly loading an address residing inside a BPF map value as a single BPF ldimm64 instruction. The idea is similar to what BPF_PSEUDO_MAP_FD does today: a special src_reg flag for the ldimm64 instruction indicating that the first part of the double insn's imm field holds a file descriptor, which the verifier then replaces with the full 64-bit address of the map across both imm parts. For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following: the first part of the double insn's imm field is again a file descriptor corresponding to the map, and the second part of the imm field is an offset into the value. The verifier then replaces both imm parts with an address that points into the BPF map value at the given value offset, for maps that support this operation. Currently, only array maps with a single entry are supported. It is possible to support more than a single map element by reusing both 16-bit off fields of the insns as a map index, so a full array map lookup could be expressed that way. This hasn't been implemented here due to the lack of a concrete use case, but it could easily be done in the future in a compatible way, since both off fields currently have to be 0, which would correctly denote map index 0.
BPF_PSEUDO_MAP_VALUE is a distinct flag because with BPF_PSEUDO_MAP_FD alone we could not distinguish a load of the map pointer from a load of the map's value at offset 0. Changing BPF_PSEUDO_MAP_FD's encoding into an off-by-one scheme to tell a regular map pointer apart from a map value pointer would add unnecessary complexity and raise the barrier for debuggability, and is thus less suitable. Using the second part of the imm field as an offset into the value does /not/ come with limitations, since the maximum possible value size is in the u32 universe anyway.
This optimization allows for efficiently retrieving an address to a map value memory area without having to issue a helper call (which needs to prepare registers according to the calling convention, etc.), without needing the extra NULL test, and without having to add the offset to the value base pointer in an additional instruction. The verifier then treats the destination register as PTR_TO_MAP_VALUE with a constant reg->off taken from the user-passed offset in the second imm field, and guarantees that this is within the bounds of the map value. Any subsequent operations are treated as typical map value handling, without anything extra needed on the verification side. The two map operations for direct value access have been added to the array map for now. In the future, other types could be supported as well, depending on the use case. The main use case for this commit is to allow BPF loader support for global variables that reside in .data/.rodata/.bss sections, such that we can directly load their addresses with minimal additional infrastructure required. Loader support has been added in subsequent commits for the libbpf library.
Change-Id: I51974f2fe227ba837b338b8b3ebb44c145583673
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
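The two-slot ldimm64 encoding can be illustrated with a local mirror of `struct bpf_insn`. In this sketch, the opcode value 0x18 (`BPF_LD | BPF_DW | BPF_IMM`) and the src_reg values 1 and 2 for `BPF_PSEUDO_MAP_FD` and `BPF_PSEUDO_MAP_VALUE` are assumed to match the uapi header; the helper names and the error codes in `sketch_direct_value_addr()` are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Field order mirrors struct bpf_insn (assumed to match the uapi header). */
struct sketch_insn {
	uint8_t code;		/* opcode */
	uint8_t dst_reg:4;	/* destination register */
	uint8_t src_reg:4;	/* source register, or a pseudo flag for ldimm64 */
	int16_t off;		/* must be 0 here (would denote map index 0) */
	int32_t imm;		/* slot 0: map fd, slot 1: offset into the value */
};

#define SKETCH_LD_IMM64         0x18	/* BPF_LD | BPF_DW | BPF_IMM */
#define SKETCH_PSEUDO_MAP_FD    1	/* shown for contrast: map pointer */
#define SKETCH_PSEUDO_MAP_VALUE 2	/* address inside the map value */

/* Emit the double-slot ldimm64 referring to map_fd's value at value_off. */
static void sketch_ld_map_value(struct sketch_insn insn[2], uint8_t dst,
				int32_t map_fd, uint32_t value_off)
{
	insn[0] = (struct sketch_insn){ .code = SKETCH_LD_IMM64, .dst_reg = dst,
					.src_reg = SKETCH_PSEUDO_MAP_VALUE,
					.off = 0, .imm = map_fd };
	insn[1] = (struct sketch_insn){ .code = 0, .dst_reg = 0, .src_reg = 0,
					.off = 0, .imm = (int32_t)value_off };
}

/*
 * Hypothetical model of the map side: only a single-entry array qualifies,
 * and the offset must lie inside the value. On success, the address the
 * verifier would load is the value base plus the offset.
 */
static int sketch_direct_value_addr(uint32_t max_entries, uint32_t value_size,
				    uint32_t off, uint64_t value_base,
				    uint64_t *addr)
{
	if (max_entries != 1)
		return -EOPNOTSUPP;
	if (off >= value_size)
		return -EINVAL;
	*addr = value_base + off;
	return 0;
}
```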

cd34c35941
UPSTREAM: bpf: introduce BPF_F_LOCK flag
Introduce the BPF_F_LOCK flag for the map_lookup and map_update syscall commands and for the map_update() helper function. In all these cases, take the lock of the existing element (whose location was provided in the BTF description) before copying the rest of the map value in or out.
Implementation details that are part of the uapi:
Array: the array map takes the element lock for lookup/update.
Hash: the hash map also takes the lock for lookup/update and tries to avoid the bucket lock. If the old element exists, it takes the element lock and updates the element in place. If the element doesn't exist, it allocates a new one and inserts it into the hash table while holding the bucket lock. In rare cases, the hashmap has to take both the bucket lock and the element lock to update the old value in place.
Cgroup local storage: similar to array; update in place and lookup are done with the lock taken.
Change-Id: I76b13e23e1f6241c1f919a1c24650530f7705d9e
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
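Below is a user-space sketch of what BPF_F_LOCK implies for an array-style element: take the element's own lock, then copy everything except the lock field. The value layout, the test-and-set lock and all function names are hypothetical; the kernel derives the lock's offset from BTF rather than from a fixed struct.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lock embedded in the map value, as BTF would describe it. */
struct sketch_lock {
	atomic_uint val;
};

/* Hypothetical map value: user data around an embedded lock. */
struct sketch_value {
	int cnt;
	struct sketch_lock lock;
	long total;
};

static void sketch_lock_acquire(struct sketch_lock *l)
{
	while (atomic_exchange_explicit(&l->val, 1, memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void sketch_lock_release(struct sketch_lock *l)
{
	atomic_store_explicit(&l->val, 0, memory_order_release);
}

/* Copy the value byte-wise, skipping the lock field in both directions. */
static void sketch_copy_value(struct sketch_value *dst,
			      const struct sketch_value *src)
{
	size_t off = offsetof(struct sketch_value, lock);
	size_t end = off + sizeof(struct sketch_lock);

	memcpy(dst, src, off);
	memcpy((char *)dst + end, (const char *)src + end,
	       sizeof(*dst) - end);
}

/* Update with BPF_F_LOCK semantics: lock the element, then copy in. */
static void sketch_update_locked(struct sketch_value *elem,
				 const struct sketch_value *newval)
{
	sketch_lock_acquire(&elem->lock);
	sketch_copy_value(elem, newval);
	sketch_lock_release(&elem->lock);
}

/* Lookup with BPF_F_LOCK semantics: lock the element, then copy out. */
static void sketch_lookup_locked(struct sketch_value *out,
				 struct sketch_value *elem)
{
	sketch_lock_acquire(&elem->lock);
	sketch_copy_value(out, elem);
	sketch_lock_release(&elem->lock);
}
```

Skipping the lock bytes mirrors the spin_lock rule that syscall lookups and updates never read or overwrite the lock field itself.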

faa07c7e59
BACKPORT: bpf: introduce bpf_spin_lock
Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
bpf program serialize access to other variables.
Example:
	struct hash_elem {
		int cnt;
		struct bpf_spin_lock lock;
	};
	struct hash_elem *val = bpf_map_lookup_elem(&hash_map, &key);
	if (val) {
		bpf_spin_lock(&val->lock);
		val->cnt++;
		bpf_spin_unlock(&val->lock);
	}
Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
cause deadlocks.
- only one 'struct bpf_spin_lock' is allowed per map element.
It drastically simplifies implementation yet allows bpf program to use
any number of bpf_spin_locks.
- while a bpf_spin_lock is held, calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
insufficient preemption checks
Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
Other solutions to avoid nested bpf_spin_lock are possible,
like making sure that all networking progs run with softirq disabled.
spin_lock_irqsave is the simplest and doesn't add overhead to the
programs that don't use it.
- arch_spinlock_t is used when it's implemented as queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
extra flag during map_create, but specifying it via BTF is cleaner.
It provides introspection for map key/value and reduces user mistakes.
Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
to request kernel to grab bpf_spin_lock before rewriting the value.
That will serialize access to map elements.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: Id03322189a8f05c006a05479f7078b23c8c020ea
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
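The verifier restrictions in the list above (one lock at a time, no calls while it is held, unlock before exit) can be modelled as a walk over a straight-line instruction stream. This is a deliberately simplified sketch with hypothetical op kinds; the real verifier tracks lock state per branch and also checks that the unlock targets the same lock that was taken.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction kinds. */
enum sketch_op {
	SKETCH_OP_LOCK,		/* bpf_spin_lock() */
	SKETCH_OP_UNLOCK,	/* bpf_spin_unlock() */
	SKETCH_OP_CALL,		/* any other helper or bpf2bpf call */
	SKETCH_OP_ALU,		/* ordinary instruction */
	SKETCH_OP_EXIT,		/* return from the program */
};

/* Returns true if the straight-line sequence respects the lock rules. */
static bool sketch_check_locking(const enum sketch_op *prog, size_t n)
{
	bool held = false;

	for (size_t i = 0; i < n; i++) {
		switch (prog[i]) {
		case SKETCH_OP_LOCK:
			if (held)
				return false;	/* second lock: deadlock risk */
			held = true;
			break;
		case SKETCH_OP_UNLOCK:
			if (!held)
				return false;	/* unlock without a lock */
			held = false;
			break;
		case SKETCH_OP_CALL:
			if (held)
				return false;	/* no calls inside the region */
			break;
		case SKETCH_OP_EXIT:
			return !held;		/* must unlock before return */
		case SKETCH_OP_ALU:
			break;
		}
	}
	return !held;
}
```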

5d704c9b1a
UPSTREAM: bpf: pass struct btf pointer to the map_check_btf() callback
If key_type or value_type are of non-trivial data types (e.g. structure or typedef), it's not possible to check them without additional information, which can't be obtained without a pointer to the btf structure. So, let's pass the btf pointer to the map_check_btf() callbacks.
Change-Id: I95716060b450288d4ffcbe231d1cf5fdb530e292
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

f13ca2b0ba
UPSTREAM: bpf: return EOPNOTSUPP when map lookup isn't supported
Return ERR_PTR(-EOPNOTSUPP) from the map_lookup_elem() methods of the map types below:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH
Change-Id: I13937c36055b419f4446d8bfa06f139c757480c9
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
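For readers unfamiliar with the idiom, the sketch below recreates the kernel's `ERR_PTR()`/`IS_ERR()`/`PTR_ERR()` convention in user space, which is how a lookup method can return an errno value such as `-EOPNOTSUPP` through a pointer. The `sketch_*` helpers are hypothetical re-implementations, and `SKETCH_MAX_ERRNO` uses the value 4095, which I believe matches the kernel's `MAX_ERRNO`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The top SKETCH_MAX_ERRNO addresses are reserved to encode errors. */
#define SKETCH_MAX_ERRNO 4095

static inline void *sketch_err_ptr(long error)
{
	return (void *)error;
}

static inline long sketch_ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical lookup for a map type whose elements cannot be read. */
static void *sketch_fd_array_lookup(void *map, void *key)
{
	(void)map;
	(void)key;
	return sketch_err_ptr(-EOPNOTSUPP);
}
```

Callers then distinguish "not supported" from a real element pointer (or NULL for "not found") with `sketch_is_err()`, without needing a separate error channel.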

5ae0a28ff6
UPSTREAM: bpf: add bpffs pretty print for program array map
Added bpffs pretty print for the program array map. For a particular array index, if the program array points to a valid program, "<index>: <prog_id>" is printed, e.g. "0: 6", which means the bpf program with id "6" is installed at index "0".
Change-Id: Ibfeac1777df6dc8742debe574ba259d212e7ecea
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0ef3db4770
UPSTREAM: bpf: add bpffs pretty print for percpu arraymap/hash/lru_hash
Added bpffs pretty print for percpu arraymap, percpu hashmap
and percpu lru hashmap.
For each map <key, value> pair, the format is:
<key_value>: {
	cpu0: <value_on_cpu0>
	cpu1: <value_on_cpu1>
	...
	cpun: <value_on_cpun>
}
For example, on my VM, there are 4 cpus, and
for test_btf test in the next patch:
cat /sys/fs/bpf/pprint_test_percpu_hash
You may get:
...
43602: {
	cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
}
72847: {
	cpu0: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu1: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu2: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu3: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
}
...
Change-Id: I286e7505765aa92ea9a8919ddecf8434a24fc187
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
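The output layout above can be reproduced with a small formatter. This is a sketch under simplifying assumptions: values are plain `int`s instead of being printed through BTF, indentation uses a tab, and `sketch_show_percpu_elem()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format one per-CPU element as "<key>: {", one "cpuN: <value>" line per
 * CPU, then "}". Returns the number of characters written (excluding the
 * terminating NUL), or 0 if buf is too small.
 */
static size_t sketch_show_percpu_elem(char *buf, size_t len, unsigned int key,
				      const int *vals, int ncpus)
{
	size_t used;
	int n;

	n = snprintf(buf, len, "%u: {\n", key);
	if (n < 0 || (size_t)n >= len)
		return 0;
	used = (size_t)n;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		n = snprintf(buf + used, len - used, "\tcpu%d: %d\n",
			     cpu, vals[cpu]);
		if (n < 0 || (size_t)n >= len - used)
			return 0;
		used += (size_t)n;
	}

	n = snprintf(buf + used, len - used, "}\n");
	if (n < 0 || (size_t)n >= len - used)
		return 0;
	return used + (size_t)n;
}
```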

60d2469ae3
UPSTREAM: bpf: Check percpu map value size first
[ Upstream commit 1d244784be6b01162b732a5a7d637dfc024c3203 ]
Percpu maps are often used, but their value size limit is often overlooked, as in this issue: https://github.com/iovisor/bcc/issues/2519. Percpu map value size is bounded by PCPU_MIN_UNIT_SIZE, so we can first check whether the value size exceeds PCPU_MIN_UNIT_SIZE, as the percpu local_storage map does. The resulting error message is arguably clearer than "cannot allocate memory".
Change-Id: Iadd0604341b9af7bbada45665f06ae1e529a7882
Signed-off-by: Jinke Han <jinkehan@didiglobal.com>
Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240910144111.1464912-2-chen.dylane@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
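A sketch of the early size check. `SKETCH_PCPU_MIN_UNIT_SIZE` assumes the common 32 KiB value of `PCPU_MIN_UNIT_SIZE`, the 8-byte round-up reflects how the array and hash maps size their elements, and the function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: PCPU_MIN_UNIT_SIZE is 32 KiB on common configurations. */
#define SKETCH_PCPU_MIN_UNIT_SIZE (32u << 10)

/* Round up to a multiple of 8, as map elements are sized. */
static uint64_t sketch_round_up8(uint64_t x)
{
	return (x + 7) & ~(uint64_t)7;
}

/*
 * Reject oversized per-CPU values with -E2BIG before any allocation is
 * attempted, instead of failing later with -ENOMEM.
 */
static int sketch_percpu_value_size_check(uint32_t value_size, bool percpu)
{
	if (percpu && sketch_round_up8(value_size) > SKETCH_PCPU_MIN_UNIT_SIZE)
		return -E2BIG;
	return 0;
}
```

With these assumptions a 32768-byte value is accepted, while 32769 bytes rounds up to 32776 and is rejected with `-E2BIG` up front.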

2819daae82
UPSTREAM: bpf: decouple btf from seq bpf fs dump and enable more maps
Commits a26ca7c982cb ("bpf: btf: Add pretty print support to
the basic arraymap") and 699c86d6ec21 ("bpf: btf: add pretty
print for hash/lru_hash maps") enabled support for BTF and
dumping via BPF fs for array and hash/lru maps. However, both
can be decoupled from each other such that regular BPF maps
can be supported for attaching BTF key/value information,
while not all maps necessarily need to dump via map_seq_show_elem()
callback.
The basic sanity check which is a prerequisite for all maps
is that key/value size has to match in any case, and some maps
can have extra checks via map_check_btf() callback, e.g.
probing certain types or indicating no support in general. With
that we can also enable retrieving BTF info for per-cpu map
types and lpm.
Change-Id: Ic26e072b39b443f3ffd9c1b53a1cd45a3ecea360
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
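The split described above, a generic size check for every map plus an optional per-map `map_check_btf()` callback, can be sketched as follows. The structures are trimmed, hypothetical stand-ins for `struct bpf_map` and `struct bpf_map_ops`, and returning `-EOPNOTSUPP` from the refusing callback is an assumption (kernel-internal code may use a different errno).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_map;

/* Hypothetical, trimmed-down ops table. */
struct sketch_map_ops {
	/* Optional extra map-specific BTF validation (may be NULL). */
	int (*map_check_btf)(const struct sketch_map *map,
			     uint32_t btf_key_size, uint32_t btf_value_size);
};

/* Hypothetical, trimmed-down map. */
struct sketch_map {
	const struct sketch_map_ops *ops;
	uint32_t key_size;
	uint32_t value_size;
};

/* Generic step for every map: BTF sizes must match the map's sizes. */
static int sketch_map_check_btf(const struct sketch_map *map,
				uint32_t btf_key_size, uint32_t btf_value_size)
{
	if (btf_key_size != map->key_size || btf_value_size != map->value_size)
		return -EINVAL;
	if (map->ops->map_check_btf)
		return map->ops->map_check_btf(map, btf_key_size,
					       btf_value_size);
	return 0;
}

/* Example callback for a map type that does not support BTF at all. */
static int sketch_no_btf(const struct sketch_map *map, uint32_t k, uint32_t v)
{
	(void)map;
	(void)k;
	(void)v;
	return -EOPNOTSUPP;
}
```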

a95060b1cd
BACKPORT: bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY
This patch introduces a new map type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY.
To unleash the full potential of a bpf prog, it is essential for userspace to be capable of directly setting up a bpf map which can then be consumed by the bpf prog to make decisions. In this case, the decision is which SO_REUSEPORT sk should serve the incoming request.
By adding BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, userspace has total control and visibility over where a SO_REUSEPORT sk should be located in a bpf map. A later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT such that the bpf prog can directly select a sk from the bpf map. That will raise the programmability of the bpf prog attached to a reuseport group (a group of sks serving the same IP:PORT).
For example, in UDP, the bpf prog can peek into the payload (e.g. through the "data" pointer introduced in the later patch) to learn the application-level connection information and then decide which sk to pick from a bpf map. Userspace can tightly couple the sk's location in a bpf map with the application logic that generates the UDP payload's connection information. This connection info contract/API stays within userspace.
Also, when used with map-in-map, userspace can switch the old server process's inner map to the new server process's inner map in one call, "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)". The bpf prog will then direct incoming requests to the new process instead of the old process. The old process can finish draining the pending requests (e.g. by "accept()") before closing the old fds. [Note that deleting a fd from a bpf map does not necessarily mean the fd is closed.]
During map_update_elem(), only a SO_REUSEPORT sk (i.e. one which has already been added to a reuse->socks[]) can be used. That means a SO_REUSEPORT sk that has done "bind()" for UDP or "bind()+listen()" for TCP. These conditions are ensured in "reuseport_array_update_check()".
A SO_REUSEPORT sk can only be added once to a map (i.e. the same sk cannot be added twice, even to the same map). SO_REUSEPORT already allows another sk to be created for the same IP:PORT. There is no need to re-create a similar usage on the BPF side.
When a SO_REUSEPORT sk is deleted from "reuse->socks[]" (e.g. by "close()"), it will notify the bpf map to remove it from the map as well. This is done through "bpf_sk_reuseport_detach()", and it will only be called if >=1 of the "reuse->sock[]" has ever been added to a bpf map.
The map_update()/map_delete() has to be in sync with "reuse->socks[]". Hence, the same "reuseport_lock" used by "reuse->socks[]" has to be used here as well. Care has been taken to ensure the lock is only acquired when the sk being added passes some strict tests, and freeing the map does not require the reuseport_lock.
The reuseport_array will also support lookup from the syscall side. It will return a sock_gen_cookie(). The sock_gen_cookie() is on-demand (i.e. a sk's cookie is not generated until the very first map_lookup_elem()).
The lookup cookie is 64 bits, but that goes against the logical userspace expectation of a 32-bit sizeof(fd) (which other fd-based bpf maps also follow). It may surprise users if we enforce value_size=8 while userspace still passes a 32-bit fd during update. Supporting a different value_size between lookup and update seems unintuitive as well.
We also need to consider what happens if other existing fd-based maps want to return a 64-bit value from the syscall lookup in the future. Hence, reuseport_array supports both value_size 4 and 8, assuming users will usually use value_size=4. The syscall lookup will return ENOSPC on value_size=4. It will only return the 64-bit value from sock_gen_cookie() when the user consciously chooses value_size=8 (as a signal that lookup is desired), which then requires a 64-bit value in both lookup and update.
Change-Id: Iacf628194cf89b00b45a75ced7d2dbddfb4c6a8a
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
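The value_size rules from the last two paragraphs can be modelled in a few lines. Both functions are hypothetical, returning `-EINVAL` for other sizes at creation is an assumption, and passing the cookie in as a parameter stands in for sock_gen_cookie().

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Creation: accept a 4-byte fd or an 8-byte value, nothing else. */
static int sketch_reuseport_alloc_check(uint32_t value_size)
{
	if (value_size != sizeof(uint32_t) && value_size != sizeof(uint64_t))
		return -EINVAL;
	return 0;
}

/*
 * Syscall lookup: only a map created with value_size == 8 can return the
 * 64-bit socket cookie; a 4-byte map reports -ENOSPC instead.
 */
static int sketch_reuseport_lookup(uint32_t value_size, uint64_t cookie,
				   void *value)
{
	if (value_size != sizeof(uint64_t))
		return -ENOSPC;
	memcpy(value, &cookie, sizeof(cookie));
	return 0;
}
```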

42f28b95a6
BACKPORT: bpf: btf: Use exact btf value_size match in map_check_btf()
The current map_check_btf() in BPF_MAP_TYPE_ARRAY rejects
'> map->value_size' to ensure map_seq_show_elem() will not
access things beyond an array element.
Yonghong suggested that using '!=' is a more correct
check. The 8 bytes round_up on value_size is stored
in array->elem_size. Hence, using '!=' on map->value_size
is a proper check.
This patch also adds new tests to check the btf array
key type and value type. Two of these new tests verify
the btf's value_size (the change in this patch).
It also fixes two existing tests that wrongly encoded
a btf's type size (pprint_test) and the value_type_id (in one
of the raw_tests[]). However, those errors did not affect these two
BTF verification tests, before or after these test changes.
These two tests mainly failed at array creation time after
this patch.
Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap")
Suggested-by: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Change-Id: I0ded0933cb7ba85373a700f746e9a2901db791e1
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
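A small sketch contrasting the old and new checks. The function names are hypothetical; `sketch_elem_size()` models the 8-byte round-up stored in `array->elem_size`. With `value_size` 12 (so `elem_size` 16), the old rule accepts an 8-byte BTF value type that describes only part of each value, while the exact match rejects it.

```c
#include <errno.h>
#include <stdint.h>

/* elem_size in an array map is value_size rounded up to 8 bytes. */
static uint32_t sketch_elem_size(uint32_t value_size)
{
	return (value_size + 7) & ~7u;
}

/* Old rule: only reject BTF value types larger than value_size. */
static int sketch_check_old(uint32_t map_value_size, uint32_t btf_value_size)
{
	return btf_value_size > map_value_size ? -EINVAL : 0;
}

/* New rule: the BTF value type must describe exactly value_size bytes. */
static int sketch_check_new(uint32_t map_value_size, uint32_t btf_value_size)
{
	return btf_value_size != map_value_size ? -EINVAL : 0;
}
```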

71df69a668
UPSTREAM: bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info
In "struct bpf_map_info", the names "btf_id", "btf_key_id" and "btf_value_id" could cause confusion, because the "id" of "btf_id" means the BPF object id given to the BTF object, while "btf_key_id" and "btf_value_id" mean the BTF type id within that BTF object. To make this clear, btf_key_id and btf_value_id are renamed to btf_key_type_id and btf_value_type_id.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Change-Id: Ib10a5ee00041fb5b3ce1c80a407bb9277386c85b
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

cec9a42456
UPSTREAM: bpf: btf: Add pretty print support to the basic arraymap
This patch adds pretty print support to the basic arraymap.
Support for other bpf maps can be added later.
This patch adds new attrs to the BPF_MAP_CREATE command to allow
specifying the btf_fd, btf_key_id and btf_value_id. The
BPF_MAP_CREATE can then associate the btf to the map if
the creating map supports BTF.
A BTF supported map needs to implement two new map ops,
map_seq_show_elem() and map_check_btf(). This patch has
implemented these new map ops for the basic arraymap.
It also adds file_operations, bpffs_map_fops, to the pinned
map such that the pinned map can be opened and read.
After that, the user has an intuitive way to do
"cat bpffs/pathto/a-pinned-map" instead of getting
an error.
bpffs_map_fops should not be extended further to support
other operations. Other operations (e.g. write/key-lookup...)
should be realized by the userspace tools (e.g. bpftool) through
the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
Follow up patches will allow the userspace to obtain
the BTF from a map-fd.
Here is a sample output when reading a pinned arraymap
with the following map's value:
	struct map_value {
		int count_a;
		int count_b;
	};
cat /sys/fs/bpf/pinned_array_map:
0: {1,2}
1: {3,4}
2: {5,6}
...
Change-Id: I57605a80ce783f8545f7947d0a3445ccd81147e4
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

ffcf8f77e1
BACKPORT: bpf: arraymap: use bpf_map_init_from_attr()
Arraymap was not converted to use bpf_map_init_from_attr() to avoid merge conflicts with emergency fixes. Do it now.
Change-Id: I6e385e0af32a1837b23d352c19824492f8c86764
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

4a3928272b
UPSTREAM: bpf: arraymap: move checks out of alloc function
Use the new callback to perform allocation checks for array maps. The fd maps don't need a special allocation callback; they only need a special check callback.
Change-Id: Idaed60c9b368f29ce7994a37d8bb381958ab0aeb
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
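A sketch of the kind of attribute validation that moves into the check callback, run before any memory is allocated. The attribute struct is a hypothetical subset of `union bpf_attr`; `SKETCH_ALLOWED_FLAGS` and `SKETCH_MAX_VALUE_SIZE` are placeholder assumptions (the real code accepts some flags and uses `KMALLOC_MAX_SIZE`), and the individual conditions reflect my understanding of the array map's checks rather than a quote of this patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of the BPF_MAP_CREATE attributes. */
struct sketch_map_attr {
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t map_flags;
};

#define SKETCH_ALLOWED_FLAGS  0u		/* assumption: no flags accepted */
#define SKETCH_MAX_VALUE_SIZE (1u << 22)	/* assumption: KMALLOC_MAX_SIZE stand-in */

/*
 * Checks that need no allocation: array keys are 4-byte indices, and at
 * least one entry with a non-empty value is required.
 */
static int sketch_array_alloc_check(const struct sketch_map_attr *attr)
{
	if (attr->max_entries == 0 || attr->key_size != 4 ||
	    attr->value_size == 0 ||
	    (attr->map_flags & ~SKETCH_ALLOWED_FLAGS))
		return -EINVAL;
	if (attr->value_size > SKETCH_MAX_VALUE_SIZE)
		return -E2BIG;
	return 0;
}
```

Keeping these checks in a separate callback lets map types that share an allocator (such as the fd maps mentioned above) differ only in validation.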

6e5e3fdc13
BACKPORT: bpf: perf event change needed for subsequent bpf helpers
This patch does not impact existing functionality. It contains the changes in the perf event area needed for the subsequent bpf_perf_event_read_value and bpf_perf_prog_read_value helpers.
Change-Id: I3cb46eff8eb241ad744ab796bc3300e7560fe05d
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5a1de2a76
Merge tag 'v4.14.340-openela' into android13-4.14-msmnile
This is the 4.14.340 OpenELA-Extended LTS stable release
* tag 'v4.14.340-openela':
LTS: Update to 4.14.340
fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table()
PCI/MSI: Prevent MSI hardware interrupt number truncation
s390: use the correct count for __iowrite64_copy()
packet: move from strlcpy with unused retval to strscpy
ipv6: sr: fix possible use-after-free and null-ptr-deref
nouveau: fix function cast warnings
scsi: jazz_esp: Only build if SCSI core is builtin
RDMA/srpt: fix function pointer cast warnings
RDMA/srpt: Support specifying the srpt_service_guid parameter
IB/hfi1: Fix a memleak in init_credit_return
usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
l2tp: pass correct message length to ip6_append_data
gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp()
dm-crypt: don't modify the data when using authenticated encryption
mm: memcontrol: switch to rcu protection in drain_all_stock()
s390/qeth: Fix potential loss of L3-IP@ in case of network issues
virtio-blk: Ensure no requests in virtqueues before deleting vqs.
firewire: core: send bus reset promptly on gap count error
hwmon: (coretemp) Enlarge per package core count limit
regulator: pwm-regulator: Add validity checks in continuous .get_voltage
ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
ahci: asm1166: correct count of reported ports
fbdev: sis: Error out if pixclock equals zero
fbdev: savage: Error out if pixclock equals zero
wifi: mac80211: fix race condition on enabling fast-xmit
wifi: cfg80211: fix missing interfaces when dumping
dmaengine: shdma: increase size of 'dev_id'
scsi: target: core: Add TMF to tmr_list handling
sched/rt: Disallow writing invalid values to sched_rt_period_us
sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset
sched/rt: Fix sysctl_sched_rr_timeslice intial value
nilfs2: replace WARN_ONs for invalid DAT metadata block requests
memcg: add refcnt for pcpu stock to avoid UAF problem in drain_all_stock()
net/sched: Retire dsmark qdisc
net/sched: Retire ATM qdisc
net/sched: Retire CBQ qdisc
LTS: Update to 4.14.339
netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()
lsm: new security_file_ioctl_compat() hook
nilfs2: fix potential bug in end_buffer_async_write
sched/membarrier: reduce the ability to hammer on sys_membarrier
Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"
pmdomain: core: Move the unused cleanup to a _sync initcall
irqchip/irq-brcmstb-l2: Add write memory barrier before exit
nfp: use correct macro for LengthSelect in BAR config
nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
nilfs2: fix data corruption in dsync block recovery for small block sizes
ALSA: hda/conexant: Add quirk for SWS JS201D
x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
staging: iio: ad5933: fix type mismatch regression
ext4: fix double-free of blocks due to wrong extents moved_len
xen-netback: properly sync TX responses
nfc: nci: free rx_data_reassembly skb on NCI device cleanup
firewire: core: correct documentation of fw_csr_string() kernel API
scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
usb: f_mass_storage: forbid async queue when shutdown happen
USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
HID: wacom: Do not register input devices until after hid_hw_start
HID: wacom: generic: Avoid reporting a serial of '0' to userspace
mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
tracing/trigger: Fix to return error if failed to alloc snapshot
i40e: Fix waiting for queues of all VSIs to be disabled
MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
net: sysfs: Fix /sys/class/net/<iface> path for statistics
Documentation: net-sysfs: describe missing statistics
ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()
spi: ppc4xx: Drop write-only variable
btrfs: send: return EOPNOTSUPP on unknown flags
vhost: use kzalloc() instead of kmalloc() followed by memset()
Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
USB: serial: cp210x: add ID for IMST iM871A-USB
USB: serial: option: add Fibocom FM101-GL variant
USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
net/af_iucv: clean up a try_then_request_module()
netfilter: nft_compat: restrict match/target protocol to u16
netfilter: nft_compat: reject unused compat flag
ppp_async: limit MRU to 64K
tipc: Check the bearer type before calling tipc_udp_nl_bearer_add()
rxrpc: Fix response to PING RESPONSE ACKs to a dead call
inet: read sk->sk_family once in inet_recv_error()
hwmon: (aspeed-pwm-tacho) mutex for tach reading
atm: idt77252: fix a memleak in open_card_ubr0
phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP
dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV
bonding: remove print in bond_verify_device_path
HID: apple: Add 2021 magic keyboard FN key mapping
HID: apple: Add support for the 2021 Magic Keyboard
HID: apple: Swap the Fn and Left Control keys on Apple keyboards
net: sysfs: Fix /sys/class/net/<iface> path
af_unix: fix lockdep positive in sk_diag_dump_icons()
net: ipv4: fix a memleak in ip_setup_cork
net: Fix one possible memleak in ip_setup_cork
netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
llc: call sock_orphan() at release time
ipv6: Ensure natural alignment of const ipv6 loopback and router addresses
ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550()
ixgbe: Refactor overtemp event handling
ixgbe: Remove non-inclusive language
net: remove unneeded break
scsi: isci: Fix an error code problem in isci_io_request_build()
wifi: cfg80211: fix RCU dereference in __cfg80211_bss_update
drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
ceph: fix deadlock or deadcode of misusing dget()
virtio_net: Fix "‘%d’ directive writing between 1 and 11 bytes into a region of size 10" warnings
libsubcmd: Fix memory leak in uniq()
usb: hub: Replace hardcoded quirk value with BIT() macro
PCI: Only override AMD USB controller if required
mfd: ti_am335x_tscadc: Fix TI SoC dependencies
um: net: Fix return type of uml_net_start_xmit()
um: Don't use vfprintf() for os_info()
um: Fix naming clash between UML and scheduler
leds: trigger: panic: Don't register panic notifier if creating the trigger failed
clk: mmp: pxa168: Fix memory leak in pxa168_clk_init()
clk: hi3620: Fix memory leak in hi3620_mmc_clk_init()
media: ddbridge: fix an error code problem in ddb_probe
IB/ipoib: Fix mcast list locking
drm/exynos: Call drm_atomic_helper_shutdown() at shutdown/unbind time
ALSA: hda: Intel: add HDA_ARL PCI ID support
ALSA: hda: Add Icelake PCI ID
PCI: add INTEL_HDA_ARL to pci_ids.h
media: stk1160: Fixed high volume of stk1160_dbg messages
drm/mipi-dsi: Fix detach call without attach
drm/framebuffer: Fix use of uninitialized variable
drm/drm_file: fix use of uninitialized variable
RDMA/IPoIB: Fix error code return in ipoib_mcast_join
fast_dput(): handle underflows gracefully
ASoC: doc: Fix undefined SND_SOC_DAPM_NOPM argument
wifi: cfg80211: free beacon_ies when overridden from hidden BSS
wifi: rtlwifi: rtl8723{be,ae}: using calculate_bit_shift()
wifi: rtl8xxxu: Add additional USB IDs for RTL8192EU devices
md: Whenassemble the array, consult the superblock of the freshest device
ARM: dts: imx23/28: Fix the DMA controller node name
ARM: dts: imx23-sansa: Use preferred i2c-gpios properties
ARM: dts: imx27-apf27dev: Fix LED name
ARM: dts: imx1: Fix sram node
ARM: dts: imx27: Fix sram node
ARM: dts: imx: Use flash@0,0 pattern
ARM: dts: imx25/27-eukrea: Fix RTC node name
ARM: dts: rockchip: fix rk3036 hdmi ports node
scsi: libfc: Fix up timeout error in fc_fcp_rec_error()
scsi: libfc: Don't schedule abort twice
bpf: Add map and need_defer parameters to .map_fd_put_ptr()
wifi: ath9k: Fix potential array-index-out-of-bounds read in ath9k_htc_txstatus()
ARM: dts: imx7s: Fix nand-controller #size-cells
ARM: dts: imx7s: Fix lcdif compatible
bonding: return -ENOMEM instead of BUG in alb_upper_dev_walk
PCI: Add no PM reset quirk for NVIDIA Spectrum devices
scsi: lpfc: Fix possible file string name overflow when updating firmware
ext4: unify the type of flexbg_size to unsigned int
SUNRPC: Fix a suspicious RCU usage warning
KVM: s390: fix setting of fpc register
s390/ptrace: handle setting of fpc register correctly
jfs: fix array-index-out-of-bounds in diNewExt
rxrpc_find_service_conn_rcu: fix the usage of read_seqbegin_or_lock()
pstore/ram: Fix crash when setting number of cpus to an odd number
jfs: fix uaf in jfs_evict_inode
jfs: fix array-index-out-of-bounds in dbAdjTree
jfs: fix slab-out-of-bounds Read in dtSearch
UBSAN: array-index-out-of-bounds in dtSplitRoot
FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTree
ACPI: extlog: fix NULL pointer dereference check
PNP: ACPI: fix fortify warning
ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop
audit: Send netlink ACK before setting connection in auditd_set
powerpc/lib: Validate size for vector operations
powerpc/mm: Fix build failures due to arch_reserved_kernel_pages()
powerpc: Fix build error due to is_valid_bugaddr()
powerpc/mm: Fix null-pointer dereference in pgtable_cache_add
tick/sched: Preserve number of idle sleeps across CPU hotplug events
mips: Call lose_fpu(0) before initializing fcr31 in mips_set_personality_nan
drm/bridge: nxp-ptn3460: simplify some error checking
drm/bridge: nxp-ptn3460: fix i2c_master_send() error checking
drm: Don't unref the same fb many times by mistake due to deadlock handling
gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-04
netfilter: nf_tables: reject QUEUE/DROP verdict parameters
btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args
btrfs: don't warn if discard range is not aligned to sector
net: fec: fix the unhandled context fault from smmu
fjes: fix memleaks in fjes_hw_setup
netfilter: nf_tables: restrict anonymous set and map names to 16 bytes
net/mlx5e: fix a double-free in arfs_create_groups
net/mlx5: Use kfree(ft->g) in arfs_create_groups()
netlink: fix potential sleeping issue in mqueue_flush_file
tcp: Add memory barrier to tcp_push()
net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv
llc: Drop support for ETH_P_TR_802_2.
llc: make llc_ui_sendmsg() more robust against bonding changes
vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING
drivers: core: fix kernel-doc markup for dev_err_probe()
driver code: print symbolic error code
Revert "driver core: Annotate dev_err_probe() with __must_check"
driver core: Annotate dev_err_probe() with __must_check
x86/CPU/AMD: Fix disabling XSAVES on AMD family 0x17 due to erratum
powerpc: Use always instead of always-y in for crtsavres.o
block: Remove special-casing of compound pages
parisc/firmware: Fix F-extend for PDC addresses
rpmsg: virtio: Free driver_override when rpmsg_remove()
hwrng: core - Fix page fault dead lock on mmap-ed hwrng
PM: hibernate: Enforce ordering during image compression/decompression
crypto: api - Disallow identical driver names
serial: sc16is7xx: add check for unsupported SPI modes during probe
spi: introduce SPI_MODE_X_MASK macro
driver core: add device probe log helper
serial: sc16is7xx: set safe default SPI clock frequency
units: add the HZ macros
units: change from 'L' to 'UL'
units: Add Watt units
include/linux/units.h: add helpers for kelvin to/from Celsius conversion
PCI: mediatek: Clear interrupt status before dispatching handler
LTS: Update to 4.14.338
crypto: scompress - initialize per-CPU variables on each CPU
Revert "NFSD: Fix possible sleep during nfsd4_release_lockowner()"
i2c: s3c24xx: fix transferring more than one message in polling mode
i2c: s3c24xx: fix read transfers in polling mode
kdb: Fix a potential buffer overflow in kdb_local()
kdb: Censor attempts to set PROMPT without ENABLE_MEM_READ
ipvs: avoid stat macros calls from preemptible context
net: ravb: Fix dma_addr_t truncation in error case
serial: imx: Correct clock error message in function probe()
apparmor: avoid crash when parsed profile name is empty
MIPS: Alchemy: Fix an out-of-bound access in db1550_dev_setup()
MIPS: Alchemy: Fix an out-of-bound access in db1200_dev_setup()
HID: wacom: Correct behavior when processing some confidence == false touches
wifi: mwifiex: configure BSSID consistently when starting AP
wifi: rtlwifi: Convert LNKCTL change to PCIe cap RMW accessors
wifi: rtlwifi: Remove bogus and dangerous ASPM disable/enable code
fbdev: flush deferred work in fb_deferred_io_fsync()
ALSA: oxygen: Fix right channel of capture volume mixer
usb: mon: Fix atomicity violation in mon_bin_vma_fault
usb: chipidea: wait controller resume finished for wakeup irq
usb: dwc: ep0: Update request status in dwc3_ep0_stall_restart
usb: phy: mxs: remove CONFIG_USB_OTG condition for mxs_phy_is_otg_host()
tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug
binder: fix unused alloc->free_async_space
binder: fix race between mmput() and do_exit()
xen-netback: don't produce zero-size SKB frags
Input: atkbd - use ab83 as id when skipping the getid command
binder: fix async space check for 0-sized buffers
watchdog: bcm2835_wdt: Fix WDIOC_SETTIMEOUT handling
watchdog: set cdev owner before adding
gpu/drm/radeon: fix two memleaks in radeon_vm_init
drivers/amd/pm: fix a use-after-free in kv_parse_power_table
drm/amd/pm: fix a double-free in si_dpm_init
media: dvbdev: drop refcount on error path in dvb_device_open()
media: cx231xx: fix a memleak in cx231xx_init_isoc
drm/radeon/trinity_dpm: fix a memleak in trinity_parse_power_table
drm/radeon/dpm: fix a memleak in sumo_parse_power_table
drm/radeon: check the alloc_workqueue return value in radeon_crtc_init()
drm/drv: propagate errors from drm_modeset_register_all()
drm/msm/mdp4: flush vblank event on disable
ASoC: cs35l34: Fix GPIO name and drop legacy include
ASoC: cs35l33: Fix GPIO name and drop legacy include
drm/radeon: check return value of radeon_ring_lock()
drm/radeon/r100: Fix integer overflow issues in r100_cs_track_check()
drm/radeon/r600_cs: Fix possible int overflows in r600_cs_check_reg()
f2fs: fix to avoid dirent corruption
drm/bridge: Fix typo in post_disable() description
media: pvrusb2: fix use after free on context disconnection
RDMA/usnic: Silence uninitialized symbol smatch warnings
ip6_tunnel: fix NEXTHDR_FRAGMENT handling in ip6_tnl_parse_tlv_enc_lim()
Bluetooth: Fix bogus check for re-auth no supported with non-ssp
wifi: rtlwifi: rtl8192se: using calculate_bit_shift()
wifi: rtlwifi: rtl8192ee: using calculate_bit_shift()
wifi: rtlwifi: rtl8192de: using calculate_bit_shift()
rtlwifi: rtl8192de: make arrays static const, makes object smaller
wifi: rtlwifi: rtl8192ce: using calculate_bit_shift()
wifi: rtlwifi: rtl8192cu: using calculate_bit_shift()
wifi: rtlwifi: rtl8192c: using calculate_bit_shift()
wifi: rtlwifi: rtl8188ee: phy: using calculate_bit_shift()
wifi: rtlwifi: add calculate_bit_shift()
wifi: rtlwifi: rtl8821ae: phy: fix an undefined bitwise shift behavior
rtlwifi: Use ffs in <foo>_phy_calculate_bit_shift
firmware: ti_sci: Fix an off-by-one in ti_sci_debugfs_create()
net/ncsi: Fix netlink major/minor version numbers
ncsi: internal.h: Fix a spello
wifi: libertas: stop selecting wext
bpf, lpm: Fix check prefixlen before walking trie
NFSv4.1/pnfs: Ensure we handle the error NFS4ERR_RETURNCONFLICT
crypto: scomp - fix req->dst buffer overflow
crypto: scompress - Use per-CPU struct instead multiple variables
crypto: scompress - return proper error code for allocation failure
crypto: sahara - do not resize req->src when doing hash operations
crypto: sahara - fix processing hash requests with req->nbytes < sg->length
crypto: sahara - improve error handling in sahara_sha_process()
crypto: sahara - fix wait_for_completion_timeout() error handling
crypto: sahara - fix ahash reqsize
crypto: virtio - Wait for tasklet to complete on device remove
pstore: ram_core: fix possible overflow in persistent_ram_init_ecc()
crypto: sahara - fix error handling in sahara_hw_descriptor_create()
crypto: sahara - fix processing requests with cryptlen < sg->length
crypto: sahara - fix ahash selftest failure
crypto: sahara - remove FLAGS_NEW_KEY logic
crypto: af_alg - Disallow multiple in-flight AIO requests
crypto: ccp - fix memleak in ccp_init_dm_workarea
crypto: virtio - Handle dataq logic with tasklet
mtd: Fix gluebi NULL pointer dereference caused by ftl notifier
calipso: fix memory leak in netlbl_calipso_add_pass()
netlabel: remove unused parameter in netlbl_netlink_auditinfo()
net: netlabel: Fix kerneldoc warnings
ACPI: video: check for error while searching for backlight device parent
mtd: rawnand: Increment IFC_TIMEOUT_MSECS for nand controller response
powerpc/imc-pmu: Add a null pointer check in update_events_in_group()
powerpc/powernv: Add a null pointer check in opal_event_init()
selftests/powerpc: Fix error handling in FPU/VMX preemption tests
powerpc/pseries/memhp: Fix access beyond end of drmem array
powerpc/pseries/memhotplug: Quieten some DLPAR operations
powerpc/44x: select I2C for CURRITUCK
powerpc: remove redundant 'default n' from Kconfig-s
powerpc: add crtsavres.o to always-y instead of extra-y
EDAC/thunderx: Fix possible out-of-bounds string access
x86/lib: Fix overflow when counting digits
coresight: etm4x: Fix width of CCITMIN field
uio: Fix use-after-free in uio_open
binder: fix comment on binder_alloc_new_buf() return value
drm/crtc: fix uninitialized variable use
Input: xpad - add Razer Wolverine V2 support
ARC: fix spare error
s390/scm: fix virtual vs physical address confusion
Input: atkbd - skip ATKBD_CMD_GETID in translated mode
reset: hisilicon: hi6220: fix Wvoid-pointer-to-enum-cast warning
ring-buffer: Do not record in NMI if the arch does not support cmpxchg in NMI
tracing: Add size check when printing trace_marker output
tracing: Have large events show up as '[LINE TOO BIG]' instead of nothing
drm/crtc: Fix uninit-value bug in drm_mode_setcrtc
jbd2: correct the printing of write_flags in jbd2_write_superblock()
clk: rockchip: rk3128: Fix HCLK_OTG gate register
drm/exynos: fix a potential error pointer dereference
ASoC: da7219: Support low DC impedance headset
net/tg3: fix race condition in tg3_reset_task()
ASoC: rt5650: add mutex to avoid the jack detection failure
ASoC: cs43130: Fix incorrect frame delay configuration
ASoC: cs43130: Fix the position of const qualifier
f2fs: explicitly null-terminate the xattr list
LTS: Update to 4.14.337
ipv6: remove max_size check inline with ipv4
ipv6: make ip6_rt_gc_expire an atomic_t
net/dst: use a smaller percpu_counter batch for dst entries accounting
net: add a route cache full diagnostic message
netfilter: nf_tables: Reject tables of unsupported family
fuse: nlookup missing decrement in fuse_direntplus_link
mm: fix unmap_mapping_range high bits shift bug
mm/memory-failure: check the mapcount of the precise page
bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
asix: Add check for usbnet_get_endpoints
net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
net/qla3xxx: switch from 'pci_' to 'dma_' API
LTS: create metadata for 4.14.y
Conflicts:
drivers/android/binder_alloc.c
drivers/infiniband/ulp/srpt/ib_srpt.c
fs/aio.c
fs/f2fs/namei.c
include/linux/fs.h
kernel/power/swap.c
mm/memory-failure.c
Change-Id: I559d04dc6e27861ffd63ac8ae8ae9db8ff498e24
|
||
|
|
7474abe2c0 |
bpf: Add map and need_defer parameters to .map_fd_put_ptr()
[ Upstream commit 20c20bd11a0702ce4dc9300c3da58acf551d9725 ]
map is the pointer of outer map, and need_defer needs some explanation. need_defer tells the implementation to defer the reference release of the passed element and ensure that the element is still alive before the bpf program, which may manipulate it, exits.
The following three cases will invoke map_fd_put_ptr() and different need_defer values will be passed to these callers:
1) release the reference of the old element in the map during map update or map deletion. The release must be deferred, otherwise the bpf program may incur use-after-free problem, so need_defer needs to be true.
2) release the reference of the to-be-added element in the error path of map update. The to-be-added element is not visible to any bpf program, so it is OK to pass false for need_defer parameter.
3) release the references of all elements in the map during map release. Any bpf program which has access to the map must have been exited and released, so need_defer=false will be OK.
These two parameters will be used by the following patches to fix the potential use-after-free problem for map-in-map.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5aa1e7d3f6d0db96c7139677d9e898bbbd6a7dcf)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> |
||
|
|
f9cf23e1ff |
Merge 4.14.79 into android-4.14-p
Changes in 4.14.79
xfrm: Validate address prefix lengths in the xfrm selector.
xfrm6: call kfree_skb when skb is toobig
xfrm: reset transport header back to network header after all input transforms ahave been applied
xfrm: reset crypto_done when iterating over multiple input xfrms
mac80211: Always report TX status
cfg80211: reg: Init wiphy_idx in regulatory_hint_core()
mac80211: fix pending queue hang due to TX_DROP
cfg80211: Address some corner cases in scan result channel updating
mac80211: TDLS: fix skb queue/priority assignment
mac80211: fix TX status reporting for ieee80211s
xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
ARM: 8799/1: mm: fix pci_ioremap_io() offset check
xfrm: validate template mode
netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev
arm64: hugetlb: Fix handling of young ptes
ARM: dts: BCM63xx: Fix incorrect interrupt specifiers
net: macb: Clean 64b dma addresses if they are not detected
soc: fsl: qbman: qman: avoid allocating from non existing gen_pool
soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift()
nl80211: Fix possible Spectre-v1 for NL80211_TXRATE_HT
mac80211_hwsim: do not omit multicast announce of first added radio
Bluetooth: SMP: fix crash in unpairing
pxa168fb: prepare the clock
qed: Avoid implicit enum conversion in qed_set_tunn_cls_info
qed: Fix mask parameter in qed_vf_prep_tunn_req_tlv
qed: Avoid implicit enum conversion in qed_roce_mode_to_flavor
qed: Avoid constant logical operation warning in qed_vf_pf_acquire
qed: Avoid implicit enum conversion in qed_iwarp_parse_rx_pkt
nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds
asix: Check for supported Wake-on-LAN modes
ax88179_178a: Check for supported Wake-on-LAN modes
lan78xx: Check for supported Wake-on-LAN modes
sr9800: Check for supported Wake-on-LAN modes
r8152: Check for supported Wake-on-LAN Modes
smsc75xx: Check for Wake-on-LAN modes
smsc95xx: Check for Wake-on-LAN modes
cfg80211: fix use-after-free in reg_process_hint()
perf/core: Fix perf_pmu_unregister() locking
perf/ring_buffer: Prevent concurent ring buffer access
perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events
net: fec: fix rare tx timeout
declance: Fix continuation with the adapter identification message
net: qualcomm: rmnet: Skip processing loopback packets
locking/ww_mutex: Fix runtime warning in the WW mutex selftest
be2net: don't flip hw_features when VXLANs are added/deleted
net: cxgb3_main: fix a missing-check bug
yam: fix a missing-check bug
ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()
iwlwifi: mvm: check for short GI only for OFDM
iwlwifi: dbg: allow wrt collection before ALIVE
iwlwifi: fix the ALIVE notification layout
tools/testing/nvdimm: unit test clear-error commands
usbip: vhci_hcd: update 'status' file header and format
scsi: aacraid: address UBSAN warning regression
IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
IB/rxe: put the pool on allocation failure
s390/qeth: fix error handling in adapter command callbacks
net/mlx5: Fix mlx5_get_vector_affinity function
powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=n
dm integrity: fail early if required HMAC key is not available
net: phy: realtek: Use the dummy stubs for MMD register access for rtl8211b
net: phy: Add general dummy stubs for MMD register access
net/mlx5e: Refine ets validation function
scsi: qla2xxx: Avoid double completion of abort command
kbuild: set no-integrated-as before incl. arch Makefile
IB/mlx5: Avoid passing an invalid QP type to firmware
ARM: tegra: Fix ULPI regression on Tegra20
l2tp: remove configurable payload offset
cifs: Use ULL suffix for 64-bit constant
test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches
KVM: x86: Update the exit_qualification access bits while walking an address
sparc64: Fix regression in pmdp_invalidate().
tpm: move the delay_msec increment after sleep in tpm_transmit()
bpf: sockmap, map_release does not hold refcnt for pinned maps
tpm: tpm_crb: relinquish locality on error path.
xen-netfront: Update features after registering netdev
xen-netfront: Fix mismatched rtnl_unlock
IB/usnic: Update with bug fixes from core code
mmc: dw_mmc-rockchip: correct property names in debug
MIPS: Workaround GCC __builtin_unreachable reordering bug
lan78xx: Don't reset the interface on open
enic: do not overwrite error code
iio: buffer: fix the function signature to match implementation
selftests/powerpc: Add ptrace hw breakpoint test
scsi: ibmvfc: Avoid unnecessary port relogin
scsi: sd: Remember that READ CAPACITY(16) succeeded
btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
net: phy: phylink: Don't release NULL GPIO
x86/paravirt: Fix some warning messages
net: stmmac: mark PM functions as __maybe_unused
kconfig: fix the rule of mainmenu_stmt symbol
libertas: call into generic suspend code before turning off power
perf tests: Fix indexing when invoking subtests
compiler.h: Allow arch-specific asm/compiler.h
ARM: dts: imx53-qsb: disable 1.2GHz OPP
perf python: Use -Wno-redundant-decls to build with PYTHON=python3
rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window()
rxrpc: Only take the rwind and mtu values from latest ACK
rxrpc: Fix connection-level abort handling
net: ena: fix warning in rmmod caused by double iounmap
net: ena: fix NULL dereference due to untimely napi initialization
selftests: rtnetlink.sh explicitly requires bash.
fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters()
sch_netem: restore skb->dev after dequeuing from the rbtree
mtd: spi-nor: Add support for is25wp series chips
kvm: x86: fix WARN due to uninitialized guest FPU state
ARM: dts: r8a7790: Correct critical CPU temperature
media: uvcvideo: Fix driver reference counting
ALSA: usx2y: Fix invalid stream URBs
Revert "netfilter: ipv6: nf_defrag: drop skb dst before queueing"
perf tools: Disable parallelism for 'make clean'
drm/i915/gvt: fix memory leak of a cmd_entry struct on error exit path
bridge: do not add port to router list when receives query with source 0.0.0.0
net: bridge: remove ipv6 zero address check in mcast queries
ipv6: mcast: fix a use-after-free in inet6_mc_check
ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called
llc: set SOCK_RCU_FREE in llc_sap_add_socket()
net: fec: don't dump RX FIFO register when not available
net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs
net: sched: gred: pass the right attribute to gred_change_table_def()
net: socket: fix a missing-check bug
net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
net: udp: fix handling of CHECKSUM_COMPLETE packets
r8169: fix NAPI handling under high load
sctp: fix race on sctp_id2asoc
udp6: fix encap return code for resubmitting
vhost: Fix Spectre V1 vulnerability
virtio_net: avoid using netif_tx_disable() for serializing tx routine
ethtool: fix a privilege escalation bug
bonding: fix length of actor system
ip6_tunnel: Fix encapsulation layout
openvswitch: Fix push/pop ethernet validation
net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault type
net: sched: Fix for duplicate class dump
net: drop skb on failure in ip_check_defrag()
net: fix pskb_trim_rcsum_slow() with odd trim offset
net/mlx5e: fix csum adjustments caused by RXFCS
rtnetlink: Disallow FDB configuration for non-Ethernet device
net: ipmr: fix unresolved entry dumps
net: bcmgenet: Poll internal PHY for GENETv5
net/sched: cls_api: add missing validation of netlink attributes
net/mlx5: Fix build break when CONFIG_SMP=n
Linux 4.14.79
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
3c0cff34e9 |
bpf: sockmap, map_release does not hold refcnt for pinned maps
[ Upstream commit ba6b8de423f8d0dee48d6030288ed81c03ddf9f0 ]
Relying on map_release hook to decrement the reference counts when a
map is removed only works if the map is not being pinned. In the
pinned case the ref is decremented immediately and the BPF programs
released. After this BPF programs may not be in-use which is not
what the user would expect.
This patch moves the release logic into bpf_map_put_uref() and brings
sockmap in-line with how a similar case is handled in prog array maps.
Fixes: 3d9e952697de ("bpf: sockmap, fix leaking maps with attached but not detached progs")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
||
|
|
4576e0eca9 |
Merge 4.14.26 into android-4.14
Changes in 4.14.26
bpf: fix mlock precharge on arraymaps
bpf: fix memory leak in lpm_trie map_free callback function
bpf: fix rcu lockdep warning for lpm_trie map_free callback
bpf, x64: implement retpoline for tail call
bpf, arm64: fix out of bounds access in tail call
bpf: add schedule points in percpu arrays management
bpf: allow xadd only on aligned memory
bpf, ppc64: fix out of bounds access in tail call
KVM: x86: fix backward migration with async_PF
Linux 4.14.26
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
e1760b3563 |
bpf: add schedule points in percpu arrays management
[ upstream commit 32fff239de37ef226d5b66329dd133f64d63b22d ]
syzbot managed to trigger RCU detected stalls in
bpf_array_free_percpu()
It takes time to allocate a huge percpu map, but even more time to free
it.
Since we run in process context, use cond_resched() to yield cpu if
needed.
Fixes:
|
||
|
|
d9fd73c60b |
bpf: fix mlock precharge on arraymaps
[ upstream commit 9c2d63b843a5c8a8d0559cc067b5398aa5ec3ffc ]
syzkaller recently triggered OOM during percpu map allocation; while there is work in progress by Dennis Zhou to add __GFP_NORETRY semantics for percpu allocator under pressure, there seems also a missing bpf_map_precharge_memlock() check in array map allocation.
Given today the actual bpf_map_charge_memlock() happens after the find_and_alloc_map() in syscall path, the bpf_map_precharge_memlock() is there to bail out early before we go and do the map setup work when we find that we hit the limits anyway. Therefore add this for array map as well.
Fixes: |
||
|
|
9b68347c35 |
Merge 4.14.14 into android-4.14
Changes in 4.14.14
dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
KVM: Fix stack-out-of-bounds read in write_mmio
can: vxcan: improve handling of missing peer name attribute
can: gs_usb: fix return value of the "set_bittiming" callback
IB/srpt: Disable RDMA access by the initiator
IB/srpt: Fix ACL lookup during login
MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
MIPS: Factor out NT_PRFPREG regset access helpers
MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC
kvm: vmx: Scrub hardware GPRs at VM-exit
platform/x86: wmi: Call acpi_wmi_init() later
iw_cxgb4: only call the cq comp_handler when the cq is armed
iw_cxgb4: atomically flush the qp
iw_cxgb4: only clear the ARMED bit if a notification is needed
iw_cxgb4: reflect the original WR opcode in drain cqes
iw_cxgb4: when flushing, complete all wrs in a chain
x86/acpi: Handle SCI interrupts above legacy space gracefully
ALSA: pcm: Remove incorrect snd_BUG_ON() usages
ALSA: pcm: Workaround for weird PulseAudio behavior on rewind error
ALSA: pcm: Add missing error checks in OSS emulation plugin builder
ALSA: pcm: Abort properly at pending signal in OSS read/write loops
ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
ALSA: aloop: Release cable upon open error path
ALSA: aloop: Fix inconsistent format due to incomplete rule
ALSA: aloop: Fix racy hw constraints adjustment
x86/acpi: Reduce code duplication in mp_override_legacy_irq()
8021q: fix a memory leak for VLAN 0 device
ip6_tunnel: disable dst caching if tunnel is dual-stack
net: core: fix module type in sock_diag_bind
phylink: ensure we report link down when LOS asserted
RDS: Heap OOB write in rds_message_alloc_sgs()
RDS: null pointer dereference in rds_atomic_free_op
net: fec: restore dev_id in the cases of probe error
net: fec: defer probe if regulator is not ready
net: fec: free/restore resource in related probe error pathes
sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
sctp: fix the handling of ICMP Frag Needed for too small MTUs
sh_eth: fix TSU resource handling
net: stmmac: enable EEE in MII, GMII or RGMII only
sh_eth: fix SH7757 GEther initialization
ipv6: fix possible mem leaks in ipv6_make_skb()
ethtool: do not print warning for applications using legacy API
mlxsw: spectrum_router: Fix NULL pointer deref
net/sched: Fix update of lastuse in act modules implementing stats_update
ipv6: sr: fix TLVs not being copied using setsockopt
mlxsw: spectrum: Relax sanity checks during enslavement
sfp: fix sfp-bus oops when removing socket/upstream
membarrier: Disable preemption when calling smp_call_function_many()
crypto: algapi - fix NULL dereference in crypto_remove_spawns()
mmc: renesas_sdhi: Add MODULE_LICENSE
rbd: reacquire lock should update lock owner client id
rbd: set max_segments to USHRT_MAX
iwlwifi: pcie: fix DMA memory mapping / unmapping
x86/microcode/intel: Extend BDW late-loading with a revision check
KVM: x86: Add memory barrier on vmcs field lookup
KVM: PPC: Book3S PR: Fix WIMG handling under pHyp
KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hpt
KVM: PPC: Book3S HV: Fix use after free in case of multiple resize requests
KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt()
drm/vmwgfx: Don't cache framebuffer maps
drm/vmwgfx: Potential off by one in vmw_view_add()
drm/i915/gvt: Clear the shadow page table entry after post-sync
drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.
drm/i915: Move init_clock_gating() back to where it was
drm/i915: Fix init_clock_gating for resume
bpf: prevent out-of-bounds speculation
bpf, array: fix overflow in max_entries and undefined behavior in index_mask
bpf: arsh is not supported in 32 bit alu thus reject it
USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
USB: serial: cp210x: add new device ID ELV ALC 8xxx
usb: misc: usb3503: make sure reset is low for at least 100us
USB: fix usbmon BUG trigger
USB: UDC core: fix double-free in usb_add_gadget_udc_release
usbip: remove kernel addresses from usb device and urb debug msgs
usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
Bluetooth: Prevent stack info leak from the EFS element.
uas: ignore UAS for Norelsys NS1068(X) chips
mux: core: fix double get_device()
kdump: write correct address of mem_section into vmcoreinfo
apparmor: fix ptrace label match when matching stacked labels
e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
x86/pti: Unbreak EFI old_memmap
x86/Documentation: Add PTI description
x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
sysfs/cpu: Add vulnerability folder
x86/cpu: Implement CPU vulnerabilites sysfs functions
x86/tboot: Unbreak tboot with PTI enabled
x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*()
x86/cpu/AMD: Make LFENCE a serializing instruction
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
sysfs/cpu: Fix typos in vulnerability documentation
x86/alternatives: Fix optimize_nops() checking
x86/pti: Make unpoison of pgd for trusted boot work for real
objtool: Detect jumps to retpoline thunks
objtool: Allow alternatives to be ignored
x86/retpoline: Add initial retpoline support
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline: Fill return stack buffer on vmexit
selftests/x86: Add test_vsyscall
x86/pti: Fix !PCID and sanitize defines
security/Kconfig: Correct the Documentation reference for PTI
x86,perf: Disable intel_bts when PTI
x86/retpoline: Remove compile time warning
Linux 4.14.14
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
|
67c05d9414 |
bpf, array: fix overflow in max_entries and undefined behavior in index_mask
commit bbeb6e4323dad9b5e0ee9f60c223dd532e2403b1 upstream.
syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc98
("bpf: prevent out-of-bounds speculation") we round max_entries up to
the next power of two for unprivileged users, so that we can apply
proper masking into potentially zeroed-out map slots.
However, this will generate an index_mask of 0xffffffff, and therefore
the + 1 will make it overflow into a new max_entries of 0. This passes
allocation, etc., and later on map access we still enforce the original
attr->max_entries value of 0xfffffffd, therefore triggering GPFs all
over the place. Thus, bail out on overflow in such a case.
Moreover, on 32-bit archs roundup_pow_of_two() cannot be used either,
since fls_long(max_entries - 1) can return 32, and 1UL << 32 is undefined
when unsigned long is 32 bits wide. Therefore, do this by hand in a 64-bit
variable.
This fixes all the issues triggered by syzkaller's reproducers.
Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
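The rounding and overflow check described in the commit above can be sketched in plain C. This is a minimal userspace illustration, not the kernel's array_map_alloc(). `fls_u32()` and `round_up_max_entries()` are hypothetical names. The sketch assumes the index mask is `(1 << fls(max_entries - 1)) - 1`, computed in a 64-bit variable as the message describes.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's fls_long(): returns the 1-based
 * position of the most significant set bit, or 0 when v == 0. */
static unsigned int fls_u32(uint32_t v)
{
	unsigned int pos = 0;

	while (v) {
		pos++;
		v >>= 1;
	}
	return pos;
}

/* Sketch of the unprivileged-array sizing: round max_entries up to a
 * power of two and derive index_mask, doing the shift in a 64-bit
 * variable so that a shift by 32 stays well defined.
 * Returns 0 on success, -1 when the rounded size would wrap to 0. */
static int round_up_max_entries(uint32_t max_entries, uint32_t *rounded,
				uint32_t *index_mask)
{
	uint64_t mask64;
	uint32_t new_max;

	if (max_entries == 0)
		return -1;	/* zero-sized arrays are rejected anyway */

	mask64 = 1ULL << fls_u32(max_entries - 1);	/* shift of at most 32 */
	mask64 -= 1;

	*index_mask = (uint32_t)mask64;
	new_max = *index_mask + 1;	/* wraps to 0 for a mask of 0xffffffff */
	if (new_max < max_entries)
		return -1;	/* overflow: bail out instead of allocating */

	*rounded = new_max;
	return 0;
}
```

With this sketch, 5 entries round up to 8 with an index_mask of 7. A request for 0xfffffffd entries is rejected because the rounded size would wrap to 0.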
||
|
|
a5dbaf8768 |
bpf: prevent out-of-bounds speculation
commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.
Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.
To avoid leaking kernel data, round up array-based maps and mask the index
after the bounds check, so a speculated load with an out-of-bounds index
will load either a valid value from the array or zero from the padded area.
Unconditionally mask the index for all array types, even when max_entries
is not rounded up to a power of 2 (the root user case).
When the map is created by an unprivileged user, generate a sequence of bpf
insns that includes an AND operation, to make sure that JITed code includes
the same 'index & index_mask' operation.
If a prog_array map is created by an unprivileged user, replace
  bpf_tail_call(ctx, map, index);
with
  if (index < max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with rounding up to a power of 2) to prevent out-of-bounds speculation.
There is a secondary, redundant 'if (index >= max_entries)' check in the
interpreter and in all JITs, but it can be optimized out later if necessary.
Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unprivileged users, so no changes there.
That fixes the bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.
v2->v3:
Daniel noticed that the attack could potentially be crafted via syscall
commands without loading a program, so add masking to those paths as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
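Below is a minimal sketch of the "bounds check, then mask" pattern from the commit above, written as ordinary C. The struct layout and the names `struct array_sketch` and `array_lookup_sketch()` are hypothetical. The point is only that `index & index_mask` keeps even a mispredicted load inside the power-of-two-sized, zero-padded allocation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical array map layout for illustration only; the kernel's
 * struct bpf_array differs. */
struct array_sketch {
	uint32_t max_entries;	/* limit enforced by the bounds check */
	uint32_t index_mask;	/* (next power of two >= max_entries) - 1 */
	uint32_t elem_size;
	char *value;		/* (index_mask + 1) * elem_size bytes, zero padded */
};

/* Bounds check followed by an unconditional mask: if the CPU mispredicts
 * the branch, the speculated load still stays inside the allocation,
 * touching either a real element or the zeroed padding. */
static void *array_lookup_sketch(const struct array_sketch *a, uint32_t index)
{
	if (index >= a->max_entries)
		return NULL;
	return a->value + (size_t)a->elem_size * (index & a->index_mask);
}
```

Architecturally, the mask changes nothing once the bounds check has passed. It matters only on the speculative path, which is why the commit also emits the AND as an explicit BPF instruction so that JITed code contains it.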
||
|
|
cace572e16 |
BACKPORT: bpf: Add file mode configuration into bpf maps
Introduce the map read/write flags to the eBPF syscalls that return a map fd. The flags are used to set up the file mode when constructing a new file descriptor for a bpf map. To not break backward compatibility, f_flags is set to O_RDWR if the flag passed by the syscall is 0; otherwise it should be O_RDONLY or O_WRONLY. When userspace wants to modify or read the map content, the kernel checks the file mode to see whether the operation is allowed. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> Bug: 30950746 Change-Id: Icfad20f1abb77f91068d244fb0d87fa40824dd1b (cherry picked from commit 6e71b04a82248ccf13a94b85cbc674a9fefe53f5) Signed-off-by: Amit Pundir <amit.pundir@linaro.org> |
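The flag handling described in this commit might look like the following sketch. `SKETCH_F_RDONLY` and `SKETCH_F_WRONLY` are local stand-ins whose values are assumed to match the uapi `BPF_F_RDONLY`/`BPF_F_WRONLY` bits. `map_flags_to_f_flags()`, `mode_can_read()` and `mode_can_write()` are hypothetical helper names. Rejecting the combination of both flags is also an assumption of this sketch.

```c
#include <errno.h>
#include <fcntl.h>

/* Assumed to mirror the uapi BPF_F_RDONLY / BPF_F_WRONLY bits; defined
 * locally so the sketch does not need <linux/bpf.h>. */
#define SKETCH_F_RDONLY (1U << 3)
#define SKETCH_F_WRONLY (1U << 4)

/* Map the syscall's read/write flags to the O_* mode of the new map fd:
 * 0 keeps the old behaviour (O_RDWR), both flags together is an error. */
static int map_flags_to_f_flags(unsigned int flags)
{
	if ((flags & SKETCH_F_RDONLY) && (flags & SKETCH_F_WRONLY))
		return -EINVAL;
	if (flags & SKETCH_F_RDONLY)
		return O_RDONLY;
	if (flags & SKETCH_F_WRONLY)
		return O_WRONLY;
	return O_RDWR;
}

/* Checks a lookup or update path could apply to the fd's access mode. */
static int mode_can_read(int f_flags)
{
	return (f_flags & O_ACCMODE) != O_WRONLY;
}

static int mode_can_write(int f_flags)
{
	return (f_flags & O_ACCMODE) != O_RDONLY;
}
```

Mapping 0 to O_RDWR is what keeps existing callers, which never pass these flags, working unchanged.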
||
|
|
bc6d5031b4 |
bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu allocator. Given we support __GFP_NOWARN now, let's just let the allocation request fail naturally instead. The two call sites from BPF mistakenly assumed __GFP_NOWARN would work, so no changes are needed to their actual __alloc_percpu_gfp() calls, which use the flag already. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
7b0c2a0508 |
bpf: inline map in map lookup functions for array and htab
Avoid two successive function calls for the map-in-map lookup: the first is the bpf_map_lookup_elem() helper call, and the second is the callback via map->ops->map_lookup_elem() to get to the map-in-map implementation. The implementation inlines the array and htab flavors of map-in-map lookups. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
96eabe7a40 |
bpf: Allow selecting numa node during map creation
The current map creation API does not allow providing a NUMA node
preference. The memory usually comes from the node where the map-creation
process is running. The performance is not ideal if the bpf_prog is known to
always run in a NUMA node different from the map-creation process.
One use case is sharding per CPU across different LRU maps (i.e.
an array of LRU maps). Here is the test result of map_perf_test on
the INNER_LRU_HASH_PREALLOC test if we force the LRU map used by
CPU0 to be allocated from a remote NUMA node:
[ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ]
># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec
4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec
3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec
6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec
2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec
1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec
7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec
0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec #<<<
After specifying numa node:
># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec
3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec
1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec
6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec
2:inner_lru_hash_map_perf pre-alloc
|
||
|
|
14dc6f04f4 |
bpf: Add syscall lookup support for fd array and htab
This patch allows userspace to do BPF_MAP_LOOKUP_ELEM on BPF_MAP_TYPE_PROG_ARRAY, BPF_MAP_TYPE_ARRAY_OF_MAPS and BPF_MAP_TYPE_HASH_OF_MAPS. The lookup returns a prog-id or map-id to the userspace. The userspace can then use the BPF_PROG_GET_FD_BY_ID or BPF_MAP_GET_FD_BY_ID to get a fd. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
f91840a32d |
perf, bpf: Add BPF support to all perf_event types
Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all perf_event types, including HW_CACHE, RAW, and dynamic pmu events. Only tracepoint/kprobe events are treated differently; they require the BPF_PROG_TYPE_TRACEPOINT/BPF_PROG_TYPE_KPROBE program types, respectively. Also add support for reading all event counters using the bpf_perf_event_read() helper. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
a316338cb7 |
bpf: fix wrong exposure of map_flags into fdinfo for lpm
trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. The latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and, if not, bails out, which is currently the case for lpm
since it always exposes 0 as flags.
Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure not to
forget to copy them over at a later point in time when we add actual
flags for them to use.
Fixes:
|
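Here is a reduced sketch of the lpm bug and its fix. The flag is validated but, before the fix, never stored in the map header that fdinfo later reports. `struct map_hdr_sketch`, `trie_alloc_sketch()` and the `SKETCH_F_NO_PREALLOC` value are hypothetical stand-ins, not the kernel's `trie_alloc()`.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_F_NO_PREALLOC (1U << 0)	/* assumed BPF_F_NO_PREALLOC value */

/* Hypothetical, trimmed-down map header with only the field needed here. */
struct map_hdr_sketch {
	uint32_t map_flags;	/* what a loader later reads back via fdinfo */
};

/* The bug pattern: the flag is checked but not stored, so fdinfo reports
 * 0. The fix is the single assignment after validation. */
static int trie_alloc_sketch(struct map_hdr_sketch *map, uint32_t attr_flags)
{
	if (attr_flags != SKETCH_F_NO_PREALLOC)
		return -EINVAL;	/* preallocation is not supported */
	map->map_flags = attr_flags;	/* the previously missing copy */
	return 0;
}
```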
||
|
|
8fe4592438 |
bpf: map_get_next_key to return first key on NULL
When iterating through a map, we need to find a key that does not exist in the map, so that map_get_next_key will give us the first key of the map. This often requires a lot of guessing in production systems. This patch makes map_get_next_key return the first key when the key pointer in the parameter is NULL. Signed-off-by: Teng Qin <qinteng@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> |
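The NULL-key convention can be illustrated with an array-style map, where the keys are the indices 0 to max_entries - 1. This is a sketch under that assumption. `array_get_next_key_sketch()` and `count_keys()` are hypothetical names. Treating an out-of-range key like NULL is a choice made here for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* "Next key" semantics for an array-style map with max_entries slots:
 * a NULL (or out-of-range) key yields the first key, the last key
 * yields -ENOENT, anything else yields key + 1. */
static int array_get_next_key_sketch(uint32_t max_entries,
				     const uint32_t *key, uint32_t *next)
{
	uint32_t index = key ? *key : UINT32_MAX;

	if (max_entries == 0)
		return -ENOENT;	/* empty map: nothing to return */
	if (index >= max_entries) {
		*next = 0;	/* NULL or unknown key: start at the first key */
		return 0;
	}
	if (index == max_entries - 1)
		return -ENOENT;	/* already at the last key */
	*next = index + 1;
	return 0;
}

/* Walk every key without guessing a non-existent starting key. */
static uint32_t count_keys(uint32_t max_entries)
{
	uint32_t key, n = 0;
	const uint32_t *prev = NULL;	/* NULL asks for the first key */

	while (array_get_next_key_sketch(max_entries, prev, &key) == 0) {
		n++;
		prev = &key;
	}
	return n;
}
```

Iteration then starts with NULL and keeps feeding back the previous result until -ENOENT.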
||
|
|
40077e0cf6 |
bpf: remove struct bpf_map_type_list
There's no need to have struct bpf_map_type_list since it just contains a list_head, the type, and the ops pointer. Since the types are densely packed and not actually dynamically registered, it's much easier and smaller to have an array of type->ops pointer. Also initialize this array statically to remove code needed to initialize it. In order to save duplicating the list, move it to the types header file added by the previous patch and include it in the same fashion. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> |
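The "array of type->ops pointers" approach can be sketched with C99 designated initializers. The enum, the ops struct and `find_ops()` are hypothetical stand-ins for the kernel's map types and `struct bpf_map_ops`. Unlisted slots are implicitly NULL, so unsupported types need no extra code.

```c
#include <stddef.h>

/* Hypothetical ops struct and type IDs, standing in for the kernel's
 * struct bpf_map_ops and enum bpf_map_type. */
struct map_ops_sketch {
	const char *name;
};

enum map_type_sketch {
	SKETCH_MAP_UNSPEC,
	SKETCH_MAP_HASH,
	SKETCH_MAP_ARRAY,
	SKETCH_MAP_TYPE_MAX,
};

static const struct map_ops_sketch hash_ops = { "hash" };
static const struct map_ops_sketch array_ops = { "array" };

/* Dense, statically initialized table indexed by type: no list_head and
 * no registration code at init time. Unlisted slots stay NULL. */
static const struct map_ops_sketch *const map_types_sketch[SKETCH_MAP_TYPE_MAX] = {
	[SKETCH_MAP_HASH] = &hash_ops,
	[SKETCH_MAP_ARRAY] = &array_ops,
};

static const struct map_ops_sketch *find_ops(unsigned int type)
{
	if (type >= SKETCH_MAP_TYPE_MAX)
		return NULL;
	return map_types_sketch[type];	/* NULL for unsupported types */
}
```

Because the table is `const`, it can live in read-only data without any init-time code.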
||
|
|
56f668dfe0 |
bpf: Add array of maps support
This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can act as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those maps' properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity, a plain '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's lifetime does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: it allows element update and delete from the syscall, and it allows element lookup from a bpf_prog. The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> |
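The inner-map compatibility rule reduces to a single predicate: '==' on map_type, key_size, value_size, map_flags and max_entries, with ops implied by map_type. `struct map_meta_sketch` and `map_meta_equal_sketch()` are hypothetical names used only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the properties an outer map records from the
 * inner_map_fd at creation time (its "inner_map_meta"). */
struct map_meta_sketch {
	uint32_t map_type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t map_flags;
	uint32_t max_entries;
};

/* Plain '==' on every property; ops need no separate comparison because
 * equal map_type implies equal ops. */
static bool map_meta_equal_sketch(const struct map_meta_sketch *a,
				  const struct map_meta_sketch *b)
{
	return a->map_type == b->map_type &&
	       a->key_size == b->key_size &&
	       a->value_size == b->value_size &&
	       a->map_flags == b->map_flags &&
	       a->max_entries == b->max_entries;
}
```

An update of the outer map would then be refused when the candidate inner map's properties do not compare equal to the recorded meta.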
||
|
|
fad73a1a35 |
bpf: Fix and simplifications on inline map lookup
Fix in verifier:
For the same bpf_map_lookup_elem() instruction (i.e. "call 1"),
a broken case is "a different type of map could be used for the
same lookup instruction". For example, an array in one case and a
hashmap in another. We have to resort to the old dynamic call behavior
in this case. The fix is to check for a collision on insn_aux->map_ptr.
If there is a collision, don't inline the map lookup.
Please see "do_reg_lookup()" in test_map_in_map_kern.c in the later
patch for how to trigger the above case.
Simplifications on array_map_gen_lookup():
1. Calculate elem_size from map->value_size. It removes the
need for 'struct bpf_array' which makes the later map-in-map
implementation easier.
2. Remove the 'elem_size == 1' test
Fixes:
|
||
|
|
81ed18ab30 |
bpf: add helper inlining infra and optimize map_array lookup
Optimize bpf_call -> bpf_map_lookup_elem() -> array_map_lookup_elem() into a sequence of bpf instructions. When JIT is on, this sequence of bpf instructions becomes a sequence of native CPU instructions, with significantly faster performance than an indirect call plus two functions' prologues/epilogues. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> |
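In C terms, the inlined sequence replaces two calls with a bounds check and pointer arithmetic. This sketch assumes elements are laid out back to back, with the element size rounded up to a multiple of 8 from value_size; that rounding is an assumption of the sketch. `elem_size_sketch()` and `inlined_array_lookup_sketch()` are hypothetical names, and the real code emits bpf instructions rather than C.

```c
#include <stddef.h>
#include <stdint.h>

/* Element size assumed to be value_size rounded up to a multiple of 8. */
static uint32_t elem_size_sketch(uint32_t value_size)
{
	return (value_size + 7) & ~7U;
}

/* What the inlined instruction sequence computes, written as C: a bounds
 * check, a multiply and an add on the values base pointer, with no
 * indirect call through map->ops. */
static void *inlined_array_lookup_sketch(char *values, uint32_t max_entries,
					 uint32_t value_size, uint32_t index)
{
	if (index >= max_entries)
		return NULL;
	return values + (size_t)index * elem_size_sketch(value_size);
}
```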
||
|
|
c78f8bdfa1 |
bpf: mark all registered map/prog types as __ro_after_init
All map types and prog types are registered to the BPF core through
bpf_register_map_type() and bpf_register_prog_type() during init and
remain unchanged thereafter. As by design we don't (and never will)
have any pluggable code that can register to that at any later point
in time, let's mark all the existing bpf_{map,prog}_type_list objects
in the tree as __ro_after_init, so they can be moved to a read-only
section from then onwards.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
||
|
|
d407bd25a2 |
bpf: don't trigger OOM killer under pressure with map alloc
This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(), that are to be used for map allocations. Using kmalloc() for very large allocations can cause excessive work within the page allocator, so i) fall back earlier to vmalloc() when the attempt is considered costly anyway, and even more importantly ii) don't trigger OOM killer with any of the allocators. Since this is based on a user space request, for example, when creating maps with element pre-allocation, we really want such requests to fail instead of killing other user space processes. Also, don't spam the kernel log with warnings should any of the allocations fail under pressure. Given that, we can make backend selection in bpf_map_area_alloc() generic, and convert all maps over to use this API for spots with potentially large allocation requests. Note, replacing the one kmalloc_array() is fine as overflow checks happen earlier in htab_map_alloc(), since it must also protect the multiplication for vmalloc() should kmalloc_array() fail. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
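The backend choice can be sketched in userspace terms. The constants assume 4 KiB pages and a costly order of 3, which gives a 32 KiB threshold. `first_backend()` and `checked_area_size()` are hypothetical helpers. The real kernel code also passes no-retry/no-warn allocation flags, which have no equivalent in this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed constants: 4 KiB pages and a costly order of 3 (32 KiB). */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_COSTLY_ORDER 3

enum backend_sketch { BACKEND_KMALLOC, BACKEND_VMALLOC };

/* Which allocator is tried first: small requests go to kmalloc, and
 * anything the page allocator would consider costly goes straight to
 * vmalloc. A failed kmalloc attempt would still fall back to vmalloc. */
static enum backend_sketch first_backend(size_t size)
{
	if (size <= (SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER))
		return BACKEND_KMALLOC;
	return BACKEND_VMALLOC;
}

/* Overflow-checked count * elem_size, so the same product is safe to
 * hand to either backend. Returns 0 on overflow. */
static size_t checked_area_size(size_t count, size_t elem_size)
{
	if (elem_size != 0 && count > SIZE_MAX / elem_size)
		return 0;
	return count * elem_size;
}
```

Doing the overflow check once, before choosing a backend, is what lets a plain multiplication replace kmalloc_array() in this scheme.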
||
|
|
7984c27c2c |
bpf: do not use KMALLOC_SHIFT_MAX
Commit
|
||
|
|
60d20f9195 |
bpf: Add bpf_current_task_under_cgroup helper
This adds a bpf helper that's similar to the skb_in_cgroup helper to check whether the probe is currently executing in the context of a specific subset of the cgroupsv2 hierarchy. It does this based on membership test for a cgroup arraymap. It is invalid to call this in an interrupt, and it'll return an error. The helper is primarily to be used in debugging activities for containers, where you may have multiple programs running in a given top-level "container". Signed-off-by: Sargun Dhillon <sargun@sargun.me> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> |
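The membership test can be modelled as a walk up the cgroup ancestry. Everything here is a hypothetical userspace stand-in: the struct, the function names, and the specific error codes. The codes for interrupt context (-EINVAL), a bad index (-E2BIG) and an empty slot (-EAGAIN) are assumptions; the commit text only says the helper returns an error in interrupt context.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal cgroup node: just a parent link. */
struct cgroup_sketch {
	const struct cgroup_sketch *parent;	/* NULL at the root */
};

/* True if cg is ancestor itself or any descendant of ancestor. */
static bool under_hierarchy(const struct cgroup_sketch *cg,
			    const struct cgroup_sketch *ancestor)
{
	for (; cg != NULL; cg = cg->parent)
		if (cg == ancestor)
			return true;
	return false;
}

/* Error in interrupt context, error on a bad index or empty slot,
 * otherwise 1/0 for "the task's cgroup is under ptrs[idx]". */
static int task_under_cgroup_sketch(bool in_interrupt,
				    const struct cgroup_sketch *task_cg,
				    const struct cgroup_sketch *const *ptrs,
				    uint32_t max_entries, uint32_t idx)
{
	if (in_interrupt)
		return -EINVAL;
	if (idx >= max_entries)
		return -E2BIG;
	if (ptrs[idx] == NULL)
		return -EAGAIN;
	return under_hierarchy(task_cg, ptrs[idx]) ? 1 : 0;
}
```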
||
|
|
858d68f102 |
bpf: bpf_event_entry_gen's alloc needs to be in atomic context
This should have been obvious: it is only called from the bpf() syscall via
map_update_elem(), which calls bpf_fd_array_map_update_elem() under the RCU
read lock, so this allocation must use GFP_ATOMIC.
Fixes:
|