71 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
5f397569e0 UPSTREAM: Revert "bpf: Add map and need_defer parameters to .map_fd_put_ptr()"
This reverts commit eb6f68ec92ab60b0540ebf64fe851e99d846e086 which is
commit 20c20bd11a0702ce4dc9300c3da58acf551d9725 upstream.

It breaks the Android kernel abi and can be brought back in the future
in an abi-safe way if it is really needed.

Bug: 161946584
Change-Id: I4611eed3677738ab29469733e2b4f6734ef3d605
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2025-10-02 18:29:02 -07:00
Thomas Gleixner
543ff330a5 BACKPORT: treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of version 2 of the gnu general public license as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 64 file(s).

Change-Id: Ic7cca08bbba3c38e0d53d3374c43ee8bf1e24172
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-02 18:28:21 -07:00
Roman Gushchin
734cff0f05 UPSTREAM: bpf: move memory size checks to bpf_map_charge_init()
Most bpf map types doing similar checks and bytes to pages
conversion during memory allocation and charging.

Let's unify these checks by moving them into bpf_map_charge_init().

Change-Id: I55ceded2303102feba9e485042e8f5169f490609
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 18:28:21 -07:00
Roman Gushchin
f2d4363b7c UPSTREAM: bpf: rework memlock-based memory accounting for maps
In order to unify the existing memlock charging code with the
memcg-based memory accounting, which will be added later, let's
rework the current scheme.

Currently the following design is used:
  1) .alloc() callback optionally checks if the allocation will likely
     succeed using bpf_map_precharge_memlock()
  2) .alloc() performs actual allocations
  3) .alloc() callback calculates map cost and sets map.memory.pages
  4) map_create() calls bpf_map_init_memlock() which sets map.memory.user
     and performs actual charging; in case of failure the map is
     destroyed
  <map is in use>
  1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which
     performs uncharge and releases the user
  2) .map_free() callback releases the memory

The scheme can be simplified and made more robust:
  1) .alloc() calculates map cost and calls bpf_map_charge_init()
  2) bpf_map_charge_init() sets map.memory.user and performs actual
    charge
  3) .alloc() performs actual allocations
  <map is in use>
  1) .map_free() callback releases the memory
  2) bpf_map_charge_finish() performs uncharge and releases the user

The new scheme also allows to reuse bpf_map_charge_init()/finish()
functions for memcg-based accounting. Because charges are performed
before actual allocations and uncharges after freeing the memory,
no bogus memory pressure can be created.

In cases when the map structure is not available (e.g. it's not
created yet, or is already destroyed), on-stack bpf_map_memory
structure is used. The charge can be transferred with the
bpf_map_charge_move() function.

Change-Id: I299bfa9d3e74f366861b6de3bf17951a1374824b
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 18:28:20 -07:00
Roman Gushchin
931193c1c1 UPSTREAM: bpf: group memory related fields in struct bpf_map_memory
Group "user" and "pages" fields of bpf_map into the bpf_map_memory
structure. Later it can be extended with "memcg" and other related
information.

The main reason for a such change (beside cosmetics) is to pass
bpf_map_memory structure to charging functions before the actual
allocation of bpf_map.

Change-Id: I04e4edf805bfe4c26fce45f7166317fe00dd0dfa
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 18:28:20 -07:00
Daniel Borkmann
7bd4799ebb UPSTREAM: bpf: allow for key-less BTF in array map
Given we'll be reusing BPF array maps for global data/bss/rodata
sections, we need a way to associate BTF DataSec type as its map
value type. In usual cases we have this ugly BPF_ANNOTATE_KV_PAIR()
macro hack e.g. via 38d5d3b3d5db ("bpf: Introduce BPF_ANNOTATE_KV_PAIR")
to get initial map to type association going. While more use cases
for it are discouraged, this also won't work for global data since
the use of array map is a BPF loader detail and therefore unknown
at compilation time. For array maps with just a single entry we make
an exception in terms of BTF in that key type is declared optional
if value type is of DataSec type. The latter LLVM is guaranteed to
emit and it also aligns with how we regard global data maps as just
a plain buffer area reusing existing map facilities for allowing
things like introspection with existing tools.

Change-Id: I6fd7e20b453529e07aa1c77beacff4e62c7500bd
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 18:28:10 -07:00
Daniel Borkmann
df4c99d80b UPSTREAM: bpf: add program side {rd, wr}only support for maps
This work adds two new map creation flags BPF_F_RDONLY_PROG
and BPF_F_WRONLY_PROG in order to allow for read-only or
write-only BPF maps from a BPF program side.

Today we have BPF_F_RDONLY and BPF_F_WRONLY, but this only
applies to system call side, meaning the BPF program has full
read/write access to the map as usual while bpf(2) calls with
map fd can either only read or write into the map depending
on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allows
for the exact opposite such that verifier is going to reject
program loads if write into a read-only map or a read into a
write-only map is detected. For read-only map case also some
helpers are forbidden for programs that would alter the map
state such as map deletion, update, etc. As opposed to the two
BPF_F_RDONLY / BPF_F_WRONLY flags, BPF_F_RDONLY_PROG as well
as BPF_F_WRONLY_PROG really do correspond to the map lifetime.

We've enabled this generic map extension to various non-special
maps holding normal user data: array, hash, lru, lpm, local
storage, queue and stack. Further generic map types could be
followed up in future depending on use-case. Main use case
here is to forbid writes into .rodata map values from verifier
side.

Change-Id: Iad96790cec92137902fe3ad12f53f1a94d58bc61
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 18:28:09 -07:00
Daniel Borkmann
4e82568bf0 BACKPORT: bpf: implement lookup-free direct value access for maps
This generic extension to BPF maps allows for directly loading
an address residing inside a BPF map value as a single BPF
ldimm64 instruction!

The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
is a special src_reg flag for ldimm64 instruction that indicates
that inside the first part of the double insns's imm field is a
file descriptor which the verifier then replaces as a full 64bit
address of the map into both imm parts. For the newly added
BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following:
the first part of the double insns's imm field is again a file
descriptor corresponding to the map, and the second part of the
imm field is an offset into the value. The verifier will then
replace both imm parts with an address that points into the BPF
map value at the given value offset for maps that support this
operation. Currently supported is array map with single entry.
It is possible to support more than just single map element by
reusing both 16bit off fields of the insns as a map index, so
full array map lookup could be expressed that way. It hasn't
been implemented here due to lack of concrete use case, but
could easily be done so in future in a compatible way, since
both off fields right now have to be 0 and would correctly
denote a map index 0.

The BPF_PSEUDO_MAP_VALUE is a distinct flag as otherwise with
BPF_PSEUDO_MAP_FD we could not differ offset 0 between load of
map pointer versus load of map's value at offset 0, and changing
BPF_PSEUDO_MAP_FD's encoding into off by one to differ between
regular map pointer and map value pointer would add unnecessary
complexity and increases barrier for debugability thus less
suitable. Using the second part of the imm field as an offset
into the value does /not/ come with limitations since maximum
possible value size is in u32 universe anyway.

This optimization allows for efficiently retrieving an address
to a map value memory area without having to issue a helper call
which needs to prepare registers according to calling convention,
etc, without needing the extra NULL test, and without having to
add the offset in an additional instruction to the value base
pointer. The verifier then treats the destination register as
PTR_TO_MAP_VALUE with constant reg->off from the user passed
offset from the second imm field, and guarantees that this is
within bounds of the map value. Any subsequent operations are
normally treated as typical map value handling without anything
extra needed from verification side.

The two map operations for direct value access have been added to
array map for now. In future other types could be supported as
well depending on the use case. The main use case for this commit
is to allow for BPF loader support for global variables that
reside in .data/.rodata/.bss sections such that we can directly
load the address of them with minimal additional infrastructure
required. Loader support has been added in subsequent commits for
libbpf library.

Change-Id: I51974f2fe227ba837b338b8b3ebb44c145583673
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 18:28:09 -07:00
Alexei Starovoitov
cd34c35941 UPSTREAM: bpf: introduce BPF_F_LOCK flag
Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands
and for map_update() helper function.
In all these cases take a lock of existing element (which was provided
in BTF description) before copying (in or out) the rest of map value.

Implementation details that are part of uapi:

Array:
The array map takes the element lock for lookup/update.

Hash:
hash map also takes the lock for lookup/update and tries to avoid the bucket lock.
If old element exists it takes the element lock and updates the element in place.
If element doesn't exist it allocates new one and inserts into hash table
while holding the bucket lock.
In rare case the hashmap has to take both the bucket lock and the element lock
to update old value in place.

Cgroup local storage:
It is similar to array. update in place and lookup are done with lock taken.

Change-Id: I76b13e23e1f6241c1f919a1c24650530f7705d9e
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 18:28:02 -07:00
Alexei Starovoitov
faa07c7e59 BACKPORT: bpf: introduce bpf_spin_lock
Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
bpf program serialize access to other variables.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
if (val) {
    bpf_spin_lock(&val->lock);
    val->cnt++;
    bpf_spin_unlock(&val->lock);
}

Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
  cause dead locks.
- only one 'struct bpf_spin_lock' is allowed per map element.
  It drastically simplifies implementation yet allows bpf program to use
  any number of bpf_spin_locks.
- when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
  bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
  a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
  Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
  bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
  insufficient preemption checks

Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
  Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
  Other solutions to avoid nested bpf_spin_lock are possible.
  Like making sure that all networking progs run with softirq disabled.
  spin_lock_irqsave is the simplest and doesn't add overhead to the
  programs that don't use it.
- arch_spinlock_t is used when its implemented as queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
  sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
  extra flag during map_create, but specifying it via BTF is cleaner.
  It provides introspection for map key/value and reduces user mistakes.

Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
  to request kernel to grab bpf_spin_lock before rewriting the value.
  That will serialize access to map elements.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Change-Id: Id03322189a8f05c006a05479f7078b23c8c020ea
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 18:28:02 -07:00
Roman Gushchin
5d704c9b1a UPSTREAM: bpf: pass struct btf pointer to the map_check_btf() callback
If key_type or value_type are of non-trivial data types
(e.g. structure or typedef), it's not possible to check them without
the additional information, which can't be obtained without a pointer
to the btf structure.

So, let's pass btf pointer to the map_check_btf() callbacks.

Change-Id: I95716060b450288d4ffcbe231d1cf5fdb530e292
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 18:27:55 -07:00
Prashant Bhole
f13ca2b0ba UPSTREAM: bpf: return EOPNOTSUPP when map lookup isn't supported
Return ERR_PTR(-EOPNOTSUPP) from map_lookup_elem() methods of below
map types:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH

Change-Id: I13937c36055b419f4446d8bfa06f139c757480c9
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 18:27:45 -07:00
Yonghong Song
5ae0a28ff6 UPSTREAM: bpf: add bpffs pretty print for program array map
Added bpffs pretty print for program array map. For a particular
array index, if the program array points to a valid program,
the "<index>: <prog_id>" will be printed out like
   0: 6
which means bpf program with id "6" is installed at index "0".

Change-Id: Ibfeac1777df6dc8742debe574ba259d212e7ecea
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-10-02 18:27:42 -07:00
Yonghong Song
0ef3db4770 UPSTREAM: bpf: add bpffs pretty print for percpu arraymap/hash/lru_hash
Added bpffs pretty print for percpu arraymap, percpu hashmap
and percpu lru hashmap.

For each map <key, value> pair, the format is:
   <key_value>: {
	cpu0: <value_on_cpu0>
	cpu1: <value_on_cpu1>
	...
	cpun: <value_on_cpun>
   }

For example, on my VM, there are 4 cpus, and
for test_btf test in the next patch:
   cat /sys/fs/bpf/pprint_test_percpu_hash

You may get:
   ...
   43602: {
	cpu0: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu1: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu2: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
	cpu3: {43602,0,-43602,0x3,0xaa52,0x3,{43602|[82,170,0,0,0,0,0,0]},ENUM_TWO}
   }
   72847: {
	cpu0: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu1: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu2: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
	cpu3: {72847,0,-72847,0x3,0x11c8f,0x3,{72847|[143,28,1,0,0,0,0,0]},ENUM_THREE}
   }
   ...

Change-Id: I286e7505765aa92ea9a8919ddecf8434a24fc187
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 18:27:42 -07:00
Tao Chen
60d2469ae3 UPSTREAM: bpf: Check percpu map value size first
[ Upstream commit 1d244784be6b01162b732a5a7d637dfc024c3203 ]

Percpu map is often used, but the map value size limit often ignored,
like issue: https://github.com/iovisor/bcc/issues/2519. Actually,
percpu map value size is bound by PCPU_MIN_UNIT_SIZE, so we
can check the value size whether it exceeds PCPU_MIN_UNIT_SIZE first,
like percpu map of local_storage. Maybe the error message seems clearer
compared with "cannot allocate memory".

Change-Id: Iadd0604341b9af7bbada45665f06ae1e529a7882
Signed-off-by: Jinke Han <jinkehan@didiglobal.com>
Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240910144111.1464912-2-chen.dylane@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-02 18:27:40 -07:00
Daniel Borkmann
2819daae82 UPSTREAM: bpf: decouple btf from seq bpf fs dump and enable more maps
Commit a26ca7c982cb ("bpf: btf: Add pretty print support to
the basic arraymap") and 699c86d6ec21 ("bpf: btf: add pretty
print for hash/lru_hash maps") enabled support for BTF and
dumping via BPF fs for array and hash/lru map. However, both
can be decoupled from each other such that regular BPF maps
can be supported for attaching BTF key/value information,
while not all maps necessarily need to dump via map_seq_show_elem()
callback.

The basic sanity check which is a prerequisite for all maps
is that key/value size has to match in any case, and some maps
can have extra checks via map_check_btf() callback, e.g.
probing certain types or indicating no support in general. With
that we can also enable retrieving BTF info for per-cpu map
types and lpm.

Change-Id: Ic26e072b39b443f3ffd9c1b53a1cd45a3ecea360
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
2025-10-02 18:27:18 -07:00
Martin KaFai Lau
a95060b1cd BACKPORT: bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY
This patch introduces a new map type BPF_MAP_TYPE_REUSEPORT_SOCKARRAY.

To unleash the full potential of a bpf prog, it is essential for the
userspace to be capable of directly setting up a bpf map which can then
be consumed by the bpf prog to make decision.  In this case, decide which
SO_REUSEPORT sk to serve the incoming request.

By adding BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, the userspace has total control
and visibility on where a SO_REUSEPORT sk should be located in a bpf map.
The later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT such that
the bpf prog can directly select a sk from the bpf map.  That will
raise the programmability of the bpf prog attached to a reuseport
group (a group of sk serving the same IP:PORT).

For example, in UDP, the bpf prog can peek into the payload (e.g.
through the "data" pointer introduced in the later patch) to learn
the application level's connection information and then decide which sk
to pick from a bpf map.  The userspace can tightly couple the sk's location
in a bpf map with the application logic in generating the UDP payload's
connection information.  This connection info contact/API stays within the
userspace.

Also, when used with map-in-map, the userspace can switch the
old-server-process's inner map to a new-server-process's inner map
in one call "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)".
The bpf prog will then direct incoming requests to the new process instead
of the old process.  The old process can finish draining the pending
requests (e.g. by "accept()") before closing the old-fds.  [Note that
deleting a fd from a bpf map does not necessary mean the fd is closed]

During map_update_elem(),
Only SO_REUSEPORT sk (i.e. which has already been added
to a reuse->socks[]) can be used.  That means a SO_REUSEPORT sk that is
"bind()" for UDP or "bind()+listen()" for TCP.  These conditions are
ensured in "reuseport_array_update_check()".

A SO_REUSEPORT sk can only be added once to a map (i.e. the
same sk cannot be added twice even to the same map).  SO_REUSEPORT
already allows another sk to be created for the same IP:PORT.
There is no need to re-create a similar usage in the BPF side.

When a SO_REUSEPORT is deleted from the "reuse->socks[]" (e.g. "close()"),
it will notify the bpf map to remove it from the map also.  It is
done through "bpf_sk_reuseport_detach()" and it will only be called
if >=1 of the "reuse->sock[]" has ever been added to a bpf map.

The map_update()/map_delete() has to be in-sync with the
"reuse->socks[]".  Hence, the same "reuseport_lock" used
by "reuse->socks[]" has to be used here also. Care has
been taken to ensure the lock is only acquired when the
adding sk passes some strict tests. and
freeing the map does not require the reuseport_lock.

The reuseport_array will also support lookup from the syscall
side.  It will return a sock_gen_cookie().  The sock_gen_cookie()
is on-demand (i.e. a sk's cookie is not generated until the very
first map_lookup_elem()).

The lookup cookie is 64bits but it goes against the logical userspace
expectation on 32bits sizeof(fd) (and as other fd based bpf maps do also).
It may catch user in surprise if we enforce value_size=8 while
userspace still pass a 32bits fd during update.  Supporting different
value_size between lookup and update seems unintuitive also.

We also need to consider what if other existing fd based maps want
to return 64bits value from syscall's lookup in the future.
Hence, reuseport_array supports both value_size 4 and 8, and
assuming user will usually use value_size=4.  The syscall's lookup
will return ENOSPC on value_size=4.  It will will only
return 64bits value from sock_gen_cookie() when user consciously
choose value_size=8 (as a signal that lookup is desired) which then
requires a 64bits value in both lookup and update.

Change-Id: Iacf628194cf89b00b45a75ced7d2dbddfb4c6a8a
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 18:27:16 -07:00
Martin KaFai Lau
42f28b95a6 BACKPORT: bpf: btf: Use exact btf value_size match in map_check_btf()
The current map_check_btf() in BPF_MAP_TYPE_ARRAY rejects
'> map->value_size' to ensure map_seq_show_elem() will not
access things beyond an array element.

Yonghong suggested that using '!=' is a more correct
check.  The 8 bytes round_up on value_size is stored
in array->elem_size.  Hence, using '!=' on map->value_size
is a proper check.

This patch also adds new tests to check the btf array
key type and value type.  Two of these new tests verify
the btf's value_size (the change in this patch).

It also fixes two existing tests that wrongly encoded
a btf's type size (pprint_test) and the value_type_id (in one
of the raw_tests[]).  However, that do not affect these two
BTF verification tests before or after this test changes.
These two tests mainly failed at array creation time after
this patch.

Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap")
Suggested-by: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Change-Id: I0ded0933cb7ba85373a700f746e9a2901db791e1
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 18:27:12 -07:00
Martin KaFai Lau
71df69a668 UPSTREAM: bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info
In "struct bpf_map_info", the name "btf_id", "btf_key_id" and "btf_value_id"
could cause confusion because the "id" of "btf_id" means the BPF obj id
given to the BTF object while
"btf_key_id" and "btf_value_id" means the BTF type id within
that BTF object.

To make it clear, btf_key_id and btf_value_id are
renamed to btf_key_type_id and btf_value_type_id.

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Change-Id: Ib10a5ee00041fb5b3ce1c80a407bb9277386c85b
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 18:02:34 -07:00
Martin KaFai Lau
cec9a42456 UPSTREAM: bpf: btf: Add pretty print support to the basic arraymap
This patch adds pretty print support to the basic arraymap.
Support for other bpf maps can be added later.

This patch adds new attrs to the BPF_MAP_CREATE command to allow
specifying the btf_fd, btf_key_id and btf_value_id.  The
BPF_MAP_CREATE can then associate the btf to the map if
the creating map supports BTF.

A BTF supported map needs to implement two new map ops,
map_seq_show_elem() and map_check_btf().  This patch has
implemented these new map ops for the basic arraymap.

It also adds file_operations, bpffs_map_fops, to the pinned
map such that the pinned map can be opened and read.
After that, the user has an intuitive way to do
"cat bpffs/pathto/a-pinned-map" instead of getting
an error.

bpffs_map_fops should not be extended further to support
other operations.  Other operations (e.g. write/key-lookup...)
should be realized by the userspace tools (e.g. bpftool) through
the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
Follow up patches will allow the userspace to obtain
the BTF from a map-fd.

Here is a sample output when reading a pinned arraymap
with the following map's value:

struct map_value {
	int count_a;
	int count_b;
};

cat /sys/fs/bpf/pinned_array_map:

0: {1,2}
1: {3,4}
2: {5,6}
...

Change-Id: I57605a80ce783f8545f7947d0a3445ccd81147e4
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 18:02:00 -07:00
Jakub Kicinski
ffcf8f77e1 BACKPORT: bpf: arraymap: use bpf_map_init_from_attr()
Arraymap was not converted to use bpf_map_init_from_attr()
to avoid merge conflicts with emergency fixes.  Do it now.

Change-Id: I6e385e0af32a1837b23d352c19824492f8c86764
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 18:01:45 -07:00
Jakub Kicinski
4a3928272b UPSTREAM: bpf: arraymap: move checks out of alloc function
Use the new callback to perform allocation checks for array maps.
The fd maps don't need a special allocation callback, they only
need a special check callback.

Change-Id: Idaed60c9b368f29ce7994a37d8bb381958ab0aeb
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2025-10-02 18:01:45 -07:00
Yonghong Song
6e5e3fdc13 BACKPORT: bpf: perf event change needed for subsequent bpf helpers
This patch does not impact existing functionalities.
It contains the changes in perf event area needed for
subsequent bpf_perf_event_read_value and
bpf_perf_prog_read_value helpers.

Change-Id: I3cb46eff8eb241ad744ab796bc3300e7560fe05d
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-10-02 18:01:25 -07:00
Michael Bestas
f5a1de2a76 Merge tag 'v4.14.340-openela' into android13-4.14-msmnile
This is the 4.14.340 OpenELA-Extended LTS stable release

* tag 'v4.14.340-openela':
  LTS: Update to 4.14.340
  fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
  KVM: arm64: vgic-its: Test for valid IRQ in its_sync_lpi_pending_table()
  PCI/MSI: Prevent MSI hardware interrupt number truncation
  s390: use the correct count for __iowrite64_copy()
  packet: move from strlcpy with unused retval to strscpy
  ipv6: sr: fix possible use-after-free and null-ptr-deref
  nouveau: fix function cast warnings
  scsi: jazz_esp: Only build if SCSI core is builtin
  RDMA/srpt: fix function pointer cast warnings
  RDMA/srpt: Support specifying the srpt_service_guid parameter
  IB/hfi1: Fix a memleak in init_credit_return
  usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
  l2tp: pass correct message length to ip6_append_data
  gtp: fix use-after-free and null-ptr-deref in gtp_genl_dump_pdp()
  dm-crypt: don't modify the data when using authenticated encryption
  mm: memcontrol: switch to rcu protection in drain_all_stock()
  s390/qeth: Fix potential loss of L3-IP@ in case of network issues
  virtio-blk: Ensure no requests in virtqueues before deleting vqs.
  firewire: core: send bus reset promptly on gap count error
  hwmon: (coretemp) Enlarge per package core count limit
  regulator: pwm-regulator: Add validity checks in continuous .get_voltage
  ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
  ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
  ahci: asm1166: correct count of reported ports
  fbdev: sis: Error out if pixclock equals zero
  fbdev: savage: Error out if pixclock equals zero
  wifi: mac80211: fix race condition on enabling fast-xmit
  wifi: cfg80211: fix missing interfaces when dumping
  dmaengine: shdma: increase size of 'dev_id'
  scsi: target: core: Add TMF to tmr_list handling
  sched/rt: Disallow writing invalid values to sched_rt_period_us
  sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset
  sched/rt: Fix sysctl_sched_rr_timeslice intial value
  nilfs2: replace WARN_ONs for invalid DAT metadata block requests
  memcg: add refcnt for pcpu stock to avoid UAF problem in drain_all_stock()
  net/sched: Retire dsmark qdisc
  net/sched: Retire ATM qdisc
  net/sched: Retire CBQ qdisc
  LTS: Update to 4.14.339
  netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()
  lsm: new security_file_ioctl_compat() hook
  nilfs2: fix potential bug in end_buffer_async_write
  sched/membarrier: reduce the ability to hammer on sys_membarrier
  Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"
  pmdomain: core: Move the unused cleanup to a _sync initcall
  irqchip/irq-brcmstb-l2: Add write memory barrier before exit
  nfp: use correct macro for LengthSelect in BAR config
  nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
  nilfs2: fix data corruption in dsync block recovery for small block sizes
  ALSA: hda/conexant: Add quirk for SWS JS201D
  x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
  staging: iio: ad5933: fix type mismatch regression
  ext4: fix double-free of blocks due to wrong extents moved_len
  xen-netback: properly sync TX responses
  nfc: nci: free rx_data_reassembly skb on NCI device cleanup
  firewire: core: correct documentation of fw_csr_string() kernel API
  scsi: Revert "scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock"
  usb: f_mass_storage: forbid async queue when shutdown happen
  USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT
  HID: wacom: Do not register input devices until after hid_hw_start
  HID: wacom: generic: Avoid reporting a serial of '0' to userspace
  mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
  tracing/trigger: Fix to return error if failed to alloc snapshot
  i40e: Fix waiting for queues of all VSIs to be disabled
  MIPS: Add 'memory' clobber to csum_ipv6_magic() inline assembler
  net: sysfs: Fix /sys/class/net/<iface> path for statistics
  Documentation: net-sysfs: describe missing statistics
  ASoC: rt5645: Fix deadlock in rt5645_jack_detect_work()
  spi: ppc4xx: Drop write-only variable
  btrfs: send: return EOPNOTSUPP on unknown flags
  vhost: use kzalloc() instead of kmalloc() followed by memset()
  Input: atkbd - skip ATKBD_CMD_SETLEDS when skipping ATKBD_CMD_GETID
  USB: serial: cp210x: add ID for IMST iM871A-USB
  USB: serial: option: add Fibocom FM101-GL variant
  USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e
  net/af_iucv: clean up a try_then_request_module()
  netfilter: nft_compat: restrict match/target protocol to u16
  netfilter: nft_compat: reject unused compat flag
  ppp_async: limit MRU to 64K
  tipc: Check the bearer type before calling tipc_udp_nl_bearer_add()
  rxrpc: Fix response to PING RESPONSE ACKs to a dead call
  inet: read sk->sk_family once in inet_recv_error()
  hwmon: (aspeed-pwm-tacho) mutex for tach reading
  atm: idt77252: fix a memleak in open_card_ubr0
  phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP
  dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV
  bonding: remove print in bond_verify_device_path
  HID: apple: Add 2021 magic keyboard FN key mapping
  HID: apple: Add support for the 2021 Magic Keyboard
  HID: apple: Swap the Fn and Left Control keys on Apple keyboards
  net: sysfs: Fix /sys/class/net/<iface> path
  af_unix: fix lockdep positive in sk_diag_dump_icons()
  net: ipv4: fix a memleak in ip_setup_cork
  net: Fix one possible memleak in ip_setup_cork
  netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
  llc: call sock_orphan() at release time
  ipv6: Ensure natural alignment of const ipv6 loopback and router addresses
  ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550()
  ixgbe: Refactor overtemp event handling
  ixgbe: Remove non-inclusive language
  net: remove unneeded break
  scsi: isci: Fix an error code problem in isci_io_request_build()
  wifi: cfg80211: fix RCU dereference in __cfg80211_bss_update
  drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'
  ceph: fix deadlock or deadcode of misusing dget()
  virtio_net: Fix "‘%d’ directive writing between 1 and 11 bytes into a region of size 10" warnings
  libsubcmd: Fix memory leak in uniq()
  usb: hub: Replace hardcoded quirk value with BIT() macro
  PCI: Only override AMD USB controller if required
  mfd: ti_am335x_tscadc: Fix TI SoC dependencies
  um: net: Fix return type of uml_net_start_xmit()
  um: Don't use vfprintf() for os_info()
  um: Fix naming clash between UML and scheduler
  leds: trigger: panic: Don't register panic notifier if creating the trigger failed
  clk: mmp: pxa168: Fix memory leak in pxa168_clk_init()
  clk: hi3620: Fix memory leak in hi3620_mmc_clk_init()
  media: ddbridge: fix an error code problem in ddb_probe
  IB/ipoib: Fix mcast list locking
  drm/exynos: Call drm_atomic_helper_shutdown() at shutdown/unbind time
  ALSA: hda: Intel: add HDA_ARL PCI ID support
  ALSA: hda: Add Icelake PCI ID
  PCI: add INTEL_HDA_ARL to pci_ids.h
  media: stk1160: Fixed high volume of stk1160_dbg messages
  drm/mipi-dsi: Fix detach call without attach
  drm/framebuffer: Fix use of uninitialized variable
  drm/drm_file: fix use of uninitialized variable
  RDMA/IPoIB: Fix error code return in ipoib_mcast_join
  fast_dput(): handle underflows gracefully
  ASoC: doc: Fix undefined SND_SOC_DAPM_NOPM argument
  wifi: cfg80211: free beacon_ies when overridden from hidden BSS
  wifi: rtlwifi: rtl8723{be,ae}: using calculate_bit_shift()
  wifi: rtl8xxxu: Add additional USB IDs for RTL8192EU devices
  md: Whenassemble the array, consult the superblock of the freshest device
  ARM: dts: imx23/28: Fix the DMA controller node name
  ARM: dts: imx23-sansa: Use preferred i2c-gpios properties
  ARM: dts: imx27-apf27dev: Fix LED name
  ARM: dts: imx1: Fix sram node
  ARM: dts: imx27: Fix sram node
  ARM: dts: imx: Use flash@0,0 pattern
  ARM: dts: imx25/27-eukrea: Fix RTC node name
  ARM: dts: rockchip: fix rk3036 hdmi ports node
  scsi: libfc: Fix up timeout error in fc_fcp_rec_error()
  scsi: libfc: Don't schedule abort twice
  bpf: Add map and need_defer parameters to .map_fd_put_ptr()
  wifi: ath9k: Fix potential array-index-out-of-bounds read in ath9k_htc_txstatus()
  ARM: dts: imx7s: Fix nand-controller #size-cells
  ARM: dts: imx7s: Fix lcdif compatible
  bonding: return -ENOMEM instead of BUG in alb_upper_dev_walk
  PCI: Add no PM reset quirk for NVIDIA Spectrum devices
  scsi: lpfc: Fix possible file string name overflow when updating firmware
  ext4: unify the type of flexbg_size to unsigned int
  SUNRPC: Fix a suspicious RCU usage warning
  KVM: s390: fix setting of fpc register
  s390/ptrace: handle setting of fpc register correctly
  jfs: fix array-index-out-of-bounds in diNewExt
  rxrpc_find_service_conn_rcu: fix the usage of read_seqbegin_or_lock()
  pstore/ram: Fix crash when setting number of cpus to an odd number
  jfs: fix uaf in jfs_evict_inode
  jfs: fix array-index-out-of-bounds in dbAdjTree
  jfs: fix slab-out-of-bounds Read in dtSearch
  UBSAN: array-index-out-of-bounds in dtSplitRoot
  FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTree
  ACPI: extlog: fix NULL pointer dereference check
  PNP: ACPI: fix fortify warning
  ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop
  audit: Send netlink ACK before setting connection in auditd_set
  powerpc/lib: Validate size for vector operations
  powerpc/mm: Fix build failures due to arch_reserved_kernel_pages()
  powerpc: Fix build error due to is_valid_bugaddr()
  powerpc/mm: Fix null-pointer dereference in pgtable_cache_add
  tick/sched: Preserve number of idle sleeps across CPU hotplug events
  mips: Call lose_fpu(0) before initializing fcr31 in mips_set_personality_nan
  drm/bridge: nxp-ptn3460: simplify some error checking
  drm/bridge: nxp-ptn3460: fix i2c_master_send() error checking
  drm: Don't unref the same fb many times by mistake due to deadlock handling
  gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-04
  netfilter: nf_tables: reject QUEUE/DROP verdict parameters
  btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args
  btrfs: don't warn if discard range is not aligned to sector
  net: fec: fix the unhandled context fault from smmu
  fjes: fix memleaks in fjes_hw_setup
  netfilter: nf_tables: restrict anonymous set and map names to 16 bytes
  net/mlx5e: fix a double-free in arfs_create_groups
  net/mlx5: Use kfree(ft->g) in arfs_create_groups()
  netlink: fix potential sleeping issue in mqueue_flush_file
  tcp: Add memory barrier to tcp_push()
  net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv
  llc: Drop support for ETH_P_TR_802_2.
  llc: make llc_ui_sendmsg() more robust against bonding changes
  vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING
  drivers: core: fix kernel-doc markup for dev_err_probe()
  driver code: print symbolic error code
  Revert "driver core: Annotate dev_err_probe() with __must_check"
  driver core: Annotate dev_err_probe() with __must_check
  x86/CPU/AMD: Fix disabling XSAVES on AMD family 0x17 due to erratum
  powerpc: Use always instead of always-y in for crtsavres.o
  block: Remove special-casing of compound pages
  parisc/firmware: Fix F-extend for PDC addresses
  rpmsg: virtio: Free driver_override when rpmsg_remove()
  hwrng: core - Fix page fault dead lock on mmap-ed hwrng
  PM: hibernate: Enforce ordering during image compression/decompression
  crypto: api - Disallow identical driver names
  serial: sc16is7xx: add check for unsupported SPI modes during probe
  spi: introduce SPI_MODE_X_MASK macro
  driver core: add device probe log helper
  serial: sc16is7xx: set safe default SPI clock frequency
  units: add the HZ macros
  units: change from 'L' to 'UL'
  units: Add Watt units
  include/linux/units.h: add helpers for kelvin to/from Celsius conversion
  PCI: mediatek: Clear interrupt status before dispatching handler
  LTS: Update to 4.14.338
  crypto: scompress - initialize per-CPU variables on each CPU
  Revert "NFSD: Fix possible sleep during nfsd4_release_lockowner()"
  i2c: s3c24xx: fix transferring more than one message in polling mode
  i2c: s3c24xx: fix read transfers in polling mode
  kdb: Fix a potential buffer overflow in kdb_local()
  kdb: Censor attempts to set PROMPT without ENABLE_MEM_READ
  ipvs: avoid stat macros calls from preemptible context
  net: ravb: Fix dma_addr_t truncation in error case
  serial: imx: Correct clock error message in function probe()
  apparmor: avoid crash when parsed profile name is empty
  MIPS: Alchemy: Fix an out-of-bound access in db1550_dev_setup()
  MIPS: Alchemy: Fix an out-of-bound access in db1200_dev_setup()
  HID: wacom: Correct behavior when processing some confidence == false touches
  wifi: mwifiex: configure BSSID consistently when starting AP
  wifi: rtlwifi: Convert LNKCTL change to PCIe cap RMW accessors
  wifi: rtlwifi: Remove bogus and dangerous ASPM disable/enable code
  fbdev: flush deferred work in fb_deferred_io_fsync()
  ALSA: oxygen: Fix right channel of capture volume mixer
  usb: mon: Fix atomicity violation in mon_bin_vma_fault
  usb: chipidea: wait controller resume finished for wakeup irq
  usb: dwc: ep0: Update request status in dwc3_ep0_stall_restart
  usb: phy: mxs: remove CONFIG_USB_OTG condition for mxs_phy_is_otg_host()
  tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug
  binder: fix unused alloc->free_async_space
  binder: fix race between mmput() and do_exit()
  xen-netback: don't produce zero-size SKB frags
  Input: atkbd - use ab83 as id when skipping the getid command
  binder: fix async space check for 0-sized buffers
  watchdog: bcm2835_wdt: Fix WDIOC_SETTIMEOUT handling
  watchdog: set cdev owner before adding
  gpu/drm/radeon: fix two memleaks in radeon_vm_init
  drivers/amd/pm: fix a use-after-free in kv_parse_power_table
  drm/amd/pm: fix a double-free in si_dpm_init
  media: dvbdev: drop refcount on error path in dvb_device_open()
  media: cx231xx: fix a memleak in cx231xx_init_isoc
  drm/radeon/trinity_dpm: fix a memleak in trinity_parse_power_table
  drm/radeon/dpm: fix a memleak in sumo_parse_power_table
  drm/radeon: check the alloc_workqueue return value in radeon_crtc_init()
  drm/drv: propagate errors from drm_modeset_register_all()
  drm/msm/mdp4: flush vblank event on disable
  ASoC: cs35l34: Fix GPIO name and drop legacy include
  ASoC: cs35l33: Fix GPIO name and drop legacy include
  drm/radeon: check return value of radeon_ring_lock()
  drm/radeon/r100: Fix integer overflow issues in r100_cs_track_check()
  drm/radeon/r600_cs: Fix possible int overflows in r600_cs_check_reg()
  f2fs: fix to avoid dirent corruption
  drm/bridge: Fix typo in post_disable() description
  media: pvrusb2: fix use after free on context disconnection
  RDMA/usnic: Silence uninitialized symbol smatch warnings
  ip6_tunnel: fix NEXTHDR_FRAGMENT handling in ip6_tnl_parse_tlv_enc_lim()
  Bluetooth: Fix bogus check for re-auth no supported with non-ssp
  wifi: rtlwifi: rtl8192se: using calculate_bit_shift()
  wifi: rtlwifi: rtl8192ee: using calculate_bit_shift()
  wifi: rtlwifi: rtl8192de: using calculate_bit_shift()
  rtlwifi: rtl8192de: make arrays static const, makes object smaller
  wifi: rtlwifi: rtl8192ce: using calculate_bit_shift()
  wifi: rtlwifi: rtl8192cu: using calculate_bit_shift()
  wifi: rtlwifi: rtl8192c: using calculate_bit_shift()
  wifi: rtlwifi: rtl8188ee: phy: using calculate_bit_shift()
  wifi: rtlwifi: add calculate_bit_shift()
  wifi: rtlwifi: rtl8821ae: phy: fix an undefined bitwise shift behavior
  rtlwifi: Use ffs in <foo>_phy_calculate_bit_shift
  firmware: ti_sci: Fix an off-by-one in ti_sci_debugfs_create()
  net/ncsi: Fix netlink major/minor version numbers
  ncsi: internal.h: Fix a spello
  wifi: libertas: stop selecting wext
  bpf, lpm: Fix check prefixlen before walking trie
  NFSv4.1/pnfs: Ensure we handle the error NFS4ERR_RETURNCONFLICT
  crypto: scomp - fix req->dst buffer overflow
  crypto: scompress - Use per-CPU struct instead multiple variables
  crypto: scompress - return proper error code for allocation failure
  crypto: sahara - do not resize req->src when doing hash operations
  crypto: sahara - fix processing hash requests with req->nbytes < sg->length
  crypto: sahara - improve error handling in sahara_sha_process()
  crypto: sahara - fix wait_for_completion_timeout() error handling
  crypto: sahara - fix ahash reqsize
  crypto: virtio - Wait for tasklet to complete on device remove
  pstore: ram_core: fix possible overflow in persistent_ram_init_ecc()
  crypto: sahara - fix error handling in sahara_hw_descriptor_create()
  crypto: sahara - fix processing requests with cryptlen < sg->length
  crypto: sahara - fix ahash selftest failure
  crypto: sahara - remove FLAGS_NEW_KEY logic
  crypto: af_alg - Disallow multiple in-flight AIO requests
  crypto: ccp - fix memleak in ccp_init_dm_workarea
  crypto: virtio - Handle dataq logic with tasklet
  mtd: Fix gluebi NULL pointer dereference caused by ftl notifier
  calipso: fix memory leak in netlbl_calipso_add_pass()
  netlabel: remove unused parameter in netlbl_netlink_auditinfo()
  net: netlabel: Fix kerneldoc warnings
  ACPI: video: check for error while searching for backlight device parent
  mtd: rawnand: Increment IFC_TIMEOUT_MSECS for nand controller response
  powerpc/imc-pmu: Add a null pointer check in update_events_in_group()
  powerpc/powernv: Add a null pointer check in opal_event_init()
  selftests/powerpc: Fix error handling in FPU/VMX preemption tests
  powerpc/pseries/memhp: Fix access beyond end of drmem array
  powerpc/pseries/memhotplug: Quieten some DLPAR operations
  powerpc/44x: select I2C for CURRITUCK
  powerpc: remove redundant 'default n' from Kconfig-s
  powerpc: add crtsavres.o to always-y instead of extra-y
  EDAC/thunderx: Fix possible out-of-bounds string access
  x86/lib: Fix overflow when counting digits
  coresight: etm4x: Fix width of CCITMIN field
  uio: Fix use-after-free in uio_open
  binder: fix comment on binder_alloc_new_buf() return value
  drm/crtc: fix uninitialized variable use
  Input: xpad - add Razer Wolverine V2 support
  ARC: fix spare error
  s390/scm: fix virtual vs physical address confusion
  Input: atkbd - skip ATKBD_CMD_GETID in translated mode
  reset: hisilicon: hi6220: fix Wvoid-pointer-to-enum-cast warning
  ring-buffer: Do not record in NMI if the arch does not support cmpxchg in NMI
  tracing: Add size check when printing trace_marker output
  tracing: Have large events show up as '[LINE TOO BIG]' instead of nothing
  drm/crtc: Fix uninit-value bug in drm_mode_setcrtc
  jbd2: correct the printing of write_flags in jbd2_write_superblock()
  clk: rockchip: rk3128: Fix HCLK_OTG gate register
  drm/exynos: fix a potential error pointer dereference
  ASoC: da7219: Support low DC impedance headset
  net/tg3: fix race condition in tg3_reset_task()
  ASoC: rt5650: add mutex to avoid the jack detection failure
  ASoC: cs43130: Fix incorrect frame delay configuration
  ASoC: cs43130: Fix the position of const qualifier
  f2fs: explicitly null-terminate the xattr list
  LTS: Update to 4.14.337
  ipv6: remove max_size check inline with ipv4
  ipv6: make ip6_rt_gc_expire an atomic_t
  net/dst: use a smaller percpu_counter batch for dst entries accounting
  net: add a route cache full diagnostic message
  netfilter: nf_tables: Reject tables of unsupported family
  fuse: nlookup missing decrement in fuse_direntplus_link
  mm: fix unmap_mapping_range high bits shift bug
  mm/memory-failure: check the mapcount of the precise page
  bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
  asix: Add check for usbnet_get_endpoints
  net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
  net/qla3xxx: switch from 'pci_' to 'dma_' API
  LTS: create metadata for 4.14.y

 Conflicts:
	drivers/android/binder_alloc.c
	drivers/infiniband/ulp/srpt/ib_srpt.c
	fs/aio.c
	fs/f2fs/namei.c
	include/linux/fs.h
	kernel/power/swap.c
	mm/memory-failure.c

Change-Id: I559d04dc6e27861ffd63ac8ae8ae9db8ff498e24
2024-03-23 00:45:25 +02:00
Hou Tao
7474abe2c0 bpf: Add map and need_defer parameters to .map_fd_put_ptr()
[ Upstream commit 20c20bd11a0702ce4dc9300c3da58acf551d9725 ]

map is the pointer of outer map, and need_defer needs some explanation.
need_defer tells the implementation to defer the reference release of
the passed element and ensure that the element is still alive before
the bpf program, which may manipulate it, exits.

The following three cases will invoke map_fd_put_ptr() and different
need_defer values will be passed to these callers:

1) release the reference of the old element in the map during map update
   or map deletion. The release must be deferred, otherwise the bpf
   program may incur use-after-free problem, so need_defer needs to be
   true.
2) release the reference of the to-be-added element in the error path of
   map update. The to-be-added element is not visible to any bpf
   program, so it is OK to pass false for need_defer parameter.
3) release the references of all elements in the map during map release.
   Any bpf program which has access to the map must have been exited and
   released, so need_defer=false will be OK.

These two parameters will be used by the following patches to fix the
potential use-after-free problem for map-in-map.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231204140425.1480317-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5aa1e7d3f6d0db96c7139677d9e898bbbd6a7dcf)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-03-08 08:21:31 +00:00
Greg Kroah-Hartman
f9cf23e1ff Merge 4.14.79 into android-4.14-p
Changes in 4.14.79
	xfrm: Validate address prefix lengths in the xfrm selector.
	xfrm6: call kfree_skb when skb is toobig
	xfrm: reset transport header back to network header after all input transforms ahave been applied
	xfrm: reset crypto_done when iterating over multiple input xfrms
	mac80211: Always report TX status
	cfg80211: reg: Init wiphy_idx in regulatory_hint_core()
	mac80211: fix pending queue hang due to TX_DROP
	cfg80211: Address some corner cases in scan result channel updating
	mac80211: TDLS: fix skb queue/priority assignment
	mac80211: fix TX status reporting for ieee80211s
	xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
	ARM: 8799/1: mm: fix pci_ioremap_io() offset check
	xfrm: validate template mode
	netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev
	arm64: hugetlb: Fix handling of young ptes
	ARM: dts: BCM63xx: Fix incorrect interrupt specifiers
	net: macb: Clean 64b dma addresses if they are not detected
	soc: fsl: qbman: qman: avoid allocating from non existing gen_pool
	soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift()
	nl80211: Fix possible Spectre-v1 for NL80211_TXRATE_HT
	mac80211_hwsim: do not omit multicast announce of first added radio
	Bluetooth: SMP: fix crash in unpairing
	pxa168fb: prepare the clock
	qed: Avoid implicit enum conversion in qed_set_tunn_cls_info
	qed: Fix mask parameter in qed_vf_prep_tunn_req_tlv
	qed: Avoid implicit enum conversion in qed_roce_mode_to_flavor
	qed: Avoid constant logical operation warning in qed_vf_pf_acquire
	qed: Avoid implicit enum conversion in qed_iwarp_parse_rx_pkt
	nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds
	asix: Check for supported Wake-on-LAN modes
	ax88179_178a: Check for supported Wake-on-LAN modes
	lan78xx: Check for supported Wake-on-LAN modes
	sr9800: Check for supported Wake-on-LAN modes
	r8152: Check for supported Wake-on-LAN Modes
	smsc75xx: Check for Wake-on-LAN modes
	smsc95xx: Check for Wake-on-LAN modes
	cfg80211: fix use-after-free in reg_process_hint()
	perf/core: Fix perf_pmu_unregister() locking
	perf/ring_buffer: Prevent concurent ring buffer access
	perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX
	perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events
	net: fec: fix rare tx timeout
	declance: Fix continuation with the adapter identification message
	net: qualcomm: rmnet: Skip processing loopback packets
	locking/ww_mutex: Fix runtime warning in the WW mutex selftest
	be2net: don't flip hw_features when VXLANs are added/deleted
	net: cxgb3_main: fix a missing-check bug
	yam: fix a missing-check bug
	ocfs2: fix crash in ocfs2_duplicate_clusters_by_page()
	iwlwifi: mvm: check for short GI only for OFDM
	iwlwifi: dbg: allow wrt collection before ALIVE
	iwlwifi: fix the ALIVE notification layout
	tools/testing/nvdimm: unit test clear-error commands
	usbip: vhci_hcd: update 'status' file header and format
	scsi: aacraid: address UBSAN warning regression
	IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
	IB/rxe: put the pool on allocation failure
	s390/qeth: fix error handling in adapter command callbacks
	net/mlx5: Fix mlx5_get_vector_affinity function
	powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=n
	dm integrity: fail early if required HMAC key is not available
	net: phy: realtek: Use the dummy stubs for MMD register access for rtl8211b
	net: phy: Add general dummy stubs for MMD register access
	net/mlx5e: Refine ets validation function
	scsi: qla2xxx: Avoid double completion of abort command
	kbuild: set no-integrated-as before incl. arch Makefile
	IB/mlx5: Avoid passing an invalid QP type to firmware
	ARM: tegra: Fix ULPI regression on Tegra20
	l2tp: remove configurable payload offset
	cifs: Use ULL suffix for 64-bit constant
	test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches
	KVM: x86: Update the exit_qualification access bits while walking an address
	sparc64: Fix regression in pmdp_invalidate().
	tpm: move the delay_msec increment after sleep in tpm_transmit()
	bpf: sockmap, map_release does not hold refcnt for pinned maps
	tpm: tpm_crb: relinquish locality on error path.
	xen-netfront: Update features after registering netdev
	xen-netfront: Fix mismatched rtnl_unlock
	IB/usnic: Update with bug fixes from core code
	mmc: dw_mmc-rockchip: correct property names in debug
	MIPS: Workaround GCC __builtin_unreachable reordering bug
	lan78xx: Don't reset the interface on open
	enic: do not overwrite error code
	iio: buffer: fix the function signature to match implementation
	selftests/powerpc: Add ptrace hw breakpoint test
	scsi: ibmvfc: Avoid unnecessary port relogin
	scsi: sd: Remember that READ CAPACITY(16) succeeded
	btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
	net: phy: phylink: Don't release NULL GPIO
	x86/paravirt: Fix some warning messages
	net: stmmac: mark PM functions as __maybe_unused
	kconfig: fix the rule of mainmenu_stmt symbol
	libertas: call into generic suspend code before turning off power
	perf tests: Fix indexing when invoking subtests
	compiler.h: Allow arch-specific asm/compiler.h
	ARM: dts: imx53-qsb: disable 1.2GHz OPP
	perf python: Use -Wno-redundant-decls to build with PYTHON=python3
	rxrpc: Don't check RXRPC_CALL_TX_LAST after calling rxrpc_rotate_tx_window()
	rxrpc: Only take the rwind and mtu values from latest ACK
	rxrpc: Fix connection-level abort handling
	net: ena: fix warning in rmmod caused by double iounmap
	net: ena: fix NULL dereference due to untimely napi initialization
	selftests: rtnetlink.sh explicitly requires bash.
	fs/fat/fatent.c: add cond_resched() to fat_count_free_clusters()
	sch_netem: restore skb->dev after dequeuing from the rbtree
	mtd: spi-nor: Add support for is25wp series chips
	kvm: x86: fix WARN due to uninitialized guest FPU state
	ARM: dts: r8a7790: Correct critical CPU temperature
	media: uvcvideo: Fix driver reference counting
	ALSA: usx2y: Fix invalid stream URBs
	Revert "netfilter: ipv6: nf_defrag: drop skb dst before queueing"
	perf tools: Disable parallelism for 'make clean'
	drm/i915/gvt: fix memory leak of a cmd_entry struct on error exit path
	bridge: do not add port to router list when receives query with source 0.0.0.0
	net: bridge: remove ipv6 zero address check in mcast queries
	ipv6: mcast: fix a use-after-free in inet6_mc_check
	ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called
	llc: set SOCK_RCU_FREE in llc_sap_add_socket()
	net: fec: don't dump RX FIFO register when not available
	net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs
	net: sched: gred: pass the right attribute to gred_change_table_def()
	net: socket: fix a missing-check bug
	net: stmmac: Fix stmmac_mdio_reset() when building stmmac as modules
	net: udp: fix handling of CHECKSUM_COMPLETE packets
	r8169: fix NAPI handling under high load
	sctp: fix race on sctp_id2asoc
	udp6: fix encap return code for resubmitting
	vhost: Fix Spectre V1 vulnerability
	virtio_net: avoid using netif_tx_disable() for serializing tx routine
	ethtool: fix a privilege escalation bug
	bonding: fix length of actor system
	ip6_tunnel: Fix encapsulation layout
	openvswitch: Fix push/pop ethernet validation
	net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault type
	net: sched: Fix for duplicate class dump
	net: drop skb on failure in ip_check_defrag()
	net: fix pskb_trim_rcsum_slow() with odd trim offset
	net/mlx5e: fix csum adjustments caused by RXFCS
	rtnetlink: Disallow FDB configuration for non-Ethernet device
	net: ipmr: fix unresolved entry dumps
	net: bcmgenet: Poll internal PHY for GENETv5
	net/sched: cls_api: add missing validation of netlink attributes
	net/mlx5: Fix build break when CONFIG_SMP=n
	Linux 4.14.79

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-11-08 07:44:15 -08:00
John Fastabend
3c0cff34e9 bpf: sockmap, map_release does not hold refcnt for pinned maps
[ Upstream commit ba6b8de423f8d0dee48d6030288ed81c03ddf9f0 ]

Relying on map_release hook to decrement the reference counts when a
map is removed only works if the map is not being pinned. In the
pinned case the ref is decremented immediately and the BPF programs
released. After this BPF programs may not be in-use which is not
what the user would expect.

This patch moves the release logic into bpf_map_put_uref() and brings
sockmap in-line with how a similar case is handled in prog array maps.

Fixes: 3d9e952697de ("bpf: sockmap, fix leaking maps with attached but not detached progs")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-04 14:52:44 +01:00
Greg Kroah-Hartman
4576e0eca9 Merge 4.14.26 into android-4.14
Changes in 4.14.26
	bpf: fix mlock precharge on arraymaps
	bpf: fix memory leak in lpm_trie map_free callback function
	bpf: fix rcu lockdep warning for lpm_trie map_free callback
	bpf, x64: implement retpoline for tail call
	bpf, arm64: fix out of bounds access in tail call
	bpf: add schedule points in percpu arrays management
	bpf: allow xadd only on aligned memory
	bpf, ppc64: fix out of bounds access in tail call
	KVM: x86: fix backward migration with async_PF
	Linux 4.14.26

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-03-11 17:37:01 +01:00
Eric Dumazet
e1760b3563 bpf: add schedule points in percpu arrays management
[ upstream commit 32fff239de37ef226d5b66329dd133f64d63b22d ]

syszbot managed to trigger RCU detected stalls in
bpf_array_free_percpu()

It takes time to allocate a huge percpu map, but even more time to free
it.

Since we run in process context, use cond_resched() to yield cpu if
needed.

Fixes: a10423b87a ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11 16:23:22 +01:00
Daniel Borkmann
d9fd73c60b bpf: fix mlock precharge on arraymaps
[ upstream commit 9c2d63b843a5c8a8d0559cc067b5398aa5ec3ffc ]

syzkaller recently triggered OOM during percpu map allocation;
while there is work in progress by Dennis Zhou to add __GFP_NORETRY
semantics for percpu allocator under pressure, there seems also a
missing bpf_map_precharge_memlock() check in array map allocation.

Given today the actual bpf_map_charge_memlock() happens after the
find_and_alloc_map() in syscall path, the bpf_map_precharge_memlock()
is there to bail out early before we go and do the map setup work
when we find that we hit the limits anyway. Therefore add this for
array map as well.

Fixes: 6c90598174 ("bpf: pre-allocate hash map elements")
Fixes: a10423b87a ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-11 16:23:21 +01:00
Greg Kroah-Hartman
9b68347c35 Merge 4.14.14 into android-4.14
Changes in 4.14.14
	dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
	KVM: Fix stack-out-of-bounds read in write_mmio
	can: vxcan: improve handling of missing peer name attribute
	can: gs_usb: fix return value of the "set_bittiming" callback
	IB/srpt: Disable RDMA access by the initiator
	IB/srpt: Fix ACL lookup during login
	MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
	MIPS: Factor out NT_PRFPREG regset access helpers
	MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
	MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
	MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
	MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
	MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
	cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC
	kvm: vmx: Scrub hardware GPRs at VM-exit
	platform/x86: wmi: Call acpi_wmi_init() later
	iw_cxgb4: only call the cq comp_handler when the cq is armed
	iw_cxgb4: atomically flush the qp
	iw_cxgb4: only clear the ARMED bit if a notification is needed
	iw_cxgb4: reflect the original WR opcode in drain cqes
	iw_cxgb4: when flushing, complete all wrs in a chain
	x86/acpi: Handle SCI interrupts above legacy space gracefully
	ALSA: pcm: Remove incorrect snd_BUG_ON() usages
	ALSA: pcm: Workaround for weird PulseAudio behavior on rewind error
	ALSA: pcm: Add missing error checks in OSS emulation plugin builder
	ALSA: pcm: Abort properly at pending signal in OSS read/write loops
	ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
	ALSA: aloop: Release cable upon open error path
	ALSA: aloop: Fix inconsistent format due to incomplete rule
	ALSA: aloop: Fix racy hw constraints adjustment
	x86/acpi: Reduce code duplication in mp_override_legacy_irq()
	8021q: fix a memory leak for VLAN 0 device
	ip6_tunnel: disable dst caching if tunnel is dual-stack
	net: core: fix module type in sock_diag_bind
	phylink: ensure we report link down when LOS asserted
	RDS: Heap OOB write in rds_message_alloc_sgs()
	RDS: null pointer dereference in rds_atomic_free_op
	net: fec: restore dev_id in the cases of probe error
	net: fec: defer probe if regulator is not ready
	net: fec: free/restore resource in related probe error pathes
	sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
	sctp: fix the handling of ICMP Frag Needed for too small MTUs
	sh_eth: fix TSU resource handling
	net: stmmac: enable EEE in MII, GMII or RGMII only
	sh_eth: fix SH7757 GEther initialization
	ipv6: fix possible mem leaks in ipv6_make_skb()
	ethtool: do not print warning for applications using legacy API
	mlxsw: spectrum_router: Fix NULL pointer deref
	net/sched: Fix update of lastuse in act modules implementing stats_update
	ipv6: sr: fix TLVs not being copied using setsockopt
	mlxsw: spectrum: Relax sanity checks during enslavement
	sfp: fix sfp-bus oops when removing socket/upstream
	membarrier: Disable preemption when calling smp_call_function_many()
	crypto: algapi - fix NULL dereference in crypto_remove_spawns()
	mmc: renesas_sdhi: Add MODULE_LICENSE
	rbd: reacquire lock should update lock owner client id
	rbd: set max_segments to USHRT_MAX
	iwlwifi: pcie: fix DMA memory mapping / unmapping
	x86/microcode/intel: Extend BDW late-loading with a revision check
	KVM: x86: Add memory barrier on vmcs field lookup
	KVM: PPC: Book3S PR: Fix WIMG handling under pHyp
	KVM: PPC: Book3S HV: Drop prepare_done from struct kvm_resize_hpt
	KVM: PPC: Book3S HV: Fix use after free in case of multiple resize requests
	KVM: PPC: Book3S HV: Always flush TLB in kvmppc_alloc_reset_hpt()
	drm/vmwgfx: Don't cache framebuffer maps
	drm/vmwgfx: Potential off by one in vmw_view_add()
	drm/i915/gvt: Clear the shadow page table entry after post-sync
	drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.
	drm/i915: Move init_clock_gating() back to where it was
	drm/i915: Fix init_clock_gating for resume
	bpf: prevent out-of-bounds speculation
	bpf, array: fix overflow in max_entries and undefined behavior in index_mask
	bpf: arsh is not supported in 32 bit alu thus reject it
	USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
	USB: serial: cp210x: add new device ID ELV ALC 8xxx
	usb: misc: usb3503: make sure reset is low for at least 100us
	USB: fix usbmon BUG trigger
	USB: UDC core: fix double-free in usb_add_gadget_udc_release
	usbip: remove kernel addresses from usb device and urb debug msgs
	usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input
	usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer
	staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
	Bluetooth: Prevent stack info leak from the EFS element.
	uas: ignore UAS for Norelsys NS1068(X) chips
	mux: core: fix double get_device()
	kdump: write correct address of mem_section into vmcoreinfo
	apparmor: fix ptrace label match when matching stacked labels
	e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
	x86/pti: Unbreak EFI old_memmap
	x86/Documentation: Add PTI description
	x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
	sysfs/cpu: Add vulnerability folder
	x86/cpu: Implement CPU vulnerabilites sysfs functions
	x86/tboot: Unbreak tboot with PTI enabled
	x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*()
	x86/cpu/AMD: Make LFENCE a serializing instruction
	x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
	sysfs/cpu: Fix typos in vulnerability documentation
	x86/alternatives: Fix optimize_nops() checking
	x86/pti: Make unpoison of pgd for trusted boot work for real
	objtool: Detect jumps to retpoline thunks
	objtool: Allow alternatives to be ignored
	x86/retpoline: Add initial retpoline support
	x86/spectre: Add boot time option to select Spectre v2 mitigation
	x86/retpoline/crypto: Convert crypto assembler indirect jumps
	x86/retpoline/entry: Convert entry assembler indirect jumps
	x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
	x86/retpoline/hyperv: Convert assembler indirect jumps
	x86/retpoline/xen: Convert Xen hypercall indirect jumps
	x86/retpoline/checksum32: Convert assembler indirect jumps
	x86/retpoline/irq32: Convert assembler indirect jumps
	x86/retpoline: Fill return stack buffer on vmexit
	selftests/x86: Add test_vsyscall
	x86/pti: Fix !PCID and sanitize defines
	security/Kconfig: Correct the Documentation reference for PTI
	x86,perf: Disable intel_bts when PTI
	x86/retpoline: Remove compile time warning
	Linux 4.14.14

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-17 10:33:24 +01:00
Daniel Borkmann
67c05d9414 bpf, array: fix overflow in max_entries and undefined behavior in index_mask
commit bbeb6e4323dad9b5e0ee9f60c223dd532e2403b1 upstream.

syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc98
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.

However, this will generate an index_mask of 0xffffffff, and therefore
a + 1 will let this overflow into new max_entries of 0. This will pass
allocation, etc, and later on map access we still enforce on the original
attr->max_entries value which was 0xfffffffd, therefore triggering GPF
all over the place. Thus bail out on overflow in such case.

Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable.

This fixes all the issues triggered by syzkaller's reproducers.

Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:45:25 +01:00
Alexei Starovoitov
a5dbaf8768 bpf: prevent out-of-bounds speculation
commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index >= max_entries) {
    index &= map->index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:45:25 +01:00
Chenbo Feng
cace572e16 BACKPORT: bpf: Add file mode configuration into bpf maps
Introduce the map read/write flags to the eBPF syscalls that returns the
map fd. The flags is used to set up the file mode when construct a new
file descriptor for bpf maps. To not break the backward capability, the
f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
read the map content, it will check the file mode to see if it is
allowed to make the change.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Bug: 30950746
Change-Id: Icfad20f1abb77f91068d244fb0d87fa40824dd1b

(cherry picked from commit 6e71b04a82248ccf13a94b85cbc674a9fefe53f5)
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2017-12-18 21:11:22 +05:30
Daniel Borkmann
bc6d5031b4 bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
PCPU_MIN_UNIT_SIZE is an implementation detail of the percpu
allocator. Given we support __GFP_NOWARN now, lets just let
the allocation request fail naturally instead. The two call
sites from BPF mistakenly assumed __GFP_NOWARN would work, so
no changes needed to their actual __alloc_percpu_gfp() calls
which use the flag already.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19 13:13:50 +01:00
Daniel Borkmann
7b0c2a0508 bpf: inline map in map lookup functions for array and htab
Avoid two successive functions calls for the map in map lookup, first
is the bpf_map_lookup_elem() helper call, and second the callback via
map->ops->map_lookup_elem() to get to the map in map implementation.
Implementation inlines array and htab flavor for map in map lookups.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19 21:56:34 -07:00
Martin KaFai Lau
96eabe7a40 bpf: Allow selecting numa node during map creation
The current map creation API does not allow to provide the numa-node
preference.  The memory usually comes from where the map-creation-process
is running.  The performance is not ideal if the bpf_prog is known to
always run in a numa node different from the map-creation-process.

One of the use case is sharding on CPU to different LRU maps (i.e.
an array of LRU maps).  Here is the test result of map_perf_test on
the INNER_LRU_HASH_PREALLOC test if we force the lru map used by
CPU0 to be allocated from a remote numa node:

[ The machine has 20 cores. CPU0-9 at node 0. CPU10-19 at node 1 ]

># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1628380 events per sec
4:inner_lru_hash_map_perf pre-alloc 1626396 events per sec
3:inner_lru_hash_map_perf pre-alloc 1626144 events per sec
6:inner_lru_hash_map_perf pre-alloc 1621657 events per sec
2:inner_lru_hash_map_perf pre-alloc 1621534 events per sec
1:inner_lru_hash_map_perf pre-alloc 1620292 events per sec
7:inner_lru_hash_map_perf pre-alloc 1613305 events per sec
0:inner_lru_hash_map_perf pre-alloc 1239150 events per sec  #<<<

After specifying numa node:
># taskset -c 10 ./map_perf_test 512 8 1260000 8000000
5:inner_lru_hash_map_perf pre-alloc 1629627 events per sec
3:inner_lru_hash_map_perf pre-alloc 1628057 events per sec
1:inner_lru_hash_map_perf pre-alloc 1623054 events per sec
6:inner_lru_hash_map_perf pre-alloc 1616033 events per sec
2:inner_lru_hash_map_perf pre-alloc 1614630 events per sec
4:inner_lru_hash_map_perf pre-alloc 1612651 events per sec
7:inner_lru_hash_map_perf pre-alloc 1609337 events per sec
0:inner_lru_hash_map_perf pre-alloc 1619340 events per sec #<<<

This patch adds one field, numa_node, to the bpf_attr.  Since numa node 0
is a valid node, a new flag BPF_F_NUMA_NODE is also added.  The numa_node
field is honored if and only if the BPF_F_NUMA_NODE flag is set.

Numa node selection is not supported for percpu map.

This patch does not change all the kmalloc.  F.e.
'htab = kzalloc()' is not changed since the object
is small enough to stay in the cache.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19 21:35:43 -07:00
Martin KaFai Lau
14dc6f04f4 bpf: Add syscall lookup support for fd array and htab
This patch allows userspace to do BPF_MAP_LOOKUP_ELEM on
BPF_MAP_TYPE_PROG_ARRAY,
BPF_MAP_TYPE_ARRAY_OF_MAPS and
BPF_MAP_TYPE_HASH_OF_MAPS.

The lookup returns a prog-id or map-id to the userspace.
The userspace can then use the BPF_PROG_GET_FD_BY_ID
or BPF_MAP_GET_FD_BY_ID to get a fd.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29 13:13:25 -04:00
Alexei Starovoitov
f91840a32d perf, bpf: Add BPF support to all perf_event types
Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all
perf_event types, including HW_CACHE, RAW, and dynamic pmu events.
Only tracepoint/kprobe events are treated differently which require
BPF_PROG_TYPE_TRACEPOINT/BPF_PROG_TYPE_KPROBE program types accordingly.

Also add support for reading all event counters using
bpf_perf_event_read() helper.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 21:58:01 -04:00
Daniel Borkmann
a316338cb7 bpf: fix wrong exposure of map_flags into fdinfo for lpm
trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and if not, bails out, which is currently the case for lpm
since it exposes always 0 as flags.

Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure to not
miss to copy them over at a later point in time when we add actual
flags for them to use.

Fixes: b95a5c4db0 ("bpf: add a longest prefix match trie map implementation")
Reported-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 13:44:28 -04:00
Teng Qin
8fe4592438 bpf: map_get_next_key to return first key on NULL
When iterating through a map, we need to find a key that does not exist
in the map so map_get_next_key will give us the first key of the map.
This often requires a lot of guessing in production systems.

This patch makes map_get_next_key return the first key when the key
pointer in the parameter is NULL.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-25 11:57:45 -04:00
Johannes Berg
40077e0cf6 bpf: remove struct bpf_map_type_list
There's no need to have struct bpf_map_type_list since
it just contains a list_head, the type, and the ops
pointer. Since the types are densely packed and not
actually dynamically registered, it's much easier and
smaller to have an array of type->ops pointer. Also
initialize this array statically to remove code needed
to initialize it.

In order to save duplicating the list, move it to the
types header file added by the previous patch and
include it in the same fashion.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 14:38:43 -04:00
Martin KaFai Lau
56f668dfe0 bpf: Add array of maps support
This patch adds a few helper funcs to enable map-in-map
support (i.e. outer_map->inner_map).  The first outer_map type
BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch.
The next patch will introduce a hash of maps type.

Any bpf map type can be acted as an inner_map.  The exception
is BPF_MAP_TYPE_PROG_ARRAY because the extra level of
indirection makes it harder to verify the owner_prog_type
and owner_jited.

Multi-level map-in-map is not supported (i.e. map->map is ok
but not map->map->map).

When adding an inner_map to an outer_map, it currently checks the
map_type, key_size, value_size, map_flags, max_entries and ops.
The verifier also uses those map's properties to do static analysis.
map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT
is using a preallocated hashtab for the inner_hash also.  ops and
max_entries are needed to generate inlined map-lookup instructions.
For simplicity reason, a simple '==' test is used for both map_flags
and max_entries.  The equality of ops is implied by the equality of
map_type.

During outer_map creation time, an inner_map_fd is needed to create an
outer_map.  However, the inner_map_fd's life time does not depend on the
outer_map.  The inner_map_fd is merely used to initialize
the inner_map_meta of the outer_map.

Also, for the outer_map:

* It allows element update and delete from syscall
* It allows element lookup from bpf_prog

The above is similar to the current fd_array pattern.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 15:45:45 -07:00
Martin KaFai Lau
fad73a1a35 bpf: Fix and simplifications on inline map lookup
Fix in verifier:
For the same bpf_map_lookup_elem() instruction (i.e. "call 1"),
a broken case is "a different type of map could be used for the
same lookup instruction". For example, an array in one case and a
hashmap in another.  We have to resort to the old dynamic call behavior
in this case.  The fix is to check for collision on insn_aux->map_ptr.
If there is collision, don't inline the map lookup.

Please see the "do_reg_lookup()" in test_map_in_map_kern.c in the later
patch for how-to trigger the above case.

Simplifications on array_map_gen_lookup():
1. Calculate elem_size from map->value_size.  It removes the
   need for 'struct bpf_array' which makes the later map-in-map
   implementation easier.
2. Remove the 'elem_size == 1' test

Fixes: 81ed18ab30 ("bpf: add helper inlining infra and optimize map_array lookup")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 15:45:45 -07:00
Alexei Starovoitov
81ed18ab30 bpf: add helper inlining infra and optimize map_array lookup
Optimize bpf_call -> bpf_map_lookup_elem() -> array_map_lookup_elem()
into a sequence of bpf instructions.
When JIT is on the sequence of bpf instructions is the sequence
of native cpu instructions with significantly faster performance
than indirect call and two function's prologue/epilogue.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 20:44:11 -07:00
Daniel Borkmann
c78f8bdfa1 bpf: mark all registered map/prog types as __ro_after_init
All map types and prog types are registered to the BPF core through
bpf_register_map_type() and bpf_register_prog_type() during init and
remain unchanged thereafter. As by design we don't (and never will)
have any pluggable code that can register to that at any later point
in time, lets mark all the existing bpf_{map,prog}_type_list objects
in the tree as __ro_after_init, so they can be moved to read-only
section from then onwards.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 13:40:04 -05:00
Daniel Borkmann
d407bd25a2 bpf: don't trigger OOM killer under pressure with map alloc
This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(),
that are to be used for map allocations. Using kmalloc() for very large
allocations can cause excessive work within the page allocator, so i) fall
back earlier to vmalloc() when the attempt is considered costly anyway,
and even more importantly ii) don't trigger OOM killer with any of the
allocators.

Since this is based on a user space request, for example, when creating
maps with element pre-allocation, we really want such requests to fail
instead of killing other user space processes.

Also, don't spam the kernel log with warnings should any of the allocations
fail under pressure. Given that, we can make backend selection in
bpf_map_area_alloc() generic, and convert all maps over to use this API
for spots with potentially large allocation requests.

Note, replacing the one kmalloc_array() is fine as overflow checks happen
earlier in htab_map_alloc(), since it must also protect the multiplication
for vmalloc() should kmalloc_array() fail.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18 17:12:26 -05:00
Michal Hocko
7984c27c2c bpf: do not use KMALLOC_SHIFT_MAX
Commit 01b3f52157 ("bpf: fix allocation warnings in bpf maps and
integer overflow") has added checks for the maximum allocateable size.
It (ab)used KMALLOC_SHIFT_MAX for that purpose.

While this is not incorrect it is not very clean because we already have
KMALLOC_MAX_SIZE for this very reason so let's change both checks to use
KMALLOC_MAX_SIZE instead.

The original motivation for using KMALLOC_SHIFT_MAX was to work around
an incorrect KMALLOC_MAX_SIZE which could lead to allocation warnings
but it is no longer needed since "slab: make sure that KMALLOC_MAX_SIZE
will fit into MAX_ORDER".

Link: http://lkml.kernel.org/r/20161220130659.16461-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:54 -08:00
Sargun Dhillon
60d20f9195 bpf: Add bpf_current_task_under_cgroup helper
This adds a bpf helper that's similar to the skb_in_cgroup helper to check
whether the probe is currently executing in the context of a specific
subset of the cgroupsv2 hierarchy. It does this based on membership test
for a cgroup arraymap. It is invalid to call this in an interrupt, and
it'll return an error. The helper is primarily to be used in debugging
activities for containers, where you may have multiple programs running in
a given top-level "container".

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-12 21:49:41 -07:00
Daniel Borkmann
858d68f102 bpf: bpf_event_entry_gen's alloc needs to be in atomic context
Should have been obvious, only called from bpf() syscall via map_update_elem()
that calls bpf_fd_array_map_update_elem() under RCU read lock and thus this
must also be in GFP_ATOMIC, of course.

Fixes: 3b1efb196e ("bpf, maps: flush own entries on perf map release")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-16 22:03:39 -07:00