28434 Commits

Author SHA1 Message Date
Tim Zimmermann
e9dfea1356 syscall: Increase bpf fake uname to 5.4.299
Signed-off-by: Ansh <singhansh64321@gmail.com>
2026-01-20 13:45:21 +00:00
Marco Elver
d392fd0a00 BACKPORT: panic: use error_report_end tracepoint on warnings
Introduce the error detector "warning" to the error_report event and use
the error_report_end tracepoint at the end of a warning report.

This allows in-kernel tests but also userspace to more easily determine
if a warning occurred without polling kernel logs.

[akpm@linux-foundation.org: add comma to enum list, per Andy]

Link: https://lkml.kernel.org/r/20211115085630.1756817-1-elver@google.com
Change-Id: Ia82127785563994f9a8b07a4c9e5c2483242f9f0
Signed-off-by: Marco Elver <elver@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-20 13:45:21 +00:00
Alexander Potapenko
3a9310b21b tracing: add error_report_end trace point
Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3.

This patchset adds a tracepoint, error_report_end, that is to be used by
KFENCE, KASAN, and potentially other bug detection tools, when they print
an error report.  One of the possible use cases is userspace collection of
kernel error reports: interested parties can subscribe to the tracing
event via tracefs, and get notified when an error report occurs.

This patch (of 3):

Introduce error_report_end tracepoint.  It can be used in debugging tools
like KASAN, KFENCE, etc.  to provide extensions to the error reporting
mechanisms (e.g.  allow tests to hook into error reporting, ease error report
collection from production kernels).  Another benefit would be making use
of ftrace for debugging or benchmarking the tools themselves.

Should we need it, the tracepoint name leaves us with the possibility to
introduce a complementary error_report_start tracepoint in the future.
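
The userspace collection described above could be sketched roughly as follows. This is a minimal illustration, not part of the patch: the tracefs mount point, the event path `events/error_report/error_report_end`, and the `[detector] id` line shape are assumptions that should be checked against the running kernel.

```python
import re
from pathlib import Path

# Assumed default mount point; some systems expose /sys/kernel/debug/tracing.
TRACEFS = Path("/sys/kernel/tracing")
# Assumed event location (trace system "error_report").
EVENT_DIR = Path("events/error_report/error_report_end")

# Assumed line shape: "... error_report_end: [detector] hex_id"
LINE_RE = re.compile(
    r"error_report_end:\s*\[(?P<detector>\w+)\]\s+(?P<id>[0-9a-fA-F]+)"
)


def enable_error_report_end(tracefs=TRACEFS):
    """Turn the event on by writing "1" to its enable file (needs root)."""
    (Path(tracefs) / EVENT_DIR / "enable").write_text("1")


def parse_report_line(line):
    """Return (detector, id) for an error_report_end line, else None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return match.group("detector"), int(match.group("id"), 16)


def watch(tracefs=TRACEFS):
    """Yield (detector, id) as reports arrive; blocks reading trace_pipe."""
    with open(Path(tracefs) / "trace_pipe") as pipe:
        for line in pipe:
            parsed = parse_report_line(line)
            if parsed is not None:
                yield parsed
```

A collector would call `enable_error_report_end()` once and then iterate over `watch()`, instead of polling the kernel log.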

Link: https://lkml.kernel.org/r/20210121131915.1331302-1-glider@google.com
Link: https://lkml.kernel.org/r/20210121131915.1331302-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Marco Elver <elver@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-20 13:45:21 +00:00
Peter Zijlstra
c515adf83a sched: Provide sched_set_fifo()
SCHED_FIFO (or any static priority scheduler) is a broken scheduler
model; it is fundamentally incapable of resource management, the one
thing an OS is actually supposed to do.

It is impossible to compose static priority workloads. One cannot take
two well designed and functional static priority workloads and mash
them together and still expect them to work.

Therefore it doesn't make sense to expose the priority field; the
kernel is fundamentally incapable of setting a sensible value, it
needs systems knowledge that it doesn't have.

Take away sched_setscheduler() / sched_setattr() from modules and
replace them with:

  - sched_set_fifo(p); create a FIFO task (at prio 50)
  - sched_set_fifo_low(p); create a task higher than NORMAL,
	which ends up being a FIFO task at prio 1.
  - sched_set_normal(p, nice); (re)set the task to normal

This stops the proliferation of randomly chosen, and irrelevant, FIFO
priorities that don't really mean anything anyway.

The system administrator/integrator, whoever has insight into the
actual system design and requirements (userspace) can set-up
appropriate priorities if and when needed.
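
As a rough illustration of the split described above, the sketch below records the fixed values the three helpers stand for and shows how an administrator might override a priority from userspace. The function names `describe` and `set_fifo_priority` are hypothetical; only `os.sched_setscheduler()` and `os.sched_param()` are real calls, and they need sufficient privilege (typically CAP_SYS_NICE) on Linux.

```python
import os


def describe(helper, nice=0):
    """Return the policy a helper from the commit message stands for."""
    if helper == "sched_set_fifo":
        return {"policy": "SCHED_FIFO", "rt_priority": 50}
    if helper == "sched_set_fifo_low":
        return {"policy": "SCHED_FIFO", "rt_priority": 1}
    if helper == "sched_set_normal":
        return {"policy": "SCHED_NORMAL", "nice": nice}
    raise ValueError(f"unknown helper: {helper}")


def set_fifo_priority(pid, priority):
    """Userspace side: give a task (e.g. a kernel thread) a chosen
    SCHED_FIFO priority once the system design is known."""
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
```

The point of the design is that kernel code only picks between "FIFO", "barely above NORMAL" and "NORMAL"; any finer ordering is left to whoever calls something like `set_fifo_priority()`.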

Cc: airlied@redhat.com
Cc: alexander.deucher@amd.com
Cc: awalls@md.metrocast.net
Cc: axboe@kernel.dk
Cc: broonie@kernel.org
Cc: daniel.lezcano@linaro.org
Cc: gregkh@linuxfoundation.org
Cc: hannes@cmpxchg.org
Cc: herbert@gondor.apana.org.au
Cc: hverkuil@xs4all.nl
Cc: john.stultz@linaro.org
Cc: nico@fluxnic.net
Cc: paulmck@kernel.org
Cc: rafael.j.wysocki@intel.com
Cc: rmk+kernel@arm.linux.org.uk
Cc: sudeep.holla@arm.com
Cc: tglx@linutronix.de
Cc: ulf.hansson@linaro.org
Cc: wim@linux-watchdog.org
Change-Id: I52ed4f1253e82ba3e8f40f3aa1aff62580163f25
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-20 13:45:21 +00:00
Michael Bestas
1d4129ac18 sched: Provide sched_setattr_nocheck()
Based on upstream commit 794a56ebd9a57db12abaec63f038c6eb073461f7

Change-Id: I2a6be669c847da253f09e72c6f41437a9c0f11ef
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-20 13:45:20 +00:00
Kees Cook
c7417dd0b8 UPSTREAM: kheaders: Use array declaration instead of char
Under CONFIG_FORTIFY_SOURCE, memcpy() will check the size of destination
and source buffers. Defining kernel_headers_data as "char" would trip
this check. Since these addresses are treated as byte arrays, define
them as arrays (as done everywhere else).

This was seen with:

  $ cat /sys/kernel/kheaders.tar.xz >> /dev/null

  detected buffer overflow in memcpy
  kernel BUG at lib/string_helpers.c:1027!
  ...
  RIP: 0010:fortify_panic+0xf/0x20
  [...]
  Call Trace:
   <TASK>
   ikheaders_read+0x45/0x50 [kheaders]
   kernfs_fop_read_iter+0x1a4/0x2f0
  ...

Bug: 254441685
Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/20230302112130.6e402a98@kernel.org/
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 43d8ce9d65a5 ("Provide in-kernel headers to make extending kernel easier")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230302224946.never.243-kees@kernel.org
(cherry picked from commit b69edab47f1da8edd8e7bfdf8c70f51a2a5d89fb)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I73c7530b9c558c1c8dac5f8962dbc31c553c0be7
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-20 13:45:19 +00:00
Munehisa Kamata
ad8750dbfb UPSTREAM: sched/psi: Fix use-after-free in ep_remove_wait_queue()
If a non-root cgroup gets removed when there is a thread that registered
trigger and is polling on a pressure file within the cgroup, the polling
waitqueue gets freed in the following path:

 do_rmdir
   cgroup_rmdir
     kernfs_drain_open_files
       cgroup_file_release
         cgroup_pressure_release
           psi_trigger_destroy

However, the polling thread still has a reference to the pressure file and
will access the freed waitqueue when the file is closed or upon exit:

 fput
   ep_eventpoll_release
     ep_free
       ep_remove_wait_queue
         remove_wait_queue

This results in use-after-free as pasted below.

The fundamental problem here is that cgroup_file_release() (and
consequently waitqueue's lifetime) is not tied to the file's real lifetime.
Using wake_up_pollfree() here might be less than ideal, but it is in line
with the comment at commit 42288cb44c4b ("wait: add wake_up_pollfree()")
since the waitqueue's lifetime is not tied to file's one and can be
considered as another special case. While this would be fixable by somehow
making cgroup_file_release() be tied to the fput(), it would require
sizable refactoring at cgroups or higher layer which might be more
justifiable if we identify more cases like this.

  BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x60/0xc0
  Write of size 4 at addr ffff88810e625328 by task a.out/4404

	CPU: 19 PID: 4404 Comm: a.out Not tainted 6.2.0-rc6 #38
	Hardware name: Amazon EC2 c5a.8xlarge/, BIOS 1.0 10/16/2017
	Call Trace:
	<TASK>
	dump_stack_lvl+0x73/0xa0
	print_report+0x16c/0x4e0
	kasan_report+0xc3/0xf0
	kasan_check_range+0x2d2/0x310
	_raw_spin_lock_irqsave+0x60/0xc0
	remove_wait_queue+0x1a/0xa0
	ep_free+0x12c/0x170
	ep_eventpoll_release+0x26/0x30
	__fput+0x202/0x400
	task_work_run+0x11d/0x170
	do_exit+0x495/0x1130
	do_group_exit+0x100/0x100
	get_signal+0xd67/0xde0
	arch_do_signal_or_restart+0x2a/0x2b0
	exit_to_user_mode_prepare+0x94/0x100
	syscall_exit_to_user_mode+0x20/0x40
	do_syscall_64+0x52/0x90
	entry_SYSCALL_64_after_hwframe+0x63/0xcd
	</TASK>

 Allocated by task 4404:

	kasan_set_track+0x3d/0x60
	__kasan_kmalloc+0x85/0x90
	psi_trigger_create+0x113/0x3e0
	pressure_write+0x146/0x2e0
	cgroup_file_write+0x11c/0x250
	kernfs_fop_write_iter+0x186/0x220
	vfs_write+0x3d8/0x5c0
	ksys_write+0x90/0x110
	do_syscall_64+0x43/0x90
	entry_SYSCALL_64_after_hwframe+0x63/0xcd

 Freed by task 4407:

	kasan_set_track+0x3d/0x60
	kasan_save_free_info+0x27/0x40
	____kasan_slab_free+0x11d/0x170
	slab_free_freelist_hook+0x87/0x150
	__kmem_cache_free+0xcb/0x180
	psi_trigger_destroy+0x2e8/0x310
	cgroup_file_release+0x4f/0xb0
	kernfs_drain_open_files+0x165/0x1f0
	kernfs_drain+0x162/0x1a0
	__kernfs_remove+0x1fb/0x310
	kernfs_remove_by_name_ns+0x95/0xe0
	cgroup_addrm_files+0x67f/0x700
	cgroup_destroy_locked+0x283/0x3c0
	cgroup_rmdir+0x29/0x100
	kernfs_iop_rmdir+0xd1/0x140
	vfs_rmdir+0xfe/0x240
	do_rmdir+0x13d/0x280
	__x64_sys_rmdir+0x2c/0x30
	do_syscall_64+0x43/0x90
	entry_SYSCALL_64_after_hwframe+0x63/0xcd

Bug: 254441685
Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Signed-off-by: Mengchi Cheng <mengcc@amazon.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/20230106224859.4123476-1-kamatam@amazon.com/
Link: https://lore.kernel.org/r/20230214212705.4058045-1-kamatam@amazon.com
(cherry picked from commit c2dbe32d5db5c4ead121cf86dabd5ab691fb47fe)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I9677499b2885149a1070f508931113ad8a02277a
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-20 13:45:19 +00:00
Namhyung Kim
009784781d UPSTREAM: perf/core: Call LSM hook after copying perf_event_attr
It passes the attr struct to the security_perf_event_open() but it's
not initialized yet.

Bug: 254441685
Fixes: da97e18458fb ("perf_event: Add support for LSM and SELinux checks")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221220223140.4020470-1-namhyung@kernel.org
(cherry picked from commit 0a041ebca4956292cadfb14a63ace3a9c1dcb0a3)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I0bda5722f4cff80252e2ec483b5b1f18b1194356
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-20 13:45:19 +00:00
Mukesh Ojha
973d72d79e UPSTREAM: gcov: clang: fix the buffer overflow issue
Currently, in the clang version of the gcov code, when a module is being
removed, gcov_info_add() incorrectly adds the sfn_ptr->counter to all the
dst->functions, which results in the kernel panic in the crash report below.
Fix this by properly handling it.

[    8.899094][  T599] Unable to handle kernel write to read-only memory at virtual address ffffff80461cc000
[    8.899100][  T599] Mem abort info:
[    8.899102][  T599]   ESR = 0x9600004f
[    8.899103][  T599]   EC = 0x25: DABT (current EL), IL = 32 bits
[    8.899105][  T599]   SET = 0, FnV = 0
[    8.899107][  T599]   EA = 0, S1PTW = 0
[    8.899108][  T599]   FSC = 0x0f: level 3 permission fault
[    8.899110][  T599] Data abort info:
[    8.899111][  T599]   ISV = 0, ISS = 0x0000004f
[    8.899113][  T599]   CM = 0, WnR = 1
[    8.899114][  T599] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000ab8de000
[    8.899116][  T599] [ffffff80461cc000] pgd=18000009ffcde003, p4d=18000009ffcde003, pud=18000009ffcde003, pmd=18000009ffcad003, pte=00600000c61cc787
[    8.899124][  T599] Internal error: Oops: 9600004f [#1] PREEMPT SMP
[    8.899265][  T599] Skip md ftrace buffer dump for: 0x1609e0
....
..,
[    8.899544][  T599] CPU: 7 PID: 599 Comm: modprobe Tainted: G S         OE     5.15.41-android13-8-g38e9b1af6bce #1
[    8.899547][  T599] Hardware name: XXX (DT)
[    8.899549][  T599] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[    8.899551][  T599] pc : gcov_info_add+0x9c/0xb8
[    8.899557][  T599] lr : gcov_event+0x28c/0x6b8
[    8.899559][  T599] sp : ffffffc00e733b00
[    8.899560][  T599] x29: ffffffc00e733b00 x28: ffffffc00e733d30 x27: ffffffe8dc297470
[    8.899563][  T599] x26: ffffffe8dc297000 x25: ffffffe8dc297000 x24: ffffffe8dc297000
[    8.899566][  T599] x23: ffffffe8dc0a6200 x22: ffffff880f68bf20 x21: 0000000000000000
[    8.899569][  T599] x20: ffffff880f68bf00 x19: ffffff8801babc00 x18: ffffffc00d7f9058
[    8.899572][  T599] x17: 0000000000088793 x16: ffffff80461cbe00 x15: 9100052952800785
[    8.899575][  T599] x14: 0000000000000200 x13: 0000000000000041 x12: 9100052952800785
[    8.899577][  T599] x11: ffffffe8dc297000 x10: ffffffe8dc297000 x9 : ffffff80461cbc80
[    8.899580][  T599] x8 : ffffff8801babe80 x7 : ffffffe8dc2ec000 x6 : ffffffe8dc2ed000
[    8.899583][  T599] x5 : 000000008020001f x4 : fffffffe2006eae0 x3 : 000000008020001f
[    8.899586][  T599] x2 : ffffff8027c49200 x1 : ffffff8801babc20 x0 : ffffff80461cb3a0
[    8.899589][  T599] Call trace:
[    8.899590][  T599]  gcov_info_add+0x9c/0xb8
[    8.899592][  T599]  gcov_module_notifier+0xbc/0x120
[    8.899595][  T599]  blocking_notifier_call_chain+0xa0/0x11c
[    8.899598][  T599]  do_init_module+0x2a8/0x33c
[    8.899600][  T599]  load_module+0x23cc/0x261c
[    8.899602][  T599]  __arm64_sys_finit_module+0x158/0x194
[    8.899604][  T599]  invoke_syscall+0x94/0x2bc
[    8.899607][  T599]  el0_svc_common+0x1d8/0x34c
[    8.899609][  T599]  do_el0_svc+0x40/0x54
[    8.899611][  T599]  el0_svc+0x94/0x2f0
[    8.899613][  T599]  el0t_64_sync_handler+0x88/0xec
[    8.899615][  T599]  el0t_64_sync+0x1b4/0x1b8
[    8.899618][  T599] Code: f905f56c f86e69ec f86e6a0f 8b0c01ec (f82e6a0c)
[    8.899620][  T599] ---[ end trace ed5218e9e5b6e2e6 ]---

Bug: 254441685
Link: https://lkml.kernel.org/r/1668020497-13142-1-git-send-email-quic_mojha@quicinc.com
Fixes: e178a5beb369 ("gcov: clang support")
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: <stable@vger.kernel.org>	[5.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit a6f810efabfd789d3bbafeacb4502958ec56c5ce)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: If73014531a63392cda8b1ce2607573b85978be30
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-20 13:45:19 +00:00
ThunderStorms21th
8546733a7c Powersuspend v2.0 - compile fix :
make static power_suspend
make static power_resume

Signed-off-by: ThunderStorms21th - nalas
2026-01-18 10:58:07 +00:00
ThunderStorms21th
da71c9f39e POWERSUSPEND: SQUASH - updated to Powersuspend v2.0 for Exynos 9820
 *  v1.9.0 - Synchronized suspend/resume driver printing of ignored errors,
 *           and turned on state notifier debugger.
 *         - subsys_initcall changed to module_init,
 *         - state notifier - going back to scheduled work, and subsys initcall,
 *         - initializing work in module init.
 *         - updated our outdated method of workqueue declaration.
 *
 *  v1.9.1 - Updated the deprecated method of declaring work by simply declaring
 *           the two work structs. Also actually INITialized the work on init,
 *           and flushed it on exit.
 *
 *  v2.0.0 - Included State Notifier hooks to run explicitly once power state
 *           changes are completed to prevent blocking issues.

Credits to robcore (https://github.com/robcore) for some updates.

Signed-off-by: ThunderStorms21th - nalas
2026-01-18 10:58:07 +00:00
yank555-lu
62741bc1cc kernel/power: POWERSUSPEND v1.8 - squash
kernel/power/powersuspend: new PM kernel driver for Android w/o early_suspend v1.5 (faux123/Yank555.lu)

powersuspend: new PM kernel driver for Android w/o early_suspend

Android early_suspend/late_resume PM kernel driver framework has been
deprecated by Google. This new powersuspend PM kernel driver is a replacement
for it and existing early_suspend drivers can be easily adapted to use this
new replacement driver

Signed-off-by: Paul Reioux <reioux@gmail.com>
powersuspend: fix logic derps :p

Signed-off-by: Paul Reioux <reioux@gmail.com>
kernel/power/powersuspend: remove userspace dependency from powersuspend

make powersuspend not depend on a userspace initiator anymore, but use a hook
in autosleep instead.

Signed-off-by: Paul Reioux <reioux@gmail.com>
kernel/power/powersuspend: add back userspace control w/ default kernel control

make kernel / userspace mode switchable

Signed-off-by: Paul Reioux <reioux@gmail.com>
kernel/power/powersuspend: default to userspace for now

Signed-off-by: Paul Reioux <reioux@gmail.com>
kernel/power/powersuspend: LCD screen on/off hooks (Yank555.lu)

- add an alternative hook for screen on / off detection in the display
  panel driver
- sleeps ~0.1s sooner, wakes up ~1s later than autosleep
- clean up source formatting a bit (Paul Reioux)

Signed-off-by: Paul Reioux <reioux@gmail.com>
mdss_dsi_panel.c: add powersuspend hooks

Signed-off-by: Paul Reioux <reioux@gmail.com>
kernel/power/powersuspend: cumulative update to version 1.5

- fix hybrid-kernel mode cannot be set through sysfs

kernel/power/powersuspend: new PM kernel driver for Android w/o early_suspend v1.4 (Yank555.lu)

- add hybrid-kernel mode (autosleep and panel, first wins)
- default to hybrid-kernel mode
- harmonize debug message with my other stuff
- include all 3 modes as default, so it's only commenting out to change

kernel/power/powersuspend: new PM kernel driver for Android w/o early_suspend v1.3 (Yank555.lu)

- fix stupid typo
- add an alternative hook for screen on / off detection in the display panel driver
(sleeps ~0.1s sooner, wakes up ~1s later than autosleep)

kernel/power/powersuspend: new PM kernel driver for Android w/o early_suspend v1.2 (Yank555.lu)

- make kernel / userspace mode switchable

kernel/power/powersuspend: new PM kernel driver for Android w/o early_suspend v1.1 (Yank555.lu)

- make powersuspend not depend on a userspace initiator anymore, but use a hook in autosleep instead.

kernel/power/powersuspend: new PM kernel driver for Android w/o early_suspend (faux123)

Android early_suspend/late_resume PM kernel driver framework has been
deprecated by Google. This new powersuspend PM kernel driver is a replacement
for it and existing early_suspend drivers can be easily adapted to use this
new replacement driver

Signed-off-by: Paul Reioux <reioux@gmail.com>
drivers/video/msm/mdss/mdss_dsi_panel.c: update powersuspend hook calls

fix typo!

Change-Id: If9ee91d5ff2ba2d7623865758d0d8cdf582c341c
Signed-off-by: Paul Reioux <reioux@gmail.com>

Removed mdss_dsi_panel.c panel hooks not for exynos/decon display

Signed-off-by: UpInTheAir <upintheair.xda@gmail.com>
Conflicts:
	drivers/video/msm/mdss/mdss_dsi_panel.c

 kernel/power/powersuspend: new PM kernel driver for Android w/o early_suspend v1.6 (faux123/Yank555.lu)

- autosleep hook removed, not working on shamu
- autosleep and hybrid modes removed
- panel mode is now default
- debug output switchable in defconfig

 kernel/power/powersuspend: v1.6.1 add autosleep & hybrid modes

hybrid mode is default

Signed-off-by: UpInTheAir <upintheair.xda@gmail.com>

 kernel/power/powersuspend: new PM kernel driver for Android w/o early_suspend v1.7 (faux123/Yank555.lu)

- do only run state change if change actually requests a new state

Change-Id: I3f3989ce939cd4d60831fb05dc8790cf10bcbd17

Signed-off-by: UpInTheAir <upintheair.xda@gmail.com>
Conflicts:
	kernel/power/powersuspend.c

 kernel/power/powersuspend: new PM kernel driver for Android w/o early_suspend v1.7 (faux123/Yank555.lu)

- fix a #ifdef / #endif derp

Thanx to AuxXxilium (Christian Schulthess) for pointing this out to me !

Change-Id: I5827ea36a69e4a42d6a9196749ccac12c8bee746
Signed-off-by: yank555-lu <yank555.lu@gmail.com>

 powersuspend: add power_suspended boolean for global access

Some routines just need the boolean for whether powersuspend is active or not.

Rather than hooking into powersuspend on every occasion,
add power_suspended boolean for global access.

Signed-off-by: arter97 <qkrwngud825@gmail.com>

Signed-off-by: UpInTheAir <upintheair.xda@gmail.com>
Conflicts:
	kernel/power/powersuspend.c

 powersuspend: Replaced deprecated singlethread workqueue with updated schedule_work

 powersuspend: add debug sysfs trigger to see how the driver works

Signed-off-by: UpInTheAir <upintheair.skyhigh@gmail.com>

 powersuspend: disable debugging by default

Signed-off-by: UpInTheAir <upintheair.skyhigh@gmail.com>

 fix powersuspend compile error

Signed-off-by: morogoku <morogoku@hotmail.com>
2026-01-18 10:58:06 +00:00
backslashxx
2f4a56036f syscall: Only spoof uname for root
2026-01-18 10:57:55 +00:00
Tim Zimmermann
ec049e51c9 syscall: Increase bpf fake uname to 5.4.186
* https://android-review.googlesource.com/c/platform/packages/modules/Connectivity/+/3088785

Change-Id: Iaba91f5594cebd2e361b670fb866abb5c58c6707
2026-01-18 10:57:55 +00:00
Tim Zimmermann
3e8a602e0d syscall: Increase bpf fake uname to 5.4
Change-Id: I50bfa0d35d81f1c8cc21530ea0524a6752d0d34c
2026-01-18 10:57:55 +00:00
Tim Zimmermann
bdb781ac72 syscall: Fake uname to 4.19 also for netbpfload
* This is required for U QPR2

Change-Id: I0321c64f77fccf74ff2472c3abd29e8b6b4be1ce
2026-01-18 10:57:54 +00:00
Tim Zimmermann
e23f25c247 syscall: Fake uname to 4.19 for bpfloader/netd
* Google is attempting to kill 4.14 in 0156d6e2ba

Change-Id: Ic87a66753a7acc89b0fe5b19158eea4c58ba980f
2026-01-18 10:57:54 +00:00
Al Viro
24b762d057 UPSTREAM: close_range(): fix the logics in descriptor table trimming
commit 678379e1d4f7443b170939525d3312cfc37bf86b upstream.

Cloning a descriptor table picks the size that would cover all currently
opened files.  That's fine for clone() and unshare(), but for close_range()
there's an additional twist - we clone before we close, and it would be
a shame to have
	close_range(3, ~0U, CLOSE_RANGE_UNSHARE)
leave us with a huge descriptor table when we are not going to keep
anything past stderr, just because some large file descriptor used to
be open before our call has taken it out.

Unfortunately, it had been dealt with in an inherently racy way -
sane_fdtable_size() gets a "don't copy anything past that" argument
(passed via unshare_fd() and dup_fd()), close_range() decides how much
should be trimmed and passes that to unshare_fd().

The problem is, a range that used to extend to the end of descriptor
table back when close_range() had looked at it might very well have stuff
grown after it by the time dup_fd() has allocated a new files_struct
and started to figure out the capacity of fdtable to be attached to that.

That leads to interesting pathological cases; at the very least it's a
QoI issue, since unshare(CLONE_FILES) is atomic in a sense that it takes
a snapshot of descriptor table one might have observed at some point.
Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination
of unshare(CLONE_FILES) with plain close_range(), ending up with a
weird state that would never occur with unshare(2) is confusing, to put
it mildly.

It's not hard to get rid of - all it takes is passing both ends of the
range down to sane_fdtable_size().  There we are under ->files_lock,
so the race is trivially avoided.

So we do the following:
	* switch close_files() from calling unshare_fd() to calling
dup_fd().
	* undo the calling convention change done to unshare_fd() in
60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE"
	* introduce struct fd_range, pass a pointer to that to dup_fd()
and sane_fdtable_size() instead of "trim everything past that point"
they are currently getting.  NULL means "we are not going to be punching
any holes"; NR_OPEN_MAX is gone.
	* make sane_fdtable_size() use find_last_bit() instead of
open-coding it; it's easier to follow that way.
	* while we are at it, have dup_fd() report errors by returning
ERR_PTR(), no need to use a separate int *errorp argument.
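
The sizing rule described in the list above can be modelled in a few lines. This is a simplified userspace model written from the description, not the kernel implementation: the open-descriptor bitmap is represented as a Python set, a 64-bit `BITS_PER_LONG` is assumed, and the `punch_hole` tuple stands in for `struct fd_range`.

```python
BITS_PER_LONG = 64               # assumption: 64-bit kernel
NR_OPEN_DEFAULT = BITS_PER_LONG  # smallest table size in this model


def align_up(value, step):
    """Round value up to the next multiple of step."""
    return (value + step - 1) // step * step


def sane_fdtable_size(open_fds, punch_hole=None):
    """Model of the table size chosen for the cloned descriptor table.

    open_fds:   set of descriptor numbers open in the old table.
    punch_hole: optional (first, last) inclusive range that is about to
                be closed; None means no holes will be punched.
    """
    if not open_fds:
        return NR_OPEN_DEFAULT
    last = max(open_fds)  # plays the role of find_last_bit()
    if punch_hole is not None:
        first, end = punch_hole
        if first <= last <= end:
            # The highest open descriptor is going away; only those
            # below the start of the range need to be covered.
            below = [fd for fd in open_fds if fd < first]
            if not below:
                return NR_OPEN_DEFAULT
            last = max(below)
    return align_up(last + 1, BITS_PER_LONG)
```

With descriptors 0, 1, 2 and 1000 open, the model sizes the clone at 1024 slots when no range is given, and at 64 slots when the range 3..~0U is about to be closed. Because both ends of the range are evaluated in one place, the decision cannot be invalidated by descriptors opened between two separate looks at the table.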

Fixes: 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE"
Cc: stable@vger.kernel.org
Change-Id: I6782a2edf98970b6c2d662048061e28f7e57b9c9
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-01-18 10:57:53 +00:00
Christian Brauner
9a0578738a BACKPORT: close_range: add CLOSE_RANGE_UNSHARE
One of the use-cases of close_range() is to drop file descriptors just before
execve(). This would usually be expressed in the sequence:

unshare(CLONE_FILES);
close_range(3, ~0U);

As pointed out by Linus, it might be desirable to have this be a part of
close_range() itself under a new flag CLOSE_RANGE_UNSHARE.

This expands {dup,unshare}_fd() to take a max_fds argument that indicates the
maximum number of file descriptors to copy from the old struct files. When the
user requests that all file descriptors are supposed to be closed via
close_range(min, max) then we can cap via unshare_fd(min) and hence don't need
to do any of the heavy fput() work for everything above min.

The patch makes it so that if CLOSE_RANGE_UNSHARE is requested and we do in
fact currently share our file descriptor table we create a new private copy.
We then close all fds in the requested range and finally after we're done we
install the new fd table.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I0813045886501e40a45693ee1edad50bdf2b66e5
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2026-01-18 10:57:51 +00:00
Willem de Bruijn
1e308c53b9 BACKPORT: epoll: wire up syscall epoll_pwait2
Split off from prev patch in the series that implements the syscall.

Link: https://lkml.kernel.org/r/20201121144401.3727659-4-willemdebruijn.kernel@gmail.com
Change-Id: I48dfae6f721b24ebc53de603e393289954a95908
Signed-off-by: Willem de Bruijn <willemb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-18 10:57:51 +00:00
StimLuks87
4fd6c9a932 kernel/sched: Remove unused variables
We need to do it to compile with CC_WERROR enabled.
Should just help to keep the code cleaner.

Signed-off-by: VoidDev-0 <dev.avax@protonmail.com>
Signed-off-by: StimLuks87 <153687700+StimLuks87@users.noreply.github.com>
2026-01-18 10:57:50 +00:00
StimLuks87
3dba35c542 kernel: sched/fair.c: Fix clang warnings.
Signed-off-by: StimLuks87 <153687700+StimLuks87@users.noreply.github.com>
2026-01-18 10:57:49 +00:00
StimLuks87
aea9f63668 kernel: sched: Update flush_smp_call_function_from_idle()
* Ensure all pending call functions are processed correctly when the CPU is idle. This
synchronizes the idle task handling with recent core SMP changes.

Signed-off-by: StimLuks87 <san10031987san@gmail.com>
2026-01-18 10:57:41 +00:00
Sultan Alsawaf
0051e63fa6 cpu: Silence log spam when a CPU is brought up
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: celtare21 <celtare21@gmail.com>
Signed-off-by: Forenche <prahul2003@gmail.com>
Signed-off-by: LinkBoi00 <linkdevel@protonmail.com>
Signed-off-by: prathamdby <134331217+prathamdby@users.noreply.github.com>
2026-01-18 10:57:41 +00:00
Shandy Reynaldi
ad934da6f1 kernel: Change tick rate to 300 HZ
Several users reported stutters in gameplay with v2, which were resolved after restoring the default value of the timer frequency (also known as the tick rate). Some users noticed improvements with 300Hz, but there weren't any noticeable improvements with higher tick rate values like 600Hz. So, sticking to 300Hz for now.

Signed-off-by: alternoegraha <noegrahachan@gmail.com>
2026-01-18 10:57:38 +00:00
StimLuks87
c5f1f5e914 time: alarmtimer: fix unused variable 'ratelimit' warning
Eliminate Clang -Wunused-variable warning by marking the ratelimit
state as __maybe_unused or removing the declaration.
2026-01-18 10:57:31 +00:00
Adithya R
cb0b9c4683 sched/fair: Switch sched scaling to linear
- improves performance.

Signed-off-by: Adithya R <gh0strider.2k18.reborn@gmail.com>
Signed-off-by: Wahid Khan <wahidzk0091@gmail.com>
Signed-off-by: Wahid7852 <wahidzk0091@gmail.com>
2026-01-18 10:57:31 +00:00
Luks
d18abfbcd2 trace/trace_event_perf: remove duplicate samples on the first tracepoint event
[ Upstream commit afe5960dc208fe069ddaaeb0994d857b24ac19d1 ]

When a tracepoint event is created with attr.freq = 1,
'hwc->period_left' is not initialized correctly. As a result,
in the perf_swevent_overflow() function, the first time the event occurs,
it calculates the event overflow and perf_swevent_set_period() returns 3,
which leads to the event being recorded three times.

Steps to reproduce:
    1. Enable the tracepoint event & start tracing
         $ echo 1 > /sys/kernel/tracing/events/module/module_free
         $ echo 1 > /sys/kernel/tracing/tracing_on

    2. Record with perf
         $ perf record -a --strict-freq -F 1 -e "module:module_free"

    3. Trigger module_free event.
         $ modprobe -i sunrpc
         $ modprobe -r sunrpc

Result:
     - Trace pipe result:
         $ cat trace_pipe
         modprobe-174509  [003] .....  6504.868896: module_free: sunrpc

     - perf sample:
         modprobe  174509 [003]  6504.868980: module:module_free: sunrpc
         modprobe  174509 [003]  6504.868980: module:module_free: sunrpc
         modprobe  174509 [003]  6504.868980: module:module_free: sunrpc

Setting period_left via perf_swevent_set_period(), as the other sw_events do,
solves this problem.

After patch:
     - Trace pipe result:
         $ cat trace_pipe
         modprobe 1153096 [068] 613468.867774: module:module_free: xfs

     - perf sample
         modprobe 1153096 [068] 613468.867794: module:module_free: xfs

Link: https://lore.kernel.org/20240913021347.595330-1-yeoreum.yun@arm.com
Fixes: bd2b5b1 ("perf_counter: More aggressive frequency adjustment")
Signed-off-by: Levi Yun <yeoreum.yun@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-01-18 10:57:27 +00:00
KanonifyX Tzy
cddb8b3244 mm: Patch le9ec
Protect the working set under memory pressure to prevent thrashing, avoid high latency and prevent livelock in near-OOM conditions

The kernel does not provide a way to protect the working set under memory pressure. A certain amount of anonymous and clean file pages is required by the userspace for normal operation. First of all, the userspace needs a cache of shared libraries and executable binaries. If the amount of the clean file pages falls below a certain level, then thrashing and even livelock can take place.

The patch provides sysctl knobs for protecting the working set (anonymous and clean file pages) under memory pressure.

The vm.anon_min_kbytes sysctl knob provides hard protection of anonymous pages. The anonymous pages on the current node won't be reclaimed under any conditions when their amount is below vm.anon_min_kbytes. This knob may be used to prevent excessive swap thrashing when anonymous memory is low (for example, when memory is about to be filled with compressed data from the zram module).

The vm.clean_low_kbytes sysctl knob provides best-effort protection of clean file pages. The file pages on the current node won't be reclaimed under memory pressure when the amount of clean file pages is below vm.clean_low_kbytes unless we threaten to OOM. Protection of clean file pages using this knob may be used when swapping is still possible to

- Prevent disk I/O thrashing under memory pressure.
- Improve performance in disk cache-bound tasks under memory pressure.

The vm.clean_min_kbytes sysctl knob provides hard protection of clean file pages. The file pages on the current node won't be reclaimed under memory pressure when the amount of clean file pages is below vm.clean_min_kbytes. Hard protection of clean file pages using this knob may be used to

- Prevent disk I/O thrashing under memory pressure even with no free swap space.
- Improve performance in disk cache-bound tasks under memory pressure.
- Avoid high latency and prevent livelock in near-OOM conditions.

The le9ec patches provide three sysctl knobs (vm.anon_min_kbytes, vm.clean_low_kbytes, vm.clean_min_kbytes) that default to zero, so the working set is not protected by default (CONFIG_ANON_MIN_KBYTES=0, CONFIG_CLEAN_LOW_KBYTES=0, CONFIG_CLEAN_MIN_KBYTES=0). You can specify other values during kernel build, or change the knob values on the fly.
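
The knob semantics described above can be sketched as a small decision function. This is a simplified model of the documented behaviour, not the patch's code: the function name, its parameters, and the near_oom flag are all hypothetical and chosen only for illustration.

```python
def reclaim_allowed(anon_kb, clean_file_kb,
                    anon_min_kb=0, clean_low_kb=0, clean_min_kb=0,
                    near_oom=False):
    """Return (scan_anon, scan_file) for one node.

    Hypothetical model of the three knobs; all sizes are in KiB.
    near_oom is an assumed flag meaning "reclaim is about to give up".
    """
    # Hard protection: anonymous pages below the floor are never reclaimed.
    scan_anon = anon_kb >= anon_min_kb

    if clean_file_kb < clean_min_kb:
        # Hard protection of clean file pages, even when near OOM.
        scan_file = False
    elif clean_file_kb < clean_low_kb:
        # Best-effort protection: given up only when OOM threatens.
        scan_file = near_oom
    else:
        scan_file = True
    return scan_anon, scan_file


# With the default zero values nothing is protected.
print(reclaim_allowed(anon_kb=100, clean_file_kb=100))
# Best-effort protection holds until OOM threatens.
print(reclaim_allowed(100, 100, clean_low_kb=200))
print(reclaim_allowed(100, 100, clean_low_kb=200, near_oom=True))
```

With all three thresholds left at zero, both page types stay reclaimable, which matches the statement that the working set is unprotected by default.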

Effects
- Improving system responsiveness under low-memory conditions.
- Improving performance in I/O bound tasks under memory pressure.
- OOM killer comes faster (with hard protection).
- Fast system reclaiming after OOM (with hard protection).

Note that the effects depend on the values of the sysctl tunables.

source patch: https://github.com/hakavlad/le9-patch

Signed-off-by: kanonifyX <kanonify01@gmail.com>
2026-01-18 10:57:27 +00:00
Jebaitedneko
1a97440ceb kernel: cpu: suppress cpu state change logspam
Signed-off-by: Jebaitedneko <Jebaitedneko@gmail.com>
Signed-off-by: Saikrishna1504 <saikrishna26918@gmail.com>
2026-01-18 10:57:26 +00:00
Jebaitedneko
b299d9faee kernel: power: omit trace markers and logspam
Signed-off-by: Saikrishna1504 <saikrishna26918@gmail.com>
2026-01-18 10:57:26 +00:00
Jebaitedneko
f81272ef86 kernel: power: suspend: omit trace markers and logspam
Signed-off-by: Saikrishna1504 <saikrishna26918@gmail.com>
2026-01-18 10:57:26 +00:00
Sultan Alsawaf
872423fc61 treewide: Suppress overly verbose log spam
This tames quite a bit of the log spam and makes dmesg readable.

Uses work from Danny Lin <danny@kdrag0n.dev>.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Saikrishna1504 <saikrishna26918@gmail.com>
2026-01-18 10:57:26 +00:00
Cyber Knight
7888b27024 kernel/cpu: Silence abundance of logspam
We don't really need to know if the CPU is getting disabled or enabled on a production device.

Signed-off-by: Cyber Knight <cyberknight755@gmail.com>
Signed-off-by: Saikrishna1504 <saikrishna26918@gmail.com>
2026-01-18 10:57:22 +00:00
friedrich420
d13c866a4e kernel/sched: Reduce CPU-bound latency
Signed-off-by: officialputuid <officialputuid@hack.id>
2026-01-18 10:57:20 +00:00
officialputuid
1de828540f kernel/sched/tune.c: Reduce BOOSTGROUPS_COUNT to 5
* Import from realme B65 Kernel Sources

Signed-off-by: officialputuid <officialputuid@hack.id>
2026-01-18 10:57:19 +00:00
holyangel
21dc713f70 kernel/sched: Reduce latency for better responsiveness
Signed-off-by: officialputuid <officialputuid@hack.id>
2026-01-18 10:57:07 +00:00
Sultan Alsawaf
6b516eb320 simple_lmk: Report mm as freed as soon as exit_mmap() finishes
exit_mmap() is responsible for freeing the vast majority of an mm's
memory; in order to unblock Simple LMK faster, report an mm as freed as
soon as exit_mmap() finishes.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2026-01-18 10:56:55 +00:00
Sultan Alsawaf
e71fb4006a simple_lmk: Mark victim thread group with TIF_MEMDIE
The OOM killer sets the TIF_MEMDIE thread flag for its victims to alert
other kernel code that the current process was killed due to memory
pressure, and needs to finish whatever it's doing quickly. In the page
allocator this allows victim processes to quickly allocate memory using
emergency reserves. This is especially important when memory pressure is
high; if all processes are taking a while to allocate memory, then our
victim processes will face the same problem and can potentially get
stuck in the page allocator for a while rather than die expeditiously.

To ensure that victim processes die quickly, set TIF_MEMDIE for the
entire victim thread group.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2026-01-18 10:56:55 +00:00
Sultan Alsawaf
f222876ec4 simple_lmk: Introduce Simple Low Memory Killer for Android
This is a complete low memory killer solution for Android that is small
and simple. Processes are killed according to the priorities that
Android gives them, so that the least important processes are always
killed first. Processes are killed until memory deficits are satisfied,
as observed from kswapd struggling to free up pages. Simple LMK stops
killing processes when kswapd finally goes back to sleep.

The only tunables are the desired amount of memory to be freed per
reclaim event and desired frequency of reclaim events. Simple LMK tries
to free at least the desired amount of memory per reclaim and waits
until all of its victims' memory is freed before proceeding to kill more
processes.
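
The victim selection described above can be sketched roughly as follows. This is an illustrative model only, not the driver's implementation: the Proc type, the adj field (assumed here to mean "higher is less important"), and pick_victims() are hypothetical names.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    name: str
    adj: int    # hypothetical importance score: higher = less important
    pages: int  # memory that would be freed by killing this process


def pick_victims(procs, pages_needed):
    """Pick the least important processes until the deficit is covered."""
    victims, freed = [], 0
    # Least important first; among equals, prefer the larger process.
    for p in sorted(procs, key=lambda p: (-p.adj, -p.pages)):
        if freed >= pages_needed:
            break
        victims.append(p)
        freed += p.pages
    return victims, freed


procs = [
    Proc("launcher", 0, 500),
    Proc("cached_a", 900, 300),
    Proc("cached_b", 900, 100),
    Proc("service", 500, 400),
]
victims, freed = pick_victims(procs, pages_needed=350)
print([v.name for v in victims], freed)
```

In this model the two cached processes are chosen before the service or the launcher, because they carry the least important score.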

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2026-01-18 10:56:51 +00:00
Kirill Tkhai
b58e2f06c3 locking/rwsem: Add down_read_killable()
Similar to down_read() and down_write_killable(),
add killable version of down_read(), based on
__down_read_killable() function, added in previous
patches.

Change-Id: I1437294240803082fdb24bdfd3231c8f09d3ff11
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: avagin@virtuozzo.com
Cc: davem@davemloft.net
Cc: fenghua.yu@intel.com
Cc: gorcunov@virtuozzo.com
Cc: heiko.carstens@de.ibm.com
Cc: hpa@zytor.com
Cc: ink@jurassic.park.msu.ru
Cc: mattst88@gmail.com
Cc: rientjes@google.com
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: viro@zeniv.linux.org.uk
Link: http://lkml.kernel.org/r/150670119884.23930.2585570605960763239.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2026-01-18 10:56:47 +00:00
Peter Xu
dce6f87b4f BACKPORT: userfaultfd/sysctl: add vm.unprivileged_userfaultfd
Userfaultfd can be misused to make it easier to exploit existing
use-after-free (and similar) bugs that might otherwise only make a
short window or race condition available.  By using userfaultfd to
stall a kernel thread, a malicious program can keep some state that it
wrote, stable for an extended period, which it can then access using an
existing exploit.  While it doesn't cause the exploit itself, and while
it's not the only thing that can stall a kernel thread when accessing a
memory location, it's one of the few that never needs privilege.

We can add a flag, allowing userfaultfd to be restricted, so that in
general it won't be useable by arbitrary user programs, but in
environments that require userfaultfd it can be turned back on.

Add a global sysctl knob "vm.unprivileged_userfaultfd" to control
whether userfaultfd is allowed by unprivileged users.  When this is
set to zero, only privileged users (root user, or users with the
CAP_SYS_PTRACE capability) will be able to use the userfaultfd
syscalls.
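
The access rule described above can be written as a one-line predicate. This is only a model of the documented policy, with a hypothetical function name and arguments; it is not the kernel's code.

```python
def userfaultfd_allowed(unprivileged_userfaultfd, has_cap_sys_ptrace):
    """Model of the policy: the sysctl opens access to everyone,
    otherwise the caller needs CAP_SYS_PTRACE (which root normally has)."""
    return bool(unprivileged_userfaultfd) or has_cap_sys_ptrace


# vm.unprivileged_userfaultfd = 0: only privileged callers get through.
print(userfaultfd_allowed(0, has_cap_sys_ptrace=False))
print(userfaultfd_allowed(0, has_cap_sys_ptrace=True))
# vm.unprivileged_userfaultfd = 1: any caller is allowed.
print(userfaultfd_allowed(1, has_cap_sys_ptrace=False))
```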

Andrea said:

: The only difference between the bpf sysctl and the userfaultfd sysctl
: this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability
: requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement,
: because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE
: already if it's doing other kind of tracking on processes runtime, in
: addition of userfaultfd.  In other words both syscalls works only for
: root, when the two sysctl are opt-in set to 1.

[dgilbert@redhat.com: changelog additions]
[akpm@linux-foundation.org: documentation tweak, per Mike]
Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com
Change-Id: Ied2500a773b06ac1fdc378e61fd5403a270114a6
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-18 10:56:15 +00:00
Steven Rostedt (VMware)
fc97a955a1 rcu: Speed up calling of RCU tasks callbacks
Joel Fernandes found that the synchronize_rcu_tasks() was taking a
significant amount of time. He demonstrated it with the following test:

 # cd /sys/kernel/tracing
 # while [ 1 ]; do x=1; done &
 # echo '__schedule_bug:traceon' > set_ftrace_filter
 # time echo '!__schedule_bug:traceon' > set_ftrace_filter;

real	0m1.064s
user	0m0.000s
sys	0m0.004s

Where it takes a little over a second to perform the synchronize,
because there's a loop that waits 1 second at a time for tasks to get
through their quiescent points when there's a task that must be waited
for.

After discussion we came up with a simple way to wait for holdouts:
start with a short wait and increase it on each iteration of the loop,
up to a maximum of a full second.
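
The growing wait can be sketched as a schedule of sleep times. The starting point of one tenth of a second and the way the divisor shrinks are assumptions made for illustration; the function name is hypothetical.

```python
from fractions import Fraction


def holdout_wait_schedule(iterations, start_divisor=10):
    """Return the wait, in seconds, used on each loop iteration.

    Assumed scheme: wait 1/divisor seconds, then shrink the divisor by
    one until it reaches 1, so the wait grows but never exceeds 1 second.
    """
    divisor = start_divisor
    waits = []
    for _ in range(iterations):
        waits.append(Fraction(1, divisor))
        if divisor > 1:
            divisor -= 1
    return waits


for wait in holdout_wait_schedule(12):
    print(float(wait))
```

A holdout that clears quickly is noticed after a fraction of a second, while a long-lived holdout still ends up being polled once per second.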

With the new patch we have:

 # time echo '!__schedule_bug:traceon' > set_ftrace_filter;

real	0m0.131s
user	0m0.000s
sys	0m0.004s

Which drops it down to about 12% of the original wait time (0.131s versus 1.064s).

Link: http://lkml.kernel.org/r/20180523063815.198302-2-joel@joelfernandes.org
Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Edwiin Kusuma Jaya <kutemeikito0905@gmail.com>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-18 10:38:35 +00:00
NeilBrown
a01ea7b247 BACKPORT: cred: add get_cred_rcu()
97d0fb239c upstream

Sometimes we want to opportunistically get a
ref to a cred in an rcu_read_lock protected section.
get_task_cred() does this, and NFS does a similar thing
with its own credential structures.
To prepare for NFS converting to use 'struct cred' more
uniformly, define get_cred_rcu(), and use it in
get_task_cred().
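
Taking a reference opportunistically means incrementing the count only if it has not already dropped to zero. The toy model below illustrates that idea; the class and method names are hypothetical, and a lock stands in for the atomic increment-if-not-zero operation used in the kernel.

```python
import threading


class RefCounted:
    """Toy stand-in for a reference-counted object such as a credential."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for an atomic operation
        self.usage = 1                 # the creator holds the first reference

    def get_unless_zero(self):
        """Take a reference unless the object is already being freed."""
        with self._lock:
            if self.usage == 0:
                return False
            self.usage += 1
            return True

    def put(self):
        """Drop a reference; return True if it was the last one."""
        with self._lock:
            self.usage -= 1
            return self.usage == 0


cred = RefCounted()
print(cred.get_unless_zero())  # a live object can be referenced
cred.put()
cred.put()                     # the last reference is gone
print(cred.get_unless_zero())  # too late: the caller must not use it
```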

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

CONFLICTS:
cred: switch to using atomic_long_t
- resolve conflict by using `atomic_long_inc_not_zero`
  instead of `atomic_inc_not_zero`

Signed-off-by: backslashxx <118538522+backslashxx@users.noreply.github.com>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-18 10:38:35 +00:00
Felix Fietkau
16568926fc debloat: procfs
Strip non-essential /proc functionality to reduce code size

Imported from openwrt

Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-18 10:38:34 +00:00
Tejun Heo
c6fa9f201a UPSTREAM: cgroup: add cgroup_parse_float()
cgroup already uses floating point for percent[ile] numbers and there
are several controllers which want to take them as input.  Add a
generic parse helper to handle inputs.
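
Parsing a decimal string into a scaled integer, without using floating point, can be sketched as below. This is an illustration of the general idea, not the kernel helper: parse_fixed_point() and its dec_shift argument are hypothetical, only non-negative input is handled, and rounding to nearest is an assumed choice.

```python
def parse_fixed_point(text, dec_shift):
    """Parse a string such as "12.5" into an integer scaled by 10**dec_shift."""
    whole, _, frac = text.strip().partition(".")
    if not (whole.isascii() and whole.isdigit()):
        raise ValueError(f"bad integer part in {text!r}")
    if frac and not (frac.isascii() and frac.isdigit()):
        raise ValueError(f"bad fractional part in {text!r}")

    if len(frac) <= dec_shift:
        # Pad missing fractional digits with zeros.
        frac_scaled = int(frac or "0") * 10 ** (dec_shift - len(frac))
    else:
        # Too many digits: round to the nearest representable value.
        divisor = 10 ** (len(frac) - dec_shift)
        frac_scaled = (int(frac) + divisor // 2) // divisor
    return int(whole) * 10 ** dec_shift + frac_scaled


# A percentage kept with two decimal places is stored as an integer.
print(parse_fixed_point("12.5", 2))
print(parse_fixed_point("100", 2))
```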

Update the interface convention documentation about the use of
percentage numbers.  While at it, also clarify the default time unit.

Bug: 120440300
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit a5e112e6424adb77d953eac20e6936b952fd6b32)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ic1fcf21d7955eb8edd2e8e91517bca6aef41694f
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Naveen <133593113+elohim-etz@users.noreply.github.com>
2026-01-18 10:38:33 +00:00
Ard Biesheuvel
d020e15b22 bpf: add __weak hook for allocating executable memory
By default, BPF uses module_alloc() to allocate executable memory,
but this is not necessary on all arches and potentially undesirable
on some of them.

So break out the module_alloc() and module_memfree() calls into __weak
functions to allow them to be overridden in arch code.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Change-Id: I582794881942bc0b766515861f2232354860536b
Signed-off-by: Saikrishna1504 <saikrishna26918@gmail.com>
2026-01-18 10:38:31 +00:00
Shaokun Zhang
f025713caa BACKPORT: cgroup: Remove unused cgrp variable
The 'cgrp' is set but not used in commit <76f969e8948d8>
("cgroup: cgroup v2 freezer").
Remove it to avoid [-Wunused-but-set-variable] warning.

Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from 533307dc20a9e84a0687d4ca24aeb669516c0243)
Bug: 154548692
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Change-Id: I6221a975c04f06249a4f8d693852776ae08a8d8e
Signed-off-by: prathamdby <134331217+prathamdby@users.noreply.github.com>
2026-01-18 10:38:30 +00:00
Oleg Nesterov
5d0a354afe BACKPORT: cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop()
ptrace_stop() does preempt_enable_no_resched() to avoid the preemption,
but after that cgroup_enter_frozen() does spin_lock/unlock and this adds
another preemption point.

Reported-and-tested-by: Bruce Ashfield <bruce.ashfield@gmail.com>
Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Change-Id: Ic53e0f2d6624b0bb90817b0c57060fb7db971348
(cherry picked from commit 937c6b27c73e02cd4114f95f5c37ba2c29fadba1)
Bug: 154548692
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: prathamdby <134331217+prathamdby@users.noreply.github.com>
2026-01-18 10:38:30 +00:00
Roman Gushchin
3ca0b7f109 BACKPORT: cgroup: freezer: fix frozen state inheritance
If a new child cgroup is created in the frozen cgroup hierarchy
(one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup
flag should be set. Otherwise, if a process is attached to the
child cgroup, it won't become frozen.
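
The inheritance rule can be illustrated with a small tree model. It is a sketch with hypothetical names (Cgroup, self_frozen, effectively_frozen), not the kernel's data structures: a newly created child starts out frozen whenever any ancestor is frozen.

```python
class Cgroup:
    """Toy cgroup node that tracks whether it is effectively frozen."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.self_frozen = False
        # The fix in miniature: inherit the frozen state at creation time.
        self.effectively_frozen = (
            parent.effectively_frozen if parent is not None else False
        )
        if parent is not None:
            parent.children.append(self)

    def freeze(self):
        self.self_frozen = True
        self._propagate()

    def _propagate(self):
        inherited = self.parent is not None and self.parent.effectively_frozen
        self.effectively_frozen = self.self_frozen or inherited
        for child in self.children:
            child._propagate()


root = Cgroup("root")
a = Cgroup("cg_test_mkdir_A", root)
a.freeze()
b = Cgroup("cg_test_mkdir_B", a)  # created under an already frozen parent
print(b.effectively_frozen)
```

Without the inheritance step in the constructor, the child in this model would start unfrozen, which mirrors the failure shown in the test output that follows.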

The problem can be reproduced with the test_cgfreezer_mkdir test.

This is the output before this patch:
  ~/test_freezer
  ok 1 test_cgfreezer_simple
  ok 2 test_cgfreezer_tree
  ok 3 test_cgfreezer_forkbomb
  Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen
  not ok 4 test_cgfreezer_mkdir
  ok 5 test_cgfreezer_rmdir
  ok 6 test_cgfreezer_migrate
  ok 7 test_cgfreezer_ptrace
  ok 8 test_cgfreezer_stopped
  ok 9 test_cgfreezer_ptraced
  ok 10 test_cgfreezer_vfork

And with this patch:
  ~/test_freezer
  ok 1 test_cgfreezer_simple
  ok 2 test_cgfreezer_tree
  ok 3 test_cgfreezer_forkbomb
  ok 4 test_cgfreezer_mkdir
  ok 5 test_cgfreezer_rmdir
  ok 6 test_cgfreezer_migrate
  ok 7 test_cgfreezer_ptrace
  ok 8 test_cgfreezer_stopped
  ok 9 test_cgfreezer_ptraced
  ok 10 test_cgfreezer_vfork

Reported-by: Mark Crossen <mcrossen@fb.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer")
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Tejun Heo <tj@kernel.org>

Change-Id: I6ba7b8dec5600e78bb7448f03fd97a9b43838fa0
(cherry picked from commit 97a61369830ab085df5aed0ff9256f35b07d425a)
Bug: 154548692
Signed-off-by: Marco Ballesio <balejs@google.com>
Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: prathamdby <134331217+prathamdby@users.noreply.github.com>
2026-01-18 10:38:30 +00:00