Notice that ignore_dl_rate_limit() need not piggyback on the
limits_changed handling to achieve its goal (which is to enforce a
frequency update before its due time).
Namely, if sugov_should_update_freq() is updated to check
sg_policy->need_freq_update and return 'true' if it is set when
sg_policy->limits_changed is not set, ignore_dl_rate_limit() may
set the former directly instead of setting the latter, so it can
avoid hitting the memory barrier in sugov_should_update_freq().
Update the code accordingly.
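The resulting order of checks can be modeled in plain C as below. This is a minimal userspace sketch and not the kernel code: struct sugov_model, should_update_freq() and force_early_update() are hypothetical names, the fields only mirror the ones named above, and a C11 fence stands in for the kernel memory barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct sugov_policy. */
struct sugov_model {
	bool limits_changed;
	bool need_freq_update;
	uint64_t last_freq_update_time;
	int64_t freq_update_delay_ns;
};

/* Models the decision made in sugov_should_update_freq(). */
static bool should_update_freq(struct sugov_model *sg, uint64_t time)
{
	if (sg->limits_changed) {
		sg->limits_changed = false;
		sg->need_freq_update = true;
		/* Only the limits_changed path pays for the barrier. */
		atomic_thread_fence(memory_order_seq_cst);
		return true;
	}

	/* Set directly by the deadline path, so no barrier is hit here. */
	if (sg->need_freq_update)
		return true;

	/* Otherwise honor the rate limit. */
	return (int64_t)(time - sg->last_freq_update_time) >=
	       sg->freq_update_delay_ns;
}

/* Models ignore_dl_rate_limit() requesting an update before its due time. */
static void force_early_update(struct sugov_model *sg)
{
	sg->need_freq_update = true;
}
```

In this model a forced update returns true before the rate limit expires without touching limits_changed at all.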
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/10666429.nUPlyArG6x@rjwysocki.net
The handling of the limits_changed flag in struct sugov_policy needs to
be explicitly synchronized to ensure that cpufreq policy limits updates
will not be missed in some cases.
Without that synchronization it is theoretically possible that
the limits_changed update in sugov_should_update_freq() will be
reordered with respect to the reads of the policy limits in
cpufreq_driver_resolve_freq() and in that case, if the limits_changed
update in sugov_limits() clobbers the one in sugov_should_update_freq(),
the new policy limits may not take effect for a long time.
Likewise, the limits_changed update in sugov_limits() may theoretically
get reordered with respect to the updates of the policy limits in
cpufreq_set_policy() and if sugov_should_update_freq() runs between
them, the policy limits change may be missed.
To ensure that the above situations will not take place, add memory
barriers preventing the reordering in question from taking place and
add READ_ONCE() and WRITE_ONCE() annotations around all of the
limits_changed flag updates to prevent the compiler from interfering
with that code.
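The intended ordering can be illustrated with C11 atomics in userspace. This is only a sketch under assumptions: struct policy_model, publish_limits() and consume_limits() are hypothetical names, relaxed atomic accesses stand in for READ_ONCE()/WRITE_ONCE(), and full fences stand in for the kernel barriers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the policy limits and the flag. */
struct policy_model {
	atomic_uint min;
	atomic_uint max;
	atomic_bool limits_changed;
};

/* Writer side: models cpufreq_set_policy() followed by sugov_limits(). */
static void publish_limits(struct policy_model *p,
			   unsigned int min, unsigned int max)
{
	atomic_store_explicit(&p->min, min, memory_order_relaxed);
	atomic_store_explicit(&p->max, max, memory_order_relaxed);
	/* The new limits must be visible before the flag is set. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&p->limits_changed, true, memory_order_relaxed);
}

/*
 * Reader side: models sugov_should_update_freq() and the later reads of
 * the limits. Returns true and fills *min and *max if a change was pending.
 */
static bool consume_limits(struct policy_model *p,
			   unsigned int *min, unsigned int *max)
{
	if (!atomic_load_explicit(&p->limits_changed, memory_order_relaxed))
		return false;

	atomic_store_explicit(&p->limits_changed, false, memory_order_relaxed);
	/* The flag must be cleared before the limits are read. */
	atomic_thread_fence(memory_order_seq_cst);
	*min = atomic_load_explicit(&p->min, memory_order_relaxed);
	*max = atomic_load_explicit(&p->max, memory_order_relaxed);
	return true;
}
```

With the fences in place, a concurrent publish_limits() either lands before the reader clears the flag (and its limits are read) or leaves the flag set for the next pass.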
Fixes: 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when limits change")
Cc: 5.3+ <stable@vger.kernel.org> # 5.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3376719.44csPzL39Z@rjwysocki.net
Commit 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused
by need_freq_update") modified sugov_should_update_freq() to set the
need_freq_update flag only for drivers with CPUFREQ_NEED_UPDATE_LIMITS
set, but that flag generally needs to be set when the policy limits
change because the driver callback may need to be invoked for the new
limits to take effect.
However, if the return value of cpufreq_driver_resolve_freq() after
applying the new limits is still equal to the previously selected
frequency, the driver callback needs to be invoked only in the case
when CPUFREQ_NEED_UPDATE_LIMITS is set (which means that the driver
specifically wants its callback to be invoked every time the policy
limits change).
Update the code accordingly to avoid missing policy limits changes for
drivers without CPUFREQ_NEED_UPDATE_LIMITS.
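The decision described above can be sketched as follows. This is a userspace model under assumptions, not the kernel function: struct sugov_model and update_next_freq() are hypothetical names, and the driver flag test is reduced to a boolean parameter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct sugov_policy. */
struct sugov_model {
	unsigned int next_freq;
	bool need_freq_update;
};

/*
 * Returns true when the driver callback should be invoked.
 * driver_needs_update_limits models CPUFREQ_NEED_UPDATE_LIMITS being set.
 */
static bool update_next_freq(struct sugov_model *sg, unsigned int next_freq,
			     bool driver_needs_update_limits)
{
	if (sg->need_freq_update) {
		sg->need_freq_update = false;
		/*
		 * The limits changed but the resolved frequency did not:
		 * only drivers that asked for it get their callback invoked.
		 */
		if (sg->next_freq == next_freq && !driver_needs_update_limits)
			return false;
	} else if (sg->next_freq == next_freq) {
		return false;
	}

	sg->next_freq = next_freq;
	return true;
}
```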
Fixes: 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused by need_freq_update")
Closes: https://lore.kernel.org/lkml/Z_Tlc6Qs-tYpxWYb@linaro.org/
Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3010358.e9J7NaK4W3@rjwysocki.net
To restore the previous uclamp value, we still need to store uclamp_req directly.
Otherwise, saved_priority will store the effective uclamp value and restore it
to uclamp_req later.
Bug: 277389699
Change-Id: I7b3e357fcfc3bd955789e85d730713c384d0ade7
Signed-off-by: Chungkai Mei <chungkai@google.com>
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
Don't use the uclamp of the current task as the default uclamp for
binders, because the uclamp of the current task influences
binders' placement when not in a transaction.
Just use the default values 0 and SCHED_CAPACITY_SCALE for binders'
default uclamp min and max. Also replace set_inherited_uclamp with
set_binder_prio_uclamp.
Bug: 277389699
Change-Id: I07c4f40c2689dbc7eb23e7d3e2a2f435353dc25f
Signed-off-by: Chungkai Mei <chungkai@google.com>
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
uclamp inheritance is not relevant while !uclamp_is_used().
Partially addresses the issue of enabling uclamp_is_used static key
causing a splat because of holding cpus_read_lock() in_atomic() context.
Bug: 259145692
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: Ib752e96e41b2fcace6edcdcec169f1ca56540a9b
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
We may still have long binder transactions due to
insufficient uclamp, so expand binder_priority to
inherit uclamp as well.
Bug: 226003124
Signed-off-by: ChungKai Mei <chungkai@google.com>
Change-Id: I307ab812638eeea1ca3e80ae07dee05fb9797dd6
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
Most of binder's memory allocations are tiny, and they're allocated
and freed extremely frequently. The latency from going through the page
allocator all the time for such small allocations ends up being quite
high, especially when the system is low on memory. Binder is
performance-critical, so this is suboptimal.
Instead of using kzalloc to allocate a struct every time, reserve caches
specifically for allocating each struct quickly.
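The idea of a dedicated per-struct cache can be illustrated in userspace with a free-list object pool. This is a swapped-in technique for illustration only: in the kernel the slab cache API (kmem_cache_create(), kmem_cache_zalloc(), kmem_cache_free()) plays this role, and struct obj_cache with its helpers below are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size object cache; freed objects are kept on a list. */
struct obj_cache {
	size_t obj_size;
	void *free_list;	/* linked through the first word of each object */
};

static void cache_init(struct obj_cache *c, size_t obj_size)
{
	/* Every free object must be able to hold the list pointer. */
	c->obj_size = obj_size < sizeof(void *) ? sizeof(void *) : obj_size;
	c->free_list = NULL;
}

/* Returns a zeroed object, reusing a previously freed one when possible. */
static void *cache_zalloc(struct obj_cache *c)
{
	void *obj = c->free_list;

	if (obj)
		memcpy(&c->free_list, obj, sizeof(void *));	/* pop */
	else
		obj = malloc(c->obj_size);
	if (obj)
		memset(obj, 0, c->obj_size);
	return obj;
}

/* Puts the object back on the free list instead of releasing it. */
static void cache_free(struct obj_cache *c, void *obj)
{
	memcpy(obj, &c->free_list, sizeof(void *));	/* push */
	c->free_list = obj;
}

/* Releases every object still held on the free list. */
static void cache_destroy(struct obj_cache *c)
{
	while (c->free_list) {
		void *obj = c->free_list;

		memcpy(&c->free_list, obj, sizeof(void *));
		free(obj);
	}
}
```

A freed object is handed back on the next allocation without going to the general allocator, which is the effect the dedicated caches aim for.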
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
Every binder operation is being logged. This impacts performance and
increases memory footprint. Since binder is critical for Android
operation, doing any logging on production builds isn't the best idea.
A quick grep over the Android sources revealed that only the lshal and dumpsys
binaries use the binder_log directory. These are not critical, so breaking
them won't hurt much. In any case, I was able to successfully run both, and
basic functionality was still there.
Benchmarks showed a significant decrease in transaction time with an avg
of 1000ns.
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
[Helium-Studio: Forwardport to 5.10 binder]
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
In commit 15d9da3f818c ("binder: use bitmap for faster descriptor
lookup"), it was incorrectly assumed that references to the context
manager node should always get descriptor zero assigned to them.
However, if the context manager dies and a new process takes its place,
then assigning descriptor zero to the new context manager might lead to
collisions, as there could still be references to the older node. This
issue was reported by syzbot with the following trace:
kernel BUG at drivers/android/binder.c:1173!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 447 Comm: binder-util Not tainted 6.10.0-rc6-00348-g31643d84b8c3 #10
Hardware name: linux,dummy-virt (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : binder_inc_ref_for_node+0x500/0x544
lr : binder_inc_ref_for_node+0x1e4/0x544
sp : ffff80008112b940
x29: ffff80008112b940 x28: ffff0e0e40310780 x27: 0000000000000000
x26: 0000000000000001 x25: ffff0e0e40310738 x24: ffff0e0e4089ba34
x23: ffff0e0e40310b00 x22: ffff80008112bb50 x21: ffffaf7b8f246970
x20: ffffaf7b8f773f08 x19: ffff0e0e4089b800 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 000000002de4aa60
x14: 0000000000000000 x13: 2de4acf000000000 x12: 0000000000000020
x11: 0000000000000018 x10: 0000000000000020 x9 : ffffaf7b90601000
x8 : ffff0e0e48739140 x7 : 0000000000000000 x6 : 000000000000003f
x5 : ffff0e0e40310b28 x4 : 0000000000000000 x3 : ffff0e0e40310720
x2 : ffff0e0e40310728 x1 : 0000000000000000 x0 : ffff0e0e40310710
Call trace:
binder_inc_ref_for_node+0x500/0x544
binder_transaction+0xf68/0x2620
binder_thread_write+0x5bc/0x139c
binder_ioctl+0xef4/0x10c8
[...]
This patch adds back the previous behavior of assigning the next
non-zero descriptor if references to previous context managers still
exist. It amends both strategies, the newer dbitmap code and also the
legacy slow_desc_lookup_olocked(), by allowing them to start looking
for available descriptors at a given offset.
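A lookup that starts at a given offset can be sketched as below. This is a userspace model under assumptions: lowest_free_desc() is a hypothetical name, a sorted array stands in for the in-order walk of the references rb-tree, and the choice of offset (for example 0 for the context manager node and 1 otherwise) is assumed to be made by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * used[] holds the descriptors currently in use, sorted in ascending
 * order. Returns the smallest free descriptor that is >= offset.
 */
static uint32_t lowest_free_desc(const uint32_t *used, size_t n,
				 uint32_t offset)
{
	uint32_t desc = offset;

	for (size_t i = 0; i < n; i++) {
		if (used[i] > desc)
			break;		/* found a gap at desc */
		if (used[i] == desc)
			desc++;		/* taken, try the next one */
		/* entries below the offset are skipped */
	}
	return desc;
}
```

If descriptor zero is still held by a reference to an older node, a search from offset 0 moves on to the next free value instead of colliding.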
Fixes: 15d9da3f818c ("binder: use bitmap for faster descriptor lookup")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+3dae065ca76952a67257@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000c1c0a0061d1e6979@google.com/
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240722150512.4192473-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
When creating new binder references, the driver assigns a descriptor id
that is shared with userspace. Regrettably, the driver needs to keep the
descriptors small enough to accommodate userspace potentially using them
as Vector indexes. Currently, the driver performs a linear search on the
rb-tree of references to find the smallest available descriptor id. This
approach, however, scales poorly as the number of references grows.
This patch introduces the usage of bitmaps to boost the performance of
descriptor assignments. This optimization results in notable performance
gains, particularly in processes with a large number of references. The
following benchmark with 100,000 references showcases the difference in
latency between the dbitmap implementation and the legacy approach:
[ 587.145098] get_ref_desc_olocked: 15us (dbitmap on)
[ 602.788623] get_ref_desc_olocked: 47343us (dbitmap off)
Note the bitmap size is dynamically adjusted in line with the number of
references, ensuring efficient memory usage. In cases where growing the
bitmap is not possible, the driver falls back to the slow legacy method.
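The bitmap approach can be illustrated with a small userspace model. This is a sketch under assumptions and not the driver's dbitmap code: struct desc_bitmap and its helpers are hypothetical names, the bitmap has a fixed size, and a full bitmap is simply reported to the caller, which would then grow it or fall back to the slow path.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical descriptor bitmap: bit i set means descriptor i is in use. */
struct desc_bitmap {
	unsigned long *map;
	size_t nbits;
};

/*
 * Claims the lowest clear bit. Returns 0 and stores its index in *bit,
 * or returns -1 when every bit below nbits is already set.
 */
static int bitmap_acquire_first_zero(struct desc_bitmap *b, size_t *bit)
{
	for (size_t w = 0; w * BITS_PER_WORD < b->nbits; w++) {
		if (b->map[w] == ~0UL)
			continue;	/* whole word in use, skip it */

		for (size_t i = 0; i < BITS_PER_WORD; i++) {
			size_t pos = w * BITS_PER_WORD + i;

			if (pos >= b->nbits)
				return -1;
			if (!(b->map[w] & (1UL << i))) {
				b->map[w] |= 1UL << i;
				*bit = pos;
				return 0;
			}
		}
	}
	return -1;
}

/* Marks a descriptor as free again. */
static void bitmap_release(struct desc_bitmap *b, size_t bit)
{
	if (bit < b->nbits)
		b->map[bit / BITS_PER_WORD] &= ~(1UL << (bit % BITS_PER_WORD));
}
```

Full words are skipped in one comparison, so the cost depends on the bitmap size rather than on walking every reference.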
A previous attempt to solve this issue was proposed in [1]. However,
that method involved adding new ioctls, which isn't great, and older
userspace code would not have benefited from the optimizations either.
Link: https://lore.kernel.org/all/20240417191418.1341988-1-cmllamas@google.com/ [1]
Cc: Tim Murray <timmurray@google.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Martijn Coenen <maco@android.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: John Stultz <jstultz@google.com>
Cc: Steven Moreland <smoreland@google.com>
Suggested-by: Nick Chen <chenjia3@oppo.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20240612042535.1556708-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
Based on upstream commit 794a56ebd9a57db12abaec63f038c6eb073461f7
Change-Id: I2a6be669c847da253f09e72c6f41437a9c0f11ef
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
* On qcom devices, there are not too many IRQs to balance, so setting the polling interval
  to 20 seconds should improve battery life without hurting performance.
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
The old default value for slice is 0.75 msec * (1 + ilog(ncpus)) which
means that we have a default slice of:
0.75 for 1 cpu
1.50 up to 3 cpus
2.25 up to 7 cpus
3.00 for 8 cpus and above.
For HZ=250 and HZ=100, because of the tick accuracy, the runtime of
tasks is far higher than their slice.
For HZ=1000 with 8 cpus or more, the accuracy of tick is already
satisfactory, but there is still an issue that tasks will get an extra
tick because the tick often arrives a little faster than expected. In
this case, the task can only wait until the next tick to consider that it
has reached its deadline, and will run 1ms longer.
 vruntime + sysctl_sched_base_slice = deadline
        |-----------|-----------|-----------|-----------|
             1ms         1ms         1ms         1ms
                   ^           ^           ^           ^
                 tick1       tick2       tick3       tick4(nearly 4ms)
There are two reasons for tick error: clockevent precision and
CONFIG_IRQ_TIME_ACCOUNTING/CONFIG_PARAVIRT_TIME_ACCOUNTING. With
CONFIG_IRQ_TIME_ACCOUNTING every tick will be less than 1ms, but even
without it, because of clockevent precision, the tick is still often
less than 1ms.
In order to make scheduling more precise, change 0.75 to 0.70.
Using 0.70 instead of 0.75 should not change much for other configs
and would fix this issue:
0.70 for 1 cpu
1.40 up to 3 cpus
2.10 up to 7 cpus
2.80 for 8 cpus and above.
This does not guarantee that tasks can run the slice time accurately
every time, but occasionally running an extra tick has little impact.
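The arithmetic behind the two tables and the extra tick can be checked with a short sketch. The helper names are hypothetical, ilog is taken to be the base-2 logarithm (which matches the tables), and the 0.999ms tick length is an illustrative value for a tick that arrives slightly early.

```c
#include <stdint.h>

#define BASE_SLICE_NS 700000ULL	/* 0.70 msec */

/* floor(log2(n)) for n >= 1 */
static unsigned int ilog2_u(unsigned int n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Default slice for ncpus; the factor stops growing at 8 cpus. */
static uint64_t default_slice_ns(unsigned int ncpus)
{
	if (ncpus > 8)
		ncpus = 8;
	return BASE_SLICE_NS * (1 + ilog2_u(ncpus));
}

/* Number of ticks that pass before a tick lands at or after the deadline. */
static uint64_t ticks_until_deadline(uint64_t slice_ns, uint64_t tick_ns)
{
	return (slice_ns + tick_ns - 1) / tick_ns;
}
```

With a 0.999ms tick, a 3.00ms slice is only reached on the fourth tick, while a 2.80ms slice is already reached on the third.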
Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20250208075322.13139-1-15645113830zzh@gmail.com
[Helium-Studio: Adapt for 8 cpus]
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
* 4.14 doesn't have KBUILD_LDFLAGS, rename it to LDFLAGS
* Use --plugin-opt=O3 for LTO_CLANG, use -O3 for !LTO_CLANG
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
The compiler confuses the current variable in lz4.c with the
current macro in current.h.
This fixes the following compilation errors:
../lib/lz4/lz4.c:1145:15: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
1145 | U32 const current = (U32)(forwardIp - base);
| ^
../arch/arm64/include/asm/current.h:24:28: note: expanded from macro 'current'
24 | #define current get_current()
| ^
../lib/lz4/lz4.c:1145:15: error: conflicting types for 'get_current'
../arch/arm64/include/asm/current.h:24:17: note: expanded from macro 'current'
24 | #define current get_current()
| ^
../arch/arm64/include/asm/current.h:15:44: note: previous definition is here
15 | static __always_inline struct task_struct *get_current(void)
| ^
../lib/lz4/lz4.c:1145:15: error: illegal initializer (only variables can be initialized)
1145 | U32 const current = (U32)(forwardIp - base);
| ^
../arch/arm64/include/asm/current.h:24:17: note: expanded from macro 'current'
24 | #define current get_current()
| ^
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
* The official lz4 source files don't follow the Linux kernel coding style,
reformat to keep the coding style consistent.
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
Upstream lz4 mentioned a performance regression on Qualcomm SoCs
when built with Clang, but not with GCC [1]. However, according to my
testing on sm8350 with LLVM Clang 15, this patch does offer a nice
10% boost in decompression, so enable the fast dec loop for Clang
as well.
Testing procedure:
- pre-fill zram with 1GB of real-world zram data dumped under memory
pressure, for example
$ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000
- $ fio --readonly --name=randread --direct=1 --rw=randread \
--ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \
--group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M
Results:
- vanilla lz4: read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec)
- lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec)
[1] https://github.com/lz4/lz4/pull/707
Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
[Helium-Studio: Reword the commit message to reflect unconditionally enabling]
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
This refactors the original KSU hooks, replacing deep kernel function hooks with targeted hooks.
It backports KernelSU pr#1657 and includes elements of pr#2084 (32-bit sucompat).
This reduces the scope of kernel function interception while maintaining full functionality.
More info: backslashxx/KernelSU#5