24438 Commits

Sultan Alsawaf
5f4b739089 simple_lmk: Report mm as freed as soon as exit_mmap() finishes
exit_mmap() is responsible for freeing the vast majority of an mm's
memory; in order to unblock Simple LMK faster, report an mm as freed as
soon as exit_mmap() finishes.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-04-16 18:30:20 +07:00
Michal Hocko
9be5f1d1e1 cpuset, mm: fix TIF_MEMDIE check in cpuset_change_task_nodemask
Commit c0ff7453bb ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") added a TIF_MEMDIE and PF_EXITING check, but
it checks the flags on the current task rather than the given one.

This doesn't make much sense and it is actually wrong.  If the current
task which updates the nodemask of a cpuset got killed by the OOM
killer, then part of the cpuset's processes would end up with an
incompatible nodemask, which is surprising to say the least.

The comment suggests the intention was to skip an OOM victim or an
exiting task, so we should be checking the given task.  But even then
it would be a layering violation, because it is up to the memory
allocator to interpret the meaning of TIF_MEMDIE.  Simply drop both
checks.  All tasks in the cpuset should simply follow the same mask.

Link: http://lkml.kernel.org/r/1467029719-17602-3-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-04-16 18:30:20 +07:00
Michal Hocko
88d1dfa883 freezer, oom: check TIF_MEMDIE on the correct task
freezing_slow_path() checks TIF_MEMDIE to skip OOM-killed tasks.
It is, however, checking the flag on the current task rather than the
given one.  This is really confusing because freezing() can also be
called on !current tasks.  It ends up working correctly for its main
purpose because __refrigerator() is always called on the current task,
so the OOM victim will never get frozen.  But it could lead to
surprising results when a task which is freezing a cgroup gets OOM
killed, because only part of the cgroup would get frozen.  This is
highly unlikely, but worth fixing as the resulting code is clearer
anyway.
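The fix described above can be sketched as a minimal userspace model (the struct, flag value, and function names `freezing_buggy`/`freezing_fixed` are hypothetical illustrations, not kernel identifiers):

```c
#include <assert.h>
#include <stdbool.h>

#define TIF_MEMDIE 0x1

struct task { unsigned int flags; };

/* Before the fix: tested current's flags, so an OOM victim could still
 * be reported as freezable when queried from another task. */
static bool freezing_buggy(struct task *p, struct task *current)
{
    (void)p;
    return !(current->flags & TIF_MEMDIE);
}

/* After the fix: the flag is tested on the task being asked about, so
 * the OOM victim is exempt regardless of which task performs the check. */
static bool freezing_fixed(struct task *p, struct task *current)
{
    (void)current;
    return !(p->flags & TIF_MEMDIE);
}
```

Here `freezing_fixed()` returns false for the victim no matter who calls it, which is the behavior the patch restores.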

Link: http://lkml.kernel.org/r/1467029719-17602-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-04-16 18:30:20 +07:00
Sultan Alsawaf
30a07a4ed8 simple_lmk: Mark victim thread group with TIF_MEMDIE
The OOM killer sets the TIF_MEMDIE thread flag for its victims to alert
other kernel code that the process was killed due to memory pressure
and needs to finish whatever it's doing quickly. In the page allocator
this allows victim processes to quickly allocate memory using emergency
reserves. This is especially important when memory pressure is high: if
all processes take a while to allocate memory, then victim processes
face the same problem and can get stuck in the page allocator for a
while rather than dying expeditiously.

To ensure that victim processes die quickly, set TIF_MEMDIE for the
entire victim thread group.
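A minimal sketch of marking the whole thread group (userspace model; the array-based "thread group" and `mark_victim_group` are illustrative assumptions, not the kernel's task list walk):

```c
#include <assert.h>

#define TIF_MEMDIE 0x1
#define NTHREADS   4

struct task { unsigned int flags; };

/* Set TIF_MEMDIE on every thread in the victim's group so each thread
 * can use emergency reserves and exit quickly. */
static void mark_victim_group(struct task group[], int n)
{
    for (int i = 0; i < n; i++)
        group[i].flags |= TIF_MEMDIE;
}
```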

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-04-16 18:30:20 +07:00
Sultan Alsawaf
bda0e82652 simple_lmk: Introduce Simple Low Memory Killer for Android
This is a complete low memory killer solution for Android that is small
and simple. Processes are killed according to the priorities that
Android gives them, so that the least important processes are always
killed first. Processes are killed until memory deficits are satisfied,
as observed from kswapd struggling to free up pages. Simple LMK stops
killing processes when kswapd finally goes back to sleep.

The only tunables are the desired amount of memory to be freed per
reclaim event and desired frequency of reclaim events. Simple LMK tries
to free at least the desired amount of memory per reclaim and waits
until all of its victims' memory is freed before proceeding to kill more
processes.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-04-16 18:30:18 +07:00
Dmitry Torokhov
af970645c4 CHROMIUM: remove Android's cgroup generic permissions checks
The implementation is utterly broken, resulting in all processes being
allowed to move tasks between sets (as long as they have access to the
"tasks" attribute), and upstream is heading towards checking only
capabilities anyway, so let's get rid of this code.

BUG=b:31790445,chromium:647994
TEST=Boot android container, examine logcat

Change-Id: I2f780a5992c34e52a8f2d0b3557fc9d490da2779
Signed-off-by: Dmitry Torokhov <dtor@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/394967
Reviewed-by: Ricky Zhou <rickyz@chromium.org>
Reviewed-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-03-12 18:42:34 +00:00
GhostMaster69-dev
181f02c497 Revert "kernel: time: reduce ntp wakeups"
This reverts commit f90e33ca7c.

Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-02-23 15:55:56 +00:00
Daniel Borkmann
57f442207d bpf: add BPF_J{LT,LE,SLT,SLE} instructions
Currently, eBPF only understands the BPF_JGT (>), BPF_JGE (>=),
BPF_JSGT (s>), and BPF_JSGE (s>=) instructions.  This means that in
particular the *JLT/*JLE counterparts involving immediates need to be
rewritten, e.g. X < [IMM] by swapping the arguments into [IMM] > X,
meaning the immediate first needs to be loaded into a register
Y := [IMM] so that we can then compare with Y > X. Note that the
destination operand is always required to be a register.

This has the downside of unnecessarily increased register pressure:
complex programs would need to temporarily spill other registers to
the stack in order to obtain an unused register for the [IMM]. Loading
into registers also affects state pruning, since we need to account
for that register use and potentially for the registers that had to be
spilled/filled again. As a consequence, slightly more stack space might
be used due to spilling, and BPF programs end up a bit longer due to
the extra code for the register load and any required spills/fills.

Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=)
counterparts to the eBPF instruction set. Modifying LLVM to
remove the NegateCC() workaround in a PoC patch at [1] and
allowing it to also emit the new instructions resulted in
cilium's BPF programs that are injected into the fast-path to
have a reduced program length in the range of 2-3% (e.g.
accumulated main and tail call sections from one of the object
file reduced from 4864 to 4729 insns), reduced complexity in
the range of 10-30% (e.g. accumulated sections reduced in one
of the cases from 116432 to 88428 insns), and reduced stack
usage in the range of 1-5% (e.g. accumulated sections from one
of the object files reduced from 824 to 784b).
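The comparison semantics the four new opcodes encode can be sketched in plain C (a userspace model for illustration; `jmp_taken` and the enum are hypothetical names, not kernel identifiers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* JLT/JLE compare unsigned; JSLT/JSLE compare signed ("s<", "s<=").
 * With these opcodes, "R1 < IMM" is a single instruction instead of
 * loading IMM into a scratch register and emitting "Rscratch > R1". */
enum jmp_op { JLT, JLE, JSLT, JSLE };

static bool jmp_taken(enum jmp_op op, uint64_t dst, uint64_t imm)
{
    switch (op) {
    case JLT:  return dst < imm;                       /* unsigned <  */
    case JLE:  return dst <= imm;                      /* unsigned <= */
    case JSLT: return (int64_t)dst < (int64_t)imm;     /* signed   <  */
    case JSLE: return (int64_t)dst <= (int64_t)imm;    /* signed   <= */
    }
    return false;
}
```

Note how -1 is "taken" under JSLT but not under JLT, which is exactly the signed/unsigned distinction the s-prefixed opcodes carry.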

The modification for LLVM will be incorporated in a backwards
compatible way. Plan is for LLVM to have i) a target specific
option to offer a possibility to explicitly enable the extension
by the user (as we have with -m target specific extensions today
for various CPU insns), and ii) have the kernel checked for
presence of the extensions and enable them transparently when
the user is selecting more aggressive options such as -march=native
in a bpf target context. (Other frontends generating BPF byte
code, e.g. ply can probe the kernel directly for its code
generation.)

  [1] https://github.com/borkmann/llvm/tree/bpf-insns

Change-Id: Ic56500aaeaf5f3ebdfda094ad6ef4666c82e18c5
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-02-23 15:53:13 +00:00
Alexei Starovoitov
2c179ea4b2 bpf: free up BPF_JMP | BPF_CALL | BPF_X opcode
free up BPF_JMP | BPF_CALL | BPF_X opcode to be used by actual
indirect call by register and use kernel internal opcode to
mark call instruction into bpf_tail_call() helper.

Change-Id: I1a45b8e3c13848c9689ce288d4862935ede97fa7
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-02-23 15:53:11 +00:00
Daniel Borkmann
8e9ce5d806 bpf: remove stubs for cBPF from arch code
Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
that a single __weak function in the core that can be overridden
similarly to the eBPF one. Also remove stale pr_err() mentions
of bpf_jit_compile.

Change-Id: Iac221c09e9ae0879acdd7064d710c4f7cb8f478d
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-02-23 15:53:10 +00:00
Miklos Szeredi
dd1f74b28f libfs: support RENAME_NOREPLACE in simple_rename()
This is trivial to do:

 - add a flags argument to simple_rename()
 - check that flags contains no flag other than RENAME_NOREPLACE
 - assign simple_rename() to .rename2 instead of .rename

Filesystems converted:

hugetlbfs, ramfs, bpf.

Debugfs uses simple_rename() to implement debugfs_rename(), which is for
debugfs instances to rename files internally, not for userspace filesystem
access.  For this case pass zero flags to simple_rename().
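The flags check described above amounts to a one-line mask test; a userspace sketch (the constants match the kernel's uapi values, but `simple_rename_flags_check` is an illustrative model, not the kernel function):

```c
#include <assert.h>
#include <errno.h>

#define RENAME_NOREPLACE (1 << 0)
#define RENAME_EXCHANGE  (1 << 1)

/* Reject any rename flag other than RENAME_NOREPLACE, as the converted
 * simple_rename() does. */
static int simple_rename_flags_check(unsigned int flags)
{
    if (flags & ~RENAME_NOREPLACE)
        return -EINVAL;
    return 0;
}
```

Passing zero flags (the debugfs_rename() case) trivially passes the check.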

Change-Id: I1a46ece3b40b05c9f18fd13b98062d2a959b76a0
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-02-23 15:50:02 +00:00
Arjan van de Ven
f90e33ca7c kernel: time: reduce ntp wakeups
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: YousefAlgadri <yusufgadrie@gmail.com>
Signed-off-by: Yousef Algadri <yusufgadrie@gmail.com>
Signed-off-by: iusmac <iusico.maxim@libero.it>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-01-31 06:20:30 +00:00
Nitesh Kataria
1955a76cc9 async: Run Hi Priority WQ when system is booting
Run a high-priority workqueue while the system is booting. This helps
reduce kernel boot time.

Change-Id: Ibecc2c746016268f5cbf4f4f6f876114050beb54
Signed-off-by: Vivek Kumar <vivekuma@codeaurora.org>
Signed-off-by: ankusa <ankusa@codeaurora.org>
Signed-off-by: Nitesh Kataria <nkataria@codeaurora.org>
Signed-off-by: utsavbalar1231 <utsavbalar1231@gmail.com>
Signed-off-by: Dakkshesh <dakkshesh5@gmail.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-01-17 15:38:21 +00:00
Paul Walmsley
64ce80af52 sched: reinitialize rq->next_balance when a CPU is hot-added
Reinitialize rq->next_balance when a CPU is hot-added.  Otherwise,
scheduler domain rebalancing may be skipped if rq->next_balance was
set to a future time when the CPU was last active, and the
newly-re-added CPU is in idle_balance().  As a result, the
newly-re-added CPU will remain idle with no tasks scheduled until the
softlockup watchdog runs - potentially 4 seconds later.  This can
waste energy and reduce performance.
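The stale-timestamp problem and the fix can be modeled with a jiffies-style comparison (a sketch with hypothetical names, assuming the usual wraparound-safe signed-difference idiom):

```c
#include <assert.h>
#include <stdbool.h>

/* Balancing is due once jiffies has reached next_balance.  A hot-added
 * CPU that kept a next_balance stamp from its previous online period
 * skips idle balancing until that future time passes. */
static bool balance_due(unsigned long jiffies, unsigned long next_balance)
{
    return (long)(jiffies - next_balance) >= 0;
}

/* The fix: on hot-add, reinitialize next_balance to the current jiffies
 * so the first idle_balance() on the re-added CPU is not skipped. */
static unsigned long reinit_next_balance(unsigned long jiffies)
{
    return jiffies;
}
```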

This behavior can be observed in some SoC kernels, which use CPU
hotplug to dynamically remove and add CPUs in response to load.  In
one case that triggered this behavior,

0. the system started with all cores enabled, running multi-threaded
   CPU-bound code;

1. the system entered some single-threaded code;

2. a CPU went idle and was hot-removed;

3. the system started executing a multi-threaded CPU-bound task;

4. the CPU from event 2 was re-added, to respond to the load.

The time interval between events 2 and 4 was approximately 300
milliseconds.

Of course, ideally CPU hotplug would not be used in this manner,
but this patch does appear to fix a real bug.

Nvidia folks: this patch is submitted as at least a partial fix for
bug 1243368 ("[sched] Load-balancing not happening correctly after
cores brought online")

Change-Id: Iabac21e110402bb581b7db40c42babc951d378d0
Signed-off-by: Paul Walmsley <pwalmsley@nvidia.com>
Cc: Peter Boonstoppel <pboonstoppel@nvidia.com>
Reviewed-on: http://git-master/r/206918
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Amit Kamath <akamath@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
Reviewed-by: Diwakar Tundlam <dtundlam@nvidia.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-01-07 19:09:39 +00:00
Gaurav Kohli
0af6983f54 kthread/smpboot: Disable irq while setting smpboot thread as running
While setting the smpboot thread state to running, there is a
possibility that an IRQ may fire on the same core and wake up that
core's smpboot thread, creating a self-deadlock. To avoid this,
protect the update with spin_lock_irqsave().

Change-Id: I5eca9b27af94fee22af3bb201f26b63ed8930efe
Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-01-07 19:07:56 +00:00
Abhijeet Dharmapurikar
5ad714c846 genirq: implement read_irq_line for interrupt lines
Some drivers need to know what the status of the interrupt line is.
This is especially true for drivers that register a handler with
IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING and in the handler they
need to know which edge transition it was invoked for. Provide a way
for these handlers to read the logical status of the line after their
handler was invoked. If the line reads high it was called for a
rising edge and if the line reads low it was called for a falling edge.

The irq_read_line callback in the chip allows the controller to provide
the real-time status of the line. Controllers that can read the status
of an interrupt line should implement this by doing the necessary
hardware reads and returning the logical state of the line.

Interrupt controllers based on the slow bus architecture should conduct
the transaction in this callback. The genirq code will call the chip's
bus lock prior to calling irq_read_line. Obviously since the transaction
would be completed before returning from irq_read_line it need not do
any transactions in the bus unlock call.
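The intended use in a dual-edge handler reduces to classifying the edge from the line level (a hypothetical userspace sketch; `classify_edge` models a handler calling the real irq_read_line()):

```c
#include <assert.h>

enum edge { EDGE_FALLING, EDGE_RISING };

/* If the line reads high after the handler was invoked, the interrupt
 * was a rising edge; if it reads low, it was a falling edge. */
static enum edge classify_edge(int line_level)
{
    return line_level ? EDGE_RISING : EDGE_FALLING;
}
```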

Change-Id: I3c8746706530bba14a373c671d22ee963b84dfab
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-01-07 19:06:46 +00:00
Joseph Lo
cd73afaaa1 CHROMIUM: PM / QoS: add min/max online cpus as PM QoS parameter
Adding min/max online cpus as PM QoS parameter

Based on work by:
Alex Frid <afrid@nvidia.com>
Gaurav Sarode <gsarode@nvidia.com>

BUG=None
TEST=tested on Dalmore and Venice2

Change-Id: I85593ae07861ea15aff588699a549518165ba043
Signed-off-by: Joseph Lo <josephl@nvidia.com>
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/174695
Reviewed-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-01-07 19:06:46 +00:00
Mukesh Ojha
76fd0a8cc5 time: Fix extra sleeptime injection when suspend fails
Currently there exists a corner case when there is only one
clocksource, e.g. the RTC, and the system fails to enter suspend.
On resume, rtc_resume() injects the sleeptime because
timekeeping_rtc_skipresume() returned 'false' (the default value of
sleeptime_injected), due to which we can see a mismatch in
timestamps.

This issue can also occur on a system where more than one clocksource
is present and the very first suspend fails.

Success case:
------------
                                        {sleeptime_injected=false}
rtc_suspend() => timekeeping_suspend() => timekeeping_resume() =>

(sleeptime injected)
 rtc_resume()

Failure case:
------------
         {failure in sleep path} {sleeptime_injected=false}
rtc_suspend()     =>          rtc_resume()

{sleeptime injected again which was not required as the suspend failed}

Fix this by handling the boolean logic properly.
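The corrected boolean logic reduces to a single predicate (a minimal model with a hypothetical name, not the kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Inject sleeptime on resume only when the suspend actually succeeded
 * and the timekeeping core has not already injected it. */
static bool should_inject_sleeptime(bool suspend_succeeded,
                                    bool already_injected)
{
    return suspend_succeeded && !already_injected;
}
```

The failure case in the diagram above corresponds to `suspend_succeeded == false`, where no injection must happen.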

Change-Id: I7ac5210ec326b41f4d36bb87209b667f21f3aa50
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Git-commit: f473e5f467f6049370575390b08dc42131315d60
Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-01-07 19:06:32 +00:00
Tyler Nijmeh
b25836ebb5 sched: Do not reduce perceived CPU capacity while idle
CPUs that are idle are excellent candidates for latency sensitive or
high-performance tasks. Decrementing their capacity while they are idle
will result in these CPUs being chosen less, and they will prefer to
schedule smaller tasks instead of large ones. Disable this.

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
Signed-off-by: clarencelol <clarencekuiek@icloud.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-01-07 19:05:13 +00:00
Tyler Nijmeh
93b43c69ad sched: Enable NEXT_BUDDY for better cache locality
By scheduling the last woken task first, we can increase cache locality
since that task is likely to touch the same data as before.

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
Signed-off-by: clarencelol <clarencekuiek@icloud.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-01-07 19:04:27 +00:00
Tyler Nijmeh
4a86bd1b7b cpufreq: schedutil: Enforce realtime priority
Even the interactive governor uses realtime priority. It is beneficial
for schedutil to process its workload at a priority greater than or
equal to that of mundane tasks (KGSL, audio, etc.).

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
Signed-off-by: clarencelol <clarencekuiek@icloud.com>
Signed-off-by: Tashfin Shakeer Rhythm <tashfinshakeerrhythm@gmail.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2024-01-07 19:04:06 +00:00
Pavankumar Kondeti
b2eac7f4bd sched/tune: Increase the cgroup limit to 6
The schedtune cgroup controller allows up to 5 cgroups, including the
default/root cgroup. Until now user space created only 4 additional
cgroups, namely foreground, background, top-app and audio-app.
Recently another cgroup called rt was created before the audio-app
cgroup. Since the kernel limits the cgroups to 5, creation of the
audio-app cgroup fails. Fix this by increasing the schedtune cgroup
controller's cgroup limit to 6.

Change-Id: I13252a90dba9b8010324eda29b8901cb0b20bc21
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 16:47:57 +00:00
Josh Choo
987ff25a8f sched: Turn on MIN_CAPACITY_CAPPING feature
Inform scheduler about capacity restrictions, such as during frequency
boosting.

Change-Id: Ic65bede69608acf8ca3f144f144049a4392a70f6
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:11 +00:00
Joel Fernandes
235feafa88 FROMLIST: sched: Make iowait_boost optional in schedutil
We should apply the iowait boost only if cpufreq policy has iowait boost
enabled. Also make it a schedutil configuration from sysfs so it can be
turned on/off if needed (by default initialize it to the policy value).

For systems that don't need/want it enabled, such as those on arm64
based mobile devices that are battery operated, it saves energy when the
cpufreq driver policy doesn't have it enabled (details below):

Here are some results for energy measurements collected running a
YouTube video for 30 seconds:
Before: 8.042533 mWh
After: 7.948377 mWh
Energy savings is ~1.2%

Bug: 38010527
Link: https://lkml.org/lkml/2017/5/19/42
Change-Id: If124076ad0c16ade369253840dedfbf870aff927
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:10 +00:00
Josh Choo
d62729035a sched/fair: Add bias schedtune boosted tasks sched feature
Schedtune boosted tasks are biased to higher capacity CPUs by default.
Add a sched feature to enable/disable this behaviour.

Change-Id: I3500675c182f3929e893dbb33850fe033db6c146
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:10 +00:00
joshuous
5919b22cc5 sched/boost: Update functions for newer Dynamic Schedtune Boost changes
We now need to pass the functions a boost slot argument. Also we rename
the functions to reflect that we intend to perform sched_boost.

Change-Id: I84a63aea2c9035267095762804efabf7be6c66d5
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:08 +00:00
joshuous
8dcd76d547 sched/tune: Switch Dynamic Schedtune Boost to a slot-based tracking system
Switch from a counter-based system to a slot-based system for managing
multiple dynamic Schedtune boost requests.

The primary limitation of the counter-based system was that it could
only keep track of two boost values at a time: the current dynamic
boost value and the default boost value. When more than one boost
request was issued, the system would only remember the highest value of
them all. Even if the task that requested the highest value had
unboosted, this value was still maintained as long as other active
boosts were still running. A more ideal outcome would be for the system
to unboost to the maximum boost value of the remaining active boosts.

The slot-based system provides a solution to the problem by keeping
track of the boost values of all ongoing active boosts. It ensures that
the current boost value will be equal to the maximum boost value of
all ongoing active boosts. This is achieved with two linked lists
(active_boost_slots and available_boost_slots), which assign and keep
track of boost slot numbers for each successful boost request. The boost
value of each request is stored in an array (slot_boost[]), at an index
value equal to the assigned boost slot number.

For now we limit the number of active boost slots to 5 per Schedtune
group.
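The slot scheme can be sketched compactly in userspace C (a model under stated assumptions: arrays stand in for the active/available linked lists, and `boost_acquire`/`boost_release`/`effective_boost` are hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>

#define NSLOTS 5  /* active boost slots per Schedtune group */

struct boost_slots {
    bool active[NSLOTS];
    int  value[NSLOTS];   /* models slot_boost[] */
};

/* Hand out a free slot and record the request's boost value. */
static int boost_acquire(struct boost_slots *s, int boost)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (!s->active[i]) {
            s->active[i] = true;
            s->value[i] = boost;
            return i;     /* slot number returned to the requester */
        }
    }
    return -1;            /* no free slot */
}

static void boost_release(struct boost_slots *s, int slot)
{
    s->active[slot] = false;
}

/* The effective boost is the maximum over all occupied slots, so
 * releasing the largest boost falls back to the next-largest one. */
static int effective_boost(const struct boost_slots *s)
{
    int max = 0;
    for (int i = 0; i < NSLOTS; i++)
        if (s->active[i] && s->value[i] > max)
            max = s->value[i];
    return max;
}
```

This captures the key improvement over the counter: dropping the highest boost unboosts to the maximum of the remaining active boosts instead of staying pinned at the old peak.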

Change-Id: Iadc738fc919af092fd4c1b6312becf9567bc4c62
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:07 +00:00
joshuous
183063591f sched/stune: Rename stune_boost() to do_stune_sched_boost()
To reflect that the function is to be used mainly with CAF's devices
that have sched_boost. However, developers may use it as a switch to
dynamically boost schedtune to the values specified in
/dev/stune/*/schedtune.sched_boost.

Change-Id: I5012273e5572c6091a99a6954452bed3a2501c55
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:07 +00:00
joshuous
9563a62aad sched/tune: Rename dynamic_boost parameter to sched_boost
This was confusing to deal with given that it had the same name as the
Dynamic Schedtune Boost framework. It will be more apt to call it
sched_boost given that it was created to work with the sched_boost
feature in CAF devices.

The new tunable can be found in /dev/stune/*/schedtune.sched_boost

Change-Id: Iafa3e35ef7c7991f09595ba452d8050ddc694743
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:07 +00:00
joshuous
26302c604c sched/tune: Track active boosts on a per-Schedtune basis
It does not make sense to be unable to reset Schedtune boost for a
particular Schedtune group if another Schedtune group's boost is still
active. Instead of using a global count, we should use a per-Schedtune
group count to keep track of active boosts taking place.

Change-Id: Ic47ccd2582dbb31aa245a13d301ddf538b0d318b
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:06 +00:00
joshuous
ab9f0e7c6b sched/tune: Reset Dynamic Schedtune Boost only if no more boosts running
We will need to take care to ensure that every do_stune_boost() we call
is followed eventually by a reset_stune_boost() so that
stune_boost_count is managed correctly.

This allows us to stack several Dynamic Schedtune Boosts and reset only
when all Dynamic Schedtune Boosts have been disengaged.

Change-Id: I09b739e4503930eaf0e3f14870758b21ce9868f5
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:06 +00:00
joshuous
f621bfbd67 sched/boost: Perform SchedTune boosting when sched_boost is triggered
Boost top-app SchedTune tasks using the dynamic_boost value when
/proc/sys/kernel/sched_boost is activated. This is usually triggered by
CAF's perf daemon.

Change-Id: I23f0e7822673230288fbaeda0a7f4aa8546bf7d3
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:05 +00:00
joshuous
3ac5e76c78 sched/boost: Re-introduce sched_boost proc from HMP
We will use this in conjunction with CAF's perf daemon to somewhat
replicate core_ctl's sched_boost capabilities.

Credits to the developers at Codeaurora for the code.

Change-Id: Ifc4f76e02eed97ac2c5fc8c9a60e56c09aed6578
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:05 +00:00
joshuous
c634e8d7c4 sched/tune: Introduce stune_boost() function
Add a simple function to activate Dynamic Schedtune Boost and use the
dynamic_boost value of the SchedTune CGroup.

Change-Id: I106c1ad169419a575df400fc511b4be046b52152
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:04 +00:00
joshuous
ca4dc5a69c sched/tune: Refactor do_stune_boost()
For added flexibility and in preparation for introducing another function.

Change-Id: Ic95ba54e1549b0b70222c82a5ee1e164340e3258
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:04 +00:00
joshuous
e04e99e553 sched/tune: Create dynamic_boost SchedTune parameter
Change-Id: I89b4b1cd9cbb6820e1bce4626ce64d5dcde8b975
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:04 +00:00
joshuous
e091cd03a9 sched/tune: Rename dynamic_boost_write() to dynamic_boost()
This is to reduce confusion when we create a new dynamic_boost_write()
function in future patches.

Change-Id: I0cef57875a193034ce4a7dab6769449c9c0cda8a
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:03 +00:00
joshuous
2d96c4126c sched/tune: Add initial support for Dynamic SchedTune Boost
Provide functions to activate and reset SchedTune boost:

int do_stune_boost(char *st_name, int boost);
int reset_stune_boost(char *st_name);

Change-Id: Id3f93a63b7a94a08b124cb304bc0ffe9cc889d7a
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:02 +00:00
Joel Fernandes
1773704ee3 sched/fair: Fix issue where frequency update not skipped
This patch fixes one of the infrequent conditions in
commit 54b6baeca500 ("sched/fair: Skip frequency updates if CPU about to idle")
where we could have skipped a frequency update. The fix is to use the
correct flag which skips freq updates.

Note that this is a rare issue (can show up only during CFS throttling)
and even then we just do an additional frequency update which we were
doing anyway before the above patch.

Bug: 64689959

Change-Id: I0117442f395cea932ad56617065151bdeb9a3b53
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:02 +00:00
Chris Redpath
01cf171fb3 ANDROID: Move schedtune en/dequeue before schedutil update triggers
CPU rq util updates happen when rq signals are updated as part of
enqueue and dequeue operations. Doing these updates triggers a call to
the registered util update handler, which takes schedtune boosting
into account. Enqueueing the task in the correct schedtune group after
this happens means that we will potentially not see the boost for an
entire throttle period.

Move the enqueue/dequeue operations for schedtune before the signal
updates which can trigger OPP changes.

Change-Id: I4236e6b194bc5daad32ff33067d4be1987996780
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:02 +00:00
Joel Fernandes
73aaf0fe17 sched/fair: Skip frequency updates if CPU about to idle
If the CPU is about to idle, prevent a frequency update. With this, the
number of schedutil governor wakeups is reduced by more than half on a
test playing Bluetooth audio.

Test: sugov wake ups drop by more than half when playing music with
screen off (476 / 1092)

Bug: 64689959

Change-Id: I400026557b4134c0ac77f51c79610a96eb985b4a
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:01 +00:00
Josh Choo
da2d1461ab sched: Add stub function for core_ctl_set_boost
Needed to load the stock qcacld kernel module.

Change-Id: I6eae01471efc53874d1481c97e29894c2443412c
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:01 +00:00
Josh Choo
27a7fa13e9 sched: Add stub functions for wake_up_idle API
Needed to load the stock qcacld kernel module.

Change-Id: I9d63a81699ab498757dfd6dd8ee0e304a0d9b472
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:48:01 +00:00
Joonwoo Park
6e3caa829c sched: energy: calculate and update CPU capacity dynamically
One SoC can have multiple CPU speedbins, which cannot be represented
with the current energy model due to its fixed capacity per CPU
frequency step.

Provide CPU's all possible frequency steps instead of capacities along
with corresponding energy costs to be able to support different
speedbins.

Change-Id: I96ff01372da5c383cd3172999ea1dcf95a7862ce
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: therootlord <igor_cestari@hotmail.com>
[kdrag0n: added missing sched_feat(ENERGY_AWARE) check]
Signed-off-by: kdrag0n <dragon@khronodragon.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:47:59 +00:00
Ionela Voinescu
2ba56d6d43 ANDROID: sched/fair: add idle state filter to prefer_idle case
The CPU selection process for a prefer_idle task either minimizes or
maximizes the CPU capacity for idle CPUs depending on the task being
boosted or not.

Given that we are iterating through all CPUs, additionally filter the
choice by preferring a CPU in a more shallow idle state. This will
provide both a faster wake-up for the task and higher energy efficiency,
by allowing CPUs in deeper idle states to remain idle.

Change-Id: Ic55f727a0c551adc0af8e6ee03de6a41337a571b
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:47:58 +00:00
Pavankumar Kondeti
9eb7c227d8 ANDROID: sched/fair: fix CPU selection for non latency sensitive tasks
CPU selection for non latency sensitive tasks targets an active CPU in
the little cluster. The shallowest C-state CPU is stored as a backup.
However, if all CPUs in the little cluster are idle, we pick an active
CPU in the BIG cluster as the target CPU. This incorrect choice of
target CPU may not get corrected by select_energy_cpu_idx(), depending
on the energy difference between the previous CPU and the target CPU.

This can be fixed easily by maintaining the same variable that tracks
maximum capacity of the traversed CPU for both idle and active CPUs.

Change-Id: I3efb8bc82ff005383163921ef2bd39fcac4589ad
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:47:58 +00:00
Ionela Voinescu
0fecaf304a ANDROID: sched/fair: unify spare capacity calculation
Given that we have a few sites where the spare capacity of a CPU is
calculated as the difference between the original capacity of the CPU
and its computed new utilization, let's unify the calculation and use
that value tracked with a local spare_cap variable.

Change-Id: I78daece7543f78d4f74edbee5e9ceb62908af507
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:47:58 +00:00
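The unified calculation from the commit above reduces to one small helper: spare capacity is the CPU's original capacity minus its estimated new utilization. The function name and clamping at zero are illustrative assumptions.

```c
#include <assert.h>

/* spare_cap = capacity_orig - new_util, clamped at zero so an
 * over-utilized CPU reports no spare capacity. */
static long spare_capacity(long capacity_orig, long new_util)
{
	long spare = capacity_orig - new_util;

	return spare > 0 ? spare : 0;
}
```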
Ionela Voinescu
461c0cf4d2 ANDROID: sched/fair: prefer energy efficient CPUs for !prefer_idle tasks
For !prefer_idle tasks we want to minimize capacity_orig to bias their
scheduling towards more energy efficient CPUs. This does not happen in
the current code for boosted tasks due to the order of CPUs considered
(from big CPUs to LITTLE CPUs), and to the shallow idle state and
spare capacity maximization filters, which are used to select the best
idle backup CPU and the best active CPU candidates.

Let's fix this by enabling the above filters only when we are within
same capacity CPUs.

Taking each of the two cases in turn:
 1. Selection of a backup idle CPU - Non prefer_idle tasks should prefer
    more energy efficient CPUs when there are idle CPUs in the system,
    independent of the order given by the presence of a boosted margin.
    This is the behaviour for !sysctl_sched_cstate_aware and this should
    be the behaviour when sysctl_sched_cstate_aware is set as well,
    given that we should prefer a more efficient CPU even if it's in a
    deeper idle state.

 2. Selection of an active target CPU: There is no reason for boosted
    tasks to benefit from a higher chance of being placed on a big CPU,
    which is provided by ordering CPUs from bigs to littles.
    The other mechanism set in place for boosted tasks (making sure we
    select a CPU that fits the task) is enough for a non latency
    sensitive case. Also, by choosing a CPU with maximum spare capacity
    we also cover the preference towards spreading tasks, rather than
    packing them, which improves the chances for tasks to get better
    performance due to potential reduced preemption. Therefore, prefer
    more energy efficient CPUs and only consider spare capacity for CPUs
    with equal capacity_orig.

Change-Id: I3b97010e682674420015e771f0717192444a63a2
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:47:57 +00:00
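The selection rule from the commit above can be sketched as: minimize capacity_orig first, and apply the spare-capacity filter only as a tie-break between CPUs of the same capacity_orig. This is a hypothetical condensation of find_best_target's active-CPU path, not the real code.

```c
#include <assert.h>

/* Prefer the smallest capacity_orig; among CPUs with EQUAL
 * capacity_orig, prefer the one with the most spare capacity. */
static int pick_best_active(const long *cap_orig, const long *spare, int n)
{
	int best = -1;

	for (int i = 0; i < n; i++) {
		if (best < 0 ||
		    cap_orig[i] < cap_orig[best] ||
		    (cap_orig[i] == cap_orig[best] &&
		     spare[i] > spare[best]))
			best = i;
	}
	return best;
}
```

Note that a big CPU with plenty of spare capacity can never beat a little CPU here, which is exactly the energy-efficiency bias the commit wants for !prefer_idle tasks.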
Ionela Voinescu
b81d92370b ANDROID: sched/fair: remove order from CPU selection
find_best_target is currently split into code handling latency sensitive
tasks and code handling non-latency sensitive tasks based on the value
of the prefer_idle flag.
Another differentiation is done for boosted tasks, preferring to start
with higher-capacity CPU when boosted, and with more efficient CPUs when
not boosted. This additional differentiation is obtained by imposing an
order when considering CPUs for selection. This order is determined in
typical big.LITTLE systems by the start point (the CPU with the maximum
or minimum capacity) and by the order of big and little CPU groups
provided in the sched domain hierarchy.

However, it's not guaranteed that the sched domain hierarchy will give
us a sorted list of CPU groups based on their maximum capacities when
dealing with systems with more than 2 capacity groups.
For example, if we consider a system with three groups of CPUs (LITTLEs,
mediums, bigs), the sched domain hierarchy might provide the following
scheduling groups ordering for a prefer_idle-boosted task:
   big CPUs -> LITTLE CPUs -> medium CPUs.
If the big CPUs are not idle, but there are a few LITTLEs and mediums
as idle CPUs, by returning the first idle CPU, we will be incorrectly
preferring a lower capacity CPU over a higher capacity CPU.

In order to eliminate this reliance on assuming sched groups are ordered
by capacity, let's:
1. Iterate through all candidate CPUs for all cases.
2. Minimise or maximise the capacity of the considered CPU, depending
on prefer_idle and boost information.

Taking each of the four possible cases in turn, we analyse the
implementation and impact of this solution:

1. prefer_idle and boosted
This type of tasks needs to favour the selection of a reserved idle CPU,
and thus we still start from the biggest CPU in the system, but we
iterate through all CPUs so as to correctly handle the example above by
maximising the capacity of the idle CPU we select. When all CPUs are
active, we already iterate through all CPUs and we're able to maximise
spare capacity or minimise utilisation for the considered target or
backup CPU.

2. prefer_idle and !boosted
For these tasks we prefer the selection of a more energy efficient CPU
and therefore we start from the smallest CPUs in the system, but we
iterate through all the CPUs so as to select the most energy efficient idle
CPU, an implementation which mimics existing behaviour. When all CPUs are
active, we already iterate through all CPUs and we're able to
maximise spare capacity or minimise utilisation for the considered
target or backup CPU.

3. !prefer_idle and boosted and
4. !prefer_idle and !boosted
For these tasks we already iterate through all CPUs and we're able to
maximise the energy efficiency of the selected CPU.

Change-Id: I940399e22eff29453cba0e2ec52a03b17eec12ae
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:47:57 +00:00
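The order-free selection from the commit above can be modelled as a single pass over all CPUs, maximizing capacity for prefer_idle+boosted tasks and minimizing it otherwise, rather than trusting the sched-group ordering. The arrays below are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Iterate ALL CPUs and compare capacities directly: maximize for
 * prefer_idle+boosted tasks, minimize otherwise. Correct even when the
 * sched domain hierarchy lists groups out of capacity order. */
static int pick_idle_cpu(const long *cap, const bool *is_idle, int n,
			 bool maximize)
{
	int best = -1;

	for (int i = 0; i < n; i++) {
		if (!is_idle[i])
			continue;
		if (best < 0 ||
		    (maximize ? cap[i] > cap[best] : cap[i] < cap[best]))
			best = i;
	}
	return best;
}
```

Replaying the commit's three-cluster example (busy big, idle LITTLE, idle medium), a boosted prefer_idle task now correctly lands on the medium CPU instead of the first idle CPU in group order.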
Pavankumar Kondeti
8356807fbc ANDROID: sched: WALT: Add support for CFS_BANDWIDTH
cumulative runnable average is maintained in cfs_rq along with
rq so that when a cfs_rq is throttled/unthrottled, the contribution
of that cfs_rq can be updated at the rq level. Implement the
fixup_cumulative_runnable_avg callback for fair class to handle
the cfs_rq cumulative runnable average updates when the runnable
tasks' demand is changed.

Bug: 139071966
Change-Id: Iccd473677cf491920aa82a6fc7e0a5374e5bb27f
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: GhostMaster69-dev <rathore6375@gmail.com>
2023-12-29 14:47:56 +00:00
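A hypothetical model of the bookkeeping the commit above adds: because each cfs_rq keeps its own cumulative demand, throttling or unthrottling can subtract or add the group's whole contribution at the rq level in one step, and demand fixups only reach the rq while the group is runnable. The struct and function names are illustrative, not the WALT code.

```c
#include <assert.h>

struct toy_cfs_rq { long cum_demand; }; /* per-group cumulative demand */
struct toy_rq     { long cum_demand; }; /* per-CPU cumulative demand  */

/* Throttling removes the group's contribution from the rq level. */
static void throttle(struct toy_rq *rq, struct toy_cfs_rq *cfs)
{
	rq->cum_demand -= cfs->cum_demand;
}

/* Unthrottling restores it. */
static void unthrottle(struct toy_rq *rq, struct toy_cfs_rq *cfs)
{
	rq->cum_demand += cfs->cum_demand;
}

/* A task's demand change always updates its cfs_rq, but only reaches
 * the rq when the group is not throttled. */
static void fixup_demand(struct toy_rq *rq, struct toy_cfs_rq *cfs,
			 long delta, int throttled)
{
	cfs->cum_demand += delta;
	if (!throttled)
		rq->cum_demand += delta;
}
```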