DISCLAIMER:
=====================================================================
This patch is intended to go upstream after collecting feedback from
Android community that it resolves the issues reported by various
partners. It is not meant to be merged into android-mainline.
=====================================================================
uclamp_max effectiveness can easily be impacted by small transient
tasks that wake up frequently to do a small amount of work and then go
back to sleep.
If there's a busy task that is capped by uclamp_max to run at a lower
frequency, then due to the max-aggregation rule, tasks that wake up on
the same CPU will raise the rq->uclamp_max value if their uclamp_max is
higher than that of the capped task. Given that all tasks have
uclamp_max = 1024 by default, this is the likely case.
Note that since the capped task is likely to be a busy and throttled
one, its util, and hence the rq->util, will be very high and as soon as
we lift the capping the requested frequency will be very high.
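As a rough illustration of the max-aggregation rule described above,
the sketch below computes an rq-level uclamp_max from a set of runnable
tasks. The demo_* names are hypothetical and are not the kernel's data
structures; treating an empty rq as unclamped is an assumption of this
sketch only.

```c
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024u

/* Hypothetical stand-in for a runnable task; only uclamp_max matters here. */
struct demo_task {
	unsigned int uclamp_max;
};

/*
 * Max-aggregation: the rq-level clamp is the largest uclamp_max among
 * the runnable tasks. An empty rq is treated as unclamped (assumption).
 */
static unsigned int demo_rq_uclamp_max(const struct demo_task *tasks, size_t n)
{
	unsigned int rq_max = 0;
	size_t i;

	if (n == 0)
		return SCHED_CAPACITY_SCALE;

	for (i = 0; i < n; i++)
		if (tasks[i].uclamp_max > rq_max)
			rq_max = tasks[i].uclamp_max;

	return rq_max;
}
```

With a single task capped at 300 the rq value is 300; as soon as a
default task (uclamp_max = 1024) is also runnable, the rq value becomes
1024 and the cap is effectively lifted.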
To make uclamp_max more resilient against these transient tasks that
don't really need to run at a higher frequency, we implement a simple
filter mechanism to ignore uclamp_max for those tasks.
The algorithm looks at the runtime of the task and compares it to
sched_slice(). By default we assume that any task whose runtime is
1/4th of sched_slice() or less is a small transient task whose
uclamp_max requirement can be ignored.
runtime < sched_slice() / divider
We can tweak the divider via the
/proc/sys/kernel/sched_util_uclamp_max_filter_divider sysctl. It accepts
values 0-4.
divider = 1 << sched_util_uclamp_max_filter_divider
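A minimal sketch of the two formulas above, assuming runtime and slice
are both in nanoseconds. The demo_* names and the default shift of 2
(giving the 1/4th default) are illustrative assumptions, not the
patch's actual code. The comparison is strict, as written in the
formula.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed default: 1 << 2 == 4, i.e. the "1/4th of sched_slice()" case. */
#define DEMO_FILTER_DIVIDER_SHIFT_DEFAULT 2u
/* The sysctl accepts 0-4, so larger values are clamped in this sketch. */
#define DEMO_FILTER_DIVIDER_SHIFT_MAX 4u

/*
 * Returns true when the task looks like a small transient task, i.e.
 * runtime < slice / divider, with divider = 1 << shift.
 */
static bool demo_ignore_uclamp_max(uint64_t runtime_ns, uint64_t slice_ns,
				   unsigned int shift)
{
	if (shift > DEMO_FILTER_DIVIDER_SHIFT_MAX)
		shift = DEMO_FILTER_DIVIDER_SHIFT_MAX;

	/* Dividing by a power of two is the same as shifting right. */
	return runtime_ns < (slice_ns >> shift);
}
```

For a 4 ms slice and the default shift, a task that ran for 0.5 ms is
filtered, while one that ran for 1 ms or more keeps its uclamp_max.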
We add a new task_tick_uclamp() function to verify this condition
periodically and ensure the conditions checked at wake up are still true
- in case this transient task suddenly becomes a busy one.
For EAS, we can't use sched_slice() there to figure out if uclamp_max
will be ignored because the task is not enqueued yet. So we leave it
as-is to figure out the placement based on worst case scenario.
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: Ie3afa93a7d70dab5b7c22e820cc078ffd0e891ef
[yaro: ported to msm-5.4 and remove sysctl parts for now]
Signed-off-by: Yaroslav Furman <yaro330@gmail.com>
RT tasks by default run at the highest capacity/performance level. When
uclamp is selected this default behavior is retained by enforcing the
requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be
uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum
value.
This is also referred to as 'the default boost value of RT tasks'.
See commit 1a00d999971c ("sched/uclamp: Set default clamps for RT tasks").
On battery powered devices, it is desired to control this default
(currently hardcoded) behavior at runtime to reduce energy consumed by
RT tasks.
For example, for a mobile device manufacturer where the big.LITTLE
architecture is dominant, the performance of the little cores varies
across SoCs, and on high end ones the big cores could be too power
hungry.
Given the diversity of SoCs, the new knob allows manufacturers to tune
the best performance/power for RT tasks for the particular hardware they
run on.
They could opt to further tune the value when the user selects
a different power saving mode or when the device is actively charging.
The runtime aspect of it further helps in creating a single kernel image
that can be run on multiple devices that require different tuning.
Keep in mind that a lot of RT tasks in the system are created by the
kernel. On Android for instance I can see over 50 RT tasks, only
a handful of which are created by the Android framework.
To let system admins and device integrators control the default
behavior globally, introduce the new
sysctl_sched_uclamp_util_min_rt_default to change the default boost
value of the RT tasks.
I anticipate this to be mostly in the form of modifying the init script
of a particular device.
To avoid polluting the fast path with unnecessary code, the approach
taken is to synchronously do the update by traversing all the existing
tasks in the system. This could race with a concurrent fork(), which is
dealt with by introducing sched_post_fork() function which will ensure
the racy fork will get the right update applied.
Tested on Juno-r2 in combination with the RT capacity awareness [1].
By default an RT task will go to the highest capacity CPU and run at the
maximum frequency, which is particularly energy inefficient on high end
mobile devices because the biggest core[s] are 'huge' and power hungry.
With this patch the RT task can be controlled to run anywhere by
default, and doesn't cause the frequency to be maximum all the time.
Yet any task that really needs to be boosted can easily escape this
default behavior by modifying its requested uclamp.min value
(p->uclamp_req[UCLAMP_MIN]) via sched_setattr() syscall.
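The interaction between the global default and a per-task request can
be sketched as follows. This is a simplified model with hypothetical
demo_* names: an RT task follows the sysctl default until it sets its
own uclamp.min, after which its own request is used.

```c
#include <stdbool.h>

#define SCHED_CAPACITY_SCALE 1024u

/* Hypothetical, simplified view of a task's uclamp.min request. */
struct demo_rt_task {
	bool is_rt;
	bool user_defined;		/* task set its own uclamp.min */
	unsigned int uclamp_min_req;	/* value the task asked for */
};

/*
 * Pick the uclamp.min request to use for a task, given the current
 * value of the RT default knob.
 */
static unsigned int demo_requested_uclamp_min(const struct demo_rt_task *p,
					      unsigned int rt_default)
{
	if (rt_default > SCHED_CAPACITY_SCALE)
		rt_default = SCHED_CAPACITY_SCALE;

	/* RT tasks without their own request follow the global default. */
	if (p->is_rt && !p->user_defined)
		return rt_default;

	return p->uclamp_min_req;
}
```

Lowering the default to 0 removes the boost from every RT task that
has not asked for one, while a task that requested 512 via
sched_setattr() keeps 512.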
[1] 804d402fb6f6: ("sched/rt: Make RT capacity-aware")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200716110347.19553-2-qais.yousef@arm.com
(cherry picked from commit 13685c4a08fca9dd76bf53bfcbadc044ab2a08cb)
Conflicts:
kernel/fork.c
kernel/sysctl.c
Upstream has commit 5a5cf5cb30d7 ("cgroup: refactor fork helpers") and
further commit ef2c41cf38a7 ("clone3: allow spawning processes into
cgroups") which affect the calls after this. Picking the first would
be easy but the 2nd would be much bigger. Also, my cherry-pick put my
sysctl in the wrong place in the table in sysctl.c, so I manually
moved it. Weird.
BUG=b:160171130
TEST=With series rt tasks don't get boosted
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Change-Id: I678d8ee899ecfbe0a1f0bb94da85d54fff924a57
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2340433
Reviewed-by: Joel Fernandes <joelaf@google.com>
Tasks without a user-defined clamp value are considered not clamped
and by default their utilization can have any value in the
[0..SCHED_CAPACITY_SCALE] range.
Tasks with a user-defined clamp value are allowed to request any value
in that range, and the required clamp is unconditionally enforced.
However, a "System Management Software" could be interested in limiting
the range of clamp values allowed for all tasks.
Add a privileged interface to define a system default configuration via:
/proc/sys/kernel/sched_uclamp_util_{min,max}
which works as an unconditional clamp range restriction for all tasks.
With the default configuration, the full SCHED_CAPACITY_SCALE range of
values is allowed for each clamp index. Otherwise, the task-specific
clamp is capped by the corresponding system default value.
Do that by tracking, for each task, the "effective" clamp value and
the bucket the task has been refcounted in at enqueue time. This
allows lazy aggregation of the "requested" and "system default" values
at enqueue time and simplifies refcounting updates at dequeue time.
The cached bucket ids are used to avoid (relatively) more expensive
integer divisions every time a task is enqueued.
An active flag is used to report when the "effective" value is valid and
thus the task is actually refcounted in the corresponding rq's bucket.
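A simplified sketch of the "effective" clamp computation described
above. The bucket count of 5 and the demo_* names are assumptions made
for illustration; the point is that the division happens once, when the
effective value is computed, and the resulting bucket id is cached.

```c
#define SCHED_CAPACITY_SCALE 1024u
#define DEMO_UCLAMP_BUCKETS 5u	/* assumed bucket count */
/* Bucket width, rounded to the closest integer. */
#define DEMO_BUCKET_DELTA \
	((SCHED_CAPACITY_SCALE + DEMO_UCLAMP_BUCKETS / 2) / DEMO_UCLAMP_BUCKETS)

/* Hypothetical cached per-task state for one clamp index. */
struct demo_clamp {
	unsigned int value;
	unsigned int bucket_id;
	int active;	/* set while the task is refcounted in a bucket */
};

static unsigned int demo_bucket_id(unsigned int value)
{
	unsigned int id = value / DEMO_BUCKET_DELTA;

	return id < DEMO_UCLAMP_BUCKETS ? id : DEMO_UCLAMP_BUCKETS - 1;
}

/* Aggregate "requested" and "system default" lazily, at enqueue time. */
static struct demo_clamp demo_effective_clamp(unsigned int requested,
					      unsigned int system_default)
{
	struct demo_clamp eff;

	/* The task-specific clamp is capped by the system default. */
	eff.value = requested < system_default ? requested : system_default;
	eff.bucket_id = demo_bucket_id(eff.value);
	eff.active = 1;

	return eff;
}
```

At dequeue time the cached bucket_id can be used directly to drop the
refcount, without recomputing anything.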
Bug: 120440300
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190621084217.8167-5-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit e8f14172c6b11e9a86c65532497087f8eb0f91b1)
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Change-Id: I4f014c5ec9c312aaad606518f6e205fd0cfbcaa2
Signed-off-by: Quentin Perret <qperret@google.com>
xNombre: Android modifies some scheduler parameters on boot.
Applying these manually resulted in better hackbench performance.
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>
To identify certain apps which request the max cpu freq in order to
affine their tasks to specific cpus, the task name can be checked in
addition to the lib name to identify the suspicious task.
Test: build and test the 'perfect kick 2' game.
Bug: 163293825
Bug: 161324271
Change-Id: I4359859db743b4c9122e9df40af0b109370e8f1f
Signed-off-by: Jimmy Shiu <jimmyshiu@google.com>
This change is for general scheduler improvement.
Change-Id: I50d41aa3338803cbd45ff6314b2bb3978c59282b
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Certain userspace applications, to achieve max performance, affine
their threads to the cpus that run the fastest. This is not always the
correct strategy. For example, in certain architectures all the
cores have the same max freq but a few of them have a bigger
cache. Affining to the cpus that have the bigger cache is advantageous,
but such an application would end up affining its threads to all the
cores. Similarly, if an architecture has just one cpu that runs at max
freq, the application ends up crowding all its threads on that single
core, which is detrimental for performance.
To address this issue, we need to detect a suspicious looking affinity
request from userspace and check if it links in a particular library.
The latter can easily be detected by traversing executable vm areas
that map a file and checking for that library name.
When such an affinity request is found, change it to use a proper
affinity. The suspicious affinity request, the proper affinity request
and the library name can be configured by userspace.
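The check described above can be sketched in plain C as follows. This
is not the kernel implementation: CPU masks are modelled as an unsigned
long, the list of file-backed executable mappings is passed in as an
array of path strings, and all demo_* names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace-configurable policy. */
struct demo_affinity_policy {
	unsigned long suspicious_mask;	/* affinity request to look for */
	unsigned long proper_mask;	/* affinity to apply instead */
	const char *lib_name;		/* library that marks the app */
};

/*
 * Return the mask that should actually be applied. The request is only
 * rewritten when it matches the suspicious mask AND one of the mapped
 * files contains the configured library name.
 */
static unsigned long demo_fix_affinity(const struct demo_affinity_policy *pol,
				       unsigned long requested,
				       const char *const *mapped_files,
				       size_t nr_files)
{
	size_t i;

	if (!pol->lib_name || requested != pol->suspicious_mask)
		return requested;

	for (i = 0; i < nr_files; i++)
		if (strstr(mapped_files[i], pol->lib_name))
			return pol->proper_mask;

	return requested;
}
```

Requests from tasks that do not map the library, or that ask for any
other mask, pass through unchanged.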
Change-Id: I6bb8c310ca54c03261cc721f28dfd6023ab5591a
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
With the introduction of the placement hint patch, boosted tasks will
not be scheduled from big cores. We tune the capacity margin to let
important boosted tasks get scheduled on big cores. However, the
capacity margin affects all groups of tasks, so non-boosted tasks get
more chances to be scheduled on big cores, too. This can be solved by
using a separate capacity margin for boosted tasks.
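To illustrate why a separate margin helps, here is a small sketch of a
"does the task fit" check with one margin per task group. The margin
values, the 1024-based fixed-point convention and the demo_* names are
assumptions for illustration, not the values used by this change.

```c
#include <stdbool.h>

/* Hypothetical margins, in 1024-based fixed point (1280 means 1.25x). */
struct demo_margins {
	unsigned long normal;
	unsigned long boosted;
};

/*
 * A task "fits" a CPU when its utilization, inflated by the margin of
 * its group, stays below the CPU capacity. A larger margin makes the
 * task outgrow small CPUs sooner.
 */
static bool demo_task_fits(unsigned long util, unsigned long capacity,
			   bool boosted, const struct demo_margins *m)
{
	unsigned long margin = boosted ? m->boosted : m->normal;

	return capacity * 1024 > util * margin;
}
```

With margins of 1280 (normal) and 1536 (boosted), a task with util 300
still fits a little CPU of capacity 400 when it is not boosted, but no
longer fits when it is boosted, so only the boosted task is steered
towards a big core.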
Bug: 147785606
Test: margin set correctly
Signed-off-by: Rick Yiu <rickyiu@google.com>
Change-Id: I2b02e138e36a6844afbc1ade60fe86a001814b30
Energy aware feature control was previously done through debugfs,
which will be deprecated, so move the control to sysctl.
Bug: 141333728
Test: function works as expected
Change-Id: I55411d3bb2669ba1fae3225d67cdf1cf8b3b3a7f
Signed-off-by: Rick Yiu <rickyiu@google.com>
If sysctl_sched_prefer_spread is enabled, tasks are freely migrated to
idle cpus within the same cluster to reduce the number of runnable
tasks. By default, the feature is disabled.
The user can enable the feature with:
  echo 1 > /proc/sys/kernel/sched_prefer_spread
    Aggressively spread tasks within the little cluster.
  echo 2 > /proc/sys/kernel/sched_prefer_spread
    Aggressively spread tasks within the little cluster as well as the
    big cluster, but not between big and little.
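The three settings above can be summarized by a small decision helper.
This is only a model of the documented behavior; the function and
parameter names are hypothetical.

```c
#include <stdbool.h>

/*
 * mode: value written to sched_prefer_spread (0, 1 or 2).
 * src_cluster/dst_cluster: cluster ids of the busy and the idle cpu.
 * src_is_little: whether the source cpu belongs to the little cluster.
 */
static bool demo_may_spread(unsigned int mode, int src_cluster,
			    int dst_cluster, bool src_is_little)
{
	/* Spreading never crosses the big/little boundary. */
	if (src_cluster != dst_cluster)
		return false;

	switch (mode) {
	case 1:
		return src_is_little;	/* little cluster only */
	case 2:
		return true;		/* little and big clusters */
	default:
		return false;		/* feature disabled */
	}
}
```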
Change-Id: I0a4d87bd17de3525548765472e6f388a9970f13c
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
[render: minor fixups]
Signed-off-by: Zachariah Kennedy <zkennedy87@gmail.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
This change is for general scheduler improvement.
Change-Id: I5fbadf248c0bfe27bc761686de7a925cec2e4163
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
Signed-off-by: Sai Harshini Nimmala <snimmala@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: I33e9ec890f8b54d673770d5d02dba489a8e08ce7
Signed-off-by: Sai Harshini Nimmala <snimmala@codeaurora.org>
Currently, the sched updown migration handler derives the cluster
topology from the arch topology, although the cluster information is
already populated in the walt sched_cluster. So reuse it instead of
deriving it again, and move the updown tunables support under WALT.
Change-Id: Iddf4d18ddf75cc20637281d9889f671f42369513
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
[render: minor fixup]
Signed-off-by: Zachariah Kennedy <zkennedy87@gmail.com>
This change is for general scheduler improvement.
Change-Id: I8459bcf7b412a5f301566054c28c910567548485
Signed-off-by: Sai Harshini Nimmala <snimmala@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: Ida39a3ee5e6b4b0d3255bfef95601890afd80709
Signed-off-by: Shaleen Agrawal <shalagra@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: I8ff4768d56d8e63b2cfa78e5f34cb156ee60e3da
Signed-off-by: Amir Vajid <avajid@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: I737751f065df6a5ed3093e3bda5e48750a14e4c9
Signed-off-by: Amir Vajid <avajid@codeaurora.org>
The current code is using a bitmap for the
sched_busy_hysteresis_enable_cpus tunable. Since the feature is not
enabled for any of the CPUs, the default value is printed as "\n". This
is very inconvenient for user space applications that write a new value
and later try to restore the previous value. Like other scheduler
tunables, use a bitmask to avoid this issue.
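The benefit of the bitmask form can be seen in a tiny formatting
sketch: an empty mask still prints a value that can be read and written
back. The helper name and the plain hexadecimal output format are
assumptions, not the tunable's exact format.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a cpu bitmask as hexadecimal followed by a newline. An empty
 * mask yields "0\n" rather than a bare "\n".
 */
static int demo_format_cpu_mask(unsigned long mask, char *buf, size_t len)
{
	return snprintf(buf, len, "%lx\n", mask);
}
```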
Change-Id: I0c5989606352be5382dd688602aefd753fb62317
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Currently the sched busy hysteresis feature is applied only to
CPUs other than the min capacity CPUs. This policy restricts
the flexibility on a system with more than 2 clusters. Add a
tunable to specify which CPUs need this feature. By default,
the feature is turned off for all the CPUs.
The usage of this tunable:
echo 4-7 > /proc/sys/kernel/sched_busy_hysteresis_enable_cpus
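For reference, a CPU list such as "4-7" selects CPUs 4, 5, 6 and 7. The
sketch below converts such a list into a bitmask; it is a userspace
illustration with hypothetical names and a 32-CPU limit, not the
kernel's parser.

```c
#include <stdlib.h>

#define DEMO_MAX_CPUS 32ul

/*
 * Parse a list like "4-7" or "0,2-3" into a bitmask.
 * Returns 0 on success and -1 on malformed input.
 */
static int demo_cpulist_to_mask(const char *s, unsigned long *mask)
{
	unsigned long m = 0;

	while (*s && *s != '\n') {
		unsigned long lo, hi, cpu;
		char *end;

		if (*s < '0' || *s > '9')
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;

		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}

		if (hi < lo || hi >= DEMO_MAX_CPUS)
			return -1;
		for (cpu = lo; cpu <= hi; cpu++)
			m |= 1UL << cpu;

		s = end;
		if (*s == ',')
			s++;
	}

	*mask = m;
	return 0;
}
```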
Change-Id: I636575af2c42e2774007582f3d589495c6a3a9f1
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: I310bbdc19bb65a0c562ec6a208f2da713eba954d
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[render: minor fixups]
Signed-off-by: Zachariah Kennedy <zkennedy87@gmail.com>
This change is for general scheduler improvement.
Change-Id: I7d794ad1be10a6811602fabb388facd39c8f3c53
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Zachariah Kennedy <zkennedy87@gmail.com>
This change is for general scheduler improvement.
Change-Id: I18364c6061ed7525755aaf187bf15a8cb9b54a8a
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
[render: minor fixup]
Signed-off-by: Zachariah Kennedy <zkennedy87@gmail.com>
This change is for general scheduler improvement.
Change-Id: Idef278a9551e6d7d3c1a945dcfd8804cbc7d6aff
Signed-off-by: Puja Gupta <pujag@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: I5d89acdde73f5379d68ebc8513d0bbeaac128f5d
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Signed-off-by: Jonathan Avila <avilaj@codeaurora.org>
[render: minor fixups]
Signed-off-by: Zachariah Kennedy <zkennedy87@gmail.com>
This change is for general scheduler improvement.
Change-Id: Ib963aef88d85e15fcd19cda3d3f0944b530239ab
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: Ie37ab752a4d69569bce506b0a12715bb45ece79e
Co-developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
When PROVE_LOCKING is enabled, the kernel build fails with an undefined
symbol error for irqsoff_tracing_threshold_ns, which is used by the
sysctl interface, sysctl_irqsoff_tracing_threshold_ns. As per the
PREEMPTIRQ_EVENTS Kconfig help, CONFIG_PROVE_LOCKING should be disabled
for tracing irq disable/enable events. This commit fixes the build by
making the proc entry conditional on the PROVE_LOCKING config.
Change-Id: Ie28afd31013a9c393f32ad328cedfc0517867fc4
Signed-off-by: Yadu MG <ymg@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: If1ee58a8ed59e4a9ee25dfa6fa2a1c1654e00e6d
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: I9216f9316e2bad067c10762de8d67912826b7bc7
Signed-off-by: Maria Yu <aiquny@codeaurora.org>
Co-developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[pkondeti@codeaurora.org: skip_cpu argument is implemented for fbt]
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: Iba464885e8b2172f955cfba3bd6d55743d790b32
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
khungtask by default monitors all tasks for long uninterruptible sleep.
This change introduces a sysctl option,
/proc/sys/kernel/hung_task_selective_monitoring, to enable monitoring
of selected tasks only. If this sysctl option is enabled, then only the
tasks with /proc/$PID/hang_detection_enabled set are monitored;
otherwise all tasks are monitored, as in the default case.
Some tasks may intentionally move to the uninterruptible sleep state,
which shouldn't lead to khungtask panics, as those are recoverable
hangs. So to avoid false hang reports, add an option to select the
tasks to be monitored and report/panic only for them.
By default, the feature is enabled, so only selected tasks are
monitored.
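The monitoring rule described above reduces to a single predicate. A
minimal sketch with hypothetical names:

```c
#include <stdbool.h>

/*
 * selective_monitoring: value of the hung_task_selective_monitoring
 * sysctl. hang_detection_enabled: the per-task flag exposed through
 * /proc/$PID/hang_detection_enabled.
 */
static bool demo_should_monitor(bool selective_monitoring,
				bool hang_detection_enabled)
{
	/* Without selective monitoring every task is checked. */
	return !selective_monitoring || hang_detection_enabled;
}
```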
Change-Id: I48cd8cfe73dbe2b577541fe9607190eac5556bb2
Signed-off-by: Imran Khan <kimran@codeaurora.org>
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
[sramana: Resolved minor merge conflict]
Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: Ie162a57537bb9ada66a4254d606e17d54b7a3a49
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[pkondeti@codeaurora.org: code refactoring and implemented freq to load
calculations.]
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: I01e6610bba2e8c66a628d6289eeed4e854264fdd
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
Signed-off-by: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
This change is for general scheduler improvement.
Change-Id: Ice980dde340bff8362b4f2adc679423d8f54e8e4
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
On multi cluster systems with CPUs of varying capacity, the migration
thresholds differ per cluster, as a task needs to be migrated to a CPU
in the adjacent cluster. This change adds support for accepting
multiple values for the {up,down}_migrate knobs, so that tasks can be
migrated to a CPU in the adjacent cluster.
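One way to picture multiple {up,down}_migrate values is one threshold
pair per boundary between adjacent clusters. The sketch below follows
that reading; the structure, the percent units and the already
normalized load_pct input are all assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical per-boundary thresholds, in percent. Entry i sits
 * between cluster i and cluster i + 1, so N clusters need N - 1
 * entries in each array.
 */
struct demo_migrate_knobs {
	const unsigned int *up_migrate;
	const unsigned int *down_migrate;
	size_t nr_clusters;
};

/* Return the cluster the task should run on: cur, cur + 1 or cur - 1. */
static size_t demo_target_cluster(const struct demo_migrate_knobs *k,
				  size_t cur, unsigned int load_pct)
{
	/* Too big for the current cluster: move one cluster up. */
	if (cur + 1 < k->nr_clusters && load_pct > k->up_migrate[cur])
		return cur + 1;

	/* Small enough for the cluster below: move one cluster down. */
	if (cur > 0 && load_pct < k->down_migrate[cur - 1])
		return cur - 1;

	return cur;
}
```

Keeping each down threshold below the matching up threshold leaves a
band in which the task stays put, which avoids bouncing between
adjacent clusters.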
Change-Id: I325cc71884d9bbac14475cd838a3955d53e03d1e
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
sched_boost_handler was introduced as part of commit 3f083eada363e8f
("sched: Add snapshot of Window Assisted Load Tracking (WALT)"),
but was never added to the sysctl table.
Change-Id: Id6f5dcc076d074fa5665991dd074bdc9251c8255
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>