kernel_xiaomi_cepheus

Evolution-X-Devices/kernel_xiaomi_cepheus

Author	SHA1	Message	Date
John Galt	4d152a92f9	schedutil: checkout to 934c3511b5b53	2023-08-10 12:25:32 -05:00
Quentin Perret	fe7b1e0f76	BACKPORT: FROMGIT: sched: Skip priority checks with SCHED_FLAG_KEEP_PARAMS SCHED_FLAG_KEEP_PARAMS can be passed to sched_setattr to specify that the call must not touch scheduling parameters (nice or priority). This is particularly handy for uclamp when used in conjunction with SCHED_FLAG_KEEP_POLICY as that allows to issue a syscall that only impacts uclamp values. However, sched_setattr always checks whether the priorities and nice values passed in sched_attr are valid first, even if those never get used down the line. This is useless at best since userspace can trivially bypass this check to set the uclamp values by specifying low priorities. However, it is cumbersome to do so as there is no single expression of this that skips both RT and CFS checks at once. As such, userspace needs to query the task policy first with e.g. sched_getattr and then set sched_attr.sched_priority accordingly. This is racy and slower than a single call. As the priority and nice checks are useless when SCHED_FLAG_KEEP_PARAMS is specified, simply inherit them in this case to match the policy inheritance of SCHED_FLAG_KEEP_POLICY. Reported-by: Wei Wang <wvw@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Qais Yousef <qais.yousef@arm.com> Link: https://lore.kernel.org/r/20210805102154.590709-3-qperret@google.com Bug: 190237315 (cherry picked from commit f4dddf9 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core) Signed-off-by: Quentin Perret <qperret@google.com> Change-Id: Ifdbc9262b82c7f5c0d34952ece07770a53e3f6a5 [panchajanya1999: adapt for k4.14] Signed-off-by: Panchajanya1999 <panchajanya@azure-dev.live> Signed-off-by: mcdofrenchfreis <xyzevan@androidist.net>	2023-08-10 12:25:31 -05:00
Patrick Bellasi	8be6bdb00f	cpufreq: schedutil: Fix iowait boost reset A more energy efficient update of the IO wait boosting mechanism has been introduced in: commit `a5a0809` ("cpufreq: schedutil: Make iowait boost more energy efficient") where the boost value is expected to be: - doubled at each successive wakeup from IO staring from the minimum frequency supported by a CPU - reset when a CPU is not updated for more then one tick by either disabling the IO wait boost or resetting its value to the minimum frequency if this new update requires an IO boost. This approach is supposed to "ignore" boosting for sporadic wakeups from IO, while still getting the frequency boosted to the maximum to benefit long sequence of wakeup from IO operations. However, these assumptions are not always satisfied. For example, when an IO boosted CPU enters idle for more the one tick and then wakes up after an IO wait, since in sugov_set_iowait_boost() we first check the IOWAIT flag, we keep doubling the iowait boost instead of restarting from the minimum frequency value. This misbehavior could happen mainly on non-shared frequency domains, thus defeating the energy efficiency optimization, but it can also happen on shared frequency domain systems. Let fix this issue in sugov_set_iowait_boost() by: - first check the IO wait boost reset conditions to eventually reset the boost value - then applying the correct IO boost value if required by the caller Fixes: `a5a0809` (cpufreq: schedutil: Make iowait boost more energy efficient) Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>	2023-08-10 12:25:31 -05:00
Vincent Donnefort	e37a5886fe	sched/pelt: Fix task util_est update filtering [ Upstream commit b89997aa88f0b07d8a6414c908af75062103b8c9 ] Being called for each dequeue, util_est reduces the number of its updates by filtering out when the EWMA signal is different from the task util_avg by less than 1%. It is a problem for a sudden util_avg ramp-up. Due to the decay from a previous high util_avg, EWMA might now be close enough to the new util_avg. No update would then happen while it would leave ue.enqueued with an out-of-date value. Taking into consideration the two util_est members, EWMA and enqueued for the filtering, ensures, for both, an up-to-date value. This is for now an issue only for the trace probe that might return the stale value. Functional-wise, it isn't a problem, as the value is always accessed through max(enqueued, ewma). This problem has been observed using LISA's UtilConvergence:test_means on the sd845c board. No regression observed with Hackbench on sd845c and Perf-bench sched pipe on hikey/hikey960. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20210225165820.1377125-1-vincent.donnefort@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-08-10 12:25:31 -05:00
Satya Durga Srinivasu Prabhala	182fc3909a	sched/fair: honor uclamp restrictions in fbt() While calculating untilization of CPU during task placement in fbt(), current code doesn't take uclamp into account which would lead to selection of incorrect CPU for the task when uclamp restrictions are in place for the task. Change-Id: I8371affe3b37733d222e5c57953e53f91fc19a53 Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>	2023-08-10 12:25:30 -05:00
Shaleen Agrawal	c9510f56c8	sched: Enable latency sensitive feature Make use of the existing need_idle feature to incorporate upstream latency sensitive tasks. Change-Id: Ie1513187d024b93c8b619d9e0a35d84195488696 Signed-off-by: Shaleen Agrawal <shalagra@codeaurora.org>	2023-08-10 12:25:30 -05:00
John Galt	f0c098c6b4	sched/fair: further improve migration margins	2023-08-10 12:25:30 -05:00
John Galt	a8e5d7d98f	sched/fair: reset migration margins for balance Reapply: Even properly calculated from table there are significant efficiency regressions.	2023-08-10 12:25:29 -05:00
kondors1995	525bb7715c	Revert "sched/fair: Revert Google's capacity margin hacks" This reverts commit `e7c82d8c4d`.	2023-08-10 12:25:29 -05:00
Adam W. Willis	f792575a16	sched/energy: Checkout to branch android-4.14 of https://android.googlesource.com/kernel/common Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com>	2023-08-10 12:25:28 -05:00
Tyler Nijmeh	bc8e8d1549	sched: Increase the time a task is considered cache-hot According to RedHat, increasing this tunable reduces the number of task migrations. This should reduce time spent balancing tasks and increase per-task performance. https://www.redhat.com/files/summit/session-assets/2018/Performance-analysis-and-tuning-of-Red-Hat-Enterprise-Linux-Part-1.pdf Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>	2023-08-09 17:57:12 -05:00
Qais Yousef	fff0de1059	BACKPORT: sched/uclamp: Filter out uclamp_max for small tasks DISCLAIMER: ===================================================================== This patch is intended to go upstream after collecting feedback from Android community that it resolves the issues reported by various partners. It is not meant to be merged into android-mainline. ===================================================================== uclamp_max effectiveness could be easily impacted by small transient tasks that wake up frequency to do small work then go back to sleep. If there's a busy task that is capped by uclamp_max to run at a smaller frequency, due to max-aggregation rule tasks that wake up on the same cpu will increase the rq->uclamp_max value if they were higher than the capped task. Given that all tasks by default have a uclamp_max = 1024, this is the likely case by default. Note that since the capped task is likely to be a busy and throttled one, its util, and hence the rq->util, will be very high and as soon as we lift the capping the requested frequency will be very high. To address this issue of increasing the resilience of uclamp_max against these transient tasks that don't really need to run at a higher frequency, we implement a simple filter mechanism to ignore uclamp_max for those tasks. The algorithm looks at the runtime of the task and compares it to sched_slice(). By default we assume any task that its runtime is 1/4th of sched_slice() or less is a small transient task that we can ignore its uclamp_max requirement. runtime < sched_slice() / divider We can tweak the divider by /proc/sys/kernel/sched_util_uclamp_max_filter_divider sysctl. It accepts values 0-4. divider = 1 << sched_util_uclamp_max_filter_divider We add a new task_tick_uclamp() function to verify this condition periodically and ensure the conditions checked at wake up are still true - in case this transient task suddenly becomes a busy one. For EAS, we can't use sched_slice() there to figure out if uclamp_max will be ignored because the task is not enqueued yet. So we leave it as-is to figure out the placement based on worst case scenario. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Change-Id: Ie3afa93a7d70dab5b7c22e820cc078ffd0e891ef [yaro: ported to msm-5.4 and remove sysctl parts for now] Signed-off-by: Yaroslav Furman <yaro330@gmail.com>	2023-08-09 17:24:56 -05:00
Valentin Schneider	36ff8821a5	BACKPORT: sched/fair: Make task_fits_capacity() consider uclamp restrictions task_fits_capacity() drives CPU selection at wakeup time, and is also used to detect misfit tasks. Right now it does so by comparing task_util_est() with a CPU's capacity, but doesn't take into account uclamp restrictions. There's a few interesting uses that can come out of doing this. For instance, a low uclamp.max value could prevent certain tasks from being flagged as misfit tasks, so they could merrily remain on low-capacity CPUs. Similarly, a high uclamp.min value would steer tasks towards high capacity CPUs at wakeup (and, should that fail, later steered via misfit balancing), so such "boosted" tasks would favor CPUs of higher capacity. Introduce uclamp_task_util() and make task_fits_capacity() use it. [QP: fixed missing dependency on fits_capacity() by using the open coded alternative] Bug: 120440300 Tested-By: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Quentin Perret <qperret@google.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191211113851.24241-5-valentin.schneider@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit a7008c07a568278ed2763436404752a98004c7ff) Signed-off-by: Quentin Perret <qperret@google.com> Change-Id: Iabde2eda7252c3bcc273e61260a7a12a7de991b1	2023-08-09 17:24:53 -05:00
balgxmr	3fbf6a0d0e	Revert "Kernel/sched: Reduce latency for better responsiveness" This reverts commit `e94cab5ff1`.	2023-08-09 17:21:54 -05:00
balgxmr	5b5f97adea	Revert "Kernel/sched: Reduce Latency [Pafcholini]" This reverts commit `382c9d35cf`.	2023-08-09 17:21:50 -05:00
balgxmr	434f599332	Merge branch 'upstream-linux-4.14.y' of https://android.googlesource.com/kernel/common into rebase	2023-08-09 16:44:33 -05:00
Linus Torvalds	e8c280f3ab	sched_getaffinity: don't assume 'cpumask_size()' is fully initialized [ Upstream commit 6015b1aca1a233379625385feb01dd014aca60b5 ] The getaffinity() system call uses 'cpumask_size()' to decide how big the CPU mask is - so far so good. It is indeed the allocation size of a cpumask. But the code also assumes that the whole allocation is initialized without actually doing so itself. That's wrong, because we might have fixed-size allocations (making copying and clearing more efficient), but not all of it is then necessarily used if 'nr_cpu_ids' is smaller. Having checked other users of 'cpumask_size()', they all seem to be ok, either using it purely for the allocation size, or explicitly zeroing the cpumask before using the size in bytes to copy it. See for example the ublk_ctrl_get_queue_affinity() function that uses the proper 'zalloc_cpumask_var()' to make sure that the whole mask is cleared, whether the storage is on the stack or if it was an external allocation. Fix this by just zeroing the allocation before using it. Do the same for the compat version of sched_getaffinity(), which had the same logic. Also, for consistency, make sched_getaffinity() use 'cpumask_bits()' to access the bits. For a cpumask_var_t, it ends up being a pointer to the same data either way, but it's just a good idea to treat it like you would a 'cpumask_t'. The compat case already did that. Reported-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/lkml/7d026744-6bd6-6827-0471-b5e8eae0be3f@arm.com/ Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2023-04-05 11:14:19 +02:00
Vincent Guittot	6d26b74599	sched/fair: Sanitize vruntime of entity being migrated commit a53ce18cacb477dd0513c607f187d16f0fa96f71 upstream. Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed") fixes an overflowing bug, but ignore a case that se->exec_start is reset after a migration. For fixing this case, we delay the reset of se->exec_start after placing the entity which se->exec_start to detect long sleeping task. In order to take into account a possible divergence between the clock_task of 2 rqs, we increase the threshold to around 104 days. Fixes: 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed") Originally-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Zhang Qiao <zhangqiao22@huawei.com> Link: https://lore.kernel.org/r/20230317160810.107988-1-vincent.guittot@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-04-05 11:14:19 +02:00
Zhang Qiao	53ab79e0ba	sched/fair: sanitize vruntime of entity being placed commit 829c1651e9c4a6f78398d3e67651cef9bb6b42cc upstream. When a scheduling entity is placed onto cfs_rq, its vruntime is pulled to the base level (around cfs_rq->min_vruntime), so that the entity doesn't gain extra boost when placed backwards. However, if the entity being placed wasn't executed for a long time, its vruntime may get too far behind (e.g. while cfs_rq was executing a low-weight hog), which can inverse the vruntime comparison due to s64 overflow. This results in the entity being placed with its original vruntime way forwards, so that it will effectively never get to the cpu. To prevent that, ignore the vruntime of the entity being placed if it didn't execute for much longer than the characteristic sheduler time scale. [rkagan: formatted, adjusted commit log, comments, cutoff value] Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Co-developed-by: Roman Kagan <rkagan@amazon.de> Signed-off-by: Roman Kagan <rkagan@amazon.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20230130122216.3555094-1-rkagan@amazon.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-04-05 11:14:18 +02:00
Kees Cook	a83bcc5fc4	panic: Consolidate open-coded panic_on_warn checks commit 79cc1ba7badf9e7a12af99695a557e9ce27ee967 upstream. Several run-time checkers (KASAN, UBSAN, KFENCE, KCSAN, sched) roll their own warnings, and each check "panic_on_warn". Consolidate this into a single function so that future instrumentation can be added in a single location. Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Gow <davidgow@google.com> Cc: tangmeng <tangmeng@uniontech.com> Cc: Jann Horn <jannh@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Petr Mladek <pmladek@suse.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: kasan-dev@googlegroups.com Cc: linux-mm@kvack.org Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Link: https://lore.kernel.org/r/20221117234328.594699-4-keescook@chromium.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-02-06 07:46:34 +01:00
baalgx	a585a8d508	Merge remote-tracking branch 'soviet/13.0' into soviet-13	2022-11-05 17:05:43 -05:00
John Stultz	daa0edde11	ANDROID: sched: Fix off-by-one with cpupri MAX_RT_PRIO evaluation This patch addresses an issue seen where SCHED_FIFO prio 99 tasks were being woken up on a cpu where a long-running softirq was executing, and the RT task was not being migrated, causing long (10+ms wakeup latencies). Prior to upstream commit 934fc3314b39 ("sched/cpupri: Remap CPUPRI_NORMAL to MAX_RT_PRIO-1") the task->prio -> cpupri mapping is a little ackward. For RT tasks, its calculated as: cpupri = MAX_RT_PRIO - prio + 1; See: https://android.googlesource.com/kernel/common/+/refs/heads/android13-5.10/kernel/sched/cpupri.c#39 This is added ontop of the also ackward detail that the task->prio is inverted (RT prio99 -> 0), means the cpupri mapping for RT tasks goes from 2->101. This makes it easy to evaluate the cpupri incorrectly. Which it turns out happened In commit 3adfd8e344ac ("ANDROID: sched: avoid placing RT threads on cores handling softirqs"): `3adfd8e344`%5E%21/ With the lines: int task_pri = convert_prio(p->prio); bool drop_nopreempts = task_pri <= MAX_RT_PRIO; Where the added logic to decide to migrate a rt task off of a cpu depended on this drop_nopreempts being true. This works properly for rt tasks from prio 99 to 1, but for the case of task->prio == 0 (userland rt prio 99 tasks) it breaks, as the cpupri will be MAX_RT_PRIO - 0 + 1, which then gets checked as <= MAX_RT_PRIO. This prevents the cpu from being dropped from the scheduling set and prevents the rt user prio 99 task from migrating, which results in high priority rt tasks being left on cpus where long running softirqs are executing, causing long latencies. This patch fixes the off by one by changing the evaulation to MAX_RT_PRIO + 1, so that user-prio 99 tasks will also be migrated if appropriate. Luckilly this odd cpupri mapping has been fixed upstream, making this patch no longer necessary in 5.15 and newer kernels. Fixes: 3adfd8e344ac ("ANDROID: sched: avoid placing RT threads on cores handling softirqs") Signed-off-by: John Stultz <jstultz@google.com> Change-Id: Ia2db7cd461eb4c90f5850b791de1ae95582f7530	2022-10-17 12:37:07 +03:00
Hailong Liu	65284d78d9	psi: Fix trigger being fired unexpectedly at initial When a trigger being created, its win.start_value and win.start_time are reset to zero. If group->total[PSI_POLL][t->state] has accumulated before, this trigger will be fired unexpectedly in the next period, even if its growth time does not reach its threshold. So set the window of the new trigger to the current state value. Signed-off-by: Hailong Liu <liuhailong@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Suren Baghdasaryan <surenb@google.com> Link: https://lore.kernel.org/r/1648789811-3788971-1-git-send-email-liuhailong@linux.alibaba.com Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-10-07 12:36:39 +03:00
Kees Cook	84e16102a1	treewide: kzalloc() -> kcalloc() The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kzalloc( - sizeof(u8) * COUNT + COUNT , ...) \| kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| kzalloc( - sizeof(char) * COUNT + COUNT , ...) \| kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) \| kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) \| kzalloc(sizeof(TYPE) * C2, ...) \| kzalloc(C1 * C2 * C3, ...) \| kzalloc(C1 * C2, ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) \| - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) \| - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com> [fiqri19102002: Adapt to new extcon.c changes] Signed-off-by: Fiqri Ardyansyah <fiqri15072019@gmail.com> Conflicts: arch/arm64/mm/context.c pwnrazr: an upstream commit conflicted, but I think upstream one is better? Original commit: `299d38205a` `bitmap_zalloc`	2022-10-07 11:26:56 +03:00
Kees Cook	b6399c388a	treewide: kmalloc() -> kmalloc_array() The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| kmalloc( - sizeof(u8) * COUNT + COUNT , ...) \| kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) \| kmalloc( - sizeof(char) * COUNT + COUNT , ...) \| kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) \| kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) \| kmalloc(sizeof(TYPE) * C2, ...) \| kmalloc(C1 * C2 * C3, ...) \| kmalloc(C1 * C2, ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) \| - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) \| - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Adam W. Willis <return.of.octobot@gmail.com> Signed-off-by: Fiqri Ardyansyah <fiqri15072019@gmail.com>	2022-10-07 11:19:36 +03:00
friedrich420	382c9d35cf	Kernel/sched: Reduce Latency [Pafcholini] Signed-off-by: HolyAngel <slverwolf@gmail.com> Signed-off-by: Salllz <sal235222727@gmail.com> Signed-off-by: alanndz <alanndz7@gmail.com> Signed-off-by: Cyber Knight <cyberknight755@gmail.com>	2022-09-22 22:47:03 -05:00
holyangel	e94cab5ff1	Kernel/sched: Reduce latency for better responsiveness Signed-off-by: HolyAngel <slverwolf@gmail.com> Signed-off-by: Salllz <sal235222727@gmail.com> Signed-off-by: alanndz <alanndz7@gmail.com> Signed-off-by: Cyber Knight <cyberknight755@gmail.com>	2022-09-22 22:46:59 -05:00
Sultan Alsawaf	e7c82d8c4d	sched/fair: Revert Google's capacity margin hacks These aren't needed now that the task migration logic is fixed. xNombre: Add boosted margins Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>	2022-09-22 17:33:21 +03:00
Lingutla Chandrasekhar	c1e04a4d13	sched: fair: consider all running tasks in cpu for load balance Load_balancer considers only cfs running tasks for finding busiest cpu to do load balancing. But cpu may be busy with other type tasks (ex: RT), then that cpu might not selected as busy cpu due to weight vs nr_run checks fails). And possibley cfs tasks running on that cpu would suffer till other type tasks finishes or weight checks passes, while other cpus sitting idle and not able to do load balance. So, consider all running tasks to check cpu busieness. Change-Id: Iddf3f668507e20359f6388fc30ff5897d234c902 Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org> Signed-off-by: atndko <z1281552865@gmail.com> Signed-off-by: Cyber Knight <cyberknight755@gmail.com>	2022-09-22 17:33:20 +03:00
Lucas Stach	1f2196de96	sched/deadline: Fix stale throttling on de-/boosted tasks When a boosted task gets throttled, what normally happens is that it's immediately enqueued again with ENQUEUE_REPLENISH, which replenishes the runtime and clears the dl_throttled flag. There is a special case however: if the throttling happened on sched-out and the task has been deboosted in the meantime, the replenish is skipped as the task will return to its normal scheduling class. This leaves the task with the dl_throttled flag set. Now if the task gets boosted up to the deadline scheduling class again while it is sleeping, it's still in the throttled state. The normal wakeup however will enqueue the task with ENQUEUE_REPLENISH not set, so we don't actually place it on the rq. Thus we end up with a task that is runnable, but not actually on the rq and neither a immediate replenishment happens, nor is the replenishment timer set up, so the task is stuck in forever-throttled limbo. Clear the dl_throttled flag before dropping back to the normal scheduling class to fix this issue. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200831110719.2126930-1-l.stach@pengutronix.de Signed-off-by: Zlatan Radovanovic <zlatan.radovanovic@fet.ba> Signed-off-by: Cyber Knight <cyberknight755@gmail.com>	2022-09-22 17:33:20 +03:00
Vincent Guittot	47a4df59c1	sched/fair: Reduce minimal imbalance threshold The 25% default imbalance threshold for DIE and NUMA domain is large enough to generate significant unfairness between threads. A typical example is the case of 11 threads running on 2x4 CPUs. The imbalance of 20% between the 2 groups of 4 cores is just low enough to not trigger the load balance between the 2 groups. We will have always the same 6 threads on one group of 4 CPUs and the other 5 threads on the other group of CPUS. With a fair time sharing in each group, we ends up with +20% running time for the group of 5 threads. Consider decreasing the imbalance threshold for overloaded case where we use the load to balance task and to ensure fair time sharing. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Acked-by: Hillf Danton <hdanton@sina.com> Link: https://lkml.kernel.org/r/20200921072424.14813-3-vincent.guittot@linaro.org Signed-off-by: Zlatan Radovanovic <zlatan.radovanovic@fet.ba> Signed-off-by: Cyber Knight <cyberknight755@gmail.com>	2022-09-22 17:33:20 +03:00
Vincent Guittot	ca5a964774	sched/fair: Minimize concurrent LBs between domain level sched domains tend to trigger simultaneously the load balance loop but the larger domains often need more time to collect statistics. This slowness makes the larger domain trying to detach tasks from a rq whereas tasks already migrated somewhere else at a sub-domain level. This is not a real problem for idle LB because the period of smaller domains will increase with its CPUs being busy and this will let time for higher ones to pulled tasks. But this becomes a problem when all CPUs are already busy because all domains stay synced when they trigger their LB. A simple way to minimize simultaneous LB of all domains is to decrement the the busy interval by 1 jiffies. Because of the busy_factor, the interval of larger domain will not be a multiple of smaller ones anymore. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/20200921072424.14813-4-vincent.guittot@linaro.org Signed-off-by: Zlatan Radovanovic <zlatan.radovanovic@fet.ba> Signed-off-by: Cyber Knight <cyberknight755@gmail.com>	2022-09-22 17:33:20 +03:00
Vincent Guittot	c54355abd4	sched/fair: Reduce busy load balance interval The busy_factor, which increases load balance interval when a cpu is busy, is set to 32 by default. This value generates some huge LB interval on large system like the THX2 made of 2 node x 28 cores x 4 threads. For such system, the interval increases from 112ms to 3584ms at MC level. And from 228ms to 7168ms at NUMA level. Even on smaller system, a lower busy factor has shown improvement on the fair distribution of the running time so let reduce it for all. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Phil Auld <pauld@redhat.com> Link: https://lkml.kernel.org/r/20200921072424.14813-5-vincent.guittot@linaro.org	2022-09-22 17:33:20 +03:00
Qais Yousef	977a80d795	sched/uclamp: Fix rq->uclamp_max not set on first enqueue [ Upstream commit 315c4f884800c45cb6bd8c90422fad554a8b9588 ] Commit d81ae8aac85c ("sched/uclamp: Fix initialization of struct uclamp_rq") introduced a bug where uclamp_max of the rq is not reset to match the woken up task's uclamp_max when the rq is idle. The code was relying on rq->uclamp_max initialized to zero, so on first enqueue static inline void uclamp_rq_inc_id(struct rq rq, struct task_struct p, enum uclamp_id clamp_id) { ... if (uc_se->value > READ_ONCE(uc_rq->value)) WRITE_ONCE(uc_rq->value, uc_se->value); } was actually resetting it. But since commit d81ae8aac85c changed the default to 1024, this no longer works. And since rq->uclamp_flags is also initialized to 0, neither above code path nor uclamp_idle_reset() update the rq->uclamp_max on first wake up from idle. This is only visible from first wake up(s) until the first dequeue to idle after enabling the static key. And it only matters if the uclamp_max of this task is < 1024 since only then its uclamp_max will be effectively ignored. Fix it by properly initializing rq->uclamp_flags = UCLAMP_FLAG_IDLE to ensure uclamp_idle_reset() is called which then will update the rq uclamp_max value as expected. Fixes: d81ae8aac85c ("sched/uclamp: Fix initialization of struct uclamp_rq") Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <Valentin.Schneider@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/20211202112033.1705279-1-qais.yousef@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:20 +03:00
Quentin Perret	e9a6a0f106	sched: Fix UCLAMP_FLAG_IDLE setting The UCLAMP_FLAG_IDLE flag is set on a runqueue when dequeueing the last uclamp active task (that is, when buckets.tasks reaches 0 for all buckets) to maintain the last uclamp.max and prevent blocked util from suddenly becoming visible. However, there is an asymmetry in how the flag is set and cleared which can lead to having the flag set whilst there are active tasks on the rq. Specifically, the flag is cleared in the uclamp_rq_inc() path, which is called at enqueue time, but set in uclamp_rq_dec_id() which is called both when dequeueing a task _and_ in the update_uclamp_active() path. As a result, when both uclamp_rq_{dec,ind}_id() are called from update_uclamp_active(), the flag ends up being set but not cleared, hence leaving the runqueue in a broken state. Fix this by clearing the flag in update_uclamp_active() as well. Fixes: e496187da710 ("sched/uclamp: Enforce last task's UCLAMP_MAX") Reported-by: Rick Yiu <rickyiu@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Qais Yousef <qais.yousef@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20210805102154.590709-2-qperret@google.com Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:20 +03:00
Quentin Perret	957d737358	ANDROID: sched: Make uclamp changes depend on CAP_SYS_NICE There is currently nothing preventing tasks from changing their per-task clamp values in anyway that they like. The rationale is probably that system administrators are still able to limit those clamps thanks to the cgroup interface. However, this causes pain in a system where both per-task and per-cgroup clamp values are expected to be under the control of core system components (as is the case for Android). To fix this, let's require CAP_SYS_NICE to change per-task clamp values. There are ongoing discussions upstream about more flexible approaches than this using the RLIMIT API -- see [1]. But the upstream discussion has not converged yet, and this is way too late for UAPI changes in android12-5.10 anyway, so let's apply this change which provides the behaviour we want without actually impacting UAPIs. [1] https://lore.kernel.org/lkml/20210623123441.592348-4-qperret@google.com/ Bug: 187186685 Signed-off-by: Quentin Perret <qperret@google.com> Change-Id: I749312a77306460318ac5374cf243d00b78120dd Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:20 +03:00
Xuewen Yan	08a6710f89	sched/uclamp: Ignore max aggregation if rq is idle [ Upstream commit 3e1493f46390618ea78607cb30c58fc19e2a5035 ] When a task wakes up on an idle rq, uclamp_rq_util_with() would max aggregate with rq value. But since there is no task enqueued yet, the values are stale based on the last task that was running. When the new task actually wakes up and enqueued, then the rq uclamp values should reflect that of the newly woken up task effective uclamp values. This is a problem particularly for uclamp_max because it default to 1024. If a task p with uclamp_max = 512 wakes up, then max aggregation would ignore the capping that should apply when this task is enqueued, which is wrong. Fix that by ignoring max aggregation if the rq is idle since in that case the effective uclamp value of the rq will be the ones of the task that will wake up. Fixes: 9d20ad7dfc9a ("sched/uclamp: Add uclamp_util_with()") Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> [qias: Changelog] Reviewed-by: Qais Yousef <qais.yousef@arm.com> Link: https://lore.kernel.org/r/20210630141204.8197-1-xuewen.yan94@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:20 +03:00
Qais Yousef	7656386687	sched/uclamp: Fix uclamp_tg_restrict() [ Upstream commit 0213b7083e81f4acd69db32cb72eb4e5f220329a ] Now cpu.uclamp.min acts as a protection, we need to make sure that the uclamp request of the task is within the allowed range of the cgroup, that is it is clamp()'ed correctly by tg->uclamp[UCLAMP_MIN] and tg->uclamp[UCLAMP_MAX]. As reported by Xuewen [1] we can have some corner cases where there's inversion between uclamp requested by task (p) and the uclamp values of the taskgroup it's attached to (tg). Following table demonstrates 2 corner cases: \| p \| tg \| effective -----------+-----+------+----------- CASE 1 -----------+-----+------+----------- uclamp_min \| 60% \| 0% \| 60% -----------+-----+------+----------- uclamp_max \| 80% \| 50% \| 50% -----------+-----+------+----------- CASE 2 -----------+-----+------+----------- uclamp_min \| 0% \| 30% \| 30% -----------+-----+------+----------- uclamp_max \| 20% \| 50% \| 20% -----------+-----+------+----------- With this fix we get: \| p \| tg \| effective -----------+-----+------+----------- CASE 1 -----------+-----+------+----------- uclamp_min \| 60% \| 0% \| 50% -----------+-----+------+----------- uclamp_max \| 80% \| 50% \| 50% -----------+-----+------+----------- CASE 2 -----------+-----+------+----------- uclamp_min \| 0% \| 30% \| 30% -----------+-----+------+----------- uclamp_max \| 20% \| 50% \| 30% -----------+-----+------+----------- Additionally uclamp_update_active_tasks() must now unconditionally update both UCLAMP_MIN/MAX because changing the tg's UCLAMP_MAX for instance could have an impact on the effective UCLAMP_MIN of the tasks. \| p \| tg \| effective -----------+-----+------+----------- old -----------+-----+------+----------- uclamp_min \| 60% \| 0% \| 50% -----------+-----+------+----------- uclamp_max \| 80% \| 50% \| 50% -----------+-----+------+----------- new -----------+-----+------+----------- uclamp_min \| 60% \| 0% \| 60% -----------+-----+------+----------- uclamp_max \| 80% \|70% \| 70% -----------+-----+------+----------- [1] https://lore.kernel.org/lkml/CAB8ipk_a6VFNjiEnHRHkUMBKbA+qzPQvhtNjJ_YNzQhqV_o8Zw@mail.gmail.com/ Fixes: 0c18f2ecfcc2 ("sched/uclamp: Fix wrong implementation of cpu.uclamp.min") Reported-by: Xuewen Yan <xuewen.yan94@gmail.com> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210617165155.3774110-1-qais.yousef@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:20 +03:00
Qais Yousef	07c3a9af2f	sched/uclamp: Fix wrong implementation of cpu.uclamp.min [ Upstream commit 0c18f2ecfcc274a4bcc1d122f79ebd4001c3b445 ] cpu.uclamp.min is a protection as described in cgroup-v2 Resource Distribution Model Documentation/admin-guide/cgroup-v2.rst which means we try our best to preserve the minimum performance point of tasks in this group. See full description of cpu.uclamp.min in the cgroup-v2.rst. But the current implementation makes it a limit, which is not what was intended. For example: tg->cpu.uclamp.min = 20% p0->uclamp[UCLAMP_MIN] = 0 p1->uclamp[UCLAMP_MIN] = 50% Previous Behavior (limit): p0->effective_uclamp = 0 p1->effective_uclamp = 20% New Behavior (Protection): p0->effective_uclamp = 20% p1->effective_uclamp = 50% Which is inline with how protections should work. With this change the cgroup and per-task behaviors are the same, as expected. Additionally, we remove the confusing relationship between cgroup and !user_defined flag. We don't want for example RT tasks that are boosted by default to max to change their boost value when they attach to a cgroup. If a cgroup wants to limit the max performance point of tasks attached to it, then cpu.uclamp.max must be set accordingly. Or if they want to set different boost value based on cgroup, then sysctl_sched_util_clamp_min_rt_default must be used to NOT boost to max and set the right cpu.uclamp.min for each group to let the RT tasks obtain the desired boost value when attached to that group. As it stands the dependency on !user_defined flag adds an extra layer of complexity that is not required now cpu.uclamp.min behaves properly as a protection. The propagation model of effective cpu.uclamp.min in child cgroups as implemented by cpu_util_update_eff() is still correct. The parent protection sets an upper limit of what the child cgroups will effectively get. Fixes: 3eac870a3247 (sched/uclamp: Use TG's clamps to restrict TASK's clamps) Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210510145032.1934078-2-qais.yousef@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:20 +03:00
Quentin Perret	973affb92e	FROMLIST: sched: Fix out-of-bound access in uclamp Util-clamp places tasks in different buckets based on their clamp values for performance reasons. However, the size of buckets is currently computed using a rounding division, which can lead to an off-by-one error in some configurations. For instance, with 20 buckets, the bucket size will be 1024/20=51. A task with a clamp of 1024 will be mapped to bucket id 1024/51=20. Sadly, correct indexes are in range [0,19], hence leading to an out of bound memory access. Clamp the bucket id to fix the issue. Bug: 186415778 Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting") Suggested-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20210430151412.160913-1-qperret@google.com Change-Id: Ibc28662de5554f80f97533b60e747f8a6e871c56 Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:20 +03:00
Qinglang Miao	31714e404d	sched/uclamp: Remove unnecessary mutex_init() The uclamp_mutex lock is initialized statically via DEFINE_MUTEX(), it is unnecessary to initialize it runtime via mutex_init(). Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20200725085629.98292-1-miaoqinglang@huawei.com Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:19 +03:00
Qais Yousef	e954be1dbf	sched/uclamp: Fix locking around cpu_util_update_eff() cpu_cgroup_css_online() calls cpu_util_update_eff() without holding the uclamp_mutex or rcu_read_lock() like other call sites, which is a mistake. The uclamp_mutex is required to protect against concurrent reads and writes that could update the cgroup hierarchy. The rcu_read_lock() is required to traverse the cgroup data structures in cpu_util_update_eff(). Surround the caller with the required locks and add some asserts to better document the dependency in cpu_util_update_eff(). Fixes: 7226017ad37a ("sched/uclamp: Fix a bug in propagating uclamp value in new cgroups") Reported-by: Quentin Perret <qperret@google.com> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210510145032.1934078-3-qais.yousef@arm.com Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:19 +03:00
Qais Yousef	16d73cafea	FROMGIT: sched/uclamp: Fix a bug in propagating uclamp value in new cgroups When a new cgroup is created, the effective uclamp value wasn't updated with a call to cpu_util_update_eff() that looks at the hierarchy and update to the most restrictive values. Fix it by ensuring to call cpu_util_update_eff() when a new cgroup becomes online. Without this change, the newly created cgroup uses the default root_task_group uclamp values, which is 1024 for both uclamp_{min, max}, which will cause the rq to to be clamped to max, hence cause the system to run at max frequency. The problem was observed on Ubuntu server and was reproduced on Debian and Buildroot rootfs. By default, Ubuntu and Debian create a cpu controller cgroup hierarchy and add all tasks to it - which creates enough noise to keep the rq uclamp value at max most of the time. Imitating this behavior makes the problem visible in Buildroot too which otherwise looks fine since it's a minimal userspace. Bug: 120440300 Fixes: 0b60ba2dd342 ("sched/uclamp: Propagate parent clamps") Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Doug Smythies <dsmythies@telus.net> Link: https://lore.kernel.org/lkml/000701d5b965$361b6c60$a2524520$@net/ (cherry picked from commit 7226017ad37a888915628e59a84a2d1e57b40707 https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core) Signed-off-by: Qais Yousef <qais.yousef@arm.com> Change-Id: I9636c60e04d58bbfc5041df1305b34a12b5a3f46 Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:19 +03:00
Qais Yousef	ce1a62791e	UPSTREAM: sched/uclamp: Fix incorrect condition uclamp_update_active() should perform the update when p->uclamp[clamp_id].active is true. But when the logic was inverted in [1], the if condition wasn't inverted correctly too. [1] https://lore.kernel.org/lkml/20190902073836.GO2369@hirez.programming.kicks-ass.net/ Bug: 120440300 Reported-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Patrick Bellasi <patrick.bellasi@matbug.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: babbe170e053 ("sched/uclamp: Update CPU's refcount on TG's clamp changes") Link: https://lkml.kernel.org/r/20191114211052.15116-1-qais.yousef@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit 6e1ff0773f49c7d38e8b4a9df598def6afb9f415) Signed-off-by: Qais Yousef <qais.yousef@arm.com> Change-Id: I51b58a6089290277e08a0aaa72b86f852eec1512 Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:19 +03:00
Peter Zijlstra	880c8ae64d	sched/swait: Switch to full exclusive mode Linus noted that swait basically implements exclusive mode -- because swake_up() only wakes a single waiter. And because of that it should take care to properly deal with the interruptible case. In short, the problem is that swake_up() can race with a signal. In this this case it is possible the swake_up() 'wakes' the waiter that is already on the way out because it just got a signal and the wakeup gets lost. The normal wait code is very careful and avoids this situation, make sure we do too. Copy the exact exclusive semantics from wait. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: bigeasy@linutronix.de Cc: oleg@redhat.com Cc: paulmck@linux.vnet.ibm.com Cc: pbonzini@redhat.com Link: https://lkml.kernel.org/r/20180612083909.209762413@infradead.org	2022-09-22 17:33:18 +03:00
Andrzej Perczak	45834382f7	sched: Set uclamp_util_min_rt_default to 0 Taken from oriole init script. This fixes a problem with stuck freqs on little core. Signed-off-by: Andrzej Perczak <linux@andrzejperczak.com>	2022-09-22 17:33:18 +03:00
darkhz	c57fac5de0	kernel: tune_dummy: Add prefer_high_cap node	2022-09-22 17:33:18 +03:00
darkhz	5d079db98a	kernel: Restore schedtune compatibility When SCHED_TUNE is disabled, certain elements in userspace, like libperfmgr, may throw errors when it doesn't detect SCHEDTUNE-related nodes. So, create a few dummy nodes to satisfy said userspace elements.	2022-09-22 17:33:18 +03:00
Dietmar Eggemann	b6b69eece5	sched/uclamp: Allow to reset a task uclamp constraint value In case the user wants to stop controlling a uclamp constraint value for a task, use the magic value -1 in sched_util_{min,max} with the appropriate sched_flags (SCHED_FLAG_UTIL_CLAMP_{MIN,MAX}) to indicate the reset. The advantage over the 'additional flag' approach (i.e. introducing SCHED_FLAG_UTIL_CLAMP_RESET) is that no additional flag has to be exported via uapi. This avoids the need to document how this new flag has be used in conjunction with the existing uclamp related flags. The following subtle issue is fixed as well. When a uclamp constraint value is set on a !user_defined uclamp_se it is currently first reset and then set. Fix this by AND'ing !user_defined with !SCHED_FLAG_UTIL_CLAMP which stands for the 'sched class change' case. The related condition 'if (uc_se->user_defined)' moved from __setscheduler_uclamp() into uclamp_reset(). Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Yun Hsiang <hsiang023167@gmail.com> Link: https://lkml.kernel.org/r/20201113113454.25868-1-dietmar.eggemann@arm.com	2022-09-22 17:33:18 +03:00
darkhz	b9be58948d	sched/uclamp: Use wrappers for _{read, write, show} functions In commit: sched/uclamp: Move all tunables to cpusets, We directly modified and exported _{read, write, show} functions to cpuset code.This makes the resulting code quite inconsistent with the upstream version. We now use wrappers to reflect the original function, and to preserve the code in accordance with upstream/mainline. This is a purely cosmetic change, and no new behaviour should be exhibited on applying this commit.	2022-09-22 17:33:18 +03:00

1 2 3 4 5 ...

3421 Commits