30045 Commits

Rafael J. Wysocki
a6a20216ad cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()
Notice that ignore_dl_rate_limit() need not piggyback on the
limits_changed handling to achieve its goal (which is to enforce a
frequency update before its due time).

Namely, if sugov_should_update_freq() is updated to check
sg_policy->need_freq_update and return 'true' if it is set when
sg_policy->limits_changed is not set, ignore_dl_rate_limit() may
set the former directly instead of setting the latter, so it can
avoid hitting the memory barrier in sugov_should_update_freq().

Update the code accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/10666429.nUPlyArG6x@rjwysocki.net
2025-05-20 21:57:27 +03:00
Rafael J. Wysocki
38ac58ce7f cpufreq/sched: Explicitly synchronize limits_changed flag handling
The handling of the limits_changed flag in struct sugov_policy needs to
be explicitly synchronized to ensure that cpufreq policy limits updates
will not be missed in some cases.

Without that synchronization it is theoretically possible that
the limits_changed update in sugov_should_update_freq() will be
reordered with respect to the reads of the policy limits in
cpufreq_driver_resolve_freq() and in that case, if the limits_changed
update in sugov_limits() clobbers the one in sugov_should_update_freq(),
the new policy limits may not take effect for a long time.

Likewise, the limits_changed update in sugov_limits() may theoretically
get reordered with respect to the updates of the policy limits in
cpufreq_set_policy() and if sugov_should_update_freq() runs between
them, the policy limits change may be missed.

To ensure that the above situations will not take place, add memory
barriers preventing the reordering in question from taking place and
add READ_ONCE() and WRITE_ONCE() annotations around all of the
limits_changed flag updates to prevent the compiler from messing up
with that code.

Fixes: 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when limits change")
Cc: 5.3+ <stable@vger.kernel.org> # 5.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3376719.44csPzL39Z@rjwysocki.net
2025-05-20 21:57:27 +03:00
Rafael J. Wysocki
f56461366c cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS
Commit 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused
by need_freq_update") modified sugov_should_update_freq() to set the
need_freq_update flag only for drivers with CPUFREQ_NEED_UPDATE_LIMITS
set, but that flag generally needs to be set when the policy limits
change because the driver callback may need to be invoked for the new
limits to take effect.

However, if the return value of cpufreq_driver_resolve_freq() after
applying the new limits is still equal to the previously selected
frequency, the driver callback needs to be invoked only in the case
when CPUFREQ_NEED_UPDATE_LIMITS is set (which means that the driver
specifically wants its callback to be invoked every time the policy
limits change).

Update the code accordingly to avoid missing policy limits changes for
drivers without CPUFREQ_NEED_UPDATE_LIMITS.

Fixes: 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused by need_freq_update")
Closes: https://lore.kernel.org/lkml/Z_Tlc6Qs-tYpxWYb@linaro.org/
Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3010358.e9J7NaK4W3@rjwysocki.net
2025-05-20 21:57:22 +03:00
zihan zhou
dcc4fc8469 sched: Reduce the default slice to avoid tasks getting an extra tick
The old default value for slice is 0.75 msec * (1 + ilog(ncpus)) which
means that we have a default slice of:

  0.75 for 1 cpu
  1.50 up to 3 cpus
  2.25 up to 7 cpus
  3.00 for 8 cpus and above.

For HZ=250 and HZ=100, because of the tick accuracy, the runtime of
tasks is far higher than their slice.

For HZ=1000 with 8 cpus or more, the accuracy of tick is already
satisfactory, but there is still an issue that tasks will get an extra
tick because the tick often arrives a little faster than expected. In
this case, the task can only wait until the next tick to consider that it
has reached its deadline, and will run 1ms longer.

vruntime + sysctl_sched_base_slice =     deadline
        |-----------|-----------|-----------|-----------|
             1ms          1ms         1ms         1ms
                   ^           ^           ^           ^
                 tick1       tick2       tick3       tick4(nearly 4ms)

There are two reasons for tick error: clockevent precision and
CONFIG_IRQ_TIME_ACCOUNTING/CONFIG_PARAVIRT_TIME_ACCOUNTING. With
CONFIG_IRQ_TIME_ACCOUNTING, every tick will be less than 1 ms, but even
without it, because of clockevent precision, the tick is still often
less than 1 ms.

In order to make scheduling more precise, we changed 0.75 to 0.70.
Using 0.70 instead of 0.75 should not change much for other configs
and would fix this issue:

  0.70 for 1 cpu
  1.40 up to 3 cpus
  2.10 up to 7 cpus
  2.80 for 8 cpus and above.

This does not guarantee that tasks can run the slice time accurately
every time, but occasionally running an extra tick has little impact.

Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20250208075322.13139-1-15645113830zzh@gmail.com
[Helium-Studio: Adapt for 8 cpus]
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
2025-05-20 21:54:13 +03:00
Alexander Winkowski
6d7dccfd20 sched: Apply Android tweaks manually
Tunables can't be changed with CONFIG_SCHED_DEBUG=n

b4b3950e52/rootdir/init.rc (L323)

Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
2025-05-20 21:54:12 +03:00
Helium-Studio
fc1fb82ca2 Revert "sched: promote nodes out of CONFIG_SCHED_DEBUG"
* Let's apply the Android tweaks manually.

This reverts commit c810b18857.

Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
2025-05-20 21:54:12 +03:00
kondors1995
2d1e1c8056 kernel:rcu: Drop Ofast flag
I have no idea why I even had it.
2025-05-20 21:54:12 +03:00
Sultan Alsawaf
56058927a1 sched/cass: Don't pack tasks with uclamp boosts below minimum CPU capacity
To save energy, CASS may prefer non-idle CPUs for uclamp-boosted tasks in
order to pack them onto a single performance domain rather than spreading
them across multiple performance domains. This way, it is more likely for
only one performance domain to be boosted to a higher P-state when there
is more than one uclamp-boosted task running.

However, when a task has a uclamp boost value that is below a CPU's minimum
capacity, it is nearly the same thing as not having a uclamp boost at all.

In spite of that, CASS may still prefer non-idle CPUs for tasks with bogus
uclamp boost values. This is not only worse for latency, but also energy
efficiency since the load on the CPU is spread less evenly as a result.

Therefore, don't pack tasks with uclamp boosts below a CPU's minimum
configured capacity, since such tasks do not force the CPU to run at a
higher P-state.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
2025-03-08 11:51:49 +02:00
kondors1995
ec4ff863b0 Revert "sched/fair: Skip cpu if task does not fit in"
Causes issues with compilation, and I am quite sure it's not needed with
CASS.

This reverts commit 16bd86c028.
2025-03-08 11:51:49 +02:00
Viresh Kumar
317886d82c [ADAPTED/PARTIAL] sched/fair: Introduce fits_capacity()
The same formula to check utilization against capacity (after
considering capacity_margin) is already used at 5 different locations.

This patch creates a new macro, fits_capacity(), which can be used from
all these locations without exposing the details of it and hence
simplify code.

All 5 code locations are updated as well to use it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/b477ac75a2b163048bdaeb37f57b4c3f04f75a31.1559631700.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-08 11:51:49 +02:00
Sultan Alsawaf
578309e112 sched/cass: Avoid using the prime CPU on systems that have one
On arm64 systems with a prime CPU, treating this CPU – which is the single
fastest one in the system – the same as any other CPU is extremely bad for
energy efficiency. This is because prime CPUs are designed to be very fast
at the expense of energy efficiency, serving as the single-core performance
workhorse of a system.

Since CASS hasn't been giving special treatment to prime CPUs, CASS has
been balancing relative load onto prime CPUs at great expense to energy.

Thanks to the checks in place for CPU overload and task fit, it's easy to
adjust CASS to avoid using prime CPUs without hurting single-core or
multi-core performance.

Place a check just below the task fit check for whether or not a candidate
is a prime CPU. This way, when CPUs are overloaded or a task only fits on
the prime CPU, CASS will still utilize the prime CPU without a loss of
performance.

This provides a double-digit percent improvement to energy efficiency
across the board as measured on Tensor G3 (Pixel 8) and Tensor G4 (Pixel 9
Pro).

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-03-08 11:51:49 +02:00
Sultan Alsawaf
d1d7603543 sched/cass: Don't fight the idle load balancer
The idle load balancer (ILB) is kicked whenever a task is misfit, meaning
that the task doesn't fit on its CPU (i.e., fits_capacity() == false).

Since CASS makes no attempt to place tasks such that they'll fit on the CPU
they're placed upon, the ILB works harder to correct this and rebalances
misfit tasks onto a CPU with sufficient capacity.

By fighting the ILB like this, CASS degrades both energy efficiency and
performance.

Play nicely with the ILB by trying to place tasks onto CPUs that fit.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-03-08 11:51:49 +02:00
Sultan Alsawaf
91cf9e8478 sched/fair: Remove throughput optimization that keeps tasks on big CPUs
When the load balancer looks for the busiest group to detach tasks from, it
deliberately ignores higher-capacity groups that aren't egregiously
imbalanced. This is done in an attempt to improve throughput, while
resulting in a significant hit to energy efficiency: on Tensor G4 (Pixel 9
Pro), removing this optimization reduces energy usage by around 7%
(400 mW -> 370 mW) for a light gaming scenario.

Since this optimization doesn't provide any notable throughput improvement
(hackbench actually performs slightly better without it), remove it to
improve energy efficiency.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-03-08 11:51:49 +02:00
Sultan Alsawaf
a5269bd49f sched/fair: Don't needlessly migrate a lone task to a higher capacity CPU
When a CPU has only one running CFS task and there's a higher capacity CPU
that's idle, that lone task may be migrated to the higher capacity CPU just
because of RT and IRQ load.

If the CPU running the lone CFS task has sufficient capacity for the task,
then let it run that task. Migrating the task up to a higher capacity CPU
causes that CPU to be kicked out of idle and degrades energy efficiency
without an appreciable performance improvement. The load balancer will take
care of migrating the task anyway if it becomes a misfit, so this heuristic
isn't needed.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-03-08 11:51:48 +02:00
Vincent Guittot
414d6d591f sched/fair: Fix unnecessary increase of balance interval
In case of active balancing, we increase the balance interval to cover
pinned task cases not covered by the all_pinned logic. Nevertheless, the
active migration triggered by asym packing should be treated as the normal
unbalanced case and reset the interval to the default value; otherwise,
active migration for asym_packing can easily be delayed for hundreds of ms
because of this pinned task detection mechanism.

The same happens to other conditions tested in need_active_balance() like
misfit task and when the capacity of src_cpu is reduced compared to
dst_cpu (see comments in need_active_balance() for details).

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: valentin.schneider@arm.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jesse Chan <jc@linux.com>
Signed-off-by: billaids <jimmy.nelle@hsw-stud.de>
2025-03-08 11:51:13 +02:00
NeilBrown
98567695d7 BACKPORT: cred: add get_cred_rcu()
Sometimes we want to opportunistically get a ref to a cred in an
rcu_read_lock protected section. get_task_cred() does this, and NFS
does a similar thing with its own credential structures.

To prepare for NFS converting to use 'struct cred' more uniformly,
define get_cred_rcu(), and use it in get_task_cred().

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[neobuddy89: Backport for KernelSU-Next]
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2025-03-08 11:41:42 +02:00
Sultan Alsawaf
7dde6beb26 cpufreq: schedutil: Set default rate limit to 2000 us
This is empirically observed to yield good performance with reduced power
consumption. With "cpufreq: schedutil: Ignore rate limit when scaling up
with FIE present", this only affects frequency reductions when FIE is
present, since there is no rate limit applied when scaling up.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:35:47 +02:00
Sultan Alsawaf
7e401ca729 cpufreq: schedutil: Ignore rate limit when scaling up with FIE present
When schedutil disregards a frequency transition due to the transition rate
limit, there is no guaranteed deadline as to when the frequency transition
will actually occur after the rate limit expires. For instance, depending
on how long a CPU spends in a preempt/IRQs disabled context, a rate-limited
frequency transition may be delayed indefinitely, until said CPU reaches
the scheduler again. This also hurts tasks boosted via UCLAMP_MIN.

For frequency transitions _down_, this only poses a theoretical loss of
energy savings since a CPU may remain at a higher frequency than necessary
for an indefinite period beyond the rate limit expiry.

For frequency transitions _up_, however, this poses a significant hit to
performance when a CPU is stuck at an insufficient frequency for an
indefinitely long time. In latency-sensitive and bursty workloads
especially, a missed frequency transition up can result in a significant
performance loss due to a CPU operating at an insufficient frequency for
too long.

When support for the Frequency Invariant Engine (FIE) _isn't_ present, a
rate limit is always required for the scheduler to compute CPU utilization
with some semblance of accuracy: any frequency transition that occurs
before the previous transition latches would result in the scheduler not
knowing the frequency a CPU is actually operating at, thereby trashing the
computed CPU utilization.

However, when FIE support _is_ present, there's no technical requirement to
rate limit all frequency transitions to a cpufreq driver's reported
transition latency. With FIE, the scheduler's CPU utilization tracking is
unaffected by any frequency transitions that occur before the previous
frequency is latched.

Therefore, ignore the frequency transition rate limit when scaling up on
systems where FIE is present. This guarantees that transitions to a higher
frequency cannot be indefinitely delayed, since they simply cannot be
delayed at all.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:35:47 +02:00
Sultan Alsawaf
27979b9423 cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
A redundant frequency update is only truly needed when there is a policy
limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.

In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
frequency update _all the time_, not just for a policy limits change,
because need_freq_update is never cleared.

Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
to a redundant frequency update, regardless of whether or not the driver
specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
same as the current one.

Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
when there's a policy limits change, and clearing need_freq_update when a
requisite redundant update occurs.

This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
and instead setting need_freq_update to false in sugov_update_next_freq().

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:35:47 +02:00
Rafael J. Wysocki
1ca115cdde cpufreq: schedutil: Simplify sugov_update_next_freq()
Rearrange a conditional to make it more straightforward.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2024-12-19 17:35:41 +02:00
Viresh Kumar
50bc315faf cpufreq: schedutil: Don't skip freq update if need_freq_update is set
The cpufreq policy's frequency limits (min/max) can get changed at any
point of time, while schedutil is trying to update the next frequency.
Though the schedutil governor has necessary locking and support in place
to make sure we don't miss any of those updates, there is a corner case
where the governor will find that the CPU is already running at the
desired frequency and so may skip an update.

For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
and is running at 1 GHz currently. Schedutil tries to update the
frequency to 1.2 GHz, during this time the policy limits get changed as
policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
frequency at various instances, we will eventually set the frequency to
1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.

Now let's say the policy limits get changed back at this time with
policy->min as 1 GHz. The next time schedutil is invoked by the
scheduler, we will reevaluate the next frequency (because
need_freq_update will get set due to the limits change event) and let's
say we want to set the frequency to 1.2 GHz again. At this point
sugov_update_next_freq() will find the next_freq == current_freq and
will abort the update, while the CPU actually runs at 1.4 GHz.

Until now need_freq_update was used as a flag to indicate that the
policy's frequency limits have changed, and that we should consider the
new limits while reevaluating the next frequency.

This patch fixes the above-mentioned issue by extending the purpose of
the need_freq_update flag. If this flag is set now, the schedutil
governor will not try to abort a frequency change even if next_freq ==
current_freq.

As similar behavior is required in the case of
CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
set to false if that flag is set for the driver.

We also don't need to consider the need_freq_update flag in
sugov_update_single() anymore to handle the special case of busy CPU, as
we won't abort a frequency update anymore.

Reported-by: zhuguangqing <zhuguangqing@xiaomi.com>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rearrange code to avoid a branch ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-12-19 17:35:18 +02:00
Rafael J. Wysocki
bece027796 cpufreq: schedutil: Always call driver if CPUFREQ_NEED_UPDATE_LIMITS is set
Because sugov_update_next_freq() may skip a frequency update even if
the need_freq_update flag has been set for the policy at hand, policy
limits updates may not take effect as expected.

For example, if the intel_pstate driver operates in the passive mode
with HWP enabled, it needs to update the HWP min and max limits when
the policy min and max limits change, respectively, but that may not
happen if the target frequency does not change along with the limit
at hand.  In particular, if the policy min is changed first, causing
the target frequency to be adjusted to it, and the policy max limit
is changed later to the same value, the HWP max limit will not be
updated to follow it as expected, because the target frequency is
still equal to the policy min limit and it will not change until
that limit is updated.

To address this issue, modify get_next_freq() to let the driver
callback run if the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag
is set regardless of whether or not the new frequency to set is
equal to the previous one.

Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Reported-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 1c534352f47f cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS ...
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: a62f68f5ca53 cpufreq: Introduce cpufreq_driver_test_flags()
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-12-19 17:35:18 +02:00
EmanuelCN
a84a39e2e9 cpufreq/schedutil: Remove up/down rate limits
To make way for new changes
2024-12-19 17:35:17 +02:00
EmanuelCN
103eeccc95 cpufreq/schedutil: Don't limit util to max_cap in dvfs_headroom
2024-12-19 17:35:08 +02:00
kondors1995
2d5c481f29 bpf: squash revert spoofing and some backports:
Squashed commit of the following:

commit 259593385c05a430c4685b611c0e43b4272c22f8
Author: John Galt <johngaltfirstrun@gmail.com>
Date:   Fri Dec 13 08:30:37 2024 -0500

    bpf: squash revert spoofing and some backports:

    Squashed commit of the following:

    commit 8ac5df9c8bc9575059fff6cea0c40463b96fc129
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:17 2024 -0500

        Revert "BACKPORT: bpf: add skb_load_bytes_relative helper"

        This reverts commit 029893dcc5d67af16fdf0723bacaae37ec567f67.

    commit dbcbceafe848744ec188f74e87e9717916d359ea
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:13 2024 -0500

        Revert "BACKPORT: bpf: encapsulate verifier log state into a structure"

        This reverts commit d861145b97d247cbd9fe1400df52155f48639126.

    commit 478f4dfee0406b54525e68764cc9ba48af1624fc
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:10 2024 -0500

        Revert "BACKPORT: bpf: Rename bpf_verifer_log"

        This reverts commit 5d088635de1bf2d6ae9ea94e3dd1c601d30c0cce.

    commit 7bc7c24beb82168b49337530cb56b5dfeeafe19a
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:07 2024 -0500

        Revert "BACKPORT: bpf: btf: Introduce BPF Type Format (BTF)"

        This reverts commit 93d34e26514b4d9d15fd176706f57634b2e97485.

    commit 7106457ba90a459b6241fdd44df658c1b52c0e4b
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:03 2024 -0500

        Revert "bpf: Update logging functions to work with BTF"

        This reverts commit 97e6c528eb2f76c58a3b6a4c1e7fbeafcd97633a.

    commit 08e68c7ba56f5e78fd1afcd5a2164716a75b0fe3
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:00 2024 -0500

        Revert "bpf: btf: Validate type reference"

        This reverts commit c7b7eecbc1134e5d8865af2cc0692fc7156175d5.

    commit 7763cf0831970a64ed62f9b7362fca02ab6e83f1
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:51 2024 -0500

        Revert "bpf: btf: Check members of struct/union"

        This reverts commit 9a77b51cad6f04866ca067ca0e70a89b9f59ed56.

    commit eb033235f666b5f66995f4cf89702de7ab4721f8
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:47 2024 -0500

        Revert "bpf: btf: Add pretty print capability for data with BTF type info"

        This reverts commit 745692103435221d6e39bc177811769995540525.

    commit c32995674ace91e06c591d2f63177585e81adc75
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:43 2024 -0500

        Revert "BACKPORT: bpf: btf: Add BPF_BTF_LOAD command"

        This reverts commit 4e0afd38e20e5aa2df444361309bc07251ca6b2a.

    commit 1310bc8d4aca0015c8723e7624121eddf76b3244
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:38 2024 -0500

        Revert "bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd"

        This reverts commit d4b5d76d9101b97e6fe5181bcefe7f601ed19926.

    commit 881a49445608712bdb0a0f0c959838bdbc725f62
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:34 2024 -0500

        Revert "BACKPORT: bpf: btf: Clean up btf.h in uapi"

        This reverts commit 26b661822933d41b3feb59bb284334bfbbc82af4.

    commit e2109fd858ebd5fe392c8bf579b9350fbca35a35
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:29 2024 -0500

        Revert "bpf: btf: Avoid WARN_ON when CONFIG_REFCOUNT_FULL=y"

        This reverts commit 9abf878903404e649fef4ad0b189eec1c13d29fe.

    commit 088a7d9137f03da4e0fc1d72add3901823081ccd
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:23 2024 -0500

        Revert "bpf: Fix compiler warning on info.map_ids for 32bit platform"

        This reverts commit a3a278e1f6cf167d538ac52f4ad60bb9cf8d4129.

    commit 6e14aed6b63f2b266982454d83678445c062cf39
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:13 2024 -0500

        Revert "bpf: btf: Change how section is supported in btf_header"

        This reverts commit 4b60ffd683eb623a184b46761777838d7c49e707.

    commit 151a60855c23bf0317734031481d779efb369d6c
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:08 2024 -0500

        Revert "bpf: btf: Check array->index_type"

        This reverts commit b00e10f1a073fadce178b6fb62496722e16db303.

    commit 49775e9074a54ac5f60f518e6fc5a26172996eae
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:01 2024 -0500

        Revert "bpf: btf: Remove unused bits from uapi/linux/btf.h"

        This reverts commit c90c6ad34f7a8f565f351d21c2d5b9706838767d.

    commit b6d6c6ab28e4b018da6ce9e64125e63f4191d3d9
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:58 2024 -0500

        Revert "bpf: btf: Avoid variable length array"

        This reverts commit fe7d1f7750242e77a73839d173ac36c3e39d4171.

    commit a45bedecb9b1175fef96f2d64fba2d61777dbf35
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:49 2024 -0500

        Revert "bpf: btf: avoid -Wreturn-type warning"

        This reverts commit 78214f1e390bf1d69d9ae4ee80072ac85c34619e.

    commit 445efb8465b9fa5706d81098417f15656265322e
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:46 2024 -0500

        Revert "bpf: btf: Check array t->size"

        This reverts commit aed532e7466f77885a362e4b863bf90c41e834ba.

    commit 8aada590d525de735cf39196d88722e727c141e9
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:42 2024 -0500

        Revert "bpf: btf: Ensure t->type == 0 for BTF_KIND_FWD"

        This reverts commit 8c8b601dcc2e62e1276b73dfee8b49e40fb65944.

    commit ed67ad09e866c9c30897488088bbb4555ea3dc80
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:38 2024 -0500

        Revert "bpf: btf: Fix bitfield extraction for big endian"

        This reverts commit b0696a226c52868d64963f01665dd1a640a92f2b.

    commit 5cc64db782daf86cdf7ac77133ca94181bb29146
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:35 2024 -0500

        Revert "bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h"

        This reverts commit 0f008594540b09c667ea88fc87cf289b8db334da.

    commit 3a5c6b9010426449c08ecdcc10e758431b1e515f
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:31 2024 -0500

        Revert "bpf: btf: Ensure the member->offset is in the right order"

        This reverts commit c5e361ecd6d45a7cdbffda02e4691a7a37198bdd.

    commit bd6173c1ac458b08d6cedaf06e6e53c93e6b0cc5
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:26 2024 -0500

        Revert "bpf: fix bpf_skb_load_bytes_relative pkt length check"

        This reverts commit 9ea14969874cd7896588df435c890f6f2f547821.

    commit 0b61d26b25a65d9ded4611426c6da9c78e41567c
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:22 2024 -0500

        Revert "bpf: btf: Fix end boundary calculation for type section"

        This reverts commit 08ef221c7fb604cb60c490fa999ec7254d492f05.

    commit 72fb2b9bb5b90f60ab71915fe4e57eeee3308163
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:18 2024 -0500

        Revert "bpf: btf: Fix a missing check bug"

        This reverts commit 594687e3e01e26086f3b0173e5eda9b9f0b672f8.

    commit 575a34ceba4013ad0230038f29f6ea0b3ba41a7e
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:15 2024 -0500

        Revert "bpf, btf: fix a missing check bug in btf_parse"

        This reverts commit 6bf31bbc438663756e92fb0aad4f5a35fd730fb0.

    commit bcca98c0bc5e19b38af3ddcd0feee80ad26e1f96
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:11 2024 -0500

        Revert "bpf: fix BTF limits"

        This reverts commit e351b26ae671dfacd82f27c1c5f66cf8089d930d.

    commit f71c484e340041d8828c94b39a233ea587d8cc09
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:07 2024 -0500

        Revert "bpf/btf: Fix BTF verification of enum members in struct/union"

        This reverts commit 861e65b744c171d59850e61a01715f194f25e45c.

    commit eca310722a2624d33cd49884aa18c36d435b10f8
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:02 2024 -0500

        Revert "bpf: btf: fix truncated last_member_type_id in btf_struct_resolve"

        This reverts commit d6cd1eac41b10e606ec7f445162a0617c01be973.

    commit caae5c99a3ca7bed0e318b31b6aa7ca8260a1c52
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:58 2024 -0500

        Revert "BACKPORT: net: bpf: rename ndo_xdp to ndo_bpf"

        This reverts commit 2a1ddcb6a384745195d57b4e4cdda2a55d2cbe47.

    commit f90bdcdaa095a4f10268bb740470a3e0893be21b
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:54 2024 -0500

        Revert "BACKPORT: bpf: offload: add infrastructure for loading programs for a specific netdev"

        This reverts commit a9516d402726094eafccce26a99cf5110d188be9.

    commit c6e0ce9019c06d9a45c030a2bc38eed320afd45a
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:50 2024 -0500

        Revert "bpf: offload: rename the ifindex field"

        This reverts commit 36bc9c7351a1dc78b3e71571998af381e876b4cb.

    commit 88b6a4d41b69df804b846a8ebdca410517e08343
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:46 2024 -0500

        Revert "BACKPORT: bpf: Check attach type at prog load time"

        This reverts commit fe5a0d514e4970d86983458136d4a2f6caeee365.

    commit 9ccfaa66a5ea042331f0aacdb3667e23c8ed363e
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:43 2024 -0500

        Revert "BACKPORT: bpf: introduce BPF_PROG_QUERY command"

        This reverts commit a5720688858170f1054f9549b5a628db1c252a88.

    commit adab2743b3fa0853d0351b33b0a286de745025e5
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:37 2024 -0500

        Revert "BACKPORT: bpf: Hooks for sys_bind"

        This reverts commit e484887c7e7aa026521ddc1773233368a6304b24.

    commit d462e09db98ad89b3a836f9b9a925812b0d8cfe7
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:33 2024 -0500

        Revert "BACKPORT: net: Introduce __inet_bind() and __inet6_bind"

        This reverts commit 41a3131c3e94c28fd084dd6f4358baee3824fd17.

    commit cdf7f55dc65b4bdf7ecfc924be77c6a039709b3d
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:29 2024 -0500

        Revert "BACKPORT: bpf: Hooks for sys_connect"

        This reverts commit f26fe7233e2885ef489707ab5a5a5dda9f081b80.

    commit 97685d5058f76ba4ea6dd2db157f4537f3a8953d
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:23 2024 -0500

        Revert "BACKPORT: bpf: Post-hooks for sys_bind"

        This reverts commit 284ac5bc7c70dac338301445e94e1ad40fb40fdb.

    commit d03d9c05036d3109eae643f473cc5a5ad0a80721
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:19 2024 -0500

        Revert "kernel: bpf: devmap: Create __dev_map_alloc_node"

        This reverts commit db726149fa9abfd1ca9add3e2db6b1524f7e90a3.

    commit 8c34bcb3e4c6630799764871b4af2e5f9344a371
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:15 2024 -0500

        Revert "BACKPORT: xdp: Add devmap_hash map type for looking up devices by hashed index"

        This reverts commit c4d4e1d201d8433e06b2ac66041d7105095a0204.

    commit ef277c7b3a08fd59943eb2b47af64afc513de008
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:11 2024 -0500

        Revert "BACKPORT: devmap: Allow map lookups from eBPF"

        This reverts commit 24d196375871c72de0de977de79afede5a7d1780.

    commit 4fcd87869c55c28ed59bff916d640147601816d2
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:07 2024 -0500

        Revert "gen_headers_{arm, arm64}: Add btf.h to the list"

        This reverts commit 37edfe7c90bac355885ffec3327b338a34619792.

    commit b89560e0b405b58ecc5fc12c15ad4f56147760d6
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:03 2024 -0500

        Revert "syscall: Fake uname to 4.19 for bpfloader/netd"

        This reverts commit 186e74af61269602d0c068d98928b1f25e03eba2.

    commit fd49f8c35eb7875d6810a5a52877ebc59bfd4530
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:54:59 2024 -0500

        Revert "syscall: Fake uname to 4.19 also for netbpfload"

        This reverts commit 34b9a1ab387d7dc83ede613b2c12b3741ea08edb.

    commit b853fcf2ff892664d0ff522ca7fd530bc94c023e
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:54:53 2024 -0500

        Revert "syscall: Increase bpf fake uname to 5.4"

        This reverts commit 9cdc014e11b410a7f03d8c968a35ee0dd6a28fff.

    # Conflicts:
    #	net/ipv4/af_inet.c
    #	net/ipv6/af_inet6.c

commit 4a0143fa36d300485650dc447b580151a69a3be2
Author: kondors1995 <normandija1945@gmail.com>
Date:   Wed Dec 18 13:48:16 2024 +0200

    Revert "syscall: Fake uname to 4.19 for bpfloader/netd"

    This reverts commit 417f37c97f.

commit 6f512c5c7341a51d7bbc9cdd93814764cae8868f
Author: kondors1995 <normandija1945@gmail.com>
Date:   Wed Dec 18 13:48:16 2024 +0200

    Revert "syscall: Fake uname to 4.19 also for netbpfload"

    This reverts commit a4c61c3d97.

commit 41f326616251f0122d81e518082ef7faaad4b2e5
Author: kondors1995 <normandija1945@gmail.com>
Date:   Wed Dec 18 13:48:15 2024 +0200

    Revert "syscall: Increase bpf fake uname to 5.4"

    This reverts commit 4a906017d4.

commit a0d3db72a836096cf533516d56c81a43150976ed
Author: kondors1995 <normandija1945@gmail.com>
Date:   Wed Dec 18 13:46:12 2024 +0200

    Revert "bpf: Hooks for sys_sendmsg"

    This reverts commit 735c155332.

commit 246eb3d90b95e0ab5aee8d5a9e9cd639c7beb174
Author: kondors1995 <normandija1945@gmail.com>
Date:   Wed Dec 18 13:45:08 2024 +0200

    Revert "syscall: Increase fake uname to 6.6.40"

    This reverts commit 92494b9920.

commit c56eaa5b7f170f58f2ade14bb71aaad2964b9018
Author: kondors1995 <normandija1945@gmail.com>
Date:   Mon Dec 9 21:35:20 2024 +0200

    raphael_defconfig: increase sbalance pooling rate to 10s

commit 54d190b8af
Author: Sultan Alsawaf <sultan@kerneltoast.com>
Date:   Wed Dec 4 15:53:22 2024 -0800

    sbalance: Fix severe misattribution of movable IRQs to the last active CPU

    Due to a horrible omission in the big IRQ list traversal, all movable IRQs
    are misattributed to the last active CPU in the system since that's what
    `bd` is last set to in the loop prior. This horribly breaks SBalance's
    notion of balance, producing nonsensical balancing decisions and failing to
    balance IRQs even when they are heavily imbalanced.

    Fix the massive breakage by adding the missing line of code to set `bd` to
    the CPU an IRQ actually belongs to, so that it's added to the correct CPU's
    movable IRQs list.

    Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>

commit f2fa2db581
Author: Sultan Alsawaf <sultan@kerneltoast.com>
Date:   Wed Dec 4 14:31:52 2024 -0800

    sbalance: Don't race with CPU hotplug

    When a CPU is hotplugged, cpu_active_mask is modified without any RCU
    synchronization. As a result, the only synchronization for cpu_active_mask
    provided by the hotplug code is the CPU hotplug lock.

    Furthermore, since IRQ balance is majorly disrupted during CPU hotplug due
    to mass IRQ migration off a dying CPU, SBalance just shouldn't operate
    while a CPU hotplug is in progress.

    Take the CPU hotplug lock in balance_irqs() to prevent races and mishaps
    during CPU hotplugs.

    Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>

commit a4e81ff60a
Author: Sultan Alsawaf <sultan@kerneltoast.com>
Date:   Wed Dec 4 14:16:48 2024 -0800

    sbalance: Convert various IRQ counter types to unsigned ints

    These counted values are actually unsigned ints, not unsigned longs.
    Convert them to unsigned ints since there's no reason for them to be longs.

    Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:34:31 +02:00
Sultan Alsawaf
ab091b84b5 sbalance: Fix severe misattribution of movable IRQs to the last active CPU
Due to a horrible omission in the big IRQ list traversal, all movable IRQs
are misattributed to the last active CPU in the system since that's what
`bd` is last set to in the loop prior. This horribly breaks SBalance's
notion of balance, producing nonsensical balancing decisions and failing to
balance IRQs even when they are heavily imbalanced.

Fix the massive breakage by adding the missing line of code to set `bd` to
the CPU an IRQ actually belongs to, so that it's added to the correct CPU's
movable IRQs list.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:33:27 +02:00
Sultan Alsawaf
70b23e7894 sbalance: Don't race with CPU hotplug
When a CPU is hotplugged, cpu_active_mask is modified without any RCU
synchronization. As a result, the only synchronization for cpu_active_mask
provided by the hotplug code is the CPU hotplug lock.

Furthermore, since IRQ balance is majorly disrupted during CPU hotplug due
to mass IRQ migration off a dying CPU, SBalance just shouldn't operate
while a CPU hotplug is in progress.

Take the CPU hotplug lock in balance_irqs() to prevent races and mishaps
during CPU hotplugs.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:33:27 +02:00
Sultan Alsawaf
023a941087 sbalance: Convert various IRQ counter types to unsigned ints
These counted values are actually unsigned ints, not unsigned longs.
Convert them to unsigned ints since there's no reason for them to be longs.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:33:22 +02:00
kondors1995
373d5574ec Merge branch 'linux-4.14.y' of https://github.com/openela/kernel-lts into 15.0 2024-10-20 18:10:19 +03:00
Tze-nan Wu
b28271a442 tracing: Fix overflow in get_free_elt()
commit bcf86c01ca4676316557dd482c8416ece8c2e143 upstream.

"tracing_map->next_elt" in get_free_elt() is at risk of overflowing.

Once it overflows, new elements can still be inserted into the tracing_map
even though the maximum number of elements (`max_elts`) has been reached.
Continuing to insert elements after the overflow could result in the
tracing_map containing "tracing_map->max_size" elements, leaving no empty
entries.
If any attempt is made to insert an element into a full tracing_map using
`__tracing_map_insert()`, it will cause an infinite loop with preemption
disabled, leading to a CPU hang problem.

Fix this by preventing any further increments to "tracing_map->next_elt"
once it reaches "tracing_map->max_elt".
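
The guarded increment can be sketched in plain userspace C. All names here are illustrative, not the kernel's; the real fix uses an atomic add-unless on the map's counter, which the helper below emulates single-threadedly:

```c
#include <assert.h>

static unsigned int next_elt;   /* stands in for tracing_map->next_elt */

/* Single-threaded emulation of atomic_fetch_add_unless(v, a, u):
 * add 'a' to *v only while *v has not reached the limit 'u',
 * and return the old value either way. */
static unsigned int fetch_add_unless(unsigned int *v, unsigned int a,
				     unsigned int u)
{
	unsigned int old = *v;

	if (old != u)
		*v = old + a;
	return old;
}

/* Returns a free element index, or -1 once the map is full. */
static int get_free_elt_idx(unsigned int max_elts)
{
	unsigned int idx = fetch_add_unless(&next_elt, 1, max_elts);

	return idx < max_elts ? (int)idx : -1;
}
```

Because the counter stops at max_elts instead of wrapping, repeated insert attempts on a full map fail cleanly rather than consuming every slot.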

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 08d43a5fa0 ("tracing: Add lock-free tracing_map")
Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Link: https://lore.kernel.org/20240805055922.6277-1-Tze-nan.Wu@mediatek.com
Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 302ceb625d7b990db205a15e371f9a71238de91c)
[Vegard: s/atomic_fetch_add_unless/__atomic_add_unless/ due to missing
 commit bfc18e389c7a09fbbbed6bf4032396685b14246e ("atomics/treewide:
 Rename __atomic_add_unless() => atomic_fetch_add_unless()".]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:30 +00:00
Justin Stitt
53390d85b1 ntp: Safeguard against time_constant overflow
commit 06c03c8edce333b9ad9c6b207d93d3a5ae7c10c0 upstream.

Using syzkaller with the recently reintroduced signed integer overflow
sanitizer produces this UBSAN report:

UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:738:18
9223372036854775806 + 4 cannot be represented in type 'long'
Call Trace:
 handle_overflow+0x171/0x1b0
 __do_adjtimex+0x1236/0x1440
 do_adjtimex+0x2be/0x740

The user supplied time_constant value is incremented by four and then
clamped to the operating range.

Before commit eea83d896e ("ntp: NTP4 user space bits update") the user
supplied value was sanity checked to be in the operating range. That change
removed the sanity check and relied on clamping after incrementing which
does not work correctly when the user supplied value is in the overflow
zone of the '+ 4' operation.

The operation requires CAP_SYS_TIME and the side effect of the overflow is
NTP getting out of sync.

Similar to the fixups for time_maxerror and time_esterror, clamp the user
space supplied value to the operating range.

[ tglx: Switch to clamping ]

Fixes: eea83d896e ("ntp: NTP4 user space bits update")
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240517-b4-sio-ntp-c-v2-1-f3a80096f36f@google.com
Closes: https://github.com/KSPP/linux/issues/352
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a13f8b269b6f4c9371ab149ecb65d2edb52e9669)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:30 +00:00
Justin Stitt
07f7f40df9 ntp: Clamp maxerror and esterror to operating range
[ Upstream commit 87d571d6fb77ec342a985afa8744bb9bb75b3622 ]

Using syzkaller alongside the newly reintroduced signed integer overflow
sanitizer spits out this report:

UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:461:16
9223372036854775807 + 500 cannot be represented in type 'long'
Call Trace:
 handle_overflow+0x171/0x1b0
 second_overflow+0x2d6/0x500
 accumulate_nsecs_to_secs+0x60/0x160
 timekeeping_advance+0x1fe/0x890
 update_wall_time+0x10/0x30

time_maxerror is unconditionally incremented and the result is checked
against NTP_PHASE_LIMIT, but the increment itself can overflow, resulting
in wrap-around to negative space.

Before commit eea83d896e ("ntp: NTP4 user space bits update") the user
supplied value was sanity checked to be in the operating range. That change
removed the sanity check and relied on clamping in handle_overflow() which
does not work correctly when the user supplied value is in the overflow
zone of the '+ 500' operation.

The operation requires CAP_SYS_TIME and the side effect of the overflow is
NTP getting out of sync.

Miroslav confirmed that the input value should be clamped to the operating
range and the same applies to time_esterror. The latter is not used by the
kernel, but the value still should be in the operating range as it was
before the sanity check got removed.

Clamp them to the operating range.

[ tglx: Changed it to clamping and included time_esterror ]

Fixes: eea83d896e ("ntp: NTP4 user space bits update")
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Link: https://lore.kernel.org/all/20240517-b4-sio-ntp-usec-v2-1-d539180f2b79@google.com
Closes: https://github.com/KSPP/linux/issues/354
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ cast things to __kernel_long_t to fix compiler warnings - gregkh ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9dfe2eef1ecfbb1f29e678700247de6010784eb9)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:29 +00:00
Thomas Gleixner
6fad54cc7a tick/broadcast: Move per CPU pointer access into the atomic section
commit 6881e75237a84093d0986f56223db3724619f26e upstream.

The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.

This went unnoticed as compilers hoist the access into the non-preemptible
region where the pointer is actually used. But of course it's valid that
the compiler keeps it at the place where the code puts it which rightfully
triggers:

  BUG: using smp_processor_id() in preemptible [00000000] code:
       caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0

Move it to the actual usage site which is in a non-preemptible region.

Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ttg56ers.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f54abf332a2bc0413cfa8bd6a8511f7aa99faea0)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:29 +00:00
Douglas Anderson
4925aa995a kdb: Use the passed prompt in kdb_position_cursor()
[ Upstream commit e2e821095949cde46256034975a90f88626a2a73 ]

The function kdb_position_cursor() takes in a "prompt" parameter but
never uses it. This doesn't _really_ matter since all current callers
of the function pass the same value and it's a global variable, but
it's a bit ugly. Let's clean it up.

Found by code inspection. This patch is expected to functionally be a
no-op.

Fixes: 09b35989421d ("kdb: Use format-strings rather than '\0' injection in kdb_read()")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240528071144.1.I0feb49839c6b6f4f2c4bf34764f5e95de3f55a66@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 90f2409c1d552f27a2b2bf8dc598d147c4173128)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:25 +00:00
Arnd Bergmann
fbcf6bbfac kdb: address -Wformat-security warnings
[ Upstream commit 70867efacf4370b6c7cdfc7a5b11300e9ef7de64 ]

When -Wformat-security is not disabled, using a string pointer
as a format causes a warning:

kernel/debug/kdb/kdb_io.c: In function 'kdb_read':
kernel/debug/kdb/kdb_io.c:365:36: error: format not a string literal and no format arguments [-Werror=format-security]
  365 |                         kdb_printf(kdb_prompt_str);
      |                                    ^~~~~~~~~~~~~~
kernel/debug/kdb/kdb_io.c: In function 'kdb_getstr':
kernel/debug/kdb/kdb_io.c:456:20: error: format not a string literal and no format arguments [-Werror=format-security]
  456 |         kdb_printf(kdb_prompt_str);
      |                    ^~~~~~~~~~~~~~

Use an explicit "%s" format instead.
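
The fix pattern can be demonstrated in ordinary userspace C (function name hypothetical): with the literal "%s" format, a prompt that happens to contain '%' is printed verbatim instead of being interpreted as a conversion specification.

```c
#include <stdio.h>

/*
 * Sketch of the -Wformat-security fix: never pass a runtime string as
 * the format argument. snprintf(out, n, prompt) would interpret any
 * '%' in the prompt; the explicit "%s" treats the prompt as data.
 */
static int print_prompt(char *out, size_t n, const char *prompt)
{
	return snprintf(out, n, "%s", prompt);
}
```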

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5d5314d679 ("kdb: core for kgdb back end (1 of 2)")
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240528121154.3662553-1-arnd@kernel.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 22a100556ceab8b906ad180788bd6bdc07390f50)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:25 +00:00
Wenlin Kang
2527458f09 kdb: Fix bound check compiler warning
[ Upstream commit ca976bfb3154c7bc67c4651ecd144fdf67ccaee7 ]

The strncpy() function may leave the destination string buffer
unterminated, better use strscpy() instead.

This fixes the following warning with gcc 8.2:

kernel/debug/kdb/kdb_io.c: In function 'kdb_getstr':
kernel/debug/kdb/kdb_io.c:449:3: warning: 'strncpy' specified bound 256 equals destination size [-Wstringop-truncation]
   strncpy(kdb_prompt_str, prompt, CMD_BUFLEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
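
The behavioural difference can be sketched with a small userspace emulation of strscpy() (the -7 used as a stand-in for -E2BIG is illustrative):

```c
#include <string.h>

#define E2BIG_ERR (-7)	/* illustrative stand-in for -E2BIG */

/*
 * Userspace sketch of strscpy() semantics: unlike strncpy(), the
 * destination is always NUL-terminated, and truncation is reported
 * via an error code instead of silently leaving 'count' bytes copied
 * with no terminator.
 */
static long strscpy_sketch(char *dst, const char *src, size_t count)
{
	size_t len = strlen(src);

	if (count == 0)
		return E2BIG_ERR;
	if (len >= count) {
		memcpy(dst, src, count - 1);
		dst[count - 1] = '\0';
		return E2BIG_ERR;	/* copied, but truncated */
	}
	memcpy(dst, src, len + 1);	/* includes the terminator */
	return (long)len;
}
```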

Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Stable-dep-of: 70867efacf43 ("kdb: address -Wformat-security warnings")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b15593e2904d2ff0094b7170f806dba0eeefac75)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:25 +00:00
Thomas Gleixner
dbffea43e8 watchdog/perf: properly initialize the turbo mode timestamp and rearm counter
commit f944ffcbc2e1c759764850261670586ddf3bdabb upstream.

For systems on which the performance counter can expire early due to turbo
modes the watchdog handler has a safety net in place which validates that
since the last watchdog event there has at least 4/5th of the watchdog
period elapsed.

This works reliably only after the first watchdog event because the per
CPU variable which holds the timestamp of the last event is never
initialized.

So a first spurious event will validate against a timestamp of 0 which
results in a delta which is likely to be way over the 4/5 threshold of the
period.  As this might happen before the first watchdog hrtimer event
increments the watchdog counter, this can lead to false positives.

Fix this by initializing the timestamp before enabling the hardware event.
Reset the rearm counter as well, as that might be non-zero after the
watchdog was disabled and reenabled.

Link: https://lkml.kernel.org/r/87frsfu15a.ffs@tglx
Fixes: 7edaeb6841 ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6d94ca5d571dfdb34f12dc3f63273ea275e8f40c)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:24 +00:00
Yu Liao
3065612975 tick/broadcast: Make takeover of broadcast hrtimer reliable
commit f7d43dd206e7e18c182f200e67a8db8c209907fa upstream.

Running the LTP hotplug stress test on an aarch64 machine results in
rcu_sched stall warnings when the broadcast hrtimer was owned by the
un-plugged CPU. The issue is the following:

CPU1 (owns the broadcast hrtimer)	CPU2

				tick_broadcast_enter()
				  // shutdown local timer device
				  broadcast_shutdown_local()
				...
				tick_broadcast_exit()
				  clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT)
				  // timer device is not programmed
				  cpumask_set_cpu(cpu, tick_broadcast_force_mask)

				initiates offlining of CPU1
take_cpu_down()
/*
 * CPU1 shuts down and does not
 * send broadcast IPI anymore
 */
				takedown_cpu()
				  hotplug_cpu__broadcast_tick_pull()
				    // move broadcast hrtimer to this CPU
				    clockevents_program_event()
				      bc_set_next()
					hrtimer_start()
					/*
					 * timer device is not programmed
					 * because only the first expiring
					 * timer will trigger clockevent
					 * device reprogramming
					 */

What happens is that CPU2 exits broadcast mode with force bit set, then the
local timer device is not reprogrammed and CPU2 expects to receive the
expired event by the broadcast IPI. But this does not happen because CPU1
is offlined by CPU2. CPU2 switches the clockevent device to ONESHOT state,
but does not reprogram the device.

The subsequent reprogramming of the hrtimer broadcast device does not
program the clockevent device of CPU2 either because the pending expiry
time is already in the past and the CPU expects the event to be delivered.
As a consequence all CPUs which wait for a broadcast event to be delivered
are stuck forever.

Fix this issue by reprogramming the local timer device if the broadcast
force bit of the CPU is set so that the broadcast hrtimer is delivered.

[ tglx: Massage comment and change log. Add Fixes tag ]

Fixes: 989dcb645c ("tick: Handle broadcast wakeup of multiple cpus")
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240711124843.64167-1-liaoyu15@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dfe19aa91378972f10530635ad83b2d77f481044)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:23 +00:00
Adrian Hunter
26864f03cc perf: Prevent passing zero nr_pages to rb_alloc_aux()
[ Upstream commit dbc48c8f41c208082cfa95e973560134489e3309 ]

nr_pages is unsigned long but gets passed to rb_alloc_aux() as an int,
and is stored as an int.

Only power-of-2 values are accepted, so if nr_pages is a 64-bit value, it
will be passed to rb_alloc_aux() as zero.

That is not ideal because:
 1. the value is incorrect
 2. rb_alloc_aux() is at risk of misbehaving, although it manages to
 return -ENOMEM in that case, it is a result of passing zero to get_order()
 even though the get_order() result is documented to be undefined in that
 case.

Fix by simply validating the maximum supported value in the first place.
Use -ENOMEM error code for consistency with the current error code that
is returned in that case.
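
A userspace sketch of the validation (limits and names illustrative): check the power-of-two and range constraints while the value is still 64-bit, before the narrowing assignment can truncate it to zero.

```c
#include <limits.h>

static int is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Validate nr_pages while it is still an unsigned long; only then
 * narrow it to the int field. Returns 0 on success, -1 (standing in
 * for -ENOMEM) on bad input.
 */
static int validate_aux_pages(unsigned long nr_pages, int *out)
{
	if (!is_power_of_2(nr_pages) || nr_pages > INT_MAX)
		return -1;
	*out = (int)nr_pages;	/* now a known-safe narrowing */
	return 0;
}
```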

Fixes: 45bfb2e504 ("perf: Add AUX area to ring buffer for raw data streams")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240624201101.60186-6-adrian.hunter@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d7b1a76f33e6fc93924725b4410126740c890c44)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:20 +00:00
Adrian Hunter
6f7bc617b3 perf: Fix perf_aux_size() for greater-than 32-bit size
[ Upstream commit 3df94a5b1078dfe2b0c03f027d018800faf44c82 ]

perf_buffer->aux_nr_pages uses a 32-bit type, so a cast is needed to
calculate a 64-bit size.
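
The widening can be sketched as follows (a PAGE_SHIFT of 12, i.e. 4 KiB pages, is assumed for illustration):

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB pages assumed for this sketch */

/*
 * Widen the 32-bit page count to 64 bits *before* shifting; without
 * the cast, the shift happens in 32-bit arithmetic and page counts of
 * 2^(32 - PAGE_SHIFT) or more overflow.
 */
static uint64_t perf_aux_size_sketch(int aux_nr_pages)
{
	return (uint64_t)aux_nr_pages << PAGE_SHIFT;
}
```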

Fixes: 45bfb2e504 ("perf: Add AUX area to ring buffer for raw data streams")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240624201101.60186-5-adrian.hunter@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 542abbf58e88f34dfc659b63476a5976acf52c0e)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:20 +00:00
EmanuelCN
9ff663517c schedutil: Inline clo code with 5.4 2024-09-30 16:34:15 +03:00
Joel Fernandes (Google)
3e312c8e7b schedutil: Allow cpufreq requests to be made even when kthread kicked
Currently there is a chance of a schedutil cpufreq update request to be
dropped if there is a pending update request. This pending request can
be delayed if there is a scheduling delay of the irq_work and the wake
up of the schedutil governor kthread.

A very bad scenario is when a schedutil request was already just made,
such as to reduce the CPU frequency, then a newer request to increase
CPU frequency (even sched deadline urgent frequency increase requests)
can be dropped, even though the rate limits suggest that it's OK to
process a request. This is because of the way the work_in_progress flag
is used.

This patch improves the situation by allowing new requests to happen
even though the old one is still being processed. Note that in this
approach, if an irq_work was already issued, we just update next_freq
and don't bother to queue another request so there's no extra work being
done to make this happen.
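
The idea can be sketched as a single-threaded userspace model (the struct and field names are hypothetical simplifications of sugov's real locking and irq_work machinery):

```c
#include <stdbool.h>

struct sg_policy {
	bool work_in_progress;	/* an irq_work is already in flight */
	unsigned int next_freq;	/* latest requested frequency */
	int queued_works;	/* counts irq_work_queue() calls */
};

/* Always record the newest request; only queue work if none is pending. */
static void request_freq(struct sg_policy *sg, unsigned int freq)
{
	sg->next_freq = freq;
	if (!sg->work_in_progress) {
		sg->work_in_progress = true;
		sg->queued_works++;
	}
}

/* The kthread side: picks up whatever the latest request was. */
static unsigned int worker_run(struct sg_policy *sg)
{
	unsigned int freq = sg->next_freq;

	sg->work_in_progress = false;
	return freq;
}
```

A request arriving while work is pending updates next_freq in place, so the worker services the newest value without any extra queuing.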

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-09-30 16:31:50 +03:00
Tim Zimmermann
92494b9920 syscall: Increase fake uname to 6.6.40
Change-Id: I77a8c5cd0e74eff97ae9b6b7e37812e2972cb3c8
2024-09-07 12:26:37 +03:00
Tim Zimmermann
31f7a8793a syscall: Only fake uname on very first call of netbpfload
* The bpf programs actually still support older kernels,
  we just need to bypass the very first check for kernel version

Change-Id: I4264782ee63efb26b95abd94774938d5456200a3
2024-09-07 12:26:24 +03:00
Daniel Borkmann
248cbcedae BACKPORT: bpf: fix unconnected udp hooks
Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently
to applications as also stated in original motivation in 7828f20e3779 ("Merge
branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter
two hooks into Cilium to enable host based load-balancing with Kubernetes,
I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes
typically sets up DNS as a service and is thus subject to load-balancing.

Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API
is currently insufficient and thus not usable as-is for standard applications
shipped with most distros. To break down the issue we ran into with a simple
example:

  # cat /etc/resolv.conf
  nameserver 147.75.207.207
  nameserver 147.75.207.208

For the purpose of a simple test, we set up above IPs as service IPs and
transparently redirect traffic to a different DNS backend server for that
node:

  # cilium service list
  ID   Frontend            Backend
  1    147.75.207.207:53   1 => 8.8.8.8:53
  2    147.75.207.208:53   1 => 8.8.8.8:53

The attached BPF program is basically selecting one of the backends if the
service IP/port matches on the cgroup hook. DNS breaks here, because the
hooks are not transparent enough to applications which have built-in msg_name
address checks:

  # nslookup 1.1.1.1
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  [...]
  ;; connection timed out; no servers could be reached

  # dig 1.1.1.1
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  [...]

  ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
  ;; global options: +cmd
  ;; connection timed out; no servers could be reached

For comparison, if none of the service IPs is used, and we tell nslookup
to use 8.8.8.8 directly it works just fine, of course:

  # nslookup 1.1.1.1 8.8.8.8
  1.1.1.1.in-addr.arpa	name = one.one.one.one.

In order to fix this and thus act more transparent to the application,
this needs reverse translation on recvmsg() side. A minimal fix for this
API is to add similar recvmsg() hooks behind the BPF cgroups static key
such that the program can track state and replace the current sockaddr_in{,6}
with the original service IP. From BPF side, this basically tracks the
service tuple plus socket cookie in an LRU map where the reverse NAT can
then be retrieved via map value as one example. Side-note: the BPF cgroups
static key should be converted to a per-hook static key in future.

Same example after this fix:

  # cilium service list
  ID   Frontend            Backend
  1    147.75.207.207:53   1 => 8.8.8.8:53
  2    147.75.207.208:53   1 => 8.8.8.8:53

Lookups work fine now:

  # nslookup 1.1.1.1
  1.1.1.1.in-addr.arpa    name = one.one.one.one.

  Authoritative answers can be found from:

  # dig 1.1.1.1

  ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550
  ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

  ;; OPT PSEUDOSECTION:
  ; EDNS: version: 0, flags:; udp: 512
  ;; QUESTION SECTION:
  ;1.1.1.1.                       IN      A

  ;; AUTHORITY SECTION:
  .                       23426   IN      SOA     a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400

  ;; Query time: 17 msec
  ;; SERVER: 147.75.207.207#53(147.75.207.207)
  ;; WHEN: Tue May 21 12:59:38 UTC 2019
  ;; MSG SIZE  rcvd: 111

And from an actual packet level it shows that we're using the back end
server when talking via 147.75.207.20{7,8} front end:

  # tcpdump -i any udp
  [...]
  12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
  12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
  12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
  12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
  [...]

In order to be flexible and to have the same semantics as in sendmsg BPF
programs, we only allow return codes in [1,1] range. In the sendmsg case
the program is called if msg->msg_name is present which can be the case
in both, connected and unconnected UDP.

The former only relies on the sockaddr_in{,6} passed via connect(2) if
passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar
way to call into the BPF program whenever a non-NULL msg->msg_name was
passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note
that for TCP case, the msg->msg_name is ignored in the regular recvmsg
path and therefore not relevant.

For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE,
the hook is not called. This is intentional as it aligns with the same
semantics as in case of TCP cgroup BPF hooks right now. This might be
better addressed in future through a different bpf_attach_type such
that this case can be distinguished from the regular recvmsg paths,
for example.

Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg")
Change-Id: If2bab00efe5f37a591083fe2676e76f35f8cecc3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-07 12:24:22 +03:00
Alexei Starovoitov
1980188064 BACKPORT: bpf: enforce return code for cgroup-bpf programs
With the addition of tnum logic, the verifier got smart enough that we
can enforce return codes at program load time.
For now, do so for cgroup-bpf program types.

Change-Id: Iae3a46c3d38810e47cbf4ec23356abae03ded736
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-07 12:24:22 +03:00
Andrey Ignatov
735c155332 bpf: Hooks for sys_sendmsg
In addition to already existing BPF hooks for sys_bind and sys_connect,
the patch provides new hooks for sys_sendmsg.

It leverages the existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`
that provides access to the socket itself (properties like family, type,
protocol) and the user-passed `struct sockaddr *`, so that a BPF program
can override the destination IP and port for system calls such as
sendto(2) or sendmsg(2) and/or assign a source IP to the socket.

The hooks are implemented as two new attach types:
`BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and
UDPv6, respectively.

Separate attach types are used for UDPv4 and UDPv6 for the same reason
as for the sys_bind and sys_connect hooks, i.e. to prevent reading from /
writing to e.g. user_ip6 fields when the user passes a sockaddr_in, since
that would be out-of-bounds.

The difference from the already existing hooks is that the sys_sendmsg
hooks are implemented only for unconnected UDP.

For TCP it doesn't make sense to change the user-provided `struct
sockaddr *` at sendto(2)/sendmsg(2) time, since the socket either was
already connected and has its source/destination set, or wasn't
connected, in which case a call to sendto(2)/sendmsg(2) would lead to
ENOTCONN anyway.

Connected UDP is already handled by the sys_connect hooks, which can
override source/destination at connect time and use the fast path later,
i.e. these hooks don't affect the UDP fast path.

Rewriting the source IP is implemented differently than in the
sys_connect hooks. When sys_sendmsg is used with unconnected UDP, it
doesn't work to just bind the socket to the desired local IP address,
since the source IP can be set on a per-packet basis by using ancillary
data (cmsg(3)). So no matter whether the socket is bound or not, the
source IP has to be rewritten on every call to sys_sendmsg.

To do so, two new fields are added to the UAPI `struct bpf_sock_addr`:
* `msg_src_ip4` to set source IPv4 for UDPv4;
* `msg_src_ip6` to set source IPv6 for UDPv6.

Change-Id: Icf5938b0b69ddfb1e99dc2abc90204f7c97f0473
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2024-09-07 12:24:00 +03:00
kondors1995
478461a998 Merge branch 'linux-4.14.y' of https://github.com/openela/kernel-lts into 15.0 2024-09-07 12:22:53 +03:00
Sultan Alsawaf
d9324db241 cpufreq: schedutil: Allow single-CPU frequency to drop without idling
Given that a CPU's clock is gated at even the shallowest idle state,
waiting until a CPU idles at least once before reducing its frequency is
putting the cart before the horse. For long-running workloads with low
compute needs, requiring an idle call since the last frequency update to
lower the CPU's frequency results in significantly increased energy usage.

Given that there is already a mechanism in place to ratelimit frequency
changes, this heuristic is wholly unnecessary.

Allow single-CPU performance domains to drop their frequency without
requiring an idle call in between, to improve energy efficiency. Right
off the bat, this reduces CPU power consumption by 7.5% when playing a
cat gif in Firefox on a Pixel 8 (270 mW -> 250 mW), with no visible
loss of performance.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-09-01 17:45:32 +03:00
EmanuelCN
de9a4993ce Revert "cpufreq: schedutil: Ignore CPU load older than WALT window size"
This reverts commit b9d1ecec68e9750f819dca02af47caee55d06a58.
2024-09-01 17:44:20 +03:00