30045 Commits

Rafael J. Wysocki
a6a20216ad cpufreq/sched: Set need_freq_update in ignore_dl_rate_limit()
Notice that ignore_dl_rate_limit() need not piggyback on the
limits_changed handling to achieve its goal (which is to enforce a
frequency update before its due time).

Namely, if sugov_should_update_freq() is updated to check
sg_policy->need_freq_update and return 'true' if it is set when
sg_policy->limits_changed is not set, ignore_dl_rate_limit() may
set the former directly instead of setting the latter, so it can
avoid hitting the memory barrier in sugov_should_update_freq().

Update the code accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/10666429.nUPlyArG6x@rjwysocki.net
2025-05-20 21:57:27 +03:00
Rafael J. Wysocki
38ac58ce7f cpufreq/sched: Explicitly synchronize limits_changed flag handling
The handling of the limits_changed flag in struct sugov_policy needs to
be explicitly synchronized to ensure that cpufreq policy limits updates
will not be missed in some cases.

Without that synchronization it is theoretically possible that
the limits_changed update in sugov_should_update_freq() will be
reordered with respect to the reads of the policy limits in
cpufreq_driver_resolve_freq() and in that case, if the limits_changed
update in sugov_limits() clobbers the one in sugov_should_update_freq(),
the new policy limits may not take effect for a long time.

Likewise, the limits_changed update in sugov_limits() may theoretically
get reordered with respect to the updates of the policy limits in
cpufreq_set_policy() and if sugov_should_update_freq() runs between
them, the policy limits change may be missed.

To ensure that the above situations will not take place, add memory
barriers preventing the reordering in question from taking place and
add READ_ONCE() and WRITE_ONCE() annotations around all of the
limits_changed flag updates to prevent the compiler from messing up
with that code.

Fixes: 600f5badb78c ("cpufreq: schedutil: Don't skip freq update when limits change")
Cc: 5.3+ <stable@vger.kernel.org> # 5.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3376719.44csPzL39Z@rjwysocki.net
2025-05-20 21:57:27 +03:00
Rafael J. Wysocki
f56461366c cpufreq/sched: Fix the usage of CPUFREQ_NEED_UPDATE_LIMITS
Commit 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused
by need_freq_update") modified sugov_should_update_freq() to set the
need_freq_update flag only for drivers with CPUFREQ_NEED_UPDATE_LIMITS
set, but that flag generally needs to be set when the policy limits
change because the driver callback may need to be invoked for the new
limits to take effect.

However, if the return value of cpufreq_driver_resolve_freq() after
applying the new limits is still equal to the previously selected
frequency, the driver callback needs to be invoked only in the case
when CPUFREQ_NEED_UPDATE_LIMITS is set (which means that the driver
specifically wants its callback to be invoked every time the policy
limits change).

Update the code accordingly to avoid missing policy limits changes for
drivers without CPUFREQ_NEED_UPDATE_LIMITS.

Fixes: 8e461a1cb43d ("cpufreq: schedutil: Fix superfluous updates caused by need_freq_update")
Closes: https://lore.kernel.org/lkml/Z_Tlc6Qs-tYpxWYb@linaro.org/
Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/3010358.e9J7NaK4W3@rjwysocki.net
2025-05-20 21:57:22 +03:00
zihan zhou
dcc4fc8469 sched: Reduce the default slice to avoid tasks getting an extra tick
The old default value for slice is 0.75 msec * (1 + ilog(ncpus)) which
means that we have a default slice of:

  0.75 for 1 cpu
  1.50 up to 3 cpus
  2.25 up to 7 cpus
  3.00 for 8 cpus and above.

For HZ=250 and HZ=100, because of the tick accuracy, the runtime of
tasks is far higher than their slice.

For HZ=1000 with 8 cpus or more, the accuracy of tick is already
satisfactory, but there is still an issue that tasks will get an extra
tick because the tick often arrives a little faster than expected. In
this case, the task can only wait until the next tick to consider that it
has reached its deadline, and will run 1ms longer.

vruntime + sysctl_sched_base_slice =     deadline
        |-----------|-----------|-----------|-----------|
             1ms          1ms         1ms         1ms
                   ^           ^           ^           ^
                 tick1       tick2       tick3       tick4(nearly 4ms)

There are two reasons for tick error: clockevent precision and
CONFIG_IRQ_TIME_ACCOUNTING/CONFIG_PARAVIRT_TIME_ACCOUNTING. With
CONFIG_IRQ_TIME_ACCOUNTING, every tick will be less than 1 ms, but even
without it, because of clockevent precision, the tick is still often
less than 1 ms.

In order to make scheduling more precise, we changed 0.75 to 0.70.
Using 0.70 instead of 0.75 should not change much for other configs
and would fix this issue:

  0.70 for 1 cpu
  1.40 up to 3 cpus
  2.10 up to 7 cpus
  2.80 for 8 cpus and above.

This does not guarantee that tasks can run the slice time accurately
every time, but occasionally running an extra tick has little impact.

Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20250208075322.13139-1-15645113830zzh@gmail.com
[Helium-Studio: Adapt for 8 cpus]
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
2025-05-20 21:54:13 +03:00
Alexander Winkowski
6d7dccfd20 sched: Apply Android tweaks manually
Tunables can't be changed with CONFIG_SCHED_DEBUG=n

b4b3950e52/rootdir/init.rc (L323)

Signed-off-by: Alexander Winkowski <dereference23@outlook.com>
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
2025-05-20 21:54:12 +03:00
Helium-Studio
fc1fb82ca2 Revert "sched: promote nodes out of CONFIG_SCHED_DEBUG"
* Let's apply the Android tweaks manually.

This reverts commit c810b18857.

Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
2025-05-20 21:54:12 +03:00
kondors1995
2d1e1c8056 kernel:rcu: Drop Ofast flag
I have no idea why I even had it.
2025-05-20 21:54:12 +03:00
Sultan Alsawaf
56058927a1 sched/cass: Don't pack tasks with uclamp boosts below minimum CPU capacity
To save energy, CASS may prefer non-idle CPUs for uclamp-boosted tasks in
order to pack them onto a single performance domain rather than spreading
them across multiple performance domains. This way, it is more likely for
only one performance domain to be boosted to a higher P-state when there
is more than one uclamp-boosted task running.

However, when a task has a uclamp boost value that is below a CPU's minimum
capacity, it is nearly the same thing as not having a uclamp boost at all.

In spite of that, CASS may still prefer non-idle CPUs for tasks with bogus
uclamp boost values. This is not only worse for latency, but also energy
efficiency since the load on the CPU is spread less evenly as a result.

Therefore, don't pack tasks with uclamp boosts below a CPU's minimum
configured capacity, since such tasks do not force the CPU to run at a
higher P-state.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Helium-Studio <67852324+Helium-Studio@users.noreply.github.com>
2025-03-08 11:51:49 +02:00
kondors1995
ec4ff863b0 Revert "sched/fair: Skip cpu if task does not fit in"
Causes issues with compilation, and I am quite sure it's not needed with
CASS.

This reverts commit 16bd86c028.
2025-03-08 11:51:49 +02:00
Viresh Kumar
317886d82c [ADAPTED/PARTIAL] sched/fair: Introduce fits_capacity()
The same formula to check utilization against capacity (after
considering capacity_margin) is already used at 5 different locations.

This patch creates a new macro, fits_capacity(), which can be used from
all these locations without exposing the details of it and hence
simplify code.

All 5 code locations are updated as well to use it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/b477ac75a2b163048bdaeb37f57b4c3f04f75a31.1559631700.git.viresh.kumar@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-08 11:51:49 +02:00
Sultan Alsawaf
578309e112 sched/cass: Avoid using the prime CPU on systems that have one
On arm64 systems with a prime CPU, treating this CPU – which is the single
fastest one in the system – the same as any other CPU is extremely bad for
energy efficiency. This is because prime CPUs are designed to be very fast
at the expense of energy efficiency, serving as the single-core performance
workhorse of a system.

Since CASS hasn't been giving special treatment to prime CPUs, CASS has
been balancing relative load onto prime CPUs at great expense to energy.

Thanks to the checks in place for CPU overload and task fit, it's easy to
adjust CASS to avoid using prime CPUs without hurting single-core or
multi-core performance.

Place a check just below the task fit check for whether or not a candidate
is a prime CPU. This way, when CPUs are overloaded or a task only fits on
the prime CPU, CASS will still utilize the prime CPU without a loss of
performance.

This provides a double-digit percent improvement to energy efficiency
across the board as measured on Tensor G3 (Pixel 8) and Tensor G4 (Pixel 9
Pro).

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-03-08 11:51:49 +02:00
Sultan Alsawaf
d1d7603543 sched/cass: Don't fight the idle load balancer
The idle load balancer (ILB) is kicked whenever a task is misfit, meaning
that the task doesn't fit on its CPU (i.e., fits_capacity() == false).

Since CASS makes no attempt to place tasks such that they'll fit on the CPU
they're placed upon, the ILB works harder to correct this and rebalances
misfit tasks onto a CPU with sufficient capacity.

By fighting the ILB like this, CASS degrades both energy efficiency and
performance.

Play nicely with the ILB by trying to place tasks onto CPUs that fit.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-03-08 11:51:49 +02:00
Sultan Alsawaf
91cf9e8478 sched/fair: Remove throughput optimization that keeps tasks on big CPUs
When the load balancer looks for the busiest group to detach tasks from, it
deliberately ignores higher-capacity groups that aren't egregiously
imbalanced. This is done in an attempt to improve throughput, while
resulting in a significant hit to energy efficiency: on Tensor G4 (Pixel 9
Pro), removing this optimization reduces energy usage by around 7%
(400 mW -> 370 mW) for a light gaming scenario.

Since this optimization doesn't provide any notable throughput improvement
(hackbench actually performs slightly better without it), remove it to
improve energy efficiency.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-03-08 11:51:49 +02:00
Sultan Alsawaf
a5269bd49f sched/fair: Don't needlessly migrate a lone task to a higher capacity CPU
When a CPU has only one running CFS task and there's a higher capacity CPU
that's idle, that lone task may be migrated to the higher capacity CPU just
because of RT and IRQ load.

If the CPU running the lone CFS task has sufficient capacity for the task,
then let it run that task. Migrating the task up to a higher capacity CPU
causes that CPU to be kicked out of idle and degrades energy efficiency
without an appreciable performance improvement. The load balancer will take
care of migrating the task anyway if it becomes a misfit, so this heuristic
isn't needed.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2025-03-08 11:51:48 +02:00
Vincent Guittot
414d6d591f sched/fair: Fix unnecessary increase of balance interval
In case of active balancing, we increase the balance interval to cover
pinned task cases not covered by the all_pinned logic. Nevertheless, the
active migration triggered by asym packing should be treated as the normal
unbalanced case and reset the interval to the default value; otherwise,
active migration for asym_packing can easily be delayed for hundreds of ms
because of this pinned task detection mechanism.

The same happens to other conditions tested in need_active_balance() like
misfit task and when the capacity of src_cpu is reduced compared to
dst_cpu (see comments in need_active_balance() for details).

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: valentin.schneider@arm.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jesse Chan <jc@linux.com>
Signed-off-by: billaids <jimmy.nelle@hsw-stud.de>
2025-03-08 11:51:13 +02:00
NeilBrown
98567695d7 BACKPORT: cred: add get_cred_rcu()
Sometimes we want to opportunistically get a ref to a cred in an
rcu_read_lock protected section. get_task_cred() does this, and NFS
does a similar thing with its own credential structures.

To prepare for NFS converting to use 'struct cred' more uniformly,
define get_cred_rcu(), and use it in get_task_cred().

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
[neobuddy89: Backport for KernelSU-Next]
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
2025-03-08 11:41:42 +02:00
Sultan Alsawaf
7dde6beb26 cpufreq: schedutil: Set default rate limit to 2000 us
This is empirically observed to yield good performance with reduced power
consumption. With "cpufreq: schedutil: Ignore rate limit when scaling up
with FIE present", this only affects frequency reductions when FIE is
present, since there is no rate limit applied when scaling up.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:35:47 +02:00
Sultan Alsawaf
7e401ca729 cpufreq: schedutil: Ignore rate limit when scaling up with FIE present
When schedutil disregards a frequency transition due to the transition rate
limit, there is no guaranteed deadline as to when the frequency transition
will actually occur after the rate limit expires. For instance, depending
on how long a CPU spends in a preempt/IRQs disabled context, a rate-limited
frequency transition may be delayed indefinitely, until said CPU reaches
the scheduler again. This also hurts tasks boosted via UCLAMP_MIN.

For frequency transitions _down_, this only poses a theoretical loss of
energy savings since a CPU may remain at a higher frequency than necessary
for an indefinite period beyond the rate limit expiry.

For frequency transitions _up_, however, this poses a significant hit to
performance when a CPU is stuck at an insufficient frequency for an
indefinitely long time. In latency-sensitive and bursty workloads
especially, a missed frequency transition up can result in a significant
performance loss due to a CPU operating at an insufficient frequency for
too long.

When support for the Frequency Invariant Engine (FIE) _isn't_ present, a
rate limit is always required for the scheduler to compute CPU utilization
with some semblance of accuracy: any frequency transition that occurs
before the previous transition latches would result in the scheduler not
knowing the frequency a CPU is actually operating at, thereby trashing the
computed CPU utilization.

However, when FIE support _is_ present, there's no technical requirement to
rate limit all frequency transitions to a cpufreq driver's reported
transition latency. With FIE, the scheduler's CPU utilization tracking is
unaffected by any frequency transitions that occur before the previous
frequency is latched.

Therefore, ignore the frequency transition rate limit when scaling up on
systems where FIE is present. This guarantees that transitions to a higher
frequency cannot be indefinitely delayed, since they simply cannot be
delayed at all.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:35:47 +02:00
Sultan Alsawaf
27979b9423 cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
A redundant frequency update is only truly needed when there is a policy
limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.

In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
frequency update _all the time_, not just for a policy limits change,
because need_freq_update is never cleared.

Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
to a redundant frequency update, regardless of whether or not the driver
specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
same as the current one.

Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
when there's a policy limits change, and clearing need_freq_update when a
requisite redundant update occurs.

This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
and instead setting need_freq_update to false in sugov_update_next_freq().

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:35:47 +02:00
Rafael J. Wysocki
1ca115cdde cpufreq: schedutil: Simplify sugov_update_next_freq()
Rearrange a conditional to make it more straightforward.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2024-12-19 17:35:41 +02:00
Viresh Kumar
50bc315faf cpufreq: schedutil: Don't skip freq update if need_freq_update is set
The cpufreq policy's frequency limits (min/max) can get changed at any
point of time, while schedutil is trying to update the next frequency.
Though the schedutil governor has necessary locking and support in place
to make sure we don't miss any of those updates, there is a corner case
where the governor will find that the CPU is already running at the
desired frequency and so may skip an update.

For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
and is running at 1 GHz currently. Schedutil tries to update the
frequency to 1.2 GHz, during this time the policy limits get changed as
policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
frequency at various instances, we will eventually set the frequency to
1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.

Now let's say the policy limits get changed back at this time with
policy->min as 1 GHz. The next time schedutil is invoked by the
scheduler, we will reevaluate the next frequency (because
need_freq_update will get set due to the limits change event) and let's
say we want to set the frequency to 1.2 GHz again. At this point
sugov_update_next_freq() will find the next_freq == current_freq and
will abort the update, while the CPU actually runs at 1.4 GHz.

Until now need_freq_update was used as a flag to indicate that the
policy's frequency limits have changed, and that we should consider the
new limits while reevaluating the next frequency.

This patch fixes the above-mentioned issue by extending the purpose of
the need_freq_update flag. If this flag is set now, the schedutil
governor will not try to abort a frequency change even if next_freq ==
current_freq.

As similar behavior is required in the case of
CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
set to false if that flag is set for the driver.

We also don't need to consider the need_freq_update flag in
sugov_update_single() anymore to handle the special case of busy CPU, as
we won't abort a frequency update anymore.

Reported-by: zhuguangqing <zhuguangqing@xiaomi.com>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Rearrange code to avoid a branch ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-12-19 17:35:18 +02:00
Rafael J. Wysocki
bece027796 cpufreq: schedutil: Always call driver if CPUFREQ_NEED_UPDATE_LIMITS is set
Because sugov_update_next_freq() may skip a frequency update even if
the need_freq_update flag has been set for the policy at hand, policy
limits updates may not take effect as expected.

For example, if the intel_pstate driver operates in the passive mode
with HWP enabled, it needs to update the HWP min and max limits when
the policy min and max limits change, respectively, but that may not
happen if the target frequency does not change along with the limit
at hand.  In particular, if the policy min is changed first, causing
the target frequency to be adjusted to it, and the policy max limit
is changed later to the same value, the HWP max limit will not be
updated to follow it as expected, because the target frequency is
still equal to the policy min limit and it will not change until
that limit is updated.

To address this issue, modify get_next_freq() to let the driver
callback run if the CPUFREQ_NEED_UPDATE_LIMITS cpufreq driver flag
is set regardless of whether or not the new frequency to set is
equal to the previous one.

Fixes: f6ebbcf08f37 ("cpufreq: intel_pstate: Implement passive mode with HWP enabled")
Reported-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: 1c534352f47f cpufreq: Introduce CPUFREQ_NEED_UPDATE_LIMITS ...
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+: a62f68f5ca53 cpufreq: Introduce cpufreq_driver_test_flags()
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-12-19 17:35:18 +02:00
EmanuelCN
a84a39e2e9 cpufreq/schedutil: Remove up/down rate limits
To make way for new changes
2024-12-19 17:35:17 +02:00
EmanuelCN
103eeccc95 cpufreq/schedutil: Don't limit util to max_cap in dvfs_headroom
2024-12-19 17:35:08 +02:00
kondors1995
2d5c481f29 bpf: squash revert spoofing and some backports:
Squashed commit of the following:

commit 259593385c05a430c4685b611c0e43b4272c22f8
Author: John Galt <johngaltfirstrun@gmail.com>
Date:   Fri Dec 13 08:30:37 2024 -0500

    bpf: squash revert spoofing and some backports:

    Squashed commit of the following:

    commit 8ac5df9c8bc9575059fff6cea0c40463b96fc129
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:17 2024 -0500

        Revert "BACKPORT: bpf: add skb_load_bytes_relative helper"

        This reverts commit 029893dcc5d67af16fdf0723bacaae37ec567f67.

    commit dbcbceafe848744ec188f74e87e9717916d359ea
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:13 2024 -0500

        Revert "BACKPORT: bpf: encapsulate verifier log state into a structure"

        This reverts commit d861145b97d247cbd9fe1400df52155f48639126.

    commit 478f4dfee0406b54525e68764cc9ba48af1624fc
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:10 2024 -0500

        Revert "BACKPORT: bpf: Rename bpf_verifer_log"

        This reverts commit 5d088635de1bf2d6ae9ea94e3dd1c601d30c0cce.

    commit 7bc7c24beb82168b49337530cb56b5dfeeafe19a
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:07 2024 -0500

        Revert "BACKPORT: bpf: btf: Introduce BPF Type Format (BTF)"

        This reverts commit 93d34e26514b4d9d15fd176706f57634b2e97485.

    commit 7106457ba90a459b6241fdd44df658c1b52c0e4b
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:03 2024 -0500

        Revert "bpf: Update logging functions to work with BTF"

        This reverts commit 97e6c528eb2f76c58a3b6a4c1e7fbeafcd97633a.

    commit 08e68c7ba56f5e78fd1afcd5a2164716a75b0fe3
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:58:00 2024 -0500

        Revert "bpf: btf: Validate type reference"

        This reverts commit c7b7eecbc1134e5d8865af2cc0692fc7156175d5.

    commit 7763cf0831970a64ed62f9b7362fca02ab6e83f1
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:51 2024 -0500

        Revert "bpf: btf: Check members of struct/union"

        This reverts commit 9a77b51cad6f04866ca067ca0e70a89b9f59ed56.

    commit eb033235f666b5f66995f4cf89702de7ab4721f8
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:47 2024 -0500

        Revert "bpf: btf: Add pretty print capability for data with BTF type info"

        This reverts commit 745692103435221d6e39bc177811769995540525.

    commit c32995674ace91e06c591d2f63177585e81adc75
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:43 2024 -0500

        Revert "BACKPORT: bpf: btf: Add BPF_BTF_LOAD command"

        This reverts commit 4e0afd38e20e5aa2df444361309bc07251ca6b2a.

    commit 1310bc8d4aca0015c8723e7624121eddf76b3244
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:38 2024 -0500

        Revert "bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd"

        This reverts commit d4b5d76d9101b97e6fe5181bcefe7f601ed19926.

    commit 881a49445608712bdb0a0f0c959838bdbc725f62
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:34 2024 -0500

        Revert "BACKPORT: bpf: btf: Clean up btf.h in uapi"

        This reverts commit 26b661822933d41b3feb59bb284334bfbbc82af4.

    commit e2109fd858ebd5fe392c8bf579b9350fbca35a35
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:29 2024 -0500

        Revert "bpf: btf: Avoid WARN_ON when CONFIG_REFCOUNT_FULL=y"

        This reverts commit 9abf878903404e649fef4ad0b189eec1c13d29fe.

    commit 088a7d9137f03da4e0fc1d72add3901823081ccd
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:23 2024 -0500

        Revert "bpf: Fix compiler warning on info.map_ids for 32bit platform"

        This reverts commit a3a278e1f6cf167d538ac52f4ad60bb9cf8d4129.

    commit 6e14aed6b63f2b266982454d83678445c062cf39
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:13 2024 -0500

        Revert "bpf: btf: Change how section is supported in btf_header"

        This reverts commit 4b60ffd683eb623a184b46761777838d7c49e707.

    commit 151a60855c23bf0317734031481d779efb369d6c
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:08 2024 -0500

        Revert "bpf: btf: Check array->index_type"

        This reverts commit b00e10f1a073fadce178b6fb62496722e16db303.

    commit 49775e9074a54ac5f60f518e6fc5a26172996eae
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:57:01 2024 -0500

        Revert "bpf: btf: Remove unused bits from uapi/linux/btf.h"

        This reverts commit c90c6ad34f7a8f565f351d21c2d5b9706838767d.

    commit b6d6c6ab28e4b018da6ce9e64125e63f4191d3d9
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:58 2024 -0500

        Revert "bpf: btf: Avoid variable length array"

        This reverts commit fe7d1f7750242e77a73839d173ac36c3e39d4171.

    commit a45bedecb9b1175fef96f2d64fba2d61777dbf35
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:49 2024 -0500

        Revert "bpf: btf: avoid -Wreturn-type warning"

        This reverts commit 78214f1e390bf1d69d9ae4ee80072ac85c34619e.

    commit 445efb8465b9fa5706d81098417f15656265322e
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:46 2024 -0500

        Revert "bpf: btf: Check array t->size"

        This reverts commit aed532e7466f77885a362e4b863bf90c41e834ba.

    commit 8aada590d525de735cf39196d88722e727c141e9
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:42 2024 -0500

        Revert "bpf: btf: Ensure t->type == 0 for BTF_KIND_FWD"

        This reverts commit 8c8b601dcc2e62e1276b73dfee8b49e40fb65944.

    commit ed67ad09e866c9c30897488088bbb4555ea3dc80
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:38 2024 -0500

        Revert "bpf: btf: Fix bitfield extraction for big endian"

        This reverts commit b0696a226c52868d64963f01665dd1a640a92f2b.

    commit 5cc64db782daf86cdf7ac77133ca94181bb29146
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:35 2024 -0500

        Revert "bpf: btf: Clean up BTF_INT_BITS() in uapi btf.h"

        This reverts commit 0f008594540b09c667ea88fc87cf289b8db334da.

    commit 3a5c6b9010426449c08ecdcc10e758431b1e515f
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:31 2024 -0500

        Revert "bpf: btf: Ensure the member->offset is in the right order"

        This reverts commit c5e361ecd6d45a7cdbffda02e4691a7a37198bdd.

    commit bd6173c1ac458b08d6cedaf06e6e53c93e6b0cc5
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:26 2024 -0500

        Revert "bpf: fix bpf_skb_load_bytes_relative pkt length check"

        This reverts commit 9ea14969874cd7896588df435c890f6f2f547821.

    commit 0b61d26b25a65d9ded4611426c6da9c78e41567c
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:22 2024 -0500

        Revert "bpf: btf: Fix end boundary calculation for type section"

        This reverts commit 08ef221c7fb604cb60c490fa999ec7254d492f05.

    commit 72fb2b9bb5b90f60ab71915fe4e57eeee3308163
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:18 2024 -0500

        Revert "bpf: btf: Fix a missing check bug"

        This reverts commit 594687e3e01e26086f3b0173e5eda9b9f0b672f8.

    commit 575a34ceba4013ad0230038f29f6ea0b3ba41a7e
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:15 2024 -0500

        Revert "bpf, btf: fix a missing check bug in btf_parse"

        This reverts commit 6bf31bbc438663756e92fb0aad4f5a35fd730fb0.

    commit bcca98c0bc5e19b38af3ddcd0feee80ad26e1f96
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:11 2024 -0500

        Revert "bpf: fix BTF limits"

        This reverts commit e351b26ae671dfacd82f27c1c5f66cf8089d930d.

    commit f71c484e340041d8828c94b39a233ea587d8cc09
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:07 2024 -0500

        Revert "bpf/btf: Fix BTF verification of enum members in struct/union"

        This reverts commit 861e65b744c171d59850e61a01715f194f25e45c.

    commit eca310722a2624d33cd49884aa18c36d435b10f8
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:56:02 2024 -0500

        Revert "bpf: btf: fix truncated last_member_type_id in btf_struct_resolve"

        This reverts commit d6cd1eac41b10e606ec7f445162a0617c01be973.

    commit caae5c99a3ca7bed0e318b31b6aa7ca8260a1c52
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:58 2024 -0500

        Revert "BACKPORT: net: bpf: rename ndo_xdp to ndo_bpf"

        This reverts commit 2a1ddcb6a384745195d57b4e4cdda2a55d2cbe47.

    commit f90bdcdaa095a4f10268bb740470a3e0893be21b
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:54 2024 -0500

        Revert "BACKPORT: bpf: offload: add infrastructure for loading programs for a specific netdev"

        This reverts commit a9516d402726094eafccce26a99cf5110d188be9.

    commit c6e0ce9019c06d9a45c030a2bc38eed320afd45a
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:50 2024 -0500

        Revert "bpf: offload: rename the ifindex field"

        This reverts commit 36bc9c7351a1dc78b3e71571998af381e876b4cb.

    commit 88b6a4d41b69df804b846a8ebdca410517e08343
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:46 2024 -0500

        Revert "BACKPORT: bpf: Check attach type at prog load time"

        This reverts commit fe5a0d514e4970d86983458136d4a2f6caeee365.

    commit 9ccfaa66a5ea042331f0aacdb3667e23c8ed363e
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:43 2024 -0500

        Revert "BACKPORT: bpf: introduce BPF_PROG_QUERY command"

        This reverts commit a5720688858170f1054f9549b5a628db1c252a88.

    commit adab2743b3fa0853d0351b33b0a286de745025e5
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:37 2024 -0500

        Revert "BACKPORT: bpf: Hooks for sys_bind"

        This reverts commit e484887c7e7aa026521ddc1773233368a6304b24.

    commit d462e09db98ad89b3a836f9b9a925812b0d8cfe7
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:33 2024 -0500

        Revert "BACKPORT: net: Introduce __inet_bind() and __inet6_bind"

        This reverts commit 41a3131c3e94c28fd084dd6f4358baee3824fd17.

    commit cdf7f55dc65b4bdf7ecfc924be77c6a039709b3d
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:29 2024 -0500

        Revert "BACKPORT: bpf: Hooks for sys_connect"

        This reverts commit f26fe7233e2885ef489707ab5a5a5dda9f081b80.

    commit 97685d5058f76ba4ea6dd2db157f4537f3a8953d
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:23 2024 -0500

        Revert "BACKPORT: bpf: Post-hooks for sys_bind"

        This reverts commit 284ac5bc7c70dac338301445e94e1ad40fb40fdb.

    commit d03d9c05036d3109eae643f473cc5a5ad0a80721
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:19 2024 -0500

        Revert "kernel: bpf: devmap: Create __dev_map_alloc_node"

        This reverts commit db726149fa9abfd1ca9add3e2db6b1524f7e90a3.

    commit 8c34bcb3e4c6630799764871b4af2e5f9344a371
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:15 2024 -0500

        Revert "BACKPORT: xdp: Add devmap_hash map type for looking up devices by hashed index"

        This reverts commit c4d4e1d201d8433e06b2ac66041d7105095a0204.

    commit ef277c7b3a08fd59943eb2b47af64afc513de008
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:11 2024 -0500

        Revert "BACKPORT: devmap: Allow map lookups from eBPF"

        This reverts commit 24d196375871c72de0de977de79afede5a7d1780.

    commit 4fcd87869c55c28ed59bff916d640147601816d2
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:07 2024 -0500

        Revert "gen_headers_{arm, arm64}: Add btf.h to the list"

        This reverts commit 37edfe7c90bac355885ffec3327b338a34619792.

    commit b89560e0b405b58ecc5fc12c15ad4f56147760d6
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:55:03 2024 -0500

        Revert "syscall: Fake uname to 4.19 for bpfloader/netd"

        This reverts commit 186e74af61269602d0c068d98928b1f25e03eba2.

    commit fd49f8c35eb7875d6810a5a52877ebc59bfd4530
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:54:59 2024 -0500

        Revert "syscall: Fake uname to 4.19 also for netbpfload"

        This reverts commit 34b9a1ab387d7dc83ede613b2c12b3741ea08edb.

    commit b853fcf2ff892664d0ff522ca7fd530bc94c023e
    Author: John Galt <johngaltfirstrun@gmail.com>
    Date:   Fri Dec 13 07:54:53 2024 -0500

        Revert "syscall: Increase bpf fake uname to 5.4"

        This reverts commit 9cdc014e11b410a7f03d8c968a35ee0dd6a28fff.

    # Conflicts:
    #	net/ipv4/af_inet.c
    #	net/ipv6/af_inet6.c

commit 4a0143fa36d300485650dc447b580151a69a3be2
Author: kondors1995 <normandija1945@gmail.com>
Date:   Wed Dec 18 13:48:16 2024 +0200

    Revert "syscall: Fake uname to 4.19 for bpfloader/netd"

    This reverts commit 417f37c97f.

commit 6f512c5c7341a51d7bbc9cdd93814764cae8868f
Author: kondors1995 <normandija1945@gmail.com>
Date:   Wed Dec 18 13:48:16 2024 +0200

    Revert "syscall: Fake uname to 4.19 also for netbpfload"

    This reverts commit a4c61c3d97.

commit 41f326616251f0122d81e518082ef7faaad4b2e5
Author: kondors1995 <normandija1945@gmail.com>
Date:   Wed Dec 18 13:48:15 2024 +0200

    Revert "syscall: Increase bpf fake uname to 5.4"

    This reverts commit 4a906017d4.

commit a0d3db72a836096cf533516d56c81a43150976ed
Author: kondors1995 <normandija1945@gmail.com>
Date:   Wed Dec 18 13:46:12 2024 +0200

    Revert "bpf: Hooks for sys_sendmsg"

    This reverts commit 735c155332.

commit 246eb3d90b95e0ab5aee8d5a9e9cd639c7beb174
Author: kondors1995 <normandija1945@gmail.com>
Date:   Wed Dec 18 13:45:08 2024 +0200

    Revert "syscall: Increase fake uname to 6.6.40"

    This reverts commit 92494b9920.

commit c56eaa5b7f170f58f2ade14bb71aaad2964b9018
Author: kondors1995 <normandija1945@gmail.com>
Date:   Mon Dec 9 21:35:20 2024 +0200

    raphael_defconfig: increase sbalance pooling rate to 10s

commit 54d190b8af
Author: Sultan Alsawaf <sultan@kerneltoast.com>
Date:   Wed Dec 4 15:53:22 2024 -0800

    sbalance: Fix severe misattribution of movable IRQs to the last active CPU

    Due to a horrible omission in the big IRQ list traversal, all movable IRQs
    are misattributed to the last active CPU in the system since that's what
    `bd` is last set to in the loop prior. This horribly breaks SBalance's
    notion of balance, producing nonsensical balancing decisions and failing to
    balance IRQs even when they are heavily imbalanced.

    Fix the massive breakage by adding the missing line of code to set `bd` to
    the CPU an IRQ actually belongs to, so that it's added to the correct CPU's
    movable IRQs list.

    Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>

commit f2fa2db581
Author: Sultan Alsawaf <sultan@kerneltoast.com>
Date:   Wed Dec 4 14:31:52 2024 -0800

    sbalance: Don't race with CPU hotplug

    When a CPU is hotplugged, cpu_active_mask is modified without any RCU
    synchronization. As a result, the only synchronization for cpu_active_mask
    provided by the hotplug code is the CPU hotplug lock.

    Furthermore, since IRQ balance is majorly disrupted during CPU hotplug due
    to mass IRQ migration off a dying CPU, SBalance just shouldn't operate
    while a CPU hotplug is in progress.

    Take the CPU hotplug lock in balance_irqs() to prevent races and mishaps
    during CPU hotplugs.

    Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>

commit a4e81ff60a
Author: Sultan Alsawaf <sultan@kerneltoast.com>
Date:   Wed Dec 4 14:16:48 2024 -0800

    sbalance: Convert various IRQ counter types to unsigned ints

    These counted values are actually unsigned ints, not unsigned longs.
    Convert them to unsigned ints since there's no reason for them to be longs.

    Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:34:31 +02:00
Sultan Alsawaf
ab091b84b5 sbalance: Fix severe misattribution of movable IRQs to the last active CPU
Due to a horrible omission in the big IRQ list traversal, all movable IRQs
are misattributed to the last active CPU in the system since that's what
`bd` is last set to in the loop prior. This horribly breaks SBalance's
notion of balance, producing nonsensical balancing decisions and failing to
balance IRQs even when they are heavily imbalanced.

Fix the massive breakage by adding the missing line of code to set `bd` to
the CPU an IRQ actually belongs to, so that it's added to the correct CPU's
movable IRQs list.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:33:27 +02:00
Sultan Alsawaf
70b23e7894 sbalance: Don't race with CPU hotplug
When a CPU is hotplugged, cpu_active_mask is modified without any RCU
synchronization. As a result, the only synchronization for cpu_active_mask
provided by the hotplug code is the CPU hotplug lock.

Furthermore, since IRQ balance is majorly disrupted during CPU hotplug due
to mass IRQ migration off a dying CPU, SBalance just shouldn't operate
while a CPU hotplug is in progress.

Take the CPU hotplug lock in balance_irqs() to prevent races and mishaps
during CPU hotplugs.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:33:27 +02:00
Sultan Alsawaf
023a941087 sbalance: Convert various IRQ counter types to unsigned ints
These counted values are actually unsigned ints, not unsigned longs.
Convert them to unsigned ints since there's no reason for them to be longs.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-12-19 17:33:22 +02:00
kondors1995
373d5574ec Merge branch 'linux-4.14.y' of https://github.com/openela/kernel-lts into 15.0 2024-10-20 18:10:19 +03:00
Tze-nan Wu
b28271a442 tracing: Fix overflow in get_free_elt()
commit bcf86c01ca4676316557dd482c8416ece8c2e143 upstream.

"tracing_map->next_elt" in get_free_elt() is at risk of overflowing.

Once it overflows, new elements can still be inserted into the tracing_map
even though the maximum number of elements (`max_elts`) has been reached.
Continuing to insert elements after the overflow could result in the
tracing_map containing "tracing_map->max_size" elements, leaving no empty
entries.
If any attempt is made to insert an element into a full tracing_map using
`__tracing_map_insert()`, it will cause an infinite loop with preemption
disabled, leading to a CPU hang problem.

Fix this by preventing any further increments to "tracing_map->next_elt"
once it reaches "tracing_map->max_elt".
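
The guarded increment can be sketched in plain userspace C. All names here are illustrative, not the kernel's; the real fix uses an atomic add-unless on the map's counter, which the helper below emulates single-threadedly:

```c
#include <assert.h>

static unsigned int next_elt;   /* stands in for tracing_map->next_elt */

/* Single-threaded emulation of atomic_fetch_add_unless(v, a, u):
 * add 'a' to *v only while *v has not reached the limit 'u',
 * and return the old value either way. */
static unsigned int fetch_add_unless(unsigned int *v, unsigned int a,
				     unsigned int u)
{
	unsigned int old = *v;

	if (old != u)
		*v = old + a;
	return old;
}

/* Returns a free element index, or -1 once the map is full. */
static int get_free_elt_idx(unsigned int max_elts)
{
	unsigned int idx = fetch_add_unless(&next_elt, 1, max_elts);

	return idx < max_elts ? (int)idx : -1;
}
```

Because the counter stops at max_elts instead of wrapping, repeated insert attempts on a full map fail cleanly rather than consuming every slot.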

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 08d43a5fa0 ("tracing: Add lock-free tracing_map")
Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Link: https://lore.kernel.org/20240805055922.6277-1-Tze-nan.Wu@mediatek.com
Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 302ceb625d7b990db205a15e371f9a71238de91c)
[Vegard: s/atomic_fetch_add_unless/__atomic_add_unless/ due to missing
 commit bfc18e389c7a09fbbbed6bf4032396685b14246e ("atomics/treewide:
 Rename __atomic_add_unless() => atomic_fetch_add_unless()".]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:30 +00:00
Justin Stitt
53390d85b1 ntp: Safeguard against time_constant overflow
commit 06c03c8edce333b9ad9c6b207d93d3a5ae7c10c0 upstream.

Using syzkaller with the recently reintroduced signed integer overflow
sanitizer produces this UBSAN report:

UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:738:18
9223372036854775806 + 4 cannot be represented in type 'long'
Call Trace:
 handle_overflow+0x171/0x1b0
 __do_adjtimex+0x1236/0x1440
 do_adjtimex+0x2be/0x740

The user supplied time_constant value is incremented by four and then
clamped to the operating range.

Before commit eea83d896e ("ntp: NTP4 user space bits update") the user
supplied value was sanity checked to be in the operating range. That change
removed the sanity check and relied on clamping after incrementing which
does not work correctly when the user supplied value is in the overflow
zone of the '+ 4' operation.

The operation requires CAP_SYS_TIME and the side effect of the overflow is
NTP getting out of sync.

Similar to the fixups for time_maxerror and time_esterror, clamp the user
space supplied value to the operating range.

[ tglx: Switch to clamping ]

Fixes: eea83d896e ("ntp: NTP4 user space bits update")
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240517-b4-sio-ntp-c-v2-1-f3a80096f36f@google.com
Closes: https://github.com/KSPP/linux/issues/352
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a13f8b269b6f4c9371ab149ecb65d2edb52e9669)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:30 +00:00
Justin Stitt
07f7f40df9 ntp: Clamp maxerror and esterror to operating range
[ Upstream commit 87d571d6fb77ec342a985afa8744bb9bb75b3622 ]

Using syzkaller alongside the newly reintroduced signed integer overflow
sanitizer spits out this report:

UBSAN: signed-integer-overflow in ../kernel/time/ntp.c:461:16
9223372036854775807 + 500 cannot be represented in type 'long'
Call Trace:
 handle_overflow+0x171/0x1b0
 second_overflow+0x2d6/0x500
 accumulate_nsecs_to_secs+0x60/0x160
 timekeeping_advance+0x1fe/0x890
 update_wall_time+0x10/0x30

time_maxerror is unconditionally incremented and the result is checked
against NTP_PHASE_LIMIT, but the increment itself can overflow, resulting
in wrap-around to negative space.

Before commit eea83d896e ("ntp: NTP4 user space bits update") the user
supplied value was sanity checked to be in the operating range. That change
removed the sanity check and relied on clamping in handle_overflow() which
does not work correctly when the user supplied value is in the overflow
zone of the '+ 500' operation.

The operation requires CAP_SYS_TIME and the side effect of the overflow is
NTP getting out of sync.

Miroslav confirmed that the input value should be clamped to the operating
range and the same applies to time_esterror. The latter is not used by the
kernel, but the value still should be in the operating range as it was
before the sanity check got removed.

Clamp them to the operating range.

[ tglx: Changed it to clamping and included time_esterror ]

Fixes: eea83d896e ("ntp: NTP4 user space bits update")
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Link: https://lore.kernel.org/all/20240517-b4-sio-ntp-usec-v2-1-d539180f2b79@google.com
Closes: https://github.com/KSPP/linux/issues/354
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ cast things to __kernel_long_t to fix compiler warnings - gregkh ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9dfe2eef1ecfbb1f29e678700247de6010784eb9)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:29 +00:00
Thomas Gleixner
6fad54cc7a tick/broadcast: Move per CPU pointer access into the atomic section
commit 6881e75237a84093d0986f56223db3724619f26e upstream.

The recent fix for making the take over of the broadcast timer more
reliable retrieves a per CPU pointer in preemptible context.

This went unnoticed as compilers hoist the access into the non-preemptible
region where the pointer is actually used. But of course it's valid that
the compiler keeps it at the place where the code puts it which rightfully
triggers:

  BUG: using smp_processor_id() in preemptible [00000000] code:
       caller is hotplug_cpu__broadcast_tick_pull+0x1c/0xc0

Move it to the actual usage site which is in a non-preemptible region.

Fixes: f7d43dd206e7 ("tick/broadcast: Make takeover of broadcast hrtimer reliable")
Reported-by: David Wang <00107082@163.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/87ttg56ers.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f54abf332a2bc0413cfa8bd6a8511f7aa99faea0)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:29 +00:00
Douglas Anderson
4925aa995a kdb: Use the passed prompt in kdb_position_cursor()
[ Upstream commit e2e821095949cde46256034975a90f88626a2a73 ]

The function kdb_position_cursor() takes in a "prompt" parameter but
never uses it. This doesn't _really_ matter since all current callers
of the function pass the same value and it's a global variable, but
it's a bit ugly. Let's clean it up.

Found by code inspection. This patch is expected to functionally be a
no-op.

Fixes: 09b35989421d ("kdb: Use format-strings rather than '\0' injection in kdb_read()")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240528071144.1.I0feb49839c6b6f4f2c4bf34764f5e95de3f55a66@changeid
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 90f2409c1d552f27a2b2bf8dc598d147c4173128)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:25 +00:00
Arnd Bergmann
fbcf6bbfac kdb: address -Wformat-security warnings
[ Upstream commit 70867efacf4370b6c7cdfc7a5b11300e9ef7de64 ]

When -Wformat-security is not disabled, using a string pointer
as a format causes a warning:

kernel/debug/kdb/kdb_io.c: In function 'kdb_read':
kernel/debug/kdb/kdb_io.c:365:36: error: format not a string literal and no format arguments [-Werror=format-security]
  365 |                         kdb_printf(kdb_prompt_str);
      |                                    ^~~~~~~~~~~~~~
kernel/debug/kdb/kdb_io.c: In function 'kdb_getstr':
kernel/debug/kdb/kdb_io.c:456:20: error: format not a string literal and no format arguments [-Werror=format-security]
  456 |         kdb_printf(kdb_prompt_str);
      |                    ^~~~~~~~~~~~~~

Use an explicit "%s" format instead.
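
The fix pattern can be demonstrated in ordinary userspace C (function name hypothetical): with the literal "%s" format, a prompt that happens to contain '%' is printed verbatim instead of being interpreted as a conversion specification.

```c
#include <stdio.h>

/*
 * Sketch of the -Wformat-security fix: never pass a runtime string as
 * the format argument. snprintf(out, n, prompt) would interpret any
 * '%' in the prompt; the explicit "%s" treats the prompt as data.
 */
static int print_prompt(char *out, size_t n, const char *prompt)
{
	return snprintf(out, n, "%s", prompt);
}
```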

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5d5314d679 ("kdb: core for kgdb back end (1 of 2)")
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240528121154.3662553-1-arnd@kernel.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 22a100556ceab8b906ad180788bd6bdc07390f50)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:25 +00:00
Wenlin Kang
2527458f09 kdb: Fix bound check compiler warning
[ Upstream commit ca976bfb3154c7bc67c4651ecd144fdf67ccaee7 ]

The strncpy() function may leave the destination string buffer
unterminated, better use strscpy() instead.

This fixes the following warning with gcc 8.2:

kernel/debug/kdb/kdb_io.c: In function 'kdb_getstr':
kernel/debug/kdb/kdb_io.c:449:3: warning: 'strncpy' specified bound 256 equals destination size [-Wstringop-truncation]
   strncpy(kdb_prompt_str, prompt, CMD_BUFLEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
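
The behavioural difference can be sketched with a small userspace emulation of strscpy() (the -7 used as a stand-in for -E2BIG is illustrative):

```c
#include <string.h>

#define E2BIG_ERR (-7)	/* illustrative stand-in for -E2BIG */

/*
 * Userspace sketch of strscpy() semantics: unlike strncpy(), the
 * destination is always NUL-terminated, and truncation is reported
 * via an error code instead of silently leaving 'count' bytes copied
 * with no terminator.
 */
static long strscpy_sketch(char *dst, const char *src, size_t count)
{
	size_t len = strlen(src);

	if (count == 0)
		return E2BIG_ERR;
	if (len >= count) {
		memcpy(dst, src, count - 1);
		dst[count - 1] = '\0';
		return E2BIG_ERR;	/* copied, but truncated */
	}
	memcpy(dst, src, len + 1);	/* includes the terminator */
	return (long)len;
}
```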

Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Stable-dep-of: 70867efacf43 ("kdb: address -Wformat-security warnings")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b15593e2904d2ff0094b7170f806dba0eeefac75)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:25 +00:00
Thomas Gleixner
dbffea43e8 watchdog/perf: properly initialize the turbo mode timestamp and rearm counter
commit f944ffcbc2e1c759764850261670586ddf3bdabb upstream.

For systems on which the performance counter can expire early due to turbo
modes the watchdog handler has a safety net in place which validates that
since the last watchdog event there has at least 4/5th of the watchdog
period elapsed.

This works reliably only after the first watchdog event because the per
CPU variable which holds the timestamp of the last event is never
initialized.

So a first spurious event will validate against a timestamp of 0 which
results in a delta which is likely to be way over the 4/5 threshold of the
period.  As this might happen before the first watchdog hrtimer event
increments the watchdog counter, this can lead to false positives.

Fix this by initializing the timestamp before enabling the hardware event.
Reset the rearm counter as well, as that might be non-zero after the
watchdog was disabled and reenabled.

Link: https://lkml.kernel.org/r/87frsfu15a.ffs@tglx
Fixes: 7edaeb6841 ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6d94ca5d571dfdb34f12dc3f63273ea275e8f40c)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:24 +00:00
Yu Liao
3065612975 tick/broadcast: Make takeover of broadcast hrtimer reliable
commit f7d43dd206e7e18c182f200e67a8db8c209907fa upstream.

Running the LTP hotplug stress test on an aarch64 machine results in
rcu_sched stall warnings when the broadcast hrtimer was owned by the
un-plugged CPU. The issue is the following:

CPU1 (owns the broadcast hrtimer)	CPU2

				tick_broadcast_enter()
				  // shutdown local timer device
				  broadcast_shutdown_local()
				...
				tick_broadcast_exit()
				  clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT)
				  // timer device is not programmed
				  cpumask_set_cpu(cpu, tick_broadcast_force_mask)

				initiates offlining of CPU1
take_cpu_down()
/*
 * CPU1 shuts down and does not
 * send broadcast IPI anymore
 */
				takedown_cpu()
				  hotplug_cpu__broadcast_tick_pull()
				    // move broadcast hrtimer to this CPU
				    clockevents_program_event()
				      bc_set_next()
					hrtimer_start()
					/*
					 * timer device is not programmed
					 * because only the first expiring
					 * timer will trigger clockevent
					 * device reprogramming
					 */

What happens is that CPU2 exits broadcast mode with force bit set, then the
local timer device is not reprogrammed and CPU2 expects to receive the
expired event by the broadcast IPI. But this does not happen because CPU1
is offlined by CPU2. CPU2 switches the clockevent device to ONESHOT state,
but does not reprogram the device.

The subsequent reprogramming of the hrtimer broadcast device does not
program the clockevent device of CPU2 either because the pending expiry
time is already in the past and the CPU expects the event to be delivered.
As a consequence all CPUs which wait for a broadcast event to be delivered
are stuck forever.

Fix this issue by reprogramming the local timer device if the broadcast
force bit of the CPU is set so that the broadcast hrtimer is delivered.

[ tglx: Massage comment and change log. Add Fixes tag ]

Fixes: 989dcb645c ("tick: Handle broadcast wakeup of multiple cpus")
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240711124843.64167-1-liaoyu15@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dfe19aa91378972f10530635ad83b2d77f481044)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:23 +00:00
Adrian Hunter
26864f03cc perf: Prevent passing zero nr_pages to rb_alloc_aux()
[ Upstream commit dbc48c8f41c208082cfa95e973560134489e3309 ]

nr_pages is unsigned long but gets passed to rb_alloc_aux() as an int,
and is stored as an int.

Only power-of-2 values are accepted, so if nr_pages is a 64-bit value, it
will be passed to rb_alloc_aux() as zero.

That is not ideal because:
 1. the value is incorrect
 2. rb_alloc_aux() is at risk of misbehaving, although it manages to
 return -ENOMEM in that case, it is a result of passing zero to get_order()
 even though the get_order() result is documented to be undefined in that
 case.

Fix by simply validating the maximum supported value in the first place.
Use -ENOMEM error code for consistency with the current error code that
is returned in that case.
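
A userspace sketch of the validation (limits and names illustrative): check the power-of-two and range constraints while the value is still 64-bit, before the narrowing assignment can truncate it to zero.

```c
#include <limits.h>

static int is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Validate nr_pages while it is still an unsigned long; only then
 * narrow it to the int field. Returns 0 on success, -1 (standing in
 * for -ENOMEM) on bad input.
 */
static int validate_aux_pages(unsigned long nr_pages, int *out)
{
	if (!is_power_of_2(nr_pages) || nr_pages > INT_MAX)
		return -1;
	*out = (int)nr_pages;	/* now a known-safe narrowing */
	return 0;
}
```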

Fixes: 45bfb2e504 ("perf: Add AUX area to ring buffer for raw data streams")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240624201101.60186-6-adrian.hunter@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d7b1a76f33e6fc93924725b4410126740c890c44)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:20 +00:00
Adrian Hunter
6f7bc617b3 perf: Fix perf_aux_size() for greater-than 32-bit size
[ Upstream commit 3df94a5b1078dfe2b0c03f027d018800faf44c82 ]

perf_buffer->aux_nr_pages uses a 32-bit type, so a cast is needed to
calculate a 64-bit size.
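
The widening can be sketched as follows (a PAGE_SHIFT of 12, i.e. 4 KiB pages, is assumed for illustration):

```c
#include <stdint.h>

#define PAGE_SHIFT 12	/* 4 KiB pages assumed for this sketch */

/*
 * Widen the 32-bit page count to 64 bits *before* shifting; without
 * the cast, the shift happens in 32-bit arithmetic and page counts of
 * 2^(32 - PAGE_SHIFT) or more overflow.
 */
static uint64_t perf_aux_size_sketch(int aux_nr_pages)
{
	return (uint64_t)aux_nr_pages << PAGE_SHIFT;
}
```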

Fixes: 45bfb2e504 ("perf: Add AUX area to ring buffer for raw data streams")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240624201101.60186-5-adrian.hunter@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 542abbf58e88f34dfc659b63476a5976acf52c0e)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
2024-10-10 10:27:20 +00:00
EmanuelCN
9ff663517c schedutil: Inline clo code with 5.4 2024-09-30 16:34:15 +03:00
Joel Fernandes (Google)
3e312c8e7b schedutil: Allow cpufreq requests to be made even when kthread kicked
Currently there is a chance of a schedutil cpufreq update request to be
dropped if there is a pending update request. This pending request can
be delayed if there is a scheduling delay of the irq_work and the wake
up of the schedutil governor kthread.

A very bad scenario is when a schedutil request was already just made,
such as to reduce the CPU frequency, then a newer request to increase
CPU frequency (even sched deadline urgent frequency increase requests)
can be dropped, even though the rate limits suggest that it's OK to
process a request. This is because of the way the work_in_progress flag
is used.

This patch improves the situation by allowing new requests to happen
even though the old one is still being processed. Note that in this
approach, if an irq_work was already issued, we just update next_freq
and don't bother to queue another request so there's no extra work being
done to make this happen.
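
The idea can be sketched as a single-threaded userspace model (the struct and field names are hypothetical simplifications of sugov's real locking and irq_work machinery):

```c
#include <stdbool.h>

struct sg_policy {
	bool work_in_progress;	/* an irq_work is already in flight */
	unsigned int next_freq;	/* latest requested frequency */
	int queued_works;	/* counts irq_work_queue() calls */
};

/* Always record the newest request; only queue work if none is pending. */
static void request_freq(struct sg_policy *sg, unsigned int freq)
{
	sg->next_freq = freq;
	if (!sg->work_in_progress) {
		sg->work_in_progress = true;
		sg->queued_works++;
	}
}

/* The kthread side: picks up whatever the latest request was. */
static unsigned int worker_run(struct sg_policy *sg)
{
	unsigned int freq = sg->next_freq;

	sg->work_in_progress = false;
	return freq;
}
```

A request arriving while work is pending updates next_freq in place, so the worker services the newest value without any extra queuing.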

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-09-30 16:31:50 +03:00
Tim Zimmermann
92494b9920 syscall: Increase fake uname to 6.6.40
Change-Id: I77a8c5cd0e74eff97ae9b6b7e37812e2972cb3c8
2024-09-07 12:26:37 +03:00
Tim Zimmermann
31f7a8793a syscall: Only fake uname on very first call of netbpfload
* The bpf programs actually still support older kernels,
  we just need to bypass the very first check for kernel version

Change-Id: I4264782ee63efb26b95abd94774938d5456200a3
2024-09-07 12:26:24 +03:00
Daniel Borkmann
248cbcedae BACKPORT: bpf: fix unconnected udp hooks
Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently
to applications as also stated in original motivation in 7828f20e3779 ("Merge
branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter
two hooks into Cilium to enable host based load-balancing with Kubernetes,
I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes
typically sets up DNS as a service and is thus subject to load-balancing.

Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API
is currently insufficient and thus not usable as-is for standard applications
shipped with most distros. To break down the issue we ran into with a simple
example:

  # cat /etc/resolv.conf
  nameserver 147.75.207.207
  nameserver 147.75.207.208

For the purpose of a simple test, we set up above IPs as service IPs and
transparently redirect traffic to a different DNS backend server for that
node:

  # cilium service list
  ID   Frontend            Backend
  1    147.75.207.207:53   1 => 8.8.8.8:53
  2    147.75.207.208:53   1 => 8.8.8.8:53

The attached BPF program is basically selecting one of the backends if the
service IP/port matches on the cgroup hook. DNS breaks here, because the
hooks are not transparent enough to applications which have built-in msg_name
address checks:

  # nslookup 1.1.1.1
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  [...]
  ;; connection timed out; no servers could be reached

  # dig 1.1.1.1
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
  ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
  [...]

  ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
  ;; global options: +cmd
  ;; connection timed out; no servers could be reached

For comparison, if none of the service IPs is used, and we tell nslookup
to use 8.8.8.8 directly it works just fine, of course:

  # nslookup 1.1.1.1 8.8.8.8
  1.1.1.1.in-addr.arpa	name = one.one.one.one.

In order to fix this and thus act more transparent to the application,
this needs reverse translation on recvmsg() side. A minimal fix for this
API is to add similar recvmsg() hooks behind the BPF cgroups static key
such that the program can track state and replace the current sockaddr_in{,6}
with the original service IP. From BPF side, this basically tracks the
service tuple plus socket cookie in an LRU map where the reverse NAT can
then be retrieved via map value as one example. Side-note: the BPF cgroups
static key should be converted to a per-hook static key in future.

Same example after this fix:

  # cilium service list
  ID   Frontend            Backend
  1    147.75.207.207:53   1 => 8.8.8.8:53
  2    147.75.207.208:53   1 => 8.8.8.8:53

Lookups work fine now:

  # nslookup 1.1.1.1
  1.1.1.1.in-addr.arpa    name = one.one.one.one.

  Authoritative answers can be found from:

  # dig 1.1.1.1

  ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
  ;; global options: +cmd
  ;; Got answer:
  ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550
  ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

  ;; OPT PSEUDOSECTION:
  ; EDNS: version: 0, flags:; udp: 512
  ;; QUESTION SECTION:
  ;1.1.1.1.                       IN      A

  ;; AUTHORITY SECTION:
  .                       23426   IN      SOA     a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400

  ;; Query time: 17 msec
  ;; SERVER: 147.75.207.207#53(147.75.207.207)
  ;; WHEN: Tue May 21 12:59:38 UTC 2019
  ;; MSG SIZE  rcvd: 111

And from an actual packet level it shows that we're using the back end
server when talking via 147.75.207.20{7,8} front end:

  # tcpdump -i any udp
  [...]
  12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
  12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
  12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
  12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
  [...]

In order to be flexible and to have the same semantics as in sendmsg BPF
programs, we only allow return codes in [1,1] range. In the sendmsg case
the program is called if msg->msg_name is present which can be the case
in both, connected and unconnected UDP.

The former only relies on the sockaddr_in{,6} passed via connect(2) if
passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar
way to call into the BPF program whenever a non-NULL msg->msg_name was
passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note
that for TCP case, the msg->msg_name is ignored in the regular recvmsg
path and therefore not relevant.

For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE,
the hook is not called. This is intentional as it aligns with the same
semantics as in case of TCP cgroup BPF hooks right now. This might be
better addressed in future through a different bpf_attach_type such
that this case can be distinguished from the regular recvmsg paths,
for example.

Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg")
Change-Id: If2bab00efe5f37a591083fe2676e76f35f8cecc3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-09-07 12:24:22 +03:00
Alexei Starovoitov
1980188064 BACKPORT: bpf: enforce return code for cgroup-bpf programs
With the addition of tnum logic, the verifier got smart enough that we
can enforce return codes at program load time.
For now, do so for cgroup-bpf program types.

Change-Id: Iae3a46c3d38810e47cbf4ec23356abae03ded736
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-09-07 12:24:22 +03:00
Andrey Ignatov
735c155332 bpf: Hooks for sys_sendmsg
In addition to already existing BPF hooks for sys_bind and sys_connect,
the patch provides new hooks for sys_sendmsg.

It leverages the existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`
that provides access to the socket itself (properties like family, type,
protocol) and the user-passed `struct sockaddr *`, so that a BPF program
can override the destination IP and port for system calls such as
sendto(2) or sendmsg(2) and/or assign a source IP to the socket.

The hooks are implemented as two new attach types:
`BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and
UDPv6, respectively.

Separate attach types are used for UDPv4 and UDPv6 for the same reason
as for the sys_bind and sys_connect hooks, i.e. to prevent reading from /
writing to e.g. user_ip6 fields when the user passes a sockaddr_in, since
that would be out-of-bounds.

The difference from the already existing hooks is that the sys_sendmsg
hooks are implemented only for unconnected UDP.

For TCP it doesn't make sense to change the user-provided `struct
sockaddr *` at sendto(2)/sendmsg(2) time, since the socket either was
already connected and has its source/destination set, or wasn't
connected, in which case a call to sendto(2)/sendmsg(2) would lead to
ENOTCONN anyway.

Connected UDP is already handled by the sys_connect hooks, which can
override source/destination at connect time and use the fast path later,
i.e. these hooks don't affect the UDP fast path.

Rewriting the source IP is implemented differently than in the
sys_connect hooks. When sys_sendmsg is used with unconnected UDP, it
doesn't work to just bind the socket to the desired local IP address,
since the source IP can be set on a per-packet basis by using ancillary
data (cmsg(3)). So no matter whether the socket is bound or not, the
source IP has to be rewritten on every call to sys_sendmsg.

To do so, two new fields are added to the UAPI `struct bpf_sock_addr`:
* `msg_src_ip4` to set source IPv4 for UDPv4;
* `msg_src_ip6` to set source IPv6 for UDPv6.

Change-Id: Icf5938b0b69ddfb1e99dc2abc90204f7c97f0473
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2024-09-07 12:24:00 +03:00
kondors1995
478461a998 Merge branch 'linux-4.14.y' of https://github.com/openela/kernel-lts into 15.0 2024-09-07 12:22:53 +03:00
Sultan Alsawaf
d9324db241 cpufreq: schedutil: Allow single-CPU frequency to drop without idling
Given that a CPU's clock is gated at even the shallowest idle state,
waiting until a CPU idles at least once before reducing its frequency is
putting the cart before the horse. For long-running workloads with low
compute needs, requiring an idle call since the last frequency update to
lower the CPU's frequency results in significantly increased energy usage.

Given that there is already a mechanism in place to ratelimit frequency
changes, this heuristic is wholly unnecessary.

Allow single-CPU performance domains to drop their frequency without
requiring an idle call in between, to improve energy efficiency. Right
off the bat, this reduces CPU power consumption by 7.5% when playing a
cat gif in Firefox on a Pixel 8 (270 mW -> 250 mW), with no visible
loss of performance.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
2024-09-01 17:45:32 +03:00
EmanuelCN
de9a4993ce Revert "cpufreq: schedutil: Ignore CPU load older than WALT window size"
This reverts commit b9d1ecec68e9750f819dca02af47caee55d06a58.
2024-09-01 17:44:20 +03:00