Commit Graph

38873 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
34eebce472 Merge 3d5e28bff7 ("Merge branch 'stable/for-linus-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb") into android-mainline
Steps on the way to 5.10-rc4

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I74d7b93742dce6256e2d4fe636d7b0ad93d90467
2020-11-12 09:16:49 +01:00
Greg Kroah-Hartman
7f6480e40c Merge eccc876724 ("Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs") into android-mainline
Steps on the way to 5.10-rc4

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I9e0fa89c0f6f306fe802ae95c8d01d9ba558e111
2020-11-12 08:02:21 +01:00
Martin KaFai Lau
09a3dac7b5 bpf: Fix NULL dereference in bpf_task_storage
In bpf_pid_task_storage_update_elem(), it missed to
test the !task_storage_ptr(task) which then could trigger a NULL
pointer exception in bpf_local_storage_update().

Fixes: 4cf1bc1f10 ("bpf: Implement task local storage")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Roman Gushchin <guro@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20201112001919.2028357-1-kafai@fb.com
2020-11-11 18:14:49 -08:00
Linus Torvalds
3d5e28bff7 Merge branch 'stable/for-linus-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
 "Two tiny fixes for issues that make drivers under Xen unhappy under
  certain conditions"

* 'stable/for-linus-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single
  swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
2020-11-11 14:15:06 -08:00
Nikolay Borisov
584da07686 printk: ringbuffer: Reference text_data_ring directly in callees.
A bunch of functions in the new ringbuffer code take both a
printk_ringbuffer struct and a separate prb_data_ring. This is a relic
from an earlier version of the code when a second data ring was present.
Since this is no longer the case remove the extra function argument
from:
 - data_make_reusable()
 - data_push_tail()
 - data_alloc()
 - data_realloc()

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2020-11-11 16:55:39 +01:00
Kaixu Xia
f16e631333 bpf: Fix unsigned 'datasec_id' compared with zero in check_pseudo_btf_id
The unsigned variable datasec_id is assigned a return value from the call
to check_pseudo_btf_id(), which may return negative error code.

This fixes the following coccicheck warning:

  ./kernel/bpf/verifier.c:9616:5-15: WARNING: Unsigned expression compared with zero: datasec_id > 0

Fixes: eaa6bcb71e ("bpf: Introduce bpf_per_cpu_ptr()")
Reported-by: Tosk Robot <tencent_os_robot@tencent.com>
Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/1605071026-25906-1-git-send-email-kaixuxia@tencent.com
2020-11-11 10:50:22 +01:00
Andrii Nakryiko
7112d12798 bpf: Compile out btf_parse_module() if module BTF is not enabled
Make sure btf_parse_module() is compiled out if module BTFs are not enabled.

Fixes: 36e68442d1 ("bpf: Load and verify kernel module BTFs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201111040645.903494-1-andrii@kernel.org
2020-11-10 20:15:07 -08:00
Qiujun Huang
2b5894cc33 tracing: Fix some typos in comments
s/detetector/detector/
s/enfoced/enforced/
s/writen/written/
s/actualy/actually/
s/bascially/basically/
s/Regarldess/Regardless/
s/zeroes/zeros/
s/followd/followed/
s/incrememented/incremented/
s/separatelly/separately/
s/accesible/accessible/
s/sythetic/synthetic/
s/enabed/enabled/
s/heurisitc/heuristic/
s/assocated/associated/
s/otherwides/otherwise/
s/specfied/specified/
s/seaching/searching/
s/hierachry/hierarchy/
s/internel/internal/
s/Thise/This/

Link: https://lkml.kernel.org/r/20201029150554.3354-1-hqjagain@gmail.com

Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-10 20:39:40 -05:00
Alex Shi
045e269c1e ftrace: Remove unused varible 'ret'
'ret' in 2 functions are not used. and one of them is a void function.
So remove them to avoid gcc warning:
kernel/trace/ftrace.c:4166:6: warning: variable ‘ret’ set but not used
[-Wunused-but-set-variable]
kernel/trace/ftrace.c:5571:6: warning: variable ‘ret’ set but not used
[-Wunused-but-set-variable]

Link: https://lkml.kernel.org/r/1604674486-52350-1-git-send-email-alex.shi@linux.alibaba.com

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-10 20:39:40 -05:00
Steven Rostedt (VMware)
28575c61ea ring-buffer: Add recording of ring buffer recursion into recursed_functions
Add a new config RING_BUFFER_RECORD_RECURSION that will place functions that
recurse from the ring buffer into the ftrace recused_functions file.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-10 20:39:40 -05:00
Steven Rostedt (VMware)
60602cb549 fgraph: Make overruns 4 bytes in graph stack structure
Inspecting the data structures of the function graph tracer, I found that
the overrun value is unsigned long, which is 8 bytes on a 64 bit machine,
and not only that, the depth is an int (4 bytes). The overrun can be simply
an unsigned int (4 bytes) and pack the ftrace_graph_ret structure better.

The depth is moved up next to the func, as it is used more often with func,
and improves cache locality.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-10 20:39:36 -05:00
Paul E. McKenney
c583bcb8f5 rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
The try_invoke_on_locked_down_task() function requires that
interrupts be enabled, but it is called with interrupts disabled from
rcu_print_task_stall(), resulting in an "IRQs not enabled as expected"
diagnostic.  This commit therefore updates rcu_print_task_stall()
to accumulate a list of the first few tasks while holding the current
leaf rcu_node structure's ->lock, then releases that lock and only then
uses try_invoke_on_locked_down_task() to attempt to obtain per-task
detailed information.  Of course, as soon as ->lock is released, the
task might exit, so the get_task_struct() function is used to prevent
the task structure from going away in the meantime.

Link: https://lore.kernel.org/lkml/000000000000903d5805ab908fc4@google.com/
Fixes: 5bef8da66a ("rcu: Add per-task state to RCU CPU stall warnings")
Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com
Reported-by: syzbot+f04854e1c5c9e913cc27@syzkaller.appspotmail.com
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-11-10 17:10:38 -08:00
Andrii Nakryiko
36e68442d1 bpf: Load and verify kernel module BTFs
Add kernel module listener that will load/validate and unload module BTF.
Module BTFs gets ID generated for them, which makes it possible to iterate
them with existing BTF iteration API. They are given their respective module's
names, which will get reported through GET_OBJ_INFO API. They are also marked
as in-kernel BTFs for tooling to distinguish them from user-provided BTFs.

Also, similarly to vmlinux BTF, kernel module BTFs are exposed through
sysfs as /sys/kernel/btf/<module-name>. This is convenient for user-space
tools to inspect module BTF contents and dump their types with existing tools:

[vmuser@archvm bpf]$ ls -la /sys/kernel/btf
total 0
drwxr-xr-x  2 root root       0 Nov  4 19:46 .
drwxr-xr-x 13 root root       0 Nov  4 19:46 ..

...

-r--r--r--  1 root root     888 Nov  4 19:46 irqbypass
-r--r--r--  1 root root  100225 Nov  4 19:46 kvm
-r--r--r--  1 root root   35401 Nov  4 19:46 kvm_intel
-r--r--r--  1 root root     120 Nov  4 19:46 pcspkr
-r--r--r--  1 root root     399 Nov  4 19:46 serio_raw
-r--r--r--  1 root root 4094095 Nov  4 19:46 vmlinux

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/bpf/20201110011932.3201430-5-andrii@kernel.org
2020-11-10 15:25:53 -08:00
Andrii Nakryiko
5329722057 bpf: Assign ID to vmlinux BTF and return extra info for BTF in GET_OBJ_INFO
Allocate ID for vmlinux BTF. This makes it visible when iterating over all BTF
objects in the system. To allow distinguishing vmlinux BTF (and later kernel
module BTF) from user-provided BTFs, expose extra kernel_btf flag, as well as
BTF name ("vmlinux" for vmlinux BTF, will equal to module's name for module
BTF).  We might want to later allow specifying BTF name for user-provided BTFs
as well, if that makes sense. But currently this is reserved only for
in-kernel BTFs.

Having in-kernel BTFs exposed IDs will allow to extend BPF APIs that require
in-kernel BTF type with ability to specify BTF types from kernel modules, not
just vmlinux BTF. This will be implemented in a follow up patch set for
fentry/fexit/fmod_ret/lsm/etc.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201110011932.3201430-3-andrii@kernel.org
2020-11-10 15:25:53 -08:00
Andrii Nakryiko
951bb64621 bpf: Add in-kernel split BTF support
Adjust in-kernel BTF implementation to support a split BTF mode of operation.
Changes are mostly mirroring libbpf split BTF changes, with the exception of
start_id being 0 for in-kernel implementation due to simpler read-only mode.

Otherwise, for split BTF logic, most of the logic of jumping to base BTF,
where necessary, is encapsulated in few helper functions. Type numbering and
string offset in a split BTF are logically continuing where base BTF ends, so
most of the high-level logic is kept without changes.

Type verification and size resolution is only doing an added resolution of new
split BTF types and relies on already cached size and type resolution results
in the base BTF.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20201110011932.3201430-2-andrii@kernel.org
2020-11-10 15:25:53 -08:00
Amir Vajid
9bdaa3fa87 ANDROID: vendor_hooks: Add hook for jiffies updates
Create a vendor hook for jiffies updates by the
tick_do_timer_cpu.

Bug: 148928265
Change-Id: Ia442e20d446b8ce4f2b3f2be76655e72919c76eb
Signed-off-by: Amir Vajid <avajid@codeaurora.org>
2020-11-10 21:50:56 +00:00
Prasad Sodagudi
7aaa29b82b ANDROID: printk: add vendor hook for console flush
Add vendor hook for skipping console flush in cpu hotplug.

Bug: 165340180
Change-Id: I167e1595bbb50e57371bfabfde638624761d5f8a
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2020-11-10 20:07:49 +00:00
Prasad Sodagudi
074a2b36c1 ANDROID: Reduce log level for couple of prints in hotplug flow
During the cpu hot plug stress testing, couple of messages
continuous flooding on to the console is causing timers
migration delay.

Bug: 165340180
Change-Id: I18f96613242a7a821ff707bcdaac794ccefd0bba
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2020-11-10 20:07:32 +00:00
Lukasz Luba
f2c90b12e7 PM: EM: update the comments related to power scale
The Energy Model supports power values expressed in milli-Watts or in an
'abstract scale'. Update the related comments is the code to reflect that
state.

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10 20:29:28 +01:00
Lukasz Luba
c250d50fe2 PM: EM: Add a flag indicating units of power values in Energy Model
There are different platforms and devices which might use different scale
for the power values. Kernel sub-systems might need to check if all
Energy Model (EM) devices are using the same scale. Address that issue and
store the information inside EM for each device. Thanks to that they can
be easily compared and proper action triggered.

Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10 20:22:00 +01:00
Lingutla Chandrasekhar
9889f08de4 ANDROID: trace: Add trace points for tasklet entry/exit
Tasklets are supposed to finish their work quickly and
should not block the current running process, but it is not
guaranteed that. Currently softirq_entry/exit can be used to
know total tasklets execution time, but not helpful to track
individual tasklet's execution time. With that we can't find
any culprit tasklet function, which is taking more time.

Add {hi}-tasklet_entry/exit trace point support to track
individual tasklet execution.

Bug: 168521633
Change-Id: I3496d15f64d020916774e673ccb4a8116ea2f2c9
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
[elavila: Port to mainline]
Signed-off-by: J. Avila <elavila@google.com>
2020-11-10 19:07:12 +00:00
Lingutla Chandrasekhar
1bcefd3883 ANDROID: Revert "softirq: Let ksoftirqd do its job"
Ksfotirqd is a normal priority CFS task. It can experience higher
scheduling latency under heavy load conditions. Currently once
asynchronous softirq processing is deferred to ksoftirqd, softirqs
are not processed further until ksoftirqd task gets a chance to run.
High latencies for softirqs like TIMER, HI TASKLET is not acceptable.

So revert 'commit 4cd13c21b2 ("softirq: Let ksoftirqd do its job")'.

Bug: 168521633
Change-Id: I38a1a88b5f42dd534c65d739dbb7e4321a7904db
Signed-off-by: Lingutla Chandrasekhar <clingutla@codeaurora.org>
[satyap@codeaurora.org: Fix trivial merge conflicts]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[elavila: Port to mainline]
Signed-off-by: J. Avila <elavila@google.com>
2020-11-10 19:07:12 +00:00
Satya Durga Srinivasu Prabhala
ea05d25c78 ANDROID: Revert "Mark HI and TASKLET softirq synchronous"
This reverts commit 3c53776e29 because it
makes HI_SOFTIRQ and TASKLET_SOFTIRQ run immediately i.e. not get
deferred to ksfotirqd. The commit text calls out that this is a stopgap
until a better solution is deviced and that it should have included
TIMER_SOFTIRQ as well, but chose not to.

Patch ebc7b75d772647813cdaf085c7f5adf3c90b033b from Qualcomm is a more
comprehensive solution which defers long running tasklets to ksfotirqd
when a rt task is interrupted.

We cannot use both these together as they conflict; e.g. the original
makes TASKLET softirq run immediately while the Qualcomm one defers it
if rt gets interrupted. Choose to use the Qualcomm one.

Bug: 168521633
Change-Id: I4af64cd7e2c4291dda5f503bf2d74ede459a76c6
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[elavila: port to mainline, resolve conflicts, add commit text]
Signed-off-by: J. Avila <elavila@google.com>
2020-11-10 19:07:12 +00:00
Pavankumar Kondeti
0578248bed ANDROID: softirq: defer softirq processing to ksoftirqd if CPU is busy with RT
Defer the softirq processing to ksoftirqd if a RT task is running
or queued on the current CPU. This complements the RT task placement
algorithm which tries to find a CPU that is not currently busy with
softirqs.

Currently NET_TX, NET_RX, BLOCK and TASKLET softirqs are only deferred
as they can potentially run for long time.

Bug: 168521633
Change-Id: Id7665244af6bbd5a96d9e591cf26154e9eaa860c
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[satyap@codeaurora.org: trivial merge conflict resolution.]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
[elavila: Port to mainline, squash with bugfix]
Signed-off-by: J. Avila <elavila@google.com>
2020-11-10 19:07:11 +00:00
John Dias
8d19443b0b ANDROID: sched: avoid migrating when softint on tgt cpu should be short
The scheduling change to avoid putting RT threads on cores that
are handling softint's was catching cases where there was no reason
to believe the softint would take a long time, resulting in unnecessary
migration overhead. This patch reduces the migration to cases where
the core has a softint that is actually likely to take a long time,
as opposed to the RCU, SCHED, and TIMER softints that are rather quick.

Bug: 31752786
Bug: 168521633
Change-Id: Ib4e179f1e15c736b2fdba31070494e357e9fbbe2
Signed-off-by: John Dias <joaodias@google.com>
[elavila: Amend commit text for AOSP, port to mainline]
Signed-off-by: J. Avila <elavila@google.com>
2020-11-10 19:07:11 +00:00
John Dias
3adfd8e344 ANDROID: sched: avoid placing RT threads on cores handling softirqs
In certain audio use cases, scheduling RT threads on cores that are
handling softirqs can lead to glitches. Prevent this behavior.

Bug: 31501544
Bug: 168521633
Change-Id: I99dd7aaa12c11270b28dbabea484bcc8fb8ba0c1
Signed-off-by: John Dias <joaodias@google.com>
[elavila: Port to mainline, amend commit text]
Signed-off-by: J. Avila <elavila@google.com>
2020-11-10 19:07:11 +00:00
Linus Torvalds
eccc876724 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull core dump fix from Al Viro:
 "Fix for multithreaded coredump playing fast and loose with getting
  registers of secondary threads; if a secondary gets caught in the
  middle of exit(2), the conditition it will be stopped in for dumper to
  examine might be unusual enough for things to go wrong.

  Quite a few architectures are fine with that, but some are not."

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  don't dump the threads that had been already exiting when zapped.
2020-11-10 10:33:55 -08:00
Kai-Heng Feng
d60cd06331 PM: ACPI: reboot: Use S5 for reboot
After reboot, it's not possible to use hotkeys to enter BIOS setup
and boot menu on some HP laptops.

BIOS folks identified the root cause is the missing _PTS call, and
BIOS is expecting _PTS to do proper reset.

Using S5 for reboot is default behavior under Windows, "A full
shutdown (S5) occurs when a system restart is requested" [1], so
let's do the same here.

[1] https://docs.microsoft.com/en-us/windows/win32/power/system-power-states

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10 19:00:44 +01:00
Valentin Schneider
dc824eb898 sched/fair: Dissociate wakeup decisions from SD flag value
The CFS wakeup code will only ever go through EAS / its fast path on
"regular" wakeups (i.e. not on forks or execs). These are currently gated
by a check against 'sd_flag', which would be SD_BALANCE_WAKE at wakeup.

However, we now have a flag that explicitly tells us whether a wakeup is a
"regular" one, so hinge those conditions on that flag instead.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201102184514.2733-4-valentin.schneider@arm.com
2020-11-10 18:39:06 +01:00
Valentin Schneider
3aef1551e9 sched: Remove select_task_rq()'s sd_flag parameter
Only select_task_rq_fair() uses that parameter to do an actual domain
search, other classes only care about what kind of wakeup is happening
(fork, exec, or "regular") and thus just translate the flag into a wakeup
type.

WF_TTWU and WF_EXEC have just been added, use these along with WF_FORK to
encode the wakeup types we care about. For select_task_rq_fair(), we can
simply use the shiny new WF_flag : SD_flag mapping.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201102184514.2733-3-valentin.schneider@arm.com
2020-11-10 18:39:06 +01:00
Valentin Schneider
1777057905 sched: Add WF_TTWU, WF_EXEC wakeup flags
To remove the sd_flag parameter of select_task_rq(), we need another way of
encoding wakeup types. There already is a WF_FORK flag, add the missing two.

With that said, we still need an easy way to turn WF_foo into
SD_bar (e.g. WF_TTWU into SD_BALANCE_WAKE). As suggested by Peter, let's
make our lives easier and make them match exactly, and throw in some
compile-time checks for good measure.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201102184514.2733-2-valentin.schneider@arm.com
2020-11-10 18:39:06 +01:00
Hui Su
cdb310474d sched/fair: Remove superfluous lock section in do_sched_cfs_slack_timer()
Since ab93a4bc95 ("sched/fair: Remove distribute_running fromCFS
bandwidth"), there is nothing to protect between
raw_spin_lock_irqsave/store() in do_sched_cfs_slack_timer().

Signed-off-by: Hui Su <sh_def@163.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Link: https://lkml.kernel.org/r/20201030144621.GA96974@rlk
2020-11-10 18:39:05 +01:00
Peter Zijlstra
12fa97c64d Merge branch 'sched/migrate-disable' 2020-11-10 18:39:04 +01:00
Valentin Schneider
c777d84710 sched: Comment affine_move_task()
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201013140116.26651-2-valentin.schneider@arm.com
2020-11-10 18:39:02 +01:00
Valentin Schneider
885b3ba47a sched: Deny self-issued __set_cpus_allowed_ptr() when migrate_disable()
migrate_disable();
  set_cpus_allowed_ptr(current, {something excluding task_cpu(current)});
  affine_move_task(); <-- never returns

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201013140116.26651-1-valentin.schneider@arm.com
2020-11-10 18:39:02 +01:00
Peter Zijlstra
a7c81556ec sched: Fix migrate_disable() vs rt/dl balancing
In order to minimize the interference of migrate_disable() on lower
priority tasks, which can be deprived of runtime due to being stuck
below a higher priority task. Teach the RT/DL balancers to push away
these higher priority tasks when a lower priority task gets selected
to run on a freshly demoted CPU (pull).

This adds migration interference to the higher priority task, but
restores bandwidth to system that would otherwise be irrevocably lost.
Without this it would be possible to have all tasks on the system
stuck on a single CPU, each task preempted in a migrate_disable()
section with a single high priority task running.

This way we can still approximate running the M highest priority tasks
on the system.

Migrating the top task away is (ofcourse) still subject to
migrate_disable() too, which means the lower task is subject to an
interference equivalent to the worst case migrate_disable() section.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102347.499155098@infradead.org
2020-11-10 18:39:01 +01:00
Peter Zijlstra
ded467dc83 sched, lockdep: Annotate ->pi_lock recursion
There's a valid ->pi_lock recursion issue where the actual PI code
tries to wake up the stop task. Make lockdep aware so it doesn't
complain about this.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102347.406912197@infradead.org
2020-11-10 18:39:01 +01:00
Peter Zijlstra
95158a89dd sched,rt: Use the full cpumask for balancing
We want migrate_disable() tasks to get PULLs in order for them to PUSH
away the higher priority task.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102347.310519774@infradead.org
2020-11-10 18:39:00 +01:00
Peter Zijlstra
14e292f8d4 sched,rt: Use cpumask_any*_distribute()
Replace a bunch of cpumask_any*() instances with
cpumask_any*_distribute(), by injecting this little bit of random in
cpu selection, we reduce the chance two competing balance operations
working off the same lowest_mask pick the same CPU.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102347.190759694@infradead.org
2020-11-10 18:39:00 +01:00
Thomas Gleixner
3015ef4b98 sched/core: Make migrate disable and CPU hotplug cooperative
On CPU unplug tasks which are in a migrate disabled region cannot be pushed
to a different CPU until they returned to migrateable state.

Account the number of tasks on a runqueue which are in a migrate disabled
section and make the hotplug wait mechanism respect that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102347.067278757@infradead.org
2020-11-10 18:39:00 +01:00
Peter Zijlstra
6d337eab04 sched: Fix migrate_disable() vs set_cpus_allowed_ptr()
Concurrent migrate_disable() and set_cpus_allowed_ptr() has
interesting features. We rely on set_cpus_allowed_ptr() to not return
until the task runs inside the provided mask. This expectation is
exported to userspace.

This means that any set_cpus_allowed_ptr() caller must wait until
migrate_enable() allows migrations.

At the same time, we don't want migrate_enable() to schedule, due to
patterns like:

	preempt_disable();
	migrate_disable();
	...
	migrate_enable();
	preempt_enable();

And:

	raw_spin_lock(&B);
	spin_unlock(&A);

this means that when migrate_enable() must restore the affinity
mask, it cannot wait for completion thereof. Luck will have it that
that is exactly the case where there is a pending
set_cpus_allowed_ptr(), so let that provide storage for the async stop
machine.

Much thanks to Valentin who used TLA+ most effective and found lots of
'interesting' cases.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.921768277@infradead.org
2020-11-10 18:39:00 +01:00
Peter Zijlstra
af449901b8 sched: Add migrate_disable()
Add the base migrate_disable() support (under protest).

While migrate_disable() is (currently) required for PREEMPT_RT, it is
also one of the biggest flaws in the system.

Notably this is just the base implementation, it is broken vs
sched_setaffinity() and hotplug, both solved in additional patches for
ease of review.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.818170844@infradead.org
2020-11-10 18:38:59 +01:00
Peter Zijlstra
9cfc3e18ad sched: Massage set_cpus_allowed()
Thread a u32 flags word through the *set_cpus_allowed*() callchain.
This will allow adding behavioural tweaks for future users.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.729082820@infradead.org
2020-11-10 18:38:59 +01:00
Peter Zijlstra
120455c514 sched: Fix hotplug vs CPU bandwidth control
Since we now migrate tasks away before DYING, we should also move
bandwidth unthrottle, otherwise we can gain tasks from unthrottle
after we expect all tasks to be gone already.

Also; it looks like the RT balancers don't respect cpu_active() and
instead rely on rq->online in part, complete this. This too requires
we do set_rq_offline() earlier to match the cpu_active() semantics.
(The bigger patch is to convert RT to cpu_active() entirely)

Since set_rq_online() is called from sched_cpu_activate(), place
set_rq_offline() in sched_cpu_deactivate().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.639538965@infradead.org
2020-11-10 18:38:59 +01:00
Thomas Gleixner
1cf12e08bc sched/hotplug: Consolidate task migration on CPU unplug
With the new mechanism which kicks tasks off the outgoing CPU at the end of
schedule() the situation on an outgoing CPU right before the stopper thread
brings it down completely is:

 - All user tasks and all unbound kernel threads have either been migrated
   away or are not running and the next wakeup will move them to a online CPU.

 - All per CPU kernel threads, except cpu hotplug thread and the stopper
   thread have either been unbound or parked by the responsible CPU hotplug
   callback.

That means that at the last step before the stopper thread is invoked the
cpu hotplug thread is the last legitimate running task on the outgoing
CPU.

Add a final wait step right before the stopper thread is kicked which
ensures that any still running tasks on the way to park or on the way to
kick themself of the CPU are either sleeping or gone.

This allows to remove the migrate_tasks() crutch in sched_cpu_dying(). If
sched_cpu_dying() detects that there is still another running task aside of
the stopper thread then it will explode with the appropriate fireworks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.547163969@infradead.org
2020-11-10 18:38:58 +01:00
Peter Zijlstra
06249738a4 workqueue: Manually break affinity on hotplug
Don't rely on the scheduler to force break affinity for us -- it will
stop doing that for per-cpu-kthreads.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.464718669@infradead.org
2020-11-10 18:38:58 +01:00
Thomas Gleixner
f2469a1fb4 sched/core: Wait for tasks being pushed away on hotplug
RT kernels need to ensure that all tasks which are not per CPU kthreads
have left the outgoing CPU to guarantee that no tasks are force migrated
within a migrate disabled section.

There is also some desire to (ab)use fine grained CPU hotplug control to
clear a CPU from active state to force migrate tasks which are not per CPU
kthreads away for power control purposes.

Add a mechanism which waits until all tasks which should leave the CPU
after the CPU active flag is cleared have moved to a different online CPU.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.377836842@infradead.org
2020-11-10 18:38:58 +01:00
Peter Zijlstra
2558aacff8 sched/hotplug: Ensure only per-cpu kthreads run during hotplug
In preparation for migrate_disable(), make sure only per-cpu kthreads
are allowed to run on !active CPUs.

This is ran (as one of the very first steps) from the cpu-hotplug
task which is a per-cpu kthread and completion of the hotplug
operation only requires such tasks.

This constraint enables the migrate_disable() implementation to wait
for completion of all migrate_disable regions on this CPU at hotplug
time without fear of any new ones starting.

This replaces the unlikely(rq->balance_callbacks) test at the tail of
context_switch with an unlikely(rq->balance_work), the fast path is
not affected.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.292709163@infradead.org
2020-11-10 18:38:57 +01:00
Peter Zijlstra
565790d28b sched: Fix balance_callback()
The intent of balance_callback() has always been to delay executing
balancing operations until the end of the current rq->lock section.
This is because balance operations must often drop rq->lock, and that
isn't safe in general.

However, as noted by Scott, there were a few holes in that scheme;
balance_callback() was called after rq->lock was dropped, which means
another CPU can interleave and touch the callback list.

Rework code to call the balance callbacks before dropping rq->lock
where possible, and otherwise splice the balance list onto a local
stack.

This guarantees that the balance list must be empty when we take
rq->lock. IOW, we'll only ever run our own balance callbacks.

Reported-by: Scott Wood <swood@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.203901269@infradead.org
2020-11-10 18:38:57 +01:00
Peter Zijlstra
a8b62fd085 stop_machine: Add function and caller debug info
Crashes in stop-machine are hard to connect to the calling code, add a
little something to help with that.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.116513635@infradead.org
2020-11-10 18:38:57 +01:00