Commit Graph

38873 Commits

Author SHA1 Message Date
Waiman Long
ec8cf94166 BACKPORT: FROMGIT: sched: Enforce user requested affinity
It was found that the user requested affinity via sched_setaffinity()
can be easily overwritten by other kernel subsystems without an easy way
to reset it back to what the user requested. For example, any change
to the current cpuset hierarchy may reset the cpumask of the tasks in
the affected cpusets to the default cpuset value even if those tasks
have pre-existing user requested affinity. That is especially easy to
trigger under a cgroup v2 environment where writing "+cpuset" to the
root cgroup's cgroup.subtree_control file will reset the cpus affinity
of all the processes in the system.

That is problematic in a nohz_full environment where the tasks running
in the nohz_full CPUs usually have their cpus affinity explicitly set
and will behave incorrectly if cpus affinity changes.

Fix this problem by looking at user_cpus_ptr in __set_cpus_allowed_ptr()
and use it to restrcit the given cpumask unless there is no overlap. In
that case, it will fallback to the given one. The SCA_USER flag is
reused to indicate intent to set user_cpus_ptr and so user_cpus_ptr
masking should be skipped. In addition, masking should also be skipped
if any of the SCA_MIGRATE_* flag is set.

All callers of set_cpus_allowed_ptr() will be affected by this change.
A scratch cpumask is added to percpu runqueues structure for doing
additional masking when user_cpus_ptr is set.

Change-Id: I2dc567931fbcf33dbd544fbfc7bfb0c35a0feea8
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220922180041.1768141-4-longman@redhat.com
BUG: 254447891
(cherry picked from commit 02a73f8239de47caa07c8a5f93cb3700ee30f3dc
 https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core)
[ashayj: Resolved minor conflict in kernel/sched/core.c]
Signed-off-by: Ashay Jaiswal <quic_ashayj@quicinc.com>
2022-10-24 08:40:21 +00:00
Waiman Long
50a3a47c14 BACKPORT: FROMGIT: sched: Always preserve the user requested cpumask
Unconditionally preserve the user requested cpumask on
sched_setaffinity() calls. This allows using it outside of the fairly
narrow restrict_cpus_allowed_ptr() use-case and fix some cpuset issues
that currently suffer destruction of cpumasks.

Change-Id: I24bb8639d91785e308188837b202a12fa3142d43
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220922180041.1768141-3-longman@redhat.com
BUG: 254447891
(cherry picked from commit c2a8c7b835ad6c72a1ff29d211f3aed454b29183
 https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core)
[ashayj: Resolved minor conflict in kernel/sched/core.c]
Signed-off-by: Ashay Jaiswal <quic_ashayj@quicinc.com>
2022-10-24 08:40:21 +00:00
Waiman Long
54aeb5c372 BACKPORT: FROMGIT: sched: Introduce affinity_context
In order to prepare for passing through additional data through the
affinity call-chains, convert the mask and flags argument into a
structure.

Change-Id: Ib6332dfb83895be127cf304800e7a65c4bc78c99
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220922180041.1768141-5-longman@redhat.com
BUG: 254447891
(cherry picked from commit 7ecbcde6e527c3828adee88b1e7edeb01e7895fb
 https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core)
[ashayj: Resolved minor conflict in kernel/sched/core.c]
Signed-off-by: Ashay Jaiswal <quic_ashayj@quicinc.com>
2022-10-24 08:40:20 +00:00
Waiman Long
5001781910 FROMGIT: sched: Add __releases annotations to affine_move_task()
affine_move_task() assumes task_rq_lock() has been called and it does
an implicit task_rq_unlock() before returning. Add the appropriate
__releases annotations to make this clear.

A typo error in comment is also fixed.

Change-Id: Ib02de3bb81d21489dc6e16a89a8cc0b0c64ccce8
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220922180041.1768141-2-longman@redhat.com
BUG: 254447891
(cherry picked from commit 9722bb2bbcb7ff70c9f7fdbe71a39b5f1f6aa428
 https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core)
Signed-off-by: Ashay Jaiswal <quic_ashayj@quicinc.com>
2022-10-24 08:40:20 +00:00
Kuniyuki Iwashima
545bb8eb6b FROMGIT: seccomp: Move copy_seccomp() to no failure path.
Our syzbot instance reported memory leaks in do_seccomp() [0], similar
to the report [1].  It shows that we miss freeing struct seccomp_filter
and some objects included in it.

We can reproduce the issue with the program below [2] which calls one
seccomp() and two clone() syscalls.

The first clone()d child exits earlier than its parent and sends a
signal to kill it during the second clone(), more precisely before the
fatal_signal_pending() test in copy_process().  When the parent receives
the signal, it has to destroy the embryonic process and return -EINTR to
user space.  In the failure path, we have to call seccomp_filter_release()
to decrement the filter's refcount.

Initially, we called it in free_task() called from the failure path, but
the commit 3a15fb6ed9 ("seccomp: release filter after task is fully
dead") moved it to release_task() to notify user space as early as possible
that the filter is no longer used.

To keep the change and current seccomp refcount semantics, let's move
copy_seccomp() just after the signal check and add a WARN_ON_ONCE() in
free_task() for future debugging.

[0]:
unreferenced object 0xffff8880063add00 (size 256):
  comm "repro_seccomp", pid 230, jiffies 4294687090 (age 9.914s)
  hex dump (first 32 bytes):
    01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
    ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
  backtrace:
    do_seccomp (./include/linux/slab.h:600 ./include/linux/slab.h:733 kernel/seccomp.c:666 kernel/seccomp.c:708 kernel/seccomp.c:1871 kernel/seccomp.c:1991)
    do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
unreferenced object 0xffffc90000035000 (size 4096):
  comm "repro_seccomp", pid 230, jiffies 4294687090 (age 9.915s)
  hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 00 00 00 00 05 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    __vmalloc_node_range (mm/vmalloc.c:3226)
    __vmalloc_node (mm/vmalloc.c:3261 (discriminator 4))
    bpf_prog_alloc_no_stats (kernel/bpf/core.c:91)
    bpf_prog_alloc (kernel/bpf/core.c:129)
    bpf_prog_create_from_user (net/core/filter.c:1414)
    do_seccomp (kernel/seccomp.c:671 kernel/seccomp.c:708 kernel/seccomp.c:1871 kernel/seccomp.c:1991)
    do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
unreferenced object 0xffff888003fa1000 (size 1024):
  comm "repro_seccomp", pid 230, jiffies 4294687090 (age 9.915s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    bpf_prog_alloc_no_stats (./include/linux/slab.h:600 ./include/linux/slab.h:733 kernel/bpf/core.c:95)
    bpf_prog_alloc (kernel/bpf/core.c:129)
    bpf_prog_create_from_user (net/core/filter.c:1414)
    do_seccomp (kernel/seccomp.c:671 kernel/seccomp.c:708 kernel/seccomp.c:1871 kernel/seccomp.c:1991)
    do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
unreferenced object 0xffff888006360240 (size 16):
  comm "repro_seccomp", pid 230, jiffies 4294687090 (age 9.915s)
  hex dump (first 16 bytes):
    01 00 37 00 76 65 72 6c e0 83 01 06 80 88 ff ff  ..7.verl........
  backtrace:
    bpf_prog_store_orig_filter (net/core/filter.c:1137)
    bpf_prog_create_from_user (net/core/filter.c:1428)
    do_seccomp (kernel/seccomp.c:671 kernel/seccomp.c:708 kernel/seccomp.c:1871 kernel/seccomp.c:1991)
    do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
unreferenced object 0xffff8880060183e0 (size 8):
  comm "repro_seccomp", pid 230, jiffies 4294687090 (age 9.915s)
  hex dump (first 8 bytes):
    06 00 00 00 00 00 ff 7f                          ........
  backtrace:
    kmemdup (mm/util.c:129)
    bpf_prog_store_orig_filter (net/core/filter.c:1144)
    bpf_prog_create_from_user (net/core/filter.c:1428)
    do_seccomp (kernel/seccomp.c:671 kernel/seccomp.c:708 kernel/seccomp.c:1871 kernel/seccomp.c:1991)
    do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

[1]: https://syzkaller.appspot.com/bug?id=2809bb0ac77ad9aa3f4afe42d6a610aba594a987

[2]:

void main(void)
{
	struct sock_filter filter[] = {
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog fprog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};
	long i, pid;

	syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER, 0, &fprog);

	for (i = 0; i < 2; i++) {
		pid = syscall(__NR_clone, CLONE_NEWNET | SIGKILL, NULL, NULL, 0);
		if (pid == 0)
			return;
	}
}

Fixes: 3a15fb6ed9 ("seccomp: release filter after task is fully dead")
Reported-by: syzbot+ab17848fe269b573eb71@syzkaller.appspotmail.com
Reported-by: Ayushman Dutta <ayudutta@amazon.com>
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Bug: 253979890
Change-Id: I2b1b4fbdc21e5fb6ad303326c8a0a3f5fe7260e3
(cherry picked from commit 6d17452707cae0b8b271172a3a6cc25b1e14a868
         https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-linus/seccomp)
Reported-by: Jing Xia <jing.xia@unisoc.com>
Signed-off-by: Cixi Geng <cixi.geng1@unisoc.com>
2022-10-20 15:26:56 +00:00
Vamsi Krishna Lanka
41300cf104 BACKPORT: FROMLIST: tracing: Add register read/write tracing support
Generic MMIO read/write i.e., __raw_{read,write}{b,l,w,q} accessors
are typically used to read/write from/to memory mapped registers
and can cause hangs or some undefined behaviour in following few
cases,

* If the access to the register space is unclocked, for example: if
there is an access to multimedia(MM) block registers without MM
clocks.

* If the register space is protected and not set to be accessible from
non-secure world, for example: only EL3 (EL: Exception level) access
is allowed and any EL2/EL1 access is forbidden.

* If xPU(memory/register protection units) is controlling access to
certain memory/register space for specific clients.

and more...

Such cases usually results in instant reboot/SErrors/NOC or interconnect
hangs and tracing these register accesses can be very helpful to debug
such issues during initial development stages and also in later stages.

So use ftrace trace events to log such MMIO register accesses which
provides rich feature set such as early enablement of trace events,
filtering capability, dumping ftrace logs on console and many more.

Sample output:

rwmmio_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
rwmmio_post_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
rwmmio_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 addr=0xfffffbfffdbff610
rwmmio_post_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 val=0x0 addr=0xfffffbfffdbff610

Bug: 169045115
Change-Id: I54326beb3aeabfc9a57c2de76a916754bdb6fe76
Link: https://lore.kernel.org/lkml/f6d1b9e9d70968b506bdfd1b77129cb751b9df9d.1652891705.git.quic_saipraka@quicinc.com/
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Co-developed-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
Signed-off-by: Vamsi Krishna Lanka <quic_vamslank@quicinc.com>
2022-10-20 07:37:17 +00:00
keystone-kernel-automerger
842ebad6f6 Merge remote-tracking branch into HEAD
* keystone/mirror-android13-5.15-2022-09:
  ANDROID: abi_gki_aarch64_qcom: Add hibernation APIs
  ANDROID: vendor hooks: Add hooks to support bootloader based hibernation
  ANDROID: GKI: Add symbol list file for sunxi

Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com>
Change-Id: I90eb14b0ae0636a9fbd5164b1ca17f248fe53e82
2022-10-20 06:22:22 +00:00
Shreyas K K
c5dcb9e1e5 ANDROID: vendor hooks: Add hooks to support bootloader based hibernation
Add vendor hooks to disable randomization of swap slot allocation for
swap partition used for saving hibernation image. Another level of
randomization of swap slots takes place at the firmware level as well
in order to address the wear leveling for UFS/MMC devices, so this
vendor hook checks if a block device represents the swap partition being
used for saving hibernation image, if yes, the swap slot allocation for
such partition is serialized at kernel level.

There is a performance advantage of reading contiguous pages of hibernation
image, it makes the restore logic of hibernation image simpler and faster
as there are no seeks involved in the secondary storage to read multiple
contiguous pages of the image.

Bug: 228815136
Change-Id: Id41a25fd15c5b71f8064f1c7d9cac9560fcdde85
Signed-off-by: Vivek Kumar <quic_vivekuma@quicinc.com>
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
2022-10-19 23:38:42 +00:00
Tetsuo Handa
95af22de88 UPSTREAM: cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()
syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at
cpuset_attach() [1], for commit 4f7e7236435ca0ab ("cgroup: Fix
threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that
cpuset_attach() is also called from cgroup_attach_task_all().
Add cpus_read_lock() like what cgroup_procs_write_start() does.

Bug:254143784

Link: https://syzkaller.appspot.com/bug?extid=29d3a3b4d86c8136ad9e [1]
Reported-by: syzbot <syzbot+29d3a3b4d86c8136ad9e@syzkaller.appspotmail.com>
Change-Id: I21d7ca425c91efe5773ad5e9aec2bbf1c09737f5
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock")
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 43626dade36fa74d3329046f4ae2d7fdefe401c6
git: //git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
2022-10-18 11:09:08 +08:00
wangting11
546f62e71b FROMLIST: locking/rwsem: Limit # of null owner retries for handoff writer
Commit 91d2a812df ("locking/rwsem: Make handoff writer optimistically
spin on owner") assumes that when the owner field is changed to NULL,
the lock will become free soon.  That assumption may not be correct
especially if the handoff writer doing the spinning is a RT task which
may preempt another task from completing its action of either freeing
the rwsem or properly setting up owner.

To prevent this live lock scenario, we have to limit the number of
trylock attempts without sleeping. The current limit is now set to 8
to allow enough time for the other task to hopefully complete its action.

By adding new lock events to track the number of NULL owner retries with
handoff flag set before a successful trylock when running a 96 threads
locking microbenchmark with equal number of readers and writers running
on a 2-core 96-thread system for 15 seconds, the following stats are
obtained. Note that none of locking threads are RT tasks.

  Retries of successful trylock    Count
  -----------------------------    -----
             1                     1738
             2                       19
             3                       11
             4                        2
             5                        1
             6                        1
             7                        1
             8                        0
             X                        1

The last row is the one failed attempt that needs more than 8 retries.
So a retry count maximum of 8 should capture most of them if no RT task
is in the mix.

Bug: 252734649

Fixes: 91d2a812df ("locking/rwsem: Make handoff writer optimistically spin on owner")
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/lkml/20221012133333.1265281-3-longman@redhat.com/
Signed-off-by: wangting11 <wangting11@xiaomi.com>
Change-Id: Iae4764e9a74d78e2ec7019947cb8bd9e64c26d78
2022-10-17 19:20:11 +00:00
wangting11
4ece302f35 FROMLIST: locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath
A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping but fail to acquire the
lock even if the rwsem is free if the following sequence happens:

  Non-first waiter       First waiter      Lock holder
  ----------------       ------------      -----------
  Acquire wait_lock
  rwsem_try_write_lock():
    Set handoff bit if RT or
      wait too long
    Set waiter->handoff_set
  Release wait_lock
                         Acquire wait_lock
                         Inherit waiter->handoff_set
                         Release wait_lock
					   Clear owner
                                           Release lock
  if (waiter.handoff_set) {
    rwsem_spin_on_owner(();
    if (OWNER_NULL)
      goto trylock_again;
  }
  trylock_again:
  Acquire wait_lock
  rwsem_try_write_lock():
     if (first->handoff_set && (waiter != first))
     	return false;
  Release wait_lock

It is especially problematic if the non-first waiter is an RT task and
it is running on the same CPU as the first waiter as this can lead to
live lock.

Bug: 252734649

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/lkml/20221012133333.1265281-2-longman@redhat.com/
Signed-off-by: wangting11 <wangting11@xiaomi.com>
Change-Id: I6ac35872e8aae61cb96cd87a365a2d44d66961eb
2022-10-17 19:20:11 +00:00
Waiman Long
24d27dff64 BACKPORT: locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
With commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
consistent"), the writer that sets the handoff bit can be interrupted
out without clearing the bit if the wait queue isn't empty. This disables
reader and writer optimistic lock spinning and stealing.

Now if a non-first writer in the queue is somehow woken up or a new
waiter enters the slowpath, it can't acquire the lock.  This is not the
case before commit d257cc8cb8d5 as the writer that set the handoff bit
will clear it when exiting out via the out_nolock path. This is less
efficient as the busy rwsem stays in an unlock state for a longer time.

In some cases, this new behavior may cause lockups as shown in [1] and
[2].

This patch allows a non-first writer to ignore the handoff bit if it
is not originally set or initiated by the first waiter. This patch is
shown to be effective in fixing the lockup problem reported in [1].

[1] https://lore.kernel.org/lkml/20220617134325.GC30825@techsingularity.net/
[2] https://lore.kernel.org/lkml/3f02975c-1a9d-be20-32cf-f1d8e3dfafcc@oracle.com/

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20220622200419.778799-1-longman@redhat.com

Bug: 252974225
Change-Id: I12bdf85a2d4f0011146c42256f49f365de3c96fa
(cherry picked from commit 6eebd5fb20838f5971ba17df9f55cc4f84a31053)
Signed-off-by: Cixi Geng <cixi.geng1@unisoc.com>
2022-10-14 19:05:44 +00:00
Daniel Mentz
16a71479cb ANDROID: sched: Export sched_domains_mutex for lockdep
If CONFIG_LOCKDEP is enabled, export sched_domains_mutex as it is
indirectly accessed by the macro for_each_domain, and that macro might
be used in module code.

Bug: 176254015
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Change-Id: Ia9f2989de41b2224c63855f2fd129cbeeac4f195
Signed-off-by: Will McVicker <willmcvicker@google.com>
(cherry picked from commit 7171a5de9832b53122a9787586e18e9523b5b47e)
2022-10-11 19:59:58 +00:00
Shreyas K K
d2cb755a43 ANDROID: vendor hooks: Add hooks to support bootloader based hibernation
Add vendor hooks to disable randomization of swap slot allocation for
swap partition used for saving hibernation image. Another level of
randomization of swap slots takes place at the firmware level as well
in order to address the wear leveling for UFS/MMC devices, so this
vendor hook checks if a block device represents the swap partition being
used for saving hibernation image, if yes, the swap slot allocation for
such partition is serialized at kernel level.

There is a performance advantage of reading contiguous pages of hibernation
image, it makes the restore logic of hibernation image simpler and faster
as there are no seeks involved in the secondary storage to read multiple
contiguous pages of the image.

Bug: 228815136
Change-Id: Id41a25fd15c5b71f8064f1c7d9cac9560fcdde85
Signed-off-by: Vivek Kumar <quic_vivekuma@quicinc.com>
Signed-off-by: Shreyas K K <quic_shrekk@quicinc.com>
2022-10-11 18:35:52 +00:00
Suren Baghdasaryan
2455f6610a ANDROID: fix ABI breakage in struct psi_group
New atomic member addition into struct psi_group breaks ABI. Fix the
breakage by reusing poll_wakeup atomic and using atomic bit operations
to represent both wakeup and scheduled states.

Bug: 251702862
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I6c25d756bd7afafe8590373f16a662d02bf15b50
2022-10-11 17:02:21 +00:00
Suren Baghdasaryan
9ed7219990 FROMLIST: psi: stop relying on timer_pending for poll_work rescheduling
Psi polling mechanism is trying to minimize the number of wakeups to
run psi_poll_work and is currently relying on timer_pending() to detect
when this work is already scheduled. This provides a window of opportunity
for psi_group_change to schedule an immediate psi_poll_work after
poll_timer_fn got called but before psi_poll_work could reschedule itself.
Below is the depiction of this entire window:

poll_timer_fn
  wake_up_interruptible(&group->poll_wait);

psi_poll_worker
  wait_event_interruptible(group->poll_wait, ...)
  psi_poll_work
    psi_schedule_poll_work
      if (timer_pending(&group->poll_timer)) return;
      ...
      mod_timer(&group->poll_timer, jiffies + delay);

Prior to 461daba06b we used to rely on poll_scheduled atomic which was
reset and set back inside psi_poll_work and therefore this race window
was much smaller.
The larger window causes increased number of wakeups and our partners
report visible power regression of ~10mA after applying 461daba06b.
Bring back the poll_scheduled atomic and make this race window even
narrower by resetting poll_scheduled only when we reach polling expiration
time. This does not completely eliminate the possibility of extra wakeups
caused by a race with psi_group_change however it will limit it to the
worst case scenario of one extra wakeup per every tracking window (0.5s
in the worst case).
This patch also ensures correct ordering between clearing poll_scheduled
flag and obtaining changed_states using memory barrier. Correct ordering
between updating changed_states and setting poll_scheduled is ensured by
atomic_xchg operation.
By tracing the number of immediate rescheduling attempts performed by
psi_group_change and the number of these attempts being blocked due to
psi monitor being already active, we can assess the effects of this change:

Before the patch:
                                           Run#1    Run#2      Run#3
Immediate reschedules attempted:           684365   1385156    1261240
Immediate reschedules blocked:             682846   1381654    1258682
Immediate reschedules (delta):             1519     3502       2558
Immediate reschedules (% of attempted):    0.22%    0.25%      0.20%

After the patch:
                                           Run#1    Run#2      Run#3
Immediate reschedules attempted:           882244   770298    426218
Immediate reschedules blocked:             881996   769796    426074
Immediate reschedules (delta):             248      502       144
Immediate reschedules (% of attempted):    0.03%    0.07%     0.03%

The number of non-blocked immediate reschedules dropped from 0.22-0.25%
to 0.03-0.07%. The drop is attributed to the decrease in the race window
size and the fact that we allow this race only when psi monitors reach
polling window expiration time.

Fixes: 461daba06b ("psi: eliminate kthread_worker from psi trigger scheduling mechanism")
Reported-by: Kathleen Chang <yt.chang@mediatek.com>
Reported-by: Wenju Xu <wenju.xu@mediatek.com>
Reported-by: Jonathan Chen <jonathan.jmchen@mediatek.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: SH Chen <show-hong.chen@mediatek.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Link: https://lore.kernel.org/patchwork/patch/1455172/
Bug: 191127654
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ie61547ca043e702442a9c6db1468cfb60ff2e729
2022-10-11 17:02:21 +00:00
deyaoren@google.com
ce2b7f1ba1 Merge remote-tracking branch into HEAD
* keystone/mirror-android13-5.15-2022-09: (145 commits)
  ANDROID: GKI: Update abi_gki_aarch64_qcom
  ANDROID: abi_gki_aarch64_qcom: Add iio symbol list for qcom
  ANDROID: gki_config: enable F2FS_UNFAIR_RWSEM
  ANDROID: GKI: Update abi_gki_aarch64_qcom symbols.
  ANDROID: abi_gki_aarch64_qcom: Add protocol related symbols
  ANDROID: scsi: ufs: Improve MCQ error handling
  BACKPORT: writeback avoid use-after-free after removing device
  ANDROID: GKI: Add symbol list file for sunxi
  UPSTREAM: usb: dwc3: gadget: Avoid duplicate requests to enable Run/Stop
  Revert "FROMLIST: usb: dwc3: gadget: Avoid duplicate requests to enable Run/Stop"
  ANDROID: fix add vendor hooks for unusual abort cases
  UPSTREAM: Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()"
  UPSTREAM: kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22}
  ANDROID: ABI: Update symbols to unisoc whitelist
  ANDROID: ABI: Update symbols to unisoc whitelist for sync from local code
  ANDROID: ABI: Update symbols to unisoc whitelist
  ANDROID: GKI: Update symbol list for sunxi
  ANDROID: GKI: Update symbol list
  ANDROID: ABI: Update symbols for unisoc whitelist Android13-k5.15
  ANDROID: net: export symbol for tracepoint_consume_skb
  ...

Change-Id: I3398f01fd36158fe453000a61851e96440d03588
2022-10-11 03:32:05 +00:00
Greg Kroah-Hartman
43eb03f7ce Merge 5.15.72 into android13-5.15-lts
Changes in 5.15.72
	ALSA: hda: Do disconnect jacks at codec unbind
	ALSA: hda: Fix hang at HD-audio codec unbinding due to refcount saturation
	ALSA: hda: Fix Nvidia dp infoframe
	ALSA: hda/realtek: fix speakers and micmute on HP 855 G8
	cgroup: reduce dependency on cgroup_mutex
	cgroup: cgroup_get_from_id() must check the looked-up kn is a directory
	uas: add no-uas quirk for Hiksemi usb_disk
	usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS
	uas: ignore UAS for Thinkplus chips
	usb: typec: ucsi: Remove incorrect warning
	thunderbolt: Explicitly reset plug events delay back to USB4 spec value
	net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
	Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address
	can: c_can: don't cache TX messages for C_CAN cores
	clk: ingenic-tcu: Properly enable registers before accessing timers
	x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxd
	ARM: dts: integrator: Tag PCI host with device_type
	ntfs: fix BUG_ON in ntfs_lookup_inode_by_name()
	mm/damon/dbgfs: fix memory leak when using debugfs_lookup()
	net: mt7531: only do PLL once after the reset
	Revert "firmware: arm_scmi: Add clock management to the SCMI power domain"
	drm/i915/gt: Restrict forced preemption to the active context
	drm/amdgpu: Add amdgpu suspend-resume code path under SRIOV
	vduse: prevent uninitialized memory accesses
	libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
	mmc: moxart: fix 4-bit bus width and remove 8-bit bus width
	mmc: hsq: Fix data stomping during mmc recovery
	mm/page_alloc: fix race condition between build_all_zonelists and page allocation
	mm: prevent page_frag_alloc() from corrupting the memory
	mm: fix dereferencing possible ERR_PTR
	mm/migrate_device.c: flush TLB while holding PTL
	mm: fix madivse_pageout mishandling on non-LRU page
	mm,hwpoison: check mm when killing accessing process
	media: dvb_vb2: fix possible out of bound access
	media: rkvdec: Disable H.264 error detection
	media: v4l2-compat-ioctl32.c: zero buffer passed to v4l2_compat_get_array_args()
	swiotlb: max mapping size takes min align mask into account
	ARM: dts: am33xx: Fix MMCHS0 dma properties
	reset: imx7: Fix the iMX8MP PCIe PHY PERST support
	ARM: dts: am5748: keep usb4_tm disabled
	soc: sunxi: sram: Actually claim SRAM regions
	soc: sunxi: sram: Prevent the driver from being unbound
	soc: sunxi_sram: Make use of the helper function devm_platform_ioremap_resource()
	soc: sunxi: sram: Fix probe function ordering issues
	soc: sunxi: sram: Fix debugfs info for A64 SRAM C
	ASoC: imx-card: Fix refcount issue with of_node_put
	arm64: dts: qcom: sm8350: fix UFS PHY serdes size
	ASoC: tas2770: Reinit regcache on reset
	drm/bridge: lt8912b: add vsync hsync
	drm/bridge: lt8912b: set hdmi or dvi mode
	drm/bridge: lt8912b: fix corrupted image output
	Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time"
	Input: melfas_mip4 - fix return value check in mip4_probe()
	gpio: mvebu: Fix check for pwm support on non-A8K platforms
	usbnet: Fix memory leak in usbnet_disconnect()
	net: sched: act_ct: fix possible refcount leak in tcf_ct_init()
	cxgb4: fix missing unlock on ETHOFLD desc collect fail path
	net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
	nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices
	wifi: mac80211: fix regression with non-QoS drivers
	net: stmmac: power up/down serdes in stmmac_open/release
	net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume()
	selftests: Fix the if conditions of in test_extra_filter()
	vdpa/ifcvf: fix the calculation of queuepair
	fs: split off setxattr_copy and do_setxattr function from setxattr
	clk: imx: imx6sx: remove the SET_RATE_PARENT flag for QSPI clocks
	clk: iproc: Do not rely on node name for correct PLL setup
	KVM: x86: Hide IA32_PLATFORM_DCA_CAP[31:0] from the guest
	x86/alternative: Fix race in try_get_desc()
	drm/i915/gem: Really move i915_gem_context.link under ref protection
	Linux 5.15.72

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I4f40f1ef2034412bab1d608ec6b17f3da2508de7
2022-10-05 18:58:29 +02:00
Tianyu Lan
bfe5dc2101 swiotlb: max mapping size takes min align mask into account
commit 82806744fd7dde603b64c151eeddaa4ee62193fd upstream.

swiotlb_find_slots() skips slots according to io tlb aligned mask
calculated from min aligned mask and original physical address
offset. This affects max mapping size. The mapping size can't
achieve the IO_TLB_SEGSIZE * IO_TLB_SIZE when original offset is
non-zero. This will cause system boot up failure in Hyper-V
Isolation VM where swiotlb force is enabled. Scsi layer use return
value of dma_max_mapping_size() to set max segment size and it
finally calls swiotlb_max_mapping_size(). Hyper-V storage driver
sets min align mask to 4k - 1. Scsi layer may pass 256k length of
request buffer with 0~4k offset and Hyper-V storage driver can't
get swiotlb bounce buffer via DMA API. Swiotlb_find_slots() can't
find 256k length bounce buffer with offset. Make swiotlb_max_mapping
_size() take min align mask into account.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rishabh Bhatnagar <risbhat@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-05 10:39:40 +02:00
Ming Lei
8484a356ce cgroup: cgroup_get_from_id() must check the looked-up kn is a directory
[ Upstream commit df02452f3df069a59bc9e69c84435bf115cb6e37 ]

cgroup has to be one kernfs dir, otherwise kernel panic is caused,
especially cgroup id is provide from userspace.

Reported-by: Marco Patalano <mpatalan@redhat.com>
Fixes: 6b658c4863 ("scsi: cgroup: Add cgroup_get_from_id()")
Cc: Muneendra <muneendra.kumar@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-05 10:39:36 +02:00
Shakeel Butt
ae04dd5ef1 cgroup: reduce dependency on cgroup_mutex
[ Upstream commit be288169712f3dea0bc6b50c00b3ab53d85f1435 ]

Currently cgroup_get_from_path() and cgroup_get_from_id() grab
cgroup_mutex before traversing the default hierarchy to find the
kernfs_node corresponding to the path/id and then extract the linked
cgroup. Since cgroup_mutex is still held, it is guaranteed that the
cgroup will be alive and the reference can be taken on it.

However similar guarantee can be provided without depending on the
cgroup_mutex and potentially reducing avenues of cgroup_mutex contentions.
The kernfs_node's priv pointer is RCU protected pointer and with just
rcu read lock we can grab the reference on the cgroup without
cgroup_mutex. So, remove cgroup_mutex from them.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Stable-dep-of: df02452f3df0 ("cgroup: cgroup_get_from_id() must check the looked-up kn is a directory")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-05 10:39:35 +02:00
keystone-kernel-automerger
5f81036db9 Merge remote-tracking branch into HEAD
* keystone/mirror-android13-5.15-2022-08:
  ANDROID: GKI: Update oplus symbols to symbol list
  ANDROID: fix add vendor hooks for unusual abort cases
  ANDROID: GKI: Update symbols to symbol list
  ANDROID: mm: Add vendor hook in rmqueue()
  ANDROID: vendor_hooks: tune reclaim inactive ratio
  ANDROID: vendor_hooks: Add hooks for cpufreq_acct_update_power
  ANDROID: vendor_hooks: Add hooks for ipa
  ANDROID: vendor_hook: rename the the name of hooks
  ANDROID: vendor_hook: add hooks to protect locking-tsk in cpu scheduler
  ANDROID: vendor_hook: Add hook to not be stuck ro rmap lock in kswapd or direct_reclaim
  ANDROID: vendor_hooks: Add hooks for oem futex optimization
  ANDROID: export reclaim_pages
  ANDROID: Revert "psi: allow unprivileged users with CAP_SYS_RESOURCE to write psi files"

Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com>
Change-Id: Ie6e71bb0c9c55dfc76c25cf1ed592ce3b246b24f
2022-10-05 06:19:58 +00:00
Peifeng Li
eb23b4257d ANDROID: vendor_hook: rename the the name of hooks
Renamed trace_android_vh_record_percpu_rwsem_lock_starttime to
trace_android_vh_record_pcpu_rwsem_starttime.

Because the orignal name is too long, which results to the
compile-err of .ko that uses the symbol:

ERROR: modpost:
too long symbol "__tracepoint_android_vh_record_percpu_rwsem_lock_starttime"

There is not any users of the the orignal hooks so that it is safe to
rename it.

Bug: 241191475
Bug: 250331401

Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: Ie246a933414db5e9e28a65a4c280fae3a1cbefe3
(cherry picked from commit ac82d34706)
2022-10-03 07:21:02 +00:00
Peifeng Li
05acd75c53 ANDROID: vendor_hook: add hooks to protect locking-tsk in cpu scheduler
Providing vendor hooks to record the start time of holding the lock, which
protects rwsem/mutex locking-process from being preemptedfor a short time
in some cases.

- android_vh_record_mutex_lock_starttime
- android_vh_record_rtmutex_lock_starttime
- android_vh_record_rwsem_lock_starttime
- android_vh_record_percpu_rwsem_lock_starttime

Bug: 241191475
Bug: 250331401

Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: I0e967a1e8b77c32a1ad588acd54028fae2f90c4e
(cherry picked from commit f729494767)
2022-10-03 07:20:38 +00:00
xieliujie
912f87af05 ANDROID: vendor_hooks: Add hooks for oem futex optimization
If an important task is going to sleep through do_futex(),
find out it's futex-owner by the pid comes from userspace,
and boost the owner by some means to shorten the sleep time.
How to boost? Depends on these hooks:
commit 53e809978443 ("ANDROID: vendor_hooks: Add hooks for scheduler")

Bug: 243110112
Bug: 250331401

Signed-off-by: xieliujie <xieliujie@oppo.com>
Change-Id: I9a315cfb414fd34e0ef7a2cf9d57df50d4dd984f
(cherry picked from commit 548da5d23d98b796cf9a478675622a606b3307c8)
(cherry picked from commit cc724041ce)
2022-10-03 07:19:33 +00:00
Greg Kroah-Hartman
4305285a35 Merge 5.15.71 into android13-5.15-lts
Changes in 5.15.71
	drm/amdgpu: Separate vf2pf work item init from virt data exchange
	drm/amdgpu: make sure to init common IP before gmc
	staging: r8188eu: Remove support for devices with 8188FU chipset (0bda:f179)
	staging: r8188eu: Add Rosewill USB-N150 Nano to device tables
	usb: dwc3: gadget: Avoid starting DWC3 gadget during UDC unbind
	usb: dwc3: Issue core soft reset before enabling run/stop
	usb: dwc3: gadget: Prevent repeat pullup()
	usb: dwc3: gadget: Refactor pullup()
	usb: dwc3: gadget: Don't modify GEVNTCOUNT in pullup()
	usb: dwc3: gadget: Avoid duplicate requests to enable Run/Stop
	usb: add quirks for Lenovo OneLink+ Dock
	usb: gadget: udc-xilinx: replace memcpy with memcpy_toio
	Revert "usb: add quirks for Lenovo OneLink+ Dock"
	Revert "usb: gadget: udc-xilinx: replace memcpy with memcpy_toio"
	drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES
	USB: core: Fix RST error in hub.c
	USB: serial: option: add Quectel BG95 0x0203 composition
	USB: serial: option: add Quectel RM520N
	Revert "ALSA: usb-audio: Split endpoint setups for hw_params and prepare"
	ALSA: core: Fix double-free at snd_card_new()
	ALSA: hda/tegra: set depop delay for tegra
	ALSA: hda: add Intel 5 Series / 3400 PCI DID
	ALSA: hda/realtek: Add quirk for Huawei WRT-WX9
	ALSA: hda/realtek: Enable 4-speaker output Dell Precision 5570 laptop
	ALSA: hda/realtek: Re-arrange quirk table entries
	ALSA: hda/realtek: Add pincfg for ASUS G513 HP jack
	ALSA: hda/realtek: Add pincfg for ASUS G533Z HP jack
	ALSA: hda/realtek: Add quirk for ASUS GA503R laptop
	ALSA: hda/realtek: Enable 4-speaker output Dell Precision 5530 laptop
	iommu/vt-d: Check correct capability for sagaw determination
	btrfs: fix hang during unmount when stopping block group reclaim worker
	btrfs: fix hang during unmount when stopping a space reclaim worker
	media: flexcop-usb: fix endpoint type check
	usb: dwc3: core: leave default DMA if the controller does not support 64-bit DMA
	thunderbolt: Add support for Intel Maple Ridge single port controller
	efi: x86: Wipe setup_data on pure EFI boot
	efi: libstub: check Shim mode using MokSBStateRT
	wifi: mt76: fix reading current per-tid starting sequence number for aggregation
	gpio: mockup: fix NULL pointer dereference when removing debugfs
	gpio: mockup: Fix potential resource leakage when register a chip
	gpiolib: cdev: Set lineevent_state::irq after IRQ register successfully
	riscv: fix a nasty sigreturn bug...
	kasan: call kasan_malloc() from __kmalloc_*track_caller()
	can: flexcan: flexcan_mailbox_read() fix return value for drop = true
	net: mana: Add rmb after checking owner bits
	mm/slub: fix to return errno if kmalloc() fails
	mm: slub: fix flush_cpu_slab()/__free_slab() invocations in task context.
	KVM: x86: Inject #UD on emulated XSETBV if XSAVES isn't enabled
	arm64: topology: fix possible overflow in amu_fie_setup()
	vmlinux.lds.h: CFI: Reduce alignment of jump-table to function alignment
	xfs: reorder iunlink remove operation in xfs_ifree
	xfs: fix xfs_ifree() error handling to not leak perag ref
	xfs: validate inode fork size against fork format
	firmware: arm_scmi: Harden accesses to the reset domains
	firmware: arm_scmi: Fix the asynchronous reset requests
	arm64: dts: rockchip: Pull up wlan wake# on Gru-Bob
	arm64: dts: rockchip: Fix typo in lisense text for PX30.Core
	drm/mediatek: dsi: Add atomic {destroy,duplicate}_state, reset callbacks
	arm64: dts: rockchip: Set RK3399-Gru PCLK_EDP to 24 MHz
	dmaengine: ti: k3-udma-private: Fix refcount leak bug in of_xudma_dev_get()
	arm64: dts: rockchip: Remove 'enable-active-low' from rk3399-puma
	netfilter: nf_conntrack_sip: fix ct_sip_walk_headers
	netfilter: nf_conntrack_irc: Tighten matching on DCC message
	netfilter: nfnetlink_osf: fix possible bogus match in nf_osf_find()
	ice: Don't double unplug aux on peer initiated reset
	iavf: Fix cached head and tail value for iavf_get_tx_pending
	ipvlan: Fix out-of-bound bugs caused by unset skb->mac_header
	net: core: fix flow symmetric hash
	net: phy: aquantia: wait for the suspend/resume operations to finish
	scsi: qla2xxx: Fix memory leak in __qlt_24xx_handle_abts()
	scsi: mpt3sas: Fix return value check of dma_get_required_mask()
	net: bonding: Share lacpdu_mcast_addr definition
	net: bonding: Unsync device addresses on ndo_stop
	net: team: Unsync device addresses on ndo_stop
	drm/panel: simple: Fix innolux_g121i1_l01 bus_format
	MIPS: lantiq: export clk_get_io() for lantiq_wdt.ko
	MIPS: Loongson32: Fix PHY-mode being left unspecified
	um: fix default console kernel parameter
	iavf: Fix bad page state
	mlxbf_gige: clear MDIO gateway lock after read
	iavf: Fix set max MTU size with port VLAN and jumbo frames
	i40e: Fix VF set max MTU size
	i40e: Fix set max_tx_rate when it is lower than 1 Mbps
	sfc: fix TX channel offset when using legacy interrupts
	sfc: fix null pointer dereference in efx_hard_start_xmit
	drm/hisilicon/hibmc: Allow to be built if COMPILE_TEST is enabled
	drm/hisilicon: Add depends on MMU
	of: mdio: Add of_node_put() when breaking out of for_each_xx
	net: ipa: properly limit modem routing table use
	wireguard: ratelimiter: disable timings test by default
	wireguard: netlink: avoid variable-sized memcpy on sockaddr
	net: enetc: move enetc_set_psfp() out of the common enetc_set_features()
	net: enetc: deny offload of tc-based TSN features on VF interfaces
	net/sched: taprio: avoid disabling offload when it was never enabled
	net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
	netfilter: nf_tables: fix nft_counters_enabled underflow at nf_tables_addchain()
	netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
	netfilter: ebtables: fix memory leak when blob is malformed
	net: ravb: Fix PHY state warning splat during system resume
	net: sh_eth: Fix PHY state warning splat during system resume
	can: gs_usb: gs_can_open(): fix race dev->can.state condition
	perf stat: Fix BPF program section name
	perf jit: Include program header in ELF files
	perf kcore_copy: Do not check /proc/modules is unchanged
	perf tools: Honor namespace when synthesizing build-ids
	drm/mediatek: dsi: Move mtk_dsi_stop() call back to mtk_dsi_poweroff()
	net/smc: Stop the CLC flow if no link to map buffers on
	bonding: fix NULL deref in bond_rr_gen_slave_id
	net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD
	net: sched: fix possible refcount leak in tc_new_tfilter()
	bnxt: prevent skb UAF after handing over to PTP worker
	selftests: forwarding: add shebang for sch_red.sh
	KVM: x86/mmu: Fold rmap_recycle into rmap_add
	serial: fsl_lpuart: Reset prior to registration
	serial: Create uart_xmit_advance()
	serial: tegra: Use uart_xmit_advance(), fixes icount.tx accounting
	serial: tegra-tcu: Use uart_xmit_advance(), fixes icount.tx accounting
	s390/dasd: fix Oops in dasd_alias_get_start_dev due to missing pavgroup
	drm/amd/amdgpu: fixing read wrong pf2vf data in SRIOV
	Drivers: hv: Never allocate anything besides framebuffer from framebuffer memory region
	drm/gma500: Fix BUG: sleeping function called from invalid context errors
	drm/amd/pm: disable BACO entry/exit completely on several sienna cichlid cards
	drm/amdgpu: use dirty framebuffer helper
	drm/amd/display: Limit user regamma to a valid value
	drm/amd/display: Reduce number of arguments of dml31's CalculateWatermarksAndDRAMSpeedChangeSupport()
	drm/amd/display: Reduce number of arguments of dml31's CalculateFlipSchedule()
	drm/amd/display: Mark dml30's UseMinimumDCFCLK() as noinline for stack usage
	drm/rockchip: Fix return type of cdn_dp_connector_mode_valid
	fsdax: Fix infinite loop in dax_iomap_rw()
	workqueue: don't skip lockdep work dependency in cancel_work_sync()
	i2c: imx: If pm_runtime_get_sync() returned 1 device access is possible
	i2c: mlxbf: incorrect base address passed during io write
	i2c: mlxbf: prevent stack overflow in mlxbf_i2c_smbus_start_transaction()
	i2c: mlxbf: Fix frequency calculation
	drm/amdgpu: don't register a dirty callback for non-atomic
	NFSv4: Fixes for nfs4_inode_return_delegation()
	devdax: Fix soft-reservation memory description
	ext4: make directory inode spreading reflect flexbg size
	ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0
	ext4: limit the number of retries after discarding preallocations blocks
	ext4: make mballoc try target group first even with mb_optimize_scan
	ext4: avoid unnecessary spreading of allocations among groups
	ext4: use locality group preallocation for small closed files
	Linux 5.15.71

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I51916c0ad287a15f0b132f3a87c5c704de31b632
2022-09-30 15:04:19 +02:00
Greg Kroah-Hartman
c12d4a18f3 Merge 5.15.70 into android13-5.15-lts
Changes in 5.15.70
	drm/tegra: vic: Fix build warning when CONFIG_PM=n
	serial: atmel: remove redundant assignment in rs485_config
	tty: serial: atmel: Preserve previous USART mode if RS485 disabled
	of: fdt: fix off-by-one error in unflatten_dt_nodes()
	pinctrl: qcom: sc8180x: Fix gpio_wakeirq_map
	pinctrl: qcom: sc8180x: Fix wrong pin numbers
	pinctrl: rockchip: Enhance support for IRQ_TYPE_EDGE_BOTH
	pinctrl: sunxi: Fix name for A100 R_PIO
	NFSv4: Turn off open-by-filehandle and NFS re-export for NFSv4.0
	gpio: mpc8xxx: Fix support for IRQ_TYPE_LEVEL_LOW flow_type in mpc85xx
	drm/meson: Correct OSD1 global alpha value
	drm/meson: Fix OSD1 RGB to YCbCr coefficient
	block: blk_queue_enter() / __bio_queue_enter() must return -EAGAIN for nowait
	parisc: ccio-dma: Add missing iounmap in error path in ccio_probe()
	of/device: Fix up of_dma_configure_id() stub
	cifs: revalidate mapping when doing direct writes
	cifs: don't send down the destination address to sendmsg for a SOCK_STREAM
	cifs: always initialize struct msghdr smb_msg completely
	parisc: Allow CONFIG_64BIT with ARCH=parisc
	tools/include/uapi: Fix <asm/errno.h> for parisc and xtensa
	drm/amdgpu: Don't enable LTR if not supported
	drm/amdgpu: move nbio ih_doorbell_range() into ih code for vega
	drm/amdgpu: move nbio sdma_doorbell_range() into sdma code for vega
	binder: remove inaccurate mmap_assert_locked()
	video: fbdev: i740fb: Error out if 'pixclock' equals zero
	arm64: dts: juno: Add missing MHU secure-irq
	ASoC: nau8824: Fix semaphore unbalance at error paths
	regulator: pfuze100: Fix the global-out-of-bounds access in pfuze100_regulator_probe()
	scsi: lpfc: Return DID_TRANSPORT_DISRUPTED instead of DID_REQUEUE
	rxrpc: Fix local destruction being repeated
	rxrpc: Fix calc of resend age
	wifi: mac80211_hwsim: check length for virtio packets
	ALSA: hda/sigmatel: Keep power up while beep is enabled
	ALSA: hda/tegra: Align BDL entry to 4KB boundary
	net: usb: qmi_wwan: add Quectel RM520N
	afs: Return -EAGAIN, not -EREMOTEIO, when a file already locked
	MIPS: OCTEON: irq: Fix octeon_irq_force_ciu_mapping()
	drm/panfrost: devfreq: set opp to the recommended one to configure regulator
	mksysmap: Fix the mismatch of 'L0' symbols in System.map
	video: fbdev: pxa3xx-gcu: Fix integer overflow in pxa3xx_gcu_write
	net: Find dst with sk's xfrm policy not ctl_sk
	KVM: SEV: add cache flush to solve SEV cache incoherency issues
	cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()
	ALSA: hda/sigmatel: Fix unused variable warning for beep power change
	Linux 5.15.70

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ied5238d93303b6b4b4951c21ecf1c4562c34ae20
2022-09-30 14:47:37 +02:00
Greg Kroah-Hartman
4248b89fb1 Merge 5.15.69 into android13-5.15-lts
Changes in 5.15.69
	NFS: Fix WARN_ON due to unionization of nfs_inode.nrequests
	ACPI: resource: skip IRQ override on AMD Zen platforms
	ARM: dts: imx: align SPI NOR node name with dtschema
	ARM: dts: imx6qdl-kontron-samx6i: fix spi-flash compatible
	ARM: dts: at91: fix low limit for CPU regulator
	ARM: dts: at91: sama7g5ek: specify proper regulator output ranges
	lockdep: Fix -Wunused-parameter for _THIS_IP_
	x86/mm: Force-inline __phys_addr_nodebug()
	task_stack, x86/cea: Force-inline stack helpers
	tracing: hold caller_addr to hardirq_{enable,disable}_ip
	tracefs: Only clobber mode/uid/gid on remount if asked
	iommu/vt-d: Fix kdump kernels boot failure with scalable mode
	Input: goodix - add support for GT1158
	platform/surface: aggregator_registry: Add support for Surface Laptop Go 2
	drm/msm/rd: Fix FIFO-full deadlock
	dt-bindings: iio: gyroscope: bosch,bmg160: correct number of pins
	HID: ishtp-hid-clientHID: ishtp-hid-client: Fix comment typo
	hid: intel-ish-hid: ishtp: Fix ishtp client sending disordered message
	tg3: Disable tg3 device on system reboot to avoid triggering AER
	gpio: mockup: remove gpio debugfs when remove device
	ieee802154: cc2520: add rc code in cc2520_tx()
	Input: iforce - add support for Boeder Force Feedback Wheel
	nvmet-tcp: fix unhandled tcp states in nvmet_tcp_state_change()
	drm/amd/amdgpu: skip ucode loading if ucode_size == 0
	net: dsa: hellcreek: Print warning only once
	perf/arm_pmu_platform: fix tests for platform_get_irq() failure
	platform/x86: acer-wmi: Acer Aspire One AOD270/Packard Bell Dot keymap fixes
	usb: storage: Add ASUS <0x0b05:0x1932> to IGNORE_UAS
	mm: Fix TLB flush for not-first PFNMAP mappings in unmap_region()
	soc: fsl: select FSL_GUTS driver for DPIO
	usb: gadget: f_uac2: clean up some inconsistent indenting
	usb: gadget: f_uac2: fix superspeed transfer
	RDMA/irdma: Use s/g array in post send only when its valid
	Input: goodix - add compatible string for GT1158
	Linux 5.15.69

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8bf545a2cbe110fc6c49e4b41f80fea8736f6f54
2022-09-30 13:27:38 +02:00
Suren Baghdasaryan
42a3fbd63e ANDROID: Revert "psi: allow unprivileged users with CAP_SYS_RESOURCE to write psi files"
This reverts commit 6db12ee045.

In Android, system_server registers psi trigger to detect memory
pressure. This commit requires processes registering new triggers to
have CAP_SYS_RESOURCE capability, which system_server does not have.
Reverting this change until a solution can be found to fix the breakage
of functionality in Android T using 5.15 kernels.

Bug: 243781242
Bug: 244148051
Reported-by: liuhailong <liuhailong@oppo.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If6c8580af8734f3b765d48c782a536aad357e6f0
(cherry picked from commit e1b8ef44fd)
2022-09-29 18:21:22 +00:00
deyaoren@google.com
22e5d45a69 Merge android13-5.15-2022-08_r8
* commit '393962a0c7aac14547b7df9ef16a39069b0a7c19':
  ANDROID: Expand user_struct size.
  ANDROID: vendor_hooks:vendor hook for mmput
  ANDROID: vendor_hooks:vendor hook for __alloc_pages_slowpath.

Signed-off-by: deyaoren@google.com <deyaoren@google.com>
Change-Id: I6bc8400bb6246558545d6fcb1423649e82499d8a
2022-09-29 16:26:07 +00:00
Tetsuo Handa
c71ec39be4 workqueue: don't skip lockdep work dependency in cancel_work_sync()
[ Upstream commit c0feea594e058223973db94c1c32a830c9807c86 ]

Like Hillf Danton mentioned

  syzbot should have been able to catch cancel_work_sync() in work context
  by checking lockdep_map in __flush_work() for both flush and cancel.

in [1], being unable to report an obvious deadlock scenario shown below is
broken. From locking dependency perspective, sync version of cancel request
should behave as if flush request, for it waits for completion of work if
that work has already started execution.

  ----------
  #include <linux/module.h>
  #include <linux/sched.h>
  static DEFINE_MUTEX(mutex);
  static void work_fn(struct work_struct *work)
  {
    schedule_timeout_uninterruptible(HZ / 5);
    mutex_lock(&mutex);
    mutex_unlock(&mutex);
  }
  static DECLARE_WORK(work, work_fn);
  static int __init test_init(void)
  {
    schedule_work(&work);
    schedule_timeout_uninterruptible(HZ / 10);
    mutex_lock(&mutex);
    cancel_work_sync(&work);
    mutex_unlock(&mutex);
    return -EINVAL;
  }
  module_init(test_init);
  MODULE_LICENSE("GPL");
  ----------

The check this patch restores was added by commit 0976dfc1d0
("workqueue: Catch more locking problems with flush_work()").

Then, lockdep's crossrelease feature was added by commit b09be676e0
("locking/lockdep: Implement the 'crossrelease' feature"). As a result,
this check was once removed by commit fd1a5b04df ("workqueue: Remove
now redundant lock acquisitions wrt. workqueue flushes").

But lockdep's crossrelease feature was removed by commit e966eaeeb6
("locking/lockdep: Remove the cross-release locking checks"). At this
point, this check should have been restored.

Then, commit d6e89786be ("workqueue: skip lockdep wq dependency in
cancel_work_sync()") introduced a boolean flag in order to distinguish
flush_work() and cancel_work_sync(), for checking "struct workqueue_struct"
dependency when called from cancel_work_sync() was causing false positives.

Then, commit 87915adc3f ("workqueue: re-add lockdep dependencies for
flushing") tried to restore "struct work_struct" dependency check, but by
error checked this boolean flag. Like an example shown above indicates,
"struct work_struct" dependency needs to be checked for both flush_work()
and cancel_work_sync().

Link: https://lkml.kernel.org/r/20220504044800.4966-1-hdanton@sina.com [1]
Reported-by: Hillf Danton <hdanton@sina.com>
Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>
Fixes: 87915adc3f ("workqueue: re-add lockdep dependencies for flushing")
Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-28 11:11:56 +02:00
Peifeng Li
ac82d34706 ANDROID: vendor_hook: rename the the name of hooks
Renamed trace_android_vh_record_percpu_rwsem_lock_starttime to
trace_android_vh_record_pcpu_rwsem_starttime.

Because the orignal name is too long, which results to the
compile-err of .ko that uses the symbol:

ERROR: modpost:
too long symbol "__tracepoint_android_vh_record_percpu_rwsem_lock_starttime"

There is not any users of the the orignal hooks so that it is safe to
rename it.

Bug: 241191475
Signed-off-by: Peifeng Li <lipeifeng@oppo.com>
Change-Id: Ie246a933414db5e9e28a65a4c280fae3a1cbefe3
2022-09-28 17:07:23 +08:00
guchao
393962a0c7 ANDROID: Expand user_struct size.
Expand some fields outside user_struct for oem to use

Bug: 247468958
Bug: 248195516
Change-Id: I754e71d8a53a46fa1743a7364c2987af6b8e9205
Signed-off-by: guchao <guchao1@xiaomi.corp-partner.google.com>
(cherry picked from commit be69ad8227)
2022-09-28 05:36:21 +00:00
liuhailong
54f780d093 ANDROID: signal: Add vendor hook for memory reaping
Since commit 3bcdb496f496 ("ANDROID: signal: Add vendor hook
for memory reaping"), but *current* does not need.  Add another
hook here to determine killed process need be reaped.
(e.g. get_mm_counter)

Bug: 232062955
Change-Id: Ide13d3a2e8c199b063f41e071a2bf2fd60a91b04
Signed-off-by: liuhailong <liuhailong@oppo.com>
(cherry picked from commit e27ad1d21124c25259911574775fafb5f60bb53e)
2022-09-27 20:02:43 +00:00
Greg Kroah-Hartman
8c09081fc0 Revert "ANDROID: sched: Add vendor hook for util-update related functions"
This reverts commit eb994f2728.

The hooks android_rvh_attach_entity_load_avg,
android_rvh_detach_entity_load_avg, android_rvh_update_load_avg,
android_rvh_remove_entity_load_avg, and android_rvh_update_blocked_fair
are not used by any vendor, so remove it to help with merge issues with
future LTS releases.

If they are needed by any real user, it can easily be reverted to add
them back and then the symbols should be added to the abi list at the
same time to prevent it from being removed again later.

Bug: 203756332
Bug: 201260585
Cc: Rick Yiu <rickyiu@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ibb6f0d2b71e830092f8d3452fdf34c5d1975e71e
2022-09-27 11:41:40 +02:00
Peter Collingbourne
273bbfc4d7 BACKPORT: printk: stop including cache.h from printk.h
An inclusion of cache.h in printk.h was added in 2014 in commit
c28aa1f0a8 ("printk/cache: mark printk_once test variable
__read_mostly") in order to bring in the definition of __read_mostly.  The
usage of __read_mostly was later removed in commit 3ec25826ae ("printk:
Tie printk_once / printk_deferred_once into .data.once for reset") which
made the inclusion of cache.h unnecessary, so remove it.

We have a small amount of code that depended on the inclusion of cache.h
from printk.h; fix that code to include the appropriate header.

This fixes a circular inclusion on arm64 (linux/printk.h -> linux/cache.h
-> asm/cache.h -> linux/kasan-enabled.h -> linux/static_key.h ->
linux/jump_label.h -> linux/bug.h -> asm/bug.h -> linux/printk.h) that
would otherwise be introduced by the next patch.

Build tested using {allyesconfig,defconfig} x {arm64,x86_64}.

Link: https://linux-review.googlesource.com/id/I8fd51f72c9ef1f2d6afd3b2cbc875aa4792c1fba
Link: https://lkml.kernel.org/r/20220427195820.1716975-1-pcc@google.com
Signed-off-by: Peter Collingbourne <pcc@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bug: 238956478
(cherry picked from commit 534aa1dc975ac883ad89110534585a96630802a0)
Change-Id: I46182e781b64561a1ebd5405628a317d4f6cb789
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
2022-09-26 23:57:43 +00:00
xiaofeng
2ecf8fb58d ANDROID: vendor_hooks:vendor hook for mmput
add vendor hook in mmput while mm_users decreased to 0.

Bug: 238821038
Bug: 248221914
Change-Id: I42a717cbeeb3176bac14b4b2391fdb2366c972d3
Signed-off-by: xiaofeng <xiaofeng5@xiaomi.com>
2022-09-26 12:37:17 +00:00
Greg Kroah-Hartman
35fc902dbd Merge 5.15.68 into android13-5.15-lts
Changes in 5.15.68
	net: wwan: iosm: remove pointless null check
	efi: libstub: Disable struct randomization
	efi: capsule-loader: Fix use-after-free in efi_capsule_write
	wifi: iwlegacy: 4965: corrected fix for potential off-by-one overflow in il4965_rs_fill_link_cmd()
	fs: only do a memory barrier for the first set_buffer_uptodate()
	Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()"
	scsi: qla2xxx: Disable ATIO interrupt coalesce for quad port ISP27XX
	scsi: megaraid_sas: Fix double kfree()
	drm/gem: Fix GEM handle release errors
	drm/amdgpu: Move psp_xgmi_terminate call from amdgpu_xgmi_remove_device to psp_hw_fini
	drm/amdgpu: Check num_gfx_rings for gfx v9_0 rb setup.
	drm/radeon: add a force flush to delay work when radeon
	scsi: ufs: core: Reduce the power mode change timeout
	Revert "parisc: Show error if wrong 32/64-bit compiler is being used"
	parisc: ccio-dma: Handle kmalloc failure in ccio_init_resources()
	parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines
	arm64: cacheinfo: Fix incorrect assignment of signed error value to unsigned fw_level
	netfilter: conntrack: work around exceeded receive window
	cpufreq: check only freq_table in __resolve_freq()
	net/core/skbuff: Check the return value of skb_copy_bits()
	md: Flush workqueue md_rdev_misc_wq in md_alloc()
	fbdev: fbcon: Destroy mutex on freeing struct fb_info
	fbdev: chipsfb: Add missing pci_disable_device() in chipsfb_pci_init()
	drm/amdgpu: mmVM_L2_CNTL3 register not initialized correctly
	ALSA: pcm: oss: Fix race at SNDCTL_DSP_SYNC
	ALSA: emu10k1: Fix out of bounds access in snd_emu10k1_pcm_channel_alloc()
	ALSA: aloop: Fix random zeros in capture data when using jiffies timer
	ALSA: usb-audio: Split endpoint setups for hw_params and prepare
	ALSA: usb-audio: Fix an out-of-bounds bug in __snd_usb_parse_audio_interface()
	tracing: Fix to check event_mutex is held while accessing trigger list
	btrfs: zoned: set pseudo max append zone limit in zone emulation mode
	vfio/type1: Unpin zero pages
	kprobes: Prohibit probes in gate area
	debugfs: add debugfs_lookup_and_remove()
	sched/debug: fix dentry leak in update_sched_domain_debugfs
	drm/amd/display: fix memory leak when using debugfs_lookup()
	nvmet: fix a use-after-free
	drm/i915: Implement WaEdpLinkRateDataReload
	scsi: mpt3sas: Fix use-after-free warning
	scsi: lpfc: Add missing destroy_workqueue() in error path
	NFS: Further optimisations for 'ls -l'
	NFS: Save some space in the inode
	NFS: Fix another fsync() issue after a server reboot
	cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
	cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
	ASoC: qcom: sm8250: add missing module owner
	RDMA/rtrs-clt: Use the right sg_cnt after ib_dma_map_sg
	RDMA/rtrs-srv: Pass the correct number of entries for dma mapped SGL
	ARM: dts: imx6qdl-kontron-samx6i: remove duplicated node
	soc: imx: gpcv2: Assert reset before ungating clock
	regulator: core: Clean up on enable failure
	tee: fix compiler warning in tee_shm_register()
	RDMA/cma: Fix arguments order in net device validation
	soc: brcmstb: pm-arm: Fix refcount leak and __iomem leak bugs
	RDMA/hns: Fix supported page size
	RDMA/hns: Fix wrong fixed value of qp->rq.wqe_shift
	wifi: wilc1000: fix DMA on stack objects
	ARM: at91: pm: fix self-refresh for sama7g5
	ARM: at91: pm: fix DDR recalibration when resuming from backup and self-refresh
	ARM: dts: at91: sama5d27_wlsom1: specify proper regulator output ranges
	ARM: dts: at91: sama5d2_icp: specify proper regulator output ranges
	ARM: dts: at91: sama5d27_wlsom1: don't keep ldo2 enabled all the time
	ARM: dts: at91: sama5d2_icp: don't keep vdd_other enabled all the time
	netfilter: br_netfilter: Drop dst references before setting.
	netfilter: nf_tables: clean up hook list when offload flags check fails
	netfilter: nf_conntrack_irc: Fix forged IP logic
	RDMA/srp: Set scmnd->result only when scmnd is not NULL
	ALSA: usb-audio: Inform the delayed registration more properly
	ALSA: usb-audio: Register card again for iface over delayed_register option
	rxrpc: Fix ICMP/ICMP6 error handling
	rxrpc: Fix an insufficiently large sglist in rxkad_verify_packet_2()
	afs: Use the operation issue time instead of the reply time for callbacks
	Revert "net: phy: meson-gxl: improve link-up behavior"
	sch_sfb: Don't assume the skb is still around after enqueueing to child
	tipc: fix shift wrapping bug in map_get()
	net: introduce __skb_fill_page_desc_noacc
	tcp: TX zerocopy should not sense pfmemalloc status
	ice: use bitmap_free instead of devm_kfree
	i40e: Fix kernel crash during module removal
	iavf: Detach device during reset task
	xen-netback: only remove 'hotplug-status' when the vif is actually destroyed
	RDMA/siw: Pass a pointer to virt_to_page()
	ipv6: sr: fix out-of-bounds read when setting HMAC data.
	IB/core: Fix a nested dead lock as part of ODP flow
	RDMA/mlx5: Set local port to one when accessing counters
	erofs: fix pcluster use-after-free on UP platforms
	nvme-tcp: fix UAF when detecting digest errors
	nvme-tcp: fix regression that causes sporadic requests to time out
	tcp: fix early ETIMEDOUT after spurious non-SACK RTO
	nvmet: fix mar and mor off-by-one errors
	RDMA/irdma: Report the correct max cqes from query device
	RDMA/irdma: Return correct WC error for bind operation failure
	RDMA/irdma: Report RNR NAK generation in device caps
	sch_sfb: Also store skb len before calling child enqueue
	perf script: Fix Cannot print 'iregs' field for hybrid systems
	hwmon: (tps23861) fix byte order in resistance register
	ASoC: mchp-spdiftx: remove references to mchp_i2s_caps
	ASoC: mchp-spdiftx: Fix clang -Wbitfield-constant-conversion
	MIPS: loongson32: ls1c: Fix hang during startup
	kbuild: disable header exports for UML in a straightforward way
	i40e: Refactor tc mqprio checks
	i40e: Fix ADQ rate limiting for PF
	swiotlb: avoid potential left shift overflow
	iommu/amd: use full 64-bit value in build_completion_wait()
	s390/boot: fix absolute zero lowcore corruption on boot
	hwmon: (mr75203) fix VM sensor allocation when "intel,vm-map" not defined
	hwmon: (mr75203) update pvt->v_num and vm_num to the actual number of used sensors
	hwmon: (mr75203) fix voltage equation for negative source input
	hwmon: (mr75203) fix multi-channel voltage reading
	hwmon: (mr75203) enable polling for all VM channels
	Revert "arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags""
	arm64/bti: Disable in kernel BTI when cross section thunks are broken
	iommu/vt-d: Correctly calculate sagaw value of IOMMU
	arm64: errata: add detection for AMEVCNTR01 incrementing incorrectly
	drm/bridge: display-connector: implement bus fmts callbacks
	perf machine: Use path__join() to compose a path instead of snprintf(dir, '/', filename)
	ARM: at91: ddr: remove CONFIG_SOC_SAMA7 dependency
	Linux 5.15.68

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ie37701b41d9c35632876034bcdd0029594170af9
2022-09-23 14:45:07 +02:00
Tetsuo Handa
5db17805b6 cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()
commit 43626dade36fa74d3329046f4ae2d7fdefe401c6 upstream.

syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at
cpuset_attach() [1], for commit 4f7e7236435ca0ab ("cgroup: Fix
threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that
cpuset_attach() is also called from cgroup_attach_task_all().
Add cpus_read_lock() like what cgroup_procs_write_start() does.

Link: https://syzkaller.appspot.com/bug?extid=29d3a3b4d86c8136ad9e [1]
Reported-by: syzbot <syzbot+29d3a3b4d86c8136ad9e@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock")
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-23 14:15:52 +02:00
Greg Kroah-Hartman
affdbc37bd Merge 5.15.66 into android13-5.15-lts
Changes in 5.15.66
	drm/msm/dsi: fix the inconsistent indenting
	drm/msm/dp: delete DP_RECOVERED_CLOCK_OUT_EN to fix tps4
	drm/msm/dsi: Fix number of regulators for msm8996_dsi_cfg
	drm/msm/dsi: Fix number of regulators for SDM660
	platform/x86: pmc_atom: Fix SLP_TYPx bitfield mask
	iio: adc: mcp3911: make use of the sign bit
	skmsg: Fix wrong last sg check in sk_msg_recvmsg()
	bpf: Restrict bpf_sys_bpf to CAP_PERFMON
	bpf, cgroup: Fix kernel BUG in purge_effective_progs
	ieee802154/adf7242: defer destroy_workqueue call
	drm/i915/backlight: extract backlight code to a separate file
	drm/i915/display: avoid warnings when registering dual panel backlight
	ALSA: hda: intel-nhlt: remove use of __func__ in dev_dbg
	ALSA: hda: intel-nhlt: Correct the handling of fmt_config flexible array
	wifi: cfg80211: debugfs: fix return type in ht40allow_map_read()
	Revert "xhci: turn off port power in shutdown"
	net: sparx5: fix handling uneven length packets in manual extraction
	net: smsc911x: Stop and start PHY during suspend and resume
	openvswitch: fix memory leak at failed datapath creation
	net: dsa: xrs700x: Use irqsave variant for u64 stats update
	net: sched: tbf: don't call qdisc_put() while holding tree lock
	net/sched: fix netdevice reference leaks in attach_default_qdiscs()
	ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler
	mlxbf_gige: compute MDIO period based on i1clk
	kcm: fix strp_init() order and cleanup
	sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb
	tcp: annotate data-race around challenge_timestamp
	Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb"
	net/smc: Remove redundant refcount increase
	soundwire: qcom: fix device status array range
	serial: fsl_lpuart: RS485 RTS polariy is inverse
	staging: rtl8712: fix use after free bugs
	staging: r8188eu: add firmware dependency
	powerpc: align syscall table for ppc32
	vt: Clear selection before changing the font
	musb: fix USB_MUSB_TUSB6010 dependency
	tty: serial: lpuart: disable flow control while waiting for the transmit engine to complete
	Input: iforce - wake up after clearing IFORCE_XMIT_RUNNING flag
	iio: ad7292: Prevent regulator double disable
	iio: adc: mcp3911: use correct formula for AD conversion
	misc: fastrpc: fix memory corruption on probe
	misc: fastrpc: fix memory corruption on open
	USB: serial: ftdi_sio: add Omron CS1W-CIF31 device id
	mmc: core: Fix UHS-I SD 1.8V workaround branch
	mmc: core: Fix inconsistent sd3_bus_mode at UHS-I SD voltage switch failure
	binder: fix UAF of ref->proc caused by race condition
	binder: fix alloc->vma_vm_mm null-ptr dereference
	cifs: fix small mempool leak in SMB2_negotiate()
	KVM: VMX: Heed the 'msr' argument in msr_write_intercepted()
	drm/i915/reg: Fix spelling mistake "Unsupport" -> "Unsupported"
	clk: core: Honor CLK_OPS_PARENT_ENABLE for clk gate ops
	Revert "clk: core: Honor CLK_OPS_PARENT_ENABLE for clk gate ops"
	clk: core: Fix runtime PM sequence in clk_core_unprepare()
	Input: rk805-pwrkey - fix module autoloading
	clk: bcm: rpi: Fix error handling of raspberrypi_fw_get_rate
	clk: bcm: rpi: Use correct order for the parameters of devm_kcalloc()
	clk: bcm: rpi: Prevent out-of-bounds access
	clk: bcm: rpi: Add missing newline
	hwmon: (gpio-fan) Fix array out of bounds access
	gpio: pca953x: Add mutex_lock for regcache sync in PM
	KVM: x86: Mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIES
	xen/grants: prevent integer overflow in gnttab_dma_alloc_pages()
	mm: pagewalk: Fix race between unmap and page walker
	xen-blkback: Advertise feature-persistent as user requested
	xen-blkfront: Advertise feature-persistent as user requested
	xen-blkfront: Cache feature_persistent value before advertisement
	thunderbolt: Use the actual buffer in tb_async_error()
	usb: dwc3: pci: Add support for Intel Raptor Lake
	media: mceusb: Use new usb_control_msg_*() routines
	xhci: Add grace period after xHC start to prevent premature runtime suspend.
	USB: serial: cp210x: add Decagon UCA device id
	USB: serial: option: add support for OPPO R11 diag port
	USB: serial: option: add Quectel EM060K modem
	USB: serial: option: add support for Cinterion MV32-WA/WB RmNet mode
	usb: typec: altmodes/displayport: correct pin assignment for UFP receptacles
	usb: typec: intel_pmc_mux: Add new ACPI ID for Meteor Lake IOM device
	usb: typec: tcpm: Return ENOTSUPP for power supply prop writes
	usb: dwc2: fix wrong order of phy_power_on and phy_init
	usb: cdns3: fix issue with rearming ISO OUT endpoint
	usb: cdns3: fix incorrect handling TRB_SMM flag for ISOC transfer
	USB: cdc-acm: Add Icom PMR F3400 support (0c26:0020)
	usb-storage: Add ignore-residue quirk for NXP PN7462AU
	s390/hugetlb: fix prepare_hugepage_range() check for 2 GB hugepages
	s390: fix nospec table alignments
	USB: core: Prevent nested device-reset calls
	usb: xhci-mtk: relax TT periodic bandwidth allocation
	usb: xhci-mtk: fix bandwidth release issue
	usb: gadget: mass_storage: Fix cdrom data transfers on MAC-OS
	driver core: Don't probe devices after bus_type.match() probe deferral
	wifi: mac80211: Don't finalize CSA in IBSS mode if state is disconnected
	wifi: mac80211: Fix UAF in ieee80211_scan_rx()
	ip: fix triggering of 'icmp redirect'
	net: Use u64_stats_fetch_begin_irq() for stats fetch.
	net: mac802154: Fix a condition in the receive path
	ALSA: hda/realtek: Add speaker AMP init for Samsung laptops with ALC298
	ALSA: seq: oss: Fix data-race for max_midi_devs access
	ALSA: seq: Fix data-race at module auto-loading
	drm/i915/glk: ECS Liva Q2 needs GLK HDMI port timing quirk
	drm/i915: Skip wm/ddb readout for disabled pipes
	tty: n_gsm: add sanity check for gsm->receive in gsm_receive_buf()
	kbuild: Unify options for BTF generation for vmlinux and modules
	kbuild: Add skip_encoding_btf_enum64 option to pahole
	usb: dwc3: fix PHY disable sequence
	usb: dwc3: qcom: fix use-after-free on runtime-PM wakeup
	usb: dwc3: disable USB core PHY management
	USB: serial: ch341: fix lost character on LCR updates
	USB: serial: ch341: fix disabled rx timer on older devices
	Linux 5.15.66

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifa86fa6b6b75a7f51ca178f139355c7688eabf2a
2022-09-23 14:07:17 +02:00
Greg Kroah-Hartman
049f90ecb7 Merge 5.15.65 into android13-5.15-lts
Changes in 5.15.65
	mm: Force TLB flush for PFNMAP mappings before unlink_file_vma()
	drm/bridge: Add stubs for devm_drm_of_get_bridge when OF is disabled
	ACPI: thermal: drop an always true check
	drm/vc4: hdmi: Rework power up
	drm/vc4: hdmi: Depends on CONFIG_PM
	firmware: tegra: bpmp: Do only aligned access to IPC memory area
	crypto: lib - remove unneeded selection of XOR_BLOCKS
	Drivers: hv: balloon: Support status report for larger page sizes
	mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte
	arm64: errata: Add Cortex-A510 to the repeat tlbi list
	io_uring: correct fill events helpers types
	io_uring: clean cqe filling functions
	io_uring: refactor poll update
	io_uring: move common poll bits
	io_uring: kill poll linking optimisation
	io_uring: inline io_poll_complete
	io_uring: poll rework
	io_uring: Remove unused function req_ref_put
	io_uring: remove poll entry from list when canceling all
	io_uring: bump poll refs to full 31-bits
	io_uring: fail links when poll fails
	io_uring: fix wrong arm_poll error handling
	io_uring: fix UAF due to missing POLLFREE handling
	kbuild: Fix include path in scripts/Makefile.modpost
	Bluetooth: L2CAP: Fix build errors in some archs
	Revert "PCI/portdrv: Don't disable AER reporting in get_port_device_capability()"
	HID: steam: Prevent NULL pointer dereference in steam_{recv,send}_report
	udmabuf: Set the DMA mask for the udmabuf device (v2)
	media: pvrusb2: fix memory leak in pvr_probe
	HID: hidraw: fix memory leak in hidraw_release()
	net: fix refcount bug in sk_psock_get (2)
	fbdev: fb_pm2fb: Avoid potential divide by zero error
	ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead
	bpf: Don't redirect packets with invalid pkt_len
	mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse
	ALSA: usb-audio: Add quirk for LH Labs Geek Out HD Audio 1V5
	HID: add Lenovo Yoga C630 battery quirk
	HID: AMD_SFH: Add a DMI quirk entry for Chromebooks
	HID: asus: ROG NKey: Ignore portion of 0x5a report
	HID: thrustmaster: Add sparco wheel and fix array length
	drm/i915/gt: Skip TLB invalidations once wedged
	mmc: mtk-sd: Clear interrupts when cqe off/disable
	mmc: sdhci-of-dwcmshc: add reset call back for rockchip Socs
	mmc: sdhci-of-dwcmshc: rename rk3568 to rk35xx
	mmc: sdhci-of-dwcmshc: Re-enable support for the BlueField-3 SoC
	btrfs: remove root argument from btrfs_unlink_inode()
	btrfs: remove no longer needed logic for replaying directory deletes
	btrfs: add and use helper for unlinking inode during log replay
	btrfs: fix warning during log replay when bumping inode link count
	fs/ntfs3: Fix work with fragmented xattr
	ASoC: sh: rz-ssi: Improve error handling in rz_ssi_probe() error path
	drm/amd/display: Avoid MPC infinite loop
	drm/amd/display: Fix HDMI VSIF V3 incorrect issue
	drm/amd/display: For stereo keep "FLIP_ANY_FRAME"
	drm/amd/display: clear optc underflow before turn off odm clock
	ksmbd: return STATUS_BAD_NETWORK_NAME error status if share is not configured
	neigh: fix possible DoS due to net iface start/stop loop
	s390/hypfs: avoid error message under KVM
	ksmbd: don't remove dos attribute xattr on O_TRUNC open
	drm/amd/pm: add missing ->fini_microcode interface for Sienna Cichlid
	drm/amd/display: Fix pixel clock programming
	drm/amdgpu: Increase tlb flush timeout for sriov
	drm/amd/display: avoid doing vm_init multiple time
	netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y
	testing: selftests: nft_flowtable.sh: use random netns names
	btrfs: move lockdep class helpers to locking.c
	btrfs: fix lockdep splat with reloc root extent buffers
	btrfs: tree-checker: check for overlapping extent items
	kprobes: don't call disarm_kprobe() for disabled kprobes
	btrfs: fix space cache corruption and potential double allocations
	android: binder: fix lockdep check on clearing vma
	net/af_packet: check len when min_header_len equals to 0
	net: neigh: don't call kfree_skb() under spin_lock_irqsave()
	Linux 5.15.65

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I20608962fa7d031aa14ade6388b826c5b51c1693
2022-09-23 13:30:15 +02:00
keystone-kernel-automerger
b76e115e36 Merge remote-tracking branch into HEAD
* keystone/mirror-android13-5.15-2022-08:
  ANDROID: abi_gki_aarch64_qcom: Add iio symbol list for qcom
  ANDROID: GKI: Update Symbol List for Vendor
  ANDROID: kernel/sched: rebuild_sched_domains export
  ANDROID: gki_defconfig: Enable CONFIG_HIBERNATION flag
  UPSTREAM: kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22}

Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com>
Change-Id: I16cb1a5bc1e455e01114d570f9ab30ae97d8ab1d
2022-09-23 06:20:33 +00:00
Stephen Dickey
d3dc8c7590 ANDROID: kernel/sched: rebuild_sched_domains export
Vendor module needs to rebuild sched domains at boot, in the
event that cpufreq initializes the energy model too late.

Bug: 242898038
Change-Id: Ifaf1223366ac81c3f3c382dd0f61110fce9c1b20
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2022-09-22 22:48:56 +00:00
Yipeng Zou
3c90af5a77 tracing: hold caller_addr to hardirq_{enable,disable}_ip
[ Upstream commit 54c3931957f6a6194d5972eccc36d052964b2abe ]

Currently, The arguments passing to lockdep_hardirqs_{on,off} was fixed
in CALLER_ADDR0.
The function trace_hardirqs_on_caller should have been intended to use
caller_addr to represent the address that caller wants to be traced.

For example, lockdep log in riscv showing the last {enabled,disabled} at
__trace_hardirqs_{on,off} all the time(if called by):
[   57.853175] hardirqs last  enabled at (2519): __trace_hardirqs_on+0xc/0x14
[   57.853848] hardirqs last disabled at (2520): __trace_hardirqs_off+0xc/0x14

After use trace_hardirqs_xx_caller, we can get more effective information:
[   53.781428] hardirqs last  enabled at (2595): restore_all+0xe/0x66
[   53.782185] hardirqs last disabled at (2596): ret_from_exception+0xa/0x10

Link: https://lkml.kernel.org/r/20220901104515.135162-2-zouyipeng@huawei.com

Cc: stable@vger.kernel.org
Fixes: c3bc8fd637 ("tracing: Centralize preemptirq tracepoints and unify their usage")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-20 12:39:43 +02:00
Nick Desaulniers
f9571a9699 lockdep: Fix -Wunused-parameter for _THIS_IP_
[ Upstream commit 8b023accc8df70e72f7704d29fead7ca914d6837 ]

While looking into a bug related to the compiler's handling of addresses
of labels, I noticed some uses of _THIS_IP_ seemed unused in lockdep.
Drive by cleanup.

-Wunused-parameter:
kernel/locking/lockdep.c:1383:22: warning: unused parameter 'ip'
kernel/locking/lockdep.c:4246:48: warning: unused parameter 'ip'
kernel/locking/lockdep.c:4844:19: warning: unused parameter 'ip'

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20220314221909.2027027-1-ndesaulniers@google.com
Stable-dep-of: 54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-20 12:39:42 +02:00
Chao Gao
4f8d658848 swiotlb: avoid potential left shift overflow
[ Upstream commit 3f0461613ebcdc8c4073e235053d06d5aa58750f ]

The second operand passed to slot_addr() is declared as int or unsigned int
in all call sites. The left-shift to get the offset of a slot can overflow
if swiotlb size is larger than 4G.

Convert the macro to an inline function and declare the second argument as
phys_addr_t to avoid the potential overflow.

Fixes: 26a7e09478 ("swiotlb: refactor swiotlb_tbl_map_single")
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15 11:30:07 +02:00
Yishai Hadas
819110054b IB/core: Fix a nested dead lock as part of ODP flow
[ Upstream commit 85eaeb5058f0f04dffb124c97c86b4f18db0b833 ]

Fix a nested dead lock as part of ODP flow by using mmput_async().

From the below call trace [1] can see that calling mmput() once we have
the umem_odp->umem_mutex locked as required by
ib_umem_odp_map_dma_and_lock() might trigger in the same task the
exit_mmap()->__mmu_notifier_release()->mlx5_ib_invalidate_range() which
may dead lock when trying to lock the same mutex.

Moving to use mmput_async() will solve the problem as the above
exit_mmap() flow will be called in other task and will be executed once
the lock will be available.

[1]
[64843.077665] task:kworker/u133:2  state:D stack:    0 pid:80906 ppid:
2 flags:0x00004000
[64843.077672] Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
[64843.077719] Call Trace:
[64843.077722]  <TASK>
[64843.077724]  __schedule+0x23d/0x590
[64843.077729]  schedule+0x4e/0xb0
[64843.077735]  schedule_preempt_disabled+0xe/0x10
[64843.077740]  __mutex_lock.constprop.0+0x263/0x490
[64843.077747]  __mutex_lock_slowpath+0x13/0x20
[64843.077752]  mutex_lock+0x34/0x40
[64843.077758]  mlx5_ib_invalidate_range+0x48/0x270 [mlx5_ib]
[64843.077808]  __mmu_notifier_release+0x1a4/0x200
[64843.077816]  exit_mmap+0x1bc/0x200
[64843.077822]  ? walk_page_range+0x9c/0x120
[64843.077828]  ? __cond_resched+0x1a/0x50
[64843.077833]  ? mutex_lock+0x13/0x40
[64843.077839]  ? uprobe_clear_state+0xac/0x120
[64843.077860]  mmput+0x5f/0x140
[64843.077867]  ib_umem_odp_map_dma_and_lock+0x21b/0x580 [ib_core]
[64843.077931]  pagefault_real_mr+0x9a/0x140 [mlx5_ib]
[64843.077962]  pagefault_mr+0xb4/0x550 [mlx5_ib]
[64843.077992]  pagefault_single_data_segment.constprop.0+0x2ac/0x560
[mlx5_ib]
[64843.078022]  mlx5_ib_eqe_pf_action+0x528/0x780 [mlx5_ib]
[64843.078051]  process_one_work+0x22b/0x3d0
[64843.078059]  worker_thread+0x53/0x410
[64843.078065]  ? process_one_work+0x3d0/0x3d0
[64843.078073]  kthread+0x12a/0x150
[64843.078079]  ? set_kthread_struct+0x50/0x50
[64843.078085]  ret_from_fork+0x22/0x30
[64843.078093]  </TASK>

Fixes: 36f30e486d ("IB/core: Improve ODP to use hmm_range_fault()")
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Link: https://lore.kernel.org/r/74d93541ea533ef7daec6f126deb1072500aeb16.1661251841.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15 11:30:06 +02:00
Tejun Heo
3bf4bf5406 cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
[ Upstream commit 4f7e7236435ca0abe005c674ebd6892c6e83aeb3 ]

Bringing up a CPU may involve creating and destroying tasks which requires
read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside
cpus_read_lock(). However, cpuset's ->attach(), which may be called with
thredagroup_rwsem write-locked, also wants to disable CPU hotplug and
acquires cpus_read_lock(), leading to a deadlock.

Fix it by guaranteeing that ->attach() is always called with CPU hotplug
disabled and removing cpus_read_lock() call from cpuset_attach().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-and-tested-by: Imran Khan <imran.f.khan@oracle.com>
Reported-and-tested-by: Xuewen Yan <xuewen.yan@unisoc.com>
Fixes: 05c7b7a92cc8 ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15 11:30:03 +02:00
Tejun Heo
509e3456d3 cgroup: Elide write-locking threadgroup_rwsem when updating csses on an empty subtree
[ Upstream commit 671c11f0619e5ccb380bcf0f062f69ba95fc974a ]

cgroup_update_dfl_csses() write-lock the threadgroup_rwsem as updating the
csses can trigger process migrations. However, if the subtree doesn't
contain any tasks, there aren't gonna be any cgroup migrations. This
condition can be trivially detected by testing whether
mgctx.preloaded_src_csets is empty. Elide write-locking threadgroup_rwsem if
the subtree is empty.

After this optimization, the usage pattern of creating a cgroup, enabling
the necessary controllers, and then seeding it with CLONE_INTO_CGROUP and
then removing the cgroup after it becomes empty doesn't need to write-lock
threadgroup_rwsem at all.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-09-15 11:30:03 +02:00
Greg Kroah-Hartman
26e9a1ded8 sched/debug: fix dentry leak in update_sched_domain_debugfs
commit c2e406596571659451f4b95e37ddfd5a8ef1d0dc upstream.

Kuyo reports that the pattern of using debugfs_remove(debugfs_lookup())
leaks a dentry and with a hotplug stress test, the machine eventually
runs out of memory.

Fix this up by using the newly created debugfs_lookup_and_remove() call
instead which properly handles the dentry reference counting logic.

Cc: Major Chen <major.chen@samsung.com>
Cc: stable <stable@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Reported-by: Kuyo Chang <kuyo.chang@mediatek.com>
Tested-by: Kuyo Chang <kuyo.chang@mediatek.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220902123107.109274-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-15 11:30:02 +02:00