msm-5.15

Author	SHA1	Message	Date
Andrey Ryabinin	95c7ba0035	sched/cpuacct: Fix user/system in shown cpuacct.usage* commit dd02d4234c9a2214a81c57a16484304a1a51872a upstream. cpuacct has 2 different ways of accounting and showing user and system times. The first one uses cpuacct_account_field() to account times and cpuacct.stat file to expose them. And this one seems to work ok. The second one uses the cpuacct_charge() function for accounting and a set of cpuacct.usage* files to show times. Despite some attempts to fix it in the past it still doesn't work. Sometimes while running KVM guest the cpuacct_charge() accounts most of the guest time as system time. This doesn't match with user&system times shown in cpuacct.stat or /proc/<pid>/stat. Demonstration: # git clone https://github.com/aryabinin/kvmsample # make # mkdir /sys/fs/cgroup/cpuacct/test # echo $$ > /sys/fs/cgroup/cpuacct/test/tasks # ./kvmsample & # for i in {1..5}; do cat /sys/fs/cgroup/cpuacct/test/cpuacct.usage_sys; sleep 1; done 1976535645 2979839428 3979832704 4983603153 5983604157 Use cpustats accounted in cpuacct_account_field() as the source of user/sys times for cpuacct.usage* files. Make cpuacct_charge() account only summary execution time. Fixes: `d740037fac` ("sched/cpuacct: Split usage accounting into user_usage and sys_usage") Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211115164607.23784-3-arbn@yandex-team.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-27 11:05:09 +01:00
Andrey Ryabinin	b48450843f	cputime, cpuacct: Include guest time in user time in cpuacct.stat commit 9731698ecb9c851f353ce2496292ff9fcea39dff upstream. cpuacct.stat in non-root cgroups shows user time without guest time included in it. This doesn't match with user time shown in root cpuacct.stat and /proc/<pid>/stat. This also affects cgroup2's cpu.stat in the same way. Make account_guest_time() add user time to cgroup's cpustat to fix this. Fixes: `ef12fefabf` ("cpuacct: add per-cgroup utime/stime statistics") Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211115164607.23784-1-arbn@yandex-team.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-27 11:05:09 +01:00
Paul Moore	d978295bb5	audit: ensure userspace is penalized the same as the kernel when under pressure [ Upstream commit 8f110f530635af44fff1f4ee100ecef0bac62510 ] Due to the audit control mutex necessary for serializing audit userspace messages we haven't been able to block/penalize userspace processes that attempt to send audit records while the system is under audit pressure. The result is that privileged userspace applications have a priority boost with respect to audit as they are not bound by the same audit queue throttling as the other tasks on the system. This patch attempts to restore some balance to the system when under audit pressure by blocking these privileged userspace tasks after they have finished their audit processing, and dropped the audit control mutex, but before they return to userspace. Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com> Tested-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:04:43 +01:00
Wander Lairson Costa	bcf404b305	rcutorture: Avoid soft lockup during cpu stall [ Upstream commit 5ff7c9f9d7e3e0f6db5b81945fa11b69d62f433a ] If we use the module stall_cpu option, we may get a soft lockup warning in case we also don't pass the stall_cpu_block option. Introduce the stall_no_softlockup option to avoid a soft lockup on cpu stall even if we don't use the stall_cpu_block option. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:04:37 +01:00
Brian Chen	d168123f13	psi: Fix PSI_MEM_FULL state when tasks are in memstall and doing reclaim [ Upstream commit cb0e52b7748737b2cf6481fdd9b920ce7e1ebbdf ] We've noticed cases where tasks in a cgroup are stalled on memory but there is little memory FULL pressure since tasks stay on the runqueue in reclaim. A simple example involves a single threaded program that keeps leaking and touching large amounts of memory. It runs in a cgroup with swap enabled, memory.high set at 10M and cpu.max ratio set at 5%. Though there is significant CPU pressure and memory SOME, there is barely any memory FULL since the task enters reclaim and stays on the runqueue. However, this memory-bound task is effectively stalled on memory and we expect memory FULL to match memory SOME in this scenario. The code is confused about memstall && running, thinking there is a stalled task and a productive task when there's only one task: a reclaimer that's counted as both. To fix this, we redefine the condition for PSI_MEM_FULL to check that all running tasks are in an active memstall instead of checking that there are no running tasks. case PSI_MEM_FULL: - return unlikely(tasks[NR_MEMSTALL] && !tasks[NR_RUNNING]); + return unlikely(tasks[NR_MEMSTALL] && + tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING]); This will capture reclaimers. It will also capture tasks that called psi_memstall_enter() and are about to sleep, but this should be negligible noise. Signed-off-by: Brian Chen <brianchen118@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20211110213312.310243-1-brianchen118@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:04:27 +01:00
Waiman Long	cf9b8de201	clocksource: Avoid accidental unstable marking of clocksources [ Upstream commit c86ff8c55b8ae68837b2fa59dc0c203907e9a15f ] Since commit `db3a34e174` ("clocksource: Retry clock read if long delays detected") and commit `2e27e793e2` ("clocksource: Reduce clocksource-skew threshold"), it is found that tsc clocksource fallback to hpet can sometimes happen on both Intel and AMD systems especially when they are running stressful benchmarking workloads. Of the 23 systems tested with a v5.14 kernel, 10 of them have switched to hpet clock source during the test run. The result of falling back to hpet is a drastic reduction of performance when running benchmarks. For example, the fio performance tests can drop up to 70% whereas the iperf3 performance can drop up to 80%. 4 hpet fallbacks happened during bootup. They were: [ 8.749399] clocksource: timekeeping watchdog on CPU13: hpet read-back delay of 263750ns, attempt 4, marking unstable [ 12.044610] clocksource: timekeeping watchdog on CPU19: hpet read-back delay of 186166ns, attempt 4, marking unstable [ 17.336941] clocksource: timekeeping watchdog on CPU28: hpet read-back delay of 182291ns, attempt 4, marking unstable [ 17.518565] clocksource: timekeeping watchdog on CPU34: hpet read-back delay of 252196ns, attempt 4, marking unstable Other fallbacks happen when the systems were running stressful benchmarks. For example: [ 2685.867873] clocksource: timekeeping watchdog on CPU117: hpet read-back delay of 57269ns, attempt 4, marking unstable [46215.471228] clocksource: timekeeping watchdog on CPU8: hpet read-back delay of 61460ns, attempt 4, marking unstable Commit `2e27e793e2` ("clocksource: Reduce clocksource-skew threshold"), changed the skew margin from 100us to 50us. I think this is too small and can easily be exceeded when running some stressful workloads on a thermally stressed system. So it is switched back to 100us. 
Even a maximum skew margin of 100us may be too small for some systems when booting up, especially if those systems are under thermal stress. To eliminate the case that the large skew is due to the system being too busy slowing down the reading of both the watchdog and the clocksource, an extra consecutive read of watchdog clock is being done to check this. The consecutive watchdog read delay is compared against WATCHDOG_MAX_SKEW/2. If the delay exceeds the limit, we assume that the system is just too busy. A warning will be printed to the console and the clock skew check is skipped for this round. Fixes: `db3a34e174` ("clocksource: Retry clock read if long delays detected") Fixes: `2e27e793e2` ("clocksource: Reduce clocksource-skew threshold") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:04:08 +01:00
Kris Van Hees	2fbd466952	bpf: Fix verifier support for validation of async callbacks [ Upstream commit a5bebc4f00dee47113eed48098c68e88b5ba70e8 ] Commit `bfc6bb74e4` ("bpf: Implement verifier support for validation of async callbacks.") added support for BPF_FUNC_timer_set_callback to the __check_func_call() function. The test in __check_func_call() is flawed because it can mis-interpret a regular BPF-to-BPF pseudo-call as a BPF_FUNC_timer_set_callback callback call. Consider the conditional in the code: if (insn->code == (BPF_JMP | BPF_CALL) && insn->imm == BPF_FUNC_timer_set_callback) { The BPF_FUNC_timer_set_callback has value 170. This means that if you have a BPF program that contains a pseudo-call with an instruction delta of 170, this conditional will be found to be true by the verifier, and it will interpret the pseudo-call as a callback. This leads to a mess with the verification of the program because it makes the wrong assumptions about the nature of this call. Solution: include an explicit check to ensure that insn->src_reg == 0. This ensures that calls cannot be mis-interpreted as an async callback call. Fixes: `bfc6bb74e4` ("bpf: Implement verifier support for validation of async callbacks.") Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220105210150.GH1559@oracle.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:03:51 +01:00
Daniel Borkmann	a65df848db	bpf: Don't promote bogus looking registers after null check. [ Upstream commit e60b0d12a95dcf16a63225cead4541567f5cb517 ] If we ever get to a point again where we convert a bogus looking <ptr>_or_null typed register containing a non-zero fixed or variable offset, then let's not reset these bounds to zero since they are not and also don't promote the register to a <ptr> type, but instead leave it as <ptr>_or_null. Converting to an unknown register could be an avenue as well, but then if we run into this case it would allow leaking a kernel pointer this way. Fixes: `f1174f77b5` ("bpf/verifier: rework value tracking") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:03:51 +01:00
Frederic Weisbecker	ef93cc02ca	rcu/exp: Mark current CPU as exp-QS in IPI loop second pass [ Upstream commit 81f6d49cce2d2fe507e3fddcc4a6db021d9c2e7b ] Expedited RCU grace periods invoke sync_rcu_exp_select_node_cpus(), which takes two passes over the leaf rcu_node structure's CPUs. The first pass gathers up the current CPU and CPUs that are in dynticks idle mode. The workqueue will report a quiescent state on their behalf later. The second pass sends IPIs to the rest of the CPUs, but excludes the current CPU, incorrectly assuming it has been included in the first pass's list of CPUs. Unfortunately the current CPU may have changed between the first and second pass, due to the fact that the various rcu_node structures' ->lock fields have been dropped, thus momentarily enabling preemption. This means that if the second pass's CPU was not on the first pass's list, it will be ignored completely. There will be no IPI sent to it, and there will be no reporting of quiescent states on its behalf. Unfortunately, the expedited grace period will nevertheless be waiting for that CPU to report a quiescent state, but with that CPU having no reason to believe that such a report is needed. The result will be an expedited grace period stall. Fix this by no longer excluding the current CPU from consideration during the second pass. Fixes: `b9ad4d6ed1` ("rcu: Avoid self-IPI in sync_rcu_exp_select_node_cpus()") Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:03:32 +01:00
Li Hua	378723bd01	sched/rt: Try to restart rt period timer when rt runtime exceeded [ Upstream commit 9b58e976b3b391c0cf02e038d53dd0478ed3013c ] When rt_runtime is modified from -1 to a valid control value, it may cause the task to be throttled all the time. Operations like the following will trigger the bug. E.g: 1. echo -1 > /proc/sys/kernel/sched_rt_runtime_us 2. Run a FIFO task named A that executes while(1) 3. echo 950000 > /proc/sys/kernel/sched_rt_runtime_us When rt_runtime is -1, the rt period timer will not be activated when task A is enqueued. And then the task will be throttled after setting rt_runtime to 950,000. The task will always be throttled because the rt period timer is not activated. Fixes: `d0b27fa778` ("sched: rt-group: synchonised bandwidth period") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Li Hua <hucool.lihua@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211203033618.11895-1-hucool.lihua@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:03:30 +01:00
Kajol Jain	33fcd00e0a	bpf: Remove config check to enable bpf support for branch records [ Upstream commit db52f57211b4e45f0ebb274e2c877b211dc18591 ] Branch data available to BPF programs can be very useful to get stack traces out of userspace application. Commit `fff7b64355` ("bpf: Add bpf_read_branch_records() helper") added BPF support to capture branch records in x86. Enable this feature also for other architectures as well by removing checks specific to x86. If an architecture doesn't support branch records, bpf_read_branch_records() still has appropriate checks and it will return an -EINVAL in that scenario. Based on UAPI helper doc in include/uapi/linux/bpf.h, unsupported architectures should return -ENOENT in such case. Hence, update the appropriate check to return -ENOENT instead. Selftest 'perf_branches' result on power9 machine which has the branch stacks support: - Before this patch: [command]# ./test_progs -t perf_branches #88/1 perf_branches/perf_branches_hw:FAIL #88/2 perf_branches/perf_branches_no_hw:OK #88 perf_branches:FAIL Summary: 0/1 PASSED, 0 SKIPPED, 1 FAILED - After this patch: [command]# ./test_progs -t perf_branches #88/1 perf_branches/perf_branches_hw:OK #88/2 perf_branches/perf_branches_no_hw:OK #88 perf_branches:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED Selftest 'perf_branches' result on power9 machine which doesn't have branch stack report: - After this patch: [command]# ./test_progs -t perf_branches #88/1 perf_branches/perf_branches_hw:SKIP #88/2 perf_branches/perf_branches_no_hw:OK #88 perf_branches:OK Summary: 1/1 PASSED, 1 SKIPPED, 0 FAILED Fixes: `fff7b64355` ("bpf: Add bpf_read_branch_records() helper") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20211206073315.77432-1-kjain@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:03:28 +01:00
Hou Tao	832d478ccd	bpf: Disallow BPF_LOG_KERNEL log level for bpf(BPF_BTF_LOAD) [ Upstream commit 866de407444398bc8140ea70de1dba5f91cc34ac ] BPF_LOG_KERNEL is only used internally, so disallow bpf_btf_load() to set log level as BPF_LOG_KERNEL. The same checking has already been done in bpf_check(), so factor out a helper to check the validity of log attributes and use it in both places. Fixes: `8580ac9404` ("bpf: Process in-kernel BTF") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20211203053001.740945-1-houtao1@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:03:27 +01:00
Alexei Starovoitov	2571173d3e	bpf: Adjust BTF log size limit. [ Upstream commit c5a2d43e998a821701029f23e25b62f9188e93ff ] Make BTF log size limit to be the same as the verifier log size limit. Otherwise tools that progressively increase log size and use the same log for BTF loading and program loading will be hitting hard to debug EINVAL. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211201181040.23337-7-alexei.starovoitov@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:03:27 +01:00
Vincent Donnefort	d3c4b3c801	sched/fair: Fix per-CPU kthread and wakee stacking for asym CPU capacity [ Upstream commit 014ba44e8184e1acf93e0cbb7089ee847802f8f0 ] select_idle_sibling() has a special case for tasks woken up by a per-CPU kthread where the selected CPU is the previous one. For asymmetric CPU capacity systems, the assumption was that the wakee couldn't have a bigger utilization during task placement than it used to have during the last activation. That was not considering uclamp.min which can completely change between two task activations and as a consequence mandates the fitness criterion asym_fits_capacity(), even for the exit path described above. Fixes: `b4c9c9f156` ("sched/fair: Prefer prev cpu in asymmetric wakeup path") Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/20211129173115.4006346-1-vincent.donnefort@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:03:27 +01:00
Vincent Donnefort	00c1051953	sched/fair: Fix detection of per-CPU kthreads waking a task [ Upstream commit 8b4e74ccb582797f6f0b0a50372ebd9fd2372a27 ] select_idle_sibling() has a special case for tasks woken up by a per-CPU kthread, where the selected CPU is the previous one. However, the current condition for this exit path is incomplete. A task can wake up from an interrupt context (e.g. hrtimer), while a per-CPU kthread is running. Such a scenario would spuriously trigger the special case described above. Also, a recent change made the idle task like a regular per-CPU kthread, hence making that situation more likely to happen (is_per_cpu_kthread(swapper) being true now). Checking for task context makes sure select_idle_sibling() will not interpret a wake up from any other context as a wake up by a per-CPU kthread. Fixes: `52262ee567` ("sched/fair: Allow a per-CPU kthread waking a task to stack on the same CPU, to fix XFS performance regression") Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lore.kernel.org/r/20211201143450.479472-1-vincent.donnefort@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2022-01-27 11:03:27 +01:00
Baoquan He	bcf64fb327	dma/pool: create dma atomic pool only if dma zone has managed pages commit a674e48c5443d12a8a43c3ac42367aa39505d506 upstream. Currently three dma atomic pools are initialized as long as the relevant kernel codes are built in. While in kdump kernel of x86_64, this is not right when trying to create atomic_pool_dma, because there's no managed pages in DMA zone. In the case, DMA zone only has low 1M memory presented and locked down by memblock allocator. So no pages are added into buddy of DMA zone. Please check commit `f1d4d47c58` ("x86/setup: Always reserve the first 1M of RAM"). Then in kdump kernel of x86_64, it always prints below failure message: DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-0.rc5.20210611git929d931f2b40.42.fc35.x86_64 #1 Hardware name: Dell Inc. PowerEdge R910/0P658H, BIOS 2.12.0 06/04/2018 Call Trace: dump_stack+0x7f/0xa1 warn_alloc.cold+0x72/0xd6 __alloc_pages_slowpath.constprop.0+0xf29/0xf50 __alloc_pages+0x24d/0x2c0 alloc_page_interleave+0x13/0xb0 atomic_pool_expand+0x118/0x210 __dma_atomic_pool_init+0x45/0x93 dma_atomic_pool_init+0xdb/0x176 do_one_initcall+0x67/0x320 kernel_init_freeable+0x290/0x2dc kernel_init+0xa/0x111 ret_from_fork+0x22/0x30 Mem-Info: ...... DMA: failed to allocate 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations Here, let's check if DMA zone has managed pages, then create atomic_pool_dma if yes. Otherwise just skip it. 
Link: https://lkml.kernel.org/r/20211223094435.248523-3-bhe@redhat.com Fixes: `6f599d8423` ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified") Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: John Donnelly <john.p.donnelly@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-27 11:03:00 +01:00
Stephen Dickey	dd78c3263a	ANDROID: remove extra !SMP inline for __migrate_task __migrate_task will not be present in non-smp builds, no need to provide an inline function for that case. Bug: 213581038 Fixes: `50f5345c87` ("ANDROID: __migrate_task header") Change-Id: Ie8b8e07e4beaad7df169ac52169bd1799e610686 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>	2022-01-26 17:11:44 +00:00
keystone-kernel-automerger	c4b9026e94	Merge remote-tracking branch into HEAD * keystone/mirror-android13-5.15: ANDROID: mm, oom: add vendor hook to prevent oom panic ANDROID: rproc: Add vendor hook for recovery ANDROID: Re-apply vendor hooks for rt_mutex information of blocked tasks ANDROID: Re-apply vendor hooks for information of blocked tasks Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: I5b50323b319843715c91d17c48abb659c31e0b2b	2022-01-26 06:16:05 +00:00
keystone-kernel-automerger	3d84dbc351	Merge remote-tracking branch into HEAD * keystone/mirror-android13-5.15: (39 commits) UPSTREAM: arm64: cpufeature: Export this_cpu_has_cap helper UPSTREAM: arm64: errata: Enable TRBE workaround for write to out-of-range address UPSTREAM: arm64: errata: Enable workaround for TRBE overwrite in FILL mode UPSTREAM: arm64: errata: Add detection for TRBE write to out-of-range UPSTREAM: arm64: errata: Add workaround for TSB flush failures UPSTREAM: arm64: errata: Add detection for TRBE overwrite in FILL mode UPSTREAM: arm64: Add Neoverse-N2, Cortex-A710 CPU part definition UPSTREAM: coresight: trbe: Work around write to out of range UPSTREAM: coresight: trbe: Make sure we have enough space UPSTREAM: coresight: trbe: Add a helper to determine the minimum buffer size UPSTREAM: coresight: trbe: Workaround TRBE errata overwrite in FILL mode UPSTREAM: coresight: trbe: Add infrastructure for Errata handling UPSTREAM: coresight: trbe: Allow driver to choose a different alignment UPSTREAM: coresight: trbe: Decouple buffer base from the hardware base UPSTREAM: coresight: trbe: Add a helper to pad a given buffer area UPSTREAM: coresight: trbe: Add a helper to calculate the trace generated UPSTREAM: coresight: etm4x: Add ETM PID for Kryo-5XX UPSTREAM: coresight: trbe: Prohibit trace before disabling TRBE UPSTREAM: coresight: trbe: End the AUX handle on truncation UPSTREAM: coresight: trbe: Do not truncate buffer on IRQ ... Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com> Change-Id: Id90dbf02a2dc2e140e42259aec63402c1a234df1	2022-01-25 06:15:15 +00:00
Sangmoon Kim	b35a3d1bad	ANDROID: Re-apply vendor hooks for rt_mutex information of blocked tasks This reverts commit `bf2290a48a` (Revert "ANDROID: vendor_hooks: set debugging data when rt_mutex is working") The original patch has been reverted to resolve merge issues. This patch adds again the vendor hooks for the original purpose. Bug: 216016261 Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com> Change-Id: I00162d88e2a446e9ece4804def098fcdc63fceb9 (cherry picked from commit d497887b00ac3e5e380123cac4a303d009b570ee)	2022-01-25 12:05:42 +09:00
Sangmoon Kim	3aabda4d52	ANDROID: Re-apply vendor hooks for information of blocked tasks This reverts commit `31c9ccb138` (Revert "ANDROID: vendor_hooks: add waiting information for blocked tasks") And also revert portions of 396a501b1743 (Revert "ANDROID: rwsem: Add vendor hook to the rw-semaphore") The original patch has been reverted to resolve merge issues. This patch adds again the vendor hooks for the original purpose. Bug: 216016261 Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com> Change-Id: I04ed7b055eee40f7975bd5d74fb73dd080cd76bf (cherry picked from commit c23da05eac0d35441695caff0b7d0220f92ca8a0)	2022-01-25 12:04:38 +09:00
Stephen Dickey	1050e6e021	ANDROID: sched: core: hook for get_nohz_timer_target Allow module to control behavior of get_nohz_timer_target. Bug: 205164003 Change-Id: I38cb201ebf06db7bbce0d6cb68dbbe3729355be8 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>	2022-01-24 19:25:26 +00:00
Stephen Dickey	a243208877	ANDROID: kernel: sched: tracehook for is_cpu_allowed To support the replacement of pause, is_cpu_allowed is the best place to hook into the code to restrict CPUs for a module based implementation. This restricts select_fallback_rq, select_task_rq, and __migrate_task, to ensure the cpu is allowed. Include a hook in is_cpu_allowed to allow the module to control which cpu is allowed during a migration event. Bug: 205164003 Change-Id: I665e4d39318079bdb99bd248969ecb9eb528f9df Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>	2022-01-24 19:25:13 +00:00
Stephen Dickey	50f5345c87	ANDROID: __migrate_task header __migrate_task is used by modules to move tasks between cpus. This function is needed by modules and is currently exported, allowing it to be used. As part of this, there was a change so this is no longer static. This causes a warning, due to a missed extern available in the scheduler header file. Correct the issue to cleanup the warning, and properly reference __migrate_task through the appropriate header file. Bug: 205164003 Change-Id: Ifb194108cec34467315f43858ebeae428b2e34f0 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>	2022-01-24 18:28:23 +00:00
deyaoren@google.com	f080ffe2d9	Merge keystone/mirror-android13-5.15 into keystone/android13-5.15-keystone-qcom-dev * keystone/mirror-android13-5.15: (31 commits) ANDROID: f2fs: fix fscrypt direct I/O support ANDROID: GKI: update virtual_device symbol list Linux 5.15.16 mtd: fixup CFI on ixp4xx ALSA: hda/realtek: Re-order quirk entries for Lenovo ALSA: hda/realtek: Add quirk for Legion Y9000X 2020 ALSA: hda/tegra: Fix Tegra194 HDA reset failure ALSA: hda: ALC287: Add Lenovo IdeaPad Slim 9i 14ITL5 speaker quirk ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master after reboot from Windows ALSA: hda/realtek: Use ALC285_FIXUP_HP_GPIO_LED on another HP laptop ALSA: hda/realtek: Add speaker fixup for some Yoga 15ITL5 devices KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all perf annotate: Avoid TUI crash when navigating in the annotation of recursive functions firmware: qemu_fw_cfg: fix kobject leak in probe error path firmware: qemu_fw_cfg: fix NULL-pointer deref on duplicate entries firmware: qemu_fw_cfg: fix sysfs information leak rtlwifi: rtl8192cu: Fix WARNING when calling local_irq_restore() with interrupts enabled media: uvcvideo: fix division by zero at stream start video: vga16fb: Only probe for EGA and VGA 16 color graphic cards 9p: only copy valid iattrs in 9P2000.L setattr implementation ... Signed-off-by: deyaoren@google.com <deyaoren@google.com> Change-Id: I23bd2feb36ae8fa184a8826cca19e38aa7733db3	2022-01-21 21:21:47 +00:00
Greg Kroah-Hartman	16ea584702	Merge 5.15.16 into android13-5.15 Changes in 5.15.16 devtmpfs regression fix: reconfigure on each mount drm/amd/display: explicitly set is_dsc_supported to false before use orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc() remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided vfs: fs_context: fix up param length parsing in legacy_parse_param perf: Protect perf_guest_cbs with RCU KVM: x86: Register perf callbacks after calling vendor's hardware_setup() KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest KVM: x86: don't print when fail to read/write pv eoi memory KVM: s390: Clarify SIGP orders versus STOP/RESTART remoteproc: qcom: pas: Add missing power-domain "mxc" for CDSP 9p: only copy valid iattrs in 9P2000.L setattr implementation video: vga16fb: Only probe for EGA and VGA 16 color graphic cards media: uvcvideo: fix division by zero at stream start rtlwifi: rtl8192cu: Fix WARNING when calling local_irq_restore() with interrupts enabled firmware: qemu_fw_cfg: fix sysfs information leak firmware: qemu_fw_cfg: fix NULL-pointer deref on duplicate entries firmware: qemu_fw_cfg: fix kobject leak in probe error path perf annotate: Avoid TUI crash when navigating in the annotation of recursive functions KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all ALSA: hda/realtek: Add speaker fixup for some Yoga 15ITL5 devices ALSA: hda/realtek: Use ALC285_FIXUP_HP_GPIO_LED on another HP laptop ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Master after reboot from Windows ALSA: hda: ALC287: Add Lenovo IdeaPad Slim 9i 14ITL5 speaker quirk ALSA: hda/tegra: Fix Tegra194 HDA reset failure ALSA: hda/realtek: Add quirk for Legion Y9000X 2020 ALSA: hda/realtek: Re-order quirk entries for Lenovo mtd: fixup CFI on ixp4xx Linux 5.15.16 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I7b58cca52113c774bd78b2d231378bde8258f757	2022-01-21 08:36:28 +01:00
deyaoren@google.com	8fbbc658ae	Merge keystone/mirror-android13-5.15 into keystone/android13-5.15-keystone-qcom-dev * keystone/mirror-android13-5.15: (87 commits) ANDROID: GKI: enable test_stackinit kernel module ANDROID: GKI: defconfig: enable BTF debug info f2fs: do not allow partial truncation on pinned file ANDROID: Change anon vma name limit from 80 to 256 ANDROID: GKI: enable CONFIG_ANON_VMA_NAME to support anonymous vma names UPSTREAM: mm: move anon_vma declarations to linux/mm_inline.h UPSTREAM: mm: add anonymous vma name refcounting UPSTREAM: mm: add a field to store names for private anonymous memory UPSTREAM: mm: rearrange madvise code to allow for reuse Revert "ANDROID: mm: add a field to store names for private anonymous memory" Revert "ANDROID: mm: fix up new call to vma_merge()" Revert "ANDROID: fix up `60500a4228` ("ANDROID: mm: add a field to store names for private anonymous memory")" FROMGIT: tools/resolve_btfids: Build with host flags ANDROID: rwsem: Export rwsem_waiter struct for loadable modules ANDROID: GKI: Enable TRACE_MMIO_ACCESS config for gki_defconfig FROMLIST: asm-generic/io: Add logging support for MMIO accessors FROMLIST: tracing: Add register read/write tracing support Linux 5.15.15 staging: greybus: fix stack size warning with UBSAN drm/i915: Avoid bitwise vs logical OR warning in snb_wm_latency_quirk() ... Signed-off-by: deyaoren@google.com <deyaoren@google.com> Change-Id: I08e12e5c240391baa0bb2bf2e070b935563e7127	2022-01-20 17:48:02 +00:00
Sean Christopherson	18c16cef81	perf: Protect perf_guest_cbs with RCU commit ff083a2d972f56bebfd82409ca62e5dfce950961 upstream. Protect perf_guest_cbs with RCU to fix multiple possible errors. Luckily, all paths that read perf_guest_cbs already require RCU protection, e.g. to protect the callback chains, so only the direct perf_guest_cbs touchpoints need to be modified. Bug #1 is a simple lack of WRITE_ONCE/READ_ONCE behavior to ensure perf_guest_cbs isn't reloaded between a !NULL check and a dereference. Fixed via the READ_ONCE() in rcu_dereference(). Bug #2 is that on weakly-ordered architectures, updates to the callbacks themselves are not guaranteed to be visible before the pointer is made visible to readers. Fixed by the smp_store_release() in rcu_assign_pointer() when the new pointer is non-NULL. Bug #3 is that, because the callbacks are global, it's possible for readers to run in parallel with an unregister, and thus a module implementing the callbacks can be unloaded while readers are in flight, resulting in a use-after-free. Fixed by a synchronize_rcu() call when unregistering callbacks. Bug #1 escaped notice because it's extremely unlikely a compiler will reload perf_guest_cbs in this sequence. perf_guest_cbs does get reloaded for future derefs, e.g. for ->is_user_mode(), but the ->is_in_guest() guard all but guarantees the consumer will win the race, e.g. to nullify perf_guest_cbs, KVM has to completely exit the guest and tear down all VMs before KVM starts its module unload / unregister sequence. This also makes it all but impossible to encounter bug #3. Bug #2 has not been a problem because all architectures that register callbacks are strongly ordered and/or have a static set of callbacks. But with help, unloading kvm_intel can trigger bug #1 e.g. 
wrapping perf_guest_cbs with READ_ONCE in perf_misc_flags() while spamming kvm_intel module load/unload leads to: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:perf_misc_flags+0x1c/0x70 Call Trace: perf_prepare_sample+0x53/0x6b0 perf_event_output_forward+0x67/0x160 __perf_event_overflow+0x52/0xf0 handle_pmi_common+0x207/0x300 intel_pmu_handle_irq+0xcf/0x410 perf_event_nmi_handler+0x28/0x50 nmi_handle+0xc7/0x260 default_do_nmi+0x6b/0x170 exc_nmi+0x103/0x130 asm_exc_nmi+0x76/0xbf Fixes: `39447b386c` ("perf: Enhance perf to allow for guest statistic collection from host") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211111020738.2512932-2-seanjc@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-20 09:13:14 +01:00
Greg Kroah-Hartman	f6b5584fbd	Merge 5.15.15 into android13-5.15 Changes in 5.15.15 s390/kexec: handle R_390_PLT32DBL rela in arch_kexec_apply_relocations_add() workqueue: Fix unbind_workers() VS wq_worker_running() race staging: r8188eu: switch the led off during deinit bpf: Fix out of bounds access from invalid *_or_null type verification Bluetooth: btusb: Add protocol for MediaTek bluetooth devices(MT7922) Bluetooth: btusb: Add the new support ID for Realtek RTL8852A Bluetooth: btusb: Add support for IMC Networks Mediatek Chip(MT7921) Bbluetooth: btusb: Add another Bluetooth part for Realtek 8852AE Bluetooth: btusb: fix memory leak in btusb_mtk_submit_wmt_recv_urb() Bluetooth: btusb: enable Mediatek to support AOSP extension Bluetooth: btusb: Add one more Bluetooth part for the Realtek RTL8852AE Bluetooth: btusb: Add the new support IDs for WCN6855 fget: clarify and improve __fget_files() implementation Bluetooth: btusb: Add one more Bluetooth part for WCN6855 Bluetooth: btusb: Add two more Bluetooth parts for WCN6855 Bluetooth: btusb: Add support for Foxconn MT7922A Bluetooth: btintel: Fix broken LED quirk for legacy ROM devices Bluetooth: btusb: Add support for Foxconn QCA 0xe0d0 Bluetooth: bfusb: fix division by zero in send path ARM: dts: exynos: Fix BCM4330 Bluetooth reset polarity in I9100 USB: core: Fix bug in resuming hub's handling of wakeup requests USB: Fix "slab-out-of-bounds Write" bug in usb_hcd_poll_rh_status ath11k: Fix buffer overflow when scanning with extraie mmc: sdhci-pci: Add PCI ID for Intel ADL Bluetooth: add quirk disabling LE Read Transmit Power Bluetooth: btbcm: disable read tx power for some Macs with the T2 Security chip Bluetooth: btbcm: disable read tx power for MacBook Air 8,1 and 8,2 veth: Do not record rx queue hint in veth_xmit mfd: intel-lpss: Fix too early PM enablement in the ACPI ->probe() can: gs_usb: fix use of uninitialized variable, detach device on reception of invalid USB data can: isotp: convert struct 
tpcon::{idx,len} to unsigned int can: gs_usb: gs_can_start_xmit(): zero-initialize hf->{flags,reserved} random: fix data race on crng_node_pool random: fix data race on crng init time random: fix crash on multiple early calls to add_bootloader_randomness() platform/x86/intel: hid: add quirk to support Surface Go 3 media: Revert "media: uvcvideo: Set unique vdev name based in type" staging: wlan-ng: Avoid bitwise vs logical OR warning in hfa384x_usb_throttlefn() drm/i915: Avoid bitwise vs logical OR warning in snb_wm_latency_quirk() staging: greybus: fix stack size warning with UBSAN Linux 5.15.15 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ifa6951cf000da0688308451bf462c075699ba836	2022-01-19 07:29:34 +01:00
Suren Baghdasaryan	2c37ba0309	ANDROID: Change anon vma name limit from 80 to 256 Android uses vma names of up to 256 characters. Change the max limit for anonymous vma names to support Android legacy use cases. Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ia0126ab3919281ce4d5c597a43a47de80eadf71a	2022-01-18 16:03:55 -08:00
Arnd Bergmann	049413278d	UPSTREAM: mm: move anon_vma declarations to linux/mm_inline.h The patch to add anonymous vma names causes a build failure in some configurations: include/linux/mm_types.h: In function 'is_same_vma_anon_name': include/linux/mm_types.h:924:37: error: implicit declaration of function 'strcmp' [-Werror=implicit-function-declaration] 924 \| return name && vma_name && !strcmp(name, vma_name); \| ^~~~~~ include/linux/mm_types.h:22:1: note: 'strcmp' is defined in header '<string.h>'; did you forget to '#include <string.h>'? This should not really be part of linux/mm_types.h in the first place, as that header is meant to only contain structure definitions and needs a minimum set of indirect includes itself. While the header clearly includes more than it should at this point, let's not make it worse by including string.h as well, which would pull in the expensive (compile-speed wise) fortify-string logic. Move the new functions into a separate header that only needs to be included in a couple of locations. 
Link: https://lkml.kernel.org/r/20211207125710.2503446-1-arnd@kernel.org Fixes: "mm: add a field to store names for private anonymous memory" Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Colin Cross <ccross@google.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 17fca131cee21724ee953a17c185c14e9533af5b) Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I54719d7ea27d3cf53ef7245b2af88d2a2bc9bafe	2022-01-18 16:01:48 -08:00
Colin Cross	301c56064d	UPSTREAM: mm: add a field to store names for private anonymous memory In many userspace applications, and especially in VM based applications like Android uses heavily, there are multiple different allocators in use. At a minimum there is libc malloc and the stack, and in many cases there are libc malloc, the stack, direct syscalls to mmap anonymous memory, and multiple VM heaps (one for small objects, one for big objects, etc.). Each of these layers usually has its own tools to inspect its usage; malloc by compiling a debug version, the VM through heap inspection tools, and for direct syscalls there is usually no way to track them. On Android we heavily use a set of tools that use an extended version of the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped in userspace and slice their usage by process, shared (COW) vs. unique mappings, backing, etc. This can account for real physical memory usage even in cases like fork without exec (which Android uses heavily to share as many private COW pages as possible between processes), Kernel SamePage Merging, and clean zero pages. It produces a measurement of the pages that only exist in that process (USS, for unique), and a measurement of the physical memory usage of that process with the cost of shared pages being evenly split between processes that share them (PSS). If all anonymous memory is indistinguishable then figuring out the real physical memory usage (PSS) of each heap requires either a pagemap walking tool that can understand the heap debugging of every layer, or for every layer's heap debugging tools to implement the pagemap walking logic, in which case it is hard to get a consistent view of memory across the whole system. Tracking the information in userspace leads to all sorts of problems. 
It either needs to be stored inside the process, which means every process has to have an API to export its current heap information upon request, or it has to be stored externally in a filesystem that somebody needs to clean up on crashes. It needs to be readable while the process is still running, so it has to have some sort of synchronization with every layer of userspace. Efficiently tracking the ranges requires reimplementing something like the kernel vma trees, and linking to it from every layer of userspace. It requires more memory, more syscalls, more runtime cost, and more complexity to separately track regions that the kernel is already tracking. This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a userspace-provided name for anonymous vmas. The names of named anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>]. Userspace can set the name for a region of memory by calling prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name) Setting the name to NULL clears it. The name length limit is 80 bytes including NUL-terminator and is checked to contain only printable ascii characters (including space), except '[',']','\','$' and '`'. Ascii strings are being used to have a descriptive identifiers for vmas, which can be understood by the users reading /proc/pid/maps or /proc/pid/smaps. Names can be standardized for a given system and they can include some variable parts such as the name of the allocator or a library, tid of the thread using it, etc. The name is stored in a pointer in the shared union in vm_area_struct that points to a null terminated string. Anonymous vmas with the same name (equivalent strings) and are otherwise mergeable will be merged. The name pointers are not shared between vmas even if they contain the same name. The name pointer is stored in a union with fields that are only used on file-backed mappings, so it does not increase memory usage. 
CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this feature. It keeps the feature disabled by default to prevent any additional memory overhead and to avoid confusing procfs parsers on systems which are not ready to support named anonymous vmas. The patch is based on the original patch developed by Colin Cross, more specifically on its latest version [1] posted upstream by Sumit Semwal. It used a userspace pointer to store vma names. In that design, name pointers could be shared between vmas. However during the last upstreaming attempt, Kees Cook raised concerns [2] about this approach and suggested to copy the name into kernel memory space, perform validity checks [3] and store as a string referenced from vm_area_struct. One big concern is about fork() performance which would need to strdup anonymous vma names. Dave Hansen suggested experimenting with worst-case scenario of forking a process with 64k vmas having longest possible names [4]. I ran this experiment on an ARM64 Android device and recorded a worst-case regression of almost 40% when forking such a process. This regression is addressed in the followup patch which replaces the pointer to a name with a refcounted structure that allows sharing the name pointer between vmas of the same name. Instead of duplicating the string during fork() or when splitting a vma it increments the refcount. [1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/ [2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/ [3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/ [4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/ Changes for prctl(2) manual page (in the options section): PR_SET_VMA Sets an attribute specified in arg2 for virtual memory areas starting from the address specified in arg3 and spanning the size specified in arg4. arg5 specifies the value of the attribute to be set. 
Note that assigning an attribute to a virtual memory area might prevent it from being merged with adjacent virtual memory areas due to the difference in that attribute's value. Currently, arg2 must be one of: PR_SET_VMA_ANON_NAME Set a name for anonymous virtual memory areas. arg5 should be a pointer to a null-terminated string containing the name. The name length including null byte cannot exceed 80 bytes. If arg5 is NULL, the name of the appropriate anonymous virtual memory areas will be reset. The name can contain only printable ascii characters (including space), except '[',']','\','$' and '`'. This feature is available only if the kernel is built with the CONFIG_ANON_VMA_NAME option enabled. [surenb@google.com: docs: proc.rst: /proc/PID/maps: fix malformed table] Link: https://lkml.kernel.org/r/20211123185928.2513763-1-surenb@google.com [surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy, added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the work here was done by Colin Cross, therefore, with his permission, keeping him as the author] Link: https://lkml.kernel.org/r/20211019215511.3771969-2-surenb@google.com Signed-off-by: Colin Cross <ccross@google.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Glauber <jan.glauber@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rob Landley <rob@landley.net> Cc: "Serge E. 
Hallyn" <serge.hallyn@ubuntu.com> Cc: Shaohua Li <shli@fusionio.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 9a10064f5625d5572c3626c1516e0bebc6c9fe9b) Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I53d56d551a7d62f75341304751814294b447c04e	2022-01-18 15:30:27 -08:00
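The naming rules quoted in the commit message above (printable ASCII including space, excluding '[', ']', '\', '$' and '`', with a length limit that counts the NUL terminator) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the function and constant names are hypothetical, the check is a Python approximation of the rules as the message describes them rather than the kernel's own code, and the 256-byte limit from the later ANDROID commit in this log is passed in as a parameter.

```python
# Hypothetical userspace model of the anonymous vma name rules described in
# the commit message. None of these names exist in the kernel.

UPSTREAM_NAME_MAX_LEN = 80      # limit stated upstream, NUL terminator included
FORBIDDEN_CHARS = set("[]\\$`")  # characters the message lists as excluded


def is_valid_anon_vma_name(name: str, max_len: int = UPSTREAM_NAME_MAX_LEN) -> bool:
    """Return True if `name` satisfies the rules quoted in the commit message."""
    try:
        raw = name.encode("ascii")
    except UnicodeEncodeError:
        return False  # non-ASCII characters are never acceptable
    # The limit includes the terminating NUL byte, hence the +1.
    if len(raw) + 1 > max_len:
        return False
    # Printable ASCII is 0x20 (space) through 0x7E ('~').
    return all(0x20 <= b <= 0x7E and chr(b) not in FORBIDDEN_CHARS for b in raw)


def maps_label(name: str) -> str:
    """Format a name the way the message says /proc/pid/maps shows it."""
    return f"[anon:{name}]"
```

With the upstream limit, a 79-character name is the longest accepted; passing `max_len=256` models the Android change that raises the limit.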
Suren Baghdasaryan	f355f9635d	Revert "ANDROID: mm: add a field to store names for private anonymous memory" This reverts commit `60500a4228`. Replacing out-of-tree implementation with the upstream one. Bug: 120441514 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ic34c8e16d51ccf9f00cb59d2de341e911bcb2828	2022-01-18 14:54:47 -08:00
Huang Yiwei	82b3ce0bcc	ANDROID: rwsem: Export rwsem_waiter struct for loadable modules The rwsem_waiter struct is needed in the vendor hook alter_rwsem_list_add. The hook has a parameter sem, which is a struct rw_semaphore (already exported in rwsem.h); inside that structure there is a wait_list that links "struct rwsem_waiter" items. The task information in each item of the wait_list needs to be referenced in vendor loadable modules. Bug: 174902706 Change-Id: Ic7d21ffdd795eaa203989751d26f8b1f32134d8b Signed-off-by: Huang Yiwei <hyiwei@codeaurora.org> Signed-off-by: Vamsi Krishna Lanka <quic_vamslank@quicinc.com>	2022-01-18 20:21:17 +00:00
deyaoren@google.com	1bc3e76e7d	Merge keystone/mirror-android13-5.15 into keystone/android13-5.15-keystone-qcom-dev * keystone/mirror-android13-5.15: ANDROID: gic: Add vendor hook to GIC BACKPORT: scsi: ufs: Add quirk to enable host controller without PH configuration BACKPORT: scsi: ufs: Add quirk to handle broken UIC command ANDROID: GKI: Disable security lockdown for unsigned modules ANDROID: GKI: Enable system_dlkm build for gki ANDROID: GKI: Enable config for module signing ANDROID: GKI: Do not force select MODULE_SIG_ALL Signed-off-by: deyaoren@google.com <deyaoren@google.com> Change-Id: I961038170137945b4b98386364762ee52ed2e692	2022-01-18 18:28:09 +00:00
Vamsi Krishna Lanka	c7b6c40553	FROMLIST: tracing: Add register read/write tracing support Generic MMIO read/write i.e., __raw_{read,write}{b,l,w,q} accessors are typically used to read/write from/to memory mapped registers and can cause hangs or some undefined behaviour in the following cases, * If the access to the register space is unclocked, for example: if there is an access to multimedia(MM) block registers without MM clocks. * If the register space is protected and not set to be accessible from non-secure world, for example: only EL3 (EL: Exception level) access is allowed and any EL2/EL1 access is forbidden. * If xPU(memory/register protection units) is controlling access to certain memory/register space for specific clients. and more... Such cases usually result in instant reboot/SErrors/NOC or interconnect hangs and tracing these register accesses can be very helpful to debug such issues during initial development stages and also in later stages. So use ftrace trace events to log such MMIO register accesses which provides rich feature set such as early enablement of trace events, filtering capability, dumping ftrace logs on console and many more. 
Sample output: rwmmio_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700 rwmmio_post_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700 rwmmio_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 addr=0xfffffbfffdbff610 rwmmio_post_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 val=0x0 addr=0xfffffbfffdbff610 Bug: 169045115 Link: https://lore.kernel.org/lkml/76983c26d889df7252a17017a48754163fb6b0d5.1638858747.git.quic_saipraka@quicinc.com/ Change-Id: Ia21f54f8ce8f11a5613c7218dc7c9f7248766273 Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Co-developed-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com> Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com> Signed-off-by: Vamsi Krishna Lanka <quic_vamslank@quicinc.com>	2022-01-18 16:54:12 +00:00
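The sample trace output in the commit message above has a regular shape: an event name, a caller symbol with offset, and `key=value` fields. A minimal sketch of how such lines might be parsed when post-processing a trace is shown here. The function name `parse_rwmmio` and the record layout are hypothetical, and the field format is assumed from the four sample lines only; real ftrace output carries additional prefixes (task, CPU, timestamp) that this sketch does not handle.

```python
import re

# Matches the event payload as it appears in the sample output, e.g.
#   rwmmio_write: <caller+off/len> width=32 val=0x... addr=0x...
TRACE_RE = re.compile(
    r"^(?P<event>rwmmio_\w+): (?P<caller>\S+)(?P<fields>(?: \w+=\S+)+)$"
)


def parse_rwmmio(line: str):
    """Parse one rwmmio_* payload line into a dict, or return None."""
    m = TRACE_RE.match(line.strip())
    if m is None:
        return None
    fields = dict(kv.split("=", 1) for kv in m.group("fields").split())
    if "width" not in fields or "addr" not in fields:
        return None
    record = {
        "event": m.group("event"),
        "caller": m.group("caller"),
        "width": int(fields["width"]),
        "addr": int(fields["addr"], 16),
    }
    # rwmmio_read is logged before the value is known, so "val" is optional.
    if "val" in fields:
        record["val"] = int(fields["val"], 16)
    return record
```

Grouping the parsed records by `addr` is one way to find the last register touched before a hang, which is the debugging scenario the commit message describes.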
Daniel Borkmann	e8efe83699	bpf: Fix out of bounds access from invalid _or_null type verification [ no upstream commit given implicitly fixed through the larger refactoring in c25b2ae136039ffa820c26138ed4a5e5f3ab3841 ] While auditing some other code, I noticed missing checks inside the pointer arithmetic simulation, more specifically, adjust_ptr_min_max_vals(). Several _OR_NULL types are not rejected whereas they are _required_ to be rejected given the expectation is that they get promoted into a 'real' pointer type for the success case, that is, after an explicit != NULL check. One case which stands out and is accessible from unprivileged (iff enabled given disabled by default) is BPF ring buffer. From crafting a PoC, the NULL check can be bypassed through an offset, and its id marking will then lead to promotion of mem_or_null to a mem type. bpf_ringbuf_reserve() helper can trigger this case through passing of reserved flags, for example. func#0 @0 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (7a) (u64 )(r10 -8) = 0 1: R1=ctx(id=0,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm 1: (18) r1 = 0x0 3: R1_w=map_ptr(id=0,off=0,ks=0,vs=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm 3: (b7) r2 = 8 4: R1_w=map_ptr(id=0,off=0,ks=0,vs=0,imm=0) R2_w=invP8 R10=fp0 fp-8_w=mmmmmmmm 4: (b7) r3 = 0 5: R1_w=map_ptr(id=0,off=0,ks=0,vs=0,imm=0) R2_w=invP8 R3_w=invP0 R10=fp0 fp-8_w=mmmmmmmm 5: (85) call bpf_ringbuf_reserve#131 6: R0_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2 6: (bf) r6 = r0 7: R0_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R6_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2 7: (07) r0 += 1 8: R0_w=mem_or_null(id=2,ref_obj_id=2,off=1,imm=0) R6_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2 8: (15) if r0 == 0x0 goto pc+4 R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2 9: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) 
R10=fp0 fp-8_w=mmmmmmmm refs=2 9: (62) (u32 )(r6 +0) = 0 R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2 10: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2 10: (bf) r1 = r6 11: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R1_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2 11: (b7) r2 = 0 12: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R1_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R2_w=invP0 R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2 12: (85) call bpf_ringbuf_submit#132 13: R6=invP(id=0) R10=fp0 fp-8=mmmmmmmm 13: (b7) r0 = 0 14: R0_w=invP0 R6=invP(id=0) R10=fp0 fp-8=mmmmmmmm 14: (95) exit from 8 to 13: safe processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0 OK All three commits, that is `b121b341e5` ("bpf: Add PTR_TO_BTF_ID_OR_NULL support"), `457f44363a` ("bpf: Implement BPF ring buffer and verifier support for it"), and the `afbf21dce6` ("bpf: Support readonly/readwrite buffers in verifier") suffer the same cause and their _OR_NULL type pendants must be rejected in adjust_ptr_min_max_vals(). Make the test more robust by reusing reg_type_may_be_null() helper such that we catch all _OR_NULL types we have today and in future. Note that pointer arithmetic on PTR_TO_BTF_ID, PTR_TO_RDONLY_BUF, and PTR_TO_RDWR_BUF is generally allowed. Fixes: `b121b341e5` ("bpf: Add PTR_TO_BTF_ID_OR_NULL support") Fixes: `457f44363a` ("bpf: Implement BPF ring buffer and verifier support for it") Fixes: `afbf21dce6` ("bpf: Support readonly/readwrite buffers in verifier") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-16 09:12:41 +01:00
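The fix described in the commit message above replaces a per-type rejection with a generic "may be NULL" test, so that newly added _OR_NULL types are covered automatically. The toy model below illustrates that design choice only. It is not the verifier: register types are plain strings, the suffix test stands in for the kernel's reg_type_may_be_null() helper, and the contents of `LEGACY_REJECTED` are an assumption made for illustration.

```python
# Toy model of the design choice, not the BPF verifier itself.
# Type names follow the naming pattern used in the commit message.

# Assumption for illustration: an explicit list that predates newer types.
LEGACY_REJECTED = {"PTR_TO_MAP_VALUE_OR_NULL"}


def legacy_arithmetic_allowed(reg_type: str) -> bool:
    """Old-style check: only types named in the explicit list are rejected."""
    return reg_type not in LEGACY_REJECTED


def reg_type_may_be_null(reg_type: str) -> bool:
    """Stand-in for the kernel helper of the same name (string-based here)."""
    return reg_type.endswith("_OR_NULL")


def arithmetic_allowed(reg_type: str) -> bool:
    """New-style check: every possibly-NULL pointer type is rejected."""
    return not reg_type_may_be_null(reg_type)
```

Under the explicit list, a ring-buffer style `PTR_TO_MEM_OR_NULL` slips through, while the generic check rejects it and still allows arithmetic on the non-NULL types the message names (PTR_TO_BTF_ID, PTR_TO_RDONLY_BUF, PTR_TO_RDWR_BUF).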
Frederic Weisbecker	cf5b6bd2c7	workqueue: Fix unbind_workers() VS wq_worker_running() race commit 07edfece8bcb0580a1828d939e6f8d91a8603eb2 upstream. At CPU-hotplug time, unbind_worker() may preempt a worker while it is waking up. In that case the following scenario can happen: unbind_workers() wq_worker_running() -------------- ------------------- if (!(worker->flags & WORKER_NOT_RUNNING)) //PREEMPTED by unbind_workers worker->flags \|= WORKER_UNBOUND; [...] atomic_set(&pool->nr_running, 0); //resume to worker atomic_inc(&worker->pool->nr_running); After unbind_worker() resets pool->nr_running, the value is expected to remain 0 until the pool ever gets rebound in case cpu_up() is called on the target CPU in the future. But here the race leaves pool->nr_running with a value of 1, triggering the following warning when the worker goes idle: WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0 Modules linked in: CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Workqueue: 0x0 (rcu_par_gp) RIP: 0010:worker_enter_idle+0x95/0xc0 Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93 0 RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086 RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140 RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080 R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20 R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140 FS: 0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> worker_thread+0x89/0x3d0 ? process_one_work+0x400/0x400 kthread+0x162/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 </TASK> Also due to this incorrect "nr_running == 1", further queued work may end up not being served, because no worker is awakened at work insert time. This raises rcutorture writer stalls for example. Fix this by disabling preemption in the right place in wq_worker_running(). It's worth noting that if the worker migrates and runs concurrently with unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update due to set_cpus_allowed_ptr() acquiring/releasing rq->lock. Fixes: `6d25be5782` ("sched/core, workqueues: Distangle worker accounting from rq lock") Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-16 09:12:41 +01:00
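The interleaving described in the commit message above can be replayed deterministically with a small model. This is a sketch, not kernel code: the `Pool` class, the step names, and the flag value are hypothetical, WORKER_NOT_RUNNING is reduced to the single WORKER_UNBOUND bit, and "preemption disabled" is modelled simply by never scheduling `unbind` between `check` and `inc`.

```python
from dataclasses import dataclass

WORKER_UNBOUND = 1 << 0              # illustrative bit value only
WORKER_NOT_RUNNING = WORKER_UNBOUND  # simplified: a single flag in the mask


@dataclass
class Pool:
    nr_running: int = 0
    worker_flags: int = 0


def run(schedule):
    """Replay a list of steps and return the final nr_running.

    "check"  : the flag test at the start of wq_worker_running()
    "inc"    : the atomic_inc() that follows when the test passed
    "unbind" : unbind_workers() setting WORKER_UNBOUND and zeroing the counter
    """
    pool = Pool()
    may_increment = False
    for step in schedule:
        if step == "check":
            may_increment = not (pool.worker_flags & WORKER_NOT_RUNNING)
        elif step == "inc":
            if may_increment:
                pool.nr_running += 1
        elif step == "unbind":
            pool.worker_flags |= WORKER_UNBOUND
            pool.nr_running = 0
        else:
            raise ValueError(f"unknown step: {step}")
    return pool.nr_running


# Racy order from the commit message: unbind lands between check and inc.
racy = run(["check", "unbind", "inc"])
# Orders that remain possible once check+inc cannot be preempted.
before = run(["check", "inc", "unbind"])
after = run(["unbind", "check", "inc"])
```

In this model the racy schedule leaves `nr_running` at 1 for an unbound pool, while both schedules that keep `check` and `inc` adjacent leave it at 0, which matches the expectation stated in the message.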
Ramji Jiyani	1694ef383e	ANDROID: GKI: Disable security lockdown for unsigned modules By default, with SELinux enabled, the behavior for unsigned module loading is the same as sig_enforce=1. This causes loading of unsigned modules to fail. All modules in Android GKI are unsigned except GKI modules. Do not prevent module loading in case of CONFIG_SIG_MODULE_PROTECT, which was introduced to change the behavior of sig_enforce to allow unsigned modules but not access to protected symbols. Bug: 200082547 Bug: 214445388 Fixes: 9ab6a242258a ("ANDROID: GKI: Add module load time protected symbol lookup") Test: TreeHugger Signed-off-by: Ramji Jiyani <ramjiyani@google.com> Change-Id: Iab3113d706cbd7db7a5684897bcafd5671a6d424	2022-01-14 20:01:55 +00:00
deyaoren@google.com	ff1e56b557	Merge keystone/mirror-android13-5.15 into keystone/android13-5.15-keystone-qcom-dev * keystone/mirror-android13-5.15: (73 commits) Linux 5.15.14 drm/amd/pm: keep the BACO feature enabled for suspend Revert "drm/amdgpu: stop scheduler when calling hw_fini (v2)" Input: zinitix - make sure the IRQ is allocated before it gets enabled ARM: dts: gpio-ranges property is now required userfaultfd/selftests: fix hugetlb area allocations ipv6: raw: check passed optlen before reading drm/amd/display: Added power down for DCN10 drm/amd/display: fix B0 TMDS deepcolor no dislay issue mISDN: change function names to avoid conflicts drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform drm/amdgpu: always reset the asic in suspend (v2) drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume atlantic: Fix buff_ring OOB in aq_ring_rx_clean net: udp: fix alignment problem in udp4_seq_show() ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown() usb: mtu3: fix interval value for intr and isoc drm/amd/pm: Fix xgmi link control on aldebaran drm/amdgpu: fix dropped backing store handling in amdgpu_dma_buf_move_notify ... Signed-off-by: deyaoren@google.com <deyaoren@google.com> Change-Id: Id88dd8f40a2dd4c3498ad0ea8305504ee86fa941	2022-01-13 17:27:43 +00:00
deyaoren@google.com	aec5cf2b81	Merge keystone/mirror-android13-5.15 into keystone/android13-5.15-keystone-qcom-dev * keystone/mirror-android13-5.15: ANDROID: kleaf: drop toolchain_version = CLANG_VERSION ANDROID: arm64/mm: Add command line option to make ZONE_DMA32 empty Signed-off-by: deyaoren@google.com <deyaoren@google.com> Change-Id: I36a90b7d14aae25e1eafe85af40302ca2330de36	2022-01-12 17:45:38 +00:00
Greg Kroah-Hartman	173de0c81d	Merge 5.15.14 into android13-5.15 Changes in 5.15.14 fscache_cookie_enabled: check cookie is valid before accessing it selftests: x86: fix [-Wstringop-overread] warn in test_process_vm_readv() tracing: Fix check for trace_percpu_buffer validity in get_trace_buf() tracing: Tag trace_percpu_buffer as a percpu pointer Revert "RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow" ieee802154: atusb: fix uninit value in atusb_set_extended_addr i40e: Fix to not show opcode msg on unsuccessful VF MAC change iavf: Fix limit of total number of queues to active queues of VF RDMA/core: Don't infoleak GRH fields Revert "net: usb: r8152: Add MAC passthrough support for more Lenovo Docks" netrom: fix copying in user data in nr_setsockopt RDMA/uverbs: Check for null return of kmalloc_array mac80211: initialize variable have_higher_than_11mbit mac80211: mesh: embedd mesh_paths and mpp_paths into ieee80211_if_mesh sfc: The RX page_ring is optional i40e: fix use-after-free in i40e_sync_filters_subtask() i40e: Fix for displaying message regarding NVM version i40e: Fix incorrect netdev's real number of RX/TX queues ftrace/samples: Add missing prototypes direct functions ipv4: Check attribute length for RTA_GATEWAY in multipath route ipv4: Check attribute length for RTA_FLOW in multipath route ipv6: Check attribute length for RTA_GATEWAY in multipath route ipv6: Check attribute length for RTA_GATEWAY when deleting multipath route lwtunnel: Validate RTA_ENCAP_TYPE attribute length selftests: net: udpgro_fwd.sh: explicitly checking the available ping feature sctp: hold endpoint before calling cb in sctp_transport_lookup_process batman-adv: mcast: don't send link-local multicast to mcast routers sch_qfq: prevent shift-out-of-bounds in qfq_init_qdisc net: ena: Fix undefined state when tx request id is out of bounds net: ena: Fix wrong rx request id by resetting device net: ena: Fix error handling when calculating max IO queues number md/raid1: fix 
missing bitmap update w/o WriteMostly devices EDAC/i10nm: Release mdev/mbase when failing to detect HBM KVM: x86: Check for rmaps allocation cgroup: Use open-time credentials for process migraton perm checks cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv cgroup: Use open-time cgroup namespace for process migration perm checks Revert "i2c: core: support bus regulator controlling in adapter" i2c: mpc: Avoid out of bounds memory access xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate power: supply: core: Break capacity loop power: reset: ltc2952: Fix use of floating point literals reset: renesas: Fix Runtime PM usage rndis_host: support Hytera digital radios gpio: gpio-aspeed-sgpio: Fix wrong hwirq base in irq handler net ticp:fix a kernel-infoleak in __tipc_sendmsg() phonet: refcount leak in pep_sock_accep fbdev: fbmem: add a helper to determine if an aperture is used by a fw fb drm/amdgpu: disable runpm if we are the primary adapter power: bq25890: Enable continuous conversion for ADC at charging ipv6: Continue processing multipath route even if gateway attribute is invalid ipv6: Do cleanup if attribute validation fails in multipath route auxdisplay: charlcd: checking for pointer reference before dereferencing drm/amdgpu: fix dropped backing store handling in amdgpu_dma_buf_move_notify drm/amd/pm: Fix xgmi link control on aldebaran usb: mtu3: fix interval value for intr and isoc scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown() ip6_vti: initialize __ip6_tnl_parm struct in vti6_siocdevprivate net: udp: fix alignment problem in udp4_seq_show() atlantic: Fix buff_ring OOB in aq_ring_rx_clean drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume drm/amdgpu: always reset the asic in suspend (v2) drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform mISDN: change function names to avoid conflicts drm/amd/display: fix B0 TMDS deepcolor no dislay issue drm/amd/display: Added power 
down for DCN10 ipv6: raw: check passed optlen before reading userfaultfd/selftests: fix hugetlb area allocations ARM: dts: gpio-ranges property is now required Input: zinitix - make sure the IRQ is allocated before it gets enabled Revert "drm/amdgpu: stop scheduler when calling hw_fini (v2)" drm/amd/pm: keep the BACO feature enabled for suspend Linux 5.15.14 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ifc22d4db0c3aa2164c4769981847e0634f2ad463	2022-01-12 09:00:42 +01:00
deyaoren@google.com	879b5f6d0a	Merge keystone/mirror-android13-5.15 into keystone/android13-5.15-keystone-qcom-dev * keystone/mirror-android13-5.15: (28 commits) Revert "ANDROID: KVM: arm64: Unmap S2MPU MMIO regions in MPT" ANDROID: KVM: arm64: Initialize pkvm_pgtable.mm_ops earlier ANDROID: KVM: arm64: Mark select_iommu_ops static ANDROID: Enable KVM_S2MPU in gki_defconfig ANDROID: KVM: arm64: Unmap S2MPU MMIO registers from host stage-2 ANDROID: KVM: arm64: Implement MMIO handler in S2MPU driver ANDROID: KVM: arm64: Unmap S2MPU MMIO regions in MPT ANDROID: KVM: arm64: Add S2MPU kselftest ANDROID: KVM: arm64: Modify S2MPU MPT in 'host_stage2_set_owner' ANDROID: KVM: arm64: Set up S2MPU Memory Protection Table ANDROID: KVM: arm64: Reprogram S2MPUs in 'host_smc_handler' ANDROID: KVM: arm64: Enable S2MPUs in __pkvm_init_stage2_iommu ANDROID: KVM: arm64: Copy S2MPU configuration to hyp ANDROID: KVM: arm64: Implement IRQ handler for S2MPU faults ANDROID: KVM: arm64: Allocate context IDs for valid VIDs ANDROID: KVM: arm64: Read and check S2MPU_VERSION ANDROID: KVM: arm64: Parse S2MPU MMIO region ANDROID: KVM: arm64: Create empty S2MPU driver ANDROID: dt-bindings: iommu: Add Google S2MPU ANDROID: KVM: arm64: Add 'host_stage2_adjust_mmio_range' to kvm_iommu_ops ... Signed-off-by: deyaoren@google.com <deyaoren@google.com> Change-Id: I4be8a2202f92d58c07febcfcaaf2e774b5397326	2022-01-11 17:10:39 +00:00
Tejun Heo	43fa0b3639	cgroup: Use open-time cgroup namespace for process migration perm checks commit e57457641613fef0d147ede8bd6a3047df588b95 upstream. cgroup process migration permission checks are performed at write time as whether a given operation is allowed or not is dependent on the content of the write - the PID. This currently uses current's cgroup namespace which is a potential security weakness as it may allow scenarios where a less privileged process tricks a more privileged one into writing into a fd that it created. This patch makes cgroup remember the cgroup namespace at the time of open and uses it for migration permission checks instad of current's. Note that this only applies to cgroup2 as cgroup1 doesn't have namespace support. This also fixes a use-after-free bug on cgroupns reported in https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com Note that backporting this fix also requires the preceding patch. Reported-by: "Eric W. Biederman" <ebiederm@xmission.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Reported-by: syzbot+50f5cf33a284ce738b62@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com Fixes: `5136f6365c` ("cgroup: implement "nsdelegate" mount option") Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-11 15:35:15 +01:00
Tejun Heo	50273128d6	cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv commit 0d2b5955b36250a9428c832664f2079cbf723bec upstream. of->priv is currently used by each interface file implementation to store private information. This patch collects the current two private data usages into struct cgroup_file_ctx which is allocated and freed by the common path. This allows generic private data which applies to multiple files, which will be used to in the following patch. Note that cgroup_procs iterator is now embedded as procs.iter in the new cgroup_file_ctx so that it doesn't need to be allocated and freed separately. v2: union dropped from cgroup_file_ctx and the procs iterator is embedded in cgroup_file_ctx as suggested by Linus. v3: Michal pointed out that cgroup1's procs pidlist uses of->priv too. Converted. Didn't change to embedded allocation as cgroup1 pidlists get stored for caching. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-11 15:35:15 +01:00
Tejun Heo	c6ebc35298	cgroup: Use open-time credentials for process migraton perm checks commit 1756d7994ad85c2479af6ae5a9750b92324685af upstream. cgroup process migration permission checks are performed at write time as whether a given operation is allowed or not is dependent on the content of the write - the PID. This currently uses current's credentials which is a potential security weakness as it may allow scenarios where a less privileged process tricks a more privileged one into writing into a fd that it created. This patch makes both cgroup2 and cgroup1 process migration interfaces to use the credentials saved at the time of open (file->f_cred) instead of current's. Reported-by: "Eric W. Biederman" <ebiederm@xmission.com> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Fixes: `187fe84067` ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy") Reviewed-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-11 15:35:15 +01:00
Naveen N. Rao	21f8a3b110	tracing: Tag trace_percpu_buffer as a percpu pointer commit f28439db470cca8b6b082239314e9fd10bd39034 upstream. Tag trace_percpu_buffer as a percpu pointer to resolve warnings reported by sparse: /linux/kernel/trace/trace.c:3218:46: warning: incorrect type in initializer (different address spaces) /linux/kernel/trace/trace.c:3218:46: expected void const [noderef] __percpu __vpp_verify /linux/kernel/trace/trace.c:3218:46: got struct trace_buffer_struct /linux/kernel/trace/trace.c:3234:9: warning: incorrect type in initializer (different address spaces) /linux/kernel/trace/trace.c:3234:9: expected void const [noderef] __percpu __vpp_verify /linux/kernel/trace/trace.c:3234:9: got int Link: https://lkml.kernel.org/r/ebabd3f23101d89cb75671b68b6f819f5edc830b.1640255304.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Fixes: `07d777fe8c` ("tracing: Add percpu buffers for trace_printk()") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-11 15:35:12 +01:00
Naveen N. Rao	be134e7c5b	tracing: Fix check for trace_percpu_buffer validity in get_trace_buf() commit 823e670f7ed616d0ce993075c8afe0217885f79d upstream. With the new osnoise tracer, we are seeing the below splat: Kernel attempted to read user page (c7d880000) - exploit attempt? (uid: 0) BUG: Unable to handle kernel data access on read at 0xc7d880000 Faulting instruction address: 0xc0000000002ffa10 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries ... NIP [c0000000002ffa10] __trace_array_vprintk.part.0+0x70/0x2f0 LR [c0000000002ff9fc] __trace_array_vprintk.part.0+0x5c/0x2f0 Call Trace: [c0000008bdd73b80] [c0000000001c49cc] put_prev_task_fair+0x3c/0x60 (unreliable) [c0000008bdd73be0] [c000000000301430] trace_array_printk_buf+0x70/0x90 [c0000008bdd73c00] [c0000000003178b0] trace_sched_switch_callback+0x250/0x290 [c0000008bdd73c90] [c000000000e70d60] __schedule+0x410/0x710 [c0000008bdd73d40] [c000000000e710c0] schedule+0x60/0x130 [c0000008bdd73d70] [c000000000030614] interrupt_exit_user_prepare_main+0x264/0x270 [c0000008bdd73de0] [c000000000030a70] syscall_exit_prepare+0x150/0x180 [c0000008bdd73e10] [c00000000000c174] system_call_vectored_common+0xf4/0x278 osnoise tracer on ppc64le is triggering osnoise_taint() for negative duration in get_int_safe_duration() called from trace_sched_switch_callback()->thread_exit(). The problem though is that the check for a valid trace_percpu_buffer is incorrect in get_trace_buf(). The check is being done after calculating the pointer for the current cpu, rather than on the main percpu pointer. Fix the check to be against trace_percpu_buffer. Link: https://lkml.kernel.org/r/a920e4272e0b0635cf20c444707cbce1b2c8973d.1640255304.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: `e2ace00117` ("tracing: Choose static tp_printk buffer by explicit nesting count") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2022-01-11 15:35:12 +01:00
Chris Goldsworthy	83a84a5782	ANDROID: arm64/mm: Add command line option to make ZONE_DMA32 empty ZONE_DMA32 is enabled by default on android13-5.15, yet it is not needed for all devices, nor is it desirable to have if not needed. For instance, if a partner in GKI 1.0 did not use ZONE_DMA32, memory can be lower for ZONE_NORMAL relative to older targets, such that memory would run out more quickly in ZONE_NORMAL leading kswapd to be invoked unnecessarily. Correspondingly, provide a means of making ZONE_DMA32 empty via the kernel command line when it is compiled in via CONFIG_ZONE_DMA32. Bug: 199917449 Change-Id: I70ec76914b92e518d61a61072f0b3cb41cb28646 Signed-off-by: Chris Goldsworthy <quic_cgoldswo@quicinc.com> Signed-off-by: Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>	2022-01-10 11:40:44 -08:00
Park Bumgyu	9e280ea43e	ANDROID: sched: export task_rq_lock Declare task_rq_lock as EXPORT_SYMBOL_GPL needed by vendor module. Bug: 178340230 Signed-off-by: Park Bumgyu <bumgyu.park@samsung.com> Change-Id: I4afc2d67bd208b00e6c43590782196cb4ee07937	2022-01-10 17:11:44 +00:00

38873 Commits