commit 3e35142ef99fe6b4fe5d834ad43ee13cca10a2dc upstream.
Since commit d1bcae833b32f1 ("ELF: Don't generate unused section
symbols") [1], binutils (v2.36+) started dropping section symbols that
it thought were unused. This isn't an issue in general, but with
kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a
separate .text.unlikely section and the section symbol ".text.unlikely"
is being dropped. Due to this, recordmcount is unable to find a non-weak
symbol in .text.unlikely to generate a relocation record against.
Address this by dropping the weak attribute from these functions.
Instead, follow the existing pattern of having architectures #define the
name of the function they want to override in their headers.
[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1
[akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h]
Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 99696a2592bca641eb88cc9a80c90e591afebd0f upstream.
In create_var_ref(), init_var_ref() is called to initialize the fields
of variable ref_field, which is allocated in the previous function call
to create_hist_field(). Function init_var_ref() allocates the
corresponding fields such as ref_field->system, but frees these fields
when the function encounters an error. The caller later calls
destroy_hist_field() to conduct error handling, which frees the fields
and the variable itself. This results in double free of the fields which
are already freed in the previous function.
Fix this by storing NULL to the corresponding fields when they are freed
in init_var_ref().
Link: https://lkml.kernel.org/r/20220425063739.3859998-1-keitasuzuki.park@sslab.ics.keio.ac.jp
Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
CC: stable@vger.kernel.org
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4a37f3dd9a83186cb88d44808ab35b78375082c9 ]
The original x86 sev_alloc() only called set_memory_decrypted() on
memory returned by alloc_pages_node(), so the page order calculation
fell out of that logic. However, the common dma-direct code has several
potential allocators, not all of which are guaranteed to round up the
underlying allocation to a power-of-two size, so carrying over that
calculation for the encryption/decryption size was a mistake. Fix it by
rounding to a *number* of pages, rather than an order.
Until recently there was an even worse interaction with DMA_DIRECT_REMAP
where we could have ended up decrypting part of the next adjacent
vmalloc area, only averted by no architecture actually supporting both
configs at once. Don't ask how I found that one out...
Fixes: c10f07aa27 ("dma/direct: Handle force decryption for DMA coherent buffers in common code")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a90cf30437489343b8386ae87b4827b6d6c3ed50 ]
We must never let unencrypted memory go back into the general page pool.
So if we fail to set it back to encrypted when freeing DMA memory, leak
the memory instead and warn the user.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5570449b6876f215d49ac4db9ccce6ff7aa1e20a ]
Remapped allocations handle the encrypted bit through the pgprot passed
to vmap, so there is no call dma_set_decrypted. Note that this case is
currently entirely theoretical as no valid kernel configuration supports
remapped allocations and memory encryption currently.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4d0564785bb03841e4b5c5b31aa4ecd1eb0d01bb ]
Factor out helpers the make dealing with memory encryption a little less
cumbersome.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 92826e967535db2eb117db227b1191aaf98e4bb3 ]
When dma_direct_alloc_pages encounters a highmem page it just gives up
currently. But what we really should do is to try memory using the
page allocator instead - without this platforms with a global highmem
CMA pool will fail all dma_alloc_pages allocations.
Fixes: efa70f2fdc ("dma-mapping: add a new dma_alloc_pages API")
Reported-by: Mark O'Neill <mao@tumblingdice.co.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d541ae55d538265861ef729a64d2d816d34ef1e2 ]
Split the code for DMA_ATTR_NO_KERNEL_MAPPING allocations into a separate
helper to make dma_direct_alloc a little more readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5341b93dea8c39d7612f7a227015d4b1d5cf30db ]
When printk() is called from safe or NMI contexts, it will directly
store the record (vprintk_store()) and then defer the console output.
However, defer_console_output() only causes console printing and does
not wake any waiters of new records.
Wake waiters from defer_console_output() so that they also are aware
of the new records from safe and NMI contexts.
Fixes: 03fc7f9c99 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-6-john.ogness@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1f5d783094cf28b4905f51cad846eb5d1db6673e ]
It is important that any new records are visible to preparing
waiters before the waker checks if the wait queue is empty.
Otherwise it is possible that:
- there are new records available
- the waker sees an empty wait queue and does not wake
- the preparing waiter sees no new records and begins to wait
This is exactly the problem that the function description of
waitqueue_active() warns about.
Use wq_has_sleeper() instead of waitqueue_active() because it
includes the necessary full memory barrier.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220421212250.565456-4-john.ogness@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ba3673d70178bf07fb75ff25c54bc478add4021 ]
The per-cpu @printk_pending variable can be updated from
sleepable contexts, such as:
get_random_bytes()
warn_unseeded_randomness()
printk_deferred()
defer_console_output()
and can be updated from interrupt contexts, such as:
handle_irq_event_percpu()
__irq_wake_thread()
wake_up_process()
try_to_wake_up()
select_task_rq()
select_fallback_rq()
printk_deferred()
defer_console_output()
and can be updated from NMI contexts, such as:
vprintk()
if (in_nmi()) defer_console_output()
Therefore the atomic variant of the updating functions must be used.
Replace __this_cpu_xchg() with this_cpu_xchg().
Replace __this_cpu_or() with this_cpu_or().
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/87iltld4ue.fsf@jogness.linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 890d550d7dbac7a31ecaa78732aa22be282bb6b8 ]
Martin find it confusing when look at the /proc/pressure/cpu output,
and found no hint about that CPU "full" line in psi Documentation.
% cat /proc/pressure/cpu
some avg10=0.92 avg60=0.91 avg300=0.73 total=933490489
full avg10=0.22 avg60=0.23 avg300=0.16 total=358783277
The PSI_CPU_FULL state is introduced by commit e7fcd76228
("psi: Add PSI_CPU_FULL state"), which mainly for cgroup level,
but also counted at the system level as a side effect.
Naturally, the FULL state doesn't exist for the CPU resource at
the system level. These "full" numbers can come from CPU idle
schedule latency. For example, t1 is the time when task wakeup
on an idle CPU, t2 is the time when CPU pick and switch to it.
The delta of (t2 - t1) will be in CPU_FULL state.
Another case all processes can be stalled is when all cgroups
have been throttled at the same time, which unlikely to happen.
Anyway, CPU_FULL metric is meaningless and confusing at the
system level. So this patch will report zeroes for CPU full
at the system level, and update psi Documentation accordingly.
Fixes: e7fcd76228 ("psi: Add PSI_CPU_FULL state")
Reported-by: Martin Steigerwald <Martin.Steigerwald@proact.de>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220408121914.82855-1-zhouchengming@bytedance.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64eaf50731ac0a8c76ce2fedd50ef6652aabc5ff ]
Since commit 2312729688 ("sched/fair: Update scale invariance of PELT")
change to use rq_clock_pelt() instead of rq_clock_task(), we should also
use rq_clock_pelt() for throttled_clock_task_time and throttled_clock_task
accounting to get correct cfs_rq_clock_pelt() of throttled cfs_rq. And
rename throttled_clock_task(_time) to be clock_pelt rather than clock_task.
Fixes: 2312729688 ("sched/fair: Update scale invariance of PELT")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20220408115309.81603-1-zhouchengming@bytedance.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 78ed93d72ded679e3caf0758357209887bda885f ]
With SIGTRAP on perf events, we have encountered termination of
processes due to user space attempting to block delivery of SIGTRAP.
Consider this case:
<set up SIGTRAP on a perf event>
...
sigset_t s;
sigemptyset(&s);
sigaddset(&s, SIGTRAP | <and others>);
sigprocmask(SIG_BLOCK, &s, ...);
...
<perf event triggers>
When the perf event triggers, while SIGTRAP is blocked, force_sig_perf()
will force the signal, but revert back to the default handler, thus
terminating the task.
This makes sense for error conditions, but not so much for explicitly
requested monitoring. However, the expectation is still that signals
generated by perf events are synchronous, which will no longer be the
case if the signal is blocked and delivered later.
To give user space the ability to clearly distinguish synchronous from
asynchronous signals, introduce siginfo_t::si_perf_flags and
TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is
required in future).
The resolution to the problem is then to (a) no longer force the signal
(avoiding the terminations), but (b) tell user space via si_perf_flags
if the signal was synchronous or not, so that such signals can be
handled differently (e.g. let user space decide to ignore or consider
the data imprecise).
The alternative of making the kernel ignore SIGTRAP on perf events if
the signal is blocked may work for some usecases, but likely causes
issues in others that then have to revert back to interception of
sigprocmask() (which we want to avoid). [ A concrete example: when using
breakpoint perf events to track data-flow, in a region of code where
signals are blocked, data-flow can no longer be tracked accurately.
When a relevant asynchronous signal is received after unblocking the
signal, the data-flow tracking logic needs to know its state is
imprecise. ]
Fixes: 97ba62b278 ("perf: Add support for SIGTRAP on perf events")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/r/20220404111204.935357-1-elver@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8106bddbab5f0ba180e6d693c7c1fc6926d57caa ]
The scftorture test module's scf_handler() function is supposed to provide
three different distributions of short delays (including "no delay") and
one distribution of long delays, if specified by the scftorture.longwait
module parameter. However, the second of the two non-zero-wait short delays
is disabled due to the first such delay's "goto out" not being enclosed in
the "then" clause with the "udelay()".
This commit therefore adjusts the code to provide the intended set of
delays.
Fixes: e9d338a0b1 ("scftorture: Add smp_call_function() torture test")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 84bc4f1dbbbb5f8aa68706a96711dccb28b518e5 ]
We observed the error "cacheline tracking ENOMEM, dma-debug disabled"
during a light system load (copying some files). The reason for this error
is that the dma_active_cacheline radix tree uses GFP_NOWAIT allocation -
so it can't access the emergency memory reserves and it fails as soon as
anybody reaches the watermark.
This patch changes GFP_NOWAIT to GFP_ATOMIC, so that it can access the
emergency memory reserves.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2679a83731d51a744657f718fc02c3b077e47562 ]
When we use raw_spin_rq_lock() to acquire the rq lock and have to
update the rq clock while holding the lock, the kernel may issue
a WARN_DOUBLE_CLOCK warning.
Since we directly use raw_spin_rq_lock() to acquire rq lock instead of
rq_lock(), there is no corresponding change to rq->clock_update_flags.
In particular, we have obtained the rq lock of other CPUs, the
rq->clock_update_flags of this CPU may be RQCF_UPDATED at this time, and
then calling update_rq_clock() will trigger the WARN_DOUBLE_CLOCK warning.
So we need to clear RQCF_UPDATED of rq->clock_update_flags to avoid
the WARN_DOUBLE_CLOCK warning.
For the sched_rt_period_timer() and migrate_task_rq_dl() cases
we simply replace raw_spin_rq_lock()/raw_spin_rq_unlock() with
rq_lock()/rq_unlock().
For the {pull,push}_{rt,dl}_task() cases, we add the
double_rq_clock_clear_update() function to clear RQCF_UPDATED of
rq->clock_update_flags, and call double_rq_clock_clear_update()
before double_lock_balance()/double_rq_lock() returns to avoid the
WARN_DOUBLE_CLOCK warning.
Some call trace reports:
Call Trace 1:
<IRQ>
sched_rt_period_timer+0x10f/0x3a0
? enqueue_top_rt_rq+0x110/0x110
__hrtimer_run_queues+0x1a9/0x490
hrtimer_interrupt+0x10b/0x240
__sysvec_apic_timer_interrupt+0x8a/0x250
sysvec_apic_timer_interrupt+0x9a/0xd0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x12/0x20
Call Trace 2:
<TASK>
activate_task+0x8b/0x110
push_rt_task.part.108+0x241/0x2c0
push_rt_tasks+0x15/0x30
finish_task_switch+0xaa/0x2e0
? __switch_to+0x134/0x420
__schedule+0x343/0x8e0
? hrtimer_start_range_ns+0x101/0x340
schedule+0x4e/0xb0
do_nanosleep+0x8e/0x160
hrtimer_nanosleep+0x89/0x120
? hrtimer_init_sleeper+0x90/0x90
__x64_sys_nanosleep+0x96/0xd0
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Call Trace 3:
<TASK>
deactivate_task+0x93/0xe0
pull_rt_task+0x33e/0x400
balance_rt+0x7e/0x90
__schedule+0x62f/0x8e0
do_task_dead+0x3f/0x50
do_exit+0x7b8/0xbb0
do_group_exit+0x2d/0x90
get_signal+0x9df/0x9e0
? preempt_count_add+0x56/0xa0
? __remove_hrtimer+0x35/0x70
arch_do_signal_or_restart+0x36/0x720
? nanosleep_copyout+0x39/0x50
? do_nanosleep+0x131/0x160
? audit_filter_inodes+0xf5/0x120
exit_to_user_mode_prepare+0x10f/0x1e0
syscall_exit_to_user_mode+0x17/0x30
do_syscall_64+0x40/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Call Trace 4:
update_rq_clock+0x128/0x1a0
migrate_task_rq_dl+0xec/0x310
set_task_cpu+0x84/0x1e4
try_to_wake_up+0x1d8/0x5c0
wake_up_process+0x1c/0x30
hrtimer_wakeup+0x24/0x3c
__hrtimer_run_queues+0x114/0x270
hrtimer_interrupt+0xe8/0x244
arch_timer_handler_phys+0x30/0x50
handle_percpu_devid_irq+0x88/0x140
generic_handle_domain_irq+0x40/0x60
gic_handle_irq+0x48/0xe0
call_on_irq_stack+0x2c/0x60
do_interrupt_handler+0x80/0x84
Steps to reproduce:
1. Enable CONFIG_SCHED_DEBUG when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
echo "WARN_DOUBLE_CLOCK" > /sys/kernel/debug/sched/features
echo "NO_RT_PUSH_IPI" > /sys/kernel/debug/sched/features
3. Run some rt/dl tasks that periodically work and sleep, e.g.
Create 2*n rt or dl (90% running) tasks via rt-app (on a system
with n CPUs), and Dietmar Eggemann reports Call Trace 4 when running
on PREEMPT_RT kernel.
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20220430085843.62939-2-jiahao.os@bytedance.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46e861be589881e0905b9ade3d8439883858721c ]
The TASKS_RUDE_RCU does not select IRQ_WORK, which can result in build
failures for kernels that do not otherwise select IRQ_WORK. This commit
therefore causes the TASKS_RUDE_RCU Kconfig option to select IRQ_WORK.
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f75fd4b9221d93177c50dcfde671b2e907f53e86 ]
While booting secondary CPUs, cpus_read_[lock/unlock] is not keeping
online cpumask stable. The transient online mask results in below
calltrace.
[ 0.324121] CPU1: Booted secondary processor 0x0000000001 [0x410fd083]
[ 0.346652] Detected PIPT I-cache on CPU2
[ 0.347212] CPU2: Booted secondary processor 0x0000000002 [0x410fd083]
[ 0.377255] Detected PIPT I-cache on CPU3
[ 0.377823] CPU3: Booted secondary processor 0x0000000003 [0x410fd083]
[ 0.379040] ------------[ cut here ]------------
[ 0.383662] WARNING: CPU: 0 PID: 10 at kernel/workqueue.c:3084 __flush_work+0x12c/0x138
[ 0.384850] Modules linked in:
[ 0.385403] CPU: 0 PID: 10 Comm: rcu_tasks_rude_ Not tainted 5.17.0-rc3-v8+ #13
[ 0.386473] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[ 0.387289] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.388308] pc : __flush_work+0x12c/0x138
[ 0.388970] lr : __flush_work+0x80/0x138
[ 0.389620] sp : ffffffc00aaf3c60
[ 0.390139] x29: ffffffc00aaf3d20 x28: ffffffc009c16af0 x27: ffffff80f761df48
[ 0.391316] x26: 0000000000000004 x25: 0000000000000003 x24: 0000000000000100
[ 0.392493] x23: ffffffffffffffff x22: ffffffc009c16b10 x21: ffffffc009c16b28
[ 0.393668] x20: ffffffc009e53861 x19: ffffff80f77fbf40 x18: 00000000d744fcc9
[ 0.394842] x17: 000000000000000b x16: 00000000000001c2 x15: ffffffc009e57550
[ 0.396016] x14: 0000000000000000 x13: ffffffffffffffff x12: 0000000100000000
[ 0.397190] x11: 0000000000000462 x10: ffffff8040258008 x9 : 0000000100000000
[ 0.398364] x8 : 0000000000000000 x7 : ffffffc0093c8bf4 x6 : 0000000000000000
[ 0.399538] x5 : 0000000000000000 x4 : ffffffc00a976e40 x3 : ffffffc00810444c
[ 0.400711] x2 : 0000000000000004 x1 : 0000000000000000 x0 : 0000000000000000
[ 0.401886] Call trace:
[ 0.402309] __flush_work+0x12c/0x138
[ 0.402941] schedule_on_each_cpu+0x228/0x278
[ 0.403693] rcu_tasks_rude_wait_gp+0x130/0x144
[ 0.404502] rcu_tasks_kthread+0x220/0x254
[ 0.405264] kthread+0x174/0x1ac
[ 0.405837] ret_from_fork+0x10/0x20
[ 0.406456] irq event stamp: 102
[ 0.406966] hardirqs last enabled at (101): [<ffffffc0093c8468>] _raw_spin_unlock_irq+0x78/0xb4
[ 0.408304] hardirqs last disabled at (102): [<ffffffc0093b8270>] el1_dbg+0x24/0x5c
[ 0.409410] softirqs last enabled at (54): [<ffffffc0081b80c8>] local_bh_enable+0xc/0x2c
[ 0.410645] softirqs last disabled at (50): [<ffffffc0081b809c>] local_bh_disable+0xc/0x2c
[ 0.411890] ---[ end trace 0000000000000000 ]---
[ 0.413000] smp: Brought up 1 node, 4 CPUs
[ 0.413762] SMP: Total of 4 processors activated.
[ 0.414566] CPU features: detected: 32-bit EL0 Support
[ 0.415414] CPU features: detected: 32-bit EL1 Support
[ 0.416278] CPU features: detected: CRC32 instructions
[ 0.447021] Callback from call_rcu_tasks_rude() invoked.
[ 0.506693] Callback from call_rcu_tasks() invoked.
This commit therefore fixes this issue by applying a single-CPU
optimization to the RCU Tasks Rude grace-period process. The key point
here is that the purpose of this RCU flavor is to force a schedule on
each online CPU since some past event. But the rcu_tasks_rude_wait_gp()
function runs in the context of the RCU Tasks Rude's grace-period kthread,
so there must already have been a context switch on the current CPU since
the call to either synchronize_rcu_tasks_rude() or call_rcu_tasks_rude().
So if there is only a single CPU online, RCU Tasks Rude's grace-period
kthread does not need to anything at all.
It turns out that the rcu_tasks_rude_wait_gp() function's call to
schedule_on_each_cpu() causes problems during early boot. During that
time, there is only one online CPU, namely the boot CPU. Therefore,
applying this single-CPU optimization fixes early-boot instances of
this problem.
Link: https://lore.kernel.org/lkml/20220210184319.25009-1-treasure4paddy@gmail.com/T/
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Padmanabha Srinivasaiah <treasure4paddy@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 6a2d90ba027adba528509ffa27097cffd3879257 upstream.
The current implementation of PTRACE_KILL is buggy and has been for
many years as it assumes it's target has stopped in ptrace_stop. At a
quick skim it looks like this assumption has existed since ptrace
support was added in linux v1.0.
While PTRACE_KILL has been deprecated we can not remove it as
a quick search with google code search reveals many existing
programs calling it.
When the ptracee is not stopped at ptrace_stop some fields would be
set that are ignored except in ptrace_stop. Making the userspace
visible behavior of PTRACE_KILL a noop in those case.
As the usual rules are not obeyed it is not clear what the
consequences are of calling PTRACE_KILL on a running process.
Presumably userspace does not do this as it achieves nothing.
Replace the implementation of PTRACE_KILL with a simple
send_sig_info(SIGKILL) followed by a return 0. This changes the
observable user space behavior only in that PTRACE_KILL on a process
not stopped in ptrace_stop will also kill it. As that has always
been the intent of the code this seems like a reasonable change.
Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20220505182645.497868-7-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* keystone/mirror-android13-5.15:
ANDROID: umh: Enable usermode helper for required use cases
ANDROID: vendor_hooks: Add hooks to dup_task_struct
ANDROID: GKI: add symbol list file for xiaomi
ANDROID: ABI: Update symbols to unisoc whitelist for the 7th
ANDROID: GKI: Update abi_gki_aarch64_qcom for pm flag set tracepoint
ANDROID: vendor_hooks: Add hook in wakeup functionality
ANDROID: gki_defconfig: enable CONFIG_KFENCE_STATIC_KEYS
ANDROID: vendor_hooks: Add hooks for account irqtime process tick
Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com>
Change-Id: Ia6d7b495c18d30d0ddb9fa3f158a8f4723177c98
For disabling usermode helper programs config STATIC_USERMODEHELPER_PATH
was set to NULL string. There are few use cases where usermode helper
programs support is needed, such as reboot, poweroff use cases.
So for these supported use cases, dont set the sub_info->path
to null to get usermode helper program support.
Bug: 202192667
Change-Id: I3e4cec94d091b23eda9d2be839cc8f960127575f
Signed-off-by: Prasad Sodagudi <quic_psodagud@quicinc.com>
Add hook to dup_task_struct for vendor data fields initialisation.
Bug: 188004638
Change-Id: I4b58604ee822fb8d1e0cc37bec72e820e7318427
Signed-off-by: Liangliang Li <liliangliang@vivo.com>
(cherry picked from commit f66d96b14aab5051fdf6b5054d87362c17a7b365)
Add a hook in irqtime_account_process_tick, which helps to get
information about the high load task.
Bug: 187904818
Change-Id: I644f7d66b09d047ca6b0a0fbd2915a6387c8c007
Signed-off-by: Liangliang Li <liliangliang@vivo.com>
(cherry picked from commit fe580539f6cec43ddb0d6ecfd39aa2f4e45754ca)
Changes in 5.15.42
usb: gadget: fix race when gadget driver register via ioctl
io_uring: arm poll for non-nowait files
floppy: use a statically allocated error counter
kernel/resource: Introduce request_mem_region_muxed()
i2c: piix4: Replace hardcoded memory map size with a #define
i2c: piix4: Move port I/O region request/release code into functions
i2c: piix4: Move SMBus controller base address detect into function
i2c: piix4: Move SMBus port selection into function
i2c: piix4: Add EFCH MMIO support to region request and release
i2c: piix4: Add EFCH MMIO support to SMBus base address detect
i2c: piix4: Add EFCH MMIO support for SMBus port select
i2c: piix4: Enable EFCH MMIO for Family 17h+
Watchdog: sp5100_tco: Move timer initialization into function
Watchdog: sp5100_tco: Refactor MMIO base address initialization
Watchdog: sp5100_tco: Add initialization using EFCH MMIO
Watchdog: sp5100_tco: Enable Family 17h+ CPUs
mm/kfence: reset PG_slab and memcg_data before freeing __kfence_pool
Revert "drm/i915/opregion: check port number bounds for SWSCI display power state"
rtc: fix use-after-free on device removal
rtc: pcf2127: fix bug when reading alarm registers
um: Cleanup syscall_handler_t definition/cast, fix warning
Input: add bounds checking to input_set_capability()
Input: stmfts - fix reference leak in stmfts_input_open
nvme-pci: add quirks for Samsung X5 SSDs
gfs2: Disable page faults during lockless buffered reads
rtc: sun6i: Fix time overflow handling
crypto: stm32 - fix reference leak in stm32_crc_remove
crypto: x86/chacha20 - Avoid spurious jumps to other functions
ALSA: hda/realtek: Enable headset mic on Lenovo P360
s390/traps: improve panic message for translation-specification exception
s390/pci: improve zpci_dev reference counting
vhost_vdpa: don't setup irq offloading when irq_num < 0
tools/virtio: compile with -pthread
nvmet: use a private workqueue instead of the system workqueue
nvme-multipath: fix hang when disk goes live over reconnect
rtc: mc146818-lib: Fix the AltCentury for AMD platforms
fs: fix an infinite loop in iomap_fiemap
MIPS: lantiq: check the return value of kzalloc()
drbd: remove usage of list iterator variable after loop
platform/chrome: cros_ec_debugfs: detach log reader wq from devm
ARM: 9191/1: arm/stacktrace, kasan: Silence KASAN warnings in unwind_frame()
nilfs2: fix lockdep warnings in page operations for btree nodes
nilfs2: fix lockdep warnings during disk space reclamation
ALSA: usb-audio: Restore Rane SL-1 quirk
ALSA: wavefront: Proper check of get_user() error
ALSA: hda/realtek: Add quirk for TongFang devices with pop noise
perf: Fix sys_perf_event_open() race against self
selinux: fix bad cleanup on error in hashtab_duplicate()
Fix double fget() in vhost_net_set_backend()
PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold
Revert "can: m_can: pci: use custom bit timings for Elkhart Lake"
KVM: x86/mmu: Update number of zapped pages even if page list is stable
arm64: paravirt: Use RCU read locks to guard stolen_time
arm64: mte: Ensure the cleared tags are visible before setting the PTE
crypto: qcom-rng - fix infinite loop on requests not multiple of WORD_SZ
libceph: fix potential use-after-free on linger ping and resends
drm/amd: Don't reset dGPUs if the system is going to s2idle
drm/i915/dmc: Add MMIO range restrictions
drm/dp/mst: fix a possible memory leak in fetch_monitor_name()
dma-buf: fix use of DMA_BUF_SET_NAME_{A,B} in userspace
dma-buf: ensure unique directory name for dmabuf stats
ARM: dts: aspeed-g6: remove FWQSPID group in pinctrl dtsi
pinctrl: pinctrl-aspeed-g6: remove FWQSPID group in pinctrl
ARM: dts: aspeed-g6: fix SPI1/SPI2 quad pin group
ARM: dts: aspeed: Add ADC for AST2600 and enable for Rainier and Everest
ARM: dts: aspeed: Add secure boot controller node
ARM: dts: aspeed: Add video engine to g6
pinctrl: mediatek: mt8365: fix IES control pins
ALSA: hda - fix unused Realtek function when PM is not enabled
net: ipa: record proper RX transaction count
net: macb: Increment rx bd head after allocating skb and buffer
xfrm: rework default policy structure
xfrm: fix "disable_policy" flag use when arriving from different devices
net/sched: act_pedit: sanitize shift argument before usage
netfilter: flowtable: fix excessive hw offload attempts after failure
netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices
net: fix dev_fill_forward_path with pppoe + bridge
netfilter: nft_flow_offload: fix offload with pppoe + vlan
Revert "PCI: aardvark: Rewrite IRQ code to chained IRQ handler"
net: systemport: Fix an error handling path in bcm_sysport_probe()
net: vmxnet3: fix possible use-after-free bugs in vmxnet3_rq_alloc_rx_buf()
net: vmxnet3: fix possible NULL pointer dereference in vmxnet3_rq_cleanup()
ice: fix crash when writing timestamp on RX rings
ice: fix possible under reporting of ethtool Tx and Rx statistics
ice: move ice_container_type onto ice_ring_container
ice: Fix interrupt moderation settings getting cleared
clk: at91: generated: consider range when calculating best rate
net/qla3xxx: Fix a test in ql_reset_work()
NFC: nci: fix sleep in atomic context bugs caused by nci_skb_alloc
net/mlx5: DR, Fix missing flow_source when creating multi-destination FW table
net/mlx5e: Properly block LRO when XDP is enabled
net: af_key: add check for pfkey_broadcast in function pfkey_process
ARM: 9196/1: spectre-bhb: enable for Cortex-A15
ARM: 9197/1: spectre-bhb: fix loop8 sequence for Thumb2
mptcp: change the parameter of __mptcp_make_csum
mptcp: reuse __mptcp_make_csum in validate_data_csum
mptcp: fix checksum byte order
igb: skip phy status check where unavailable
netfilter: flowtable: fix TCP flow teardown
netfilter: flowtable: pass flowtable to nf_flow_table_iterate()
netfilter: flowtable: move dst_check to packet path
net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.
riscv: dts: sifive: fu540-c000: align dma node name with dtschema
scsi: ufs: core: Fix referencing invalid rsp field
perf build: Fix check for btf__load_from_kernel_by_id() in libbpf
gpio: gpio-vf610: do not touch other bits when set the target bit
gpio: mvebu/pwm: Refuse requests with inverted polarity
perf regs x86: Fix arch__intr_reg_mask() for the hybrid platform
perf bench numa: Address compiler error on s390
scsi: scsi_dh_alua: Properly handle the ALUA transitioning state
scsi: qla2xxx: Fix missed DMA unmap for aborted commands
mac80211: fix rx reordering with non explicit / psmp ack policy
nl80211: validate S1G channel width
selftests: add ping test with ping_group_range tuned
Revert "fbdev: Make fb_release() return -ENODEV if fbdev was unregistered"
fbdev: Prevent possible use-after-free in fb_release()
net: fix wrong network header length
nl80211: fix locking in nl80211_set_tx_bitrate_mask()
ethernet: tulip: fix missing pci_disable_device() on error in tulip_init_one()
net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()
net: atlantic: fix "frag[0] not initialized"
net: atlantic: reduce scope of is_rsc_complete
net: atlantic: add check for MAX_SKB_FRAGS
net: atlantic: verify hw_head_ lies within TX buffer ring
arm64: Enable repeat tlbi workaround on KRYO4XX gold CPUs
Input: ili210x - fix reset timing
dt-bindings: pinctrl: aspeed-g6: remove FWQSPID group
mt76: mt7921e: fix possible probe failure after reboot
lockdown: also lock down previous kgdb use
i2c: mt7621: fix missing clk_disable_unprepare() on error in mtk_i2c_probe()
afs: Fix afs_getattr() to refetch file status if callback break occurred
Linux 5.15.42
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ifad49f172050c7f8d07f9432a48766cfd5ddf2ca
commit 97e6d7dab1ca4648821c790a2b7913d6d5d549db upstream.
The commit being fixed was aiming to disallow users from incorrectly
obtaining writable pointer to memory that is only meant to be read. This
is enforced now using a MEM_RDONLY flag.
For instance, in case of global percpu variables, when the BTF type is
not struct (e.g. bpf_prog_active), the verifier marks register type as
PTR_TO_MEM | MEM_RDONLY from bpf_this_cpu_ptr or bpf_per_cpu_ptr
helpers. However, when passing such pointer to kfunc, global funcs, or
BPF helpers, in check_helper_mem_access, there is no expectation
MEM_RDONLY flag will be set, hence it is checked as pointer to writable
memory. Later, verifier sets up argument type of global func as
PTR_TO_MEM | PTR_MAYBE_NULL, so user can use a global func to get around
the limitations imposed by this flag.
This check will also cover global non-percpu variables that may be
introduced in kernel BTF in future.
Also, we update the log message for PTR_TO_BUF case to be similar to
PTR_TO_MEM case, so that the reason for error is clear to user.
Fixes: 34d3a78c681e ("bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.")
Reviewed-by: Hao Luo <haoluo@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220319080827.73251-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7b3552d3f9f6897851fc453b5131a967167e43c2 upstream.
It is not permitted to write to PTR_TO_MAP_KEY, but the current code in
check_helper_mem_access would allow for it, reject this case as well, as
helpers taking ARG_PTR_TO_UNINIT_MEM also take PTR_TO_MAP_KEY.
Fixes: 69c087ba62 ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220319080827.73251-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b45043192b3e481304062938a6561da2ceea46a6 upstream.
The 'n_buckets * (value_size + sizeof(struct stack_map_bucket))' part of the
allocated memory for 'smap' is never used after the memlock accounting was
removed, thus get rid of it.
[ Note, Daniel:
Commit b936ca643a ("bpf: rework memlock-based memory accounting for maps")
moved `cost += n_buckets * (value_size + sizeof(struct stack_map_bucket))`
up and therefore before the bpf_map_area_alloc() allocation, sigh. In a later
step commit c85d69135a ("bpf: move memory size checks to bpf_map_charge_init()"),
and the overflow checks of `cost >= U32_MAX - PAGE_SIZE` moved into
bpf_map_charge_init(). And then 370868107b ("bpf: Eliminate rlimit-based
memory accounting for stackmap maps") finally removed the bpf_map_charge_init().
Anyway, the original code did the allocation same way as /after/ this fix. ]
Fixes: b936ca643a ("bpf: rework memlock-based memory accounting for maps")
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220407130423.798386-1-ytcoode@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2aa95b71c9bbec793b5c5fa50f0a80d882b3e8d upstream.
The cnt value in the 'cnt >= BPF_MAX_TRAMP_PROGS' check does not
include BPF_TRAMP_MODIFY_RETURN bpf programs, so the number of
the attached BPF_TRAMP_MODIFY_RETURN bpf programs in a trampoline
can exceed BPF_MAX_TRAMP_PROGS.
When this happens, the assignment '*progs++ = aux->prog' in
bpf_trampoline_get_progs() will cause progs array overflow as the
progs field in the bpf_tramp_progs struct can only hold at most
BPF_MAX_TRAMP_PROGS bpf programs.
Fixes: 88fd9e5352 ("bpf: Refactor trampoline update code")
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Link: https://lore.kernel.org/r/20220430130803.210624-1-ytcoode@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
RCU_NONIDLE usage during __cfi_slowpath_diag can result in an invalid
RCU state in the cpuidle code path:
WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter+0xe4/0x138
...
Call trace:
rcu_eqs_enter+0xe4/0x138
rcu_idle_enter+0xa8/0x100
cpuidle_enter_state+0x154/0x3a8
cpuidle_enter+0x3c/0x58
do_idle.llvm.6590768638138871020+0x1f4/0x2ec
cpu_startup_entry+0x28/0x2c
secondary_start_kernel+0x1b8/0x220
__secondary_switched+0x94/0x98
Instead, call rcu_irq_enter/exit to wake up RCU only when needed and
disable interrupts for the entire CFI shadow/module check when we do.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20220531175910.890307-1-samitolvanen@google.com
Fixes: cf68fffb66 ("add support for Clang CFI")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
(cherry picked from commit e1d3373352077f3be9cc1c8adb5fd59d0aa96e7a
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git
for-next/hardening)
Bug: 230582614
Bug: 231734842
Bug: 233021097
Change-Id: I78cc7ece46e3d8fc6699bcd7a0d8d6074b6a05fe
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
For power and performance monitoring, need to known tasks' runtime for
loading estimation.
But now, other modules can't get task_scehd_runtime.
Export task_sched_runtime to let other modules get task_scehd_runtime.
Bug: 233862809
Signed-off-by: Poting Chen <poting.chen@mediatek.com>
Signed-off-by: Cheng Jui Wang <cheng-jui.wang@mediatek.com>
Change-Id: Ida5caf8ed0a32954fc0b0ed950f163c7ca493fef
(cherry picked from commit fdc8f778e23dfb41b58f87edccf419eb53627ea3)
Without initialization, it will be random data and hard for
vendor hook to decide.
Bug: 207739506
Change-Id: I278772d87eea38c03a40d4f0bef20ac8644e2ecd
Signed-off-by: Maria Yu <quic_aiquny@quicinc.com>
(cherry picked from commit 898e7ec950c168e37ce8c27f6ca1d2cdea66b078)
* keystone/mirror-android13-5.15: (98 commits)
ANDROID: GKI: build damon reclaim
FROMLIST: mm/damon/reclaim: Fix the timer always stays active
BACKPORT: treewide: Add missing includes masked by cgroup -> bpf dependency
UPSTREAM: mm/damon: modify damon_rand() macro to static inline function
UPSTREAM: mm/damon: add 'age' of region tracepoint support
UPSTREAM: mm/damon: hide kernel pointer from tracepoint event
UPSTREAM: mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
UPSTREAM: mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
UPSTREAM: mm/damon/dbgfs: remove an unnecessary variable
UPSTREAM: mm/damon: move the implementation of damon_insert_region to damon.h
UPSTREAM: mm/damon: add access checking for hugetlb pages
UPSTREAM: mm/damon/dbgfs: support all DAMOS stats
UPSTREAM: mm/damon/reclaim: provide reclamation statistics
UPSTREAM: mm/damon/schemes: account how many times quota limit has exceeded
UPSTREAM: mm/damon/schemes: account scheme actions that successfully applied
UPSTREAM: mm/damon: convert macro functions to static inline functions
UPSTREAM: mm/damon: move damon_rand() definition into damon.h
UPSTREAM: mm/damon/schemes: add the validity judgment of thresholds
UPSTREAM: mm/damon/vaddr: remove swap_ranges() and replace it with swap()
UPSTREAM: mm/damon: remove some unneeded function definitions in damon.h
...
Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com>
Change-Id: I80d21861ab96c5107de6b7e098e43213d4e15e74
Allow the vendor module to know the target cpu for better decisions on
whether to enforce __ttwu_queue_wakelist() based wakeup.
Bug: 234483895
Change-Id: Ic27054a5f6adc040fa3cadbd57d37608bf353c5f
Signed-off-by: Abhijeet Dharmapurikar <quic_adharmap@quicinc.com>
commit 1366992e16bddd5e2d9a561687f367f9f802e2e4 upstream.
The addition of random_get_entropy_fallback() provides access to
whichever time source has the highest frequency, which is useful for
gathering entropy on platforms without available cycle counters. It's
not necessarily as good as being able to quickly access a cycle counter
that the CPU has, but it's still something, even when it falls back to
being jiffies-based.
In the event that a given arch does not define get_cycles(), falling
back to the get_cycles() default implementation that returns 0 is really
not the best we can do. Instead, at least calling
random_get_entropy_fallback() would be preferable, because that always
needs to return _something_, even falling back to jiffies eventually.
It's not as though random_get_entropy_fallback() is super high precision
or guaranteed to be entropic, but basically anything that's not zero all
the time is better than returning zero all the time.
Finally, since random_get_entropy_fallback() is used during extremely
early boot when randomizing freelists in mm_init(), it can be called
before timekeeping has been initialized. In that case there really is
nothing we can do; jiffies hasn't even started ticking yet. So just give
up and return 0.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3191dd5a1179ef0fad5a050a1702ae98b6251e8f upstream.
For the irq randomness fast pool, rather than having to use expensive
atomics, which were visibly the most expensive thing in the entire irq
handler, simply take care of the extreme edge case of resetting count to
zero in the cpuhp online handler, just after workqueues have been
reenabled. This simplifies the code a bit and lets us use vanilla
variables rather than atomics, and performance should be improved.
As well, very early on when the CPU comes up, while interrupts are still
disabled, we clear out the per-cpu crng and its batches, so that it
always starts with fresh randomness.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eadb2f47a3ced5c64b23b90fd2a3463f63726066 upstream.
KGDB and KDB allow read and write access to kernel memory, and thus
should be restricted during lockdown. An attacker with access to a
serial port (for example, via a hypervisor console, which some cloud
vendors provide over the network) could trigger the debugger so it is
important that the debugger respect the lockdown mode when/if it is
triggered.
Fix this by integrating lockdown into kdb's existing permissions
mechanism. Unfortunately kgdb does not have any permissions mechanism
(although it certainly could be added later) so, for now, kgdb is simply
and brutally disabled by immediately exiting the gdb stub without taking
any action.
For lockdowns established early in the boot (e.g. the normal case) then
this should be fine but on systems where kgdb has set breakpoints before
the lockdown is enacted than "bad things" will happen.
CVE: CVE-2022-21499
Co-developed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ac6487e584a1eb54071dbe1212e05b884136704 upstream.
Norbert reported that it's possible to race sys_perf_event_open() such
that the looser ends up in another context from the group leader,
triggering many WARNs.
The move_group case checks for races against itself, but the
!move_group case doesn't, seemingly relying on the previous
group_leader->ctx == ctx check. However, that check is racy due to not
holding any locks at that time.
Therefore, re-check the result after acquiring locks and bailing
if they no longer match.
Additionally, clarify the not_move_group case from the
move_group-vs-move_group race.
Fixes: f63a8daa58 ("perf: Fix event->ctx locking")
Reported-by: Norbert Slusarek <nslusarek@gmx.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* keystone/mirror-android13-5.15:
ANDROID: Use the notifier lock to perform file-backed vma teardown
ANDROID: Disable CFI on restricted vendor hooks in TRACE_HEADER_MULTI_READ
ANDROID: GKI: db845c: Update symbols list and ABI
Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com>
Change-Id: I6cca69c38f7c553daaec23053e43175174dd6c2f
When a file-backed vma is being released, the userspace can have an
expectation that the vma and the file it's pinning will be released
synchronously. This does not happen when SPF is enabled because vma
and associated file are released asynchronously after RCU grace
period. This is done to prevent pagefault handler from stepping on
a deleted object. Fix this issue by synchronizing the file-backed
pagefault handler with the vma tear-down using notifier lock.
Fixes: 48e35d053f "FROMLIST: mm: rcu safe vma->vm_file freeing"
Bug: 231394031
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Idabf44b8e5a91805e99d79884af77a000dca7637
* keystone/mirror-android13-5.15: (619 commits)
ANDROID: ABI: Update symbols to unisoc whitelist for the 1st
ANDROID: include GKI_MODULES_LIST
ANDROID: ABI: Update symbols to unisoc whitelist
UPSTREAM: mm: kfence: fix objcgs vector allocation
UPSTREAM: ARM: dts: socfpga: change qspi to "intel,socfpga-qspi"
UPSTREAM: spi: cadence-quadspi: fix write completion support
Linux 5.15.41
usb: gadget: uvc: allow for application to cleanly shutdown
usb: gadget: uvc: rename function to be more consistent
ping: fix address binding wrt vrf
SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
mm/hwpoison: use pr_err() instead of dump_page() in get_any_page()
dma-buf: call dma_buf_stats_setup after dmabuf is in valid list
Revert "drm/amd/pm: keep the BACO feature enabled for suspend"
drm/vmwgfx: Initialize drm_mode_fb_cmd2
SUNRPC: Ensure that the gssproxy client can start in a connected state
net: phy: micrel: Pass .probe for KS8737
net: phy: micrel: Do not use kszphy_suspend/resume for KSZ8061
arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map
cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
...
Signed-off-by: keystone-kernel-automerger <keystone-kernel-automerger@google.com>
Change-Id: Ide9c801d94dc0086e418395bb4aba71ed46ff885
Changes in 5.15.41
batman-adv: Don't skb_split skbuffs with frag_list
iwlwifi: iwl-dbg: Use del_timer_sync() before freeing
hwmon: (tmp401) Add OF device ID table
mac80211: Reset MBSSID parameters upon connection
net: Fix features skip in for_each_netdev_feature()
net: mscc: ocelot: fix last VCAP IS1/IS2 filter persisting in hardware when deleted
net: mscc: ocelot: fix VCAP IS2 filters matching on both lookups
net: mscc: ocelot: restrict tc-trap actions to VCAP IS2 lookup 0
net: mscc: ocelot: avoid corrupting hardware counters when moving VCAP filters
fbdev: simplefb: Cleanup fb_info in .fb_destroy rather than .remove
fbdev: efifb: Cleanup fb_info in .fb_destroy rather than .remove
fbdev: vesafb: Cleanup fb_info in .fb_destroy rather than .remove
platform/surface: aggregator: Fix initialization order when compiling as builtin module
ice: Fix race during aux device (un)plugging
ice: fix PTP stale Tx timestamps cleanup
ipv4: drop dst in multicast routing path
drm/nouveau: Fix a potential theorical leak in nouveau_get_backlight_name()
netlink: do not reset transport header in netlink_recvmsg()
net: chelsio: cxgb4: Avoid potential negative array offset
fbdev: efifb: Fix a use-after-free due early fb_info cleanup
sfc: Use swap() instead of open coding it
net: sfc: fix memory leak due to ptp channel
mac80211_hwsim: call ieee80211_tx_prepare_skb under RCU protection
nfs: fix broken handling of the softreval mount option
ionic: fix missing pci_release_regions() on error in ionic_probe()
dim: initialize all struct fields
hwmon: (ltq-cputemp) restrict it to SOC_XWAY
procfs: prevent unprivileged processes accessing fdinfo dir
selftests: vm: Makefile: rename TARGETS to VMTARGETS
arm64: vdso: fix makefile dependency on vdso.so
virtio: fix virtio transitional ids
s390/ctcm: fix variable dereferenced before check
s390/ctcm: fix potential memory leak
s390/lcs: fix variable dereferenced before check
net/sched: act_pedit: really ensure the skb is writable
net: ethernet: mediatek: ppe: fix wrong size passed to memset()
net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral
drm/vc4: hdmi: Fix build error for implicit function declaration
net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()
net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending
net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()
tls: Fix context leak on tls_device_down
drm/vmwgfx: Fix fencing on SVGAv3
gfs2: Fix filesystem block deallocation for short writes
hwmon: (f71882fg) Fix negative temperature
RDMA/irdma: Fix deadlock in irdma_cleanup_cm_core()
iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu
ASoC: max98090: Reject invalid values in custom control put()
ASoC: max98090: Generate notifications on changes for custom control
ASoC: ops: Validate input values in snd_soc_put_volsw_range()
s390: disable -Warray-bounds
ASoC: SOF: Fix NULL pointer exception in sof_pci_probe callback
net: emaclite: Don't advertise 1000BASE-T and do auto negotiation
net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONT
secure_seq: use the 64 bits of the siphash for port offset calculation
tcp: use different parts of the port_offset for index and offset
tcp: resalt the secret every 10 seconds
tcp: add small random increments to the source port
tcp: dynamically allocate the perturb table used by source ports
tcp: increase source port perturb table to 2^16
tcp: drop the hash_32() part from the index calculation
interconnect: Restore sync state by ignoring ipa-virt in provider count
firmware_loader: use kernel credentials when reading firmware
KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()
usb: xhci-mtk: fix fs isoc's transfer error
x86/mm: Fix marking of unused sub-pmd ranges
tty/serial: digicolor: fix possible null-ptr-deref in digicolor_uart_probe()
tty: n_gsm: fix buffer over-read in gsm_dlci_data()
tty: n_gsm: fix mux activation issues in gsm_config()
usb: cdc-wdm: fix reading stuck on device close
usb: typec: tcpci: Don't skip cleanup in .remove() on error
usb: typec: tcpci_mt6360: Update for BMC PHY setting
USB: serial: pl2303: add device id for HP LM930 Display
USB: serial: qcserial: add support for Sierra Wireless EM7590
USB: serial: option: add Fibocom L610 modem
USB: serial: option: add Fibocom MA510 modem
slimbus: qcom: Fix IRQ check in qcom_slim_probe
fsl_lpuart: Don't enable interrupts too early
serial: 8250_mtk: Fix UART_EFR register address
serial: 8250_mtk: Fix register address for XON/XOFF character
ceph: fix setting of xattrs on async created inodes
Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"
mm/huge_memory: do not overkill when splitting huge_zero_page
drm/vmwgfx: Disable command buffers on svga3 without gbobjects
drm/nouveau/tegra: Stop using iommu_present()
i40e: i40e_main: fix a missing check on list iterator
net: atlantic: always deep reset on pm op, fixing up my null deref regression
net: phy: Fix race condition on link status change
writeback: Avoid skipping inode writeback
cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map
net: phy: micrel: Do not use kszphy_suspend/resume for KSZ8061
net: phy: micrel: Pass .probe for KS8737
SUNRPC: Ensure that the gssproxy client can start in a connected state
drm/vmwgfx: Initialize drm_mode_fb_cmd2
Revert "drm/amd/pm: keep the BACO feature enabled for suspend"
dma-buf: call dma_buf_stats_setup after dmabuf is in valid list
mm/hwpoison: use pr_err() instead of dump_page() in get_any_page()
SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
ping: fix address binding wrt vrf
usb: gadget: uvc: rename function to be more consistent
usb: gadget: uvc: allow for application to cleanly shutdown
Linux 5.15.41
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia65cdbbddf553237d6a3a38efb9bcb2fcc3990ec